Manual:Pywikibot/2.0

From Linux Web Expert

Pywikibot 2.0, also known as "rewrite", was proposed in 2007 but has never been ready to replace the current "trunk" version. This is a list of things that we should do to get it ready.

Goals

  • Be ready to merge at the end of Summer 2013.
  • Provide a seamless transition for bot operators and writers.
  • Make it easier to make raw API calls.
  • Draw a proper distinction between programs (which produce output and handle user interaction) and libraries.

To be done

  • Many site methods not yet implemented.
  • All trunk scripts should be converted over. (Project:Pywikibot/2.0/Porting status)
  • Create a script called wikipedia.py to provide a compatibility layer.
  • Some programs need to be split into a program part and a library part (for example upload.py).
  • login.py has some weird quirks that need to be looked into (hack workaround at /login.py; maybe this has to do with the path itself?).
  • Move api.update_page to a method of the page class.
  • The default of the i18n.translate fallback flag is different in core.
  • Find a way to cache tokens between login sessions.
  • Documentation, documentation, documentation!

Ideas

  • A Content+Site model is needed. The current structure of pywiki centers on a "page" object. This object is both the container of data and the networking code. It allows someone to write page.content() to get the page markup from a site, which is somewhat convenient, yet it promotes non-batched usage, which is much heavier on the wiki. Instead, the page object should be a local-only container of data and related (parsing) code, whereas all network access should be done by the Site object. The syntax might be different, but the concepts should stay:
page1 = Page('Main Page')
batch = [page1, page2, page3]
site.populateStuff(batch, content | links | categories)  # sets various properties on the batch based on bit flags
batch2 = site.query({api parameters})  # page objects are created from the result of a user-supplied query
print(page1.links())  # list of links
print(page1.templates())  # raises an error: templates were not populated

--Yurik (talk) 06:13, 22 March 2013 (UTC)
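The separation proposed above can be illustrated with a small, self-contained sketch. This is not Pywikibot code: every name in it (Prop, Page, Site, populate, NotPopulatedError) is hypothetical, and a plain dictionary stands in for the wiki, so no network request is made. It only shows the idea that the page is a local container and the site fills a whole batch in one step.

```python
from enum import Flag, auto


class Prop(Flag):
    """Hypothetical bit flags naming the properties a batch can populate."""
    CONTENT = auto()
    LINKS = auto()
    CATEGORIES = auto()
    TEMPLATES = auto()


class NotPopulatedError(Exception):
    """Raised when a property is read before a Site populated it."""


class Page:
    """Local-only container: it holds data and never touches the network."""

    def __init__(self, title):
        self.title = title
        self.data = {}  # maps a single Prop flag to its loaded value

    def _get(self, prop):
        if prop not in self.data:
            raise NotPopulatedError(
                f"{prop.name} not populated for {self.title!r}")
        return self.data[prop]

    def content(self):
        return self._get(Prop.CONTENT)

    def links(self):
        return self._get(Prop.LINKS)

    def templates(self):
        return self._get(Prop.TEMPLATES)


class Site:
    """Owns all 'network' access; a dict stands in for the wiki here."""

    def __init__(self, backend):
        self._backend = backend  # assumed shape: {title: {Prop: value}}
        self.requests = 0        # counts simulated round trips

    def populate(self, pages, props):
        """Fill the requested props on every page in one simulated request."""
        self.requests += 1
        for page in pages:
            stored = self._backend.get(page.title, {})
            for prop in Prop:
                if prop in props:
                    default = "" if prop is Prop.CONTENT else []
                    page.data[prop] = stored.get(prop, default)


wiki = {
    "Main Page": {Prop.CONTENT: "Welcome to [[Help]]", Prop.LINKS: ["Help"]},
    "Help": {Prop.CONTENT: "See [[Main Page]]", Prop.LINKS: ["Main Page"]},
}
site = Site(wiki)
page1, page2 = Page("Main Page"), Page("Help")
site.populate([page1, page2], Prop.CONTENT | Prop.LINKS)

print(page1.links())
try:
    page1.templates()  # templates were never requested for this batch
except NotPopulatedError as err:
    print("error:", err)
```

In this sketch both pages are filled by a single populate() call, and reading a property that was not requested fails loudly instead of silently triggering another request.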

@Yurik: As part of phabricator:T101587, this feature request needs to be added as a Phabricator task. However, as you mentioned something like this at Lyon, and it has been over two years since you wrote this, a recap and reply is probably in order.
Pywikipedia and Pywikibot 2 have very strong batching concepts which preload props (see APISite.preloadpages), though Pywikipedia is such a terrible mess that the good stuff in it is usually only known and used by the initial coder. We have patches pending for Pywikibot 2 to vastly improve preloading/batching; however, a few huge bugs need to be fixed first. But the design is OK, and the direction almost certainly matches what you're wanting here.
Where Pywikibot 2 doesn't match your design is the last step: page1.templates() raising an error if 'templates' was not batched. Pywikibot 2 will recognise that the templates were not preloaded during the batch operation and will proceed to fetch them. I think I can safely say that Pywikibot 2, and even 3, will not (and probably never will) raise an exception on a Page operation because the caller didn't batch it properly in advance. The first reason is that, as you say, the Page-centric model is convenient. The second is backwards compatibility. And finally, the script writer may know what they are doing: it may be that only one page out of 100 needs to call .templates(), in which case preloading templates for all of them would be less efficient.
However, I see two approaches we can take to promote, and even enforce, batch-centric programming:
  1. Issue a warning if page1.templates() is called and the data wasn't preloaded.
  2. Add a 'batch only mode' which does raise an exception.
John Vandenberg (talk) 00:48, 6 June 2015 (UTC)
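As a rough illustration of the two approaches listed in the reply above, the following sketch combines them in one class. It is not the Pywikibot implementation: LazyPage, BatchOnlyError, the batch_only flag, preload() and the fetcher callable are all hypothetical names, and the fetcher merely stands in for an API request.

```python
import warnings


class BatchOnlyError(RuntimeError):
    """Hypothetical error raised in 'batch only mode' on a missing preload."""


class LazyPage:
    """Page that fetches on demand, warning (or refusing) when data is missing."""

    def __init__(self, title, fetcher, batch_only=False):
        self.title = title
        self._fetcher = fetcher        # callable(title, prop) -> value
        self._batch_only = batch_only  # True selects the strict mode
        self._cache = {}

    def preload(self, prop, value):
        """Store a value obtained earlier by a batch operation."""
        self._cache[prop] = value

    def _get(self, prop):
        if prop not in self._cache:
            if self._batch_only:
                # Approach 2: refuse to make an unbatched request.
                raise BatchOnlyError(
                    f"{prop} was not preloaded for {self.title!r}")
            # Approach 1: warn, then fall back to a single request.
            warnings.warn(
                f"{prop} was not preloaded for {self.title!r}; "
                "fetching it individually",
                stacklevel=3)
            self._cache[prop] = self._fetcher(self.title, prop)
        return self._cache[prop]

    def templates(self):
        return self._get("templates")


def demo_fetch(title, prop):
    """Stand-in for a network call; returns canned data."""
    return ["Template:Stub"]


lenient = LazyPage("Main Page", demo_fetch)
print(lenient.templates())  # warns once, then returns the fetched list

strict = LazyPage("Main Page", demo_fetch, batch_only=True)
strict.preload("templates", ["Template:Stub"])
print(strict.templates())   # preloaded, so no warning and no fetch
```

The default mode keeps the convenient, backwards-compatible behaviour described in the reply, while the strict mode is opt-in for script writers who want unbatched access to fail.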

See also