Manual:Pywikibot/2.0

From Linux Web Expert

Revision as of 05:16, 30 May 2020 by imported>Dexbot (Deprecating <source> tag: phab:T237267)

Pywikibot 2.0, also known as "rewrite", was proposed in 2007 but has never been ready to replace the current "trunk" version. This is a list of things that we should do to get it ready.

Goals

  • Be ready to merge by the end of summer 2013.
  • Provide a seamless transition for bot operators and writers.
  • Make it easier to make raw API calls.
  • Make a proper distinction between programs (which produce output and interact with the user) and libraries.

To be done

  • Many site methods are not yet implemented.
  • All trunk scripts should be converted over (see Project:Pywikibot/2.0/Porting status).
  • Create a script called wikipedia.py to provide a compatibility layer.
  • Some programs need to be split into a program part and a library part (for example, upload.py).
  • login.py has some odd quirks that need to be investigated (there is a hack workaround at /login.py; maybe this has to do with the path itself?).
  • Move api.update_page into a method of the Page class.
  • The default of the i18n.translate fallback flag is different in core.
  • Find a way to cache tokens between login sessions.
  • Documentation, documentation, documentation!

Ideas

  • A Content+Site model is needed. The current structure of pywiki centers around a "page" object. This object is both the container of data and the networking code. It allows someone to write page.content() to get the page markup from a site, which is somewhat convenient, but it promotes non-batched usage, which places a much heavier load on the wiki. Instead, the page object should be a local-only container of data and related (parsing) code, whereas all network access should be done by the Site object. The syntax might be different, but the concepts should stay:
page1 = Page('Main Page')
batch = [page1, page2, page3]
site.populateStuff(batch, content | links | categories)  # sets various properties in the batch based on bit flags
batch2 = site.query({api parameters})  # page objects are created from the result of a user-supplied query
print(page1.links())  # list of links
print(page1.templates())  # raises an error: templates were not populated

--Yurik (talk) 06:13, 22 March 2013 (UTC)
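The proposal above can be sketched as runnable Python to show the split between a local-only Page and a Site that does all data access in batches. This is a hypothetical illustration, not Pywikibot's API: `Prop`, `populate`, `NotPopulatedError`, and the dict standing in for the wiki are invented here, and a real implementation would send one API request per batch instead of reading a dict.

```python
from enum import Flag, auto


class Prop(Flag):
    """Bit flags choosing which properties a batch request fills in."""
    CONTENT = auto()
    LINKS = auto()
    CATEGORIES = auto()
    TEMPLATES = auto()


class NotPopulatedError(Exception):
    """Raised when reading a property that no batch request populated."""


class Page:
    """Local-only container of page data; it never touches the network."""

    def __init__(self, title):
        self.title = title
        self._data = {}  # Prop member -> populated value

    def _get(self, prop):
        if prop not in self._data:
            raise NotPopulatedError(f"{prop.name.lower()} not populated for {self.title!r}")
        return self._data[prop]

    def content(self):
        return self._get(Prop.CONTENT)

    def links(self):
        return self._get(Prop.LINKS)

    def templates(self):
        return self._get(Prop.TEMPLATES)


class Site:
    """Owns all data access; a dict stands in for the wiki's API here."""

    def __init__(self, backend):
        self._backend = backend  # {title: {Prop member: value}}
        self.requests = 0  # number of simulated round trips

    def populate(self, batch, props):
        """Fill the selected props for every page in one simulated request."""
        self.requests += 1
        for page in batch:
            record = self._backend[page.title]
            for prop in Prop:
                if prop in props:
                    page._data[prop] = record[prop]


# Usage: one request populates content and links for the whole batch.
wiki = {
    "Main Page": {Prop.CONTENT: "Welcome!", Prop.LINKS: ["Help"],
                  Prop.CATEGORIES: [], Prop.TEMPLATES: ["Template:Box"]},
    "Help": {Prop.CONTENT: "How to edit", Prop.LINKS: ["Main Page"],
             Prop.CATEGORIES: ["Docs"], Prop.TEMPLATES: []},
}
site = Site(wiki)
page1, page2 = Page("Main Page"), Page("Help")
batch = [page1, page2]
site.populate(batch, Prop.CONTENT | Prop.LINKS)
print(page1.links())

error_message = None
try:
    page1.templates()  # templates were not requested above
except NotPopulatedError as err:
    error_message = str(err)
print(error_message)
```

Because Page holds no network code, the only way to get data into it is a Site batch call, which is what makes non-batched access impossible by construction in this sketch.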

@Yurik: as part of phabricator:T101587, this feature request needs to be added as a Phabricator task. However, as you mentioned something like this at Lyon, and it has been over two years since you wrote this, a recap and reply are probably in order.
Pywikipedia and Pywikibot 2 have very strong batching concepts which preload props (see APISite.preloadpages), though Pywikipedia is such a terrible mess that the good stuff in it is usually known and used only by the initial coder. We have patches pending for Pywikibot 2 to vastly improve preloading/batching; however, a few huge bugs need to be fixed first. Still, the design is OK and the direction almost certainly matches what you want here.
Where Pywikibot 2 doesn't match your design is the last step: page1.templates() raising an error if 'templates' was not batched. Pywikibot 2 will recognise that the templates were not preloaded during the batch operation and will proceed to fetch them. I think I can safely say that neither Pywikibot 2 nor 3 (and probably no later version) is going to raise an exception on a Page operation because the caller didn't batch it properly in advance. The first reason is, as you say, that the Page-centric model is convenient. Another is backwards compatibility. Finally, the script writer may know what they are doing: it may be that only one page out of 100 needs to call .templates(), and preloading templates for all 100 pages would be less efficient.
However, I see two approaches we could take to promote, and even enforce, batch-centric programming:
  1. Issue a warning if page1.templates() is called and the data wasn't preloaded.
  2. Add a 'batch only mode' which does raise an exception.
John Vandenberg (talk) 00:48, 6 June 2015 (UTC)
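The default fallback described above, together with the two proposed approaches, can be sketched as a page class with a mode switch. Everything here is hypothetical: `LazyPage`, the mode names "fetch", "warn" and "strict", `BatchOnlyError`, `NotPreloadedWarning`, and the stand-in fetcher are invented for illustration and are not how Pywikibot implements Page.templates().

```python
import warnings


class BatchOnlyError(Exception):
    """Raised in 'strict' (batch-only) mode when data was not preloaded."""


class NotPreloadedWarning(UserWarning):
    """Issued in 'warn' mode when data is fetched one page at a time."""


class LazyPage:
    """Hypothetical page whose templates() can fall back to a per-page fetch."""

    MODES = ("fetch", "warn", "strict")

    def __init__(self, title, fetch_templates, mode="fetch"):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        self.title = title
        self.mode = mode
        self._fetch_templates = fetch_templates  # callable: title -> list of names
        self._templates = None  # stays None until preloaded or fetched

    def preload_templates(self, templates):
        """Store templates that a batch request fetched for many pages at once."""
        self._templates = list(templates)

    def templates(self):
        if self._templates is None:
            if self.mode == "strict":
                raise BatchOnlyError(f"templates of {self.title!r} were not preloaded")
            if self.mode == "warn":
                warnings.warn(
                    f"fetching templates of {self.title!r} individually",
                    NotPreloadedWarning,
                    stacklevel=2,
                )
            # "fetch" and "warn" both fall back to an on-demand fetch for this page.
            self._templates = list(self._fetch_templates(self.title))
        return self._templates


# Usage with a stand-in fetcher that records each individual request.
fetched = []


def fake_fetch(title):
    fetched.append(title)
    return ["Template:Infobox"]


lazy = LazyPage("A", fake_fetch)  # default mode: silent on-demand fetch
print(lazy.templates())

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LazyPage("B", fake_fetch, mode="warn").templates()
print(len(caught), caught[0].category.__name__)

strict = LazyPage("C", fake_fetch, mode="strict")
refusal = None
try:
    strict.templates()  # not preloaded, so batch-only mode refuses
except BatchOnlyError as err:
    refusal = str(err)
print(refusal)
strict.preload_templates(["Template:Cite"])  # as a batch preload would
print(strict.templates(), fetched)
```

Keeping the silent fetch as the default preserves backwards compatibility, while "warn" and "strict" let a script writer opt in to batch-centric checks.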

See also