Manual:Pywikibot/2.0
Pywikibot 2.0, also known as "rewrite", was proposed in 2007 but has never been ready to replace the current "trunk" version. This is a list of things we should do to get it ready.
Goals
- Be ready to merge by the end of summer 2013.
- Provide a seamless transition for bot operators and writers.
- Make it easier to make raw API calls.
- Make a proper distinction between programs (which produce output and interact with the user) and libraries.
To be done
- Many site methods are not yet implemented.
- All trunk scripts should be ported (see Project:Pywikibot/2.0/Porting status).
- Create a script called wikipedia.py to provide a compatibility layer.
- Some programs need to be split into a program part and a library part (for example, upload.py).
- login.py has some odd quirks that need to be investigated (there is a hack workaround at /login.py; perhaps the problem is related to the path itself?)
- Move api.update_page to a method of the Page class.
- The default of the i18n.translate fallback flag is different in core (compared with trunk).
- Find a way to cache tokens between login sessions.
- Documentation, documentation, documentation!
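One possible shape for the wikipedia.py compatibility layer is a module that keeps the old trunk names alive but forwards every call to the rewrite's functions, emitting a DeprecationWarning so bot operators learn what to switch to. This is only a sketch of that pattern: `deprecated_alias`, `normalize_title` and the old name `wikipedia.normalizeTitle` are hypothetical and do not come from either codebase.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Return a wrapper that calls new_func but warns that old_name is deprecated."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this wrapper
        )
        return new_func(*args, **kwargs)
    return wrapper


# Hypothetical rewrite-style function standing in for the new API.
def normalize_title(title):
    return title.strip().replace(" ", "_")


# Hypothetical trunk-style name that a wikipedia.py shim would keep exporting.
normalizeTitle = deprecated_alias(normalize_title, "wikipedia.normalizeTitle")
```

Old scripts keep calling `normalizeTitle("Main Page")` and get `"Main_Page"`, plus a warning they can silence or act on; new scripts call the rewrite function directly.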
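For the token-caching item, a minimal sketch is an on-disk store keyed by site, user and token type that a bot consults before asking the API for a fresh token. `TokenCache` and its methods are hypothetical, and the sketch assumes that the session cookies a token was issued with are persisted separately; a token reused without its session will simply be rejected, which is what `invalidate` is for.

```python
import json
import os
import tempfile


class TokenCache:
    """Hypothetical on-disk cache of API tokens, keyed by site, user and token type.

    Assumption: a cached token is only useful if the session cookies it was
    issued with are persisted too; this sketch stores tokens only.
    """

    def __init__(self, path):
        self.path = path
        self._data = {}  # nested dicts: site -> user -> token type -> token
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self._data = json.load(f)

    def get(self, site, user, token_type):
        """Return the cached token, or None if there is none."""
        return self._data.get(site, {}).get(user, {}).get(token_type)

    def put(self, site, user, token_type, token):
        """Store a token and write the whole cache back to disk."""
        self._data.setdefault(site, {}).setdefault(user, {})[token_type] = token
        self._save()

    def invalidate(self, site, user):
        """Forget all tokens for one user, e.g. after the API rejects a token."""
        self._data.get(site, {}).pop(user, None)
        self._save()

    def _save(self):
        # Write to a temporary file in the same directory, then replace the
        # cache file in one step so a crash never leaves it half-written.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self._data, f)
        os.replace(tmp_path, self.path)
```

A bot would call `get()` at start-up, fall back to requesting a new token when it returns None, store that token with `put()`, and call `invalidate()` when the wiki refuses a cached token.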
Ideas
- A Content+Site model is needed. The current structure of pywiki centers around a "page" object. This object is both the container of data and the networking code. It allows someone to write page.content() to get the page markup from a site, which may be convenient, but it promotes non-batched usage, which is much heavier on the wiki. Instead, the page object should be a local-only container of data and related (parsing) code, whereas all network access should be done by the Site object. The syntax might differ, but the concepts should stay:
page1 = Page('Main Page')
batch = [page1,page2,page3]
site.populateStuff(batch, content | links | categories) # Sets various properties in the batch based on bit flags
batch2 = site.query({api parameters}) # page objects are created from a result of a user-supplied query
print(page1.links()) # list of links.
print(page1.templates()) # throws an error - templates are not populated
--Yurik (talk) 06:13, 22 March 2013 (UTC)
- @Yurik: as part of phabricator:T101587, this feature request needs to be added as a Phabricator task. However, since you mentioned something like this at Lyon, and it has been over two years since you wrote this, a recap and reply are probably in order.
- Pywikipedia and Pywikibot 2 have very strong batching concepts that preload props (see APISite.preloadpages), though Pywikipedia is such a terrible mess that the good parts of it are usually known and used only by their original author. We have patches pending for Pywikibot 2 that vastly improve preloading/batching; however, a few huge bugs need to be fixed first. Still, the design is OK, and the direction almost certainly matches what you want here.
- Where Pywikibot 2 doesn't match your design is the last step: page1.templates() raising an error if 'templates' was not batched. Pywikibot 2 will recognise that the templates were not preloaded during the batch operation and will proceed to fetch them. I think I can safely say that neither Pywikibot 2 nor even Pywikibot 3 (and probably no version ever) is going to raise an exception on a Page operation just because the caller didn't batch it properly in advance. The first reason is that, as you say, the Page-centric model is convenient. The second is backwards compatibility. Finally, the script writer may know what they are doing: perhaps only one page out of 100 needs to call .templates(), and preloading templates for all of them would be less efficient.
- However, I see two approaches we can take to promote, and even enforce, batch-centric programming:
- issue a warning if page1.templates() is called and the data wasn't preloaded
- add a 'batch only mode' which does raise an exception
- John Vandenberg (talk) 00:48, 6 June 2015 (UTC)
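The preloading described above groups pages so that one API request covers many of them. The sketch below shows only that grouping idea in plain Python; it does not call Pywikibot, and `chunked`, `preload` and the `fetch_batch` callback are hypothetical names, not the signature of APISite.preloadpages. The group size of 50 mirrors the MediaWiki API's usual limit on multi-value parameters such as titles= for accounts without the apihighlimits right.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most size items from iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def preload(titles, fetch_batch, groupsize=50):
    """Yield (title, data) pairs, issuing one fetch_batch call per group of titles.

    fetch_batch is a hypothetical callable standing in for one API request
    (for example action=query with titles=A|B|C); it takes a list of titles
    and returns a dict mapping each title to its data.
    """
    for group in chunked(titles, groupsize):
        results = fetch_batch(group)
        for title in group:
            yield title, results.get(title)
```

With 120 titles and the default group size, `fetch_batch` is called three times (50, 50 and 20 titles) instead of 120 times, which is the load reduction the batching discussion is about.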
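A minimal runnable sketch of the Content+Site model proposed above, combined with the three behaviours discussed in the replies for a property that was not preloaded: fetch it anyway (the current Pywikibot 2 behaviour), issue a warning, or raise in a batch-only mode. Everything here is hypothetical: `Prop`, `Site.populate`, `Site.get`, the mode names and the in-memory `FAKE_WIKI` stand in for real API calls and are not Pywikibot's API. Pages keep a reference to their Site, but only the Site "fetches" anything.

```python
import enum
import warnings


class Prop(enum.IntFlag):
    """Bit flags naming the page properties a batch request can populate."""
    CONTENT = 1
    LINKS = 2
    CATEGORIES = 4
    TEMPLATES = 8


class NotPopulatedError(Exception):
    """Raised in batch-only ("strict") mode when a property was not preloaded."""


class Site:
    """Owns all (simulated) network access; pages never fetch anything themselves."""

    def __init__(self, backend, mode="lazy"):
        self.backend = backend  # hypothetical stand-in for the wiki: title -> {Prop: value}
        self.mode = mode        # "lazy" (fetch on demand), "warn" or "strict" (batch-only)
        self.requests = 0       # number of simulated API round trips

    def populate(self, pages, props):
        """Fill every requested property of every page with one simulated request."""
        self.requests += 1
        for page in pages:
            data = self.backend[page.title]
            for prop in Prop:
                if prop in props:
                    page._props[prop] = data[prop]

    def get(self, page, prop):
        """Return a property, handling the not-preloaded case according to self.mode."""
        if prop not in page._props:
            message = f"{prop.name} was not preloaded for {page.title!r}"
            if self.mode == "strict":
                raise NotPopulatedError(message)
            if self.mode == "warn":
                warnings.warn(message, stacklevel=3)  # points at the caller of Page.*()
            self.populate([page], prop)
        return page._props[prop]


class Page:
    """Local data container; its accessors delegate any fetching to the Site."""

    def __init__(self, site, title):
        self.site = site
        self.title = title
        self._props = {}

    def content(self):
        return self.site.get(self, Prop.CONTENT)

    def links(self):
        return self.site.get(self, Prop.LINKS)

    def templates(self):
        return self.site.get(self, Prop.TEMPLATES)


# Hypothetical in-memory wiki so the sketch runs without network access.
FAKE_WIKI = {
    "Main Page": {Prop.CONTENT: "See [[Foo]]. {{Box}}", Prop.LINKS: ["Foo"],
                  Prop.CATEGORIES: [], Prop.TEMPLATES: ["Template:Box"]},
    "Foo": {Prop.CONTENT: "Stub.", Prop.LINKS: [],
            Prop.CATEGORIES: ["Stubs"], Prop.TEMPLATES: []},
}

site = Site(FAKE_WIKI, mode="strict")
page1, page2 = Page(site, "Main Page"), Page(site, "Foo")
site.populate([page1, page2], Prop.CONTENT | Prop.LINKS)  # one request for both pages
print(page1.links())  # ['Foo']
# page1.templates() would raise NotPopulatedError here, because mode="strict".
```

Switching `mode` to "lazy" gives the Pywikibot 2 behaviour (fetch silently), and "warn" corresponds to the first of the two proposed approaches; the default stays lazy so existing Page-centric scripts keep working.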