Manual:Pywikibot/table2wiki.py

From Linux Web Expert

Revision as of 23:31, 3 June 2022 by imported>Shirayuki (wrong markup)

table2wiki.py is a Pywikibot script that converts HTML tables to MediaWiki's own table syntax.

Specific arguments:


Parameter Description
-always The bot won't ask for confirmation when putting a page
-skipwarning Skip processing a page when a warning occurred. Only used when -always is or becomes True.
-quiet Don't show diffs in -always mode
-mysqlquery Retrieve information from a local mirror. Searches for pages with HTML tables, and tries to convert them on the live wiki.
-xml Retrieve information from a local XML dump (pages_current, see https://download.wikimedia.org). Argument can also be given as "-xml:filename". Searches for pages with HTML tables, and tries to convert them on the live wiki.

Example:

$ python pwb.py table2wiki -xml:20050713_pages_current.xml -lang:de

Features:

  • Safe against missing </td> tags
  • Corrects attributes of tags
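
The two features above can be illustrated with a small, self-contained sketch. It is not the code used by table2wiki.py; the class `TableToWiki` and the function `html_table_to_wiki` are hypothetical names, and the sketch assumes a single, non-nested table. It tolerates missing </td> and </th> tags by closing the open cell whenever a new cell or row starts, and it normalises attributes by quoting every value.

```python
from html.parser import HTMLParser


class TableToWiki(HTMLParser):
    """Collect a simple HTML table and emit MediaWiki table markup."""

    def __init__(self):
        super().__init__()
        self.lines = []    # finished wikitext lines
        self.cell = None   # (marker, attribute string, text parts) of the open cell

    @staticmethod
    def _format_attrs(attrs):
        # Quote every attribute value, e.g. align=center -> align="center".
        return " ".join(f'{k}="{v}"' for k, v in attrs if v is not None)

    def _close_cell(self):
        # Flush the open cell, whether or not its closing tag was seen.
        if self.cell is not None:
            marker, attrs, parts = self.cell
            text = "".join(parts).strip()
            prefix = f"{marker} {attrs} | " if attrs else f"{marker} "
            self.lines.append(prefix + text)
            self.cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.lines.append(("{| " + self._format_attrs(attrs)).rstrip())
        elif tag == "tr":
            self._close_cell()
            self.lines.append("|-")
        elif tag in ("td", "th"):
            self._close_cell()  # tolerate a missing </td> or </th>
            marker = "!" if tag == "th" else "|"
            self.cell = (marker, self._format_attrs(attrs), [])

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr"):
            self._close_cell()
        elif tag == "table":
            self._close_cell()
            self.lines.append("|}")

    def handle_data(self, data):
        # Text outside of a cell is ignored in this sketch.
        if self.cell is not None:
            self.cell[2].append(data)


def html_table_to_wiki(html):
    """Convert one non-nested HTML table to wikitext (illustrative only)."""
    parser = TableToWiki()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Calling `html_table_to_wiki('<table border=1><tr><td>1<td>2</tr></table>')` yields a table that starts with `{| border="1"`, followed by a `|-` row marker and one line per cell. Broken or nested input is outside the scope of this sketch, which is one reason every converted page still needs a manual check.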

Known bugs:

  • Broken HTML tables will most likely result in broken wiki tables!
Every change needs to be checked. This bot can make mistakes.


Generators and filters available

Generator options
Parameter Description
-cat Work on all pages which are in a specific category. Argument can also be given as "-cat:categoryname" or as "-cat:categoryname|fromtitle" (using # instead of | is also allowed in this one and the following).
-catr Like -cat, but also recursively includes pages in subcategories, sub-subcategories etc. of the given category. Argument can also be given as "-catr:categoryname" or as "-catr:categoryname|fromtitle".
-subcats Work on all subcategories of a specific category. Argument can also be given as "-subcats:categoryname" or as "-subcats:categoryname|fromtitle".
-subcatsr Like -subcats, but also includes sub-subcategories etc. of the given category. Argument can also be given as "-subcatsr:categoryname" or as "-subcatsr:categoryname|fromtitle".
-uncat Work on all pages which are not categorised.
-uncatcat Work on all categories which are not categorised.
-uncatfiles Work on all files which are not categorised.
-file Read a list of pages to treat from the named text file. Page titles in the file may be either enclosed with brackets (example: [[Page]]), or be separated by new lines. Argument can also be given as "-file:filename".
-filelinks Work on all pages that use a certain image/media file. Argument can also be given as "-filelinks:filename".
-search Work on all pages that are found in a MediaWiki search across all namespaces.
-logevents Work on articles that were on a specified Special:Log. The value may be a comma-separated list of these values:
logevent,username,start,end

or for backward compatibility:

logevent,username,total

To use the default value, use an empty string. The log event parameter selects the type of log and may be one of the following:

spamblacklist, titleblacklist, gblblock, renameuser, globalauth, gblrights, gblrename, abusefilter, massmessage, thanks, usermerge, block, protect, rights, delete, upload, move, import, patrol, merge, suppress, tag, managetags, contentmodel, review, stable, timedmediahandler, newusers

The default number of pages is 10.

Examples:

-logevents:move gives pages from the move log (usually redirects)
-logevents:delete,,20 gives 20 pages from the deletion log
-logevents:protect,Usr gives pages from the protect log by user Usr
-logevents:patrol,Usr,20 gives 20 patrolled pages by Usr
-logevents:upload,,20121231,20100101 gives pages uploaded in 2010, 2011, and 2012
-logevents:review,,20121231 gives review pages from the beginning until 31 Dec 2012
-logevents:review,Usr,20121231 gives review pages by user Usr from the beginning until 31 Dec 2012
In some cases it must be given as -logevents:"move,Usr,20"
-interwiki Work on the given page and all equivalent pages in other languages. This can, for example, be used to fight multi-site spamming. Attention: this will cause the bot to modify pages on several wiki sites. This is not well tested, so check your edits!
-links Work on all pages that are linked from a certain page. Argument can also be given as "-links:linkingpagetitle".
-liverecentchanges Work on pages from the live recent changes feed. If used as -liverecentchanges:x, work on x recent changes.
-imagesused Work on all images that are contained on a certain page. Can also be given as "-imagesused:linkingpagetitle".
-newimages Work on the most recent new images. If given as -newimages:x, will work on the x newest images.
-newpages Work on the most recent new pages. If given as -newpages:x, will work on the x newest pages.
-recentchanges Work on the pages with the most recent changes. If given as -recentchanges:x, will work on the x most recently changed pages. If given as -recentchanges:offset,duration it will work on pages changed from 'offset' minutes with 'duration' minutes of timespan.

Examples:
-recentchanges:20 - gives the 20 most recently changed pages
-recentchanges:120,70 - will give pages with 120 offset minutes and 70 minutes of timespan
-recentchanges:visualeditor,10 - gives the 10 most recently changed pages marked with 'visualeditor'
-recentchanges:"mobile edit,60,35" - will retrieve pages marked with 'mobile edit' for the given offset and timespan

rctags are supported, and the rctag must be the very first parameter part.
-unconnectedpages Work on the most recent pages not connected to the Wikibase repository. Given as -unconnectedpages:x, will work on the x most recent unconnected pages.
-ref Work on all pages that link to a certain page. Argument can also be given as "-ref:referredpagetitle".
-start Specifies that the robot should go alphabetically through all pages on the home wiki, starting at the named page. Argument can also be given as "-start:pagetitle". You can also include a namespace. For example, "-start:Template:!" will make the bot work on all pages in the template namespace. Default value is start:!
-prefixindex Work on pages commencing with a common prefix.
-transcludes Work on all pages that use a certain template. Argument can also be given as "-transcludes:Title".
-unusedfiles Work on all description pages of images/media files that are not used anywhere. Argument can be given as "-unusedfiles:n" where n is the maximum number of articles to work on.
-lonelypages Work on all articles that are not linked from any other article. Argument can be given as "-lonelypages:n" where n is the maximum number of articles to work on.
-unwatched Work on all articles that are not watched by anyone. Argument can be given as "-unwatched:n" where n is the maximum number of articles to work on.
-property:name Work on all pages with a given property name from Special:PagesWithProp.
-usercontribs Work on all articles that were edited by a certain user. (Example: -usercontribs:DumZiBoT)
-weblink Work on all articles that contain an external link to a given URL; may be given as "-weblink:url"
-withoutinterwiki Work on all pages that don't have interlanguage links. Argument can be given as "-withoutinterwiki:n" where n is the total to fetch.
-mysqlquery Takes a MySQL query string like "SELECT page_namespace, page_title FROM page WHERE page_namespace = 0" and works on the resulting pages. See Manual:Pywikibot/MySQL.
-sparql Takes a SPARQL SELECT query string including ?item and works on the resulting pages.
-sparqlendpoint Specify SPARQL endpoint URL (optional). (Example: -sparqlendpoint:http://myserver.com/sparql)
-searchitem Takes a search string and works on Wikibase pages that contain it. Argument can be given as "-searchitem:text", where text is the string to look for, or "-searchitem:lang:text", where lang is the language to search items in.
-random Work on random pages returned by Special:Random. Can also be given as "-random:n" where n is the number of pages to be returned.
-randomredirect Work on random redirect pages returned by Special:RandomRedirect. Can also be given as "-randomredirect:n" where n is the number of pages to be returned.
-google Work on all pages that are found in a Google search. You need a Google Web API license key. Note that Google doesn't give out license keys anymore. See google_key in config.py for instructions. Argument can also be given as "-google:searchstring".
-yahoo Work on all pages that are found in a Yahoo search. Depends on Python module pYsearch. See yahoo_appid in config.py for instructions.
-page Work on a single page. Argument can also be given as "-page:pagetitle", and supplied multiple times for multiple pages.
-pageid Work on a single pageid. Argument can also be given as "-pageid:pageid1,pageid2,.." or "-pageid:'pageid1|pageid2|..'" and supplied multiple times for multiple pages.
-linter Work on pages that contain lint errors. Extension Linter must be available on the site. -linter selects all categories. -linter:high, -linter:medium or -linter:low select all categories for that priority. Single categories can be selected with commas as in -linter:cat1,cat2,cat3. Adding '/int' identifies the Lint ID to start querying from, e.g. -linter:high/10000. -linter:show just shows available categories.
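
Several generator options pack more than one value into a single argument. As an illustration, the two -logevents forms listed above ("logevent,username,start,end" and "logevent,username,total") could be told apart roughly as follows. This is a sketch, not Pywikibot's parser: `LogEventsQuery` and `parse_logevents` are hypothetical names, and it assumes that an 8-digit third part is a YYYYMMDD date, that any other third part is the total, and that the default total of 10 applies only when no third part is given.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

DEFAULT_TOTAL = 10  # default number of pages stated for -logevents


@dataclass
class LogEventsQuery:
    logtype: str
    user: Optional[str] = None
    total: Optional[int] = DEFAULT_TOTAL
    start: Optional[date] = None
    end: Optional[date] = None


def _parse_day(text):
    """Return a date for an 8-digit YYYYMMDD string, else None."""
    if len(text) == 8 and text.isdigit():
        return datetime.strptime(text, "%Y%m%d").date()
    return None


def parse_logevents(value):
    """Split 'logevent,username,start,end' or 'logevent,username,total'."""
    parts = value.split(",")
    parts += [""] * (4 - len(parts))  # an empty part means "use the default"
    logtype, user, third, end = (p.strip() for p in parts[:4])
    query = LogEventsQuery(logtype=logtype, user=user or None)
    if third:
        start = _parse_day(third)
        if start is not None:
            # Date form; dropping the default total here is an assumption.
            query.start, query.total = start, None
        else:
            query.total = int(third)  # backward-compatible 'total' form
    if end:
        query.end = _parse_day(end)
    return query
```

With this sketch, `parse_logevents("delete,,20")` produces a query for 20 entries of the deletion log with no user restriction, matching the second example in the -logevents description.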
Filter options
Parameter Description
-catfilter Filter the page generator to only yield pages in the specified category. See -cat generator for argument format.
-grep A regular expression that needs to match the article, otherwise the page won't be returned. Multiple -grep:regexpr can be provided and the page will be returned if the content is matched by any of the regexpr provided. Case-insensitive regular expressions will be used and dot matches any character, including a newline.
-grepnot Like -grep, but return the page only if the regular expression does not match.
-intersect Work on the intersection of all the provided generators.
-limit When used with any other argument that specifies a set of pages, -limit:n makes the bot work on no more than n pages in total.
-namespaces
-namespace
-ns
Filter the page generator to only yield pages in the specified namespaces. Separate multiple namespace numbers or names with commas.

Examples:

-ns:0,2,4 -ns:Help,MediaWiki

You may use a leading "not" to exclude the namespace. Examples:

-ns:not:2,3 -ns:not:Help,File

If used with -newpages/-random/-randomredirect/-linter generators, -namespace/-ns must be provided before -newpages/-random/-randomredirect/-linter. If used with -recentchanges generator, efficiency is improved if -namespace is provided before -recentchanges.

If used with -start generator, -namespace/-ns shall contain only one value.
-onlyif A claim the page needs to contain, otherwise the item won't be returned. The format is property=value,qualifier=value. Multiple (or none) qualifiers can be passed, separated by commas.

Examples:
P1=Q2 (property P1 must contain value Q2)
P3=Q4,P5=Q6,P6=Q7 (property P3 with value Q4 and qualifiers: P5 with value Q6 and P6 with value Q7)

Value can be page ID, coordinate in format: latitude,longitude[,precision] (all values are in decimal degrees), year, or plain string. The argument can be provided multiple times and the item page will be returned only if all claims are present. Argument can be also given as "-onlyif:expression".
-onlyifnot A claim the page must not contain, otherwise the item won't be returned. For usage and examples, see -onlyif above.
-ql Filter pages based on page quality. This is only applicable if contentmodel equals 'proofread-page', otherwise it has no effect. Valid values are in range 0-4. Multiple values can be comma-separated.
-subpage -subpage:n filters pages to only those that have depth n, i.e. a depth of 0 filters out all pages that are subpages, and a depth of 1 filters out all pages that are subpages of subpages.
-titleregex A regular expression that needs to match the article title, otherwise the page won't be returned. Multiple -titleregex:regexpr can be provided and the page will be returned if the title is matched by any of the regexpr provided. Case-insensitive regular expressions will be used and dot matches any character.
-titleregexnot Like -titleregex, but return the page only if the regular expression does not match.
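
The matching rules stated for -grep and -grepnot (case-insensitive, dot matches newlines, a page passes -grep if any pattern matches) can be sketched in a few lines. This is an illustration under assumptions rather than Pywikibot's filter code: `grep_filter` is a hypothetical name, pages are modelled as plain (title, text) pairs, and with several patterns the inverted form is assumed to keep only pages that match none of them.

```python
import re

# Case-insensitive matching; '.' also matches a newline.
FLAGS = re.IGNORECASE | re.DOTALL


def grep_filter(pages, patterns, invert=False):
    """Yield (title, text) pairs whose text matches any of the patterns.

    With invert=True, yield only the pairs that match none of them,
    which is how -grepnot is assumed to behave in this sketch.
    """
    compiled = [re.compile(p, FLAGS) for p in patterns]
    for title, text in pages:
        matched = any(rx.search(text) for rx in compiled)
        if matched != invert:
            yield title, text
```

For table2wiki.py such a filter is a natural companion to a generator: a pattern like `<table.*<tr` keeps only pages that still contain an HTML table, even when the tags are upper-case or spread over several lines.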


Global arguments available

These options will override the configuration in user-config.py settings.

Global options
Parameter Description Config variable
-dir:PATH Read the bot's configuration data from directory given by PATH, instead of from the default directory.
-config:file The user config filename. Default is user-config.py. (config variable: user-config.py)
-lang:xx Set the language of the wiki you want to work on, overriding the configuration in user-config.py. xx should be the language code. (config variable: mylang)
-family:xyz Set the family of the wiki you want to work on, e.g. wikipedia, wiktionary, wikitravel, ... This will override the configuration in user-config.py. (config variable: family)
-user:xyz Log in as user 'xyz' instead of the default username. (config variable: usernames)
-daemonize:xyz Immediately return control to the terminal and redirect stdout and stderr to file xyz (only use for bots that require no input from stdin).
-help Show the help text.
-log Enable the log file, using the default filename 'script_name-bot.log'. Logs will be stored in the logs subdirectory. (config variable: log)
-log:xyz Enable the log file, using 'xyz' as the filename. (config variable: logfilename)
-nolog Disable the log file (if it is enabled by default).
-maxlag Sets a new maxlag parameter to a number of seconds. Defer bot edits during periods of database server lag. Default is set by config.py. (config variable: maxlag)
-putthrottle:n
-pt:n
-put_throttle:n
Set the minimum time (in seconds) the bot will wait between saving pages. (config variable: put_throttle)
-debug:item
-debug
Enable the log file and include extensive debugging data for component "item" (for all components if the second form is used). (config variable: debug_log)
-verbose
-v
Have the bot provide additional console output that may be useful in debugging. (config variable: verbose_output)
-cosmetic_changes
-cc
Toggles the cosmetic_changes setting made in config.py or user-config.py to its inverse and overrules it. (config variable: cosmetic_changes)
-simulate Disables writing to the server. Useful for testing and debugging of new code (if given, doesn't do any real changes, but only shows what would have been changed). (config variable: simulate)
-<config var>:n You may use any numeric config variable as an option and modify it from the command line.
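
To make the effect of -putthrottle:n (put_throttle) concrete, the following sketch enforces a minimum delay between consecutive saves. It only illustrates the idea of a save throttle and is not Pywikibot's throttle implementation; the class `PutThrottle` and its injectable `clock` and `sleep` parameters are hypothetical and exist only so the behaviour can be exercised without real waiting.

```python
import time


class PutThrottle:
    """Enforce a minimum delay (in seconds) between consecutive saves."""

    def __init__(self, min_delay, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last_save = None  # time of the previous save, if any

    def wait(self):
        """Block until min_delay seconds have passed since the last save."""
        now = self._clock()
        if self._last_save is not None:
            remaining = self.min_delay - (now - self._last_save)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_save = now
```

A bot loop would call `wait()` immediately before each save. The first call returns at once, and later calls sleep only for the part of the interval that has not yet elapsed.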