Manual:Pywikibot/table2wiki.py/cs

table2wiki.py is a Pywikibot script that converts HTML tables to MediaWiki's own table syntax.

Specific arguments:


Parameter Description
-always The bot won't ask for confirmation when putting a page
-skipwarning Skip processing a page when a warning occurs. Only used when -always is or becomes True.
-quiet Don't show diffs in -always mode
-mysqlquery Retrieve information from a local mirror. Searches for pages with HTML tables, and tries to convert them on the live wiki.
-xml Retrieve information from a local XML dump (pages_current, see https://download.wikimedia.org). Argument can also be given as "-xml:filename". Searches for pages with HTML tables, and tries to convert them on the live wiki.

Example:

$ python pwb.py table2wiki -xml:20050713_pages_current.xml -lang:de
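
As a rough illustration of what "searches for pages with HTML tables" in a dump could look like, the following sketch streams a MediaWiki XML export and yields the titles of pages whose text contains a `<table` tag. It is not the script's own implementation; the function name `titles_with_html_tables`, the helper `_local`, and the tiny `SAMPLE` dump are hypothetical and exist only for this example.

```python
import io
import re
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator

# Assumption: a page "has an HTML table" if its text contains an opening <table tag.
HTML_TABLE = re.compile(r'<table\b', re.IGNORECASE)


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def titles_with_html_tables(dump: BinaryIO) -> Iterator[str]:
    """Yield titles of pages in an XML dump whose text contains '<table'."""
    title, text = None, ''
    for _event, elem in ET.iterparse(dump, events=('end',)):
        name = _local(elem.tag)
        if name == 'title':
            title = elem.text
        elif name == 'text':
            text = elem.text or ''
        elif name == 'page':
            if title is not None and HTML_TABLE.search(text):
                yield title
            title, text = None, ''
            elem.clear()  # release the finished page to keep memory use low


# Hypothetical two-page dump, used only to exercise the function.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>With table</title><revision><text>&lt;table&gt;&lt;tr&gt;&lt;td&gt;x&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</text></revision></page>
  <page><title>Wiki table</title><revision><text>{| |- | x |}</text></revision></page>
</mediawiki>"""

print(list(titles_with_html_tables(io.BytesIO(SAMPLE))))
```

Streaming with `iterparse` avoids loading a whole dump into memory, which matters for files the size of pages_current.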

Features:

  • Safe against missing </td>
  • Corrects attributes of tags

Known bugs:

  • Broken HTML tables will most likely result in broken wiki tables!
Every change needs to be checked. This bot can make mistakes.
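
To make the conversion concrete, here is a minimal sketch of turning a simple HTML table into wiki table syntax, including tolerance for a missing closing `</td>`. It is an illustration under simplifying assumptions (no nested tables, no captions, attributes copied verbatim), not the algorithm table2wiki.py actually uses; `CELL`, `_cell`, and `html_table_to_wiki` are hypothetical names.

```python
import re

# A cell ends at its closing tag or, if that is missing, right before the
# next cell, the next row, or the end of the row or table.
CELL = re.compile(
    r'<(?P<tag>t[dh])\b(?P<attrs>[^>]*)>'
    r'(?P<body>.*?)'
    r'(?:</(?P=tag)\s*>|(?=<t[dhr]\b|</tr|</table))',
    re.IGNORECASE | re.DOTALL,
)


def _cell(match: re.Match) -> str:
    """Render one <td>/<th> as a wiki cell line ('|' or '!')."""
    mark = '!' if match.group('tag').lower() == 'th' else '|'
    attrs = match.group('attrs').strip()
    body = match.group('body').strip()
    prefix = f'{mark} {attrs} | ' if attrs else f'{mark} '
    return f'\n{prefix}{body}'


def html_table_to_wiki(html: str) -> str:
    """Convert a simple, non-nested HTML table to wiki table syntax."""
    text = CELL.sub(_cell, html)
    text = re.sub(r'<tr\b([^>]*)>', lambda m: '\n|-' + m.group(1).rstrip(),
                  text, flags=re.IGNORECASE)
    text = re.sub(r'</tr\s*>', '', text, flags=re.IGNORECASE)
    text = re.sub(r'<table\b([^>]*)>', lambda m: '{|' + m.group(1).rstrip(),
                  text, flags=re.IGNORECASE)
    text = re.sub(r'</table\s*>', '\n|}', text, flags=re.IGNORECASE)
    # Drop empty lines left behind by the removed tags.
    lines = (line.strip() for line in text.splitlines())
    return '\n'.join(line for line in lines if line)


# The second row deliberately omits both closing </td> tags.
print(html_table_to_wiki(
    '<table border="1"><tr><th>A</th><th>B</th></tr>'
    '<tr><td>1<td align="right">2</tr></table>'
))
```

Even in this small form, malformed input (for example an unclosed `<table>`) passes through half-converted, which is why every change needs to be checked by hand.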


Generators and filters available

Generator options
Parameter Description
-cat Work on all pages which are in a specific category. Argument can also be given as "-cat:categoryname" or as "-cat:categoryname|fromtitle" (using # instead of | is also allowed in this one and the following).
-catr Like -cat, but also recursively includes pages in subcategories, sub-subcategories etc. of the given category. Argument can also be given as "-catr:categoryname" or as "-catr:categoryname|fromtitle".
-subcats Work on all subcategories of a specific category. Argument can also be given as "-subcats:categoryname" or as "-subcats:categoryname|fromtitle".
-subcatsr Like -subcats, but also includes sub-subcategories etc. of the given category. Argument can also be given as "-subcatsr:categoryname" or as "-subcatsr:categoryname|fromtitle".
-uncat Work on all pages which are not categorised.
-uncatcat Work on all categories which are not categorised.
-uncatfiles Work on all files which are not categorised.
-file Read a list of pages to treat from the named text file. Page titles in the file may be either enclosed with brackets (example: [[Page]]), or be separated by new lines. Argument can also be given as "-file:filename".
-filelinks Work on all pages that use a certain image/media file. Argument can also be given as "-filelinks:filename".
-search Work on all pages that are found in a MediaWiki search across all namespaces.
-logevents Work on articles that were on a specified Special:Log. The value may be a comma-separated list of these values:
logevent,username,start,end

or for backward compatibility:

logevent,username,total

To use the default value, use an empty string. Options exist for every log type named by the logevent parameter, which can be one of the following:

spamblacklist, titleblacklist, gblblock, renameuser, globalauth, gblrights, gblrename, abusefilter, massmessage, thanks, usermerge, block, protect, rights, delete, upload, move, import, patrol, merge, suppress, tag, managetags, contentmodel, review, stable, timedmediahandler, newusers

The default number of pages is 10.

Examples:

-logevents:move gives pages from the move log (usually redirects)
-logevents:delete,,20 gives 20 pages from the deletion log
-logevents:protect,Usr gives pages from the protect log by user Usr
-logevents:patrol,Usr,20 gives 20 pages patrolled by Usr
-logevents:upload,,20121231,20100101 gives uploaded pages from 2010, 2011, and 2012
-logevents:review,,20121231 gives review pages from the beginning until 31 Dec 2012
-logevents:review,Usr,20121231 gives review pages by user Usr from the beginning until 31 Dec 2012
In some cases it must be given as -logevents:"move,Usr,20"
-interwiki Work on the given page and all equivalent pages in other languages. This can, for example, be used to fight multi-site spamming. Attention: this will cause the bot to modify pages on several wiki sites. This is not well tested, so check your edits!
-links Work on all pages that are linked from a certain page. Argument can also be given as "-links:linkingpagetitle".
-liverecentchanges Work on pages from the live recent changes feed. If used as -liverecentchanges:x, work on x recent changes.
-imagesused Work on all images that are contained on a certain page. Can also be given as "-imagesused:linkingpagetitle".
-newimages Work on the most recent new images. If given as -newimages:x, will work on the x newest images.
-newpages Work on the most recent new pages. If given as -newpages:x, will work on the x newest pages.
-recentchanges Work on the pages with the most recent changes. If given as -recentchanges:x, will work on the x most recently changed pages. If given as -recentchanges:offset,duration, it will work on pages changed from 'offset' minutes with 'duration' minutes of timespan.

Examples:
-recentchanges:20 - gives the 20 most recently changed pages
-recentchanges:120,70 - will give pages with 120 offset minutes and 70 minutes of timespan
-recentchanges:visualeditor,10 - gives the 10 most recently changed pages marked with 'visualeditor'
-recentchanges:"mobile edit,60,35" - will retrieve pages marked with 'mobile edit' for the given offset and timespan

rctags are supported, and the rctag must be the very first parameter part.
-unconnectedpages Work on the most recent pages not connected to the Wikibase repository. Given as -unconnectedpages:x, will work on the x most recent unconnected pages.
-ref Work on all pages that link to a certain page. Argument can also be given as "-ref:referredpagetitle".
-start Specifies that the robot should go alphabetically through all pages on the home wiki, starting at the named page. Argument can also be given as "-start:pagetitle". You can also include a namespace. For example, "-start:Template:!" will make the bot work on all pages in the template namespace. The default value is start:!
-prefixindex Work on pages commencing with a common prefix.
-transcludes Work on all pages that use a certain template. Argument can also be given as "-transcludes:Title".
-unusedfiles Work on all description pages of images/media files that are not used anywhere. Argument can be given as "-unusedfiles:n" where n is the maximum number of articles to work on.
-lonelypages Work on all articles that are not linked from any other article. Argument can be given as "-lonelypages:n" where n is the maximum number of articles to work on.
-unwatched Work on all articles that are not watched by anyone. Argument can be given as "-unwatched:n" where n is the maximum number of articles to work on.
-property:name Work on all pages with a given property name from Special:PagesWithProp.
-usercontribs Work on all articles that were edited by a certain user. (Example: -usercontribs:DumZiBoT)
-weblink Work on all articles that contain an external link to a given URL; may be given as "-weblink:url".
-withoutinterwiki Work on all pages that don't have interlanguage links. Argument can be given as "-withoutinterwiki:n" where n is the total to fetch.
-mysqlquery Takes a MySQL query string like "SELECT page_namespace, page_title FROM page WHERE page_namespace = 0" and works on the resulting pages. See Manual:Pywikibot/MySQL.
-sparql Takes a SPARQL SELECT query string including ?item and works on the resulting pages.
-sparqlendpoint Specify SPARQL endpoint URL (optional). (Example: -sparqlendpoint:http://myserver.com/sparql)
-searchitem Takes a search string and works on Wikibase pages that contain it. Argument can be given as "-searchitem:text", where text is the string to look for, or "-searchitem:lang:text", where lang is the language to search items in.
-random Work on random pages returned by Special:Random. Can also be given as "-random:n" where n is the number of pages to be returned.
-randomredirect Work on random redirect pages returned by Special:RandomRedirect. Can also be given as "-randomredirect:n" where n is the number of pages to be returned.
-google Work on all pages that are found in a Google search. You need a Google Web API license key. Note that Google doesn't give out license keys anymore. See google_key in config.py for instructions. Argument can also be given as "-google:searchstring".
-yahoo Work on all pages that are found in a Yahoo search. Depends on python module pYsearch. See yahoo_appid in config.py for instructions.
-page Work on a single page. Argument can also be given as "-page:pagetitle", and supplied multiple times for multiple pages.
-pageid Work on a single pageid. Argument can also be given as "-pageid:pageid1,pageid2,." or "-pageid:'pageid1|pageid2|..'" and supplied multiple times for multiple pages.
-linter Work on pages that contain lint errors. The Linter extension must be available on the site. -linter selects all categories. -linter:high, -linter:medium or -linter:low selects all categories for that priority. Single categories can be selected with commas, as in -linter:cat1,cat2,cat3. Adding '/int' identifies the Lint ID to start querying from, e.g. -linter:high/10000. -linter:show just shows the available categories.
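
The two -logevents value formats (logevent,username,start,end and the older logevent,username,total) can be told apart by the shape of the third field. The sketch below is one plausible reading of the examples in the -logevents entry, not Pywikibot's own parser: it assumes an eight-digit third field is a YYYYMMDD date and anything else is a page count, and that the default of 10 applies only when neither a count nor a date is given. `LogEventQuery` and `parse_logevents` are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple, Optional

DEFAULT_TOTAL = 10  # default number of pages stated in the option description


class LogEventQuery(NamedTuple):
    logtype: str
    user: Optional[str]
    start: Optional[datetime]
    end: Optional[datetime]
    total: Optional[int]


def parse_logevents(value: str) -> LogEventQuery:
    """Split a -logevents value into its parts (illustrative only)."""
    parts = value.split(',')
    if len(parts) > 4:
        raise ValueError(f'too many fields in {value!r}')
    parts += [''] * (4 - len(parts))  # empty string means "use the default"
    logtype, user, third, fourth = parts

    start = end = total = None
    if len(third) == 8 and third.isdigit():
        # Assumption: eight digits are a date such as 20121231.
        start = datetime.strptime(third, '%Y%m%d')
    elif third:
        # Assumption: any other non-empty value is a page count.
        total = int(third)
    if fourth:
        end = datetime.strptime(fourth, '%Y%m%d')
    if total is None and start is None and end is None:
        total = DEFAULT_TOTAL
    return LogEventQuery(logtype, user or None, start, end, total)


print(parse_logevents('delete,,20'))
print(parse_logevents('upload,,20121231,20100101'))
```

Note that in the upload example the start date is later than the end date, which matches the documented example of walking the log backwards from the end of 2012 to the start of 2010.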
Filter options
Parameter Description
-catfilter Filter the page generator to only yield pages in the specified category. See -cat generator for argument format.
-grep A regular expression that needs to match the article, otherwise the page won't be returned. Multiple -grep:regexpr can be provided and the page will be returned if content is matched by any of the regexpr provided. Case insensitive regular expressions will be used and dot matches any character, including a newline.
-grepnot Like -grep, but return the page only if the regular expression does not match.
-intersect Work on the intersection of all the provided generators.
-limit When used with any other argument that specifies a set of pages, -limit:n makes the bot work on no more than n pages in total.
-namespaces
-namespace
-ns
Filter the page generator to only yield pages in the specified namespaces. Separate multiple namespace numbers or names with commas.

Examples:

-ns:0,2,4 -ns:Help,MediaWiki

You may use a leading "not" to exclude the namespace. Examples:

-ns:not:2,3 -ns:not:Help,File

If used with -newpages/-random/-randomredirect/-linter generators, -namespace/-ns must be provided before -newpages/-random/-randomredirect/-linter. If used with -recentchanges generator, efficiency is improved if -namespace is provided before -recentchanges.

If used with -start generator, -namespace/-ns shall contain only one value.
-onlyif A claim the page needs to contain, otherwise the item won't be returned. The format is property=value,qualifier=value. Multiple (or no) qualifiers can be passed, separated by commas.

Examples:
P1=Q2 (property P1 must contain value Q2)
P3=Q4,P5=Q6,P6=Q7 (property P3 with value Q4 and qualifiers: P5 with value Q6 and P6 with value Q7)

Value can be page ID, coordinate in format: latitude,longitude[,precision] (all values are in decimal degrees), year, or plain string. The argument can be provided multiple times and the item page will be returned only if all claims are present. Argument can also be given as "-onlyif:expression".
-onlyifnot A claim the page must not contain, otherwise the item won't be returned. For usage and examples, see -onlyif above.
-ql Filter pages based on page quality. This is only applicable if contentmodel equals 'proofread-page', otherwise it has no effect. Valid values are in range 0-4. Multiple values can be comma-separated.
-subpage -subpage:n filters pages to only those that have depth n, i.e. a depth of 0 filters out all pages that are subpages, and a depth of 1 filters out all pages that are subpages of subpages.
-titleregex A regular expression that needs to match the article title, otherwise the page won't be returned. Multiple -titleregex:regexpr can be provided and the page will be returned if the title is matched by any of the regexpr provided. Case insensitive regular expressions will be used and dot matches any character.
-titleregexnot Like -titleregex, but return the page only if the regular expression does not match.
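
The matching rules described for -grep and -grepnot (case-insensitive, dot matches newlines, a page passes if any pattern matches) can be sketched in a few lines. This is an illustration of those rules on plain (title, text) pairs, not Pywikibot's filter code; `grep_filter` is a hypothetical name, and treating -grepnot with several patterns as "none of them match" is an assumption.

```python
import re
from typing import Iterable, Iterator, Tuple


def grep_filter(pages: Iterable[Tuple[str, str]],
                patterns: Iterable[str],
                invert: bool = False) -> Iterator[str]:
    """Yield titles whose text matches any pattern (or none, if invert)."""
    # Case-insensitive, and '.' also matches newlines, as described for -grep.
    compiled = [re.compile(p, re.IGNORECASE | re.DOTALL) for p in patterns]
    for title, text in pages:
        matched = any(rx.search(text) for rx in compiled)
        if matched != invert:
            yield title


pages = [
    ('A', 'Intro\n<TABLE>\n</table>'),
    ('B', 'no markup here'),
]
print(list(grep_filter(pages, [r'<table.*</table>'])))               # like -grep
print(list(grep_filter(pages, [r'<table.*</table>'], invert=True)))  # like -grepnot
```

Because the dot matches newlines, a single pattern such as `<table.*</table>` can span a whole multi-line table, which is convenient for selecting candidate pages for this script.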


Global arguments available

These options override the settings in user-config.py.

Global options
Parameter Description Config variable
-dir:PATH Read the bot's configuration data from directory given by PATH, instead of from the default directory.
-config:file The user config filename. Default is user-config.py. Config variable: user-config.py
-lang:xx Set the language of the wiki you want to work on, overriding the configuration in user-config.py. xx should be the language code. Config variable: mylang
-family:xyz Set the family of the wiki you want to work on, e.g. wikipedia, wiktionary, wikitravel, ... This will override the configuration in user-config.py. Config variable: family
-user:xyz Log in as user 'xyz' instead of the default username. Config variable: usernames
-daemonize:xyz Immediately return control to the terminal and redirect stdout and stderr to file xyz (only use for bots that require no input from stdin).
-help Show the help text.
-log Enable the log file, using the default filename 'script_name-bot.log'. Logs will be stored in the logs subdirectory. Config variable: log
-log:xyz Enable the log file, using 'xyz' as the filename. Config variable: logfilename
-nolog Disable the log file (if it is enabled by default).
-maxlag Sets a new maxlag parameter to a number of seconds. Defer bot edits during periods of database server lag. Default is set by config.py. Config variable: maxlag
-putthrottle:n
-pt:n
-put_throttle:n
Set the minimum time (in seconds) the bot will wait between saving pages. Config variable: put_throttle
-debug:item
-debug
Enable the log file and include extensive debugging data for component "item" (for all components if the second form is used). Config variable: debug_log
-verbose
-v
Have the bot provide additional console output that may be useful in debugging. Config variable: verbose_output
-cosmetic_changes
-cc
Toggles the cosmetic_changes setting made in config.py or user-config.py to its inverse. Config variable: cosmetic_changes
-simulate Disables writing to the server. Useful for testing and debugging of new code (if given, doesn't do any real changes, but only shows what would have been changed). Config variable: simulate
-<config var>:n Any numeric config variable can be used as an option and set from the command line.