Help:Maintenance script rebuildData.php

From Linux Web Expert

Revision as of 02:34, 26 February 2023 by >Andre

The "rebuildData.php" maintenance script recreates all the semantic data in the database by cycling through all pages that might have semantic data and calling functions that re-save the semantic data for each one, i.e. it does a full re-parse.

This script is a command-line tool, whereas data rebuilding (repair) via special page "SemanticMediaWiki" (formerly known as special page "SMWAdmin") uses the job queue to process all pages. If possible, use this maintenance script for data rebuilding. See also the help page on using special page "SemanticMediaWiki" for data rebuilding.

Semantic MediaWiki 3.2.0 brought improved client output to this maintenance script.[1]

This maintenance script superseded the former "SMW_refreshData.php" script, which was deprecated starting with Semantic MediaWiki 1.9.2 and removed with the release of Semantic MediaWiki 3.0.0 in October 2018. When upgrading to Semantic MediaWiki 3.0.0, you must migrate to this script.

Usage

php rebuildData.php [-d|-s|-e|-f|-n|--startidfile|-b|-v|-c|-p|-t|--namespace|--page|--redirects|--query|--no-cache|--report-runtime|--debug|--skip-properties|--shallow-update|--ignore-exceptions|--exception-log|--with-maintenance-log|--revision-mode|--force-update|--dispose-outdated|--auto-recovery]
Only the script-specific parameters are shown here.

Parameters

Maintenance scripts accept generic maintenance parameters, script-dependent parameters and, depending on the maintenance script, script-specific parameters. The script-specific parameters, if provided, are described on this page.

Script specific parameters
Parameter Description
-d <delay> Wait for this many milliseconds after processing an article. Useful for limiting server load.
-s <startid> Start refreshing at given object ID. Useful for partial refreshing.
-e <endid> Stop refreshing at given object ID. Useful for partial refreshing.
-n <numids> Stop refreshing after processing a given number of IDs. Useful for partial refreshing.
--startidfile <startidfile> Read <startid> from a file instead of the arguments and write the next ID to the file when finished. Useful for continual partial refreshing from cron.
-b <backend> Execute the operation for the storage backend of the given name (default is to use the current data backend/store)
-v Be verbose about the progress.
-c or --categories Will refresh only category pages (and other explicitly named categories).
The --categories option is only available starting with Semantic MediaWiki 2.4.0.[2]
-p Will refresh only property pages (and other explicitly named namespaces)
--namespace Only refresh pages in the selected namespace identified by its constant, e.g. --namespace="NS_MAIN". Available since Semantic MediaWiki 3.1.0.[3]
-t Will refresh only type pages (and other explicitly named namespaces)
This option was removed with Semantic MediaWiki 2.3.0 since namespace "Type" is no longer used.[4]
--page=<pagelist> Will refresh only the pages of the given names, with | used as a separator.
The options -s, -e, -n, --startidfile, -c, -p, -t are ignored if --page is given.
--redirects Will refresh only the pages which are redirecting to another page. Available since Semantic MediaWiki 2.4.0.[2]
--query Will refresh only pages returned by a given query. Available since Semantic MediaWiki 1.9.2.[5]
The options -s, -e, -n, --startidfile, -c, -p, -t are ignored if --query is given.
-f Fully delete all content instead of just refreshing relevant entries. This will also rebuild the whole storage structure. May leave the wiki temporarily incomplete.
--no-cache Sets $wgMainCacheType to none while the script is running. Available since Semantic MediaWiki 2.2.0,[6] with improvements in Semantic MediaWiki 2.4.0.[7]
--report-runtime Reports memory usage and runtime of the respective script execution. Available since Semantic MediaWiki 2.1.0 as --runtime.[8]
Since Semantic MediaWiki 2.2.0 this parameter is named --report-runtime.[9]
--debug Sets global variables to support debug output while running. Available since Semantic MediaWiki 2.2.0.[10]
--skip-properties Skips the default properties rebuild (only recommended when successive build steps are used). Available since Semantic MediaWiki 2.3.0.[11] From Semantic MediaWiki 2.2.0 to Semantic MediaWiki 2.2.3 it was not possible to prevent the properties from being rebuilt first.[12]
--shallow-update Parses only those entities whose last modified timestamp differs from that of their last revision. It should only be used to run a quick update on deleted entities, redirects, and other out-of-sync entities. Available since Semantic MediaWiki 2.3.0.[4]
--ignore-exceptions Ignores encountered exceptions, i.e. the script does not stop as soon as an exception (error) occurs. Available since Semantic MediaWiki 2.4.0.[2]
This option is best used together with the --exception-log option.
--exception-log="/path/to/smw/logs/directory/" Writes encountered exceptions (errors) to a log file for later debugging. Available since Semantic MediaWiki 2.4.0.[13]
The file name is created automatically and contains the string "logrebuilddata-exceptions" and the timestamp (ISO format), e.g. "logrebuilddata-exceptions-2016-12-05.log". If an unambiguous name is needed, add an identifier to the option, e.g. --exception-log="/path/to/smw/logs/directory/mywiki-". It will be prepended to the file name, e.g. "mywiki-logrebuilddata-exceptions-2016-12-05.log".
--with-maintenance-log Adds a log entry to "Special:Log" on the wiki and reports the script's runtime. Available since Semantic MediaWiki 2.4.0.[13]

Note: If you are using this parameter, make sure that MediaWiki's configuration parameter $wgMaxNameChars (MediaWiki.org) is set to a value not lower than "17".[14] Otherwise an exception is thrown stating the minimum value for this setting ("32" or higher is recommended).[15]

--revision-mode Uses the revision information and compares the latest title/page revision ID with the associated revision ID stored in SMW, which allows some assumptions about the content state:
  • If both revision IDs match, it is assumed that no content divergence occurred.
  • The wiki content (including embedded annotations) should match what is stored in SMW for that particular entity.
  • As a result, further processing of that entity (especially the parsing of content, which is the most costly operation during a data rebuild) is skipped.
Available since Semantic MediaWiki 3.0.0.[16]
--force-update Forces an update under any circumstance. Available since Semantic MediaWiki 3.0.0.[16]
--dispose-outdated Disposes of outdated entities without starting a data rebuild. Available since Semantic MediaWiki 3.0.0.[17]

Note: Since Semantic MediaWiki 3.2.0[18] one can use maintenance script "disposeOutdatedEntities.php" for this task.

--auto-recovery Allows restarting from a canceled (or aborted) index run. Available since Semantic MediaWiki 3.1.0.
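
For continual partial refreshing from cron, the -n, -d and --startidfile parameters described above can be combined. The following is only a minimal sketch: the state file path, batch size, delay and the helper name partial_refresh_cmd are assumptions made for illustration, and the helper prints the command instead of running it so that it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical values for illustration; adjust them to the wiki at hand.
STATE_FILE="/tmp/smw-rebuild-startid.txt"   # where the next start ID is kept
BATCH_SIZE=1000                             # number of IDs per run (-n)
DELAY_MS=50                                 # pause after each article (-d)

# Print the rebuildData.php call for one partial refresh run.
partial_refresh_cmd() {
    printf 'php rebuildData.php --startidfile %s -n %s -d %s\n' \
        "$STATE_FILE" "$BATCH_SIZE" "$DELAY_MS"
}

partial_refresh_cmd
```

Once the printed command looks right, it could be placed in a crontab entry so that each run continues from the ID written to the state file by the previous run.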

Progress display

The progress displayed during a rebuild process (starting with Semantic MediaWiki 2.3.0) is self-adjusting, based on the number of expected IDs versus the number of IDs actually being processed.[19] Because each entity (i.e. subobject, property, and subject) is assigned an ID, this ID does not necessarily correspond to the page ID of MediaWiki, as the various types of subobjects embedded in a page are assigned an ID as well.

The progress is skewed especially during a full rebuild (-f), where the starting count is lower than the final ID count (which is predicted from the MediaWiki article count).

Quick and slow progress

IDs assigned to a "real" page are parsed using MediaWiki's parser to ensure that all data and all extensions influencing the state of the data are accounted for. This explains the extensive memory and time required to complete a full parse of a page, including all #subobject, #ask and any other embedded parser function calls.[20]

IDs that represent data items such as subobjects or value objects can be processed using Semantic MediaWiki internal functions, hence the comparatively quick update progress.

Verbose output

The verbose output (-v) was extended in Semantic MediaWiki 2.4.0[2] to display additional information about the entity being processed. The marker * identifies a regular MediaWiki page whose ID corresponds to the page table entry, while unmarked IDs are matched to an entry in the smw_object_ids database table.

Marked for deletion entries

Starting with Semantic MediaWiki 2.3.0, entities marked as deleted (:smw-delete) are removed at each "rebuildData.php" run to free tables of outdated entities.[11]

The following command quietly removes just the outdated entities:[21]
php rebuildData.php --skip-properties -s 1 -e 1 --quiet

Since Semantic MediaWiki 3.0.0 a dedicated flag is available:[17]

php rebuildData.php --skip-properties --dispose-outdated --quiet

Starting with Semantic MediaWiki 3.2.0 the dedicated maintenance script "disposeOutdatedEntities.php" is available.[18]
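
The three clean-up variants above depend on the installed version. As a rough sketch, the choice can be written as a small shell helper; dispose_cmd is a hypothetical name, it only prints the matching command, and it assumes a version string of the form "major.minor.patch" for Semantic MediaWiki 2.3.0 or later.

```shell
#!/bin/sh
# Print the clean-up command matching a Semantic MediaWiki version.
# Hypothetical helper for illustration; versions before 2.3.0 are not handled.
dispose_cmd() {
    major=${1%%.*}      # text before the first dot, e.g. "3"
    rest=${1#*.}        # text after the first dot, e.g. "2.0"
    minor=${rest%%.*}   # text before the next dot, e.g. "2"

    if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 2 ]; }; then
        # 3.2.0 and later: dedicated maintenance script
        echo "php disposeOutdatedEntities.php"
    elif [ "$major" -eq 3 ]; then
        # 3.0.x and 3.1.x: dedicated flag
        echo "php rebuildData.php --skip-properties --dispose-outdated --quiet"
    else
        # 2.3.0 up to 3.0.0: minimal ID range, clean-up happens as a side effect
        echo "php rebuildData.php --skip-properties -s 1 -e 1 --quiet"
    fi
}

dispose_cmd "2.5.0"
dispose_cmd "3.1.0"
dispose_cmd "3.2.0"
```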

Dispose of outdated object ID references

Starting with Semantic MediaWiki 2.4.0, outdated object ID references are disposed of when running "rebuildData.php".[22] When the data type of a property is changed, a property is removed, or other object values are deleted, chances are that some IDs remain in the smw_object_ids database table.[23] To avoid garbage references piling up in this table, the "rebuildData.php" run checks whether these IDs can safely be removed. This is best done frequently using the --shallow-update option.[21]

The following command removes outdated object ID references:
php rebuildData.php --shallow-update

Examples

The following command refreshes existing semantic data items with a delay of 50 ms between every data item without printing progress information.
php rebuildData.php -d 50 -q
The following command verbosely rebuilds semantic data after deleting existing items with a delay of 100 ms between every data item.
php rebuildData.php -f -d 100 -v
The following command verbosely rebuilds semantic data of pages in a given category.
php rebuildData.php --query='[[Category:SomeCategory]]' -v
The following command rebuilds semantic data with a delay of 75 ms between every data item and reports memory usage and runtime after it has completed.
php rebuildData.php -d 75 --report-runtime
Example output:
Memory used: 25543928 (b: 11429464, a: 36973392) with a runtime of 81.62 sec (1.36 min)
a) memory used after execution and b) memory used before the execution
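
The figures in the example output are consistent with "Memory used" being the difference between the "after" and "before" values. The arithmetic can be checked with a few lines of shell; the variable names are chosen for illustration only, and the conversion to MiB is an added convenience that the script itself does not print.

```shell
#!/bin/sh
# Figures copied from the example output.
before=11429464   # b: bytes in use before the execution
after=36973392    # a: bytes in use after the execution

used=$((after - before))
echo "Memory used: $used bytes"            # prints "Memory used: 25543928 bytes"
echo "Approx. MiB: $((used / 1048576))"    # integer division, prints "Approx. MiB: 24"
```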
The following command refreshes the wiki pages "Page 1" and "Page 2" without printing progress information.
php rebuildData.php --page="Page 1|Page 2"
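
When the page names come from a script, the | separator expected by --page can be produced by joining the names in the shell. A minimal sketch, assuming a hypothetical helper named join_pages and an example page list; the final command is printed rather than executed.

```shell
#!/bin/sh
# Join all arguments with "|", the separator expected by --page.
# join_pages is a hypothetical helper name used for illustration.
join_pages() {
    # In a subshell, "$*" joins the arguments with the first character of IFS.
    ( IFS='|'; printf '%s\n' "$*" )
}

pages=$(join_pages "Page 1" "Page 2" "Help:Contents")
printf 'php rebuildData.php --page="%s"\n' "$pages"
```

Quoting the whole value matters here, since an unquoted | would be interpreted by the shell as a pipe.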
The following command rebuilds semantic data with a delay of 50 ms between every data item, ignores errors which may arise during execution and writes them to a file in the directory provided.
php rebuildData.php -d 50 --ignore-exceptions --exception-log="/var/log/mediawiki/"
Exceptions are e.g. written to the "mywiki.logrebuilddata-exceptions-2016-08-14.log" file if the wiki ID was "mywiki" and the script was run on August 14, 2016.
The following command removes outdated object ID references and adds a maintenance log entry to special page "Log" (Semantic MediaWiki log):[13]
php rebuildData.php --shallow-update --with-maintenance-log
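
After a run with --exception-log, the newest log file usually needs to be located for debugging. The sketch below relies only on what this page states about the file name (it contains "logrebuilddata-exceptions" followed by an ISO date); the helper name latest_exception_log is hypothetical, and it assumes all logs in the directory share the same prefix so that alphabetical order matches date order.

```shell
#!/bin/sh
# Print the exception log with the latest date in its name.
# latest_exception_log is a hypothetical helper name used for illustration.
latest_exception_log() {
    latest=""
    # Glob results are sorted, so the last match carries the latest ISO date.
    for f in "$1"/*logrebuilddata-exceptions-*.log; do
        if [ -e "$f" ]; then
            latest=$f
        fi
    done
    if [ -n "$latest" ]; then
        printf '%s\n' "$latest"
    else
        return 1
    fi
}

# Example call; the directory is the one used in the example above.
latest_exception_log "/var/log/mediawiki" || echo "no exception log found"
```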

Note

There was some discussion on the mailing list about when it is required to run this maintenance script.[24]

References

  1. gh:smw:4517
  2. gh:smw:1433
  3. gh:smw:3960
  4. gh:smw:1127
  5. gh:smw:243
  6. gh:smw:749
  7. gh:smw:820
  8. gh:smw:643
  9. gh:smw:68e8bc9
  10. gh:smw:766
  11. gh:smw:1106
  12. gh:smw:877
  13. gh:smw:1361
  14. gh:smw:1983
  15. gh:smw:1985
  16. gh:smw:3441
  17. gh:smw:3284
  18. gh:smw:4484
  19. gh:smw:1042
  20. gh:smw:1698
  21. gh:smw:1754:236913464
  22. gh:smw:1216
  23. gh:smw:498
  24. Semantic MediaWiki: User mailing list thread "When is it required to run rebuildData.php"