Manual:refreshLinks.php

From Linux Web Expert

Details

refreshLinks.php is a maintenance script that (re)fills the pagelinks, categorylinks, and imagelinks tables. Run it if categories are empty or do not show all relevant pages, if "What links here?" gives incomplete results, or if you see other link-related problems. The script also purges entries that originate from non-existent pages from the following tables: pagelinks, categorylinks, imagelinks, templatelinks, externallinks, iwlinks, langlinks, redirect, page_props.

Usage

Basic

php maintenance/refreshLinks.php [starting_article]

For example, to start at the page with ID 8000:

php maintenance/refreshLinks.php 8000

Advanced

php refreshLinks.php [--conf|--dbpass|--dbuser|--dfn-only|--e|--globals|--help|--m|--new-only|--old-redirects-only|--quiet|--redirects-only|--wiki] <start>

Parameters

Option/Parameter        Description
--dfn-only              Delete links from nonexistent articles only
--new-only              Only affect articles with just a single edit
--redirects-only        Only fix redirects, not all links
--old-redirects-only    Only fix redirects with no redirect table entry
--e <page_id>           Last page id to refresh
--dfn-chunk-size        Maximum number of existent IDs to check per query, default 100,000
--namespace             Only fix pages in this namespace. The namespace should be the numeric ID.
--category              Only fix pages in this category
--tracking-category     Only fix pages in this tracking category
--m <max_lag>           Maximum replication lag
--wiki                  For specifying the wiki ID
--help                  Show help text
<start>                 Article number (page_id) to start at
no parameters           Will refresh all articles

This script also supports the common options.
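As a rough illustration of how the options above can be combined, the sketch below prints a few invocations instead of running them, so they can be reviewed first. The refresh_cmd helper and the MW_DIR path are hypothetical, and reading the --m value as seconds is an assumption rather than something stated in the table.

```shell
#!/bin/sh
# Sketch only: MW_DIR is a hypothetical install path, adjust it for your wiki.
MW_DIR="${MW_DIR:-/path/to/mediawiki}"

# Hypothetical helper: print the command rather than executing it.
refresh_cmd() {
    echo php "$MW_DIR/maintenance/refreshLinks.php" "$@"
}

# Only pages in namespace 0, with a replication lag limit of 5
# (unit assumed to be seconds).
refresh_cmd --namespace 0 --m 5

# Only redirects, for page_ids 500 through 1000.
refresh_cmd --redirects-only --e 1000 -- 500

# Only delete links from nonexistent articles.
refresh_cmd --dfn-only
```

Once the printed commands look right, they can be run directly, or the echo can be dropped from the helper.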

Example output

me@server:/var/www/htdocs/mw/w/maintenance$ php refreshLinks.php
Refreshing redirects table.
Starting from page_id 1 of 309.
100
200
300
Refreshing links tables.
Starting from page_id 1 of 309.
100
200
300
Retrieving illegal entries from pagelinks... 0..0
Retrieving illegal entries from imagelinks... 0..0
Retrieving illegal entries from categorylinks... 0..0
Retrieving illegal entries from templatelinks... 0..0
Retrieving illegal entries from externallinks... 0..0
Retrieving illegal entries from iwlinks... 0..0
Retrieving illegal entries from langlinks... 0..0
Retrieving illegal entries from redirect... 0..0
Retrieving illegal entries from page_props... 0..0


Avoiding memory issues

This script may run into memory issues. To avoid them, limit each run by setting the last page_id to refresh:

php refreshLinks.php --e 1500

To process the next range of page IDs, run:

php refreshLinks.php --e 3000 -- 1500

Continue until all page IDs in your wiki have been refreshed.

If you forgot to set a last page_id and the script runs out of memory, rerun it with the last page_id it printed as the article to start at, e.g.

php refreshLinks.php -- 1600
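If the script's output was saved to a file, the last reported page_id can be extracted rather than read off by eye. The following is a minimal sketch, assuming progress lines consist of a number only (as in the example output); the last_reported_id helper and the log file are hypothetical.

```shell
#!/bin/sh
# Sketch: find the last page_id reported in saved refreshLinks.php output.
# Assumes progress lines contain only a number, as in the example output.
last_reported_id() {
    # Strip trailing whitespace, keep purely numeric lines, take the last one.
    sed 's/[[:space:]]*$//' "$1" | grep -E '^[0-9]+$' | tail -n 1
}

# Demo with a small sample log standing in for real captured output.
log=$(mktemp)
printf '%s\n' 'Refreshing links tables.' 'Starting from page_id 1 of 309.' '100' '200' > "$log"
start_id=$(last_reported_id "$log")
echo "php refreshLinks.php -- $start_id"
rm -f "$log"
```

Because the script reports progress once for the redirects table and once for the links tables, the number found belongs to whichever phase was running when the output stopped.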

Chunking refreshLinks.php to refresh all links without memory leak

Below is an example script that runs refreshLinks.php against all pages in fixed-size chunks to avoid memory issues.

num_pages=$(php /path/to/mediawiki/maintenance/showSiteStats.php | grep "Total pages" | sed 's/[^0-9]*//g')
end_id=0
delta=2000

echo "Beginning refreshLinks.php script"
echo "  Total pages = $num_pages"
echo "  Doing it in $delta-page chunks to avoid memory leak"

while [ "$end_id" -lt "$num_pages" ]; do
    start_id=$(($end_id + 1))
    end_id=$(($end_id + $delta))
    echo "Running refreshLinks.php from $start_id to $end_id"
    php /path/to/mediawiki/maintenance/refreshLinks.php --e "$end_id" -- "$start_id"
done

# Just in case there are more IDs beyond the guess we made with showSiteStats, run
# one more unbounded refreshLinks.php starting just after the last ID previously done
start_id=$(($end_id + 1))
echo "Running final refreshLinks.php in case there are more pages beyond $num_pages"
php /path/to/mediawiki/maintenance/refreshLinks.php "$start_id"
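To check the chunk arithmetic before touching the database, the same loop can be run as a dry run that only prints the arguments each call would receive. This is a sketch; print_chunks is a hypothetical helper, and the totals passed to it are made-up sample values.

```shell
#!/bin/sh
# Sketch: print the arguments each refreshLinks.php call in the chunking
# loop would receive, without running PHP.
print_chunks() {
    total=$1
    delta=$2
    end_id=0
    while [ "$end_id" -lt "$total" ]; do
        start_id=$((end_id + 1))
        end_id=$((end_id + delta))
        echo "--e $end_id -- $start_id"
    done
    # Final unbounded run, mirroring the last step of the script above.
    echo "$((end_id + 1))"
}

# Sample values: 4500 pages in chunks of 2000.
print_chunks 4500 2000
```

With these sample values the helper prints three bounded ranges (1 to 2000, 2001 to 4000, 4001 to 6000) followed by the start ID 6001 for the final unbounded run.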