Manual:Importing external content

From Linux Web Expert

Revision as of 06:52, 11 September 2023 by imported>Clump (Reverted edits by 2405:204:810C:A7DC:9F50:509F:24B5:482B (talk) to last version by Ernstkm)

To import a wiki XML file, see Manual:Importing XML dumps

Existing sites are difficult to move to MediaWiki.

"Wikifying" existing content from text files, HTML websites, or even office documents can be automated, but you will have to write suitable scripts yourself, and you will almost always have to make some edits by hand as well.

Learn MediaWiki's wikitext markup first, so that every manual edit is done correctly the first time and makes use of all the relevant features.

Consider installing Semantic Bundle and other extensions that extend markup before devoting a lot of manual effort to what may end up being an unsupportable set of conventions. Semantic Bundle deals well with page and object properties.

No ready-to-run import scripts for general users are available that anyone supports beyond a case-by-case basis. Wiki farm operators and MediaWiki administrators usually know some ways to convert data to SQL or to importable files, and a number of commercial forks of MediaWiki (such as BlueSpice) claim to offer additional facilities for commonly used formats. These are commercial efforts, and documenting them is beyond the scope of this manual.

Unlike proprietary CMSes such as Hyperwave or HTML editors such as Microsoft FrontPage, MediaWiki (like most open source software, WordPress excepted) includes few import filters. With tens of millions of pages of some of the most heavily accessed and trusted content in the world already in MediaWiki format, it is generally up to those maintaining data in incompatible formats to make it accessible in MediaWiki, not the other way around. MediaWiki focuses on presenting its own wikitext effectively in every language and on every device; it is fundamentally not focused on importing old or obsolete databases. Syncing MediaWiki with anything other than other MediaWiki-based sites is not contemplated. Conversion to MediaWiki should be a one-time step, with the data maintained in MediaWiki format thereafter, apart from a few exceptions discussed below.

One notable gap is the lack of any easy way to mirror LDAP, SMB or NFS directories into wiki pages. The best workaround at present is to link to an HTML page on the same server, since browsers and web servers can usually generate such listing pages on the fly.
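As a stopgap for a one-off snapshot (not a live mirror), a directory tree that is mounted locally, for example an SMB or NFS share, can be turned into a nested wikitext bullet list and pasted into a page. The following is a minimal Python sketch; the function name `directory_to_wikitext` is hypothetical, and the output has to be regenerated whenever the directory changes.

```python
import os


def directory_to_wikitext(root: str) -> str:
    """Return a nested wikitext bullet list describing the tree under root.

    Hypothetical helper: produces a one-off snapshot, not a live mirror.
    """
    lines = []

    def walk(path: str, depth: int) -> None:
        # Sort names so the output is stable between runs.
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            bullet = "*" * depth  # one "*" per nesting level
            if os.path.isdir(full):
                lines.append(f"{bullet} {name}/")
                walk(full, depth + 1)
            else:
                lines.append(f"{bullet} {name}")

    walk(root, 1)
    return "\n".join(lines)
```

File names containing wikitext characters such as `[[` or `''` would need escaping (for example with `<nowiki>`), and symbolic links that point back up the tree would make this simple walk loop.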

MediaWiki and wikitext-based compatibles

To do a one-time import from another MediaWiki wiki, see Manual:Importing XML dumps, Manual:Importing revisions, and Manual:Restoring a wiki from backup. XML dumps are strongly preferred, because importing an older MediaWiki's SQL requires significant expertise with MySQL or similar databases. To restore an old SQL backup, it is best to install the MediaWiki version it was made for, restore the database in phpMyAdmin, and then upgrade the code forward. A riskier method is to install a new MediaWiki, rename its database in phpMyAdmin, import the older database, rename that to the name the new installation expects, and carefully restore privileges from the renamed database to match. Then run the mw-config updater (not the maintenance script, which may not work) and export the wiki to XML as a proper backup.

Live mirroring

There is at present no facility for current MediaWiki wikis similar to the old GetWiki 1.0 live XML importing, which mirrored another MediaWiki wiki (usually Wikipedia) and used its page content as the default unless a page was edited on the new (mirroring) wiki. This feature can, however, be roughly simulated by frequently importing XML dumps of the pages that don't exist on the importing wiki, and either ignoring the dump's versions of pages that do exist or overwriting them afterwards from a backup taken just before the import.
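The filtering step of this workaround can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is a minimal sketch under stated assumptions: the set of titles that already exist locally is obtained separately (for example from the local wiki's list of all pages), and the function name `filter_new_pages` is hypothetical. It drops every `<page>` whose `<title>` is in that set, so the remaining dump can be imported without touching local pages.

```python
import xml.etree.ElementTree as ET


def _local_name(tag: str) -> str:
    # "{http://www.mediawiki.org/xml/export-0.10/}page" -> "page"
    return tag.rsplit("}", 1)[-1]


def filter_new_pages(dump_xml: str, existing_titles: set) -> str:
    """Return the dump with pages whose titles already exist locally removed."""
    root = ET.fromstring(dump_xml)
    # Keep the dump's default namespace on output instead of "ns0:" prefixes.
    # Note: register_namespace changes ElementTree's global prefix map.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    for page in [el for el in root if _local_name(el.tag) == "page"]:
        title = next((child.text for child in page
                      if _local_name(child.tag) == "title"), None)
        if title in existing_titles:
            root.remove(page)
    return ET.tostring(root, encoding="unicode")
```

Real dumps can be very large; for those, a streaming approach such as `ET.iterparse` would be a better fit than loading the whole file into memory.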

Wikipedia did not support the GetWiki approach to live mirroring, for load reasons and perhaps to avoid early forks of its "community". These reasons are historical and do not prevent pursuing this approach in current extensions. The feature would be immensely useful on intranets, for instance to keep more and less secure versions of the same wiki content, with extra details added for more trusted users.

Ironically, it is easier to embed MediaWiki content in WordPress than in MediaWiki itself. Some sites that use both, such as PRwatch/SourceWatch, take this approach.

Because of this missing capability, so-called "embedded wikis" are rarely if ever based on MediaWiki: developers who want to embed a wiki in other social media or websites almost never choose it.

JAMWiki

JAMWiki was developed between 2006 and 2013[1], was written in Java, and aimed to be mostly MediaWiki-compatible. Its major selling point was ease of installation. It is no longer actively developed.[2]

Its underlying file format, being standard wikitext, can in principle be copied and pasted from its flat-file document storage into a running MediaWiki instance. To automate this for a large JAMWiki site, look at tools such as pywikibot.
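A minimal sketch of such automation, assuming the JAMWiki topics have already been exported as one UTF-8 text file per page (a hypothetical layout such as `Main_Page.txt`) and that pywikibot is installed and configured with a `user-config.py` for the target wiki. `collect_pages` and `upload_pages` are hypothetical helper names; `pywikibot.Site`, `pywikibot.Page`, `Page.exists`, `Page.text` and `Page.save` are pywikibot's own API.

```python
import pathlib


def collect_pages(export_dir: str, suffix: str = ".txt") -> dict[str, str]:
    """Map page titles to wikitext, one file per page (assumed layout)."""
    pages = {}
    for path in sorted(pathlib.Path(export_dir).glob(f"*{suffix}")):
        # Assumption: underscores in file names stand for spaces in titles.
        title = path.stem.replace("_", " ")
        pages[title] = path.read_text(encoding="utf-8")
    return pages


def upload_pages(pages: dict[str, str], summary: str = "Import from JAMWiki",
                 overwrite: bool = False) -> None:
    """Save each page through pywikibot (needs network access and a login)."""
    import pywikibot  # third-party; reads the wiki settings from user-config.py

    site = pywikibot.Site()
    for title, text in pages.items():
        page = pywikibot.Page(site, title)
        if page.exists() and not overwrite:
            continue  # leave pages already on the wiki alone
        page.text = text
        page.save(summary=summary)
```

Keeping the file-reading step separate from the upload makes it easy to inspect the collected pages before anything is written to the wiki.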

Incompatible (non-wikitext) wikis

MediaWiki's wikitext markup is among the most commonly used in the world, supporting most natural languages and character sets. Various efforts at standardizing wiki markup[3] have been attempted, but the fact remains that no "universal wiki" converter exists.

Most conversion tools are one-way rather than bidirectional, requiring you to abandon the legacy wiki after the import is completed. However, many good options now exist for self-hosting (e.g., Docker containers and software bundles) or cloud-hosting (e.g., Fandom) your own MediaWiki instance. Good out-of-the-box support for setting up new wikis on shared hosting providers (e.g., DreamHost), more powerful desktops and LAN servers, and improved XML-based backup and restore tools have made it possible to maintain a MediaWiki instance without special expertise.

Converting content from a UseMod Wiki

Before MediaWiki (Phase III of the Wikipedia software) and its predecessor, Phase II, Wikipedia ran on the UseMod Wiki software written by Clifford Adams. UseModWiki is a Perl script that uses a database of text files to generate a WikiWiki site. It usually runs as a CGI script in response to web requests, but can also be called directly by other Perl programs.

The storage format of UseMod Wiki is well documented.

Converting content from a PHPWiki

PHPWiki is a clone of the original WikiWikiWeb in PHP.[4] It is still receiving updates, and works with PHP 8.x.

Isaac Wilcox wrote a Perl script which converts all the commonly used PHPWiki markup to MediaWiki wikitext, although some tweaking of the results may be necessary. It was written in the MediaWiki 1.4.x – 1.5.x days, and it's likely to have problems with more recent MediaWiki releases, due to schema changes. At that time, though, it was reputed to do an excellent job of preserving almost all of the formatting, so it could still be a useful resource, even if minor changes are necessary to make it compatible with recent versions.

See PhpWiki conversion for a solution using sed. For another solution, combining the already mentioned ones, see User:Atrox~mediawikiwiki/Phpwiki2Mediawiki.

For a PHP-based solution, this script (by a now-defunct New Zealand web design shop) might be workable, after some cleanup to remove the archive.org banners and analytics.
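The converters mentioned above all work by rewriting PhpWiki markup line by line. The following Python sketch shows the idea for a handful of rules, assuming PhpWiki 1.3-style markup (`!!!`/`!!`/`!` headings, `*strong*`, `_emphasis_`, `%%%` line breaks, `[label|target]` links). It is not one of the scripts above, the function name `phpwiki_to_mediawiki` is hypothetical, and tables, plugins and other constructs are left untouched.

```python
import re

_URL = re.compile(r"[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def _piped_link(m: re.Match) -> str:
    # PhpWiki [label|target] -> MediaWiki [[target|label]] or [url label]
    label, target = m.group(1).strip(), m.group(2).strip()
    if _URL.match(target):
        return f"[{target} {label}]"
    return f"[[{target}|{label}]]"


def _plain_link(m: re.Match) -> str:
    # PhpWiki [PageName] -> [[PageName]]; a bare [http://...] is left alone
    target = m.group(1).strip()
    if _URL.match(target):
        return m.group(0)
    return f"[[{target}]]"


def phpwiki_to_mediawiki(text: str) -> str:
    out = []
    for line in text.splitlines():
        heading = re.match(r"^(!{1,3})\s*(.*?)\s*$", line)
        if heading:
            # Assumption: "!!!" is the largest heading; map it to level 2.
            marks = "=" * {3: 2, 2: 3, 1: 4}[len(heading.group(1))]
            out.append(f"{marks} {heading.group(2)} {marks}")
            continue
        line = line.replace("%%%", "<br />")  # forced line break
        line = re.sub(r"\[([^\]|]+)\|([^\]]+)\]", _piped_link, line)
        line = re.sub(r"\[([^\[\]|]+)\]", _plain_link, line)
        # *strong* -> '''bold''' (a "* " list bullet is not matched)
        line = re.sub(r"(?<![\w*])\*(?=\S)([^*\n]+?)(?<=\S)\*(?![\w*])",
                      r"'''\1'''", line)
        # _emphasis_ -> ''italic'' (underscores inside words are kept)
        line = re.sub(r"(?<!\w)_([^_\n]+)_(?!\w)", r"''\1''", line)
        out.append(line)
    return "\n".join(out)
```

Lists (`*`, `#`) already use the same markup in both wikis, so they pass through unchanged; anything the rules do not cover still needs the manual tweaking mentioned above.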

Converting MoinMoin format to MediaWiki format

There are various scripts for this, all dodgy. See MoinMoin.

Converting WackoWiki to MediaWiki

There is a WackoWiki converter (developed for the migration of http://freesource.info/ to http://altlinux.org/); however, it will need additional tweaking before use.

Converting TikiWiki format to MediaWiki format

You can convert TikiWiki pages to MediaWiki format using this script.

Converting GoogleCode Wiki to MediaWiki

There is an example of migrating a site to MediaWiki while keeping page storage in SVN: http://ahuman.org, with code at http://usvn.ahuman.org/svn/ahwiki/tools.

It stores pages in both formats, .gw (Google Code) and .mw (MediaWiki), and includes scripts to support bidirectional SVN↔MediaWiki transfer.

Converting WikiSpaces format to MediaWiki

See Wikispaces.

Converting content from tabular (row/column) formats

See Commons methods of converting tables and charts, which covers LibreOffice Calc, Excel, OpenOffice.org and other formats. It has been proposed to merge that page with this one.

Most simple tabular formats can be exported to comma-separated values (CSV) files, so CSV is commonly used as an intermediate format for conversion.
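If none of the tools below fits, Python's standard `csv` module is enough to turn a CSV file into MediaWiki table markup. A minimal sketch, assuming the first row holds the column headings; `csv_to_wikitable` is a hypothetical name. Pipe characters inside cells are replaced with the `&#124;` entity so that they are not read as cell separators.

```python
import csv
import io


def csv_to_wikitable(csv_text: str, header: bool = True,
                     css_class: str = "wikitable") -> str:
    """Convert CSV text into a MediaWiki {| ... |} table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""

    def cell(value: str) -> str:
        # A literal "|" would split the cell, so emit it as an HTML entity.
        return value.strip().replace("|", "&#124;")

    lines = [f'{{| class="{css_class}"']
    start = 0
    if header:
        # "!" starts a header row; "!!" separates header cells on one line.
        lines.append("! " + " !! ".join(cell(v) for v in rows[0]))
        start = 1
    for row in rows[start:]:
        lines.append("|-")  # new table row
        lines.append("| " + " || ".join(cell(v) for v in row))
    lines.append("|}")
    return "\n".join(lines)
```

For example, `csv_to_wikitable("Name,Qty\nApple,3\n")` yields a two-column table with a "Name"/"Qty" header row and one data row; quoted fields containing commas are handled by the `csv` module.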

Linux

On Linux, csv2wiki [1] imports CSV files.

Windows

From a CSV text file on Windows

If you are using Windows, you can try csv2other. It produces an output file with a .txt extension containing the markup for a wiki table.

From a directory/folder listing on Windows

Dir2html creates simple HTML pages from Windows directories, so that they can be imported like any other HTML content, as described below.

Converting content from HTML

Older tools

  • https://magnustools.toolforge.org/html2wiki.php can convert HTML tables into MediaWiki table syntax
  • HTML-WikiConverter Perl module
  • MwImporter, a PHP script for importing entire websites; it uses html2wiki and other MediaWiki maintenance scripts to import entire directories of static html and image files while preserving relative links, etc.
  • The Html2Wiki extension. The (unmaintained) extension is a wrapper around Pandoc.
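For simple, non-nested tables, the conversion these tools perform can be sketched with Python's standard `html.parser`. This is a minimal sketch only: `TableToWikitext` and `html_table_to_wikitext` are hypothetical names, attributes such as `colspan` are ignored, nested tables are not supported, every cell is assumed to be explicitly closed with `</td>` or `</th>`, and text outside table cells is dropped.

```python
from html.parser import HTMLParser


class TableToWikitext(HTMLParser):
    """Collect wikitext lines for each <table> in the fed HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []
        self._cell_kind = None  # "td" or "th" while inside a cell
        self._cell_text: list[str] = []
        self._first_row = True

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.lines.append('{| class="wikitable"')
            self._first_row = True
        elif tag == "tr":
            if not self._first_row:
                self.lines.append("|-")  # separator before every later row
            self._first_row = False
        elif tag in ("td", "th"):
            self._cell_kind, self._cell_text = tag, []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell_kind:
            # Collapse whitespace; escape "|" so it is not a cell separator.
            text = " ".join("".join(self._cell_text).split()).replace("|", "&#124;")
            marker = "!" if self._cell_kind == "th" else "|"
            self.lines.append(f"{marker} {text}")  # one cell per line
            self._cell_kind = None
        elif tag == "table":
            self.lines.append("|}")

    def handle_data(self, data):
        if self._cell_kind:
            self._cell_text.append(data)


def html_table_to_wikitext(html: str) -> str:
    parser = TableToWikitext()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Writing one cell per line (`! Name`, `| Apple`) is valid wikitext and avoids having to join cells with `||`/`!!`.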

Converting content from a MS-Word document

Microsoft Office Word Add-in For MediaWiki saves documents from Microsoft Office Word straight into MediaWiki.

LibreOffice also does a good job of reading MS Word and a usable job of exporting as MediaWiki wikitext.

Converting content from plain text files

You can use the importTextFiles.php maintenance script.
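By default, importTextFiles.php derives each page title from the file name (check the script's `--help` output for your MediaWiki version), so a conversion script only needs to write one file per page. A minimal sketch, assuming the converted pages are held in a Python dictionary mapping titles to wikitext; `write_pages_for_import` is a hypothetical helper. Titles containing `/` (subpages) cannot be stored as a single file name and are rejected rather than guessed at.

```python
import pathlib


def write_pages_for_import(pages: dict, out_dir: str, suffix: str = ".txt") -> list:
    """Write one UTF-8 file per page and return the paths written."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for title, wikitext in pages.items():
        if "/" in title:
            # A subpage title would become a directory path, not a file name.
            raise ValueError(f"cannot store title {title!r} as a file name")
        path = out / f"{title}{suffix}"
        path.write_text(wikitext, encoding="utf-8")
        written.append(path)
    return written
```

The resulting files could then be imported on the server with something like `php maintenance/importTextFiles.php out_dir/*.txt`. Titles with a namespace prefix such as `Help:` contain a colon, which is not allowed in Windows file names.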

Converting content from other sources

If you are able and willing to do some scripting by yourself, it is possible to import almost any existing textual content with a documented file format into MediaWiki.

Example: CIA World Factbook 2002

As an example, the public domain data from the CIA World Factbook 2002 was imported into Wikitravel, which runs on MediaWiki.

This is a one-time script: most paths and settings are hard-coded, and much of the code parses the CIA World Factbook print pages, but it may serve as a good example of what can be done.
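A common pattern in such scripts is to parse each source record into fields and then emit wikitext, often as a template call so that the presentation can later be changed in one place. A minimal Python sketch of that last step; the template name `Country facts` and the field names are hypothetical and do not come from the Wikitravel script.

```python
def record_to_wikitext(record: dict, template: str = "Country facts") -> str:
    """Render one parsed record as a template call, one parameter per line."""
    lines = ["{{" + template]
    for field, value in record.items():
        # A literal "|" would start a new template parameter, so escape it.
        lines.append(f"| {field} = {str(value).replace('|', '&#124;')}")
    lines.append("}}")
    return "\n".join(lines)
```

For example, `record_to_wikitext({"name": "Chile", "capital": "Santiago"})` produces a `{{Country facts ...}}` call with `name` and `capital` parameters; MediaWiki trims the whitespace around named parameter values.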

Importing content in Windows PowerShell

Manual:Importing XML dumps describes various tools to import XML dumps of wiki pages, including the Special:Import wiki page.
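Because both Special:Import and the importDump.php maintenance script read MediaWiki's XML export format, a custom converter can write that format directly. A minimal sketch using Python's `xml.etree.ElementTree`, modeled on the small page/revision example in the MediaWiki export documentation; `pages_to_export_xml` is a hypothetical name. Real exports carry more elements (such as `<siteinfo>` and a namespace declaration), so try an import on a scratch wiki first.

```python
import xml.etree.ElementTree as ET


def pages_to_export_xml(pages: dict, username: str, timestamp: str) -> str:
    """Build a minimal <mediawiki> document with one revision per page.

    timestamp is an ISO 8601 UTC string such as "2001-01-15T13:15:00Z".
    """
    root = ET.Element("mediawiki")
    for title, wikitext in pages.items():
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = title
        revision = ET.SubElement(page, "revision")
        ET.SubElement(revision, "timestamp").text = timestamp
        contributor = ET.SubElement(revision, "contributor")
        ET.SubElement(contributor, "username").text = username
        # ElementTree escapes &, < and > in the wikitext automatically.
        ET.SubElement(revision, "text").text = wikitext
    return ET.tostring(root, encoding="unicode")
```

The resulting file can be uploaded through Special:Import or passed to importDump.php on the server.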

References