Manual:XML Import file manipulation in CSharp
Overview
This page shows how to use the MediaWiki export schema with C# in Visual Studio to manipulate a MediaWiki XML import file through strongly typed objects instead of working directly with raw XML.
One use case is a wiki with a number of pages that all need the same modification. You can export the pages to an XML file, manipulate that file in code, and then import it back. Make sure that users cannot edit these pages between the export and the re-import. For sites with moderate usage, this approach might be appropriate.
Schema
As the abbreviated XML import file below shows, the file's schemaLocation attribute points to the schema at https://www.mediawiki.org/xml/export-0.3.xsd:
<mediawiki xmlns="https://www.mediawiki.org/xml/export-0.3/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://www.mediawiki.org/xml/export-0.3/ https://www.mediawiki.org/xml/export-0.3.xsd"
version="0.3"
xml:lang="en">
<siteinfo>...</siteinfo>
<page>...</page>
<page>...</page>
<page>...</page>
</mediawiki>
First, download the MediaWiki schema at https://www.mediawiki.org/xml/export-0.3.xsd. Place the schema file in a .NET project folder, and consider renaming it to something more descriptive, such as MediaWikiExport.xsd. Using the xsd.exe tool that ships with Visual Studio, you can generate a C# class file from this schema by running the following at a Visual Studio command prompt:
xsd c:/inetpub/wwwroot/MyProject/MediaWikiExport.xsd /c
This command creates a class file named MediaWikiExport.cs. By default, xsd.exe writes it to the current directory.
Class Diagram
[Image: class diagram of the auto-generated class file]
Schema Diagram
[Image: diagram of the schema]
.NET Project
After generating the class file, add it to your .NET project, such as a console application project.
The code sample below shows how to work with the XML file in an object-oriented way instead of parsing the raw XML. Note that this sample was written against MediaWiki 1.13.2.
using System.IO;
using System.Xml;
using System.Xml.Serialization;

namespace WikiFileManipulation
{
    class Program
    {
        static void Main(string[] args)
        {
            // Name of the exported wiki file
            string file = "ExportedWikiPages.xml";

            // Open the XML file containing the exported wiki pages
            // (XmlDocument is used because XmlDataDocument is obsolete in later .NET versions)
            XmlDocument xml = new XmlDocument();
            xml.Load(file);

            // Deserialize the XML into a MediaWikiType object
            XmlSerializer serializer = new XmlSerializer(typeof(MediaWikiType));
            MediaWikiType mw;
            using (XmlNodeReader reader = new XmlNodeReader(xml))
            {
                mw = (MediaWikiType)serializer.Deserialize(reader);
            }

            // Loop through all the pages in the MediaWikiType object
            foreach (PageType p in mw.page)
            {
                foreach (object o in p.Items)
                {
                    // Only revisions are of interest; "as" yields null for other item types
                    RevisionType r = o as RevisionType;
                    if (r == null) continue;

                    // If you increment "timestamp" by one minute, you can re-import the file
                    r.timestamp = r.timestamp.AddMinutes(1);

                    // Update the "text" of the revision, which holds the page text
                    TextType text = r.text;
                    if (text != null && text.Value != null)
                    {
                        text.Value = text.Value.Replace("oldvalue", "newvalue");
                    }
                }
            }

            // Serialize the updated object back to the original file;
            // the using block closes the writer even if serialization fails
            using (TextWriter writer = new StreamWriter(file))
            {
                serializer.Serialize(writer, mw);
            }
        }
    }
}
C# 3.0 version
Here's the same example using C# 3.0 features, including type inference and a lambda expression.
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Serialization;

namespace WikiFileManipulation {
    class Program {
        static void Main(string[] args) {
            // Name of the exported wiki file
            var file = "ExportedWikiPages.xml";

            // Open the XML file containing the exported wiki pages
            // (XmlDocument is used because XmlDataDocument is obsolete in later .NET versions)
            var xml = new XmlDocument();
            xml.Load(file);

            // Deserialize the XML into a MediaWikiType object
            var serializer = new XmlSerializer(typeof(MediaWikiType));
            var nodeReader = new XmlNodeReader(xml);
            var mw = (MediaWikiType)serializer.Deserialize(nodeReader);

            // Loop through all the RevisionType items from each page
            foreach (var r in mw.page.SelectMany(p => p.Items.OfType<RevisionType>())) {
                // Increment the "timestamp" so that the file can be re-imported
                r.timestamp = r.timestamp.AddMinutes(1);

                // Update each revision's text, skipping revisions that have none
                if (r.text != null && r.text.Value != null) {
                    r.text.Value = r.text.Value.Replace("oldvalue", "newvalue");
                }
            }

            // Serialize the updates back to the same file;
            // the using block closes the writer even if serialization fails
            using (var writer = new StreamWriter(file)) {
                serializer.Serialize(writer, mw);
            }
        }
    }
}
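Both samples above first load the whole export into an in-memory XML document and then hand that document to the serializer. One possible simplification is to let XmlSerializer read directly from a file stream, and to write the result to a separate file so that the original export is left untouched if something goes wrong. The following is a minimal sketch of that idea, not a tested replacement for the samples above. The XmlFile class and its Load and Save methods are hypothetical helpers introduced here for illustration; they are not part of the generated MediaWikiExport.cs or of MediaWiki.

```csharp
using System.IO;
using System.Xml.Serialization;

namespace WikiFileManipulation
{
    // Hypothetical helper (an assumption for this sketch, not generated by xsd.exe).
    // It works with any public type that XmlSerializer can handle.
    public static class XmlFile
    {
        // Deserialize the XML file at "path" straight from a stream,
        // without building an intermediate XML document first.
        public static T Load<T>(string path)
        {
            var serializer = new XmlSerializer(typeof(T));
            using (var stream = File.OpenRead(path))
            {
                return (T)serializer.Deserialize(stream);
            }
        }

        // Serialize "value" to the file at "path", overwriting it if it exists.
        // The using block closes the writer even if serialization throws.
        public static void Save<T>(string path, T value)
        {
            var serializer = new XmlSerializer(typeof(T));
            using (var writer = new StreamWriter(path))
            {
                serializer.Serialize(writer, value);
            }
        }
    }
}
```

With the generated classes in the project, the export could then be read with XmlFile.Load<MediaWikiType>("ExportedWikiPages.xml") and, after the revisions have been updated as shown earlier, written with XmlFile.Save("UpdatedWikiPages.xml", mw). The output file name UpdatedWikiPages.xml is an assumption; saving to the original file name would reproduce the behavior of the samples above.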