Manual:XML Import file manipulation in CSharp

Overview

This page shows how to use the MediaWiki schema with Visual Studio .NET C# to manipulate a MediaWiki XML import file in code using object-oriented programming instead of working directly with raw XML.

One use case: a number of pages in a wiki need the same modification. You can export them to an XML file, manipulate the file, and then import it back. Make sure that users cannot modify these pages between the export and the re-import. For sites with moderate usage, this approach might be appropriate.

Schema

As the abbreviated XML import file below shows, the file's schemaLocation points to https://www.mediawiki.org/xml/export-0.3.xsd:

<mediawiki xmlns="https://www.mediawiki.org/xml/export-0.3/" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="https://www.mediawiki.org/xml/export-0.3/ https://www.mediawiki.org/xml/export-0.3.xsd" 
version="0.3" 
xml:lang="en">
  <siteinfo>...</siteinfo>
  <page>...</page>
  <page>...</page>
  <page>...</page>
</mediawiki>

First, download the MediaWiki schema from https://www.mediawiki.org/xml/export-0.3.xsd. Place the schema file in a .NET project folder, and consider renaming it to something more descriptive, such as MediaWikiExport.xsd. Then use the xsd.exe tool that ships with Visual Studio .NET to generate a .NET class file from the schema by running this command at the Visual Studio .NET command prompt:

xsd c:/inetpub/wwwroot/MyProject/MediaWikiExport.xsd /c

This command creates a class file named MediaWikiExport.cs in the current directory.

Class Diagram

The auto-generated class file will look like this:

File:MediaWikiCSharpClassDiagram.png
Auto-generated VS.NET C# class file based on the MediaWiki import schema


Schema Diagram

The schema will look like this:

File:MediaWikiImportSchema.png
MediaWiki import file schema


.NET Project

After generating the class file, add it to your .NET project, such as a console application project.

The code sample below shows how to work with the XML file in an object-oriented way instead of parsing the raw XML. It was used with MediaWiki 1.13.2.

using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

namespace WikiFileManipulation
{
    class Program
    {
        static void Main(string[] args)
        {
            // name of the exported wiki file
            string file = "ExportedWikiPages.xml";

            // Open XML file containing exported wiki pages
            // (a plain XmlDocument is enough here; XmlDataDocument is marked obsolete in later .NET Framework versions)
            XmlDocument xml = new XmlDocument();
            xml.Load(file);

            // Deserialize the XML file into a MediaWikiType object
            XmlSerializer serializer = new XmlSerializer(typeof(MediaWikiType));
            XmlNodeReader oReader = new XmlNodeReader(xml);
            MediaWikiType mw = (MediaWikiType)serializer.Deserialize(oReader);

            // Loop through all the Pages in the MediaWikiType object
            foreach (PageType p in mw.page)
            {
                foreach (object o in p.Items)
                {
                    // Only revisions are of interest; skip any other item type
                    RevisionType r = o as RevisionType;
                    if (r != null)
                    {
                        // Increment "timestamp" by one minute so that the file can be re-imported
                        r.timestamp = r.timestamp.AddMinutes(1);

                        // Update the "text" of the revision, which holds the page text
                        TextType text = r.text;
                        if (text != null && text.Value != null)
                        {
                            text.Value = text.Value.Replace("oldvalue", "newvalue");
                        }
                    }
                }
            }

            // Serialize the updated object back to the original file with the corrections/additions;
            // the using block closes the writer even if serialization throws
            using (System.IO.TextWriter writer = new System.IO.StreamWriter(file))
            {
                serializer.Serialize(writer, mw);
            }
        }
    }
}
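
The sample above writes its result over the file it has just read, so a failure partway through serialization could leave a truncated export. As one possible precaution, the sketch below serializes to a temporary file first and only then swaps it into place, keeping the previous contents as a backup. The class name SafeXmlWriter and the .tmp and .bak suffixes are illustrative assumptions, not part of MediaWiki or of the generated classes.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical helper (not part of MediaWiki or the xsd.exe output).
public static class SafeXmlWriter
{
    // Serializes 'value' to a temporary file next to 'path', then swaps it into place.
    public static void Save<T>(T value, string path)
    {
        string tempPath = path + ".tmp";    // assumed naming convention
        string backupPath = path + ".bak";  // assumed naming convention

        XmlSerializer serializer = new XmlSerializer(typeof(T));

        // The target file is untouched until serialization has finished.
        using (StreamWriter writer = new StreamWriter(tempPath))
        {
            serializer.Serialize(writer, value);
        }

        if (File.Exists(path))
        {
            // Replaces the target with the temporary file and keeps the
            // previous contents of the target in the backup file.
            File.Replace(tempPath, path, backupPath);
        }
        else
        {
            File.Move(tempPath, path);
        }
    }
}
```

Under these assumptions, a call such as SafeXmlWriter.Save(mw, file) could stand in for the StreamWriter lines at the end of Main.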

C# 3.0 version

Here's the same example using C# 3.0 features, including type inference and a lambda expression.

using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Serialization;

namespace WikiFileManipulation {
    class Program {
        static void Main(string[] args) {

            // name of the exported wiki file
            var file = "ExportedWikiPages.xml";
 
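            // Open XML file containing exported wiki pages
            // (a plain XmlDocument is enough here; XmlDataDocument is marked obsolete in later .NET Framework versions)
            var xml = new XmlDocument();
            xml.Load(file);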
 
            // Deserialize the XML file into the MediaWikiType object
            var serializer = new XmlSerializer(typeof(MediaWikiType));
            var nodeReader = new XmlNodeReader(xml);
            var mw = (MediaWikiType)serializer.Deserialize(nodeReader);
            
            // Loop through all the RevisionType Items from each Page
            foreach (var r in mw.page.SelectMany(p => p.Items.OfType<RevisionType>())) {
                // increment the "timestamp" in order to re-import file
                r.timestamp = r.timestamp.AddMinutes(1);
 
                // Update each revision's text
                r.text.Value = r.text.Value.Replace("oldvalue", "newvalue");
            }
 
            // serialize the updates back to the same file;
            // the using block closes the writer even if serialization throws
            using (var writer = new StreamWriter(file)) {
                serializer.Serialize(writer, mw);
            }
        }
    }
}
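
XmlSerializer can also read directly from a TextReader, so the intermediate XML document is optional. The following is a minimal sketch of that variant, with null checks for pages or revisions that carry no text. To keep it compilable on its own, it declares hand-written, heavily trimmed stand-ins for the generated classes; these stand-ins, the Ns and WikiExportFile classes, and their method names are assumptions made for illustration. In a real project you would use the generated MediaWikiExport.cs instead.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// ASSUMPTION: trimmed stand-ins for the xsd.exe-generated classes, present only
// so this sketch compiles by itself. The real generated classes have more members.
public static class Ns
{
    // Namespace URI copied from the example on this page; it must match the
    // targetNamespace of the schema the classes were generated from.
    public const string Export = "https://www.mediawiki.org/xml/export-0.3/";
}

[XmlRoot("mediawiki", Namespace = Ns.Export)]
[XmlType(Namespace = Ns.Export)]
public class MediaWikiType
{
    [XmlElement("page")]
    public PageType[] page;
}

[XmlType(Namespace = Ns.Export)]
public class PageType
{
    public string title;

    [XmlElement("revision", typeof(RevisionType))]
    public object[] Items;
}

[XmlType(Namespace = Ns.Export)]
public class RevisionType
{
    public DateTime timestamp;
    public TextType text;
}

[XmlType(Namespace = Ns.Export)]
public class TextType
{
    [XmlText]
    public string Value;
}

// Hypothetical helper class; the names are illustrative.
public static class WikiExportFile
{
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(MediaWikiType));

    // Deserializes straight from a reader, without an XmlDocument in between.
    public static MediaWikiType Load(TextReader reader)
    {
        return (MediaWikiType)Serializer.Deserialize(reader);
    }

    public static void Save(MediaWikiType mw, TextWriter writer)
    {
        Serializer.Serialize(writer, mw);
    }

    // Bumps the timestamp and replaces text in every revision that has text.
    // Returns the number of revisions that were updated.
    public static int ReplaceInRevisions(MediaWikiType mw, string oldValue, string newValue)
    {
        int updated = 0;
        var revisions = (mw.page ?? new PageType[0])
            .SelectMany(p => (p.Items ?? new object[0]).OfType<RevisionType>());

        foreach (var r in revisions)
        {
            if (r.text == null || r.text.Value == null) continue; // nothing to edit

            r.timestamp = r.timestamp.AddMinutes(1);
            r.text.Value = r.text.Value.Replace(oldValue, newValue);
            updated++;
        }
        return updated;
    }
}
```

With these helpers, Main could wrap a StreamReader and a StreamWriter in using blocks and pass them to Load and Save, which also closes the file between reading and writing.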