From MediaWiki to XWiki part III
Published by Patrick on 22. Jan 2008 15:37:01
------------------------------------------------------------------------

As mentioned in the "previous article", one of the loose ends to clear up is
importing into separate spaces. One of the reasons we moved to XWiki was its
support for multiple namespaces; it would have been a shame to import all of
our existing data into one gigantic namespace.

[Importing separate spaces]

What we've got after running wikifetch.pl is one directory with all MediaWiki
pages in it and all links pointing to the space MySpacePlaceholder.

First, we needed to decide which page goes where. There is no easy way to do
this, so we just sorted them out manually, moving the page files to different
directories. For example, all development-related stuff went into a directory
named development, whereas all sales-related files went to the sales directory
and so on.

We could now go ahead and import the directories one by one if it weren't for
the links between pages. Say Main.WebHome links to the Development page, which
now belongs in the Development space. After wikifetch.pl, that link still reads
MySpacePlaceholder.Development. How would the import script know that the link
should now point to Development.Development?

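We modified import.groovy to resolve these links using a copy of the pages to
be imported. The copy is needed because the originals get deleted along the way
(as mentioned in part II), while resolving links requires knowing where every
page ended up.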
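Making that copy is a one-liner in most file managers, but it can also be scripted. The following is a minimal sketch, not part of our import.groovy; the helper name copyTree is hypothetical, and it simply mirrors a directory tree with java.nio.file.Files.walk, which visits each directory before its contents.

```groovy
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardCopyOption

// Hypothetical helper: copies a whole directory tree so the link map can be
// built from the copy even after the originals have been deleted.
void copyTree(Path source, Path target) {
    def paths = Files.walk(source)   // pre-order: a directory comes before its files
    try {
        paths.forEach { Path p ->
            Path dest = target.resolve(source.relativize(p))
            if (Files.isDirectory(p)) {
                Files.createDirectories(dest)
            } else {
                Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING)
            }
        }
    } finally {
        paths.close()   // Files.walk holds open directory handles
    }
}
```

For example, copyTree(Paths.get("C:/temp/wiki"), Paths.get("C:/temp/copy_wiki")), where C:/temp/wiki is a placeholder for wherever the sorted directories actually live.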

First, we build a hash map from page names to their new spaces:

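import groovy.io.FileType
[...]
// Map each page file name to "Space.Page", where the space is the name
// of the directory the page was sorted into
def fileSpaces = [:]

// FileType.FILES skips the directories themselves, which would otherwise
// end up in the map as bogus entries such as "copy_wiki.development"
new File( "C:/temp/copy_wiki" ).eachFileRecurse( FileType.FILES ) { f ->
  def parentDir = f.parentFile.name
  fileSpaces[ f.name ] = parentDir + "." + f.name
}
[...]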

What we've got now is a hash map where we can look up a "Space.Page" for a given
"Page".
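To see what the map ends up containing, here is a self-contained sketch that runs the same idea against a throwaway directory tree standing in for C:/temp/copy_wiki. The helper name buildSpaceMap and the duplicate-name warning are illustrative additions, not part of our script. Note that the directory name is used verbatim as the space name, so its capitalization matters.

```groovy
import groovy.io.FileType
import java.nio.file.Files

// Hypothetical helper: page file name -> "Space.Page", taking the space from
// the directory each page was sorted into. Directories themselves are skipped.
// A page name found in two spaces triggers a warning, since the later entry
// would silently overwrite the earlier one.
Map<String, String> buildSpaceMap(File root) {
    Map<String, String> spaces = [:]
    root.eachFileRecurse(FileType.FILES) { File f ->
        String qualified = f.parentFile.name + "." + f.name
        if (spaces.containsKey(f.name)) {
            println "Warning: ${f.name} is in both ${spaces[f.name]} and ${qualified}"
        }
        spaces[f.name] = qualified
    }
    return spaces
}

// Throwaway tree standing in for C:/temp/copy_wiki
File root = Files.createTempDirectory("copy_wiki").toFile()
new File(root, "Development").mkdirs()
new File(root, "Development/Development").text = "Links to MySpacePlaceholder.WebHome"
new File(root, "Main").mkdirs()
new File(root, "Main/WebHome").text = "Links to MySpacePlaceholder.Development"

def fileSpaces = buildSpaceMap(root)
println fileSpaces["WebHome"]   // prints: Main.WebHome
```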

We need this map in the main import loop:

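[...]
      fileAsText = f.text

      // Rewrite every "MySpacePlaceholder.Page" link to its real "Space.Page".
      // Longest page names go first so that a page named "Dev" cannot clobber
      // links to "Development"; replace() does a literal search, whereas
      // replaceAll() would treat "." as a regex wildcard.
      fileSpaces.sort { -it.key.length() }.each { pageName, newName ->
        fileAsText = fileAsText.replace( "MySpacePlaceholder." + pageName, newName )
      }
[...]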

This replaces every "MySpacePlaceholder.Page" with "Space.Page", where "Space"
is the correct space name. It could be optimized: it runs one search-and-replace
per known page on every imported page, i.e. n^2 passes for n pages. For a
one-time import this shouldn't matter much, and it wasn't a problem with our few
hundred pages.
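One way to avoid the n^2 passes is to scan each page once: match every placeholder link with a single regular expression and look the page name up in the map. This is a sketch of that alternative, not what we ran. The helper name resolveLinks is hypothetical, and it assumes page names consist only of word characters (\w+); titles with spaces or punctuation would need a wider pattern. A side benefit is that the greedy match always captures the full page name, so a short name can never hijack a longer one.

```groovy
// Hypothetical helper: rewrites every "MySpacePlaceholder.<Page>" link in a
// single regex pass instead of one replace() per known page.
// Assumes page names consist of word characters only (\w+).
String resolveLinks(String text, Map<String, String> spaces) {
    text.replaceAll(/MySpacePlaceholder\.(\w+)/) { m ->
        // m[0] is the whole match, m[1] the page name; links to pages
        // missing from the map are left untouched
        spaces[m[1]] ?: m[0]
    }
}

def fileSpaces = [WebHome: "Main.WebHome", Development: "Development.Development"]
def page = "See [MySpacePlaceholder.Development] and [MySpacePlaceholder.Unknown]."
println resolveLinks(page, fileSpaces)
// prints: See [Development.Development] and [MySpacePlaceholder.Unknown].
```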

Now we're free to start the import, one directory at a time. The new version
of import.groovy can be downloaded "here".

[Problems]

There are still some unresolved problems in the export/import chain. First, some
pages ended up with garbled pre sections (XWiki's handling of them is somewhat
"unusual" in that it has no closing tags). Second, we don't set page titles
automatically; the first h1 or h2 on each page would have been a natural
candidate. But apart from these problems and some issues with special
characters, the process "worked" for us, and we've since happily moved on with
our XWiki.
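Picking that first h1 or h2 could look roughly like the sketch below. It assumes the page source still contains MediaWiki heading markup ("= Title =" for h1, "== Title ==" for h2); pages already converted to XWiki syntax would need a different pattern. The helper name firstHeading is hypothetical, and how the result would then be stored as the XWiki page title is not shown.

```groovy
// Hypothetical helper: returns the text of the first level-1 or level-2
// heading in MediaWiki markup ("= Title =" or "== Title =="), or null.
// (?!=) rejects deeper headings such as "=== Title ===".
String firstHeading(String wikiText) {
    def m = wikiText =~ '(?m)^(={1,2})(?!=)[ \t]*(.+?)[ \t]*\\1[ \t]*$'
    m.find() ? m.group(2) : null
}

def page = "Intro text\n=== Minor ===\n== Overview ==\n= Main =\n"
println firstHeading(page)   // prints: Overview
```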