User:Quaid/Post-processing wiki2xml results

This page is a random set of notes about what needs to be changed, hopefully with a script, after converting wiki Beats content to XML using:

mw-render -c http://fedoraproject.org/w/ -w docbook \
  Some_wiki_file_name -o Some_wiki_file_name.xml

(The '\' marks a line break added so the command fits on the screen; remove it and enter the command on one line.)
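
When several Beats pages need converting, the command line can be assembled in a small script rather than typed by hand. This is only a sketch: the function name mw_render_command is made up for illustration, and the only options used are the ones shown in the command above.

```python
import shlex


def mw_render_command(page, base_url="http://fedoraproject.org/w/"):
    """Return the mw-render argument list for one wiki page.

    Hypothetical helper; it only reuses the options from the command
    shown in these notes (-c, -w docbook, -o).
    """
    return ["mw-render", "-c", base_url, "-w", "docbook",
            page, "-o", page + ".xml"]


if __name__ == "__main__":
    # Print the command as a single shell line, without the '\' break.
    print(shlex.join(mw_render_command("Some_wiki_file_name")))
```

The returned list can be handed to subprocess.run(cmd, check=True) to run the conversion for each page in turn.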


 * The content renders each page as a stand-alone book. (This is different from previous MoinMoin behavior, which made every page a chapter.) Some content needs to be removed or changed so that the page becomes a chapter.
 * Change the !DOCTYPE to 'chapter'
 * Remove the ?xml-stylesheet call entirely, or the <? remnant
 * Change the actual document from &lt;book&gt; to &lt;chapter&gt;
 * Using XSLT?
 * The hacky way is to chop the &lt;book&gt;&lt;/book&gt; wrapper, convert the &lt;article...&gt;&lt;/article&gt; to &lt;chapter&gt;&lt;/chapter&gt;, and remove the &lt;articleinfo&gt;...&lt;/articleinfo&gt; block entirely
 * Look for random empty containers, such as 'para' and 'literallayout', that likely came from extraneous empty lines
 * Many list elements and titles have a leading or trailing space inherited from something in the wiki
 * Need to figure out what that is and change either mw-render or our markup practices
 * Give each &lt;section&gt; an ID value made of the prefix 'sn-' followed by the contents of the &lt;title&gt;...&lt;/title&gt;, with '_' instead of spaces
 * For relnotes, all sections now have 'section id="sn-"'
 * Turn the admonition output into the equivalent DocBook admonition. Note that we are using only three admonitions, so a specific mapping needs to be made.
 * Run the page through something similar to xmlformat or ... xmllint?
 * Search through the file for each of the markup output types covered in [#Wiki_markup_output_to_XML,_mapped_to_DocBook_XML Wiki markup output to XML, mapped to DocBook XML]; that is, do the following:
 * Search for each instance of 'emphasis' and replace it with the proper DocBook contextual markup
 * Search for each instance of 'code' and 'programlisting' and replace it with the proper DocBook contextual markup
 * Search and replace empty literallayout containers with proper markup
 * Convert inlinemediaobject to proper admonition
 * Make 'ulink' entries recursively single -- &lt;ulink url="" /&gt;
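
Some of the mechanical steps above (the "hacky way" of turning the article into a chapter, dropping empty containers, trimming titles, and adding section IDs) could be scripted roughly as follows. This is a sketch under assumptions, not a tested converter: it assumes the mw-render output is shaped like &lt;book&gt;&lt;article&gt;...&lt;/article&gt;&lt;/book&gt; with no XML namespace, and the names book_to_chapter, section_id and is_empty are invented for illustration. The admonition and contextual-markup replacements are not covered.

```python
import xml.etree.ElementTree as ET

# Containers that tend to show up empty after conversion.
EMPTY_CANDIDATES = {"para", "literallayout"}


def section_id(title_text):
    """Build an ID such as 'sn-Some_title' from a section title."""
    # split() also discards leading, trailing and doubled spaces.
    return "sn-" + "_".join(title_text.split())


def is_empty(elem):
    """True when an element has no children and only whitespace text."""
    return len(elem) == 0 and not (elem.text or "").strip()


def book_to_chapter(xml_text):
    """Return the <article> inside <book> as a cleaned-up <chapter> string."""
    # Note: the parser drops the DOCTYPE and the xml-stylesheet
    # instruction, so a 'chapter' DOCTYPE has to be added back separately.
    root = ET.fromstring(xml_text)
    article = root if root.tag == "article" else root.find("article")
    if article is None:
        raise ValueError("no <article> element found")

    # Chop <book>, rename <article> to <chapter>, remove <articleinfo>.
    article.tag = "chapter"
    article.tail = None
    for info in article.findall("articleinfo"):
        article.remove(info)

    # Remove empty containers (any text following them is dropped too).
    for parent in list(article.iter()):
        for child in list(parent):
            if child.tag in EMPTY_CANDIDATES and is_empty(child):
                parent.remove(child)

    # Trim stray spaces in plain-text titles and set the section IDs.
    for section in article.iter("section"):
        title = section.find("title")
        if title is None:
            continue
        text = "".join(title.itertext())
        if not text.strip():
            continue
        if len(title) == 0:
            title.text = text.strip()
        section.set("id", section_id(text))

    return ET.tostring(article, encoding="unicode")
```

The result could then be passed through xmllint or a similar formatter, as suggested in the list. An XSLT stylesheet would be the cleaner long-term route; the script form is just easier to adjust while the exact mw-render output is still being worked out.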

&lt;section&gt;