User:Quaid/Post-processing wiki2xml results
This page is a random set of notes about what needs to be changed, hopefully with a script, after converting wiki Beats content to XML using:
mw-render -c http://fedoraproject.org/w/ -w docbook \ Some_wiki_file_name -o Some_wiki_file_name.xml
(The '\' marks a line break added so the command fits on the screen; remove it and enter the command on one line.)
- The content renders each page as a stand-alone book. (This is different from previous MoinMoin behavior, which made every page a chapter.) There is content that needs removing or changing for the page to become a chapter.
- Change the !DOCTYPE to 'chapter'
- Remove the ?xml-stylesheet call entirely, or the <? remnant
- Change the actual document from <book> to <chapter>
- Using XSLT?
- The hacky way is to chop the <book></book> wrapper, convert <article...></article> to <chapter></chapter>, and remove the <articleinfo>...</articleinfo> block entirely
- Look for random empty containers, such as 'para' and 'literallayout' that likely came from extraneous empty lines
- Many list elements and titles have a leading or following space inherited from something in the wiki
- Need to figure out what causes that and change either mw-render or our markup practices
- Give each <section> an ID value equal to the contents of the <title>...</title> with '_' instead of spaces, starting with 'sn-'
- for relnotes, all sections now have 'section id="sn-"'
- Turn the admonition output into the equivalent DocBook admonition. Note that we are using only three admonitions, so a specific mapping needs to be made.
- Run the page through something similar to xmlformat or ... xmllint?
- Search through the file for each of the markup output types covered in [#Wiki_markup_output_to_XML,_mapped_to_DocBook_XML Wiki markup output to XML, mapped to DocBook XML]; that is, do the following:
- Search for each instance of 'emphasis' and replace it with the proper DocBook contextual markup
- Search for each instance of 'code' and 'programlisting' and replace it with the proper DocBook contextual markup
- Search and replace empty literallayout containers with proper markup
- Convert inlinemediaobject to proper admonition
- Make 'ulink' entries whose link text just repeats the URL into single empty elements -- <ulink url="" />
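Several of the mechanical steps above (book to chapter, 'sn-' section IDs, empty containers) could be scripted roughly as follows. This is only a sketch using Python's standard xml.etree.ElementTree, not a tested part of the workflow. The function names are made up for illustration, and removing empty 'para' and 'literallayout' elements outright is an assumption; some of them may need replacing with proper markup instead. ElementTree does not keep the !DOCTYPE or the ?xml-stylesheet instruction, so the 'chapter' DOCTYPE would still have to be written out separately.

```python
import xml.etree.ElementTree as ET


def section_id(title):
    """Build an ID such as 'sn-Some_title' from a section title.

    Leading/trailing whitespace is dropped and runs of spaces collapse
    to a single '_' (a choice made for this sketch).
    """
    return "sn-" + "_".join(title.split())


def book_to_chapter(book):
    """Return the <article> inside <book>, renamed to <chapter>, minus <articleinfo>."""
    article = book.find("article")
    if article is None:
        raise ValueError("expected <book> to contain an <article>")
    article.tag = "chapter"
    for info in article.findall("articleinfo"):
        article.remove(info)
    return article


def add_section_ids(root):
    """Give every <section> an id derived from the text of its <title>."""
    for section in root.iter("section"):
        title = section.find("title")
        if title is None:
            continue
        text = "".join(title.itertext())
        if text.strip():
            section.set("id", section_id(text))


def remove_empty(root, tags=("para", "literallayout")):
    """Drop childless, whitespace-only elements, keeping any text that follows them."""
    # Reverse document order so children are cleaned before their parents are checked.
    for parent in reversed(list(root.iter())):
        prev = None
        for child in list(parent):
            empty = (child.tag in tags and len(child) == 0
                     and not (child.text or "").strip())
            if not empty:
                prev = child
                continue
            # Text after the removed element lives in its tail; reattach it.
            tail = child.tail or ""
            if prev is None:
                parent.text = (parent.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            parent.remove(child)


if __name__ == "__main__":
    # Tiny made-up input, not real mw-render output.
    demo = ET.fromstring(
        "<book><article><articleinfo><title>t</title></articleinfo>"
        "<section><title> Some title </title><para> </para>"
        "<para>Text</para></section></article></book>"
    )
    chapter = book_to_chapter(demo)
    add_section_ids(chapter)
    remove_empty(chapter)
    print(ET.tostring(chapter, encoding="unicode"))
```

The result could then be run through xmllint or a similar formatter, as noted in the list.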
<para>
  <para>
    <para>
      <inlinemediaobject>
        <imageobject>
          <imagedata contentwidth="35px" fileref="http://fedoraproject.org/w/uploads/a/a4/Idea.png" scalefit="1" width="35px" />
        </imageobject>
        <caption>
          <para />
        </caption>
      </inlinemediaobject>
    </para>
    <para>
      <emphasis> Visit <ulink url="http://docs.fedoraproject.org/release-notes/">http://docs.fedoraproject.org/release-notes/</ulink> to view the latest release notes for Fedora, especially if you are upgrading.</emphasis><literallayout> </literallayout>If you are migrating from a release of Fedora older than the immediately previous one, you should refer to older Release Notes for additional information. You can find older Release Notes at <ulink url="http://docs.fedoraproject.org/release-notes/.">http://docs.fedoraproject.org/release-notes/.</ulink>
    </para>
  </para>
</para>
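For output shaped like the sample above, the admonition and 'ulink' steps might be scripted along these lines. This is a rough sketch, not a finished converter. The icon-to-admonition table is hypothetical: only Idea.png appears in the sample, mapping it to 'tip' is a guess, and the real three-admonition mapping still has to be decided. The function names are likewise invented for illustration. It also assumes a 'ulink' should be emptied only when its text is identical to its url attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from wiki icon file name to DocBook admonition element.
# Only Idea.png is seen in the sample; 'tip' is an assumption.
ICON_TO_ADMONITION = {"Idea.png": "tip"}


def icon_name(para):
    """Return the icon file name if para holds only an inlinemediaobject, else None."""
    if para.tag != "para" or len(para) != 1 or (para.text or "").strip():
        return None
    media = para[0]
    if media.tag != "inlinemediaobject" or (media.tail or "").strip():
        return None
    data = media.find("imageobject/imagedata")
    if data is None:
        return None
    # Keep only the last path component of the fileref URL.
    return data.get("fileref", "").rsplit("/", 1)[-1]


def convert_admonitions(root):
    """Turn <para><para>icon</para><para>body</para>...</para> into an admonition."""
    # Snapshot the list first, because the tree is modified while looping.
    for wrapper in list(root.iter("para")):
        if len(wrapper) < 2 or (wrapper.text or "").strip():
            continue
        kind = ICON_TO_ADMONITION.get(icon_name(wrapper[0]))
        if kind is None:
            continue
        wrapper.remove(wrapper[0])  # drop the icon-only para
        wrapper.tag = kind          # remaining paras become the admonition body


def collapse_ulinks(root):
    """Empty any <ulink> whose text just repeats its url attribute."""
    for link in root.iter("ulink"):
        if len(link) == 0 and (link.text or "").strip() == link.get("url"):
            link.text = None  # serializes as <ulink url="..." />
```

Calling convert_admonitions(root) and then collapse_ulinks(root) on the parsed sample leaves the outermost 'para' wrapper in place around the new admonition; unwrapping it would be a separate step.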