User:Quaid/Post-processing wiki2xml results

From FedoraProject

m (User:Kwade/Post-processing wiki2xml results moved to User:Quaid/Post-processing wiki2xml results: Consolidating user accounts to just 'quaid' means it's time to be only quaid on the wiki.)

Revision as of 22:58, 17 February 2009

This page is an informal set of notes about what needs to be changed, ideally with a script, after converting wiki Beats content to XML using:

mw-render -c -w docbook \
Some_wiki_file_name -o Some_wiki_file_name.xml

(The '\' marks a line break added so the command fits on the screen; remove it and run the command as one line.)

  • The content renders each page as a stand-alone book. (This is different from previous Moin Moin behavior, which made every page a chapter.) Some content needs removing or changing so that each page becomes a chapter.
    • Change the !DOCTYPE to 'chapter'
    • Remove the ?xml-stylesheet call entirely, or the <? remnant
    • Change the actual document from <book> to <chapter>
      • Using XSLT?
      • The hacky way is to chop the <book></book> wrapper, convert <article...></article> to <chapter></chapter>, and remove the <articleinfo>...</articleinfo> block entirely
    • Look for stray empty containers, such as 'para' and 'literallayout', that likely came from extraneous empty lines
    • Many list elements and titles have a leading or following space inherited from something in the wiki
      • Need to figure out what that is and change either mw-render or our markup practices
    • Give each <section> an ID value equal to the contents of the <title>...</title> with '_' instead of spaces, starting with 'sn-'
      • for relnotes, all sections now have 'section id="sn-"'
    • Turn the admonition output into the equivalent DocBook admonition. Note that we are using only three admonitions, so a specific mapping needs to be made.[1]
    • Run the page through something similar to xmlformat or ... xmllint?
  • Search through the file for each of the markup output types covered in the section "Wiki markup output to XML, mapped to DocBook XML"; that is, do the following:
    • Search for each instance of 'emphasis' and replace it with the proper DocBook contextual markup
    • Search for each instance of 'code' and 'programlisting' and replace it with the proper DocBook contextual markup
    • Search and replace empty literallayout containers with proper markup
    • Convert inlinemediaobject to proper admonition
    • Make 'ulink' entries recursively single -- <ulink url="" />
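
The structural steps above (DOCTYPE, stylesheet call, book to chapter, empty containers, section IDs) could be scripted roughly as follows. This is only a sketch in Python using the standard library, not a tested part of the toolchain. The function names and the SAMPLE input are made up for illustration, and the real mw-render output may differ in shape. It takes the "hacky" route rather than XSLT and does not touch admonitions, since that mapping still has to be decided.

```python
import re
import xml.etree.ElementTree as ET

# Made-up input for illustration; real mw-render output may look different.
SAMPLE = """<?xml-stylesheet href="docbook.css" type="text/css"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<book><article><articleinfo><title>Some page</title></articleinfo>
<section><title> Release notes </title><para>First line.<literallayout>
</literallayout> Second line.</para><para /></section></article></book>"""


def fix_prolog(text):
    """Drop the xml-stylesheet call and point the DOCTYPE at 'chapter'."""
    text = re.sub(r"<\?xml-stylesheet.*?\?>\s*", "", text, flags=re.DOTALL)
    return re.sub(r"<!DOCTYPE\s+\w+", "<!DOCTYPE chapter", text, count=1)


def book_to_chapter(root):
    """Chop <book>, turn <article> into <chapter>, drop <articleinfo>."""
    article = root if root.tag == "article" else root.find("article")
    if article is None:
        raise ValueError("no <article> element found")
    chapter = ET.Element("chapter", article.attrib)
    chapter.text = article.text
    for child in article:
        if child.tag != "articleinfo":
            chapter.append(child)
    return chapter


def remove_empty(root, tags=("para", "literallayout")):
    """Remove childless, whitespace-only containers, keeping trailing text."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag not in tags or len(child) or (child.text or "").strip():
                continue
            if child.tail:
                # Re-attach the text that followed the removed element.
                index = list(parent).index(child)
                if index == 0:
                    parent.text = (parent.text or "") + child.tail
                else:
                    previous = parent[index - 1]
                    previous.tail = (previous.tail or "") + child.tail
            parent.remove(child)


def add_section_ids(root):
    """Set id='sn-' plus the title text, whitespace runs replaced by '_'."""
    for section in root.iter("section"):
        title = section.find("title")
        if title is None:
            continue
        text = "".join(title.itertext()).strip()
        if text:
            section.set("id", "sn-" + re.sub(r"\s+", "_", text))


def postprocess(text):
    """Run all steps and return the chapter as a string."""
    text = fix_prolog(text)
    # ElementTree discards the DOCTYPE on parsing, so carry it over by hand.
    match = re.search(r"<!DOCTYPE[^>]*>", text)
    doctype = match.group(0) + "\n" if match else ""
    chapter = book_to_chapter(ET.fromstring(text))
    remove_empty(chapter)
    add_section_ids(chapter)
    return doctype + ET.tostring(chapter, encoding="unicode")


if __name__ == "__main__":
    print(postprocess(SAMPLE))
```

Two caveats with this sketch: titles containing characters that are not legal in an XML ID (a comma, for example) would need extra handling, and the result would still benefit from a pass through xmllint or xmlformat.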

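For the search steps, a quick count of the elements that need manual review can show how much work a converted page still needs. The following is a rough Python sketch; the name review_counts and the SAMPLE string are hypothetical, and the tag list simply mirrors the elements named in the list above.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Elements the notes say need a manual search-and-replace pass.
REVIEW_TAGS = ("emphasis", "code", "programlisting",
               "literallayout", "inlinemediaobject", "ulink")

# Made-up input for illustration only.
SAMPLE = ("<chapter><para><emphasis>Visit</emphasis> "
          "<ulink url=''>x</ulink> <code>ls</code></para>"
          "<literallayout>\n</literallayout></chapter>")


def review_counts(xml_text):
    """Return how often each element in REVIEW_TAGS occurs in the document."""
    root = ET.fromstring(xml_text)
    counts = Counter(el.tag for el in root.iter() if el.tag in REVIEW_TAGS)
    return {tag: counts.get(tag, 0) for tag in REVIEW_TAGS}


if __name__ == "__main__":
    for tag, count in review_counts(SAMPLE).items():
        print(f"{tag}: {count}")
```

The counts only say where to look; choosing the proper DocBook contextual markup for each hit remains an editorial decision.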


  1.         <para>
                      <imagedata contentwidth="35px" fileref="\
    4/Idea.png" scalefit="1" width="35px" />
                      <para />
                  <emphasis> Visit <ulink url="">http://\</ulink> to view the latest release notes for Fedora, especially if you are upgrading.</emphasis><literallayout>
    </literallayout>If you are migrating from a release of Fedora older than the immediately previous one, you should refer to older Release Notes for additional information. You can find older Release Notes at <ulink url="">http://docs.fedoraproj\</ulink>