User:Quaid/Post-processing wiki2xml results

Revision as of 22:58, 17 February 2009 by Kwade (Talk | contribs)


This page is a random set of notes about what needs to be changed, hopefully with a script, after converting wiki Beats content to XML using:

mw-render -c http://fedoraproject.org/w/ -w docbook \
Some_wiki_file_name -o Some_wiki_file_name.xml

(The '\' marks a line break added so the command fits on the screen; remove it and run the command as a single line.)
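For more than one page, the one-line command could be assembled by a small Python wrapper. This is only a sketch: the flags are exactly the ones shown above, but the function names and the idea of looping over a list of page names are assumptions, not existing tooling.

```python
import subprocess

# Wiki base URL taken from the command above.
WIKI_CONF = "http://fedoraproject.org/w/"


def mw_render_argv(page_name, conf=WIKI_CONF):
    """Build the mw-render command as one argument list (no line break needed)."""
    return [
        "mw-render",
        "-c", conf,
        "-w", "docbook",
        page_name,
        "-o", page_name + ".xml",
    ]


def render_pages(page_names):
    """Run mw-render once per page; stops at the first failing page."""
    for name in page_names:
        subprocess.run(mw_render_argv(name), check=True)
```

Calling render_pages(["Some_wiki_file_name"]) would run the same command as above, provided mw-render is on the PATH.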

  • The content renders each page as a stand-alone book. (This is different from the previous Moin Moin behavior, which made every page a chapter.) Some content needs removing or changing so that the page becomes a chapter.
    • Change the !DOCTYPE to 'chapter'
    • Remove the ?xml-stylesheet call entirely, or the <? remnant
    • Change the actual document from <book> to <chapter>
      • Using XSLT?
      • The hacky way is to chop the <book></book> wrapper, convert the <article...></article> to <chapter></chapter>, and remove the <articleinfo>...</articleinfo> block entirely
    • Look for random empty containers, such as 'para' and 'literallayout' that likely came from extraneous empty lines
    • Many list elements and titles have a leading or following space inherited from something in the wiki
      • Need to figure out what that is and change the mw-render or our markup practices
    • Give each <section> an ID value equal to the contents of the <title>...</title> with '_' instead of spaces, starting with 'sn-'
      • for relnotes, all sections now have 'section id="sn-"'
    • Turn the admonition output into the equivalent DocBook admonition. Note that we are using only three admonitions, so a specific mapping needs to be made.[1]
    • Run the page through something similar to xmlformat or ... xmllint?
  • Search through the file for each of the markup output types covered in "Wiki markup output to XML, mapped to DocBook XML"; that is, do the following:
    • Search for each instance of 'emphasis' and replace it with the proper DocBook contextual markup
    • Search for each instance of 'code' and 'programlisting' and replace it with the proper DocBook contextual markup
    • Search and replace empty literallayout containers with proper markup
    • Convert inlinemediaobject to proper admonition
    • Make 'ulink' entries recursively single -- <ulink url="" />
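The book-to-chapter step and the empty-container cleanup from the list above might be scripted roughly as follows. This is a sketch, not a finished tool: it assumes the output has the <book><article><articleinfo> nesting described above, and the function names are made up for illustration. Python's ElementTree drops the DOCTYPE and the xml-stylesheet processing instruction when parsing, so a 'chapter' DOCTYPE still has to be written back separately. Whether an empty 'literallayout' should be deleted or replaced with other markup is an editorial decision; here it is simply removed, keeping any text that followed it.

```python
import xml.etree.ElementTree as ET

# Containers that tend to show up empty after conversion (from the notes above).
EMPTY_CANDIDATES = ("para", "literallayout")


def book_to_chapter(xml_text):
    """Return the <article> element renamed to <chapter>, without <articleinfo>."""
    root = ET.fromstring(xml_text)
    article = root if root.tag == "article" else root.find("article")
    if article is None:
        raise ValueError("no <article> element found")
    article.tag = "chapter"
    for info in article.findall("articleinfo"):
        article.remove(info)
    return article


def _is_empty(elem):
    """True if the element has no children, no attributes, and only whitespace."""
    return len(elem) == 0 and not elem.attrib and not (elem.text or "").strip()


def _remove_keep_tail(parent, child):
    """Remove child but keep the text that followed it in the parent."""
    index = list(parent).index(child)
    tail = child.tail or ""
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def prune_empty(parent, tags=EMPTY_CANDIDATES):
    """Depth-first removal of empty containers, so nested empties collapse too."""
    for child in list(parent):
        prune_empty(child, tags)
        if child.tag in tags and _is_empty(child):
            _remove_keep_tail(parent, child)
```

Typical use would be chapter = book_to_chapter(text), then prune_empty(chapter), then ET.tostring(chapter, encoding="unicode") to get the result back as a string.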

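The section ID rule from the list above ('sn-' plus the title with '_' in place of spaces) could be applied along these lines. Again a sketch under assumptions: the function names are hypothetical, and stripping leading and trailing whitespace from the title before building the ID is a choice made here to work around the stray spaces mentioned above, not something mw-render does.

```python
import xml.etree.ElementTree as ET


def section_id(title_text):
    """Build an ID such as 'sn-Release_Notes' from a section title."""
    # split() trims the stray leading/trailing spaces and collapses runs of
    # whitespace, so each gap between words becomes a single underscore.
    return "sn-" + "_".join(title_text.split())


def add_section_ids(root):
    """Set id="sn-..." on every <section> that has a non-empty <title>."""
    for section in root.iter("section"):
        title = section.find("title")
        if title is None:
            continue
        text = "".join(title.itertext())
        if text.strip():
            section.set("id", section_id(text))
```

Titles containing characters that are not valid in an XML ID (for example '/' or ':') would need extra handling that this sketch does not attempt.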

<section>

Notes

  1.         <para>
              <para>
                <para>
                  <inlinemediaobject>
                    <imageobject>
                      <imagedata contentwidth="35px" fileref="http://fedoraproject.org/w/uploads/a/a\
    4/Idea.png" scalefit="1" width="35px" />
                    </imageobject><caption>
                      <para />
                    </caption>
                  </inlinemediaobject>
                </para><para>
                  <emphasis> Visit <ulink url="http://docs.fedoraproject.org/release-notes/">http://\
    docs.fedoraproject.org/release-notes/</ulink> to view the latest release notes for Fedora, espec\
    ially if you are upgrading.</emphasis><literallayout>
    </literallayout>If you are migrating from a release of Fedora older than the immediately previou\
    s one, you should refer to older Release Notes for additional information. You can find older Re\
    lease Notes at <ulink url="http://docs.fedoraproject.org/release-notes/.">http://docs.fedoraproj\
    ect.org/release-notes/.</ulink>
                </para>
              </para>
            </para>