Archive:MoinDocBookProject/ProgressReports

I'll try to document what I've done, how and why. Sort of like a development diary or a blog. Hopefully I'll have the energy to keep this up to date. The newest entries are at the bottom of the page.

= 26th of May to 5th of June =

Unit tests
I've been trying to write some unit tests. I have three tests, and an idea of how to create them more efficiently than my current approach of comparing strings, or walking the tree and checking that the elements come in the correct order. One problem I was having is that I'd like to also test the parser and not just the formatter: if the parser changes and generates different requests to the formatter, my tests won't fail, but real-world pages will break. I got a suggestion on how to fix this, but I haven't done it yet.

Another thing I'd like to do is run the docbook output through a validating XML parser, to check that it is valid (as I currently do manually with xmllint).

I'll revisit these issues later.

Macros
I've decided to use a whitelist and a blacklist for dealing with macros. If a macro is on the whitelist, it will be handled in some way. If it's on the blacklist, it will not be handled, and a warning in the form of a comment is written into the docbook file. If the macro is on neither list, an attempt is made to call it and a warning is written to the docbook. If the macro fails (tries to do things behind the parser's back), it will just generate another warning, saying that this macro should be blacklisted.
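A minimal sketch of that dispatch logic, in Python since that is what moin is written in. The names `render_macro` and `xml_comment`, and passing the macro in as a zero-argument callable, are hypothetical illustrations, not the formatter's actual code:

```python
def xml_comment(text):
    """Wrap text in an XML comment; '--' is not allowed inside comments."""
    while "--" in text:
        text = text.replace("--", "- -")
    return "<!-- %s -->" % text


def render_macro(name, call, whitelist, blacklist):
    """Return docbook output for one macro call.

    call: zero-argument callable that runs the macro and returns its output.
    """
    if name in blacklist:
        # Known troublemaker: never run it, just leave a note in the output.
        return xml_comment("Warning: macro %s is blacklisted and was not rendered" % name)
    if name in whitelist:
        return call()
    # On neither list: try it anyway, but warn that the output may be invalid.
    warning = xml_comment("Warning: macro %s is not whitelisted, output may be invalid" % name)
    try:
        return warning + call()
    except Exception as exc:
        return warning + xml_comment(
            "Warning: macro %s failed (%s), consider blacklisting it" % (name, exc))
```

Running unknown macros inside a `try` keeps one misbehaving macro from aborting the whole page render.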

The Mail Smilies macro used on the SyntaxReference page is the first blacklisted macro, as it tries to write output without using the formatter.

If you have a macro you'd like me to support, please add it to the MoinDocBookProject/Bugs page.

Doctype and encoding
I've covered this problem earlier; suffice it to say that it's now fixed, and I have a unit test to make sure it doesn't pop up again.

= 3rd of June =

As the deadline for step 1 is approaching quickly, it seems I'm constantly finding new bugs and issues with the current formatter.

I'm using  to make sure the output is valid. The original code to generate tables was complex, and even after hacking on it, I couldn't get it to create valid docbook. I've been working on this for a couple of days now, and I ended up removing the old code completely and rewriting it from scratch, switching to a different docbook table structure and moving it into a separate class. I'm pretty happy with the outcome, and the formatter now creates valid tables for all of my testcases. It supports cells spanning multiple columns and/or multiple rows. Alignment of cell contents works both horizontally and vertically, and it supports both types of table-layout arguments present in moinmoin (<)> and ). What it doesn't support is widths, colors, and any other argument you can pass through . I feel that the table code is now in pretty good shape, and getting it this far has taken a lot more work than I anticipated.

Along the way I also found an issue with yelp (the GNOME help viewer, which I use to view docbooks): it doesn't respect the default value of cell borders (which is "1" for both row borders and column borders), and it ignores them when I specify them in tbody, although it should honor them there. I ended up specifying them for each table cell separately, which I shouldn't need to do.
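To illustrate the kind of CALS table markup involved, here is a sketch that builds a DocBook `informaltable` with column spans (`namest`/`nameend`), row spans (`morerows`), alignment, and `rowsep`/`colsep` repeated on every entry as in the yelp workaround. The `build_table` function and its cell-dict input format are assumptions for illustration, not the formatter's real table class:

```python
import xml.etree.ElementTree as ET


def build_table(rows, cols):
    """Build a CALS-style docbook <informaltable> as a string.

    rows: list of rows, each a list of cell dicts with key "text" and
    optional "colspan", "rowspan", "align" and "valign".
    """
    table = ET.Element("informaltable")
    tgroup = ET.SubElement(table, "tgroup", cols=str(cols))
    for i in range(1, cols + 1):
        ET.SubElement(tgroup, "colspec", colname="c%d" % i)
    tbody = ET.SubElement(tgroup, "tbody")
    covered = set()  # (row, col) slots already covered by a rowspan from above
    for r, cells in enumerate(rows):
        row = ET.SubElement(tbody, "row")
        col = 1
        for cell in cells:
            while (r, col) in covered:  # skip slots taken by earlier rowspans
                col += 1
            # Borders set on every entry, for viewers that ignore the defaults.
            entry = ET.SubElement(row, "entry", rowsep="1", colsep="1")
            colspan = cell.get("colspan", 1)
            rowspan = cell.get("rowspan", 1)
            if colspan > 1:
                # Horizontal span: name the first and the last column.
                entry.set("namest", "c%d" % col)
                entry.set("nameend", "c%d" % (col + colspan - 1))
            else:
                entry.set("colname", "c%d" % col)
            if rowspan > 1:
                # Vertical span: morerows counts the *extra* rows.
                entry.set("morerows", str(rowspan - 1))
                for rr in range(r + 1, r + rowspan):
                    for cc in range(col, col + colspan):
                        covered.add((rr, cc))
            for attr in ("align", "valign"):
                if attr in cell:
                    entry.set(attr, cell[attr])
            entry.text = cell["text"]
            col += colspan
    return ET.tostring(table, encoding="unicode")
```

Giving every entry an explicit `colname` or `namest` means a cell below a row span lands in the right column without relying on the viewer's placement rules.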

I also found a new bug: I first create a bullet list and a listitem, just as I should, but then the following entries come out as glosslist items. I'll need to research it further to figure out exactly when this happens.

What I plan to do before Monday is fix the bug mentioned above and figure out how to generate complete URLs for links. I'm guessing the current link generation is a bit buggy, so this will probably take what's left of this weekend. I was hoping to get more unit tests written, but at least I already have a few (3).

= 4th of June =

Yesterday, as I was finishing up, I noticed that a code area with syntax highlighting got mapped as a instead of a. I made the fix, but ended up changing the code in small ways. As I tested my change I noticed a bug: the formatter would eat up any text not being highlighted, including spaces, commas, parentheses etc. What slowed my bug hunt considerably was my assumption that the bug had been caused by my changes, when in reality it was also present in the old code. This morning I finally found the cause and fixed it, so syntax highlighting of program listings should now work, as well as any other kind of listing inside a { { { Code here } } } section.
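The "eaten text" symptom is typical of a highlighter that only emits the matched tokens. A minimal sketch of the fix, assuming a toy keyword-only tokenizer and an `<emphasis role="keyword">` wrapper (both hypothetical, not moin's colorizer or the formatter's actual markup): always emit the gap between matches, plus the trailing text after the last match.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical token pattern: a real highlighter has many token classes.
KEYWORDS = re.compile(r"\b(?:def|return|if|else|for|in)\b")


def highlight(source):
    """Wrap keywords in markup while keeping every other character."""
    out = []
    pos = 0
    for m in KEYWORDS.finditer(source):
        # Text between the previous match and this one (spaces, commas,
        # parentheses...). Skipping this step is what eats the text.
        out.append(escape(source[pos:m.start()]))
        out.append('<emphasis role="keyword">%s</emphasis>' % escape(m.group()))
        pos = m.end()
    out.append(escape(source[pos:]))  # text after the last match
    return "".join(out)
```

A cheap regression check is that stripping the tags from the output gives back the original source.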

I added support for listitems in a list without bullets/numbering. In wiki syntax, such listitems are preceded by a dot instead of the regular asterisk or number.

That bug with glossaries and lists turned out to be a swamp of problems. The way the glossary functions are called, more or less at random, made it impossible to generate semantically sensible glossaries. The old code worked fine and created valid docbook, but on closer inspection, half of the term or definition tags were empty. I spent quite a few hours sorting it out, trying different approaches, until I finally got it more or less right. The issue with lists was a minor one: currently, when we are in a glossary list and a listitem should be added (easy to trigger accidentally with some whitespace at the beginning of the line), the glossary list isn't closed and an itemized list isn't started. I decided to redirect that call to listitem to glossarydef, which should be fine, and hopefully the right thing for most cases.
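The redirect can be pictured as a small piece of list state. This is a hypothetical stand-in (the class and method names are invented here, and text is assumed to be already escaped), not the formatter's real glossary code: a stray listitem inside an open glosslist becomes a `glossdef`, and each new term closes the previous `glossentry`.

```python
class GlossaryListSketch:
    """Hypothetical stand-in for the formatter's list/glossary state."""

    def __init__(self):
        self.stack = []          # open list elements, innermost last
        self.entry_open = False  # currently inside a <glossentry>?
        self.out = []

    def start_list(self, kind):
        self.stack.append(kind)
        self.out.append("<%s>" % kind)

    def end_list(self):
        kind = self.stack.pop()
        if kind == "glosslist":
            self._close_entry()
        self.out.append("</%s>" % kind)

    def _close_entry(self):
        if self.entry_open:
            self.out.append("</glossentry>")
            self.entry_open = False

    def term(self, text):
        self._close_entry()  # a new term starts a new glossentry
        self.out.append("<glossentry><glossterm>%s</glossterm>" % text)
        self.entry_open = True

    def definition(self, text):
        self.out.append("<glossdef><para>%s</para></glossdef>" % text)

    def listitem(self, text):
        if self.stack and self.stack[-1] == "glosslist":
            # Redirect a stray listitem to a definition instead of closing
            # the glosslist and opening an itemizedlist.
            self.definition(text)
        else:
            self.out.append("<listitem><para>%s</para></listitem>" % text)
```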

I still have the last open bug against the formatter left, the one about generating links with absolute paths. But now I need a break.

= 5th of June =

The week is starting, and I won't be able to get as much done now as I did this past weekend. I'm a bit nervous about how much energy I will have for this project after first coding 8h at work. Oh well, I guess I'm about to find out.

I've been thinking about the issue with links. Converting all relative links to complete links isn't necessarily what we want. I'd like relative links to be converted into inter-docbook links, so that if an author wants to point to an actual URL, he'll need to write the whole URL, while if he wants to point to another wiki page which will also get converted to docbook, he can use a relative URL. Or maybe this should be a configuration option? Another way would be to use the interwiki linking style, and have it trigger the creation of an inter-docbook link when rendered to docbook by moin, while generating a link to the wiki when viewed online. I'll need to think about linking a bit more.
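One of the options above, sketched under the assumption that "inter-docbook link" means DocBook's `<olink>` (targetdoc names the other document) and that anything with a URL scheme is an external `<ulink>`. This is only one possible design, not a decision; `docbook_link` is a hypothetical helper:

```python
from urllib.parse import urlsplit
from xml.sax.saxutils import escape, quoteattr


def docbook_link(target, text):
    """Absolute URLs become <ulink>; bare wiki page names become <olink>."""
    if urlsplit(target).scheme:  # "http", "ftp", ... means a real URL
        return "<ulink url=%s>%s</ulink>" % (quoteattr(target), escape(text))
    # No scheme: treat it as another wiki page that is also converted.
    return "<olink targetdoc=%s>%s</olink>" % (quoteattr(target), escape(text))
```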

Configuration options seem to be a topic I'll also need to look into, since I want users to be able to whitelist macros without touching the source, only the configuration file.
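A sketch of how a site's wikiconfig could extend a built-in macro whitelist. The option name `docbook_macro_whitelist` and the default set are hypothetical; neither exists in moin, they only illustrate reading an optional setting with a fallback:

```python
# Hypothetical defaults and option name, for illustration only.
DEFAULT_WHITELIST = frozenset(["Include", "TableOfContents"])


def effective_whitelist(cfg):
    """Built-in whitelist plus whatever the site's configuration adds."""
    extra = getattr(cfg, "docbook_macro_whitelist", ())
    return DEFAULT_WHITELIST | frozenset(extra)


class SiteConfig:
    """Stands in for a site's wikiconfig class."""
    docbook_macro_whitelist = ["MyMacro"]
```

Using `getattr` with a default keeps sites that never set the option working unchanged.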

Today is/was the deadline for Step 1. I think I accomplished it well enough. I have a few unit tests written, got the table support up to par, and fixed glossary generation. Those three were really difficult ones. I also fixed a lot of other miscellaneous bugs, and the issue with links is the only remaining issue (with real-world use cases) I know of.

I think the next thing I'll do is create a demo wiki that can be accessed by everyone, and then start hacking on multipage docbook creation.

So I discussed the plan with my mentor, and we agreed to limit the scope of the project, so that the first four steps are going to be considered the goal, while the remaining two have lower priority. He also emphasized that I should work in close contact with the moinmoin devs and that getting everything accepted upstream was to be a priority. From hanging out on moin-dev I learned that the moin SoC:ers have a wicked setup, where their repository commits get integrated into their own test wiki for immediate public testing. I asked if it was ok with Fedora if I asked the moin guys for one for this project, and he said to go for it. The moin devs were very accommodating, and thanks to Thomas Waldmann I now have a branch in the moin mercurial repository and a real live testwiki that's open to the public! Go check it out: http://docbook.wikiwikiweb.de

= 6th of June =

Worked on unit tests. My goal was to get the following working: raw wiki text -> docbook text -> validating parser. This would incorporate the parser into the test, which is a good thing, since a change in how the parser works would alter the formatter's output and possibly make it create invalid docbooks.

However, getting this to work was a bit more problematic than I had originally thought. I got the raw wiki text -> docbook step working by looking at code in other places and by trial and error.

Then I struggled with feeding the text into the XML parser, since I had the output in a string and it wanted a file. Wrapping it in StringIO or the other usual tricks didn't help. The solution would have been to override the input_source_generator (or something like that) to return the StringIO, but that would break fetching of the DTD. In the end I resorted to saving it to a temporary file and was getting eager to finally see everything work automagically. Unfortunately the last part, validating the generated docbook, turned out to be impossible: the xmlval parser, which is the validating parser, would choke while reading in the DTD.

Now my code just calls the command-line program, feeding the string to it and checking whether it reports errors. It's simple and works like a charm, as long as you have xmllint installed. But it's not all Python anymore.
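A minimal sketch of that approach, assuming Python 3's `subprocess.run`. xmllint reads the document from stdin when given `-`, `--valid` validates against the DTD named in the DOCTYPE, and `--noout` suppresses the echoed document. The helper name `xmllint_errors` is hypothetical:

```python
import subprocess


def xmllint_errors(docbook_text):
    """Validate a document with xmllint; return its error output ('' if valid).

    Assumes xmllint is on PATH and the DTD in the DOCTYPE can be resolved
    (via the network or a local XML catalog).
    """
    result = subprocess.run(
        ["xmllint", "--noout", "--valid", "-"],  # "-" = read from stdin
        input=docbook_text,
        capture_output=True,
        text=True,
    )
    return result.stderr if result.returncode != 0 else ""
```

In a unit test this becomes something like `self.assertEqual(xmllint_errors(output), "")`, so the failure message is xmllint's own report.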

With that part done, I wrote some files in wiki syntax to feed the tester with. These files are placed in MoinMoin/_tests/testdata, and the tester automatically reads in all files from that directory which end in, making creating testcases simple.
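The discovery step could look roughly like this. The file extension used in the project isn't given here, so `suffix` is a parameter, and `collect_testcases` is a hypothetical name:

```python
from pathlib import Path


def collect_testcases(directory, suffix):
    """Return (filename, wiki_text) pairs for every file ending in `suffix`.

    Sorted by name so the tests run in a stable order.
    """
    return [(p.name, p.read_text(encoding="utf-8"))
            for p in sorted(Path(directory).iterdir())
            if p.is_file() and p.name.endswith(suffix)]
```

Adding a testcase is then just dropping a new file into the directory; each pair goes through wiki text -> docbook -> xmllint.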

I think I'm quite happy with the number of tests I have now, and it'll take considerably less time for me to add new ones. There's one thing I'd still like to test: say some element is not allowed to be empty or contain a text node, but can have a para element; I'd like to be able to test whether that para element is empty. An empty para doesn't make the document invalid, but semantically a glossentry without both a term and at least one definition is worthless. This is just fine-tuning though, and I won't pour resources into it.
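Such a semantic check can sit next to the validity check. A sketch with ElementTree, where `hollow_glossentries` is a hypothetical helper that flags any glossentry lacking a non-empty glossterm or at least one non-empty glossdef:

```python
import xml.etree.ElementTree as ET


def _has_text(elem):
    """True if elem exists and contains some non-whitespace text anywhere."""
    return elem is not None and "".join(elem.itertext()).strip() != ""


def hollow_glossentries(docbook_text):
    """Return the positions of glossentry elements that are semantically empty."""
    root = ET.fromstring(docbook_text)
    bad = []
    for i, entry in enumerate(root.iter("glossentry")):
        term_ok = _has_text(entry.find("glossterm"))
        def_ok = any(_has_text(d) for d in entry.findall("glossdef"))
        if not (term_ok and def_ok):
            bad.append(i)
    return bad
```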

= 7th of June =

I've been spamming #fedora-mentors with ramblings about how to create a docbook out of multiple wiki pages.

The issue is that wiki pages get transformed into docbook articles, which are a lax container, so any page on the wiki can be turned into valid docbook. I was thinking about making each page into a book instead, but then each wiki page would have to start with a short introduction and then have sections. This would make a lot of pages fail to validate, and the article format has worked fine so far.

But what would trigger the creation of a docbook book? I thought about adding handling for cases like "if there's this Docbook Chapter macro on the page, transform this article into a book", but that's not nice at all. Instead I decided to create a new formatter with a corresponding action called Build Book; when that action is used, the docbook-book formatter is used instead. Originally I thought about having this formatter understand nothing except the special Docbook Chapter macro and ignore everything else, but for maximum flexibility I have decided it will work as follows:

The builddocbook formatter is to subclass the regular docbook formatter. It will override the parts needed to create a docbook book instead of an article (startDocument and startContent). It will also implement handling for the Docbook Chapter(pagename) macro, which on other formatters outputs just the name of the page to be included, but with the build-book formatter instead includes the pagename page into this page and wraps it in a chapter element.

This way any page can also be rendered as a docbook book instead of an article. Both can use the regular Include macro to include pieces into the existing page, but chapters are created only with the Docbook Chapter macro.
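The subclassing plan, reduced to a toy. Both classes are hypothetical stand-ins, not moin's formatter API: the real plan overrides startDocument and startContent, which this sketch collapses into a `doctype` attribute, and the macro spelling `DocbookChapter` and the `include_page` callback are assumptions.

```python
class ArticleFormatter:
    """Hypothetical stand-in for the regular docbook formatter."""
    doctype = "article"

    def startDocument(self, title):
        return "<%s><title>%s</title>" % (self.doctype, title)

    def endDocument(self):
        return "</%s>" % self.doctype

    def macro(self, name, arg, include_page):
        if name == "DocbookChapter":
            return arg  # ordinary formatters just print the page name
        return ""


class BookFormatter(ArticleFormatter):
    """Used only by the book-building action."""
    doctype = "book"  # the book root replaces the article root

    def macro(self, name, arg, include_page):
        if name == "DocbookChapter":
            # Pull the named page in and wrap it in a chapter element.
            return "<chapter><title>%s</title>%s</chapter>" % (arg, include_page(arg))
        return super().macro(name, arg, include_page)
```

Because everything else is inherited, a page renders identically in both formatters except for the document root and the chapter macro.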

Now I'll just need to implement it :)

In the end I didn't get much coding done today. I did however manage to break the testwiki in some way. I think it was the result of resyncing with moin trunk, but I'm not sure. Hopefully it will be sorted out soon, as the testwiki was really nice to have.

Another interesting development today was that while talking on IRC I got pointed to an XSLT for docbook->moin conversions. The author (Jeff Schering) had placed it under a CC-SA license, so I needed to contact him to ask whether he was willing to relicense it under the GPL. Not only was he willing to relicense it, he said he was planning to work on and improve it in July! This means I'll take the XSLT route to do the conversion for step 3, and probably also use it for splitting (if I can figure out how). This doesn't mean that step 3 is done: I still need to finish the XSLT so it works with the XML the formatter generates, integrate it into moin, write testcases etc. And I haven't even properly started coding step 2.

= 15th of June =

So, it's been a week since my previous report. I've been quite busy with real life stuff, and compared to the sprint I made at the beginning, progress seems to have slowed down. I'll get more done next weekend.

I've got chapter import working, but I haven't implemented the Build Doc Book". The action would present the user with a simple upload form. Then the action would unzip and process the contents. First it would create a main page for the docbook book, where it would attach all the images and other resources. Then it would extract which chapters the book contains and list them on the page (wrapped in the Include Thing to do.

Other thoughts
I've been fishing for ideas and use cases since I started this project, and a few interesting things have popped up that I want to mention.
 * 1) Doing something automatically when a page has been changed.
 * 2) Support for task and procedure
 * 3) Doing custom postprocessing after the formatter has finished

Ok, so taking these in order:

Item 1 is something that has been requested a lot. Currently moinmoin makes it possible to subscribe to pages, but the people requesting this want something to happen automatically on the server side. I've looked into the code, and it's quite clear where this hook would go. I'd like to do it by checking whether a certain script/executable exists; if it does, it would be launched in a separate process, and the page name, the comment and the trivial flag would be passed as command-line parameters. It would then be the responsibility of whoever writes the actual script to do what they please with the information. Seems like a simple and useful addition to me.
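A sketch of the proposed hook. Where moin would call it and where the script lives are left open in the plan, so the function name and the path argument are hypothetical; the trivial flag is passed as "1" or "0":

```python
import os
import subprocess


def run_page_changed_hook(hook_path, pagename, comment, trivial):
    """Launch hook_path, if it exists and is executable, in a separate process.

    Returns the Popen object, or None when no hook is installed. The wiki
    does not wait for the script, so a slow script doesn't block saving.
    """
    if not (os.path.isfile(hook_path) and os.access(hook_path, os.X_OK)):
        return None
    return subprocess.Popen([hook_path, pagename, comment, "1" if trivial else "0"])
```

Passing arguments as a list (no shell) means page names and comments containing spaces or quotes reach the script unchanged.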

Nr. 2 is more difficult. A task consists of a description and then some listitems with sublists. The fact that it is a procedure etc. is not simple to embed into the wiki syntax, as wiki syntax has no support for conveying semantic information. The only solution that I can think of is writing a special formatter and include macro. The task would be placed on a separate page, and when included with something like [[Include in to &rarr;) should work. They get converted in to unicode characters. I also added support for -styled escapes.
 * New Insert macro
   * faster than the Include macro for the docbook formatter
   * code is a lot cleaner
   * supports from=, to=, items= and skip= options (the Include macro's options), where from and to can be regexes
   * supports into= and title= options, which means that you can insert a page (or part of it) into an element and specify any title, e.g. [[Insert(NameOfSomePage,title="Chapter 3: The beginning", into="chapter")]]
 * Added support to the formatter for chapter, part, colophon and glossary
 * Images will now always have a complete URL (a nice side effect is that Yelp now shows images, since it loads them from the webserver).
 * BuildBook action works (but has no GUI yet):
   * It takes a page and renders it as a docbook book, inserting all chapters etc. into it
   * It then collects all images the docbook refers to
   * It writes a file detailing the mapping between the URLs in the docbook and the filenames as stored
   * Everything is placed in a zip, which is then passed on.
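The packaging half of the BuildBook steps, as a sketch. To stay self-contained it takes the image bytes as an already-fetched dict instead of downloading attachments, and the zip layout (`book.xml`, `images/`, a tab-separated `images.map`) is a hypothetical choice, not the action's actual format:

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from urllib.parse import urlsplit


def build_book_zip(docbook_xml, fetched):
    """Package a rendered docbook book and its images into zip bytes.

    fetched: dict mapping each image URL to its bytes (stand-in for
    downloading the wiki attachments).
    """
    root = ET.fromstring(docbook_xml)
    # Image URLs in document order, without duplicates.
    urls = list(dict.fromkeys(e.get("fileref") for e in root.iter("imagedata")
                              if e.get("fileref")))
    mapping = {}
    for n, url in enumerate(urls):
        name = posixpath.basename(urlsplit(url).path) or "image"
        mapping[url] = "images/%03d_%s" % (n, name)  # prefix avoids name clashes
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("book.xml", docbook_xml)
        for url, filename in mapping.items():
            zf.writestr(filename, fetched[url])
        # The mapping file: one "url<TAB>filename" line per image.
        zf.writestr("images.map", "".join("%s\t%s\n" % pair for pair in mapping.items()))
    return buf.getvalue()
```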

TODO:
 * Push the changes to the repository as nice clean changesets
 * Write a complex wikibook to demonstrate what the formatter can do
 * Write an e-mail to fedora docs when I have a nice testcase.
 * I should check that I'm not bypassing any ACL security checks (I probably am)

= 6th of July =

Well, I pushed the changes and got some feedback (mainly about not always conforming to PEP8). I wrote a complex wikibook; it works nicely and the output is valid. I even wrote the e-mail to fedora docs, though based on the activity on the testwiki, very few people are interested in trying it out with their own testcases. I achieved feature parity with my Insert macro, and it can be called very similarly to how the old Include macro could be called.

I have now moved on from doing moin->docbook, though there are some minor adjustments I will still have to make, since admonitions are going to be a standard part of moin formatters. I also hear people are going to rewrite the reStructuredText parser, which will mean that it will be possible to go directly from rst->docbook :)

The last few days I've been hacking away on XSL transformations. After some braindead whitespace problems, it's in pretty good shape. The code might need some cleaning, but I split it up into multiple files, which helped a lot. I've added support for an unlimited number of sublists in lists, :s can be nested to any depth, glosslists are supported, and table attributes like valign, align, morerows, and entries that span multiple columns (though this requires the colspec to be a certain way) will produce an equivalent wiki table.

= 20th of July =

I've got a set of XSLT files which nicely generate MoinMoin syntax out of a docbook chapter/article. I did run into one larger problem, a case which didn't fit into my conversion architecture very well: docbooks can have lists inside paragraphs. My XSL transformations assumed anything inside a paragraph can be put on a single line in the wiki syntax, which isn't the case here. I did solve it, though the XSLT that resulted isn't very pretty.
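The actual fix lives in XSLT; this Python version with ElementTree only illustrates the idea. Inline content is collected into one wiki line, and every list inside the para forces a line break, with its items emitted as ` * ` or ` 1. ` lines. Inline markup conversion and nested lists inside the items are deliberately left out, and `para_to_wiki` is a hypothetical name:

```python
import xml.etree.ElementTree as ET

BLOCK_LISTS = {"itemizedlist", "orderedlist"}


def para_to_wiki(para):
    """Convert a <para> that may contain lists into a list of wiki lines."""
    lines, current = [], [para.text or ""]
    for child in para:
        if child.tag in BLOCK_LISTS:
            lines.append("".join(current).strip())  # flush text before the list
            marker = " * " if child.tag == "itemizedlist" else " 1. "
            for item in child.findall("listitem"):
                lines.append(marker + "".join(item.itertext()).strip())
            current = [child.tail or ""]  # text after the list starts a new line
        else:
            # Inline element: keep its text (markup dropped in this sketch).
            current.append("".join(child.itertext()) + (child.tail or ""))
    lines.append("".join(current).strip())
    return [line for line in lines if line]
```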

I've been quite busy with real-life stuff like work, family and friends, and I was even mildly ill for a day or two, but as I have basically managed to complete step 3, I don't foresee trouble sticking to the timetable set forth in the roadmap. What still remains to be handled even in simple docbook articles (i.e. in step 3) is xrefs, which is a tricky subject. So much so that I will leave the problem to simmer and move on to step 4, in the hope that it will be possible to handle the xref conversion in Python before the document is split up.
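One way the "resolve xrefs before splitting" idea could work, sketched with ElementTree under loud assumptions: each chapter becomes one wiki page, `page_name_for` is a hypothetical function turning a chapter title into that page name, and resolved xrefs are rewritten into `<ulink url="Page#id">` so the stylesheets can render them as wiki links. Unresolved linkends are returned for reporting.

```python
import xml.etree.ElementTree as ET


def resolve_xrefs(book, page_name_for):
    """Point every <xref linkend="..."/> at the page that will hold its target.

    Returns the linkends that could not be resolved.
    """
    id_to_page = {}
    for chapter in book.iter("chapter"):
        page = page_name_for(chapter.findtext("title", default=""))
        for elem in chapter.iter():  # includes the chapter element itself
            if elem.get("id"):
                id_to_page[elem.get("id")] = page
    unresolved = []
    for xref in list(book.iter("xref")):  # copy the list: we mutate while walking
        target = xref.get("linkend")
        if target in id_to_page:
            xref.tag = "ulink"
            xref.attrib.clear()
            xref.set("url", "%s#%s" % (id_to_page[target], target))
            xref.text = target  # xref has no text of its own
        else:
            unresolved.append(target)
    return unresolved
```

Doing this on the whole book first is what makes it tractable: after splitting, a chapter page no longer knows which page an id ended up on.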

So next up is creating a moinmoin action for uploading a docbook, splitting it up into parts, running each part through the conversion stylesheets, creating a page for each converted part and creating index pages with links to the pages. I'm also toying with the idea of having the moinmoin action detect whether the uploaded file is a zip, and do some intelligent handling of the contents. But first things first: I want to complete step 4, and then I need to look at how to convert xrefs, as they are an extremely common case.
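The splitting step of that plan, sketched with ElementTree: one (page name, chapter XML) pair per top-level chapter, each of which would then go through the conversion stylesheets, plus an index page listing them. The `Book/ChapterTitle` naming scheme and the `[[PageName]]` link syntax are assumptions, and chapters nested inside `part` elements are not handled here:

```python
import xml.etree.ElementTree as ET


def split_book(book_xml, book_page):
    """Split a docbook book into chapter pages plus an index page text."""
    book = ET.fromstring(book_xml)
    pages = []
    for n, chapter in enumerate(book.findall("chapter"), start=1):
        title = chapter.findtext("title", default="Chapter %d" % n).strip()
        pagename = "%s/%s" % (book_page, title)  # hypothetical naming scheme
        pages.append((pagename, ET.tostring(chapter, encoding="unicode")))
    index = "\n".join(" * [[%s]]" % name for name, _ in pages)
    return index, pages
```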