SummerOfCode/2007/AmitUttamchandani

= Working, Elegant DocBook to PDF Solution =


 * Student: AmitUttamchandani
 * Mentor: Not Yet Announced

== Abstract ==
A command-line utility to convert DocBook XML source into PDF. The solution is based on a simple three-pronged approach designed to meet the requirements of the project.

First, Python will be used to build the utility, which takes a valid DocBook XML file as input. Second, the solution will use expat, the standard Python XML parser, which is fairly fast, together with xmllint to validate the XML source. Third, the reportlab Python toolkit will be used to generate the PDF output.

The three-pronged approach described above is simple and straightforward. It takes into account a rapid development time frame as well as the extensibility of the solution. An alternative would be to run an XSLT stylesheet and a preprocessor over the DocBook XML source to generate the PDF output. Future iterations could accept a list of DocBook XML files and convert each one into a PDF.

== Project ==
The solution involves creating a command-line utility to accomplish the task described above. In its simplest form, the utility takes a DocBook XML source file as input. After studying various implementations, I concluded that expat, the standard Python XML parser, is the best choice. The advantages of expat include its parsing speed, its simple Python bindings, and its availability as a standard Python module. Expat, however, does not validate XML files. To validate the XML source, xmllint will be used.
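The parsing step described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `parse_docbook` and `validate_with_xmllint` are hypothetical, only `title` and `para` elements are collected, and the validation helper assumes that the external validator is libxml2's `xmllint` command and that the source file declares its DTD in a DOCTYPE.

```python
import shutil
import subprocess
import xml.parsers.expat


def parse_docbook(xml_text):
    """Return (element_name, text) pairs for <title> and <para> elements.

    Hypothetical helper: a real converter would handle many more elements.
    """
    wanted = {"title", "para"}
    blocks = []  # finished (tag, text) pairs, in document order
    stack = []   # open wanted elements as (tag, [text chunks])

    def start(name, attrs):
        if name in wanted:
            stack.append((name, []))

    def end(name):
        if stack and stack[-1][0] == name:
            tag, chunks = stack.pop()
            text = " ".join("".join(chunks).split())  # normalize whitespace
            if text:
                blocks.append((tag, text))

    def chars(data):
        # Text inside inline children (e.g. <emphasis>) is credited to the
        # enclosing wanted element.
        if stack:
            stack[-1][1].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    # expat checks well-formedness only; it raises ExpatError on bad XML
    # but never validates against a DTD.
    parser.Parse(xml_text, True)
    return blocks


def validate_with_xmllint(path):
    """Validate a file against its declared DTD (assumes xmllint is installed)."""
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint not found on PATH")
    result = subprocess.run(
        ["xmllint", "--noout", "--valid", path],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


SAMPLE = """<article>
  <title>Example Article</title>
  <para>First <emphasis>short</emphasis> paragraph.</para>
  <para>Second paragraph.</para>
</article>"""

if __name__ == "__main__":
    for tag, text in parse_docbook(SAMPLE):
        print(tag, "->", text)
```

Keeping validation in a separate step reflects the split described above: expat stays fast and dependency-free, while the DTD check is delegated to an external tool.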

Third, the reportlab open-source toolkit will be used to output the parsed XML data structure into a PDF file. The reportlab toolkit allows for easy output of Python data structures into a PDF file. Thus, once a DocBook XML source [...] (Until 3rd week of July)

 1. Integrate the reportlab toolkit into docbook2PDF and successfully output the parsed XML object as PDF. (2nd week of August)
 1. Thoroughly test the implementation and make sure it meets the requirements and specifications. Write up documentation on the usage of the docbook2PDF utility. (Complete by end of August)
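The PDF output step could be sketched along these lines. This is only an illustration under stated assumptions: the input is assumed to be a list of `(tag, text)` pairs produced by the parsing stage, the names `blocks_to_story_spec` and `write_pdf` are hypothetical, and the mapping of DocBook elements to reportlab's sample styles (`Title`, `BodyText`) is an arbitrary choice rather than the proposal's design.

```python
from xml.sax.saxutils import escape


def blocks_to_story_spec(blocks):
    """Map (tag, text) pairs to (reportlab style name, text) pairs.

    Unknown tags are skipped. The mapping is an assumption for illustration.
    """
    style_for = {"title": "Title", "para": "BodyText"}
    return [(style_for[tag], text) for tag, text in blocks if tag in style_for]


def write_pdf(blocks, out_path):
    """Render the blocks into a PDF file using reportlab's Platypus layer."""
    # Imported here so the mapping above can be used without reportlab installed.
    from reportlab.lib.pagesizes import letter
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.platypus import Paragraph, SimpleDocTemplate, Spacer

    styles = getSampleStyleSheet()
    story = []
    for style_name, text in blocks_to_story_spec(blocks):
        # Paragraph interprets XML-like markup, so plain text must be escaped.
        story.append(Paragraph(escape(text), styles[style_name]))
        story.append(Spacer(1, 12))  # 12 points of vertical space
    SimpleDocTemplate(out_path, pagesize=letter).build(story)


if __name__ == "__main__":
    demo = [("title", "Example Article"), ("para", "Rates & fees for Q1.")]
    write_pdf(demo, "example.pdf")
```

Separating the element-to-style mapping from the rendering keeps the reportlab-specific code in one place, which should make it easier to support more DocBook elements later.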

== Future Roadmap ==
 1. The utility can be extended to batch-process the DocBook XML sources it finds into PDF files.
 1. A GUI can be added using PyGTK to further extend the functionality of the utility.
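As a rough idea of what the batch-processing extension might involve, the sketch below collects every `.xml` file under a directory and pairs it with a target PDF path. The function name `plan_batch` is hypothetical, and treating every `.xml` file as a DocBook source is a simplifying assumption; a real implementation would likely check the document type first.

```python
from pathlib import Path


def plan_batch(root):
    """Return (source, target) path pairs for every .xml file under root."""
    root = Path(root)
    # rglob searches subdirectories too; sorting gives a stable order.
    return [(src, src.with_suffix(".pdf")) for src in sorted(root.rglob("*.xml"))]


if __name__ == "__main__":
    for src, target in plan_batch("."):
        print(src, "->", target)
```

Each pair could then be handed to the single-file conversion, so the batch mode stays a thin wrapper around the core utility.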

== Biography ==
My name is Amit Uttamchandani and I will be completing my Bachelor's degree in Computer Engineering this summer at California State University, Northridge. Before my current internship, I worked for the Information Systems department at the university. During this time, our group was given the task of taking an inventory of all the computers and peripherals, such as printers and scanners, in the Engineering department. The tool in use at the time was an Excel sheet. I found this to be quite disturbing. The data we were collecting would be put to much better use if it were stored in a database, where the entire Engineering department could benefit from it. Thus, I suggested implementing a web-based solution with a PHP front end and a MySQL database back end.

Now, everyone could enter the data from virtually anywhere through a simple, easy-to-use web front end. Predefined queries were also available to output the data into a PDF file, complete with charts and graphs. The hidden gem was Python and reportlab. As soon as a query was made, a Python script was called to retrieve the data from the MySQL database, format it using reportlab, and provide a link to the generated PDF file. This whole process worked seamlessly and allowed our department to analyze, for example, how many computers were still running Windows NT or how many had less than 256MB of RAM.

The above project took 3 months during the summer to complete. The integration of Python and the reportlab toolkit worked beautifully and made a strong impression. I have been working with reportlab and Python ever since to generate on-demand PDF files and reports from databases.

I have also worked with Python and XML. I successfully created a Python script, 'prop', to parse and propagate XML test case data from one project to another. The implementation used Python and expat to accomplish the task. Implementing this solution took around 3 weeks and the result was a stable utility. By using standard Python libraries, I was able to develop on an OpenBSD system and still use the script on a Windows machine. That is the beauty of Python.

I have been involved with open source software ever since my exposure to Mac OS X. From that point on, I have strived to use open source software wherever possible. After some time, I felt the need to return the favor to the community, and I believe this is an opportunity for me to give something back and be part of the open source ecosystem.

== Links ==

 * http://www.csun.edu/~atu13439/resume.pdf