About PDF conversion
The DocsProject would like to be able to create PDF files of its docs. At the time of this writing, we cannot do so directly from DocBook source because of limitations in some of the FOSS tools.
Our requirements are: write and maintain documentation on the wiki, sync it regularly to DocBook in CVS, get translations through po files, and produce PDFs.
We are looking for a fully FOSS toolchain. Fedora is about FOSS, nothing less.
Probably the easiest way to create PDFs is to use htmldoc, which is already packaged for Fedora in the extras repository.
Here is an example to create a PDF of the Installation Guide:
make html-nochunks
htmldoc -t pdf14 -f "ig.pdf" --book fedora-install-guide-en_US.html
htmldoc has some limitations and the quality of the PDF is not very good. However, having seen other PDFs created with this tool (their own manual, for example), I believe that by fixing our HTML (and/or CSS) we can get much better results.
It also supports some HTML tags that can control the output (see the html reference).
htmldoc and Unicode are not very good friends, so you might want to strip out any strange characters before the conversion:
iconv -c -f utf-8 -t iso-8859-1 fedora-install-guide-en_US.html > IG-iso.html
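The two steps above (strip characters that have no ISO-8859-1 equivalent, then run htmldoc) can be chained in a small helper. This is only a sketch: the function names `run` and `html_to_pdf`, the `DRY_RUN` switch and the `-iso.html` intermediate file name are assumptions made for illustration, not part of our toolchain.

```shell
#!/bin/sh
# Sketch: convert a single-file HTML build to Latin-1, then to PDF.
# With DRY_RUN=1 the commands are only printed, so they can be reviewed first.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# html_to_pdf INPUT.html OUTPUT.pdf
html_to_pdf() {
    in_html=$1
    out_pdf=$2
    latin1_html="${in_html%.html}-iso.html"   # hypothetical intermediate name

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "iconv -c -f utf-8 -t iso-8859-1 $in_html > $latin1_html"
    else
        # -c drops characters that cannot be converted; some iconv builds
        # return a non-zero status when that happens, so it is ignored here.
        iconv -c -f utf-8 -t iso-8859-1 "$in_html" > "$latin1_html" || true
    fi

    run htmldoc -t pdf14 -f "$out_pdf" --book "$latin1_html"
}
```

After make html-nochunks, a call such as html_to_pdf fedora-install-guide-en_US.html ig.pdf would run both steps; setting DRY_RUN=1 first shows the commands without executing them.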
- See the attachments for a .book file and a sample PDF file of the IG. The .book file that can be run with the
- Study the effect of feeding multiple chapter-like HTML pages to htmldoc instead of the no-chunks file. This is the way recommended by the creators.
- Create custom make rules that are better suited to this process (e.g. do NOT create a table of contents or use CSS)
- Create some regexes that fix up our HTML (for example, to include a before every chapter)
- Integrate it into our toolchain
- Study how htmldoc behaves with CSS (does it?)
- Combine many nochunks files for a big guide?
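As a starting point for the "multiple chapter-like HTML pages" item above, htmldoc accepts several input files on one command line. The sketch below assumes the chapter files sit in one directory and sort into reading order by name; the function name `chapters_to_pdf` and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Sketch: feed every HTML file in a directory to htmldoc as one book.
# Assumes the file names sort into chapter order (ch01.html, ch02.html, ...).

# chapters_to_pdf DIR OUTPUT.pdf
chapters_to_pdf() {
    dir=$1
    out_pdf=$2

    # Let the shell expand the chapter files in sorted order.
    set -- "$dir"/*.html
    if [ ! -e "$1" ]; then
        echo "no HTML files in $dir" >&2
        return 1
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo htmldoc --book -t pdf14 -f "$out_pdf" "$@"
    else
        htmldoc --book -t pdf14 -f "$out_pdf" "$@"
    fi
}
```

The same function could be pointed at a directory holding several nochunks files to try the "big guide" idea; whether the result is better than the single-file run still needs to be checked.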
WikiPublisher is an extension to PmWiki. It supports collaborative creation of print documents which draw their content from wiki web pages. In fact any web page able to be reformulated as wikibook XML can be composed into a print-friendly PDF document.
It is basically a plugin for 'pmwiki' that outputs 'Wiki Book XML', plus a server that converts this XML to PDF. This makes it possible to output a PDF from any wiki page *and* to create whole books based on inter-wiki links. You can try this on their page by clicking Typeset book.
The server is written in Perl and uses LaTeX to do the job. To use it, we would need to write a plugin for MoinMoin that exports Wikibook XML, and run the server.
One of the tracks is to use Apache FOP for the PDF creation, which currently requires non-FOSS tools to render PNG images. An alternative is to use xmlroff, but it has its limitations too.
JPackage provides packages of some useful tools. Instructions on how to set up the yum repository can be found at  .
sudo yum install fop will install Apache FOP, which can handle most of the issues. SVG support is provided by Batik; PNG support currently requires JIMI or JAI, neither of which is FOSS. Then, inside example-tutorial, you could run something like the following:
make fo-en_US
fop en_US/example-tutorial.fo example-tutorial.pdf
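The two commands above could be wrapped so that the document name and locale are parameters. This is a sketch under assumptions: that a fo-LOCALE make target exists for other locales in the same way as fo-en_US, and that the .fo file always lands in LOCALE/DOCNAME.fo. The function name `build_pdf_fop` and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Sketch: build the XSL-FO file for one locale, then render it with FOP.

# build_pdf_fop DOCNAME [LOCALE]
build_pdf_fop() {
    doc=$1
    locale=${2:-en_US}

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be run.
        echo "make fo-$locale"
        echo "fop $locale/$doc.fo $doc.pdf"
        return 0
    fi

    # Stop if the FO build fails, so FOP never runs on a stale file.
    make "fo-$locale" && fop "$locale/$doc.fo" "$doc.pdf"
}
```

Run inside example-tutorial, build_pdf_fop example-tutorial would repeat the en_US example shown above.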
cups-pdf is a virtual printer application available from the extras repository for Fedora Core 6. From a terminal session, run sudo yum install cups-pdf to install it. Once installed, simply select "CUPS/cups-pdf" as the printer in the print dialog of a web browser. The resulting PDF file is saved to the desktop.
Advantages of cups-pdf are ease-of-use, well-formatted output and much control over the output through CSS rules (even page breaks). The disadvantage is that it might not easily be used as part of an automated toolchain.
pdftk is a command-line utility available from the Fedora Core 6 extras repository. From a terminal session, run sudo yum install pdftk to install it. As described at the project site, http://www.accesspdf.com/pdftk/:
"Pdftk is a command-line tool for doing everyday things with PDF documents. Keep one in the top drawer of your desktop and use it to:
- Merge PDF Documents
- Split PDF Pages into a New Document
- Decrypt Input as Necessary (Password Required)
- Encrypt Output as Desired
- Fill PDF Forms with FDF Data and/or Flatten Forms
- Apply a Background Watermark
- Report on PDF Metrics such as Metadata, Bookmarks, and Page Labels
- Update PDF Metadata
- Attach Files to PDF Pages or the PDF Document
- Unpack PDF Attachments
- Burst a PDF Document into Single Pages
- Uncompress and Re-Compress Page Streams
- Repair Corrupted PDF (Where Possible)
Pdftk allows you to manipulate PDF easily and freely. It does not require Acrobat, and it runs on Windows, Linux, Mac OS X, FreeBSD and Solaris.
Pdftk is free software (GPL)."
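For our purposes the merge feature is probably the most relevant one, for example to join per-chapter PDFs into one guide. A minimal sketch using pdftk's cat operation follows; the function name `merge_pdfs` and the `PDFTK` override variable (handy for trying the function without pdftk installed) are hypothetical.

```shell
#!/bin/sh
# Sketch: merge several PDFs into one with pdftk's "cat" operation.
# PDFTK can be set to another command (e.g. echo) to preview the call.

# merge_pdfs OUTPUT.pdf INPUT.pdf...
merge_pdfs() {
    out_pdf=$1
    shift
    if [ "$#" -lt 1 ]; then
        echo "usage: merge_pdfs OUTPUT.pdf INPUT.pdf..." >&2
        return 2
    fi
    # Inputs are concatenated in the order they are given.
    "${PDFTK:-pdftk}" "$@" cat output "$out_pdf"
}
```

For example, merge_pdfs guide.pdf ch01.pdf ch02.pdf would write guide.pdf containing the pages of both chapters in that order.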
Here are some references (mostly to mailing list threads) concerning the creation of PDF from our DocBook files.
- "Basic Help for Using XML, XSLT, and XSL-FO", from the Apache website:  .