Archive:DocsProject/Tools/PDFconversion

= About PDF conversion =


The DocsProject would like to be able to create PDF files of its docs. At the time of this writing, we cannot do so directly from DocBook source because of limitations in some FOSS tools.

Our requirements are: write documentation on the wiki, maintain it there, sync it regularly to DocBook in CVS, get translations through PO files, and produce PDFs.

We are looking for a fully FOSS toolchain. Fedora is about FOSS, nothing less.


htmldoc
Probably the easiest way to create PDFs is by using htmldoc, which is already packaged in Fedora in the extras repository.

Here is an example that creates a PDF of the Installation Guide:

make html-nochunks
htmldoc -t pdf14 -f "ig.pdf" --book fedora-install-guide-en_US.html

htmldoc has some limitations, and the quality of the PDF is not very good. However, judging by other PDFs created with this tool (its own manual, for example), I believe that fixing our HTML (and/or CSS) would give much better results.

htmldoc also supports some HTML tags that can control the output (see the HTML reference).
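As a rough illustration of controlling the output from the HTML side, the sketch below inserts the `<!-- PAGE BREAK -->` comment, which the htmldoc manual describes as forcing a page break, before every second-level heading. The function name `add_page_breaks`, the assumption that section titles are `<h2>` elements, and the file names in the usage comment are all hypothetical; this is not taken from the DocsProject build.

```shell
#!/bin/sh
# Sketch: put htmldoc's page-break comment in front of every <h2> heading.
# Assumption: sections in the generated HTML start with <h2> or <h2 ...>.
add_page_breaks() {
    # '&' re-emits the matched heading opener after the inserted comment.
    sed 's|<[hH]2[ >]|<!-- PAGE BREAK -->&|g'
}

# Hypothetical usage (file names are placeholders):
#   add_page_breaks < guide.html > guide-breaks.html
#   htmldoc -t pdf14 -f guide.pdf --book guide-breaks.html
```

Whether extra page breaks actually improve the result would need to be checked against real output.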

Tips

 * htmldoc and Unicode are not very good friends, so you might want to strip out any strange characters before the conversion.
 * See the attachments for a  file and a sample PDF file of the IG. The   file that can be run with the   command-line option.
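The command originally given for stripping characters is not preserved on this page. One possible approach, shown here only as a sketch, is to drop every byte outside printable ASCII with `tr`. The function name `strip_non_ascii` and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: keep tab, newline, carriage return and printable ASCII only.
# Everything else (including multi-byte UTF-8 sequences) is deleted.
strip_non_ascii() {
    # LC_ALL=C makes tr operate on single bytes rather than characters.
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# Hypothetical usage (file names are placeholders):
#   strip_non_ascii < guide.html > guide-ascii.html
```

Note that this deletes accented letters rather than transliterating them, so it is only suitable for text where the non-ASCII characters are incidental.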

TODO

 * Study the effect of feeding multiple chapter-like HTML pages to htmldoc instead of the no-chunks file. This is the way recommended by its creators.
 * Create custom  rules that are better for this process (e.g., do NOT create a table of contents or use CSS)
 * Create some regexes that fix up our HTML (for example, to include a  files for a big guide?
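For the first TODO item, htmldoc accepts several input files on one command line, so a chapter-per-file run could look roughly like the sketch below. The wrapper name `build_book_pdf`, the `HTMLDOC` override variable, and the chapter file names are hypothetical; only the `-t`, `-f`, and `--book` options come from the example earlier on this page.

```shell
#!/bin/sh
# Sketch: pass several chapter files to htmldoc in reading order.
# Setting HTMLDOC=echo previews the arguments without running htmldoc.
build_book_pdf() {
    out=$1      # first argument: name of the PDF to write
    shift       # remaining arguments: chapter HTML files
    "${HTMLDOC:-htmldoc}" -t pdf14 -f "$out" --book "$@"
}

# Hypothetical usage (chapter file names are placeholders):
#   build_book_pdf ig.pdf ch01.html ch02.html ch03.html
```

The order of the file arguments determines the chapter order in the PDF, so the list should be built deliberately rather than from an unsorted glob.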

WikiPublisher
WikiPublisher is an extension to PmWiki. It supports collaborative creation of print documents which draw their content from wiki web pages. In fact, any web page that can be reformulated as Wikibook XML can be composed into a print-friendly PDF document.

It is basically a plugin for PmWiki that outputs Wikibook XML. To use it, we would need a way for Moin to export to Wikibook XML, and we would need to run the server.

Apache FOP
One option is to use Apache FOP for PDF creation, but it currently requires non-FOSS tools to render PNG images. An alternative is to use, but it has its limitations too.

JPackage provides packages of some useful tools. Instructions on how to set up the yum repository can be found at.

Running  will install Apache FOP, which can handle most of the issues. SVG support is done via, while PNG support currently requires JIMI or JAI, neither of which is FOSS. Then, inside  you could run something like the following:

make fo-en_US
fop en_US/example-tutorial.fo example-tutorial.pdf

cups-pdf
cups-pdf is a virtual printer application which is available from the extras repository for Fedora Core 6. From a terminal session, run  to install it. Once installed, simply select "CUPS/cups-pdf" as the printer in the print dialog in a web browser. The resulting PDF file is saved to the desktop.

Advantages of cups-pdf are ease of use, well-formatted output, and considerable control over the output through CSS rules (even page breaks). The disadvantage is that it might not be easy to use as part of an automated toolchain.


pdftk
pdftk is a command-line utility available from the Fedora Core 6 extras repository. From a terminal session, run  to install it. As described at the project site, http://www.accesspdf.com/pdftk/,

"Pdftk is a command-line tool for doing everyday things with PDF documents. Keep one in the top drawer of your desktop and use it to:


 * Merge PDF Documents
 * Split PDF Pages into a New Document
 * Decrypt Input as Necessary (Password Required)
 * Encrypt Output as Desired
 * Fill PDF Forms with FDF Data and/or Flatten Forms
 * Apply a Background Watermark
 * Report on PDF Metrics such as Metadata, Bookmarks, and Page Labels
 * Update PDF Metadata
 * Attach Files to PDF Pages or the PDF Document
 * Unpack PDF Attachments
 * Burst a PDF Document into Single Pages
 * Uncompress and Re-Compress Page Streams
 * Repair Corrupted PDF (Where Possible)

Pdftk allows you to manipulate PDF easily and freely. It does not require Acrobat, and it runs on Windows, Linux, Mac OS X, FreeBSD and Solaris.

Pdftk is free software (GPL)."
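As an illustration of the "Merge PDF Documents" item, the sketch below joins several PDFs (for example, one per chapter) into a single file using pdftk's `cat` operation. The wrapper name `merge_pdfs`, the `PDFTK` override variable, and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: merge PDFs in the given order into one output file.
# Setting PDFTK=echo previews the arguments without running pdftk.
merge_pdfs() {
    out=$1      # first argument: name of the merged PDF
    shift       # remaining arguments: input PDFs, in order
    "${PDFTK:-pdftk}" "$@" cat output "$out"
}

# Hypothetical usage (file names are placeholders):
#   merge_pdfs guide.pdf intro.pdf install.pdf appendix.pdf
```

This could be combined with any of the tools above that produce one PDF per page or chapter.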