Archive:DocsProject/Tools/PDFconversion

From FedoraProject

Revision as of 02:17, 22 February 2009 by Jjmcd (Talk | contribs)


About PDF conversion


The DocsProject would like to create PDF files of its docs. At the time of this writing, we cannot do so directly from DocBook source because of limitations in some FOSS tools.

Our requirements are: write documentation on the wiki and maintain it there, sync it regularly to DocBook in CVS, get translations through PO files, and produce PDFs.

We are looking for a fully FOSS toolchain. Fedora is about FOSS, nothing less.


htmldoc

Probably the easiest way to create PDFs is htmldoc, which is already packaged for Fedora in the extras repository.

Here is an example to create a PDF of the Installation Guide:

make html-nochunks
htmldoc -t pdf14 -f "ig.pdf" --book fedora-install-guide-en_US.html

htmldoc has some limitations, and the quality of the resulting PDF is not very good. However, judging by other PDFs created with this tool (their own manual, for example), I believe that fixing our HTML (and/or CSS) would give much better results.

It also supports some HTML tags that control the output (see the HTML reference).

Tips

  • htmldoc does not handle Unicode well, so you might want to strip out any characters that cannot be represented in ISO-8859-1 before the conversion: iconv -c -f utf-8 -t iso-8859-1 fedora-install-guide-en_US.html > IG-iso.html
  • See the attachments for a .book file and a sample PDF file of the IG. The .book file can be run with the --batch command-line option.
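
The two steps above (stripping characters, then converting) can be chained. The following is a minimal sketch, not part of our toolchain: the function name html_to_pdf and the "-iso.html" naming are assumptions made for illustration, while the iconv and htmldoc options are the ones shown on this page. The converted copy is written next to the source file so that relative image links keep working.

```shell
# Hypothetical helper: strip characters outside ISO-8859-1, then run htmldoc.
# Usage: html_to_pdf INPUT.html OUTPUT.pdf
html_to_pdf() {
    h2p_src=$1
    h2p_out=$2
    # Converted copy next to the source, e.g. guide.html -> guide-iso.html
    h2p_iso="${h2p_src%.html}-iso.html"

    # -c drops characters that have no ISO-8859-1 equivalent.
    # iconv may exit non-zero when it drops something, so the exit
    # status is ignored and the output file is checked instead.
    iconv -c -f utf-8 -t iso-8859-1 "$h2p_src" > "$h2p_iso" || true
    if [ ! -s "$h2p_iso" ]; then
        echo "html_to_pdf: converting $h2p_src produced no output" >&2
        return 1
    fi

    # Same htmldoc options as in the Installation Guide example.
    htmldoc -t pdf14 -f "$h2p_out" --book "$h2p_iso"
}

# Example use (not run here):
#   html_to_pdf fedora-install-guide-en_US.html ig.pdf
```

Dropped characters are lost silently, so it is worth reviewing the -iso.html file before publishing the PDF.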

TODO

  • Study the effect of feeding multiple chapter-like HTML pages to htmldoc instead of the no-chunks file. This is the way recommended by its creators.
  • Create custom make rules better suited to this process (e.g. do NOT create a table of contents or use CSS)
  • Create some regexes that fix up our HTML (for example, to include a before every chapter)
  • Integrate it into our toolchain
  • Study how htmldoc behaves with CSS (does it?)
  • Combine many nochunks files for a big guide?
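
As a starting point for the first item, htmldoc accepts several input files on one command line. A minimal sketch, assuming the chapters are available as separate .html files in one directory and that their file names sort in chapter order (for example 01-intro.html, 02-install.html); the function name chapters_to_pdf and that naming scheme are hypothetical, and we do not currently have a make target that produces such files for htmldoc.

```shell
# Hypothetical helper: build one PDF from per-chapter HTML files.
# Usage: chapters_to_pdf OUTPUT.pdf CHAPTER_DIR
chapters_to_pdf() {
    ch_out=$1
    ch_dir=$2

    # The glob expands in sorted order, so numbered file names
    # keep the chapters in sequence.
    set -- "$ch_dir"/*.html
    if [ ! -e "$1" ]; then
        echo "chapters_to_pdf: no .html files in $ch_dir" >&2
        return 1
    fi

    # All chapter files are passed to a single htmldoc run.
    htmldoc -t pdf14 -f "$ch_out" --book "$@"
}

# Example use (not run here):
#   chapters_to_pdf ig.pdf chapters/
```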

WikiPublisher

WikiPublisher is an extension to PmWiki. It supports collaborative creation of print documents that draw their content from wiki web pages. In fact, any web page that can be reformulated as WikiBook XML can be composed into a print-friendly PDF document.

It is basically a plugin for PmWiki that outputs WikiBook XML, plus a server that converts this XML to PDF. This makes it possible to output a PDF from any wiki page *and* to create whole books based on inter-wiki links. You can try this on their page by clicking Typeset book.

The server is written in Perl and uses LaTeX to do the typesetting. To use it, we would need to write a MoinMoin plugin that exports WikiBook XML, and run the server ourselves.


Apache FOP

One option is to use Apache FOP for PDF creation, but it currently requires non-FOSS tools to render PNG images. An alternative is xmlroff, but it has its limitations too.

JPackage provides packages of some useful tools. Instructions on how to set up the yum repository can be found at [1].

Running sudo yum install fop installs Apache FOP, which can handle most of the issues. SVG support is provided by Batik; PNG support currently requires JIMI or JAI, neither of which is FOSS. Then, inside example-tutorial, you could run something like the following:

make fo-en_US
fop en_US/example-tutorial.fo example-tutorial.pdf
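
Because PNG rendering is the part that needs non-FOSS libraries, it can help to check whether a generated FO file references any PNG images before running fop. A minimal sketch, assuming images appear in the FO file as src="..." attributes (as in fo:external-graphic elements) and that grep supports the -o option; the function name fo_png_refs is hypothetical.

```shell
# Hypothetical helper: list the distinct PNG references in an XSL-FO file.
# Usage: fo_png_refs FILE.fo
fo_png_refs() {
    # -o prints only the matching src="..." attribute, -i ignores case
    # so that .PNG is caught as well; sort -u removes duplicates.
    grep -io 'src="[^"]*\.png[^"]*"' "$1" | sort -u
}

# Example use (not run here):
#   if [ -n "$(fo_png_refs en_US/example-tutorial.fo)" ]; then
#       echo "warning: this document uses PNG images" >&2
#   fi
#   fop en_US/example-tutorial.fo example-tutorial.pdf
```

An empty result means no PNG references were found with this pattern; it does not prove that FOP can render every image in the file.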

cups-pdf

cups-pdf is a virtual printer application which is available from the extras repository for Fedora Core 6. From a terminal session, run sudo yum install cups-pdf to install it. Once installed, simply select "CUPS/cups-pdf" as the printer in the print dialog in a web browser. The resulting PDF file is saved to the desktop.

Advantages of cups-pdf are ease-of-use, well-formatted output and much control over the output through CSS rules (even page breaks). The disadvantage is that it might not easily be used as part of an automated toolchain.


pdftk

pdftk is a command-line utility available from the Fedora Core 6 extras repository. From a terminal session, run sudo yum install pdftk to install it. As described at the project site, http://www.accesspdf.com/pdftk/,

"Pdftk is a command-line tool for doing everyday things with PDF documents. Keep one in the top drawer of your desktop and use it to:

  • Merge PDF Documents
  • Split PDF Pages into a New Document
  • Decrypt Input as Necessary (Password Required)
  • Encrypt Output as Desired
  • Fill PDF Forms with FDF Data and/or Flatten Forms
  • Apply a Background Watermark
  • Report on PDF Metrics such as Metadata, Bookmarks, and Page Labels
  • Update PDF Metadata
  • Attach Files to PDF Pages or the PDF Document
  • Unpack PDF Attachments
  • Burst a PDF Document into Single Pages
  • Uncompress and Re-Compress Page Streams
  • Repair Corrupted PDF (Where Possible)

Pdftk allows you to manipulate PDF easily and freely. It does not require Acrobat, and it runs on Windows, Linux, Mac OS X, FreeBSD and Solaris.

Pdftk is free software (GPL)."
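
For our purposes the merge operation is the most interesting one, for example to combine separately generated PDFs into one big guide. A minimal sketch using pdftk's cat operation; the wrapper name merge_pdfs and the file names in the usage line are hypothetical.

```shell
# Hypothetical helper: merge several PDFs into one, in the order given.
# Usage: merge_pdfs OUTPUT.pdf INPUT1.pdf INPUT2.pdf ...
merge_pdfs() {
    if [ "$#" -lt 3 ]; then
        echo "usage: merge_pdfs OUTPUT.pdf INPUT1.pdf INPUT2.pdf ..." >&2
        return 2
    fi
    mp_out=$1
    shift
    # pdftk takes the input files first, then the operation and the output.
    pdftk "$@" cat output "$mp_out"
}

# Example use (not run here):
#   merge_pdfs big-guide.pdf part1.pdf part2.pdf
```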

References

Here are some references (mostly to mailing list threads) concerning the creation of PDF from our DocBook files.

March 2007:

http://www.redhat.com/archives/fedora-docs-list/2007-March/msg00007.html

September 2006:

http://www.redhat.com/archives/fedora-docs-list/2006-September/msg00063.html

March 2006:

http://www.redhat.com/archives/fedora-docs-list/2006-March/msg00145.html

April 2005:

http://xmlgraphics.apache.org/fop/0.20.5/graphics.html#support-overview
http://www.redhat.com/archives/fedora-docs-list/2005-April/msg00120.html

January 2005:

http://www.redhat.com/archives/rhl-docs-list/2006-January/msg00180.html