HTML to PDF & Printing

Turning HTML into a proper PDF is normally mission impossible: tables and images get cut in the middle, a title can end up alone at the bottom of a page with its text starting on the next one, headers and footers are either missing or filled with strange stuff from the browser (like the file path of the HTML file), and margins normally make no sense, being either too big or too small.

At the same time, CSS is great for formatting document layouts in a clean, reusable way. It almost seems ideal, but once you get it into a PDF (or print it), it all goes funky.

MultiMarkdown to PDF & Printing

I’m especially interested in this because I am a fan of the so-called lightweight markup languages like Markdown (especially the MultiMarkdown extension), which let me avoid having to go into Microsoft Word.

MultiMarkdown can easily be output as HTML and use CSS for formatting, but when trying to get it into a PDF (or print it), the trouble starts.

You can print it (or create a PDF) directly from the browser; that keeps the CSS formatting but still has all the troubles described at the beginning of this post. Alternatively, you can transform the MultiMarkdown to LaTeX and then to PDF. The output is nice, but the CSS formatting does not apply there, and although LaTeX supports formatting, it’s complicated, so in practice I always ended up with the same-looking PDF, which gets rather boring.

Recently I found a third option.

Prince XML

In a comment on a previous blog post about lightweight markup languages, someone suggested an alternative tool: Prince XML.

It’s a command-line application that takes an HTML file as input (plus a bunch of other options) and outputs a PDF file. Go check some samples here and example web applications using it here.
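To give an idea of what a call looks like, here is a minimal sketch using the same `prince input.html -o output.pdf` form as my TextMate command further down. The two function names are made up for this example, and it assumes the input file name has an extension.

```shell
#!/bin/sh
# Sketch only: the function names below are made up for this example.

# Derive the PDF name from the HTML name by swapping the extension.
# Assumes the file name actually has an extension.
pdf_name_for() {
    printf '%s\n' "${1%.*}.pdf"
}

# Convert one HTML file into a PDF sitting next to it.
html_to_pdf() {
    prince "$1" -o "$(pdf_name_for "$1")"
}
```

With this loaded in a shell, `html_to_pdf notes/report.html` should write `notes/report.pdf`, provided `prince` is on your PATH.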

I found it so much better than the alternatives I’ve tried that I’m posting it in case someone else is struggling with the same issues.
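Most of the gain comes from writing print-oriented CSS that the converter actually honours. As a rough sketch (not something from the Prince samples), the function below writes a small stylesheet aimed at the problems listed at the start of this post: sensible margins, a page number in the footer, headings kept with the text that follows, and tables and images kept in one piece. The function name, the file name and the concrete sizes are my own assumptions; the rules are standard CSS paged-media properties.

```shell
#!/bin/sh
# Sketch only: write_print_css and the values inside are illustrative.

# Write a small print stylesheet to the file given as first argument.
write_print_css() {
    cat > "$1" <<'CSS'
@page {
    size: A4;
    margin: 25mm 20mm;
    /* page number in the footer instead of browser clutter */
    @bottom-center { content: counter(page); }
}
/* do not leave a heading alone at the bottom of a page */
h1, h2, h3 { page-break-after: avoid; }
/* do not cut tables or images in the middle */
table, img { page-break-inside: avoid; }
CSS
}
```

It could then be applied with something like `write_print_css print.css && prince -s print.css document.html -o document.pdf`, where `-s` passes an extra stylesheet to Prince. Linking the stylesheet from the HTML itself should work just as well.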

You can even find a Google talk about it, with Håkon Wium Lie (who proposed CSS originally and is Opera CTO) and Michael Day (system architect for Prince).

For TextMate users

I’ve created a simple command so I get my MultiMarkdown straight into PDF with one command (it’s the original MultiMarkdown command that creates HTML, with a couple more lines added that produce the PDF using Prince). Here’s the code:

# Process the MultiMarkdown document into HTML and then PDF.

NAME="${TM_FILEPATH:-untitled}"
BASENAME="${NAME%.*}"

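cd "${TM_MULTIMARKDOWN_PATH:-$HOME/Library/Application Support/MultiMarkdown}" || exit 1
cd bin || exit 1

# To HTML (the script reads the document from standard input)
./multimarkdown2XHTML.pl > "$BASENAME.html"

# To PDF (no backticks: they would try to run prince's output as a command)
prince "$BASENAME.html" -o "$BASENAME.pdf"

# Open the PDF
open "$BASENAME.pdf"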

This assumes you have Prince XML installed on your system and can call it from the command line.
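If you want the command to fail with a readable message instead of a cryptic one when Prince is missing, a small guard along these lines could go at the top. `require_tool` is a made-up helper name, not part of TextMate or Prince.

```shell
#!/bin/sh
# Sketch only: require_tool is a made-up helper name.

# Succeed if the named program can be found on PATH; otherwise complain.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 was not found on PATH" >&2
    return 1
}
```

A line such as `require_tool prince || exit 1` before the conversion step would then stop the command early.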
