Generating PDFs using Python and Chrome's Headless mode

One of the most robust ways to generate PDFs programmatically is to use a headless Chrome (or open-source Chromium) browser.

HTML and CSS are by far the most widely supported document formatting markup in the modern era, and Chrome is built on a multi-billion-dollar rendering engine that works very well and very fast.

Chrome’s headless option requires no window server, making it suitable to run in the background or in a console-only server environment.

Install Chrome (or Chromium)

Mac


Install Chrome directly from the Chrome website, as you would any other desktop app.

Ubuntu

sudo apt-get install chromium-browser

Generate PDFs via the Command Line

Linux

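chromium-browser --headless --print-to-pdf=/path/to/output-file.pdf file:///path/to/input-file.html

The input must be an absolute file:/// URL (note the three slashes); with only two, as in file://path/to/..., the first path segment would be read as a host name.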

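macOS

On macOS, Chrome is installed as `/Applications/Google Chrome.app`, but that is an app bundle. The executable you actually call lives inside it, at `/Applications/Google Chrome.app/Contents/MacOS/Google Chrome`:

"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --headless --print-to-pdf=/path/to/output-file.pdf file:///path/to/input-file.html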
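Because the executable's name and location differ between platforms, it can help to look it up before calling it. A minimal sketch, assuming the common Linux binary names listed below (distributions vary, so the list is an assumption, not an exhaustive one) and the default macOS install location; `find_chrome` is a hypothetical helper name:

```python
import shutil
import sys
from pathlib import Path
from typing import Optional

# Executable names commonly used for Chrome/Chromium on Linux (assumed; distros vary).
CANDIDATE_NAMES = ["chromium-browser", "chromium", "google-chrome", "google-chrome-stable"]

# Default location of the Chrome executable inside the macOS app bundle.
MACOS_CHROME = Path("/Applications/Google Chrome.app/Contents/MacOS/Google Chrome")


def find_chrome() -> Optional[str]:
    """Return the path to a Chrome/Chromium executable, or None if none is found."""
    for name in CANDIDATE_NAMES:
        path = shutil.which(name)  # searches PATH, as the shell would
        if path:
            return path
    if sys.platform == "darwin" and MACOS_CHROME.exists():
        return str(MACOS_CHROME)
    return None
```

Returning None rather than raising leaves it to the caller to decide whether a missing browser is fatal.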

Generating with Python

Using Chrome from Python is as easy as running it in a subprocess:

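import subprocess
# Pass the arguments as a list (no shell needed); the input must be an absolute file:/// URL.
subprocess.run(["chromium-browser", "--headless", "--print-to-pdf=output.pdf",
                "file:///path/to/input.html"], check=True)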
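Hard-coding the file:/// URL is easy to get wrong, so it can be worth building it from a path. A minimal sketch, assuming a `chromium-browser` executable on the PATH; `build_pdf_command` and `html_file_to_pdf` are hypothetical helper names. `pathlib.Path.as_uri()` produces a correct absolute file URL, including percent-encoding for spaces:

```python
import subprocess
from pathlib import Path


def build_pdf_command(input_html, output_pdf, chrome="chromium-browser"):
    """Build the argument list for a headless Chrome PDF render."""
    # as_uri() requires an absolute path, hence resolve(); it yields file:///...
    input_uri = Path(input_html).resolve().as_uri()
    output_path = Path(output_pdf).resolve()
    return [chrome, "--headless", f"--print-to-pdf={output_path}", input_uri]


def html_file_to_pdf(input_html, output_pdf, chrome="chromium-browser", timeout=60):
    """Render an HTML file to PDF; raises CalledProcessError or TimeoutExpired on failure."""
    subprocess.run(
        build_pdf_command(input_html, output_pdf, chrome),
        check=True,           # raise if Chrome exits with a non-zero status
        timeout=timeout,      # don't hang forever on a page that never finishes
        capture_output=True,  # keep Chrome's log output out of our own stdout/stderr
    )
```

Passing a list instead of a shell string avoids quoting problems when paths contain spaces, and keeping command construction separate from execution makes the command easy to inspect or log.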

Python tempfiles

Python's temporary files are very handy in scripts like this. Chromium needs its input at a URL it can load, which for generated HTML usually means a file on disk, but named temporary files make that easy: you can keep your content in memory until the very moment of PDF rendering. An example:

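import subprocess
from tempfile import NamedTemporaryFile

some_html_content = "<h1>Hello, PDF</h1>"  # placeholder for your own HTML

# Text mode so a str can be written; the file is deleted when the block exits.
with NamedTemporaryFile(mode="w", suffix=".html", encoding="utf-8") as input_file:
    input_file.write(some_html_content)
    input_file.flush()  # make sure the full contents are on disk before Chromium reads them
    # input_file.name is an absolute path, so this yields a file:/// URL.
    subprocess.run(["chromium-browser", "--headless", "--print-to-pdf=output.pdf",
                    f"file://{input_file.name}"], check=True)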
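If you want the PDF back in memory as well, the same idea can be pushed one step further: write both the input HTML and the output PDF into a temporary directory and read the PDF bytes back before the directory is removed. A minimal sketch, assuming a `chromium-browser` executable on the PATH; `html_to_pdf_bytes` is a hypothetical helper name. It swaps `NamedTemporaryFile` for `tempfile.TemporaryDirectory`, which also sidesteps Windows' restriction on reopening a `NamedTemporaryFile` that is still open:

```python
import subprocess
import tempfile
from pathlib import Path


def html_to_pdf_bytes(html, chrome="chromium-browser", timeout=60):
    """Render an HTML string to PDF and return the PDF as bytes."""
    # The temporary directory holds both the input HTML and the output PDF,
    # and is deleted, along with everything in it, when the block exits.
    with tempfile.TemporaryDirectory() as tmp:
        input_path = Path(tmp) / "input.html"
        output_path = Path(tmp) / "output.pdf"
        input_path.write_text(html, encoding="utf-8")
        subprocess.run(
            [chrome, "--headless", f"--print-to-pdf={output_path}", input_path.as_uri()],
            check=True,
            timeout=timeout,
            capture_output=True,
        )
        return output_path.read_bytes()  # read before the directory is cleaned up
```

The returned bytes can then be written to a response, an email attachment, or object storage without leaving any files behind.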

Django

Lofty built django-hardcopy to handle the subprocess calls and provide some convenience methods for Django views that render templates as PDFs:

https://github.com/loftylabs/django-hardcopy

Other Arguments

Chrome's headless mode accepts quite a few command-line arguments that can help with generation. A few are listed below:

  • --window-size=WIDTH,HEIGHT - Set the window size of the “browser” (e.g. --window-size=1280,1024) to assist in formatting output
  • --screenshot=path.png - Use in lieu of --print-to-pdf to generate PNG screenshots rather than PDFs
  • --disable-gpu - Required by headless mode in some older versions of Chrome / operating systems
  • --virtual-time-budget=MILLISECONDS - Give the page a fixed budget of virtual time before Chrome captures it. Helpful when some errant JavaScript libraries block the page load event from firing in a headless environment
  • --no-sandbox - Often required to run in Docker environments (for example, when Chrome runs as root)
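From Python, these flags are just extra items in the subprocess argument list. A minimal sketch of a helper that assembles them; `build_chrome_args` and its parameter names are hypothetical, while the flag spellings, the WIDTH,HEIGHT format for --window-size, and the millisecond unit for --virtual-time-budget follow Chrome's command-line conventions:

```python
from typing import List, Optional, Tuple


def build_chrome_args(
    input_uri: str,
    output_path: str,
    screenshot: bool = False,
    window_size: Optional[Tuple[int, int]] = None,
    virtual_time_budget_ms: Optional[int] = None,
    disable_gpu: bool = False,
    no_sandbox: bool = False,
    chrome: str = "chromium-browser",
) -> List[str]:
    """Assemble a headless Chrome command line from optional flags."""
    args = [chrome, "--headless"]
    if window_size is not None:
        width, height = window_size
        args.append(f"--window-size={width},{height}")
    if virtual_time_budget_ms is not None:
        args.append(f"--virtual-time-budget={virtual_time_budget_ms}")
    if disable_gpu:
        args.append("--disable-gpu")  # only needed on some older Chrome/OS combinations
    if no_sandbox:
        args.append("--no-sandbox")  # e.g. inside Docker; turns off Chrome's sandbox
    # --screenshot and --print-to-pdf are alternative outputs; pick exactly one.
    if screenshot:
        args.append(f"--screenshot={output_path}")
    else:
        args.append(f"--print-to-pdf={output_path}")
    args.append(input_uri)  # the page to load goes last
    return args
```

The returned list can be passed straight to `subprocess.run(args, check=True)`. Keeping --no-sandbox opt-in means it is only used in the environments that actually need it.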

You can read about headless mode and its arguments at the following links:

[https://developer.chrome.com/blog/headless-chrome/](https://developer.chrome.com/blog/headless-chrome/)

[https://developer.chrome.com/articles/new-headless/](https://developer.chrome.com/articles/new-headless/)

CSS

Since Chrome renders the input as a browser would, you can style your document with CSS (and use JavaScript, for that matter).

It's important to note that Chrome renders with the `print` CSS media type (the rules inside `@media print`), just as when you hit the print button on a web page, so you can take advantage of the print-specific features of CSS (like page breaks) in your input to style the output PDF.

The most important rules to consider are:

@page {
  size: portrait; /* or landscape; orientation is set on @page, not on body */
}

@media print {
  .page-break-example {
    page-break-after: always; /* or auto */
    page-break-before: auto;
    page-break-inside: auto;
  }
}
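To put these rules to work from Python, you can wrap each logical section of a report in an element carrying the page-break class before handing the HTML to Chrome. A sketch only; `build_paged_html` and the `(title, body)` section format are hypothetical, and it reuses the `page-break-example` class name from the CSS example:

```python
from html import escape
from typing import Iterable, Tuple

# Print CSS: orientation is set on @page; each .page-break-example element is
# followed by a page break, except the last one, which is set back to auto.
PRINT_CSS = """
@page { size: portrait; }
@media print {
  .page-break-example { page-break-after: always; }
  .page-break-example:last-child { page-break-after: auto; }
}
"""


def build_paged_html(sections: Iterable[Tuple[str, str]]) -> str:
    """Return an HTML document with one page-broken <section> per (title, body) pair."""
    parts = [
        f'<section class="page-break-example"><h1>{escape(title)}</h1>'
        f"<p>{escape(body)}</p></section>"
        for title, body in sections
    ]
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        f"<style>{PRINT_CSS}</style></head><body>"
        + "".join(parts)
        + "</body></html>"
    )
```

The resulting string can be written to a temporary file and rendered exactly as in the tempfile example. Escaping the text with `html.escape` keeps stray `<` or `&` characters in your data from breaking the markup.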

MDN has more detail on these properties and on print styling in general:

https://developer.mozilla.org/en-US/docs/Web/CSS/page-break-after

https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_paged_media

https://developer.mozilla.org/en-US/docs/Web/Guide/Printing
