Sane PDF Generation

Reliable HTML-to-PDF conversion is a tricky business. Mastering one of the many PDF-generating libraries out there isn't the end of the line either. What if you switch to a framework written in a different programming language? And what about hidden content that needs to be printed (e.g. content behind tabs)?

We've recently been exploring Pdfcrowd, a web service that creates PDFs from plain HTML documents. Pdfcrowd is built around WebKit (full HTML/CSS2 support!) and has a robust REST API for easy document generation. This lets us focus on a screen display that translates to an acceptable direct-to-print version, and it doesn't lock us into a library built for a specific language.

Exposing Hidden Content

For content hidden in tabs (or other more complex layouts), we've leveraged a combination of JavaScript triggers and Pdfcrowd's REST API to create PDF documents without a dedicated "print-friendly" template. We start with a form containing a single, hidden input:

<form method="post" action="/pdf_generator" id="pdf-form">
    <input type="hidden" name="html">
</form>

Then, a simple trigger (this example uses a "click" binding handled by Knockout.js):

<a href="#" data-bind="click: print" data-section="#section-to-print">print</a>

A simple JavaScript function copies the generated HTML of the target section (identified by the data-section attribute) into the hidden input, then submits the form to our server-side controller.

var print = function(data, event) {
    /*
    Knockout passes the bound data and the DOM event to click handlers;
    set the value of the hidden input to the
    generated html of the "data-section" element
    */
    $('#pdf-form input[name=html]').val( $( $(event.target).data('section') ).html() );
    
    /* submit the form */
    $('#pdf-form').submit();
};
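
If jQuery isn't available, the same handler can be sketched with standard DOM calls. This is a minimal sketch, not the original implementation: the function name submitSectionForPdf and the optional doc parameter (which defaults to the global document and exists only so the function can be exercised outside a browser) are assumptions made for illustration.

```javascript
// Hypothetical jQuery-free variant of the print handler above.
// `doc` defaults to the browser's document; passing it in is an
// assumption made here so the function can be tested in isolation.
function submitSectionForPdf(trigger, doc) {
    doc = doc || document;

    // the selector stored in the trigger's data-section attribute
    var selector = trigger.getAttribute('data-section');
    var section = doc.querySelector(selector);
    if (!section) {
        return false; // nothing to print
    }

    // copy the section's generated markup into the hidden input
    var form = doc.querySelector('#pdf-form');
    form.querySelector('input[name=html]').value = section.innerHTML;

    // submit the form to the server-side controller
    form.submit();
    return true;
}
```

A click handler could pass the clicked link to this function in place of the jQuery calls.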

Server-side, we have a controller that handles the call to Pdfcrowd's REST API:

<?php
require 'pdfcrowd.php';
try {
    /* create an API client instance */
    $client = new Pdfcrowd("{username}", "{apikey}");
    
    /* create header HTML */
    $head  = '<!DOCTYPE html>';
    $head .= '<html lang="en">';
    $head .= '<head>';
    $head .= '<meta charset="utf-8">';
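    /* the conversion runs on Pdfcrowd's servers, so in practice this likely needs to be an absolute URL */
    $head .= '<link rel="stylesheet" type="text/css" href="/path/to/styles.css">';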
    $head .= '</head>';
    
    $html  = $head;
    $html .= '<body>';
    $html .= $this->input->post('html');
    $html .= '</body>';
    $html .= '</html>';
    
    /* convert raw HTML and store the generated PDF as $file */
    $file = $client->convertHtml($html);
    
    /* set HTTP response headers */
    header("Content-Type: application/pdf");
    header("Cache-Control: max-age=0");
    header("Accept-Ranges: none");
    header("Content-Disposition: attachment; filename=\"my.pdf\"");
    
    /* send the generated PDF */
    echo $file;
}
catch(PdfcrowdException $why) {
    echo "Pdfcrowd Error: " . $why;
}

Multi-page PDFs

You’ll notice we stored the HTML <head> element in its own $head variable. This is useful for reusing the same head in the repeating header and footer HTML of multi-page PDFs. We can even reuse parts of our global includes with this method:

<?php
    
    /* set header HTML (do this before calling convertHtml) */
    $header  = $head;
    $header .= '<body>';
    /* in CodeIgniter, a third argument of TRUE returns the view as a string instead of outputting it */
    $header .= $this->load->view('header', array(), TRUE);
    $header .= '</body>';
    $header .= '</html>';
    
    $client->setHeaderHtml($header);
    
    /* set footer HTML */
    $footer  = $head;
    $footer .= '<body>';
    $footer .= $this->load->view('footer', array(), TRUE);
    $footer .= '</body>';
    $footer .= '</html>';
    
    $client->setFooterHtml($footer);
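
The page body, header, and footer documents all follow the same pattern: the shared head string, then the fragment wrapped in a body element, then the closing html tag. A small helper makes that reuse explicit. The sketch below is written in JavaScript to match the client-side example; buildHead and wrapDocument are hypothetical names, and the stylesheet path and fragments are placeholders.

```javascript
// Hypothetical helpers mirroring the PHP string building above.

// Build the shared document head once.
function buildHead(stylesheetUrl) {
    return '<!DOCTYPE html>' +
        '<html lang="en">' +
        '<head>' +
        '<meta charset="utf-8">' +
        '<link rel="stylesheet" type="text/css" href="' + stylesheetUrl + '">' +
        '</head>';
}

// Wrap any HTML fragment (page body, header, or footer) in a full document.
function wrapDocument(head, fragment) {
    return head + '<body>' + fragment + '</body>' + '</html>';
}

// placeholder values, for illustration only
var head = buildHead('/path/to/styles.css');
var page = wrapDocument(head, '<p>section markup</p>');
var header = wrapDocument(head, '<div>header markup</div>');
var footer = wrapDocument(head, '<div>footer markup</div>');
```

The same idea carries over to the PHP controller as a small function that takes $head and a fragment and returns the complete document string.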

This is a simplified example, but you can begin to see how you can pass generated and/or hidden content to a simple server-side controller for PDF generation.

Need help with PDF generation on your site or application? Contact us or let us know in the comments!

About the Author

John Reed

@johnthomasreed

Developer