Template based document generation using LiveDocx and Zend Framework
Generating print-ready, well-formatted PDF documents with PHP is not an easy task. Traditionally, there are two main approaches to PDF generation with PHP. Given sufficient time and patience, both partially get the job done, but still leave a lot to be desired:
HTML-to-PDF: This approach is widely used in mainstream applications. Here an HTML document is programmatically created and converted to a PDF, using one of the many open source libraries. However, since HTML is not a page-oriented format, as PDF is, it is impossible to perform a 1-to-1 mapping between HTML and PDF. Typical word processing file format features, such as headers and footers, orphans and widows or even page numbers, simply cannot be represented in HTML.
Programmatic: This approach offers total control of the resulting PDF. However, it requires that the x and y coordinates of every line of text, every geometrical shape and graphic be set from program code. Not only is this an extremely time-consuming solution, but it is also very brittle: every time a graphical designer changes the layout of a document, a programmer must rework the program code.
A completely new approach
In this article, the author presents an entirely new, third approach. It relies on templates being created in a WYSIWYG environment, such as Microsoft® Word or Open Office, and then being populated with data in PHP. The resulting document can be saved not only to PDF, but also DOCX, DOC and RTF.
Before we delve into a technical discussion on the inner workings of this new approach, let us first take a look at a practical example. The following PHP 5 code illustrates PDF generation, in which the merge fields software, licensee and company in the template template.docx [http://www.phplivedocx.org/wp-content/uploads/2009/01/license-agreement-template.docx] [46.7 KB] are populated with scalar data in PHP. The resulting document document.pdf [http://www.phplivedocx.org/wp-content/uploads/2009/01/license-agreement-document.pdf] [104.7 KB] is created and written to disk.
    $phpLiveDocx = new Zend_Service_LiveDocx_MailMerge(
        array (
            'username' => 'yourUsername',
            'password' => 'yourPassword'
        )
    );

    $phpLiveDocx->setLocalTemplate('template.docx');

    $phpLiveDocx->assign('software', 'Magic Graphical Compression Suite v1.9');
    $phpLiveDocx->assign('licensee', 'Henry Smith');
    $phpLiveDocx->assign('company', 'Megasoft Co-operation');

    $phpLiveDocx->createDocument();

    $document = $phpLiveDocx->retrieveDocument('pdf');

    file_put_contents('document.pdf', $document);

    unset($phpLiveDocx);
The code demonstrated in this article will be shipped with the Zend Framework 1.10 [http://www.zendframework.com/download/latest] when it becomes available. Although there is no official release date at the time of writing, 1.10 is expected to be released in Q4 2009. In the meantime, you can check the components out of the Standard Incubator [http://framework.zend.com/svn/framework/standard/incubator/] SVN repository.
Introducing LiveDocx
LiveDocx [http://www.livedocx.com/] is a SOAP [http://en.wikipedia.org/wiki/SOAP]-based document generation service, built on the market-leading word processing component TX Text Control .NET [http://www.textcontrol.com/]. LiveDocx allows word processing templates to be populated in any programming language that supports SOAP. The resulting document can be saved to any supported format. This article, however, concentrates on using LiveDocx in PHP 5.
The components of the Zend Framework implementation of LiveDocx are located at /Zend/Service/LiveDocx/ in the standard Zend Framework distribution file. It is possible to use LiveDocx directly with the PHP 5 SoapClient [http://www.phplivedocx.org/articles/using-livedocx-without-the-zend-framework/], without the Zend Framework, and with the third party library NuSOAP [http://www.phplivedocx.org/articles/using-livedocx-with-nusoap/]. The NuSOAP approach even allows LiveDocx to be used in PHP 4. This article, however, concentrates on the official Zend Framework components in PHP 5.
The key aim of LiveDocx is to reduce the effort required to generate well-formatted, print-ready word processing documents to an absolute minimum. For the end-user, the logic involved in creating any of the supported file formats is identical. For example, regardless of whether you want a PDF or RTF file, the code is the same, with the exception of one parameter.
The core developers of LiveDocx also wanted to ensure that creating templates is as simple as possible and takes place in an environment with which the end-user is already very familiar. Hence, you make the templates in Word or Open Office.
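To make the "one parameter" point concrete, here is a minimal sketch that saves the same generated document in several formats. The helper saveInFormats() and the output file names are hypothetical and not part of the Zend Framework. The sketch assumes that $mailMerge is an object on which createDocument() has already been called and which offers retrieveDocument($format), as used in the example at the start of this article.

```php
<?php

/**
 * Save one generated document in several output formats.
 *
 * Hypothetical helper, not part of the Zend Framework. $mailMerge is assumed
 * to be an object on which createDocument() has already been called.
 */
function saveInFormats($mailMerge, array $formats, $baseName)
{
    $written = array();

    foreach ($formats as $format) {
        // The requested format is the only thing that changes per iteration.
        $data     = $mailMerge->retrieveDocument($format);
        $filename = sprintf('%s.%s', $baseName, $format);

        file_put_contents($filename, $data);
        $written[] = $filename;
    }

    return $written;
}
```

With a populated mail merge object, a call such as saveInFormats($phpLiveDocx, array('pdf', 'rtf'), 'document') would be expected to write document.pdf and document.rtf.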
Templates and documents
Throughout this article, we refer to the terms templates and documents. It is important to understand the difference between the two.
Templates: The term template is used to refer to the input file, containing formatting and text fields. Templates can be in any one of the following file formats:
- DOCX [http://en.wikipedia.org/wiki/Office_Open_XML] - Office Open XML Format
- DOC [http://en.wikipedia.org/wiki/DOC_(computing)] - Microsoft® Word DOC Format
- RTF [http://en.wikipedia.org/wiki/Rich_Text_Format] - Rich Text Format
- TXD [http://www.textcontrol.com/] - TX Text Control® Format
Templates can be stored either locally on the client machine (the one from which the SOAP request is initiated) or remotely on the backend server. The decision on which one you should use depends upon the kind of application you are developing.
If you store the templates locally, you have to transfer the template, together with the data that should be populated, on every request. If the template remains the same on every request, this approach is very inefficient. It would be better to upload the template to the backend server once and then reference it on all subsequent requests. This way, only the data that should be populated is transferred from the client to the backend server. Most applications using LiveDocx fall into this category.
On the other hand, if you have a template that is constantly changing, or an application in which you enable end-users to upload templates, you may consider storing templates locally and transferring them on every request. This approach is obviously slower, as every request contains the template itself, in addition to the data to populate it.
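The upload-once pattern described above could be sketched as follows. This article only shows setLocalTemplate(); the methods templateExists(), uploadTemplate() and setRemoteTemplate() used here are assumptions based on the Zend_Service_LiveDocx_MailMerge reference documentation, so please check them against the version you have installed. The helper useRemoteTemplate() itself is hypothetical.

```php
<?php

/**
 * Select a template stored on the backend server, uploading it first
 * if it is not there yet.
 *
 * Hypothetical helper. The three methods called on $mailMerge are assumed
 * to exist with these names; they are not shown in this article.
 */
function useRemoteTemplate($mailMerge, $localPath)
{
    // The template is referenced on the server by its file name only.
    $remoteName = basename($localPath);

    if (!$mailMerge->templateExists($remoteName)) {
        // First request only: transfer the template itself.
        $mailMerge->uploadTemplate($localPath);
    }

    // All requests: reference the stored template, so that only the
    // data to be populated has to be transferred.
    $mailMerge->setRemoteTemplate($remoteName);

    return $remoteName;
}
```

In an application with a fixed template, such a helper would take the place of the setLocalTemplate() call.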
Documents: The term document is used to refer to the generated output file that contains the template file, populated with data - i.e. the finished document. Documents can be saved in any one of the following file formats:
- DOCX [http://en.wikipedia.org/wiki/Office_Open_XML] - Office Open XML Format
- DOC [http://en.wikipedia.org/wiki/DOC_(computing)] - Microsoft® Word DOC Format
- HTML [http://en.wikipedia.org/wiki/Xhtml] - XHTML 1.0 Transitional Format
- RTF [http://en.wikipedia.org/wiki/Rich_Text_Format] - Rich Text Format
- PDF [http://en.wikipedia.org/wiki/Portable_Document_Format] - Acrobat® Portable Document Format
- TXD [http://www.textcontrol.com/] - TX Text Control Format
- TXT [http://en.wikipedia.org/wiki/Text_file] - ANSI Plain Text
In addition to the above word processing file formats, documents can also be saved to the following image file formats:
- BMP [http://en.wikipedia.org/wiki/BMP_file_format] - Bitmap Image Format
- GIF [http://en.wikipedia.org/wiki/GIF] - Graphics Interchange Format
- JPG [http://en.wikipedia.org/wiki/Jpg] - Joint Photographic Experts Group Format
- PNG [http://en.wikipedia.org/wiki/Portable_Network_Graphics] - Portable Network Graphics Format
- TIFF [http://en.wikipedia.org/wiki/Tagged_Image_File_Format] - Tagged Image File Format
- WMF [http://en.wikipedia.org/wiki/Windows_Metafile] - Windows Meta File Format
Using LiveDocx
In this section, we are going to look at the entire process of creating a document using LiveDocx from scratch.
Creating a template in Microsoft® Word 2007
The first step in any LiveDocx project is the creation of a template. To do this, you can use either Open Office or Microsoft® Word. For the purpose of this article, we are going to use Microsoft® Word 2007. For instructions on using Open Office, please take a look at the LiveDocx Blog [http://blog.livedocx.com/post/Creating-templates-using-OpenOfficeorg.aspx].
Start off by creating a new file in Microsoft® Word 2007 and save the template file as template.docx.
You can then start to compose the template, inserting text, graphics and merge fields with the Field dialog box, shown below.
Figure: Insert merge field in Microsoft® Word 2007 [/uploads/phplivedocx/msword-dialog_zoom.png]
After a while, you will have a template that contains images, text and a number of merge fields. The merge fields are represented by { MERGEFIELD name } and will be populated with scalar data in the next step. The following screenshot of the template in Microsoft® Word 2007 illustrates how your template may look:
[/uploads/phplivedocx/msword-basic-template_zoom.png]
Save the template template.docx [http://www.phplivedocx.org/wp-content/uploads/2009/01/license-agreement-template.docx] [46 KB] when you are done.
Assigning scalar data types in LiveDocx
Now that we have the template file, the next step is to populate it with data. In the following example, we are going to assign scalar data types - in this case strings - to the template.
    $phpLiveDocx = new Zend_Service_LiveDocx_MailMerge(
        array (
            'username' => 'yourUsername',
            'password' => 'yourPassword'
        )
    );

    $phpLiveDocx->setLocalTemplate('template.docx');

    $phpLiveDocx->assign('software', 'Magic Graphical Compression Suite v1.9');
    $phpLiveDocx->assign('licensee', 'Henry Smith');
    $phpLiveDocx->assign('company', 'Megasoft Co-operation');
    $phpLiveDocx->assign('date', 'October 10, 2009');
    $phpLiveDocx->assign('time', '14:12:01');
    $phpLiveDocx->assign('city', 'Frankfurt');
    $phpLiveDocx->assign('country', 'Germany');

    $phpLiveDocx->createDocument();

    $document = $phpLiveDocx->retrieveDocument('pdf');

    file_put_contents('document.pdf', $document);

    unset($phpLiveDocx);
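Since every merge field has to be assigned by name, it can be useful to check for forgotten fields before creating the document. The following sketch is plain PHP and does not contact LiveDocx. The helper findUnassignedFields() is hypothetical, and the list of field names is simply taken from the assign() calls above. In a real application the list might be read from the template itself; the Zend Framework component appears to offer a getFieldNames() method for that purpose, but it is not covered in this article, so treat it as an assumption.

```php
<?php

/**
 * Return the merge field names for which no value has been supplied.
 *
 * Hypothetical helper, not part of the Zend Framework.
 */
function findUnassignedFields(array $templateFields, array $values)
{
    // array_diff() keeps the original keys, so re-index the result.
    return array_values(array_diff($templateFields, array_keys($values)));
}

// Field names taken from the assign() calls in the example above.
$templateFields = array(
    'software', 'licensee', 'company', 'date', 'time', 'city', 'country'
);

// Values collected so far, keyed by merge field name.
$values = array(
    'software' => 'Magic Graphical Compression Suite v1.9',
    'licensee' => 'Henry Smith',
    'company'  => 'Megasoft Co-operation'
);

$missing = findUnassignedFields($templateFields, $values);
// $missing now holds: date, time, city, country
```

Keeping the values in one associative array also makes it easy to pass them to assign() in a loop once nothing is missing.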
For many applications, in particular those in which PDF files are used for archiving purposes, you may wish to set the metadata of the PDF file. You can do this by specifying an associative array with the metadata that should be embedded into the PDF file. The setDocumentProperties() method must be called before createDocument():
    $documentProperties = array(
        'title'    => 'Magic Graphical Compression Suite v1.9',
        'author'   => 'Megasoft Co-operation',
        'subject'  => 'Magic Graphical Compression Suite v1.9',
        'keywords' => 'Graphics, Magical, Compress, Suite, License'
    );

    $phpLiveDocx->setDocumentProperties($documentProperties);
The resulting document document.pdf [http://www.phplivedocx.org/wp-content/uploads/2009/01/license-agreement-document.pdf] [104 KB] is written to disk and can now be opened in your favorite PDF reader, such as the Document Viewer shipped with Ubuntu:
[/uploads/phplivedocx/msword-basic-document_zoom.png]
Assigning compound data types in LiveDocx
In addition to the scalar data types, which were assigned to the template in the previous example, you can also assign compound data types, such as an associative array. Consider the template template.doc [http://www.phplivedocx.org/wp-content/uploads/2009/01/telephone-bill-template.doc] [20.5 KB] and the resulting document document.pdf [http://www.phplivedocx.org/wp-content/uploads/2009/01/telephone-bill-document.pdf] [77.6 KB]. In particular, take a look at the following section of the template:
[/uploads/phplivedocx/msword-complex-template_zoom.png]
The section of the template between the start and end bookmarks in Microsoft® Word is repeated in the final document to produce the rows of a table. One sub-array of the following associative array is used for each row.
Using the following PHP 5 code, we are going to populate the template with an associative array of telephone connection data. For clarity, this example shows only the part in which the associative array is assigned. The instantiation of LiveDocx and the document creation and retrieval processes are identical to the previous examples and have been omitted:
    // instantiate LiveDocx

    $billConnections = array(
        array(
            'connection_number'   => '+11 (0)222 333 441',
            'connection_duration' => '00:01:01',
            'fee'                 => '1.15'
        ),
        array(
            'connection_number'   => '+11 (0)222 333 442',
            'connection_duration' => '00:01:02',
            'fee'                 => '1.15'
        ),
        array(
            'connection_number'   => '+11 (0)222 333 443',
            'connection_duration' => '00:01:03',
            'fee'                 => '1.15'
        ),
        array(
            'connection_number'   => '+11 (0)222 333 444',
            'connection_duration' => '00:01:04',
            'fee'                 => '1.15'
        )
    );

    $phpLiveDocx->assign('connection', $billConnections);

    // create and retrieve document
The resulting document contains the following table with the data from the assigned associative array:
[/uploads/phplivedocx/docviewer-complex-template_zoom.png]
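In practice, the rows are rarely typed in by hand; they usually come from a database or a log. The following sketch shows one way to build the array from raw call records. The helper buildConnectionRows() and the input keys number, seconds and fee are hypothetical; only the output keys are taken from the example above.

```php
<?php

/**
 * Convert raw call records into rows for the 'connection' block.
 *
 * Hypothetical helper. The input keys ('number', 'seconds', 'fee') are
 * assumptions; the output keys match the merge fields used in the example.
 */
function buildConnectionRows(array $calls)
{
    $rows = array();

    foreach ($calls as $call) {
        $rows[] = array(
            'connection_number'   => $call['number'],
            // gmdate() renders a number of seconds as HH:MM:SS
            // (only valid for durations below 24 hours).
            'connection_duration' => gmdate('H:i:s', $call['seconds']),
            // Always two decimal places, with a dot as the separator.
            'fee'                 => number_format($call['fee'], 2, '.', '')
        );
    }

    return $rows;
}

$billConnections = buildConnectionRows(array(
    array('number' => '+11 (0)222 333 441', 'seconds' => 61, 'fee' => 1.15),
    array('number' => '+11 (0)222 333 442', 'seconds' => 62, 'fee' => 1.15)
));

// $billConnections has the same shape as the hand-written array and can
// be passed to $phpLiveDocx->assign('connection', $billConnections).
```

All values are converted to strings before they are assigned, so the formatting of durations and fees is decided in PHP rather than in the template.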
Generating image files with LiveDocx
In addition to the word processing file formats listed above that are supported by LiveDocx, you can also save the resulting documents as one or more image files. For this purpose, Zend_Service_LiveDocx_MailMerge offers the methods getAllBitmaps() and getBitmaps():
    // instantiate LiveDocx

    // get all bitmaps
    // (zoomFactor, format)
    $bitmaps = $phpLiveDocx->getAllBitmaps(100, 'png');
Similarly, it is possible to retrieve images for pages in a specific range:
    // get just bitmaps in specified range
    // (fromPage, toPage, zoomFactor, format)
    $bitmaps = $phpLiveDocx->getBitmaps(2, 2, 100, 'png');
Note the zoomFactor parameter. This is a per cent value in the range of 10% to 400%. These methods are ideally suited to generating thumbnail images of the created document, for example, to display in the browser as a preview.
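If the zoom factor comes from user input, for example a preview size chosen in the browser, it may be worth forcing it into the permitted range before it is passed on. The helper clampZoomFactor() below is hypothetical; how the service reacts to out-of-range values is not described in this article, so clamping on the client side is simply a cautious assumption.

```php
<?php

/**
 * Force a requested zoom factor into the 10 to 400 per cent range.
 *
 * Hypothetical helper, not part of the Zend Framework.
 */
function clampZoomFactor($zoomFactor)
{
    return max(10, min(400, (int) $zoomFactor));
}

$thumbnailZoom = clampZoomFactor(5);    // 10
$previewZoom   = clampZoomFactor(100);  // 100
$maximumZoom   = clampZoomFactor(1000); // 400
```

The clamped value can then be used as the zoomFactor argument of getAllBitmaps() or getBitmaps().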
The actual image files can be written to disk by iterating through the $bitmaps array. There is one page of binary data per record in the array:
    // write to disk
    // (one page per record)
    foreach ($bitmaps as $pageNumber => $bitmapData) {
        $filename = sprintf('documentPage%d.png', $pageNumber);
        file_put_contents($filename, $bitmapData);
        printf('Written %d bytes to disk as %s.%s', filesize($filename), $filename, PHP_EOL);
    }
Deploying LiveDocx in your own applications
The code that constitutes the PHP 5 implementation of LiveDocx, shipped in the Zend Framework, is released under the New BSD license [http://www.phplivedocx.org/articles/phplivedocx-license/] and thus may be deployed, modified and redistributed in most projects, according to the terms of the license. The actual LiveDocx SOAP server, however, is proprietary software. There are three ways in which the SOAP service can be deployed in your own applications:
- Free public server
For the vast majority of applications, developers choose this approach. The default LiveDocx server that is referenced in the Zend Framework components is the free public server. It may be used in your own applications completely free of charge. Sign up [https://www.livedocx.com/user/account_registration.aspx] for a LiveDocx account.
- Premium hosted server
If your application generates several thousand documents per hour, you may consider paying a small monthly fee to have access to your own personal LiveDocx server. In association with leading hosting providers, you can rent such a premium hosted server.
- Local licensed server
If your application generates more than about ten thousand documents per hour, you may consider installing a LiveDocx server in your local network. Having direct access in a local gigabit network is by far the fastest way of deploying LiveDocx.
Learn more
This article has scratched the surface of what you can do with LiveDocx. If you would like to learn more about this new powerful document generation platform, please take a look at the following resources:
LiveDocx in PHP 5
- phpLiveDocx blog [http://www.phplivedocx.org/]
- phpLiveDocx technical articles [http://www.phplivedocx.org/articles/]
LiveDocx SOAP service
- LiveDocx blog [http://blog.livedocx.com/]
- LiveDocx API reference [http://www.livedocx.com/pub/documentation/api.aspx]
Please do not hesitate to contact the author [http://www.phplivedocx.org/contact/] or request technical support [http://www.phplivedocx.org/support/] in the support forum (free of charge) at any time.