Electronic publishing
PJ and PJX

Portable Document Format (PDF) has become a very popular interchange format for the graphical and layout-intensive documents normally used in electronic publishing environments. PDF was originally designed as a "final" document format, a simplified analogue to PostScript, and was not intended to be modified or "post-processed." In practice, the dynamic requirements of document exchange over the Internet led to attempts to use PDF in a more flexible way.

One of the first of these experiments was the development of the PJ open source software at Etymon in 1998. PJ is a class library in Java that implements a simplified programmatic interface to PDF documents. The demand for this kind of software proved significant, and many commercial products entered the market offering similar and enhanced features. PJ has been used very widely and has been licensed by several companies for commercial redistribution.

Its general features are reading, parsing, modifying, and extracting data from existing PDF documents, and creating new PDF documents. Examples of how PJ has been used include using an existing document as a starting point for PDF creation, extracting information about documents for a catalog, stamping header or other text onto pages of a document, combining pages from various sources into a single document, overlaying text onto a form based on user input, extracting graphical elements from a document, and extracting text content from documents. Most of these applications were not natively supported by PJ but were developed on top of the PJ architecture.

PJ was initially intended as an experimental prototype implementation. However, as a result of its use in increasingly demanding commercial "production" settings, we offered a more robust design called PJX, also available as open source software.
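To give a rough sense of what "reading and parsing" an existing PDF document involves at the lowest level, the following is a minimal sketch that does not use PJ or PJX at all. It relies only on two facts about the PDF file format: a file begins with a `%PDF-x.y` header, and near its end a `startxref` keyword is followed by the byte offset of the cross-reference section. The class name `PdfProbe` and its methods are hypothetical and exist only for this illustration; they are not part of the PJ or PJX interface.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, NOT part of PJ or PJX: reads two basic facts from raw PDF bytes. */
final class PdfProbe {

    private PdfProbe() {}

    /** Returns the version string from the "%PDF-x.y" header, for example "1.4". */
    static String version(byte[] pdf) {
        // Only the first few bytes are decoded; ISO-8859-1 maps each byte to one char.
        String head = new String(pdf, 0, Math.min(pdf.length, 16), StandardCharsets.ISO_8859_1);
        if (!head.startsWith("%PDF-")) {
            throw new IllegalArgumentException("missing %PDF- header");
        }
        int end = 5;
        while (end < head.length()
                && (Character.isDigit(head.charAt(end)) || head.charAt(end) == '.')) {
            end++;
        }
        return head.substring(5, end);
    }

    /** Returns the byte offset written after the last "startxref" keyword. */
    static long startXref(byte[] pdf) {
        // The keyword sits near the end of the file, so only the tail is decoded.
        int tailStart = Math.max(0, pdf.length - 1024);
        String tail = new String(pdf, tailStart, pdf.length - tailStart,
                StandardCharsets.ISO_8859_1);
        int key = tail.lastIndexOf("startxref");
        if (key < 0) {
            throw new IllegalArgumentException("missing startxref keyword");
        }
        int i = key + "startxref".length();
        while (i < tail.length() && Character.isWhitespace(tail.charAt(i))) {
            i++;
        }
        int j = i;
        while (j < tail.length() && Character.isDigit(tail.charAt(j))) {
            j++;
        }
        if (i == j) {
            throw new IllegalArgumentException("no offset after startxref");
        }
        return Long.parseLong(tail.substring(i, j));
    }

    public static void main(String[] args) {
        // A tiny stub that only mimics the header and trailer; it is not a valid PDF.
        byte[] sample = "%PDF-1.4\n1 0 obj\n<< >>\nendobj\nstartxref\n9\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(version(sample));   // prints 1.4
        System.out.println(startXref(sample)); // prints 9
    }
}
```

A cataloging tool of the kind mentioned above could start from checks like these before handing the file to a full PDF library for object-level parsing.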
PJX specifically included significantly faster reading and writing of PDF documents, thread safety, "on demand" reading and parsing of PDF objects (which greatly reduces memory usage and processing time), incremental update support to enable fast modification of PDF documents, reading PDF documents from either disk or memory, thorough documentation of the class library interface, support for J2SE collection classes and NIO, access to form/field objects, rudimentary support for insertion of images and watermarks, appending of large documents, and design patterns for recursive processing of PDF objects.

The PJX source code is available under the terms of version 2 of the GNU General Public License (GPL), and we welcome developers interested in contributing to the software. A brief tutorial is available: [PDF]. The primary documentation is the Javadoc reference pages, to be used in conjunction with the PDF Reference Manual from Adobe. The older PJ software is also available under the same license terms.

The following articles and citations contain additional information related to PJ and PJX:

Nassar, N. 1998. "Automating PDF Objects for Interactive Publishing," Web Techniques 3, 10 (Oct.), 61-65.

Mohseni, P. 2000. "PDF and Java," EarthWeb, May.

Nolan, G. 2000. "Serving Dynamic PDF Files," Web Techniques 5, 10 (Oct.).

Bergmark, D., Phempoonpanich, P., and Zhao, S. 2001. "Scraping the ACM Digital Library," SIGIR Forum 35, 2 (Fall), 1-7.

Laird, C. 2001. "Open-Source PDF Programming," SW Expert, Dec., 43-45. [PDF]

Zipper, B. 2002. "PDF on the Fly: Tools and Strategies for Automatic Generation of PDF Files," The Seybold Report 2, 10 (Aug.).
Copyright © 1998-2005 Etymon Systems, Inc. Legal notice.