Differences

This shows you the differences between two versions of the page.

changing [2016/06/02 12:22]
christian created
changing [2016/09/24 08:05] (current)
dokuadmin ↷ Page moved from pdf:changing to changing
Line 1: Line 1:
 ====== Changing existing PDFs ======
  
-In order to change an existing PDF, the file has to be read with <code>PDF.File read: <aFilename></code>, moved to a new PDF.Document object, modified and written out with: <code><aDocument> write: <aFilename></code>.
 +To change an existing PDF, the file has to be read with
 +<code>aFile := PDF.File read: <aFilename></code> 
 + 
 +The file can then be converted to a Document, an object which can write itself as a PDF file by: 
 +<code>aDocument := aFile asDocument</code> 
 + 
 +After changing things in ''aDocument'', the PDF file is written out with:
 +<code><aDocument> saveAs: <aFilename></code> 
 + 
 +===== Details ===== 
 + 
 +The class PDF.File is for reading PDFs from files. It does so incrementally, reading objects from disk only when they are needed. One can see that in the PDFExplorer:
 + 
 +{{:pdf:pdfexplorerpartialreadnumbers.png?nolink|729 of 125179 objects have been read}} 
 + 
 +where 729 of 125179 objects have been read. 
 + 
 +The initial object read from a PDF is the ''/Trailer''. Apart from some internal bookkeeping attributes, a trailer contains the ''/Root'' with a reference to the Catalog (the contents of the PDF), the ''/Info'' and the ''/ID''. This trailer is held by a File in the ''#trailer'' instance variable.
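 +
 +The trailer entries named above can be looked at in a workspace. The following is only a minimal sketch: it sends nothing but the messages that appear in the ''asDocument'' method on this page (''trailer'', ''Root'', ''Info'', ''ID'') plus the standard ''inspect''. The file name is a placeholder, and ''asFilename'' assumes that ''read:'' expects a Filename:
 +<code Smalltalk>
 +| aFile |
 +"Placeholder file name; asFilename is an assumption about the argument type"
 +aFile := PDF.File read: 'original.pdf' asFilename.
 +aFile trailer Root inspect.	"reference to the Catalog"
 +aFile trailer Info inspect.	"document information"
 +aFile trailer ID inspect.	"array with the two hash values"
 +</code>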
 + 
 +The cloning of the PDF is done in the ''File>>asDocument'' method: 
 +<code Smalltalk> 
 +asDocument 
 + "<Document> 
 + a new document with the same contents as the receiver for writing out the PDF later" 
 + 
 + | newDocument info | 
 + newDocument := Document new. 
 + newDocument root: self trailer Root. 
 + info := self trailer Info. 
 + info at: #ModDate put: Timestamp now. 
 + info at: #Producer put: PDF producerText. 
 + newDocument info: info. 
 + newDocument previousId: self trailer ID. 
 + ^newDocument 
 +</code> 
 + 
 +For the new document, we just take the ''/Root'', ''/Info'' and ''/ID'' attributes from the file just read. The ''/Info'' is modified by setting the modification time and overwriting ''/Producer'' with the name and version of the library.
 + 
 +The ''/ID'' needs special treatment. It is an array with two hash values: the first identifies the original PDF (in which both hashes were the same), while the second changes with every change of the document. Some workflows identify different versions of a document by their first ID value. It should therefore be preserved by the new document, which is why we store the old ''/ID'' as ''#previousId'' in the new document.
 + 
 +Finally, when writing the new document, all references from ''/Root'' are followed, possibly read in on the fly by the ''File'' object, and then written to the new file. Therefore, the original file should not be closed before the new document has been written out. 
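 +
 +Put together, the steps described on this page might look like the following sketch. It uses only the messages shown above (''PDF.File read:'', ''asDocument'', ''saveAs:''). The file names are placeholders, and ''asFilename'' assumes that a Filename is expected as the argument. How the source file is eventually closed is left out, since that selector is not covered here:
 +<code Smalltalk>
 +| aFile aDocument |
 +"Read incrementally: objects are only loaded from disk when needed"
 +aFile := PDF.File read: 'original.pdf' asFilename.
 +"Take over /Root, /Info and /ID into a Document that can write itself"
 +aDocument := aFile asDocument.
 +"... change things in aDocument here ..."
 +"Writing follows all references from /Root and may still read from aFile,
 + so aFile has to stay open until saveAs: has returned"
 +aDocument saveAs: 'changed.pdf' asFilename.
 +</code>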
 + 
 +In demos 12 and 13 (package "PDF Development", class Document, #demo12_copyPagesToNewPDF and #demo13_splitPDF), only selected objects (pages) are copied to new PDFs. With #asDocument, all other /Catalog attributes such as /Outlines, /Metadata and other document-related information are copied over to the new PDF as well.
  • Last modified: 2016/06/02 12:22
  • by christian