Differences
This shows you the differences between two versions of the page.
releasenotes [2021/07/29 09:50] christian [PDFtalk 2.5.0] |
releasenotes [2021/07/29 20:10] (current) christian [PDFtalk 2.5.0] |
||
---|---|---|---|
Line 7: | Line 7: | ||
This release was triggered by Bob Nemec from HTS to improve error handling when appending PDFs. Two errors were seen: objects referenced but missing and streams with one extra byte. | This release was triggered by Bob Nemec from HTS to improve error handling when appending PDFs. Two errors were seen: objects referenced but missing and streams with one extra byte. | ||
- | The use case of appending PDFs is the topic of this release. Some internal structures were redesigned and the bugs are handled. Also, the performance for appending large files was improved. | + | The use case of **appending PDFs** is the topic of this release. Some internal structures were redesigned and the bugs are handled. Also, the performance for appending large files was improved. |
- | ==== Internal changes ==== | + | |
- | The user of the library | + | Since the HTS systems run on Gemstone, the Gemstone version |
- | === Improving performance for large files === | + | ==== Error handling ==== |
- | When reading many objects at once, the library was slow with certain files. In this investigation, a few issues came up which were never a problem when clicking through objects one by one. | + | Two structural errors were discovered that need to be handled. To describe these errors in more detail, a new page [[monsters|Monsters]] was created to collect some observations from the wild. |
- | * Object streams were created and initialized for each access to an object | + | === Handling missing |
- | * References from traversing the PDF objects were collected in an OrderedCollection. The visited check was done with this collection. Unfortunately, | + | |
- | === Redesigned references | + | A reference pointing to a non-existent object (see [[monsters# |
- | Objects are picked (read) from a PDF file stream when they are needed. Originally, this was done using blocks stored in place of the value (referent) of a reference. When the value is requested, the block is evaluated and the resulting PDF object is stored as the referent. The block reads the raw object | + | On writing, the MissingObject |
- | Unfortunately, | + | === Handling incorrect stream length errors === |
- | While at it, the number | + | The ''/ |
- | === Changed internal streams to bytes === | + | Therefore, a very specific error '' |
+ | ==== New APIs ==== | ||
- | The '' | + | === Document>> |
- | ==== Error handling ==== | + | A PDF (all pages) can be appended efficiently to a PDF Document. |
+ | <code smalltalk> | ||
- | === Added MissungReference error === | + | All objects of the PDF to be appended are read from the file by resolving all references reachable from the '' |
- | I encountered an interesting error. The object, a reference was pointing to, was not there. The entry in the cross references was 'free' | + | To concatenate some PDFs do: |
+ | <code smalltalk> | ||
+ | | doc | | ||
+ | doc := Document new. | ||
+ | doc appendAllPagesFrom: | ||
+ | doc appendAllPagesFrom: | ||
+ | doc appendAllPagesFrom: | ||
+ | doc saveAs: ' | ||
+ | </ | ||
- | ==== New APIs ==== | + | === Raw objects |
- | === Document>> | + | There is also a variant |
+ | <code smalltalk> | ||
+ | which reads all objects without typing. The objects are raw - generic '' | ||
- | A PDF (all pages) can be appended efficiently to a PDF Document. | + | In '' |
- | All objects of a PDF to be appended are fully read by resolving all references reachable from the '' | + | On '' |
- | Other errors are collected in the #errors variable of the Parser. | + | ==== Internal changes ==== |
- | To check for these errors, add the following to your code: | + | The user of the library is not affected by these changes. |
- | <code smalltalk> | + | |
- | aPDFFile parser errors notEmpty ifTrue: [ | + | |
- | aPDFFile parser errors inspect]. | + | |
- | </ | + | |
- | === Raw objects | + | === Improving performance for large files === |
- | There is also a variant < | + | When reading many objects |
- | The version is performing slightly faster (~ 5%) than the typed standard variant | + | * Object streams were created and initialized for each access to an object inside. Now, the streams are kept alive in a cache. |
+ | * References from traversing the PDF objects were collected in an OrderedCollection. The visited check was done with this collection, which is searched linearly on every check. The total time therefore grows quadratically with the number of collected objects, so large files can become very slow. Now a Set is used for the visited check. The OrderedCollection of collected references is kept to ensure a reproducible order. | ||
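The difference between the two visited checks can be sketched in a workspace with standard collection classes only. This is a minimal illustration, not the PDFtalk code: the variable names ''refs'', ''collected'' and ''visited'' are hypothetical, and plain integers stand in for references.

<code smalltalk>
"Sketch only: refs, collected and visited are illustrative names, not PDFtalk code"
| refs collected visited |
refs := #(1 2 3 2 1 4) asOrderedCollection.

"Old approach: includes: on an OrderedCollection scans the elements one by one"
collected := OrderedCollection new.
refs do: [:ref |
	(collected includes: ref) ifFalse: [collected add: ref]].

"New approach: the Set answers includes: by hashing;
 the OrderedCollection only records the first-seen order"
collected := OrderedCollection new.
visited := Set new.
refs do: [:ref |
	(visited includes: ref) ifFalse: [
		visited add: ref.
		collected add: ref]].
collected
</code>

Both variants collect the same elements in the same first-seen order. Only the cost of each membership test changes, which is why the OrderedCollection can stay in place for ordering.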
- | ==== other changes | + | === Redesigned references and tracing |
+ | |||
+ | Objects are picked (read) from a PDF file stream when they are needed. Originally, this was done using blocks stored in place of the value (referent) of a reference. When the value is requested, the block is evaluated and the resulting PDF object is stored as the referent. The block reads the raw object and converts it to the proper type. This can be nested and several types may apply. | ||
+ | |||
+ | Unfortunately, | ||
+ | |||
+ | While at it, the number and generation of references were extracted to an '' |
+ | |||
+ | === Changed internal streams to bytes === | ||
+ | |||
+ | The '' | ||
+ | |||
+ | ==== Gemstone ==== | ||
+ | |||
+ | This release updates the Gemstone code for the library and also the [[pdftalk4gemstone|PDFtalk for Gemstone]] page. The biggest addition is the [[postscript|PostScript]] module used with [[cmap|CMaps]] introduced in [[releasenotes# | ||
=== Encoded PostScript sources === | === Encoded PostScript sources === | ||
- | Reencoded cmap source | + | PostScript |
- | === Compatibility | + | Interestingly, |
+ | |||
+ | === Optional CMaps === | ||
+ | |||
+ | The [[cmap|CMaps module]] is used to decode strings to Unicode. The library uses this when a font supplies a ''/ |
+ | |||
+ | Since they are very big, there are two Gemstone source files: **'' | ||
+ | ==== other changes | ||
- | Some icons were copied to be used in both, the new VW 9.1 and earlier | + | In VisualWorks |