releasenotes [2021/07/29 09:03] christian [Changed internal streams to bytes]
releasenotes [2021/07/29 20:10] (current) christian [PDFtalk 2.5.0]
===== PDFtalk 2.5.0 =====
July 2021

This release was triggered by Bob Nemec from HTS to improve error handling when appending PDFs. Two errors were seen: objects referenced but missing

The use case of **appending PDFs** is the topic of this release. Some internal structures were redesigned

Since the HTS systems run on Gemstone, the Gemstone version

==== Error handling ====

Two structural errors were discovered that need to be handled. To describe these errors in more detail, a new page [[monsters|Monsters]] was created to collect some observations from the wild.

=== Handling missing object errors ===

A reference pointing to a non-existing object (see [[monsters#

On writing, the MissingObject is written as a string saying that the object is missing. This preserves
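
The placeholder idea can be illustrated in plain Smalltalk. This is a hypothetical sketch, not PDFtalk code: a Dictionary stands in for the cross-reference table, and the block ''resolve'' (an illustrative name) answers a marker string for an object number that is not there, instead of failing, so the rest of the document can still be processed.

<code smalltalk>
"Hypothetical sketch, not PDFtalk code: a Dictionary stands in for the cross-reference table."
| objects resolve |
objects := Dictionary new.
objects at: 1 put: 'Catalog'; at: 2 put: 'Pages'.
"Answer the referenced object, or a marker string when the object is missing."
resolve := [:objectNumber |
    objects
        at: objectNumber
        ifAbsent: ['missing object ', objectNumber printString]].
resolve value: 2.    "'Pages'"
resolve value: 7.    "'missing object 7'"
</code>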

=== Handling incorrect stream length errors ===

The ''/

Therefore, a very specific error ''
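
A wrong ''/Length'' value in a stream dictionary can be noticed by checking whether the keyword ''endstream'' follows the announced number of bytes. The sketch below is hypothetical and not taken from PDFtalk: it works on a String instead of a byte stream, and the names ''source'' and ''lengthFits'' are made up for illustration.

<code smalltalk>
"Hypothetical sketch, not PDFtalk code: is a declared stream length consistent with the data?"
| eol source dataStart lengthFits |
eol := String with: Character cr with: Character lf.
source := 'stream', eol, 'Hello PDF', eol, 'endstream'.
dataStart := 'stream' size + eol size + 1.    "index of the first data byte"
lengthFits := [:declaredLength |
    | pos |
    pos := dataStart + declaredLength.
    "skip the optional end-of-line marker before the keyword"
    [pos <= source size and: [(source at: pos) = Character cr or: [(source at: pos) = Character lf]]]
        whileTrue: [pos := pos + 1].
    pos + 8 <= source size and: [(source copyFrom: pos to: pos + 8) = 'endstream']].
lengthFits value: 9.     "true"
lengthFits value: 5.     "false: the declared length is too short"
lengthFits value: 12.    "false: the declared length is too long"
</code>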

==== New APIs ====

=== Document>>

A PDF (all pages) can be appended efficiently to a PDF Document.
<code smalltalk>

All objects of the PDF to be appended are read from the file by resolving all references reachable from the ''

To concatenate some PDFs, do:
<code smalltalk>
| doc |
doc := Document new.
doc appendAllPagesFrom:
doc appendAllPagesFrom:
doc appendAllPagesFrom:
doc saveAs: '
</code>

=== Raw objects ===

There is also a variant
<
which reads all objects without typing. The objects are raw: generic ''

In ''

On ''

==== Internal changes ====

The user of the library is not affected by these changes.

=== Improving performance for large files ===

When reading many objects at once, the library was slow with large files. In this investigation,

  * Object streams were created and initialized for each access to an object inside. Now, the streams are kept alive in a cache.
  * References from traversing the PDF objects were collected in an OrderedCollection, and the visited check was done with this collection. Since each check scans the collection, the time grows quadratically with the number of collected objects, so that large files can become very slow. Now, a Set is used for the visited check. The OrderedCollection for the collected references is kept to ensure a reproducible order.
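
Both changes can be sketched in plain Smalltalk. This is a hypothetical illustration with made-up names (''streamAt'', ''graph''), not the library's code. The first part builds each expensive object only once per key with ''at:ifAbsentPut:''. The second part traverses a small reference graph: ''OrderedCollection>>includes:'' scans the whole collection, so checking n references costs time proportional to n², while a Set answers the same question by hashing. The OrderedCollection still records the order of discovery.

<code smalltalk>
"Hypothetical sketch, not PDFtalk code."
| cache builds streamAt graph ordered visited pending |
"1. Cache: build an expensive object only once per object stream number."
cache := Dictionary new.
builds := 0.
streamAt := [:number |
    cache at: number ifAbsentPut: [
        builds := builds + 1.
        'decoded object stream ', number printString]].
streamAt value: 12; value: 12; value: 15.    "builds = 2"
"2. Traversal: Set for the visited check, OrderedCollection for a reproducible order."
graph := Dictionary new.
graph at: 1 put: #(2 3); at: 2 put: #(3 4); at: 3 put: #(1); at: 4 put: #().
ordered := OrderedCollection new.
visited := Set new.
pending := OrderedCollection with: 1.
[pending isEmpty] whileFalse: [
    | ref |
    ref := pending removeFirst.
    (visited includes: ref) ifFalse: [
        visited add: ref.
        ordered add: ref.
        pending addAll: (graph at: ref)]].
ordered asArray    "#(1 2 3 4)"
</code>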

=== Redesigned references and tracing ===

Objects are picked (read) from a PDF file stream when they are needed. Originally, this was done using blocks stored in place of the value (referent) of a reference. When the value is requested, the block is evaluated and the resulting PDF object is stored as the referent. The block reads the raw object and converts it to the proper type. This can be nested, and several types may apply.
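
The original design can be sketched with a block that is replaced by its value on first access. The variable names are hypothetical and the string only stands in for a typed PDF object; this is not PDFtalk code.

<code smalltalk>
"Hypothetical sketch, not PDFtalk code: a referent that starts as a block and is replaced by its value."
| referent resolved reads valueOfReference |
reads := 0.
resolved := false.
"The block stands for reading the raw object from the file and converting it to its type."
referent := [reads := reads + 1. 'typed PDF object'].
valueOfReference := [
    resolved ifFalse: [
        referent := referent value.
        resolved := true].
    referent].
valueOfReference value.
valueOfReference value.
reads    "1: the block was evaluated only once"
</code>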

Unfortunately,

While at it, the number and generation of references were extracted to an ''

=== Changed internal streams to bytes ===

The ''

==== Gemstone ====

This release updates the Gemstone code for the library and also the [[pdftalk4gemstone|PDFtalk for Gemstone]] page. The biggest addition is the [[postscript|PostScript]] module used with [[cmap|CMaps]] introduced in [[releasenotes#

=== Encoded PostScript sources ===

PostScript source methods (mainly CMaps and examples) are re-encoded with ASCII85 to allow fileIn to Gemstone. Topaz from Gemstone, as well as PostScript, uses the % character at the beginning of a line for directives and comments. Since CMaps are PostScript programs, their source cannot be embedded directly without disturbing Gemstone.
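
A minimal ASCII85 encoder is sketched below to show the idea; it is an assumption for illustration, not the encoder used in PDFtalk. Every 4 bytes become 5 characters between ''!'' and ''u'' (a group of four zero bytes becomes ''z''), and the result contains no line breaks, so the PostScript lines starting with % no longer appear as lines of the method source.

<code smalltalk>
"Minimal ASCII85 encoder (hypothetical sketch, not PDFtalk code)."
| encode |
encode := [:bytes |
    | out |
    out := WriteStream on: String new.
    1 to: bytes size by: 4 do: [:i |
        | n word digits |
        "the final group may hold fewer than four bytes"
        n := (bytes size - i + 1) min: 4.
        "pack the group into a 32-bit integer, padding with zero bytes"
        word := 0.
        1 to: 4 do: [:k |
            word := word * 256
                + (k <= n ifTrue: [(bytes at: i + k - 1) asInteger] ifFalse: [0])].
        (n = 4 and: [word = 0])
            ifTrue: [out nextPut: $z]
            ifFalse: [
                "five base-85 digits, most significant first, offset by 33 (the character !)"
                digits := Array new: 5.
                5 to: 1 by: -1 do: [:d |
                    digits at: d put: (Character value: word \\ 85 + 33).
                    word := word // 85].
                "a partial group of n bytes contributes only n + 1 characters"
                1 to: n + 1 do: [:d | out nextPut: (digits at: d)]]].
    out contents].
encode value: 'Man '    "'9jqo^'"
</code>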

Interestingly,

=== Optional CMaps ===

The [[cmap|CMaps module]] is used to decode strings to Unicode. The library uses this when a font supplies a ''/

Since they are very big, there are two Gemstone source files: **''

==== Other changes ====

In VisualWorks 9.1, icons were renamed and changed. In order to use the library'

===== PDFtalk 2.4.0 ===== | ===== PDFtalk 2.4.0 ===== |