Differences
This shows you the differences between two versions of the page.
releasenotes [2021/07/28 17:48] christian [New: ''Document>>#appendAllPagesFrom:''] |
releasenotes [2021/07/29 20:10] (current) christian [PDFtalk 2.5.0] |
||
---|---|---|---|
Line 3: | Line 3: | ||
===== PDFtalk 2.5.0 ===== | ===== PDFtalk 2.5.0 ===== | ||
- | ---- DRAFT ---- | + | July 2021 |
- | June 2021 | + | This release was triggered by Bob Nemec from HTS to improve error handling when appending PDFs. Two errors were seen: objects referenced but missing and streams with one extra byte. |
- | ==== Redesigned references | + | The use case of **appending PDFs** is the topic of this release. Some internal structures were redesigned |
- | Objects are picked (read) from a PDF file stream when they are needed. Originally, this was done using blocks stored in place of the referent (value? | + | Since the HTS systems run on Gemstone, the Gemstone version |
- | Unfortunately, | + | ==== Error handling ==== |
- | While at it, the number and generation of references | + | Two structural errors were discovered which need to be handled. For describing these errors in more detail, a new page [[monsters|Monsters]] |
- | ==== Changed internal streams to bytes ==== | + | === Handling missing object errors |
- | The internal write stream writes bytes to produce | + | A reference pointing |
- | Before characters were used. When writing | + | On writing, the MissingObject is written as a string stating that the object is missing. This preserves the references and leads to a TypeMismatch error on the next read, which can be handled easily. |
- | ==== Added MissungReference error ==== | + | === Handling incorrect stream length errors |
- | I encountered an interesting error. | + | The ''/ |
- | ==== Improving performance for large files ==== | + | Therefore, a very specific error '' |
+ | ==== New APIs ==== | ||
- | When reading many objects at once, the library was slow with certain files. In this investigation, | + | === Document>> |
- | + | ||
- | * Object streams were created and initialized for each access to an object inside. Now, the streams are kept alive in a cache. | + | |
- | * References from traversing the PDF objects were collected in an OrderedCollection. The visited check was done with this collection. Unfortunately, | + | |
- | + | ||
- | ==== New: '' | + | |
A PDF (all pages) can be appended efficiently to a PDF Document. | A PDF (all pages) can be appended efficiently to a PDF Document. | ||
+ | <code smalltalk> | ||
- | All objects of a PDF to be appended are fully read by resolving all references reachable from the '' | + | All objects of the PDF to be appended are read from the file by resolving all references reachable from the '' |
- | Other errors are collected in the #errors variable of the Parser. | + | To concatenate some PDFs, do: |
- | + | ||
- | To check for these errors, add the following to your code: | + | |
<code smalltalk> | <code smalltalk> | ||
- | aPDFFile parser errors notEmpty ifTrue: [ | + | | doc | |
- | aPDFFile parser errors inspect]. | + | doc := Document new. |
+ | doc appendAllPagesFrom: | ||
+ | doc appendAllPagesFrom: | ||
+ | doc appendAllPagesFrom: | ||
+ | doc saveAs: ' | ||
</ | </ | ||
=== Raw objects === | === Raw objects === | ||
- | There is also a variant < | + | There is also a variant |
+ | < | ||
+ | which reads all objects without typing. The objects are raw - generic '' | ||
- | The version is performing slightly faster (~ 5%) than the typed standard | + | In '' |
- | ==== Encoded PostScript sources ==== | + | |
- | Reencoded cmap source file methods with ASCII85 to allow fileIn to Gemstone. Topas from Gemstone as well as PostScript use the % character at the beginning of a line for directives and comments. Since cmaps are PostScript programs, their source cannot be embedded directly without disturbing | + | On '' |
+ | ==== Internal changes ==== | ||
+ | |||
+ | The user of the library is not affected by these changes. | ||
+ | |||
+ | === Improving performance for large files === | ||
+ | |||
+ | When reading many objects at once, the library was slow with large files. In this investigation, | ||
+ | |||
+ | * Object streams were created and initialized for each access to an object inside. Now, the streams are kept alive in a cache. | ||
+ | * References from traversing the PDF objects were collected in an OrderedCollection. The visited check was done with this collection. Each check scans the whole collection, so the total time grows quadratically with the number of collected objects and large files can become very slow. Now, a Set is used for the visited check. The OrderedCollection for the collected references is kept to ensure a reproducible order. | ||
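The second point above can be illustrated with a minimal workspace sketch. It is not the library's code: the ''graph'' dictionary and all variable names are hypothetical stand-ins for PDF objects and their references. It only shows the general pattern of pairing a Set (fast visited check) with an OrderedCollection (stable order of discovery).

<code smalltalk>
| graph collected visited pending |
"Hypothetical object graph: each key refers to the keys in its array"
graph := Dictionary new.
graph at: 1 put: #(2 3).
graph at: 2 put: #(3 4).
graph at: 3 put: #(1).
graph at: 4 put: #().
"Keeps the references in the order they were discovered"
collected := OrderedCollection new.
"Hashed membership test instead of a linear scan"
visited := Set new.
pending := OrderedCollection with: 1.
[pending isEmpty] whileFalse: [
	| current |
	current := pending removeFirst.
	(visited includes: current) ifFalse: [
		visited add: current.
		collected add: current.
		pending addAll: (graph at: current)]].
collected
</code>

The Set answers ''includes:'' by hashing, while ''collected'' is only ever appended to, so the order of the result does not depend on hash values.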
+ | |||
+ | === Redesigned references and tracing === | ||
+ | |||
+ | Objects are picked (read) from a PDF file stream when they are needed. Originally, this was done using blocks stored in place of the value (referent) of a reference. When the value is requested, the block is evaluated and the resulting PDF object is stored as the referent. The block reads the raw object and converts it to the proper type. This can be nested and several types may apply. | ||
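The original block-based approach described above might be sketched as follows. This is an assumption-laden illustration, not the library's implementation: all names are hypothetical, and a separate ''resolved'' flag is used here instead of storing the block in the referent slot itself, to keep the snippet portable across dialects.

<code smalltalk>
| reader referent resolved resolve reads |
reads := 0.
"Hypothetical reader block: stands in for reading the raw object and converting it to its type"
reader := [reads := reads + 1. 'typed object'].
resolved := false.
"Evaluates the reader on the first request only and remembers the result"
resolve := [
	resolved ifFalse: [
		referent := reader value.
		resolved := true].
	referent].
resolve value.
resolve value.
"The reader block was evaluated once, although the value was requested twice"
reads
</code>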
+ | |||
+ | Unfortunately, | ||
+ | |||
+ | While at it, the number and generation of references were extracted to an '' | ||
+ | |||
+ | === Changed internal streams to bytes === | ||
+ | |||
+ | The '' | ||
+ | |||
+ | ==== Gemstone ==== | ||
+ | |||
+ | This release updates the Gemstone code for the library and also the [[pdftalk4gemstone|PDFtalk for Gemstone]] page. The biggest addition is the [[postscript|PostScript]] module used with [[cmap|CMaps]] introduced in [[releasenotes# | ||
+ | |||
+ | === Encoded PostScript sources === | ||
+ | |||
+ | PostScript source methods (mainly cmaps and examples) are reencoded with ASCII85 to allow fileIn to Gemstone. Both Topaz (from Gemstone) and PostScript use the % character at the beginning of a line for directives and comments. Since cmaps are PostScript programs, their source cannot be embedded directly without disturbing Gemstone. | ||
+ | |||
+ | Interestingly, | ||
+ | |||
+ | === Optional CMaps === | ||
+ | |||
+ | The [[cmap|CMaps module]] is used to decode strings to Unicode. The library uses this when a font supplies a ''/ | ||
+ | |||
+ | Since they are very big, there are two Gemstone source files: **'' | ||
==== other changes ==== | ==== other changes ==== | ||
- | Compatibility changes for VW 9.1 to the UI | ||
- | ---- DRAFT ---- | + | In VisualWorks 9.1, icons were renamed and changed. In order to use the library' |
===== PDFtalk 2.4.0 ===== | ===== PDFtalk 2.4.0 ===== |