Differences

This shows you the differences between two versions of the page.

releasenotes [2021/07/29 17:53]
christian [New APIs]
releasenotes [2021/07/29 20:10] (current)
christian [PDFtalk 2.5.0]
Line 7:
 This release was triggered by Bob Nemec from HTS to improve error handling when appending PDFs. Two errors were seen: objects referenced but missing and streams with one extra byte.

-The use case of appending PDFs is the topic of this release. Some internal structures were redesigned and the bugs are handled. Also, the performance for appending large files was improved.
+The use case of **appending PDFs** is the topic of this release. Some internal structures were redesigned and the bugs are handled. Also, the performance for appending large files was improved.

 Since the HTS systems run on Gemstone, the Gemstone version of the library was updated.
Line 61:
 === Improving performance for large files ===

-When reading many objects at once, the library was slow with certain files. In this investigation, a few issues came up which were never a problem when clicking through objects one by one.
+When reading many objects at once, the library was slow with large files. In this investigation, a few issues came up which were never a problem when clicking through objects one by one.

   * Object streams were created and initialized for each access to an object inside. Now, the streams are kept alive in a cache.
-  * References from traversing the PDF objects were collected in an OrderedCollection. The visited check was done with this collection. Unfortunately, the time grows exponentially with the number of collected objects so that large files can become very slow. Now, for the visited check, a Set is used. The OrderedCollection for the collected references is kept to ensure a reproducable order.
+  * References from traversing the PDF objects were collected in an OrderedCollection. The visited check was done with this collection. Because each check scans the whole collection, the time grows quadratically with the number of collected objects, so that large files can become very slow. Now, for the visited check, a Set is used. The OrderedCollection for the collected references is kept to ensure a reproducible order.
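The second point can be illustrated with a minimal sketch. It is written in Python for illustration only (PDFtalk itself is Smalltalk), and the names ''collect_references'' and ''children_of'' are hypothetical, not part of the library. A list plays the role of the OrderedCollection and keeps the order reproducible, while a set answers the visited check without scanning the list:

```python
def collect_references(root, children_of):
    """Traverse depth-first and return references in first-visit order.

    `children_of` is a hypothetical callback returning the references
    directly reachable from a given reference.
    """
    collected = []   # ordered result (the OrderedCollection role)
    visited = set()  # fast membership test (the Set role)
    stack = [root]
    while stack:
        ref = stack.pop()
        if ref in visited:
            continue
        visited.add(ref)
        collected.append(ref)
        # Push children reversed so they are visited in their listed order.
        stack.extend(reversed(children_of(ref)))
    return collected


# Small object graph with a cycle (3 refers back to 1).
graph = {1: [2, 3], 2: [3, 4], 3: [1], 4: []}
order = collect_references(1, lambda ref: graph[ref])
```

With only the list, every visited check would scan all collected references, so the total work grows with the square of their number. The set keeps each check at roughly constant cost.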
  
 === Redesigned references and tracing ===
Line 70:
 Objects are picked (read) from a PDF file stream when they are needed. Originally, this was done using blocks stored in place of the value (referent) of a reference. When the value is requested, the block is evaluated and the resulting PDF object is stored as the referent. The block reads the raw object and converts it to the proper type. This can be nested and several types may apply.

-Unfortunately, the design with blocks does not allow to reason about the types to be applied. This led to problems where a general type overtook a more specific, better matching type. So, I reified the blocks to ''FileReference'' which can read an object from file and has a list of types to be applied to the raw object. The types list is maintained to reflect the subtype order.
+Unfortunately, the design with blocks does not allow deferring the typing. This led to problems where a general type overtook a more specific, better matching type. So, I reified the blocks as ''FileReference'', which can read an object from the file and has a list of types to be applied to the raw object. The types list is maintained to reflect the subtype order.

 While at it, the number and generation of a reference were extracted into an ''ObjectId''.
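A rough sketch of the idea, again in Python for illustration only. The class names ''FileReference'' and ''ObjectId'' come from the text above; everything else (''add_type'', ''read_raw'', the ''PdfObject'' hierarchy, and the rule that the most specific type wraps the raw object) is an assumption made for this example, not the library's actual API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectId:
    number: int
    generation: int = 0


class PdfObject:
    """Hypothetical base type wrapping a raw object."""
    def __init__(self, raw):
        self.raw = raw


class Dictionary(PdfObject):
    pass


class Page(Dictionary):
    pass


class FileReference:
    """Reads its referent on first access and types it as late as possible."""

    def __init__(self, object_id, read_raw):
        self.object_id = object_id
        self._read_raw = read_raw  # hypothetical callback: ObjectId -> raw object
        self.types = []            # kept ordered: subtypes before supertypes
        self._referent = None

    def add_type(self, new_type):
        if new_type in self.types:
            return
        # Insert before the first supertype so specific types stay in front.
        for index, existing in enumerate(self.types):
            if issubclass(new_type, existing):
                self.types.insert(index, new_type)
                return
        self.types.append(new_type)

    @property
    def referent(self):
        if self._referent is None:
            raw = self._read_raw(self.object_id)
            # Assumption: the most specific known type wins.
            target = self.types[0] if self.types else PdfObject
            self._referent = target(raw)
        return self._referent


reads = []

def read_raw(object_id):
    reads.append(object_id)
    return {"Type": "Page"}

ref = FileReference(ObjectId(12), read_raw)
ref.add_type(Dictionary)  # the general type arrives first
ref.add_type(Page)        # the specific type is still placed in front
page = ref.referent
```

Because the types are data rather than code hidden inside a block, they can be inspected and reordered before the object is read, so a general type added early cannot override a more specific one added later.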
Line 76:
 === Changed internal streams to bytes ===

-The ''Writer'' (internal write stream) writes bytes instead of characters to produce the PDF file. When writing the physical file, a copy to a byte array was needed to write the binary data. This copy is not needed anymore.
+The ''Writer'' (internal write stream) now writes bytes instead of characters to produce the PDF file. Previously, when writing the physical file, the string was converted to a byte array to write the binary data. This copy is not needed anymore.
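The effect can be sketched as follows, in Python for illustration only. This ''Writer'' class and its methods ''write_token'', ''write_bytes'' and ''write_to'' are hypothetical stand-ins and do not mirror the library's actual interface. The stream collects bytes from the start, so the final write hands the buffer to the file without a character-to-byte conversion pass:

```python
import io


class Writer:
    """Hypothetical write stream that collects bytes rather than characters."""

    def __init__(self):
        self._buffer = io.BytesIO()

    def write_token(self, text):
        # Syntax tokens are ASCII; they are encoded once, when written.
        self._buffer.write(text.encode("ascii"))

    def write_bytes(self, data):
        # Binary data (for example stream contents) is written unchanged.
        self._buffer.write(data)

    def write_to(self, file):
        # Hand the internal buffer to the file without building a new copy.
        with self._buffer.getbuffer() as view:
            file.write(view)


writer = Writer()
writer.write_token("%PDF-1.7\n")
writer.write_token("%")
writer.write_bytes(b"\xe2\xe3\xcf\xd3")
writer.write_token("\n")

target = io.BytesIO()  # stands in for the physical file
writer.write_to(target)
```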
  
 ==== Gemstone ====

-This release updates the Gemstone code for the library. The biggest addition is the [[postscript|PostScript]] module used with [[cmap|CMaps]] introduced in [[releasenotes#pdftalk_23|version 2.3]].
+This release updates the Gemstone code for the library and also the [[pdftalk4gemstone|PDFtalk for Gemstone]] page. The biggest addition is the [[postscript|PostScript]] module used with [[cmap|CMaps]] introduced in [[releasenotes#pdftalk_23|version 2.3]].

 === Encoded PostScript sources ===
  • Last modified: 2021/07/29 17:53
  • by christian