Release Notes
PDFtalk 2.3
February 2020
PostScript
Added PostScript as new prerequisite of PDFtalk.
The package [PostScript] implements some low level methods which are used by PDFtalk.
PostScript was implemented after PDFtalk and used some basic methods of it (Number reading and writing, ASCII85 encoding and PostScript character names). These dependencies have been reversed so that PostScript can be used stand-alone while PDFtalk now depends on it. This also reflects the correct historical relationship.
CMaps
Added CMap to the {PDFtalk Fonts} bundle.
CMaps are PostScript programs defining complex code mappings. The mechanism is very general and allows for variable byte length encodings. Because of this generality, some PDF writers use CMaps even to encode simple mappings. Hence, it is necessary to fully implement CMaps in order to decode PDF text.
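To illustrate what a variable byte length encoding means for a reader, here is a minimal workspace sketch. It does not use the PDFtalk CMap classes; it assumes a hypothetical code space in which a first byte below 128 is a complete one-byte code and any other first byte starts a two-byte code.

```smalltalk
"Sketch only: splits a byte string into codes of one or two bytes.
 The code space rule (first byte < 128 means one byte) is an assumption."
| bytes stream codes |
bytes := #[65 129 64 66].	"hex 41, 81 40, 42"
stream := ReadStream on: bytes.
codes := OrderedCollection new.
[stream atEnd] whileFalse: [
	| first |
	first := stream next.
	first < 128
		ifTrue: [codes add: first]
		ifFalse: [codes add: first * 256 + stream next]].
codes	"contains 65, 33088 (hex 8140) and 66"
```

A real CMap declares its code space ranges itself, so the rule for the code length comes from the CMap and is not hard-coded as it is here.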
This is not intended to be used by the user of the library. Rather, it is part of the basic font infrastructure enabling decoding of PDF strings. This will be the base for Text extraction in the next step.
Standard CMaps
PDF defines 181 standard CMaps which are to be understood by a conforming reader. These CMaps are available on GitHub. All maps have been imported as methods containing the source of the CMaps. Since they are rather large (16 MB with sources), you may not want to load them into a runtime image unless you need them, i.e. unless your application does text extraction.
Therefore, I put them into a separate package (outside of the runtime, but part of the project bundle): [PostScript CMap instances]. The CMaps are constructed from the source methods lazily when needed. If the package is not loaded, the source of a requested CMap is downloaded from GitHub, which is slower.
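The lazy construction described above can be pictured with a small workspace sketch. It is not the code of the package: the variable names are hypothetical, the source text is a placeholder, and taking the size of the source stands in for parsing the PostScript program.

```smalltalk
"Sketch only: build a CMap on first request and cache the result.
 All names here are hypothetical stand-ins."
| sources cache cmapNamed constructions |
constructions := 0.
sources := Dictionary new.
sources at: #'Identity-H' put: 'placeholder for the CMap source'.
cache := Dictionary new.
cmapNamed := [:name |
	cache at: name ifAbsentPut: [
		constructions := constructions + 1.
		"fall back to a (simulated) download when the source is not loaded"
		(sources at: name ifAbsent: ['placeholder for a downloaded source']) size]].
cmapNamed value: #'Identity-H'.
cmapNamed value: #'Identity-H'.
constructions	"1, because the second request is answered from the cache"
```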
Typing
Allow narrower types to shadow broader types
Example:
DecodeParms <type: #ZipFilterParameter> <type: #Dictionary>
ZipFilterParameter is a subtype of Dictionary. Because it is declared first, it is tried first. Previously, both alternatives were treated equally and Dictionary might have matched, even when the dictionary was a valid ZipFilterParameter.
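The effect of the declaration order can be illustrated with a workspace sketch. It does not use PDFtalk's type classes: the declared types are modeled as symbols paired with hypothetical test blocks, and the check for a #Predictor key is only an assumed stand-in for "is a valid ZipFilterParameter".

```smalltalk
"Sketch only: the first declared type whose test succeeds wins.
 This is not PDFtalk's implementation."
| declaredTypes value match |
declaredTypes := OrderedCollection new.
"The narrower type is declared first, as in the DecodeParms example"
declaredTypes add: #ZipFilterParameter -> [:object |
	(object isKindOf: Dictionary) and: [object includesKey: #Predictor]].
declaredTypes add: #Dictionary -> [:object | object isKindOf: Dictionary].

value := Dictionary new.
value at: #Predictor put: 12.

match := declaredTypes
	detect: [:assoc | assoc value value: value]
	ifNone: [nil].
match key	"#ZipFilterParameter, because it is tested before #Dictionary"
```

A plain dictionary without the assumed key would fail the first test and fall through to #Dictionary.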
Generalized Textstring to String
Textstring does not need to be differentiated. We can rely on VisualWorks' handling of multi-byte strings.
Known problem
The PDF specification allows bfchar-mappings to have a string of UTF-16BE characters as destination. This is not yet implemented.
PDFtalk 2.2
August 2019
Renamed OrderedDictionary to Valuemap in the [Values] package.
This version replaces all references to OrderedDictionary.
PDFtalk now depends on the [Values] package with version 3.x and up and is incompatible with earlier versions.
PDFtalk 2.1
July 2019
Flate encoding now uses the zlib of VW 8.1. This solved problems with allocating buffers under heavy load.
PDFtalk 2.0
October 2017
What's new
Name: The new name is PDFtalk. The first version was called PDF4Smalltalk. The namespace changed from PDF to PDFtalk, and the domain “pdftalk.de” provides a home with a wiki dedicated to the library: wiki.pdftalk.de.
Typing: The heart of the “PDF engine” is the typing system, which allows the assignment of Smalltalk classes to raw PDF objects. The new version has a redesigned type system where PDF types are properly modeled independently of the Smalltalk class hierarchy. This allows classes to be renamed freely (e.g. by adding prefixes) without affecting PDF types. Also, boxing of some simple objects like “null” and booleans is not necessary anymore. Instead, existing classes are declared as PDF types.
PDFtalk for Gemstone: The new release was triggered by a contract to port the library to Gemstone (thanks to HTS and Bob Nemec). A talk about this was held at ESUG 2017: “PDFtalk for Gemstone” (slides are here).
Gemstone Fileout: A VisualWorks to Gemstone translation tool. This tool, with project-specific code transformation declarations, creates a Gemstone filein. It is used to create the Gemstone PDFtalkLibrary from the Values package and the PDFtalk bundle.
Both new projects are open source with MIT licence.
Changes for users of the library
Some changes are incompatible with the previous version; they are described here.
It is not recommended to load the new version into an image with the old version of the library.
Namespace and bundle structure
The former namespace PDF is renamed to PDFtalk.
The formerly independent bundle Fonts is now integrated into PDFtalk. The packages are all renamed with the PDFtalk prefix, and their order and contents have been revised.
The demos are now in class PDF in its own package [PDFtalk Demonstrations] in the {PDFtalk Testing} bundle. Do
PDF runAllDemos
to see if they are running. You may need to edit the file path to the PDF specification and to your demo directory.
Referencing PDF classes
Smalltalk classes representing a PDF type should not be referenced directly anymore. Instead an expression like
PDF classAt: <PDF type symbol>
should be used.
Example
(PDF classAt: #Contents) "returns class PDFtalk.Contents"
Often used classes can be accessed through a shortcut method:
PDF String. "returns class PDFtalk.PDFString"
PDF Array. "returns class PDFtalk.PDFArray"
PDF Dictionary. "returns class PDFtalk.PDFDictionary"
PDF Stream. "returns class PDFtalk.PDFStream"
PDF Page. "returns class PDFtalk.Page"
There are two reasons for this:
- PDF type and Smalltalk class names may not be the same anymore
- The Smalltalk class name may differ in different ports of the library.
New shared Smalltalk.PDF
The shared variable PDF was added to the Smalltalk namespace. The variable holds the class PDFtalk.PDF, which serves as the general entry point for the library.
Unless you extend the library, there should be no need to add the PDFtalk namespace to the imports of your project. Instead, most functionality should be accessed through PDF in the Smalltalk namespace.
Aligned types
The new typing system made it possible to remove the PDF classes Null and Boolean. Instead, they are now implemented by the system classes UndefinedObject and Boolean.
Boxing (with asPDF) and unboxing (with asSmalltalkValue) are not necessary anymore for nil, true and false. Instead, the Smalltalk objects are used directly.
The work is not finished yet. Number and maybe String and Date could be aligned as well.
Typing redesign
The major change is the redesign of the PDF typing system. Initially, I represented the types of PDF objects by Smalltalk classes with the same name. This turned out to be insufficient.
Firstly, PDF types form a different hierarchy than the Smalltalk classes implementing them. Both hierarchies cannot be represented in a single inheritance class hierarchy at the same time.
Secondly, the PDF type names were tied to class names. A class could not be renamed without changing the PDF types. This has been a problem for porting the library to other Smalltalk dialects.
Therefore, PDF types are now modeled independently.
Notes
Specialization only on assignment
When PDF objects were created, all classes were searched for possible specializations: for example, a Dictionary with a #Type entry was automatically converted to its corresponding class.
In the new version, objects are only typed and specialized when they are assigned to an attribute of a Dictionary or Array.
System classes as PDF classes
The first version had wrappers for all basic types of PDF, such as null, booleans, numbers etc. This is similar to boxing of primitive types in other programming languages.
The motivation was to accurately implement any semantic differences to the corresponding Smalltalk objects in a proper place. This led to the widespread use of #asPDF and #asSmalltalkValue for boxing and unboxing.
Now I know that the differences are minor and that they could be properly implemented in the system classes. Therefore, I aligned some system classes with the PDF classes by declaring the system classes as PDF types and removing the PDF classes: Null and Boolean. Next to go is Number; later maybe String and Date and…
This should lead to simpler code and more space and time efficient processing.
On the other hand, there are now a lot more root PDF classes. Therefore, I moved some of the general PDF behavior to Object.