How text gets onto a page

A PDF page has a content stream (/Contents) containing a list of graphics operators with their parameters. The operators are executed sequentially; each one either sets some aspect of the GraphicsState or paints in the context of the current GraphicsState.

The operators in the content stream cannot reference PDF objects outside of the stream, with one important exception: complex objects like raster images or fonts are held separately in the /Resources dictionary of the page. The /Resources dictionary has several entries for different kinds of objects, such as /XObject for raster images (not quite supported yet) and embedded graphics, and /Font for the fonts used on the page. Each entry is a dictionary of named objects which can be used by specific graphics operators. Let's see an example:

/Page <<
  /Resources <<
    /Font <</F1 aFont>>
  >>
  /Contents <</Length 43>> 
    stream
      0 0 0 1 k
      BT
      /F1 10 Tf
      10 5 Td
      (Hello World) Tj
      ET
    endstream
>>

This paints the string Hello World in black at 10@5 with font /F1 in size 10.

To achieve this in Smalltalk you can write the following:

page := Page newInBounds: (0 @ 0 corner: 70 @ 20) colorspace: DeviceCMYK new render: [:renderer |
  renderer fillColor: CmykColor black.
  renderer textObjectDo: [
    renderer setFont: #Helvetica size: 10.
    renderer add: (NextLineRelative operands: #(10 5)).
    renderer showString: 'Hello World']].

The result is demo01_helloworld.pdf; see the class method demo01_HelloWorld in class Document.

Notice that I did not use the font ID /F1 but the font directly (referenced by the global name #Helvetica). renderer setFont:size: takes care of that: it puts the font into the resources and assigns it an internal name which is used in the content stream. This mechanism works for all resource types, so the programmer can always use the appropriate object directly and never needs to care about the internal IDs.
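
The naming mechanism can be pictured with a small sketch. It is plain Python, not PDFtalk code: the class ResourceRegistry, its method name_for and the name prefixes are hypothetical and chosen only for illustration, so the library's actual implementation may well differ.

```python
class ResourceRegistry:
    """Hypothetical sketch: hands out internal names such as /F1, /F2 per resource type."""

    # Assumed prefixes, for illustration only.
    PREFIXES = {"Font": "F", "XObject": "Im"}

    def __init__(self):
        # Maps a resource type ("Font") to a dict {internal name: object}.
        self.entries = {}

    def name_for(self, kind, obj):
        """Return the internal name of obj, registering it on first use."""
        table = self.entries.setdefault(kind, {})
        for name, existing in table.items():
            if existing == obj:
                return name  # already registered: reuse the name
        name = "/%s%d" % (self.PREFIXES[kind], len(table) + 1)
        table[name] = obj
        return name


resources = ResourceRegistry()
first = resources.name_for("Font", "Helvetica")
second = resources.name_for("Font", "Times-Roman")
again = resources.name_for("Font", "Helvetica")
print(first, second, again)  # /F1 /F2 /F1
```

The caller only ever passes the font itself; the internal name appears solely in the generated content stream and in the /Resources dictionary.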

The renderer you get when creating a Page takes care of the /Contents stream with its /Resources dictionary.

Common operators are implemented as renderer methods like fillColor: and showString:, but not all of them. In the end, all these methods boil down to expressions which create operators and add them to the renderer, as done explicitly with the NextLineRelative (Td) operator.
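
As a rough illustration of that layering, here is a minimal sketch in plain Python. Operator and SketchRenderer are hypothetical stand-ins, not the PDFtalk classes; they only show a convenience method that does nothing more than create an operator and add it.

```python
class Operator:
    """Hypothetical stand-in for an operator object: a PDF operator name plus operands."""

    def __init__(self, name, *operands):
        self.name = name
        self.operands = operands

    def __str__(self):
        # PDF uses postfix notation: operands first, operator name last.
        return " ".join([str(o) for o in self.operands] + [self.name])


class SketchRenderer:
    """Collects operators in order; convenience methods only wrap add()."""

    def __init__(self):
        self.operations = []

    def add(self, operator):
        self.operations.append(operator)

    def show_string(self, text):
        # Convenience method: builds a Tj operator and adds it.
        # Escaping of (, ) and \ inside the string is omitted for brevity.
        self.add(Operator("Tj", "(%s)" % text))

    def content(self):
        return "\n".join(str(op) for op in self.operations)


renderer = SketchRenderer()
renderer.add(Operator("BT"))
renderer.add(Operator("Td", 10, 5))   # no convenience method, added directly
renderer.show_string("Hello World")   # convenience method
renderer.add(Operator("ET"))
print(renderer.content())
```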

NextLineRelative is not covered by a convenience method of the renderer, because there are several ways to put text on a page - and using NextLineRelative is not a very practical one.

Painting Text

In order to display text, you need to:

set the relevant graphics state parameters including the font
set the position/Matrix
show the string

While the state parameters are straightforward (see list below), positioning the string is best done using a matrix. The text matrix is set by the SetTextMatrix (Tm) operator with 6 numbers as parameters.

(SetTextMatrix operands: #(1 0 0 1 10 5))

produces

1 0 0 1 10 5 Tm

which sets the scaling to 1 horizontally and vertically and moves the origin to the point 10 @ 5, i.e. at the start of a text object it does the same as 10 5 Td. A transformation matrix can express scaling, rotation, skewing and translation at once. For example

0.95 0 0 1 10 5 Tm

would compress the text horizontally by 5%.

Adobe Illustrator, for example, always sets the font size to 1 and uses the matrix to scale accordingly.

/F1 1 Tf
9.5 0 0 10 10 5 Tm
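
How such matrices are combined can be sketched in a few lines of plain Python. This is an illustration only, independent of PDFtalk; the helper names (multiply, scale, translate, rotate, tm, glyph_scale) are made up for this sketch. The matrix layout [a b c d e f] and the order of concatenation follow the PDF convention of row vectors.

```python
import math


def multiply(m, n):
    """Concatenate two PDF matrices [a b c d e f]: apply m first, then n."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return [
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    ]


def scale(sx, sy):
    return [sx, 0, 0, sy, 0, 0]


def translate(tx, ty):
    return [1, 0, 0, 1, tx, ty]


def rotate(degrees):
    r = math.radians(degrees)
    return [math.cos(r), math.sin(r), -math.sin(r), math.cos(r), 0, 0]


def tm(matrix):
    """Format a matrix as a Tm operator line (adding 0.0 avoids printing -0)."""
    return " ".join("%g" % (round(v, 4) + 0.0) for v in matrix) + " Tm"


def glyph_scale(font_size, matrix):
    """Effective glyph size for an unrotated matrix: font size times matrix scale."""
    return (font_size * matrix[0], font_size * matrix[3])


compressed = multiply(scale(0.95, 1), translate(10, 5))
print(tm(compressed))   # 0.95 0 0 1 10 5 Tm

rotated = multiply(rotate(90), translate(10, 5))
print(tm(rotated))      # 0 1 -1 0 10 5 Tm

# Font size 10 with a 0.95 x 1 matrix and font size 1 with a 9.5 x 10 matrix
# give glyphs of the same size.
print(glyph_scale(10, compressed), glyph_scale(1, [9.5, 0, 0, 10, 10, 5]))
```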

The text state operators are:

TextFont (Tf)
TextRenderingMode (Tr)
CharacterSpacing (Tc)
WordSpacing (Tw)
Leading (TL)
TextRise (Ts)
HorizontalScaling (Tz) (should better be done with Tm).

The two relevant text showing operators are:

ShowText (Tj)
ShowTextPositioned (TJ): shows a string with individual character positioning
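
The operand of TJ is an array that mixes strings with numbers. Each number is given in thousandths of a text space unit and is subtracted from the current horizontal position, so a positive value tightens the gap. A minimal sketch in plain Python of how such an operand could be assembled; the function names and the adjustment values 70 and 30 are made up for illustration:

```python
def escape(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def show_text_positioned(parts):
    """Build a TJ operator line from strings and numeric adjustments."""
    items = []
    for part in parts:
        if isinstance(part, str):
            items.append("(%s)" % escape(part))
        else:
            items.append("%g" % part)
    return "[%s] TJ" % " ".join(items)


def tracked(text, adjustment):
    """Insert the same adjustment between every pair of characters."""
    parts = []
    for index, char in enumerate(text):
        if index > 0:
            parts.append(adjustment)
        parts.append(char)
    return show_text_positioned(parts)


print(show_text_positioned(["A", 70, "V"]))  # [(A) 70 (V)] TJ
print(tracked("Hello", 30))                  # [(H) 30 (e) 30 (l) 30 (l) 30 (o)] TJ
```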

There are no high-level operations for word wrapping, automatic kerning, hyphenation or even simple justification. This is only about putting characters at specific positions. How you get these positions is up to you or your layout program.

For justification, you need to know the length of a string. For this you can use

aFont stringWidthOf: aString at: aFontsize.

Using our example you would write

(Graphics.Fonts.Font fontAt: #Helvetica) stringWidthOf: 'Hello' at: 10.

returning 22.78 which is the width in PostScript points in an unscaled coordinate system.
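
To show how the width feeds into positioning, here is a minimal sketch in plain Python. It does not use PDFtalk: the width table holds just the four Helvetica glyph widths needed for 'Hello' (in thousandths of the font size), and the function names and the 70 point line width are assumptions for the example.

```python
# Glyph widths in 1/1000 of the font size, taken from the standard
# Helvetica metrics; only the letters of 'Hello' are included.
HELVETICA_WIDTHS = {"H": 722, "e": 556, "l": 222, "o": 556}


def string_width(text, font_size, widths=HELVETICA_WIDTHS):
    """Width of text in points: sum of glyph widths scaled by the font size."""
    return sum(widths[ch] for ch in text) * font_size / 1000.0


def x_for(text, font_size, left, right, align):
    """Horizontal start position for left, right or centered alignment."""
    width = string_width(text, font_size)
    if align == "right":
        return right - width
    if align == "center":
        return left + (right - left - width) / 2.0
    return left


print(string_width("Hello", 10))  # 22.78

# Start positions inside a 70 point wide line, rounded for display.
print(round(x_for("Hello", 10, 0, 70, "right"), 2))
print(round(x_for("Hello", 10, 0, 70, "center"), 2))
```

The resulting x value would then go into the translation part of the text matrix.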

Program design

The implementation may look a bit clumsy. Why should you use

renderer add: (NextLineRelative operands: #(10 5))

to get the simple string 10 5 Td?

Firstly, I wanted operators as objects and not just as strings you write into the content stream. The objects can be read from a PDF (try it: in the pdfexplorer, inspect a /Contents object and send it #operations), and the list of operators you create can be written to a PDF. There are some things a program could do with operators:

check the consistency/validity. For example, BeginText (BT) must be written before EndText (ET), the pair must enclose certain text operators, they cannot be nested, and so on.
implement a GraphicsState object to track the changes to it. With this, unnecessary operators can be avoided (this is on my to-do list).
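
Both ideas can be sketched in plain Python. This is not how PDFtalk does it (the second point is not implemented there yet); the function names are hypothetical, operators are reduced to their names or to (operands, name) pairs, and only a small subset of operators is covered.

```python
TEXT_ONLY = {"Tj", "TJ", "Td", "Tm"}                      # illustrative subset
STATE_OPS = {"Tf", "Tc", "Tw", "TL", "Ts", "Tz", "Tr"}    # text state setters


def check_text_objects(operator_names):
    """Return a list of problems found in a sequence of operator names."""
    problems = []
    in_text = False
    for index, name in enumerate(operator_names):
        if name == "BT":
            if in_text:
                problems.append("%d: BT inside a text object" % index)
            in_text = True
        elif name == "ET":
            if not in_text:
                problems.append("%d: ET without BT" % index)
            in_text = False
        elif name in TEXT_ONLY and not in_text:
            problems.append("%d: %s outside BT ... ET" % (index, name))
    if in_text:
        problems.append("missing ET at end of stream")
    return problems


def drop_redundant(operations):
    """Drop state-setting operators that repeat the value already in effect."""
    current = {}   # operator name -> operands currently in effect
    saved = []     # states pushed by q, restored by Q
    kept = []
    for operands, name in operations:
        if name == "q":
            saved.append(dict(current))
        elif name == "Q" and saved:
            current = saved.pop()
        elif name in STATE_OPS:
            if current.get(name) == operands:
                continue  # same value already set: skip the operator
            current[name] = operands
        kept.append((operands, name))
    return kept


print(check_text_objects(["k", "BT", "Tf", "Td", "Tj", "ET"]))  # []
print(check_text_objects(["BT", "BT", "ET"]))

ops = [(("/F1", 10), "Tf"), (("(Hello)",), "Tj"),
       (("/F1", 10), "Tf"), (("(World)",), "Tj")]
print(len(drop_redundant(ops)))  # 3
```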

In any case, it is good to have operators referable in the development image.

Secondly, this clumsy interface is meant to be used as backend by a higher level graphics framework. It is expected that you have an abstraction of Text which can render itself using the PDF primitives. In smallCharts I have Texts like

ChartText
  style: (Textstyle
    color: (CmykColor cyan: 1 magenta: 0.3 yellow: 0 black: 0.3)
    font: #Helvetica
    size: 12
    trackKerning: -30
    withoutLeftSideBearing: true
    scale: 0.95 @ 1)
  string: 'Hello'
  position: 10 @ 5

which renders itself as PDF with

ChartText>>renderPDFOn: aPDFRenderer
  aPDFRenderer textRenderingMode: 0.
  aPDFRenderer fillColor: self style color.
  aPDFRenderer textObjectDo: [
    self style renderPDFOn: aPDFRenderer.
    aPDFRenderer textMatrix: self pdfMatrixArray.
    aPDFRenderer showString: (self pdfStringFor: aPDFRenderer)]

I did not open-source my graphics classes, because they are specific to the needs of smallCharts. It does vector graphics and a bit of text. For example, I have classes for horizontal and vertical lines, since charts use mostly those. My objects can only scale and translate, but not rotate - they don't need to… This may be different for others.

I like to develop my abstractions from the bottom up and try to keep them as simple as possible. Maybe, over time, users will develop abstractions which are generally useful. In the end, what gets included should be decided by community discussion and consensus. For now, only the bare metal of the spec is available and you have to evolve your own abstractions.

Comments

Higher level abstractions

Submitted by bobcalco on Tue, 2012-01-24 10:32.

I have found Prawn, a pure Ruby PDF generation library, to have a good level of abstraction, and would recommend you look at how they've done it.

http://prawn.majesticseacreature.com/

They've taken a more top-down approach, with its attendant pluses and minuses, but their first concern was making it natural and Ruby-esque to code PDF files. Given Ruby's Smalltalk pedigree, I suspect we would not be far off the mark to consider their high level API as a kind of model.

I know about this because I used an earlier version of Prawn to code a PDF generation feature of a content delivery system, which I am now needing to replace in Smalltalk, having decided to make the switch. I am a bit sad at the state of PDF generation in Smalltalk. There are so many other strengths in Smalltalk for the kind of distributed system I am building that wooed me over, but this deficiency is going to cost me some late nights and lamp oil.

Here is a PDF manual

Submitted by bobcalco on Tue, 2012-01-24 12:03.

Here is a PDF manual explaining most of the current feature set of Prawn:

http://prawn.majesticseacreature.com/manual.pdf

Re: Higher level abstractions

Submitted by ChristianHaider on Tue, 2012-01-24 11:58.

Interesting. I am curious what experiences you have while porting from Prawn to PDF4Smalltalk. Maybe some good concepts can be integrated… If you have any questions, please ask in the forum - sometimes I am responsive

Thanks!

Submitted by bobcalco on Tue, 2012-01-24 12:06.

I will post in the forum henceforth but just want to mention that I *am* glad you have done what you have so far. I was despairing altogether until I found your library in the public repository. Thank you for this library! :)
