This is an old revision of the document!


Monsters from the wild

Some PDF writers produce PDFs which are not correct according to the specification. The term monster refers to lakatosian monsters as coined by Imre Lakatos to refer to counterexamples of a theory.

Software trying to read real PDF files, cannot just throw an error when something is wrong. Instead, it should deal with wrong structures and try to use as much information as possible from the file.

Generally, situations like this will raise a proceedable specific error. Therefore, the error could be treated by the reading software, but could also be ignored if it is not important.

This page describes some of the problems encountered in real PDFs from the wild and discusses ways to deal with such situations.

An attribute of an object has a reference pointing to a free reference in the cross references.

Example

Referencing indirect object (2 0): the /Outlines

1 0 obj
	<<	/Type /Catalog
		/Outlines 2 0 R
		/Pages 3 0 R
	>>
endobj

The cross reference section

xref
0 7
0000000000 65535 f 
0000000009 00000 n 
0000000000 65535 f 
0000000131 00000 n 
... % 4 more

The reference to object (2 0) is the 3rd entry in the xref table which is a free reference (0000000000 65535 f). Therefore, object (2 0) cannot be accessed from the xref table. It does not matter if the object is actually stored in the file or not.

Handling

A proceedable MissingObjectError is raised. The reference points to a MissingObject containing the expected type.

When writing out the reference to a new PDF, a string (The original object is missing) is written as object instead (if type information is available, it is added to the message). This preserves the correct reference which may be used in several places. Subsequently, this will result in a type mismatch when reading that PDF (unless the original object was a string as well, which is unlikely).

Reference

Seen in /Info/Producer: Microsoft® Excel® for Microsoft 365 and Bluebeam PDF Library 18

The /Length of a stream is 1 smaller than the number of bytes.

Example

42 0 obj
	<</Length 9>>
	stream
abcdefghij
	endstream
endobj

In the example, the stream contents in the file is abcdefghij, a 10 byte string. But the /Length attribute states 9 bytes. Therefore, the j is extra. If the error is resumed, a stream with abcdefghi will be created.

Handling

The library handles one specific instance of this error: when there is exactly one byte too much between stream and endstream (trailing whitespace is ignored). Then, a proceedable error (ExtraCharacterInStreamError) is raised with the extra character as parameter. When proceeding, a stream will be created with the /Length given number of content bytes and the extra byte is discarted.

If there are more bytes extra, a ReadError is raised and no stream object is created. The error may be proceeded, but if the stream is used later, another error will occur.

Reference

Seen in /Info/Producer: Bluebeam PDF Library 18

  • monsters.1627054869.txt.gz
  • Last modified: 2021/07/23 17:41
  • by christian