This is an old revision of the document!


Monsters from the wild

Some PDF writers produce PDFs which are not correct according to the specification. The term monster refers to lakatosian monsters as coined by Imre Lakatos to refer to counterexamples of a theory.

Software trying to read real PDF files, cannot just throw an error when something is wrong. Instead, it should deal with wrong structures and try to use as much information as possible from the file.

Generally, situations like this will raise a proceedable specific error. Therefore, the error could be treated by the reading software, but could also be ignored if it is not important.

This page describes some of the problems encountered in real PDFs from the wild and discusses ways to deal with such situations.

An attribute of an object has a reference pointing to a free reference in the cross references, i.e. the object is not in the file. This will raise a MissingObjectError with a MissingObject containing the type information for the expected object as parameter.

When writing out the reference to a new PDF, a string (The original object is missing. It should have been a <Type>) is written as object instead. This preserves the correct reference which may be used in several places. Subsequently, this will result in a type mismatch when reading that PDF (unless the original object was a string as well, which is unlikely).

Seen in Producer

  • Microsoft® Excel® for Microsoft 365
  • Bluebeam PDF Library 18

The dictionary part of a stream has a Length which is different from the number of bytes between the stream and endstream tokens.

Seen in Producer

  • Bluebeam PDF Library 18
  • monsters.1626764480.txt.gz
  • Last modified: 2021/07/20 09:01
  • by christian