Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
monsters [2021/07/13 16:19] christian created |
monsters [2021/07/24 08:12] (current) christian [Incorrect stream length] |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Monsters from the wild ====== | ====== Monsters from the wild ====== | ||
- | Some PDF writers produce PDFs which are not correct according to the specification. | + | Some PDF writers produce PDFs which are not correct according to the specification. The term '' |
+ | Software that reads real-world PDF files cannot simply throw an error when something is wrong. Instead, it should cope with malformed structures and recover as much information from the file as possible. | ||
+ | Generally, such situations raise a specific, proceedable error. The reading software can therefore handle the error, or ignore it if it is not important. | ||
+ | |||
+ | This page describes some of the problems encountered in real PDFs from the wild and discusses ways to deal with such situations. | ||
+ | |||
+ | ===== Missing object ===== | ||
+ | |||
+ | An attribute of an object holds a reference to an object whose entry in the cross-reference table is marked as free. | ||
+ | |||
+ | === Example === | ||
+ | |||
+ | Referencing indirect object (2 0): the ''/ | ||
+ | |||
+ | <code pdf> | ||
+ | 1 0 obj | ||
+ | << | ||
+ | /Outlines 2 0 R | ||
+ | /Pages 3 0 R | ||
+ | >> | ||
+ | endobj | ||
+ | </ | ||
+ | |||
+ | The cross-reference section: | ||
+ | <code pdf> | ||
+ | xref | ||
+ | 0 7 | ||
+ | 0000000000 65535 f | ||
+ | 0000000009 00000 n | ||
+ | 0000000000 65535 f | ||
+ | 0000000131 00000 n | ||
+ | ... % 4 more | ||
+ | </ | ||
+ | |||
+ | The reference to object (2 0) is the 3rd entry in the '' | ||
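The lookup described in this example can be sketched in Python. This is a minimal illustration, not the library's implementation: the names `parse_xref_subsection` and `is_missing` are hypothetical, and the cross-reference excerpt is shortened to the four entries shown above (so its header reads `0 4` here).

```python
# Shortened excerpt: only the four entries shown in the example (assumption).
XREF = """xref
0 4
0000000000 65535 f
0000000009 00000 n
0000000000 65535 f
0000000131 00000 n
"""


def parse_xref_subsection(text):
    """Return {object_number: (offset, generation, in_use)} for one subsection."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if lines[0] != "xref":
        raise ValueError("not a cross-reference section")
    first, count = (int(n) for n in lines[1].split())
    entries = {}
    for index, line in enumerate(lines[2:2 + count]):
        offset, generation, kind = line.split()
        # 'n' marks an in-use entry, 'f' marks a free entry.
        entries[first + index] = (int(offset), int(generation), kind == "n")
    return entries


def is_missing(entries, number, generation):
    """True if the reference 'number generation R' does not resolve to an in-use object."""
    entry = entries.get(number)
    if entry is None:
        return True
    _offset, entry_generation, in_use = entry
    return not in_use or entry_generation != generation


entries = parse_xref_subsection(XREF)
print(is_missing(entries, 2, 0))  # the /Outlines reference points to a free entry
```

A reader built this way can detect the dangling reference before trying to load the object, which leaves room to raise an error that the caller may handle or ignore.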
+ | |||
+ | === Handling === | ||
+ | |||
+ | A proceedable '' | ||
+ | |||
+ | When writing out the reference to a new PDF, a string '' | ||
+ | |||
+ | === Reference === | ||
+ | |||
+ | Seen in ''/ | ||
+ | |||
+ | ===== Incorrect stream length ===== | ||
+ | |||
+ | The ''/ | ||
+ | |||
+ | The following cases are possible: | ||
+ | * ''/ | ||
+ | * ''/ | ||
+ | |||
+ | The particular monster in which I encountered this always had one byte too many in the content. Therefore, the general problem was not handled, only the simple case where the content is exactly 1 byte longer than the number of bytes given by the ''/ | ||
+ | |||
+ | === Example === | ||
+ | |||
+ | < | ||
+ | 42 0 obj | ||
+ | <</ | ||
+ | stream | ||
+ | abcdefghij | ||
+ | endstream | ||
+ | endobj | ||
+ | </ | ||
+ | |||
+ | In the example, the stream content in the file is '' | ||
+ | |||
+ | === Handling === | ||
+ | |||
+ | The library handles one specific instance of this error: when there is exactly one byte too many between '' | ||
+ | |||
+ | If there are more extra bytes, a '' | ||
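The distinction between "exactly one extra byte" and "more extra bytes" can be sketched in Python. This is only an illustration under stated assumptions: `surplus_bytes` and `classify` are hypothetical names, the declared length of 9 in the sample object is invented for the sketch, and the naive search for `endstream` would fail if the stream data itself contained that keyword.

```python
def surplus_bytes(raw_object, declared_length):
    """Return how many bytes the stream content exceeds the declared length by."""
    start = raw_object.index(b"stream") + len(b"stream")
    # The 'stream' keyword is followed by CRLF or LF, which is not part of the data.
    if raw_object.startswith(b"\r\n", start):
        start += 2
    elif raw_object.startswith(b"\n", start):
        start += 1
    end = raw_object.index(b"endstream", start)
    content = raw_object[start:end]
    # One end-of-line marker before 'endstream' is not counted as stream data.
    for eol in (b"\r\n", b"\n", b"\r"):
        if content.endswith(eol):
            content = content[:-len(eol)]
            break
    return len(content) - declared_length


def classify(surplus):
    """Map the surplus to the cases discussed in this section."""
    if surplus == 0:
        return "ok"
    if surplus == 1:
        return "one extra byte"  # the single case said to be handled
    return "error"  # everything else is reported to the caller


# Hypothetical object: 10 content bytes, but a declared length of 9 (assumption).
raw = b"42 0 obj\n<</Length 9>>\nstream\nabcdefghij\nendstream\nendobj\n"
print(classify(surplus_bytes(raw, 9)))
```

Keeping the measurement (`surplus_bytes`) separate from the policy (`classify`) makes it straightforward to tolerate further cases later without touching the byte counting.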
+ | |||
+ | == Known problem == | ||
+ | |||
+ | The general problem has not been addressed. One idea is to find the end of the stream content of the current object. With this information it is possible to determine whether the ''/ | ||
+ | |||
+ | The end of the stream would be before the '' | ||
+ | |||
+ | Object streams need not be considered, because they cannot contain streams. | ||
+ | |||
+ | This should be easy for the simple case of only one xref table. But handling several xrefs from different updates was deemed too complex at the time (that' | ||
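For the simple case of a single xref table, the idea could look roughly like the following Python sketch. It is not part of the library: `next_object_offset` and `find_endstream` are hypothetical names, the offsets are assumed to come from the in-use entries of one cross-reference table, and the sample file is invented for the illustration.

```python
def next_object_offset(offsets, current_offset, fallback_end):
    """Return the smallest offset after current_offset, or fallback_end if none."""
    later = [offset for offset in offsets if offset > current_offset]
    return min(later) if later else fallback_end


def find_endstream(data, offsets, current_offset):
    """Locate the last 'endstream' keyword inside the current object.

    The search is bounded by the start of the next object, so the declared
    length is not needed. Returns -1 if no keyword is found.
    """
    bound = next_object_offset(offsets, current_offset, len(data))
    return data.rfind(b"endstream", current_offset, bound)


# Invented two-object file; the declared lengths are deliberately wrong.
header = b"%PDF-1.4\n"
obj1 = b"1 0 obj\n<</Length 3>>\nstream\nabcdefghij\nendstream\nendobj\n"
obj2 = b"2 0 obj\n<</Length 2>>\nstream\nxy\nendstream\nendobj\n"
data = header + obj1 + obj2
offsets = [len(header), len(header) + len(obj1)]

end = find_endstream(data, offsets, offsets[0])
print(end, data[end:end + 9])
```

Searching backwards from the next object's offset avoids false matches on an `endstream` byte sequence that appears earlier inside the stream data. With several xref sections from incremental updates, the offsets would first have to be merged, which is the part described above as too complex at the time.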
+ | |||
+ | === Reference === | ||
+ | |||
+ | Seen in ''/ |