Differences

This shows you the differences between two versions of the page.

cmap [2020/02/23 11:11]
christian [Prevent copying]
cmap [2020/02/23 14:33] (current)
christian [CMap]
Line 1: Line 1:
 ====== CMap ======
  
-CMaps(([[https://www-cdf.fnal.gov/offline/PostScript/5014.CIDFont_Spec.pdf|5014.CIDFont_Spec.pdf]] Adobe CMap and CIDFont Files Specification)) (Character Maps) define a unidirectional mapping from one code to another.
 +CMaps(([[https://www-cdf.fnal.gov/offline/PostScript/5014.CIDFont_Spec.pdf|5014.CIDFont_Spec.pdf]] Adobe CMap and CIDFont Files Specification)) (Character Maps) define a unidirectional mapping from one code to another. (This should not be confused with the cmap table(([[https://docs.microsoft.com/en-us/typography/opentype/spec/cmap|cmap — Character to Glyph Index Mapping Table]])) of an OpenType font.)
  
 CMaps provide a very general mechanism that can describe any mapping, including Unicode, which was developed later. Input codes of variable length (1, 2, 3 or more bytes) can be mapped to characters.
Line 124: Line 124:
   * the mappings are ordered. This is not strictly prescribed, but recommended by the specifications.
  
-==== Handling malformed CMaps ==== 
- 
-Sometimes CMaps define mappings which are not covered by the codespace ranges. This can be seen very often in the wild. These illegal mappings are collected into the ''#unmapped'' variable of a Mappings object. 
 ===== Monster from the wild =====
  
-==== Mappings outside the codespace ====
 +CMaps are not well defined, so there are some interesting variations of them in the wild. Here is a small selection of issues.
 +
 +==== Codespace problems ====
  
-single byte mappings in a double byte codespace
 +=== Wrong code length ===
  
-==== Wrong PostScript ====
 +<code postscript>
 +%... 
 +1 begincodespacerange 
 +<0000> <FFFF> 
 +endcodespacerange 
 +27 beginbfchar 
 +<20> <0020> 
 +<2E> <002E> 
 +<43> <0043> 
 +<44> <0044> 
 +<45> <0045> 
 +%... 
 +</code>
  
-using /find instead of /findresource
 +Here, single-byte mappings appear in a double-byte codespace, which is not correct according to the documentation.
  
 +This can be seen often. These illegal mappings are collected into the ''#unmapped'' variable of a Mappings object.
 +
 +=== Mappings outside the codespace ===
 +
 +<code postscript>
 +%...
 +1 begincodespacerange
 +<0001> <1004>
 +endcodespacerange
 +11 beginbfchar
 +<0003> <00A0>
 +<0005> <0022>
 +<0008> <0025>
 +<000F> <002C>
 +<0010> <00AD>
 +%...
 +</code>
 +
 +Here, only the first mapping matches the codespace. All others fall outside of it, because the second byte has to be between <01> and <04>.
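
The byte-wise check behind both codespace problems can be sketched in a few lines of Python. This is only an illustration, not the implementation used by the project described on this page: the function name ''in_codespace'' and the plain list ''unmapped'' are assumptions, and the sketch assumes that a codespace range constrains each byte of a code independently.

```python
def in_codespace(code: bytes, low: bytes, high: bytes) -> bool:
    """Return True if `code` has the same length as the range bounds and
    every byte lies between the corresponding bytes of `low` and `high`."""
    if not (len(code) == len(low) == len(high)):
        # Wrong code length, e.g. <20> in a <0000>..<FFFF> codespace.
        return False
    return all(lo <= c <= hi for c, lo, hi in zip(code, low, high))


# Codespace and source codes taken from the second example above.
low, high = bytes.fromhex("0001"), bytes.fromhex("1004")
codes = ["0003", "0005", "0008", "000F", "0010"]

# Hypothetical stand-in for the #unmapped collection mentioned above.
unmapped = [c for c in codes if not in_codespace(bytes.fromhex(c), low, high)]
print(unmapped)  # ['0005', '0008', '000F', '0010']
```

Comparing byte by byte, rather than treating the code as a single integer, is what makes <0005> fall outside <0001>..<1004> although 0x0005 lies numerically between the two bounds.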
 +
 +==== Wrong PostScript ====
 +
 +On one occasion, I saw a CMap where the PostScript used a nonexistent operator (''/find'' instead of ''/findresource''). See the [[postscript#exception_handling_example]] on the PostScript page.
 ==== Prevent copying ====
  
Line 163: Line 196:
 Here, all codes map to the same character (Substitute character, Ctrl-Z) to prevent extracting the text. Also interesting is the ordering by the second byte, which forced me to redesign the object structure to avoid exponential processing time.
  
-Seen in [[https://github.com/adobe-type-tools/Adobe-CNS1/Adobe-CNS1-7.pdf|The Adobe-CNS1-7 Character Collection]].
 +Seen in [[https://github.com/adobe-type-tools/Adobe-CNS1/raw/master/Adobe-CNS1-7.pdf|The Adobe-CNS1-7 Character Collection]].
 ==== Char to string mapping ====
  
Line 183: Line 216:
 </code>
  
-It looks as if two codes (<24> and <50>) are mapped to a string of 2-byte characters. I have not found anything about this in the documentation. Seen in a PDF with the ''Producer'' "Mac OS X 10.7.1 Quartz PDFContext".
 +Two codes (<24> and <50>) are mapped to a string of 2-byte characters. This is defined by the PDF spec(({{pdf:pdf32000_2008.pdf|PDF specification (ISO standard PDF 32000-1:2008)}})) in section 9.10.3 "ToUnicode CMaps". This has not been implemented yet.
 + 
 +Seen in a PDF with the ''Producer'' "Mac OS X 10.7.1 Quartz PDFContext".
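
As a rough sketch of what such a mapping means: in a ToUnicode CMap the destination of a ''bfchar'' entry is a string of UTF-16BE code units, so one source code can yield several characters. The Python helper below is hypothetical (the name ''decode_bf_destination'' is made up for illustration) and the hex values are invented samples, not the ones from the PDF mentioned above.

```python
def decode_bf_destination(hex_string: str) -> str:
    """Decode the hex digits of a bfchar destination as UTF-16BE text."""
    return bytes.fromhex(hex_string).decode("utf-16-be")


# Invented sample: one code mapped to the two characters "f" and "i".
print(decode_bf_destination("00660069"))  # fi

# A surrogate pair decodes to a single character outside the BMP.
print(len(decode_bf_destination("D835DC00")))  # 1
```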
  • cmap.1582452699.txt.gz
  • Last modified: 2020/02/23 11:11
  • by christian