Differences

This shows you the differences between two versions of the page.

cmap [2020/02/23 13:35]
christian [Wrong PostScript]
cmap [2020/02/23 14:33] (current)
christian [CMap]
Line 1: Line 1:
 ====== CMap ====== ====== CMap ======
  
-CMaps(([[https://www-cdf.fnal.gov/offline/PostScript/5014.CIDFont_Spec.pdf|5014.CIDFont_Spec.pdf]] Adobe CMap and CIDFont Files Specification)) (Character Maps) define unidirectional mapping from a code to another. +CMaps(([[https://www-cdf.fnal.gov/offline/PostScript/5014.CIDFont_Spec.pdf|5014.CIDFont_Spec.pdf]] Adobe CMap and CIDFont Files Specification)) (Character Maps) define a unidirectional mapping from one code to another. (This should not be confused with the cmap table(([[https://docs.microsoft.com/en-us/typography/opentype/spec/cmap|cmap — Character to Glyph Index Mapping Table]])) of an OpenType font.)
  
 CMaps provide a very general mechanism which can describe any mapping, including Unicode, which was developed later. Input codes of variable length (1, 2, 3 or more bytes) can be mapped to characters. CMaps provide a very general mechanism which can describe any mapping, including Unicode, which was developed later. Input codes of variable length (1, 2, 3 or more bytes) can be mapped to characters.
Line 126: Line 126:
 ===== Monster from the wild ===== ===== Monster from the wild =====
  
 +CMaps are not well defined. Therefore, there are some interesting variations of them in the wild. Here is a small selection of issues.
 +==== Codespace problems ====
  
-==== Mappings outside the codespace ====+=== Wrong code length ===
  
 <code postscript> <code postscript>
Line 146: Line 148:
  
 This can be seen often. These illegal mappings are collected into the ''#unmapped'' variable of a Mappings object. This can be seen often. These illegal mappings are collected into the ''#unmapped'' variable of a Mappings object.
 +
 +=== Mappings outside the codespace ===
 +
 +<code postscript>
 +%...
 +1 begincodespacerange
 +<0001> <1004>
 +endcodespacerange
 +11 beginbfchar
 +<0003> <00A0>
 +<0005> <0022>
 +<0008> <0025>
 +<000F> <002C>
 +<0010> <00AD>
 +%...
 +</code>
 +
 +Here, only the first of the mappings shown lies within the codespace. All others fall outside of it, because the second byte has to be between <01> and <04>.
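
The byte-wise check described above can be sketched in Python. This is only an illustration, assuming that a code lies in a codespace range when it has the same length as the bounds and each of its bytes lies between the corresponding bytes of the low and high bound. The function name ''in_codespace'' is hypothetical and not part of any library mentioned on this page.

```python
def in_codespace(code: bytes, low: bytes, high: bytes) -> bool:
    """Return True if `code` lies inside the codespace range [low, high].

    Assumption: the range is checked byte by byte, so every byte of the
    code must lie between the corresponding bytes of the two bounds.
    """
    if not (len(code) == len(low) == len(high)):
        return False  # a code of a different length never matches this range
    return all(lo <= b <= hi for b, lo, hi in zip(code, low, high))


# Codespace range and source codes taken from the example above.
low, high = bytes.fromhex("0001"), bytes.fromhex("1004")
sources = ["0003", "0005", "0008", "000F", "0010"]

# Keep only the source codes whose bytes all fall inside the range.
inside = [s for s in sources if in_codespace(bytes.fromhex(s), low, high)]
print(inside)
```

With the range <0001> <1004>, the first byte may be anything from 00 to 10, but the second byte is restricted to 01 through 04, which is why <0005> and the later codes are rejected.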
  
 ==== Wrong PostScript ==== ==== Wrong PostScript ====
Line 196: Line 216:
 </code> </code>
  
-It looks as if two codes (<24> and <50>) are mapped to a string of 2-byte characters. I have not found anything about this in the documenation. Seen in a PDF with the ''Producer'' "Mac OS X 10.7.1 Quartz PDFContext".+Two codes (<24> and <50>) are mapped to a string of 2-byte characters. This is defined by the PDF spec(({{pdf:pdf32000_2008.pdf|PDF specification (ISO standard PDF 32000-1:2008)}})) in section 9.10.3 "ToUnicode CMaps". This has not been implemented yet. 
 + 
 +Seen in a PDF with the ''Producer'' "Mac OS X 10.7.1 Quartz PDFContext".
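
As a rough illustration of what such a mapping means: in a ToUnicode CMap, the destination of a ''bfchar'' entry is a UTF-16BE encoded string, so a destination with more than two bytes can stand for several characters. The sketch below assumes exactly that. The helper name ''decode_destination'' and the sample values are hypothetical and are not taken from the CMap shown above.

```python
def decode_destination(hex_string: str) -> str:
    """Decode the hex digits of a bfchar destination as UTF-16BE text."""
    return bytes.fromhex(hex_string).decode("utf-16-be")


# Hypothetical destination values, chosen only for illustration.
single = decode_destination("0041")        # one 2-byte unit
ligature = decode_destination("00660069")  # two 2-byte units: 'f' and 'i'
print(single, ligature)
```

A single source code can therefore expand to a multi-character string, for example when a ligature glyph should be extracted as its component letters.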
  • Last modified: 2020/02/23 13:35
  • by christian