app/tools/halibut/charset/README



This subdirectory contains a general character-set conversion
library, used in Timber, and available for use in other software if
it should happen to be useful.

I intend to use this same library in other programs at some future
date. (A cut-down version of it is already in use in some ports of
PuTTY.) It is therefore a _strong_ design goal that this library
should remain perfectly general, and not tied to particulars of
Timber. It must not reference any code outside its own subdirectory;
it should not have Timber-specific helper routines added to it
unless they can be documented in a general manner which might make
them useful in other circumstances as well.

There are some multibyte character encodings which this library does
not currently support. Those that I know of are:

 - Johab. There is no reason why we _shouldn't_ support this, but it
   wasn't immediately necessary at the time I did the initial
   coding. If anyone needs it, it shouldn't be too hard. The Unicode
   mapping table for the encoding is available at
   http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/JOHAB.TXT

 - ISO-2022-JP-1 (RFC 2237), and ISO-2022-JP-2 (RFC 1554). These
   should be even easier if required - we already have the ISO 2022
   machinery in place, and support all the underlying character
   sets.

 - ISO-2022-CN and ISO-2022-CN-EXT (RFC 1922). These are a little tricky
   as they allow use of both GB2312 (simplified Chinese) and CNS 11643
   (traditional Chinese), so we may need some way to specify which to
   prefer.

 - The Hong Kong (HKSCS) extension to Big5. Again, mapping tables
   are available in the Unihan database.

 - Other Big5 extensions, which I don't have mapping tables for at
   all.
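
For anyone picking up the Johab item above, the first step would
probably be reading the mapping table. The following is only a
minimal sketch, assuming the usual unicode.org table layout (one
mapping per line, two hex fields, then a '#' comment).
parse_mapping_line() is a hypothetical helper invented for this
illustration; it is not part of this library.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of this library: parse one line of a
 * unicode.org-style mapping table, assumed to look like
 *
 *     0xXXXX<TAB>0xYYYY<TAB># character name
 *
 * Returns 1 and fills in *code and *unicode on success. Returns 0
 * for comment lines, blank lines, and anything else that does not
 * begin with two hex fields.
 */
static int parse_mapping_line(const char *line,
                              unsigned long *code,
                              unsigned long *unicode)
{
    assert(line != NULL && code != NULL && unicode != NULL);

    /* Skip leading whitespace so indented comments are ignored too. */
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '#' || *line == '\0' || *line == '\n')
        return 0;

    /* "%lx" accepts an optional "0x" prefix, like strtoul base 16. */
    return sscanf(line, "%lx %lx", code, unicode) == 2;
}
```

A caller would typically feed this one line at a time from fgets()
and collect the resulting pairs into whatever table format the
conversion code wants. That part is deliberately left out here,
since it depends on how the existing tables are built.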