* Announcing CJK 2.5 (Chin/Jap/Kor for LaTeX2e)
From: Werner Lemberg @ 1995-01-19 23:32 UTC
  To: unicode, soft-authors, ctan-ann, linux-asia, sam-fans

This is the LaTeX2e style package CJK Version 2.4 (3-Jan-1995)

It contains the following files:

    history.txt  Package history
    CJK.txt      This file
    CJK.sty      A LaTeX2e style file to enable CJK (Chinese/Japanese/Korean)
                 logographs (i.e. Hanzi/Kanji/Hangul) with LaTeX2e
    CJK.enc      Master Encoding File
    pmCbig.enc   Encoding scheme files
    pmC.chr      Character encoding files
    Bg5conv.tex  preprocessor for Big 5 encoded text files
    bg5latex.bat a batch file (for DOS) to demonstrate use of Bg5conv.tex
    CNS.chr      CNS encoding to be used together with a different CJK
                 encoding following Christian Wittern's CEF (Chinese Encoding
                 Framework)
    UGBt.fd      Font definition files for Chinese (examples only!)
    UJIS.fd      Font definition file for Japanese (example only!)
    Uhangul.fd   Font definition file for standard Hangul fonts
    Uhanja.fd    Font definition file for Hanja font (example only!)
    Uutf8.fd     Font definition file for Unicode font (example only!)
    UpmC-KS.fd   Font definition files for (old) pmC-fonts
    UCNS-7.fd    Font definition files for CNS fonts (examples only!)
    tfm/*.tfm    virtual fonts and metric files for hangul standard fonts to
                 use in combination with the font libraries lj_han and lj_han1
                 (available at the CTAN hosts)

    utils/hbf2gf.w          CWEB source file for hbf2gf
    utils/hbf2gf.c          C code file extracted from the CWEB source files
    utils/hbf2gf.dvi        Documentation extracted from the CWEB source files
    utils/hbf2gf.cfg        Configuration file example
    utils/hbf2gf.exe        Bound executable for DOS and OS/2
    utils/hbf.c             Ross Paterson's HBF API (with small extensions)
    utils/Makefile          Makefile for hbf2gf
    utils/rsx.exe           Runtime binaries for DOS and OS/2 (must be in
                            the path)

This is freely distributable under the GNU General Public License.


Use CJK.sty as a package, e.g.

    \usepackage{CJK}            .

Two new environments \begin{CJK}{encoding}{shape} ... \end{CJK} and
\begin{CJK*}{encoding}{shape} ... \end{CJK*} are defined:

    encoding        the following encodings are currently implemented in
                    CJK.enc (for CNS encoding see below):

                        Bg5  (Big 5)
                        GBs  (GuoBiao with simplified characters,
                          G1 = GB 2312-80)
                        GBt  (GuoBiao with traditional characters,
                          G1 = GB 12345-90)
                        JIS  (Japanese Industry Standard,
                          G1 = JIS X0208-1990)
                        KS   (hangul and hanja,
                          G1 = KSC 5601-1987)
                        utf8 (UTF 8 (Unicode Transformation format 8), also
                              called UTF 2 or FSS-UTF)

                    The encodings (except Big 5 and UTF 8) are simplified EUC
                    (Extended UNIX Code) character sets without single shifts.
                    The character set slot G1 stands for two byte encodings
                    with byte values taken from the GR (Graphic Right)
                    character range 0xA1-0xFE (as defined in ISO 2022).

                    For compatibility with the pmC package these additional
                    encodings are defined: pmC-Bg5, pmC-GBs, pmC-GBt, pmC-JIS,
                    and pmC-KS. Using these encodings is discouraged because
                    they waste fonts. If possible, convert your
                    original CJK-bitmaps with hbf2gf (see below) to

    shape           It is impossible to know what fonts are available at your
                    site; look at the example .fd-files to see how to create
                    appropriate .fd-files suiting your needs. If you use the
                    KS environment, this parameter is unused (see below).

    The CJK* environment will swallow unprotected spaces and newlines after a
    CJK character, whereas CJK will not.

This is a very realistic example:

    Text in GuoBiao encoding

How it works

Asiatic logographs can't be represented with one byte per character. (At
least) two bytes are needed, and the most common encoding schemes (GB, Big 5,
JIS, KS etc.) have a certain range for the first byte (usually 0xA1-0xFE or a
part of it) which signals that this and the next byte represent an Asiatic
logograph. This means that plain ASCII-text (i.e. characters between 0x00 and
0x7F) will be left undisturbed, and most characters of the extended ASCII
character set (0x80-0xFF) will be assigned to a CJK encoding.
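The scanning rule just described can be sketched in a few lines of Python. This is only an illustration of the byte ranges given above, not the way CJK.sty itself works (that is done with active characters in TeX); the function name `split_cjk` is made up for this example.

```python
def split_cjk(data: bytes):
    """Split a byte string into single bytes and two-byte CJK characters.

    Assumption: every byte in 0xA1-0xFE starts a two-byte character,
    as in the simplified EUC encodings described in the text.
    """
    tokens = []
    i = 0
    while i < len(data):
        byte = data[i]
        if 0xA1 <= byte <= 0xFE and i + 1 < len(data):
            # First byte signals that this and the next byte belong together.
            tokens.append(("cjk", data[i:i + 2]))
            i += 2
        else:
            # Plain ASCII (0x00-0x7F) and anything else is passed through.
            tokens.append(("byte", data[i:i + 1]))
            i += 1
    return tokens


# "TeX " followed by two GB-encoded characters (0xD6D0 and 0xCEC4).
sample = b"TeX " + bytes([0xD6, 0xD0, 0xCE, 0xC4])
for kind, raw in split_cjk(sample):
    print(kind, raw.hex())
```

The ASCII part comes out byte by byte, and each CJK character comes out as one two-byte token.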

Due to the internal architecture of TeX it is impossible to support ISO 2022
escape sequences as used with MULE (Multi Language Emacs). MULE is a common
extension of GNU-Emacs to support many non-English scripts, including Chinese,
Japanese, Korean, Hebrew etc.

CJK.sty will make the characters 0xA1-0xFE active inside of an environment and
assign the macros \CJK@char and \CJK@charx to the active characters which
select the proper font. The real mechanism is a bit more complicated to assure
robustness (it was borrowed and modified from german.sty) and correct handling
of punctuation characters.

The encodings

CJK.sty defines internally \CJK@standardEncoding, \CJK@Bg5Encoding,
\CJK@KSEncoding, \CJK@utf8Encoding, and for compatibility with pmC,
\CJK@pmCsmallEncoding and \CJK@pmCbigEncoding.

\CJK@standardEncoding will be used for encodings with the second byte in the
range 0xA1-0xFE (GB, JIS).

\CJK@Bg5Encoding will be used for Big 5 encoding (e.g. NTU fonts) with the
second byte in the range 0x40-0xFE.

\CJK@KSEncoding will be used for KS encoding. Two sets of subfonts are
defined, one for Hangul syllables and elements, and a second for Hanja. For
more details see below.

\CJK@utf8Encoding will be used for Unicode in UTF 8. The first byte is in the
range 0xC0-0xDF for two byte values and in the range 0xE0-0xEF for three byte
values. The other bytes are in the range 0x80-0xBF. Note that CJK expects two
hexadecimal digits as a running number in the font name instead of two decimal
digits. Use the option `unicode on' if you use hbf2gf to transform bitmap
fonts in HBF format to .pk fonts as used by CJK.sty .
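The byte ranges for UTF 8 given above can be checked with a small Python sketch. The helper names `utf8_length` and `is_continuation` are invented for this illustration; only the ranges themselves come from the text.

```python
def utf8_length(first_byte: int) -> int:
    """Return the sequence length announced by a UTF 8 first byte.

    Only the ranges named in the text are handled: one byte for ASCII,
    two bytes for 0xC0-0xDF, three bytes for 0xE0-0xEF.
    """
    if first_byte <= 0x7F:
        return 1
    if 0xC0 <= first_byte <= 0xDF:
        return 2
    if 0xE0 <= first_byte <= 0xEF:
        return 3
    raise ValueError(f"not a supported first byte: 0x{first_byte:02X}")


def is_continuation(byte: int) -> bool:
    """The other bytes of a sequence are in the range 0x80-0xBF."""
    return 0x80 <= byte <= 0xBF


# U+4E2D is encoded as the three bytes E4 B8 AD.
encoded = "\u4e2d".encode("utf-8")
print(encoded.hex(), utf8_length(encoded[0]))
print(all(is_continuation(b) for b in encoded[1:]))
```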

\CJK@pmCsmallEncoding and \CJK@pmCbigEncoding can be activated with \pmCsmall
(this is the default) and \pmCbig inside the CJK environment. Note that the
original pmC fonts have two character sizes per font (the bigger ones with an
offset of -128); pmC-Big 5 encoded fonts cannot contain big characters. The
names of the fonts in the UpmC-xxx.fd files reflect the modifications added by
Marc Leisher to the original poor man's Chinese (pmC)
package written by Thomas Ridgeway.

The fonts

CJK.sty uses NFSS (New Font Selection Scheme, now part of LaTeX2e) which has
some advantages over the font selection offered with pmC (for plain TeX and
LaTeX 2.09):

    o   TeX fonts are loaded only on demand. This is especially useful with
        Asiatic logographs. If you have e.g. three Chinese characters in your
        text, pmC must load the whole Chinese font (about 85 TeX fonts),
        whereas LaTeX2e loads only three fonts normally.

    o   As long as the limit of 256 TeX fonts is not exceeded, you can
        use as many CJK fonts as you like (e.g. simplified and traditional
        Chinese characters together with Japanese fonts in different sizes)
        --- pmC is limited to two sizes and can only have two CJK fonts at the
        same time.

        In the web2c-TeX package (for UNIX) you will find a patch which allows
        the use of more than 256 TeX fonts.

    o   You need not care about the right size of CJK fonts in footnotes
        etc. They will obey the NFSS (although attributes other than font
        series and size are changed with \CJKenc and \CJKshape).

        For Hangul font selection see below.

        Of course you must have access to bitmap CJK fonts --- use hbf2gf to
        convert them to .pk-fonts. See the last section for availability of
        precompiled fonts.

If you chose one font per active character as with the pmC macros, you would
waste character space (256 characters per font are possible with TeX 3).
Therefore CJK.sty expects the whole Asiatic font split into TeX subfonts with
256 characters each.

An example:

    GuoBiao-encoded simplified characters in song style at 12pt:
    ^               ^                        ^^         ^^

              first byte  second byte       TeX font  offset
                 0xA1      0xA1-0xFE        gsso1201     0
                 0xA2      0xA1-0xFE        gsso1201    94
                 0xA3      0xA1-0xE4        gsso1201   188
                 0xA3      0xE5-0xFE        gsso1202     0
                 0xA4      0xA1-0xFE        gsso1202    26
                 0xA5      0xA1-0xFE        gsso1202   120
                 0xFE      0xA1-0xFE        gsso1235    38
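The table can be reproduced with a little arithmetic. The following Python sketch is inferred from the table rows only (94 characters per first byte, 256 characters per subfont, subfonts counted from 01); it is not taken from the CJK.sty source, and the function name `gb_subfont` is made up for the illustration.

```python
def gb_subfont(first: int, second: int, base: str = "gsso12"):
    """Map a two-byte GB code to (TeX subfont name, offset in that subfont).

    Assumption: both bytes lie in 0xA1-0xFE, giving 94 characters per
    first byte, packed consecutively into subfonts of 256 characters.
    """
    if not (0xA1 <= first <= 0xFE and 0xA1 <= second <= 0xFE):
        raise ValueError("both bytes must be in the range 0xA1-0xFE")
    index = (first - 0xA1) * 94 + (second - 0xA1)
    number, offset = divmod(index, 256)
    # Subfonts are numbered from 01 with two decimal digits.
    return f"{base}{number + 1:02d}", offset


print(gb_subfont(0xA3, 0xE4))  # last character that still fits in gsso1201
print(gb_subfont(0xA3, 0xE5))  # first character of gsso1202
print(gb_subfont(0xFE, 0xA1))  # start of the last row
```

The offsets in the table are those of the first character of each listed range.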

For converting to .pk-files with hbf2gf, you must get the appropriate HBF
(Hanzi Bitmap Font) header files from (or create if you can't find
the right one); almost all Chinese bitmap fonts in the public domain together
with their HBF headers are collected there. These HBF files document CJK fonts

Using hbf2gf

hbf2gf converts CJK bitmaps with an HBF header file into .gf-files (and
consequently into .pk fonts).


    hbf2gf configuration_file

Keywords in the configuration file must start a line, the appropriate values
being on the same line separated with one or more blanks or tabs.

Here is an example configuration file jfs56.cfg (please refer to hbf2gf.dvi
for a description of the keywords):

hbf_header     jfs56.hbf
mag_x          1.482
x_offset       3
y_offset       -8
comment        jianti fansongti 56x56 pixel font magnified and adapted for 10pt

nmb_files      -1

output_name    gsfs10

checksum       123456789

dpi_x          600
dpi_y          600

coding         codingscheme GB 2312-80 encoded TeX text

pk_directory   d:\china\pixel.ljh\600dpi\
tfm_directory  d:\china\tfm\

rm_command     del
cp_command     copy
long_extension off
job_extension  .cmd
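The line format described above (keyword at the start of the line, value after blanks or tabs) is easy to read with a few lines of Python. This is a sketch for illustration only; hbf2gf itself is written in C, and how it treats lines that do not start with a keyword is an assumption here (they are simply skipped). The function name `parse_hbf2gf_config` is invented.

```python
def parse_hbf2gf_config(text: str) -> dict:
    """Collect keyword/value pairs from a configuration file in hbf2gf style."""
    settings = {}
    for line in text.splitlines():
        # Keywords must start a line; blank and indented lines are
        # skipped in this sketch (an assumption, see above).
        if not line or line[0] in " \t":
            continue
        parts = line.split(None, 1)  # split at the first run of blanks/tabs
        keyword = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[keyword] = value
    return settings


EXAMPLE = """hbf_header     jfs56.hbf
mag_x          1.482
comment        jianti fansongti 56x56 pixel font

nmb_files      -1
output_name    gsfs10
"""

for keyword, value in parse_hbf2gf_config(EXAMPLE).items():
    print(f"{keyword} = {value!r}")
```

Values are kept as strings; a value such as the comment may itself contain blanks.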

And here the results:

    input files: jfs56.a - jfs56.e, jfs56.hbf

    program call: hbf2gf jfs56.cfg

    intermediate files: gsfs10.cmd, -,

    batch file call: gsfs10.cmd

    output files: d:\china\pixel.ljh\600dpi\ -,
                  d:\china\tfm\gsfs1001.tfm - gsfs1032.tfm

[gsfs: GuoBiao simple encoded FanSong style
       ^       ^              ^  ^
It's hard to overcome the DOS restriction of 8 characters in a file name if
you need two characters as a running number...]

This would be a correct entry in UGBs.fd:

                    <-10>                      CJKfixed *        gsfs10
                    <10>                      sCJKfixed *        gsfs10
                    <10.95>                   sCJKfixed *        gsfs12
                    <12>                      sCJKfixed *        gsfs12
                    <14.4>                    sCJKfixed *        gsfs14
                    <17.28>                   sCJKfixed *        gsfs17
                    <20.74->                   CJKfixed *        gsfs17}{}

assuming that you have created fonts for 10, 12, 14.4, and 17.28pt.

Korean input

(The status of this feature is experimental. I can't speak Korean and would
 be glad to hear comments from people who have any idea what is happening
 here :-)

There are already different packages handling Hangul: hlatex, htex etc.; there
is one package which also can handle hanja: jhtex.

The main difference between the packages just mentioned and CJK is the use
of a preprocessor which converts text files containing KS encoded text into a
TeX file. To do so has some advantages, but the output is completely
unreadable. Additionally the output lines become rather lengthy (a two byte
character code will be converted into a string up to 11 characters long),
which may confuse some editors; and if you have a text which contains Chinese
or Japanese also, you can't use KS to TeX converters because the code ranges
overlap and converters are not able to recognize which is Korean and which is

In contrast, CJK does not need a preprocessor and the problems mentioned above
are nonexistent, but you get nothing for free: CJK uses the virtual font
mechanism to map the hangul syllables onto Hangul Elements (11 virtual fonts
map to 2 real fonts), whereas preprocessors directly use the real fonts.

If you want a complete Korean environment, I recommend jhtex. There you will
also find a hangul.sty which modifies (among others) the sectioning commands
to enable Korean chapter counting and Korean headers.

To use KS encoding, say

    \begin{CJK}{KS}{}
    ...
    \end{CJK}       .

These font switches are available inside the environment:

    fonts from hLaTeX:

    *   \mj  MyoungJo   (default)
        \gt  Gothic
        \gs  BootGulssi
        \gr  Graphic
        \dr  Dinaru

    fonts from jhTeX:

    *   \hgt Hangul Gothic
    *   \hmj Hangul MyoungJo (MunHwaBu fonts)
    *   \hpg Hangul Pilgi
        \hol Hangul Outline (MyoungJo)

If a font is marked with a star, bold series are available.

You will find the hangul fonts in the lj_han and lj_han1 packages. These are
emTeX libraries for 300 dpi resolution which can be easily converted back to
.pk fonts using the fontlib package of emTeX. If you need different
resolutions, you must obtain the original metafont sources of the
hlatex_mf.tar.gz and the jhtex packages. Note that the shapes of Hangul
elements are not satisfactory.

You find the needed virtual fonts and virtual metric files in the vf and tfm
directories. Move the .tfm files into a directory TeX will scan. You need a
dvi driver which understands virtual fonts -- move the .vf files into a
directory your dvi driver will scan.

For non-hangul characters inside the KS environment (i.e. the first byte in
the ranges 0xA1-0xAF except 0xA4 and 0xC9-0xFD), fonts are taken from
Uhanja.fd . This enables the use of many hangul fonts and perhaps only one or
two different hanja fonts. If you don't want the overlay of hangul fonts from
Uhangul.fd, say \CJKhanja. The opposite command is \CJKhangul.

Archaic hangul elements (KS 0xA4D5-0xA4FE) and the character KS 0xA4D4 are
only accessible if \CJKhanja is active.
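The byte ranges in the two paragraphs above can be written down as two small tests. This Python sketch only restates the ranges from the text; it reads "0xA1-0xAF except 0xA4" and "0xC9-0xFD" as two separate ranges, and the function names are invented for the illustration.

```python
def uses_hanja_font(first: int) -> bool:
    """True if a KS first byte selects a non-hangul character,
    i.e. one whose font is taken from Uhanja.fd."""
    return (0xA1 <= first <= 0xAF and first != 0xA4) or (0xC9 <= first <= 0xFD)


def needs_cjkhanja(first: int, second: int) -> bool:
    """True for KS 0xA4D4 and the archaic hangul elements 0xA4D5-0xA4FE,
    which are only accessible while \\CJKhanja is active."""
    return first == 0xA4 and 0xD4 <= second <= 0xFE


for first in (0xA1, 0xA4, 0xB0, 0xCA):
    print(hex(first), uses_hanja_font(first))
print(needs_cjkhanja(0xA4, 0xD5))
```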

You should convert your KS hanja fonts using hbf2gf as described above.


Using the Bg5text environment is a mess. An external preprocessor requires
access to a compiler, which is not always available. Thus I wrote Bg5conv.tex,
a preprocessor for Big 5 characters to overcome the restrictions of the
Bg5text environment.

Each Big 5 character `XY' will be converted into the form `XZZZ.'; ZZZ is a
decimal number followed by a dot. The use of Bg5conv.tex is completely
transparent, no changes to your document are necessary.
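To make the `XZZZ.' form concrete, here is a rough Python imitation of the conversion. It is not Bg5conv.tex (which is written in TeX); it assumes that every byte in 0xA1-0xFE starts a Big 5 character and that ZZZ is the plain decimal value of the second byte without padding. Both points are assumptions of this sketch, and the name `bg5_protect` is invented.

```python
def bg5_protect(data: bytes) -> bytes:
    """Rewrite each Big 5 character XY as X, decimal value of Y, and a dot."""
    out = bytearray()
    i = 0
    while i < len(data):
        first = data[i]
        if 0xA1 <= first <= 0xFE and i + 1 < len(data):
            second = data[i + 1]
            out.append(first)                          # X is kept as it is
            out += str(second).encode("ascii") + b"."  # Y becomes "ZZZ."
            i += 2
        else:
            out.append(first)                          # everything else unchanged
            i += 1
    return bytes(out)


# A Big 5 character whose second byte is the backslash (0x5C = 92).
print(bg5_protect(bytes([0xA9, 0x5C])))
```

After the conversion no second byte can collide with \, {, or } any more, because only digits and a dot follow the first byte.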

The use is simple: before calling Bg5text you must define \CJKin (and
optionally \CJKout); after conversion the output file will be processed like a
normal input file. Bg5conv.tex additionally inserts the (empty) macro
\CJKpreproc as the first line of the output.

Here is an example batch file (bg5latex.bat) for DOS which demonstrates the
use of Bg5conv.tex . Note that you must not use an extension for the input
file here (I am too lazy to write a sophisticated shell program - any
volunteers are welcome) (default names for \CJKin and \CJKout are
`Bg5input.tex' and `Bg5input.cjk' respectively):

    call latex \def\CJKin{%1} \def\CJKout{%1.cjk} \input Bg5conv.tex
    call latex %1.cjk

You say

    bg5latex mytext

to get mytext.tex processed.

It's not possible to mix Big 5 encoding with different encodings (except CNS)
if Bg5conv.tex is used (and I doubt whether this should be ever necessary).


(The status of this feature is experimental.)

Christian Wittern develops CEF, the
Chinese Encoding Framework. This will enable the use of Big 5 as the primary
encoding with CNS 11643-1992 as a secondary character set for characters not
included in Big 5. Inputting CNS characters into a text will be done with a
data base. To facilitate this, the first bytes of the three byte CNS encoding
are mapped onto the characters 0x81-0x87.



to use CNS (CJK.sty will be loaded automatically). If you need to
specify options for CJK, say


The possible options for CNS.sty are `compressed' and `uncompressed' to
indicate the use of compressed (256 characters per font a la CJK.sty) or
uncompressed fonts (94 characters per plane as in pmC). Default is compressed.

CNS encoding is available only in CJK environments; the commands \CNSchar
(of course with three parameters for byte 1 to 3) and \CNSshape are similar
to their CJK counterparts. Default value of \CNSshape is `song'.

Uncompressed fonts should be named like pmC fonts (font names ending with
hex numbers).

The .fd-files

CJK fonts can be installed as easily as normal TeX fonts!

CJK.sty defines four new size commands:

    CJK         corresponds to `' (empty)
    sCJK        corresponds to `s'
    CJKfixed    corresponds to `fixed'
    sCJKfixed   corresponds to `sfixed'             .

The difference between these size functions and the original commands defined
by LaTeX2e is that a CJK size function defines a class of fonts.

If you say as an example

    \DeclareFontShape{U}{Bg5}{m}{song}{<6> <7> <8> sCJKfixed * b5so07}{}   ,

LaTeX2e searches for fonts named b5so0701 - b5so0758 if the font size is 6, 7
or 8 pt; in other words, the CJK size functions append two digits to select
the proper subfonts. These digits are defined in the \CJK@...Encoding macros;
the macro \CJK@plane holds the current value (in pmC compatibility mode,
\CJK@plane holds hexadecimal numbers).
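The naming rule is simple enough to spell out in Python. The sketch below only builds names of the kind LaTeX2e will look for; the function name `subfont_name` is invented, and writing the hexadecimal digits in lowercase is an assumption of this example.

```python
def subfont_name(base: str, plane: int, hexadecimal: bool = False) -> str:
    """Append the two-digit running number to the font name from the .fd file.

    hexadecimal=True imitates the case where the running number is
    hexadecimal (lowercase digits are an assumption of this sketch).
    """
    if not 0 <= plane <= (0xFF if hexadecimal else 99):
        raise ValueError("running number does not fit into two digits")
    digits = f"{plane:02x}" if hexadecimal else f"{plane:02d}"
    return base + digits


names = [subfont_name("b5so07", n) for n in range(1, 59)]
print(names[0], "-", names[-1])
```

With the base name b5so07 this gives the range b5so0701 - b5so0758 mentioned above.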

See the example .fd files for how to define font substitutions as well.


    o   You can of course use CJK-environments inside of a CJK-environment,
        but it is possible that you must increase the so called save size
        (with emTeX you can adjust this with -ms=...).

        The CJK package has optional arguments which control the scope of CJK

            lowercase       If you want to use \lowercase with encodings
                            inside CJK environments. You need less save size
                            using the `encapsulated' option if `lowercase' is
                            not set. You must use Bg5conv.tex to use Big 5
                            characters with this option.

            global          \lccode (if `lowercase' set), \uccode, \catcode
                            and the activation of the characters 0xA1-0xFE
                            will be globally modified (\lccode and \uccode
                            reset to 0). This is the most economical mode
                            concerning save size, but you can't have CJK
                            environments inside of CJK environments or other
                            environments which manipulate the character range

                            Packages which change some of the above values
                            only once (e.g. in the preamble) will also not
                            work after the first use of a CJK environment.

            local           Only \lccode (if `lowercase' set) and \uccode will
                            be modified globally. This is the default. You can
                            stack environments.

            encapsulated    If you want to use DC fonts outside of the CJK
                            environment with \uppercase and \lowercase working
                            correctly, you must use this option. All values
                            mentioned above will be local, so you can stack
                            environments. This option probably causes an
                            overflow of the save size.



        to activate `option'.

    o   There is another way to overcome the problem of stacked environments.
        CJK implements two low level CJK attribute switches: \CJKenc and
        \CJKshape, which take the same arguments as the corresponding values
        of the CJK environment. If you need two different encodings/shapes on
        the same output line, you must use these macros. An example:

            ... Text in GBs song ... \CJKenc{GBt}
            ... Text in GBt song ... \CJKshape{kai}
            ... Text in GBt kai ...

        Contrary to \begin{CJK}{...}{...} it's not necessary to start a new
        line after \CJKenc.

    o   The characters \, {, and } are used as second bytes in the Big 5
        encoding. If you write Big 5 text mixed with other encodings, you
        should use the Bg5text environment which changes the category codes of
        these characters. The command prefix is now the forward slash `/', and
        the grouping characters are `(' and `)' respectively.

        An example:


        To get the `/', `(', and `)' characters, write `//', `/(', and `/)'
        inside the Bg5text environment.

        This environment is ugly, and some commands like \newcommand will not
        work in it.

    o   Instead of using the Bg5text environment you can protect the
        offending second bytes with a backslash, i.e. \{, \}, \\ (using a non-
        Chinese editor). This will not increase the readability of the Chinese
        text, but for short texts it's perhaps more comfortable. Alas, it
        doesn't work in page header commands because the macros \{ etc. will
        not be expanded.

    o   Be careful not to use any commands inside the Bg5text environment
        which write something into an external file (commands like \chapter

    o   If it's not possible to avoid Big 5 character codes with \, {, or }
        outside of the Bg5text environment (e.g. having Big 5-text in a
        \chapter or \section command), you can replace them with the \CJKchar
        macro manually:

            \section{This is a problematic Big 5 character: \CJKchar{169}{92}}

        The parameters are the first and second byte of the Big 5 character
        code. You can also use hexadecimal or octal notation.
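        Finding the offending characters by hand is tedious, so here is a
        small Python sketch that does the replacement for a Big 5 byte
        string. It is an illustration only and not part of the package; it
        assumes that every byte in 0xA1-0xFE starts a Big 5 character, and
        the function name `protect_big5_specials` is invented.

```python
SPECIAL_SECOND_BYTES = {0x5C, 0x7B, 0x7D}  # \, {, and }


def protect_big5_specials(data: bytes) -> bytes:
    """Replace Big 5 characters whose second byte is \\, {, or } by
    \\CJKchar{first}{second} with both bytes written in decimal."""
    out = bytearray()
    i = 0
    while i < len(data):
        first = data[i]
        if 0xA1 <= first <= 0xFE and i + 1 < len(data):
            second = data[i + 1]
            if second in SPECIAL_SECOND_BYTES:
                out += b"\\CJKchar{%d}{%d}" % (first, second)
            else:
                out += data[i:i + 2]   # harmless character, keep both bytes
            i += 2
        else:
            out.append(first)          # ASCII, including real TeX commands
            i += 1
    return bytes(out)


print(protect_big5_specials(b"\\section{" + bytes([0xA9, 0x5C]) + b"}"))
```

        The Big 5 character with the bytes 169 and 92 from the example above
        comes out as \CJKchar{169}{92}, while the real backslash and braces
        of \section are left alone.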

    o   A similar command is \Unicode{byte1}{byte2} to access Unicode
        characters (not in UTF 8) directly; the parameters are the first and
        second byte of the Unicode.

    o   CJK will disable \uppercase (preserving the command as \CJKuppercase)
        if you select Big 5 encoding without using Bg5conv.tex . This affects
        the headers of the standard classes and \Roman only in standard
        LaTeX2e. Be aware that some packages and style files may use
        \uppercase for dirty tricks (e.g. to define macros for active

    o   \uppercase and \lowercase will work with NONE of the CJK encoding
        schemes if you use DC fonts because these 8-bit fonts have most
        \lccode's and \uccode's set in the range 0x80-0xFF.

    o   You should define for each TeX font size a CJK font (as an example,
        use sCJKfixed for good sizes and CJKfixed for bad sizes, and LaTeX2e
        will complain loudly about wrong sizes on the screen).

        LaTeX2e will also do the job if some size definitions are missing
        (using defined sizes), but expect a font warning for each (!) CJK
        character affected under certain circumstances.

Possible errors

    o   If you write Chinese (or Japanese) text, don't forget to suppress the
        linefeed character with a trailing `%' in the CJK environment,
        otherwise you get unwanted spaces in the output. On the other hand,
        say `\ ' or something similar inside the CJK* environment to get a
        space after a CJK character.

    o   To prevent a line break before a CJK character (e.g. between an
        opening (non-CJK) parenthesis and a CJK character), say \CJKkern. This
        command prevents the insertion of \CJKglue before the CJK character.

        You may wonder about the curious name: a small kern (1 sp) between two
        CJK characters signals that the first one is a punctuation character.

    o   If you get the error message: "\CJK@min (or \CJK@max) undefined", you
        should insert \newpage before saying \end{CJK}. This can happen if
        LaTeX writes the headers (or footers) of a page containing CJK
        characters after closing the CJK environment.

    o   If you get overfull hboxes caused by CJK characters, try to increase
        \CJKglue. It defines the glue between CJK characters; the default
        definition is

            \newcommand{\CJKglue}{\hskip 0pt plus 0.08\baselineskip}  .

        \CJKglue will be inserted by CJK before each Chinese character (except
        punctuation characters as defined in the punctuation tables; see
        CJK.enc), and none after. You should separate non-Chinese text from
        CJK characters with spaces to enable hyphenation.

    o   If you get overfull hboxes caused by Hangul syllables, try to increase
        \CJKtolerance. The default definition is

            \newcommand{\CJKtolerance}{400}  .

    o   If you encounter a TeX stack overflow caused by
        {\CJKenc{new_encoding} ....}, you should write

            \CJKenc{new_encoding} ... \CJKenc{old_encoding}

        instead. Or (better) increase the stack size as discussed above.

How to get CJK and related software

    o   You will find CJK and software related to TeX at the CTAN hosts
        (Comprehensive TeX Archive Network). These completely identical ftp
        servers (concerning TeX software) are

      Sam Houston University
                            Texas (USA)
      DANTE (Deutsche Anwendervereinigung fuer TeX)
                            Heidelberg (Germany)
     Cambridge University
                            Cambridge (England)

        You should use the nearest one, or even better, a local mirror of
        a CTAN host.

        CJK will be found unpacked. To receive the complete package, go to the
        parent directory of CJK and say

          or                (whichever is appropriate for your system)
            get CJK.tar.gz

        The CJK directory and all subdirectories will be sent to you in
        compressed form. Be aware that not all mirrors of CTAN sites support
        compression of directories.

    o   The main site for Chinese related software is (USA). Mirrors
        are (Taiwan), (USA) and (Sweden). Here you
        find free Chinese fonts, text editors etc.

        Note that while updating this text (3-Jan-1995) has still
        stopped ftp access due to networking problems.

    o   The main site for Korean related software is
        (Korea). I don't know any mirror sites of this host. At you
        will find a 24x24 hanja font with HBF header in

    o   Sam Chiu compiled the fonts jfs56 (GBs encoded)
        and ntu_kai48 (Big 5 encoded) for various sizes with 600dpi
        resolution. You will find them (about 22 MByte uncompressed!) at the
        CTAN hosts in /tex-archive/fonts/chinese


Werner Lemberg <a7621gac@awiuni11.bitnet>

Goldschlagstr. 52/14
A-1150 Vienna

Please report any errors or suggestions to this email-address.
