ntg-context - mailing list for ConTeXt users
* Writing Japanese using ConTeXt
@ 2003-06-17  7:15 Lei Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Lei Wang @ 2003-06-17  7:15 UTC (permalink / raw)


I posted the following to the list, but it seems to have disappeared. :-)
So here it is again.

> One other point that may or may not matter is that ... I'm not sure if
> this is the correct terminology, but the code points of the Japanese
> character sets are arrayed in a sparse matrix (?). Each plane is
> 94x94, rather than 256x256. I used to know why.

Although each row of a plane has only 94 characters, many Japanese
fonts used by TeX were split into subfonts of 256 glyphs each, as in the
Chinese CJK compact mode.

----- Original Message ----- 
From: "Lei Wang" <leiwang@swt.edu>
To: <ntg-context@ntg.nl>
Sent: Wednesday, June 11, 2003 11:50 AM
Subject: Re: [NTG-context] Writing Japanese using ConTeXt


> Right. Although there are many Chinese Unicode fonts, both Unix and Windows
> remap them to GBK or GB when they are used. As for Japanese in ConTeXt, I think
> it may be better to support SJIS or other commonly used encodings, too.
> UTF-8 is good, but it is inconvenient, since I have found few editors that can save a file
> in UTF-8 encoding under Windows. So I have to use conversion tools to convert
> my files.
> 
> I am not familiar with the Japanese encodings. But I think Japanese can be implemented in
> ConTeXt the same way as Chinese, because many things work the same way. Remapping the
> characters according to their encodings (JIS, SJIS, etc.) should work if we can solve
> one problem: the Japanese SJIS encoding has some one-byte characters in the range
> 0xA1-0xDF, while Chinese and Korean have only two-byte characters in that range.
> 

 Wang
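
The one-byte range mentioned above can be illustrated with a small scanner. This is only a sketch, not part of any ConTeXt module: the function name `split_sjis` is made up for this example, and the lead-byte ranges (0x81-0x9F and 0xE0-0xFC) are the commonly documented Shift-JIS ones, stated here as an assumption.

```python
def split_sjis(data: bytes) -> list[bytes]:
    """Split a Shift-JIS byte string into per-character byte sequences."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        # Assumed lead-byte ranges of two-byte characters.
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            chars.append(data[i:i + 2])
            i += 2
        else:
            # One-byte characters: ASCII/JIS-Roman (0x00-0x7F) and
            # half-width katakana (0xA1-0xDF).
            chars.append(data[i:i + 1])
            i += 1
    return chars


# Two half-width katakana, two kanji, one Latin letter.
sample = "ｶﾅ漢字A".encode("shift_jis")
parts = split_sjis(sample)
print([p.hex() for p in parts])
```

A handler that treats every byte above 0x80 as the first half of a two-byte character, as is possible for the Chinese and Korean encodings, would misread the half-width katakana bytes here.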


* Re: Writing Japanese using ConTeXt
  2003-06-15 22:22   ` Matt Gushee
@ 2003-06-16  7:55     ` Hans Hagen
  0 siblings, 0 replies; 16+ messages in thread
From: Hans Hagen @ 2003-06-16  7:55 UTC (permalink / raw)


At 16:22 15/06/2003 -0600, you wrote:

>For a detailed explanation, you should refer to the big book. But
>actually the rules are not all that difficult--probably a good deal
>simpler than European languages, I'd say. The most important thing to
>know is that there is a certain set of characters that may not occur at
>the end of a line, and another set that may not occur at the beginning,
>and I believe (it's been a while since I seriously looked at any of
>this) that there are certain unbreakable pairs, but not a huge number of
>them.

ok, so that's like chinese;

now, how about numbering [chinese have multiple systems] and labels 
[chinese has pre/post labels]?

> > - how many glyphs are there (well, i could look it up in the big cjk book)

nice explanation, i think you all should team up in making a nice manual 
about this!

Hans
-------------------------------------------------------------------------
                                   Hans Hagen | PRAGMA ADE | pragma@wxs.nl
                       Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
  tel: +31 (0)38 477 53 69 | fax: +31 (0)38 477 53 74 | www.pragma-ade.com
-------------------------------------------------------------------------
                        information: http://www.pragma-ade.com/roadmap.pdf
                     documentation: http://www.pragma-ade.com/showcase.pdf
-------------------------------------------------------------------------


* RE: Writing Japanese using ConTeXt
  2003-06-16  4:37   ` Tim 't Hart
@ 2003-06-16  7:51     ` Hans Hagen
  0 siblings, 0 replies; 16+ messages in thread
From: Hans Hagen @ 2003-06-16  7:51 UTC (permalink / raw)


At 06:37 16/06/2003 +0200, you wrote:
>Hans Hagen wrote:
>
> > you mix up two mechanisms:
>
>Yes, after studying the Chinese module for a while, I also came to the
>conclusion that I mixed up bad! :-)
>
>So instead of enjoying the nice weather during the weekend, I wrote some
>mapping files that will create subfonts for EUC-JP encoding. Each subfont
>contains glyphs with the same first byte, just like the idea behind the
>Chinese module.
>
>Then I wrote a basic 'font-jpn.tex' file and now I can write Japanese in
>EUC-JP encoding, including basic line breaking!
>
>I was still working on this and wanted to release it when it was more
>useful, but I guess I have to speed things up now. Also, since I'm not an
>expert in ConTeXt, I'm sure I'm doing some things completely the wrong way,
>so I think it's good if someone else will take a look at it. There is a lot
>to improve! :-)

wang lei (chinese) and chof (korean) are experts in that area

Hans

* RE: Writing Japanese using ConTeXt
  2003-06-15 21:03 ` Hans Hagen
  2003-06-15 22:22   ` Matt Gushee
@ 2003-06-16  4:37   ` Tim 't Hart
  2003-06-16  7:51     ` Hans Hagen
  1 sibling, 1 reply; 16+ messages in thread
From: Tim 't Hart @ 2003-06-16  4:37 UTC (permalink / raw)


Hans Hagen wrote:

> you mix up two mechanisms:

Yes, after studying the Chinese module for a while, I also came to the
conclusion that I mixed up bad! :-)

So instead of enjoying the nice weather during the weekend, I wrote some
mapping files that will create subfonts for EUC-JP encoding. Each subfont
contains glyphs with the same first byte, just like the idea behind the
Chinese module. 
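
The grouping idea described above can be sketched as follows. This is an illustration only, not Tim's actual mapping files: the function name `subfont_slots` is hypothetical, and the sketch relies on Python's built-in `euc_jp` codec.

```python
from collections import defaultdict


def subfont_slots(text: str) -> dict[int, list[int]]:
    """Group two-byte EUC-JP characters by their first byte.

    The first byte selects a (hypothetical) subfont, the second byte
    is the glyph slot inside that subfont.
    """
    groups = defaultdict(list)
    for ch in text:
        raw = ch.encode("euc_jp")
        if len(raw) == 2:  # skip ASCII and three-byte sequences
            first, second = raw
            groups[first].append(second)
    return dict(groups)


# Hiragana share the first byte 0xA4, katakana share 0xA5.
slots = subfont_slots("あいア")
for first, seconds in slots.items():
    print(f"subfont {first:02x}: slots {[f'{s:02x}' for s in seconds]}")
```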

Then I wrote a basic 'font-jpn.tex' file and now I can write Japanese in
EUC-JP encoding, including basic line breaking!

I was still working on this and wanted to release it when it was more
useful, but I guess I have to speed things up now. Also, since I'm not an
expert in ConTeXt, I'm sure I'm doing some things completely the wrong way,
so I think it's good if someone else will take a look at it. There is a lot
to improve! :-)

> - How are the rules for breaking?

The rules are basically the same as in Chinese. Japanese also contains
smaller versions of the kana (hiragana and katakana) glyphs, and breaking
before those is not allowed either. Also, there seem to be different
classes of breaking: for some characters breaking is strictly forbidden, and
for some it is slightly forbidden. (I guess this means that you should not
break at slightly forbidden characters, but if the penalty is too bad, break
there anyway.)
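
The two classes could be modelled with penalties, roughly as below. Everything specific in this sketch is an assumption made for illustration: which characters belong to which class, the names, and the penalty values (chosen in the spirit of TeX, where a penalty of 10000 forbids a break).

```python
# Hypothetical class assignments; real lists should come from a
# Japanese typesetting reference.
STRICT_NO_BREAK_BEFORE = set("、。）」』")
SOFT_NO_BREAK_BEFORE = set("ぁぃぅぇぉっゃゅょァィゥェォッャュョ")

STRICT_PENALTY = 10000  # never break here
SOFT_PENALTY = 500      # break only if the alternatives are worse


def penalty_before(ch: str) -> int:
    """Penalty for starting a new line with this character."""
    if ch in STRICT_NO_BREAK_BEFORE:
        return STRICT_PENALTY
    if ch in SOFT_NO_BREAK_BEFORE:
        return SOFT_PENALTY
    return 0


def penalties(text: str) -> list[int]:
    """Penalty for a break before each character after the first."""
    return [penalty_before(ch) for ch in text[1:]]


print(penalties("きって。"))
```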

> Can you make a small test suite?

Yes, I am at work right now, but when I get back, I'll send you the mapping
files to make the fonts, the font-jpn file I was working on, and some other
things like sample files and line breaking rules.

My best,
Tim


* Re: Writing Japanese using ConTeXt
  2003-06-15 21:03 ` Hans Hagen
@ 2003-06-15 22:22   ` Matt Gushee
  2003-06-16  7:55     ` Hans Hagen
  2003-06-16  4:37   ` Tim 't Hart
  1 sibling, 1 reply; 16+ messages in thread
From: Matt Gushee @ 2003-06-15 22:22 UTC (permalink / raw)


On Sun, Jun 15, 2003 at 11:03:06PM +0200, Hans Hagen wrote:

> A few questions;
> 
> - How are the rules for breaking?

For a detailed explanation, you should refer to the big book. But
actually the rules are not all that difficult--probably a good deal
simpler than European languages, I'd say. The most important thing to
know is that there is a certain set of characters that may not occur at
the end of a line, and another set that may not occur at the beginning,
and I believe (it's been a while since I seriously looked at any of
this) that there are certain unbreakable pairs, but not a huge number of
them.
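
As a rough sketch of the two sets described above, the following computes where a line may be broken. The character lists are small, hypothetical samples rather than a complete rule set, and unbreakable pairs are not handled.

```python
# Sample sets only; a real implementation needs the full lists.
NO_LINE_START = set("、。）」』ー")  # may not begin a line
NO_LINE_END = set("（「『")          # may not end a line


def break_points(text: str) -> list[int]:
    """Indices i where a break between text[i-1] and text[i] is allowed."""
    return [
        i
        for i in range(1, len(text))
        if text[i] not in NO_LINE_START and text[i - 1] not in NO_LINE_END
    ]


points = break_points("「日本」です。")
print(points)
```

Since Japanese text usually has no spaces, every position that is not excluded by one of the sets is a potential break, which is why the rules are stated as exclusions.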

> - how many glyphs are there (well, i could look it up in the big cjk book)

That's rather a tricky question, and the answer depends partly on
whether you want a complete solution or an 80/20 one. You probably know
that there are two main character sets in Japanese: jis-x-0208 and
jis-x-0212 (of course, the full names are suffixed with years, but I
forget what the current versions are). The vast majority of all Japanese
text (notice I said text, *not* documents) can be written with hiragana
and katakana (50+ characters each), roman alphabet (256, I guess?), and
the kanji in jis-x-0208, of which there are about 6000.

However, it's hard to get away without using jis-x-0212. Literary terms
and probably some specialized scientific vocabulary often require it,
and most critically, geographic and personal names very often use
jis-x-0212 characters. It's common to find names whose characters are
represented in jis-x-0208, but for any given name you must use a
different glyph that is in jis-x-0212. In Japanese culture it is
unacceptable to substitute glyphs in names. An analogy in Western
languages might be: suppose you had a typesetting system that was
incapable of rendering the string "sen" at the end of a word. Thus,
whenever you encountered the names Andersen or Olsen, you would print
them as "Anderson" and "Olson." I don't think anyone would consider that
acceptable. 

So the upshot of this is that, though jis-x-0212 glyphs make up a very
small proportion of the Japanese text that is printed (I'd guess 1-2
percent), a large proportion of documents (40-50 percent, maybe) require
one or more glyphs from that set. So that's another 8000 glyphs, if you
want to do it right.

One other point that may or may not matter is that ... I'm not sure if
this is the correct terminology, but the code points of the Japanese
character sets are arrayed in a sparse matrix (?). Each plane is
94x94, rather than 256x256. I used to know why.
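
For reference, JIS X 0208 is laid out as a grid of 94 rows by 94 cells, because each of the two bytes is restricted to the 94 printable 7-bit values 0x21-0x7E. The sketch below converts a row/cell pair to its EUC-JP bytes (the same values with the high bit set); the function name is made up for this example.

```python
def kuten_to_euc_jp(row: int, cell: int) -> bytes:
    """Convert a JIS X 0208 row/cell pair (1-94 each) to EUC-JP bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in the range 1-94")
    # JIS bytes are 0x20 + row/cell; EUC-JP sets the high bit (adds 0x80).
    return bytes([0xA0 + row, 0xA0 + cell])


# 0x21 through 0x7E inclusive gives 94 usable values per byte.
print(0x7E - 0x21 + 1)

# Row 4, cell 2 is the hiragana "a".
print(kuten_to_euc_jp(4, 2).decode("euc_jp"))
```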

-- 
Matt Gushee                 When a nation follows the Way,
Englewood, Colorado, USA    Horses bear manure through
mgushee@havenrock.com           its fields;
http://www.havenrock.com/   When a nation ignores the Way,
                            Horses bear soldiers through
                                its streets.
                                
                            --Lao Tzu (Peter Merel, trans.)


* Re: Writing Japanese using ConTeXt
  2003-06-08 11:48 Tim 't Hart
  2003-06-09 14:16 ` Matthew Huggett
  2003-06-09 23:24 ` Matt Gushee
@ 2003-06-15 21:03 ` Hans Hagen
  2003-06-15 22:22   ` Matt Gushee
  2003-06-16  4:37   ` Tim 't Hart
  2 siblings, 2 replies; 16+ messages in thread
From: Hans Hagen @ 2003-06-15 21:03 UTC (permalink / raw)


At 13:48 08/06/2003 +0200, Tim 't Hart wrote:

>Then I decided to try ConTeXt's UTF-8 support. I created the following test
>file:

.....

you mix up two mechanisms:

(1) the one used for chinese is not utf but an installable multi glyph 
mechanism, where the first glyph triggers a font and the second a char
(2) utf encodings directly map onto a font (needed to get hyphenation right)

so what you need is either a dedicated handler like chinese, or a plug-in 
for the utf handler.
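
The difference between the two mechanisms can be pictured with the sketch below. It is an assumption-laden illustration, not ConTeXt's actual implementation: both function names are hypothetical, and splitting a Unicode code point into a block of 256 plus an offset is only one plausible way a utf handler might organise its lookup.

```python
def legacy_lookup(ch: str, encoding: str = "gbk") -> tuple[int, int]:
    """Mechanism (1): the first byte selects a font, the second a character."""
    raw = ch.encode(encoding)
    if len(raw) != 2:
        raise ValueError("expected a two-byte character")
    return raw[0], raw[1]


def unicode_lookup(ch: str) -> tuple[int, int]:
    """Mechanism (2): the code point itself is the key (block, offset)."""
    cp = ord(ch)
    return cp >> 8, cp & 0xFF


print([hex(v) for v in legacy_lookup("中")])
print([hex(v) for v in unicode_lookup("中")])
```

The same character ends up with different keys depending on the input encoding, which is why a document has to go through one mechanism or the other rather than a mix of both.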

>But since there are usually no spaces in a Japanese sentence, there is no
>line breaking. And as you can imagine, line breaking is a useful feature to
>have! :-)

A few questions;

- How are the rules for breaking?
- how many glyphs are there (well, i could look it up in the big cjk book)
- what ranges do we use?

(see unic-* files for utf handling)

Can you make a small test suite?

Hans

* Re: Writing Japanese using ConTeXt
  2003-06-10 20:02       ` Tim 't Hart
@ 2003-06-11  2:35         ` Matthew Huggett
  0 siblings, 0 replies; 16+ messages in thread
From: Matthew Huggett @ 2003-06-11  2:35 UTC (permalink / raw)


>
>
>
>Hans, please tell me what I can do to help implementing Japanese support in
>ConTeXt, or what more information you need to get a better overview of
>things that need to be done. I don't know much about ConTeXt yet, but I'll
>promise to do my best.
>
>My best,
>Tim
>
>
>  
>
If you need any help with documentation (writing, proof-reading, etc.) 
let me know.  

Matt H.


* RE: Writing Japanese using ConTeXt
  2003-06-10  8:18     ` Hans Hagen
@ 2003-06-10 20:02       ` Tim 't Hart
  2003-06-11  2:35         ` Matthew Huggett
  0 siblings, 1 reply; 16+ messages in thread
From: Tim 't Hart @ 2003-06-10 20:02 UTC (permalink / raw)


Hello Hans,

You wrote:

>one of the first things to do is to collect fonts in suitable encodings and
>post them somewhere (or at least post scripts that generate them)

And 

> for that i need to have samples and fonts,

I created a simple home page that will tell you where you can find some good
Japanese (Unicode) fonts, and how I installed them so that they can be used
in ConTeXt and dvipdfmx.

The URL is:

http://context.t-hart.com/

I have also posted some ConTeXt source files which will show you what I
could do with Japanese fonts and ConTeXt right now. The PDF output files are
downloadable as well, so you can see what everything should look like.
Remember that I have used ConTeXt for only a few months, so please
don't have a heart attack when you see my flashy ConTeXt coding! ;-)

There is also a small list of things to do or to keep in mind when making
simple Japanese support. If someone has any more ideas, let me know. IMHO, I
think we need to concentrate on supporting Unicode fonts at first and if
that works, we will look at other font encodings. The most important feature
to have right now is simple line breaking. 

Hans, please tell me what I can do to help implementing Japanese support in
ConTeXt, or what more information you need to get a better overview of
things that need to be done. I don't know much about ConTeXt yet, but I'll
promise to do my best.

My best,
Tim


* RE: Writing Japanese using ConTeXt
  2003-06-10  8:13   ` Hans Hagen
@ 2003-06-10 19:36     ` Tim 't Hart
  0 siblings, 0 replies; 16+ messages in thread
From: Tim 't Hart @ 2003-06-10 19:36 UTC (permalink / raw)


Hello Hans and Matt,

> >Can PDFTeX handle TTC files? I know ttf2afm/ttf2pk can process them, but
> >I have tried 2 or 3 times to include a Japanese TTC font directly in a
> >PDFTeX document, but was never able to make it work.
> 
> dunno, maybe dvipdfmx can

I don't think PDFTeX can use TTC fonts. I use PDFTeX for DVI output and use
dvipdfmx for PDF. Map files for dvipdfmx support fonts inside a TrueType
Collection. TTF2TFM also supports the extra fonts inside a TTC by using the
-f switch.

For example, msmincho.ttc contains MS-Mincho and MS-PMincho:
ttf2tfm msmincho.ttc msmin@Unicode@  (will make TFM for MS-Mincho)
ttf2tfm msmincho.ttc -f 1 mspmin@Unicode@ (will make TFM for MS-PMincho)

The map file for dvipdfmx will then look like:
msmin@Unicode@ Identity-H :0:msmincho.ttc  (for MS-Mincho)
mspmin@Unicode@ Identity-H :1:msmincho.ttc (for MS-PMincho)	
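
For a collection with several fonts, map lines in the format shown above could be generated with a small helper. This is a convenience sketch only: the function name is hypothetical, and it assumes that the TFM base names are listed in the same order as the fonts inside the TTC file.

```python
def ttc_map_lines(ttc_file: str, tfm_names: list[str]) -> list[str]:
    """Build one dvipdfmx map line per font index in a TrueType Collection."""
    return [
        f"{name}@Unicode@ Identity-H :{index}:{ttc_file}"
        for index, name in enumerate(tfm_names)
    ]


lines = ttc_map_lines("msmincho.ttc", ["msmin", "mspmin"])
print("\n".join(lines))
```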

> >Well, it can be done in stages. I think that any serious attempt to
> >support Japanese in ConTeXt should encompass all common encodings. But
> >I don't see anything wrong with starting out Unicode-only.
> 
> in that case some range mapping should be defined; proper test files, etc

Right now I'm working on a home page which contains information about where
to find Japanese fonts and how to install them for ConTeXt/dvipdfmx. I will
also add some example files of what is already possible in ConTeXt. I'll
post the URL soon. 

My best,
Tim


* RE: Writing Japanese using ConTeXt
  2003-06-09 16:33   ` Tim 't Hart
@ 2003-06-10  8:18     ` Hans Hagen
  2003-06-10 20:02       ` Tim 't Hart
  0 siblings, 1 reply; 16+ messages in thread
From: Hans Hagen @ 2003-06-10  8:18 UTC (permalink / raw)


At 18:33 09/06/2003 +0200, Tim 't Hart wrote:

>When I look at the source of the Chinese module, the most difficult part for
>me to understand is the part about font encoding, the enco-chi.tex file, and
>the use of \defineuclass in that file. I guess it has to do something with
>mapping the written text to the font. If I understand correctly, the Chinese
>module doesn't use Unicode fonts, but GBK or Big5 encoded fonts.

indeed, there is quite some remapping going on there, (one can hook in new 
ones if needed); a complication is that the mapping may change per font 
(simplified or not)

>get printed in CMR. I did some tests and I could change the font to any
>other font I wanted to, just by using the normal ConTeXt font mechanisms. So
>I guess it is easy to mix Japanese fonts with normal Latin fonts.

the cmr comes from the main font handler so if you choose times or palatino 
it would come out that way; in chinese font switching is triggered by glyphs

>With the ConTeXt example that I posted yesterday, I am already able to write
>Japanese in UTF-8, use a Unicoded Japanese font in ConTeXt, and get Japanese
>output. I hope the hard part is already behind me! :-) The only thing that
>still puzzles me is how I can add interglyph space so that TeX can break the
>lines. If someone can help, I would really appreciate it!

for that i need to have samples and fonts,

Hans

* Re: Writing Japanese using ConTeXt
  2003-06-09 23:24 ` Matt Gushee
  2003-06-10  7:41   ` Matthew Huggett
@ 2003-06-10  8:13   ` Hans Hagen
  2003-06-10 19:36     ` Tim 't Hart
  1 sibling, 1 reply; 16+ messages in thread
From: Hans Hagen @ 2003-06-10  8:13 UTC (permalink / raw)


At 17:24 09/06/2003 -0600, Matt Gushee wrote:

> > Typesetting Japanese could be more complicated than Chinese because of
> > the concurrent use of four writing systems:

dunno, could also be a challenge; as long as tagging is done properly i see 
no real problem there

>On Mon, Jun 09, 2003 at 06:33:49PM +0200, Tim 't Hart wrote:
> >
> > Unicode wasn't that popular because Unix-like operating systems used EUC as
> > encoding, and Microsoft used their own invented Shift-JIS encoding.
>
>There were also cultural/political reasons, with perhaps a touch of Not
>Invented Here syndrome. But that's a different story.

same as in china: many encodings alongside unicode

> > Since ConTeXt
> > already supports UTF-8, I don't see a reason to make things more difficult
> > than they already are by writing text in other encodings.
>
>On the face of it that makes sense. But I don't think it's safe to make
>a blanket assumption that the text in a ConTeXt document will originate
>with the creator of the document, or that it will be newly written.
>Also, UTF-8 support is still a bit half-baked on Unix/Linux systems.

i'm sure that wang lei (on this list) can help you out; if i'm right he is 
aware of japanese font demands

> > I guess that if you want to make a proper Japanese module, you'll need to
> > support JIS or Shift-JIS encoded fonts.
>
>This would be a good idea for Type 1 font support. It seems to me that
>almost all recent Japanese TrueType fonts have a Unicode CMap.

one of the first things to do is to collect fonts in suitable encodings and 
post them somewhere (or at least post scripts that generate them)

>Can PDFTeX handle TTC files? I know ttf2afm/ttf2pk can process them, but
>I have tried 2 or 3 times to include a Japanese TTC font directly in a
>PDFTeX document, but was never able to make it work.

dunno, maybe dvipdfmx can

>Well, it can be done in stages. I think that any serious attempt to
>support Japanese in ConTeXt should encompass all common encodings. But
>I don't see anything wrong with starting out Unicode-only.

in that case some range mapping should be defined; proper test files, etc

> > > Typesetting Japanese could be more complicated than Chinese because of
> > > the concurrent use of four writing systems
> >
> > The fact that Japanese uses four writing systems is not really a problem.
>
>Maybe it's not a big problem. But it is certainly more complex than
>Chinese, since there is a mixture of proportional and fixed-width
>characters, and the presence of Kana and Romaji complicates the
>line-breaking rules.

hm, but as long as the rules are clear, things should be configurable as 
much as possible

> > The only info I got is from Ken Lunde's CJKV book, where he mentions some
> > rules about CJK line breaking.
>
>Yes, Lunde is good, but he doesn't go into enough detail to serve as an
>implementor's guide. I've also searched for more info on this subject;

right, many nice tables and glyphs -)

>my impression is that besides Lunde's books there is really nothing
>available in English. I could probably make some sense out of the
>Japanese works that are available, but it would take up much more time
>than I have.

then ... write it down in a document/manual and make that the test case for 
context; if the manual can be processed we're done!

Hans

* Re: Writing Japanese using ConTeXt
  2003-06-09 23:24 ` Matt Gushee
@ 2003-06-10  7:41   ` Matthew Huggett
  2003-06-10  8:13   ` Hans Hagen
  1 sibling, 0 replies; 16+ messages in thread
From: Matthew Huggett @ 2003-06-10  7:41 UTC (permalink / raw)


Matt Gushee wrote:

>What would a good sample consist of? I can probably find something.
>
>  
>
Well, for starters I guess samples showing the interaction of the four 
writing scripts (I'm thinking of glyph spacing and line-breaking here; 
e.g., in the transition from native script to Romaji and back again). 
Do you know much about different heading styles?  I suppose they are 
similar to the Chinese ones depending on how traditional the text is; 
i.e., kanji or Arabic numerals, the presence of a "section" kanji before 
the numbering, etc.   Examples of Furigana would be good.

Matt Huggett


* Re: Writing Japanese using ConTeXt
  2003-06-08 11:48 Tim 't Hart
  2003-06-09 14:16 ` Matthew Huggett
@ 2003-06-09 23:24 ` Matt Gushee
  2003-06-10  7:41   ` Matthew Huggett
  2003-06-10  8:13   ` Hans Hagen
  2003-06-15 21:03 ` Hans Hagen
  2 siblings, 2 replies; 16+ messages in thread
From: Matt Gushee @ 2003-06-09 23:24 UTC (permalink / raw)


On Mon, Jun 09, 2003 at 11:16:27PM +0900, Matthew Huggett wrote:
> 
> >Recently, I've made the 'unwise' decision to start studying Japanese next
> >year,

Unwise? Only if you don't really want to do it, or if you are laboring
under illusions--left over from the 80s--that it will guarantee you a
lucrative and glamorous career in international trade ;-)

But anyway, I am also interested in using ConTeXt for Japanese, and
would be glad to contribute what I can to this effort.

> I asked about Japanese a while back.  Hans requested more information on 
> encodings, fonts, etc.  I don't know enough about these things or 
> ConTeXt to know what is needed exactly.

I don't know much about ConTeXt internals, but do know something about
"these things," so I may be able to help. Was Hans' request on the
mailing list? If you know when it was posted, perhaps I can look it up.

> Typesetting Japanese could be more complicated than Chinese because of 
> the concurrent use of four writing systems:

On Mon, Jun 09, 2003 at 06:33:49PM +0200, Tim 't Hart wrote:
> 
> Unicode wasn't that popular because Unix-like operating systems used EUC as
> encoding, and Microsoft used their own invented Shift-JIS encoding.

There were also cultural/political reasons, with perhaps a touch of Not
Invented Here syndrome. But that's a different story.

> So there
> is still a lot of digital text out there written in these encodings, and a
> lot of tools still use it. But I think that if you want to write new texts,
> using Unicode shouldn't be a problem for most users. I guess that most
> editors supporting Asian encodings also make it possible to save in UTF-8. I
> think nowadays it's easier to find a Unicode enabled editor than it is to
> find a Shift-JIS/EUC editor! (Well, on Windows anyway...).

Yes, recent Windows versions (starting with NT 4.0 in the business
series, and ... not sure ... ME? in the consumer series) use some form
of Unicode as their base encoding, so I think it is now the norm for
Windows text editors to support UTF-8 ... I'm pretty sure TextPad does,
for example.

> Since ConTeXt
> already supports UTF-8, I don't see a reason to make things more difficult
> than they already are by writing text in other encodings.

On the face of it that makes sense. But I don't think it's safe to make
a blanket assumption that the text in a ConTeXt document will originate
with the creator of the document, or that it will be newly written.
Also, UTF-8 support is still a bit half-baked on Unix/Linux systems.

> When I look at the source of the Chinese module, the most difficult part for
> me to understand is the part about font encoding, the enco-chi.tex file, and
> the use of \defineuclass in that file. I guess it has to do something with
> mapping the written text to the font.

Most likely. I might be able to glean something useful from that file.
I'll take a look when I can find the time.

> I guess that if you want to make a proper Japanese module, you'll need to
> support JIS or Shift-JIS encoded fonts.

This would be a good idea for Type 1 font support. It seems to me that
almost all recent Japanese TrueType fonts have a Unicode CMap.

> But on the other hand, maybe we
> don't need to support that since there are a lot of Japanese Unicode fonts
> available. I use WinXP, and there we have msmincho.ttc and msgothic.ttc,
> which are both Unicode fonts.

Can PDFTeX handle TTC files? I know ttf2afm/ttf2pk can process them, but
I have tried 2 or 3 times to include a Japanese TTC font directly in a
PDFTeX document, but was never able to make it work.

> And Cyberbit is a Unicoded font as well. Commercially available fonts by
> Dynalab (Dynafont Japanese TrueType collection is quite cheap and very good)
> are also Unicode fonts. Again, I don't think we should make it difficult for
> ourselves by trying to support non-Unicode fonts while unicoded Japanese
> fonts are easy to use and widely available.

Well, it can be done in stages. I think that any serious attempt to
support Japanese in ConTeXt should encompass all common encodings. But
I don't see anything wrong with starting out Unicode-only.

> > Typesetting Japanese could be more complicated than Chinese because of
> > the concurrent use of four writing systems 
> 
> The fact that Japanese uses four writing systems is not really a problem.

Maybe it's not a big problem. But it is certainly more complex than
Chinese, since there is a mixture of proportional and fixed-width
characters, and the presence of Kana and Romaji complicates the
line-breaking rules.

> > I guess I need to track down a few sample documents.  I tried to turn up 
> > some info on Japanese typesetting rules but had no luck.

What would a good sample consist of? I can probably find something.

> The only info I got is from Ken Lunde's CJKV book, where he mentions some
> rules about CJK line breaking.

Yes, Lunde is good, but he doesn't go into enough detail to serve as an
implementor's guide. I've also searched for more info on this subject;
my impression is that besides Lunde's books there is really nothing
available in English. I could probably make some sense out of the
Japanese works that are available, but it would take up much more time
than I have.

> With the ConTeXt example that I posted yesterday, I am already able to write
> Japanese in UTF-8, use a Unicoded Japanese font in ConTeXt, and get Japanese
> output. I hope the hard part is already behind me! :-) The only thing that
> still puzzles me is how I can add interglyph space so that TeX can break the
> lines. If someone can help, I would really appreciate it!

Sorry, no idea. But it sounds like you've made an admirable effort so
far. I was working along similar lines a couple of years ago, but was
never able to produce anything useful. Guess you're a better TeXnician
than I.


* RE: Writing Japanese using ConTeXt
  2003-06-09 14:16 ` Matthew Huggett
@ 2003-06-09 16:33   ` Tim 't Hart
  2003-06-10  8:18     ` Hans Hagen
  0 siblings, 1 reply; 16+ messages in thread
From: Tim 't Hart @ 2003-06-09 16:33 UTC (permalink / raw)


Matthew Huggett wrote:

> I asked about Japanese a while back.  Hans requested more information on
> encodings, fonts, etc.  I don't know enough about these things or
> ConTeXt to know what is needed exactly.
 
>  From what I've read, unicode is not that popular in Japan itself.  ...

Unicode wasn't that popular because Unix-like operating systems used EUC as
encoding, and Microsoft used their own invented Shift-JIS encoding. So there
is still a lot of digital text out there written in these encodings, and a
lot of tools still use it. But I think that if you want to write new texts,
using Unicode shouldn't be a problem for most users. I guess that most
editors supporting Asian encodings also make it possible to save in UTF-8. I
think nowadays it's easier to find a Unicode enabled editor than it is to
find a Shift-JIS/EUC editor! (Well, on Windows anyway...). Since ConTeXt
already supports UTF-8, I don't see a reason to make things more difficult
than they already are by writing text in other encodings.
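
Existing text in the older encodings can be brought into UTF-8 before it is fed to ConTeXt. A minimal sketch, assuming Python's built-in `shift_jis` and `euc_jp` codecs and a made-up helper name:

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode legacy Japanese text as UTF-8.

    Raises UnicodeDecodeError if the data is not valid in the
    stated source encoding.
    """
    return data.decode(source_encoding).encode("utf-8")


text = "日本語"
from_sjis = to_utf8(text.encode("shift_jis"), "shift_jis")
from_euc = to_utf8(text.encode("euc_jp"), "euc_jp")
print(from_sjis == from_euc)
```

The source encoding has to be known in advance; the sketch does not try to detect it.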

When I look at the source of the Chinese module, the most difficult part for
me to understand is the part about font encoding, the enco-chi.tex file, and
the use of \defineuclass in that file. I guess it has something to do with
mapping the written text to the font. If I understand correctly, the Chinese
module doesn't use Unicode fonts, but GBK or Big5 encoded fonts.  

I guess that if you want to make a proper Japanese module, you'll need to
support JIS or Shift-JIS encoded fonts. But on the other hand, maybe we
don't need to support that since there are a lot of Japanese Unicode fonts
available. I use WinXP, and there we have msmincho.ttc and msgothic.ttc,
which are both Unicode fonts. I also use kochi-mincho.ttf and
kochi-gothic.ttf, which are both freely available Japanese Unicode fonts.
And Cyberbit is a Unicoded font as well. Commercially available fonts by
Dynalab (Dynafont Japanese TrueType collection is quite cheap and very good)
are also Unicode fonts. Again, I don't think we should make it difficult for
ourselves by trying to support non-Unicode fonts while unicoded Japanese
fonts are easy to use and widely available.

> Typesetting Japanese could be more complicated than Chinese because of
> the concurrent use of four writing systems 

The fact that Japanese uses four writing systems is not really a problem.
Hiragana and Katakana (Kana) are just part of other Unicode ranges than
Kanji/Chinese. Things might get difficult if you want to use different fonts
for Kana than you are using for Kanji. Then you need to assign a different
font to a different Unicode range. But I have no idea why somebody wants to
do such a thing! Just using Unicode and a Japanese Unicode font will take
care of things.

If you type Romaji/Latin characters in the example I posted yesterday, they
get printed in CMR. I did some tests and I could change the font to any
other font I wanted, just by using the normal ConTeXt font mechanisms. So
I guess it is easy to mix Japanese fonts with normal Latin fonts.

> I guess I need to track down a few sample documents.  I tried to turn up 
> some info on Japanese typesetting rules but had no luck.

The only info I got is from Ken Lunde's CJKV book, where he mentions some
rules about CJK line breaking. Also, some characters are allowed to protrude
into the right margin. I have some OTPs for Omega which handle all of this.
They can be seen here:
http://www.math.jussieu.fr/~zoonek/LaTeX/Omega-Japanese/doc.html
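
The core of those line-breaking rules can be sketched in a few lines of
Python: a break is allowed between any two characters unless the next one
may not start a line or the previous one may not end a line. The two
character sets below are a small illustrative subset chosen for this
example, not the full lists from the book or from the OTPs, and Latin
words would need separate handling.

```python
# Sketch: find permitted break positions in a run of Japanese text.
# The two sets are an assumed, deliberately incomplete subset.
NO_LINE_START = set("、。）」』")   # closing punctuation must not begin a line
NO_LINE_END = set("（「『")         # opening brackets must not end a line


def break_positions(text: str) -> list:
    """Return every index i where a break between text[i-1] and text[i] is allowed."""
    positions = []
    for i in range(1, len(text)):
        if text[i] in NO_LINE_START:
            continue
        if text[i - 1] in NO_LINE_END:
            continue
        positions.append(i)
    return positions


print(break_positions("「日本」。"))
```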

At first I wanted to use Omega with ConTeXt so that I could use these OTPs,
but Omega isn't really stable.

With the ConTeXt example that I posted yesterday, I am already able to write
Japanese in UTF-8, use a Unicoded Japanese font in ConTeXt, and get Japanese
output. I hope the hard part is already behind me! :-) The only thing that
still puzzles me is how I can add interglyph space so that TeX can break the
lines. If someone can help, I would really appreciate it!

My best,
Tim

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Writing Japanese using ConTeXt
  2003-06-08 11:48 Tim 't Hart
@ 2003-06-09 14:16 ` Matthew Huggett
  2003-06-09 16:33   ` Tim 't Hart
  2003-06-09 23:24 ` Matt Gushee
  2003-06-15 21:03 ` Hans Hagen
  2 siblings, 1 reply; 16+ messages in thread
From: Matthew Huggett @ 2003-06-09 14:16 UTC (permalink / raw)


Tim 't Hart wrote:

>Recently, I've made the 'unwise' decision to start studying Japanese next
>year, and of course I want to keep on using ConTeXt to write my school
>papers. [....] So I decided to find a way to
>write Japanese in ConTeXt.
>
>First I tried using the eOmega/ConTeXt combination since I have some great
>OTPs for it, but soon found out that Omega is still "the TeX of the future",
>in other words, not the "TeX of today" and extremely unstable.
>
>Then I decided to try ConTeXt's UTF-8 support. I created the following test
>  
>
I asked about Japanese a while back.  Hans requested more information on 
encodings, fonts, etc.  I don't know enough about these things or 
ConTeXt to know what is needed exactly.

From what I've read, Unicode is not that popular in Japan itself.  The 
most common encodings here are
a) iso-2022-jp (7bit)
b) japanese-iso-8bit (a.k.a euc-japan-1990, euc-japan, euc-jp)
c) japanese-shift-jis (shift jis 8bit; common under MS Windows)
"Describe Language Environment" under MULE in GNU Emacs gives some info.
Ken Lunde of Adobe has a book or two on processing Japanese.
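
For a feel of how these three encodings differ at the byte level, here is
a rough Python sketch that encodes the same two Kanji with each of them.
The codec names are those of Python's standard library; the helper name
and the sample text are assumptions made for the example.

```python
# Sketch: encode the same text in the three legacy encodings listed above.

def legacy_forms(text: str) -> dict:
    """Return the byte representation of text in each legacy encoding."""
    return {codec: text.encode(codec)
            for codec in ("iso2022_jp", "euc_jp", "shift_jis")}


for codec, data in legacy_forms("漢字").items():
    # ISO-2022-JP stays within 7 bits and switches character sets with
    # escape sequences; the other two use bytes with the high bit set.
    seven_bit = all(b < 0x80 for b in data)
    print(f"{codec:12} {len(data):2} bytes  7-bit only: {seven_bit}  {data.hex()}")
```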

Typesetting Japanese could be more complicated than Chinese because of 
the concurrent use of four writing systems:
a) Kanji (Chinese characters)
b) Hiragana (syllabic script for grammatical endings and words for which
Kanji are not commonly used)
c) Katakana (syllabic script for foreign words, some scientific terms
such as flora and fauna, and emphasis)
d) Romaji, literally "Roman characters" (foreign languages, especially
English, are sometimes written in Latin script; this is more common
than you might imagine)


I guess I need to track down a few sample documents.  I tried to turn up 
some info on Japanese typesetting rules but had no luck.

best wishes,

Matt

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Writing Japanese using ConTeXt
@ 2003-06-08 11:48 Tim 't Hart
  2003-06-09 14:16 ` Matthew Huggett
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Tim 't Hart @ 2003-06-08 11:48 UTC (permalink / raw)


Hello all,

This is my first message to the list. I've been using ConTeXt for a few
months now, and so far it does everything I want to do with it, plus much
and much more!

Recently, I've made the 'unwise' decision to start studying Japanese next
year, and of course I want to keep on using ConTeXt to write my school
papers. I am already able to create Japanese documents using a terrific
Japanese TeX distribution (w32tex) and pLaTeX, but everyone on this
mailing list knows that LaTeX is kinda 'weird' (to put it mildly) when you
are used to the beauty that is ConTeXt! :-)  So I decided to find a way to
write Japanese in ConTeXt.

First I tried using the eOmega/ConTeXt combination since I have some great
OTPs for it, but soon found out that Omega is still "the TeX of the future",
in other words, not the "TeX of today" and extremely unstable.

Then I decided to try ConTeXt's UTF-8 support. I created the following test
file:
--------------
\chardef\utfunihashmode=1

\setupunicodefont
  [japanese]
  [scale=1.0]

\definefontsynonym [JapaneseMinchoRegular][cyberb]
\defineunicodefont [Mincho][JapaneseMincho][japanese]

\Mincho \enableregime[utf]

\starttext
...
<Imagine a bunch of UTF-8 encoded Japanese characters here>
...
\stoptext
--------------
cyberb is the Unicode font cyberbit.ttf which I installed using ttf2tfm:
ttf2tfm cyberbit.ttf cyberb@Unicode@

For output I use dvipdfmx with the following line added to the map file:
cyberb@Unicode@		Identity-H	:0:cyberbit.ttf
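
As far as I understand the @Unicode@ subfont scheme, the big font is cut
into TFM files of 256 glyphs each, with the high byte of the code point
as a two-digit hex suffix and the low byte as the slot inside that
subfont. A small Python sketch of that mapping, assuming the usual
Unicode.sfd layout with lowercase suffixes; the function name is made up
for the example:

```python
# Sketch: map a character to its subfont name and slot.
# Assumes subfonts are named <base><high byte as two lowercase hex digits>.

def subfont_slot(ch: str, base: str = "cyberb") -> tuple:
    """Return (subfont name, slot) for a character in the Basic Multilingual Plane."""
    cp = ord(ch)
    if cp > 0xFFFF:
        raise ValueError("only code points up to U+FFFF are covered by this scheme")
    return f"{base}{cp >> 8:02x}", cp & 0xFF


for ch in "あ日":
    name, slot = subfont_slot(ch)
    print(f"U+{ord(ch):04X} -> {name}, slot 0x{slot:02X}")
```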

Well, to my big surprise, it worked! I saw the characters without a problem.
Using the 'scale' option of \setupunicodefont I could also change the size
of the characters. Great!

But since there are usually no spaces in a Japanese sentence, there is no
line breaking. And as you can imagine, line breaking is a useful feature to
have! :-)

I imagined that the same line-breaking problem arises when someone
wants to write Chinese, so I decided to take a look at ConTeXt's Chinese
module to see how it is handled there.

I saw that the Chinese module adds an interglyph space after a character,
which is breakable by TeX. This happens in a macro that is (indirectly)
called using \setupunicodefont and the 'command' option. I decided to try
the same in my test file. But first, I checked to see if using the 'command'
option in \setupunicodefont actually worked:

I added the following macro:

\def\HandleJapaneseGlyph
  {\insertunicodeglyph}

And changed my \setupunicodefont into:

\setupunicodefont
  [japanese]
  [scale=1.0,
  command=\HandleJapaneseGlyph]

Well, I still get Japanese characters like normal. I imagined that if I
removed \insertunicodeglyph from my macro, I wouldn't get to see them. But
this is not the case. I found out that I can do anything in my macro, but it
doesn't have an effect on the Japanese characters. They still get printed. I
also found out that I can even use command=\whateveryoulike and it still
wouldn't complain that such a macro doesn't exist. I get the feeling that
the command option is completely ignored. Apparently, my idea isn't going to
work. :-(

To make a long story even longer, I would like to know why it doesn't work,
or what I should do in order to make it work. What is the correct method to
divert the Unicode character output to another macro so that I can add a
breakable space after each character?
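
To show the effect I am after, here is a sketch in Python of a
preprocessing workaround: it inserts a piece of stretchable TeX glue
between adjacent CJK characters in the source text. This is not how the
Chinese module does it, the glue value is an arbitrary guess, and the
function names are invented for the example.

```python
# Sketch: insert breakable glue between adjacent CJK characters.

def is_cjk(ch: str) -> bool:
    """True for CJK punctuation, Hiragana, Katakana and common Kanji."""
    cp = ord(ch)
    return 0x3000 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF


def add_interglyph_glue(text: str, glue: str = r"\hskip 0pt plus 1pt ") -> str:
    """Return text with glue inserted wherever two CJK characters meet."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        following = text[i + 1] if i + 1 < len(text) else ""
        if following and is_cjk(ch) and is_cjk(following):
            out.append(glue)
    return "".join(out)


print(add_interglyph_glue("日本語 and TeX"))
```

Latin words are left alone, so normal spaces and hyphenation still apply
to them. Rules about characters that may not start or end a line are not
handled in this sketch.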

Well, I've been using ConTeXt for only a few months now, so maybe the
complexity of this is way over my head. At least it kept me busy! But on the
other hand, I don't think writing Japanese is much different from
writing Chinese. It must be possible to achieve without much trouble or
reinventing the wheel.

Thanks for listening,
Tim

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2003-06-17  7:15 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-06-17  7:15 Writing Japanese using ConTeXt Lei Wang
  -- strict thread matches above, loose matches on Subject: below --
2003-06-08 11:48 Tim 't Hart
2003-06-09 14:16 ` Matthew Huggett
2003-06-09 16:33   ` Tim 't Hart
2003-06-10  8:18     ` Hans Hagen
2003-06-10 20:02       ` Tim 't Hart
2003-06-11  2:35         ` Matthew Huggett
2003-06-09 23:24 ` Matt Gushee
2003-06-10  7:41   ` Matthew Huggett
2003-06-10  8:13   ` Hans Hagen
2003-06-10 19:36     ` Tim 't Hart
2003-06-15 21:03 ` Hans Hagen
2003-06-15 22:22   ` Matt Gushee
2003-06-16  7:55     ` Hans Hagen
2003-06-16  4:37   ` Tim 't Hart
2003-06-16  7:51     ` Hans Hagen
