Re: Chinese

ntg-context - mailing list for ConTeXt users

* Re: Chinese
       [not found] <20051212173143.94765127FA@ronja.ntg.nl>
@ 2005-12-13  8:07 ` Duncan Hothersall
  2005-12-13  9:52   ` Chinese Hans Hagen
  0 siblings, 1 reply; 11+ messages in thread
From: Duncan Hothersall @ 2005-12-13  8:07 UTC (permalink / raw)

Hans wrote:

> chinese is not yet defined in utf so if you want that, we need to do it
...
> assuming this, how about making a set of tfm,enc,map files that match 
> the unicode positions (volunteers ...)

I'm very willing to help, especially if there is some drudge work
involved in constructing the files. I don't know enough (yet) about the
logic of it all to help with setting up the system, but if someone can
supply skeleton files and/or a method for constructing the necessary
files, I'm happy to do any leg-work.

Duncan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Re: Chinese
  2005-12-13  8:07 ` Chinese Duncan Hothersall
@ 2005-12-13  9:52   ` Hans Hagen
  2005-12-13 10:03     ` sjoerd siebinga
  2005-12-13 12:33     ` Adam Lindsay
  0 siblings, 2 replies; 11+ messages in thread
From: Hans Hagen @ 2005-12-13  9:52 UTC (permalink / raw)


Duncan Hothersall wrote:

>Hans wrote:
>
>>chinese is not yet defined in utf so if you want that, we need to do it
>
>...
>
>>assuming this, how about making a set of tfm,enc,map files that match 
>>the unicode positions (volunteers ...)
>
>I'm very willing to help, especially if there is some drudge work
>involved in constructing the files. I don't know enough (yet) about the
>logic of it all to help with setting up the system, but if someone can
>supply skeleton files and/or a method for constructing the necessary
>files, I'm happy to do any leg-work.

what we need is a set of encoding files like

/UniEncoding52 [
....
/uni52DF
/uni52E0
/uni52E1
/uni52E2
/uni52E3
/uni52E4
...
/.notdef
....
] def

that represent the ranges and can be used to construct tfm files.

(or whatever index entry is needed in order to filter the metrics from 
the ttf file)
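
A minimal Python sketch of how one such encoding file could be generated. The function name and the optional `present` set are hypothetical; the /UniEncodingXX header and the /uniXXXX names follow the example above, and filling unused slots with /.notdef is an assumption about how gaps would be handled.

```python
from typing import Optional, Set


def unicode_block_encoding(block: int, present: Optional[Set[int]] = None) -> str:
    """Return the text of a 256-slot encoding vector for one Unicode block.

    block   -- high byte of the code points, e.g. 0x52 for U+5200..U+52FF
    present -- code points the font actually has (hypothetical input);
               None means every slot gets a /uniXXXX name
    """
    if not 0 <= block <= 0xFF:
        raise ValueError("block must be in 0x00..0xFF")
    names = []
    for low in range(256):
        code_point = (block << 8) | low
        if present is None or code_point in present:
            names.append(f"/uni{code_point:04X}")
        else:
            # assumption: slots without a glyph are marked /.notdef
            names.append("/.notdef")
    header = f"/UniEncoding{block:02X} ["
    return "\n".join([header, *names, "] def"]) + "\n"


if __name__ == "__main__":
    print(unicode_block_encoding(0x52))
```

Slot E1 of the vector for block 52 then carries /uni52E1, which is what the tfm file for that block would be built from.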

maybe patricks font code already can do that:

- read in a ttf file (or a glyph list produced by ttf2tfm or ttf2afm)
- make a range of enc and tfm files

actually, this is rather generic, since pdftex can handle symbolic names 
like /index... and /uni..., so if we have such a set, we can stick to 
one bunch of enc files

the utf handler can then simply access char E1 from htsong-52.tfm
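
The lookup the utf handler would perform can be sketched in Python as below. This assumes one tfm file per 256-character block, named family-XX with the block in lowercase hex (the thread only shows htsong-52, so the case convention for blocks such as 5A is a guess), and it is restricted to the Basic Multilingual Plane. The function name is hypothetical.

```python
from typing import Tuple


def tfm_slot(code_point: int, family: str = "htsong") -> Tuple[str, int]:
    """Split a code point into (tfm name, character slot).

    The high byte selects the font, the low byte the slot in that font.
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("only code points up to U+FFFF are handled here")
    block = code_point >> 8      # e.g. 0x52 for U+52E1
    slot = code_point & 0xFF     # e.g. 0xE1 for U+52E1
    return f"{family}-{block:02x}", slot


if __name__ == "__main__":
    name, slot = tfm_slot(0x52E1)
    print(name, format(slot, "X"))
```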

testing is rather simple:

\pdfmapline{htsong-52 <uni-52.enc <htsong.ttf}

\font\test=htsong-52 \char"e1
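
For more than one block, the map lines could be generated rather than typed. A small Python sketch, assuming the naming of the test line above (family-XX and uni-XX.enc, with XX in lowercase hex, which is a guess for blocks containing letters); the function name is hypothetical.

```python
from typing import Iterable, List


def pdf_map_lines(family: str, ttf_file: str, blocks: Iterable[int]) -> List[str]:
    """Build one \\pdfmapline per Unicode block, in the format of the test above."""
    return [
        f"\\pdfmapline{{{family}-{block:02x} <uni-{block:02x}.enc <{ttf_file}}}"
        for block in blocks
    ]


if __name__ == "__main__":
    # blocks 4E..9F cover the CJK Unified Ideographs range U+4E00..U+9FFF
    for line in pdf_map_lines("htsong", "htsong.ttf", range(0x4E, 0xA0)):
        print(line)
```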


Hans

* Re: Re: Chinese
  2005-12-13  9:52   ` Chinese Hans Hagen
@ 2005-12-13 10:03     ` sjoerd siebinga
  2005-12-13 10:34       ` Hans Hagen
  2005-12-13 10:46       ` Re: Chinese Tobias Burnus
  2005-12-13 12:33     ` Adam Lindsay
  1 sibling, 2 replies; 11+ messages in thread
From: sjoerd siebinga @ 2005-12-13 10:03 UTC (permalink / raw)



On 13 Dec 2005, at 10:52, Hans Hagen wrote:

> Duncan Hothersall wrote:
>
>> Hans wrote:
>>
>>
>>> chinese is not yet defined in utf so if you want that, we need to do it
>>>
>> ...
>>
>>> assuming this, how about making a set of tfm,enc,map files that match
>>> the unicode positions (volunteers ...)
>>>
>>
>> I'm very willing to help, especially if there is some drudge work
>> involved in constructing the files. I don't know enough (yet) about the
>> logic of it all to help with setting up the system, but if someone can
>> supply skeleton files and/or a method for constructing the necessary
>> files, I'm happy to do any leg-work.
>>
> what we need is a set of encoding files like
>
> /UniEncoding52 [
> ....
> /uni52DF
> /uni52E0
> /uni52E1
> /uni52E2
> /uni52E3
> /uni52E4
> ...
> /.notdef
> ....
> ] def

I have made a Ruby-script (for personal use loosely based on Adam's
xsl-files) which generates all the encoding- and symbolfiles from a
given cmapfile. If someone could send me the ttf-font, I can generate
all the necessary encodingfiles for you.

Sjoerd

* Re: Re: Chinese
  2005-12-13 10:03     ` sjoerd siebinga
@ 2005-12-13 10:34       ` Hans Hagen
  2005-12-13 11:26         ` sjoerd siebinga
  2005-12-13 10:46       ` Re: Chinese Tobias Burnus
  1 sibling, 1 reply; 11+ messages in thread
From: Hans Hagen @ 2005-12-13 10:34 UTC (permalink / raw)


sjoerd siebinga wrote:

> I have made a Ruby-script (for personal use loosely based on Adam's  
> xsl-files) which generates all the encoding- and symbolfiles from a  
> given cmapfile. If someone could send me the ttf-font, I can generate  
> all the necessary encodingfiles for you.

the chinese fonts mentioned in the context garden qualify for such a 
treatment (htsong cum suis)

Hans

* Re: Re: Chinese
  2005-12-13 10:03     ` sjoerd siebinga
  2005-12-13 10:34       ` Hans Hagen
@ 2005-12-13 10:46       ` Tobias Burnus
  2005-12-13 10:56         ` Hans Hagen
  1 sibling, 1 reply; 11+ messages in thread
From: Tobias Burnus @ 2005-12-13 10:46 UTC (permalink / raw)


Hi,

sjoerd siebinga wrote:
> I have made a Ruby-script (for personal use loosely based on Adam's 
> xsl-files) which generates all the encoding- and symbolfiles from a 
> given cmapfile. If someone could send me the ttf-font, I can generate 
> all the necessary encodingfiles for you.
Nice! The recommended (by Xiao Jianfeng) TrueType fonts are given at 
http://wiki.contextgarden.net/Chinese
They are
ftp://ftp.ctex.org/pub/tex/fonts/truetype/ttf/htfs.ttf
ftp://ftp.ctex.org/pub/tex/fonts/truetype/ttf/hthei.ttf
ftp://ftp.ctex.org/pub/tex/fonts/truetype/ttf/htkai.ttf
ftp://ftp.ctex.org/pub/tex/fonts/truetype/ttf/htsong.ttf


Richard Gabriel wrote:
> But yet another question: What about Japanese? I've made only small 
> research so far, but unlike Chinese, there's almost no information 
> about Japanese in TeX. How much of work would be to adjust the current 
> "chinese" ConTeXt module for Japanese? What would you need for it?
> [Of course, meanwhile I'll investigate some other ways of typesetting 
> Japanese...]
(I don't know much about Japanese.)

In Japanese, contrary to Chinese, different character sets are mixed:
- The Chinese characters ("Kanji"), which seem to make up most of the 
(scientific) text (I've seen);
in addition some pronunciation-based characters are used:
- ("Kana":) Hiragana and Katakana; the former are rather round 
characters in Japanese texts, the most prominent should be "の" [means 
something like "of" in English]. They are mostly used for 
suffixes/prefixes where no Chinese equivalent exists, whereas Katakana 
is used to write words which have been taken from (mostly) European 
languages.

For Kanji there should be no problem with the Chinese module; for Kana 
you need additional support for these characters. Since they are 
pronunciation-based, they only consist of < 50 characters each.
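
To see which extra blocks Kana would need under the one-font-per-256-characters scheme discussed in this thread, here is a small Python sketch. It classifies characters by Unicode block (Hiragana U+3040..U+309F, Katakana U+30A0..U+30FF, CJK Unified Ideographs U+4E00..U+9FFF); the function names are hypothetical, and punctuation and rarer extension ranges are simply reported as "other".

```python
from typing import List


def japanese_script(ch: str) -> str:
    """Classify a single character by the Unicode block it falls in."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"


def encoding_blocks(text: str) -> List[int]:
    """Return the sorted high bytes (256-character blocks) a text needs."""
    return sorted({ord(ch) >> 8 for ch in text if japanese_script(ch) != "other"})


if __name__ == "__main__":
    sample = "日本語の"
    print([japanese_script(ch) for ch in sample])
    print([format(block, "02X") for block in encoding_blocks(sample)])
```

Both Hiragana and Katakana sit in block 30, so under these assumptions a single additional encoding file would cover the basic Kana.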

Tobias

(Hmm, I never thought I would end up so deep in linguistics during my 
PhD thesis in physics. But having three Chinese in the group and 
regularly doing some measurements at a research centre in Taiwan - I 
couldn't help picking up something.)

* Re: Re: Chinese
  2005-12-13 10:46       ` Re: Chinese Tobias Burnus
@ 2005-12-13 10:56         ` Hans Hagen
  0 siblings, 0 replies; 11+ messages in thread
From: Hans Hagen @ 2005-12-13 10:56 UTC (permalink / raw)


Tobias Burnus wrote:

> (Hmm, I never thought I would end up so deep in linguistics during my 
> PhD thesis in physics. But having three Chinese in the group and 
> regularly doing some measurements at a research centre in Taiwan - I 
> couldn't help picking up something.)


well, there is a certain charm in those characters, even if you cannot 
read them (during a 2*10 hour trip in a chinese bus during the last tug 
conference one quickly learns to recognize the symbols for gas stations 
and such -)

browsing a chinese-english dictionary is also fun (i have a small one on 
my desk; some day i should start collecting dictionaries of all 
languages that context supports -); with a bit of puzzling one can find 
out the system behind the way words are made up

Hans

* Re: Re: Chinese
  2005-12-13 10:34       ` Hans Hagen
@ 2005-12-13 11:26         ` sjoerd siebinga
  2005-12-13 13:02           ` your mails at go Hans Hagen
  0 siblings, 1 reply; 11+ messages in thread
From: sjoerd siebinga @ 2005-12-13 11:26 UTC (permalink / raw)



On 13 Dec 2005, at 11:34, Hans Hagen wrote:

> sjoerd siebinga wrote:
>
>> I have made a Ruby-script (for personal use loosely based on Adam's
>> xsl-files) which generates all the encoding- and symbolfiles from a
>> given cmapfile. If someone could send me the ttf-font, I can generate
>> all the necessary encodingfiles for you.
>
> the chinese fonts mentioned in the context garden qualify for such
> a treatment (htsong cum suis)
>

Ok. Where can I send the chinese encodingfiles?

* Re: Re: Chinese
  2005-12-13  9:52   ` Chinese Hans Hagen
  2005-12-13 10:03     ` sjoerd siebinga
@ 2005-12-13 12:33     ` Adam Lindsay
  2005-12-13 15:12       ` Hans Hagen
  1 sibling, 1 reply; 11+ messages in thread
From: Adam Lindsay @ 2005-12-13 12:33 UTC (permalink / raw)

Hans Hagen wrote:

> what we need is a set of encoding files like
> 
> /UniEncoding52 [
> ....
> /uni52DF
> /uni52E0

I hate to be negative, but I have doubts about how generic this approach 
may be. In some tentative experiments, I discovered that many (most?) 
CJK fonts don't use traditional postscript names, but rather map from 
unicode to an indexed glyph number.

Fortunately, ttf2tfm's -w enco@Unicode@ notation seems to address this 
in most of the old test cases I tried.
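
For fonts of that kind, the encoding vector could name glyphs by index instead of by /uniXXXX. A hedged Python sketch: the cmap dictionary here is toy data (the glyph numbers are invented), the /IndexEncodingXX header and function name are hypothetical, and writing the glyph number in decimal after /index is an assumption about what pdftex expects.

```python
from typing import Dict


def index_block_encoding(block: int, cmap: Dict[int, int]) -> str:
    """Build a 256-slot encoding vector that refers to glyphs by index.

    cmap maps Unicode code points to glyph indices, as extracted from a
    font by some other tool; reading the font itself is out of scope here.
    """
    names = []
    for low in range(256):
        glyph_index = cmap.get((block << 8) | low)
        if glyph_index is None:
            names.append("/.notdef")
        else:
            # assumption: decimal glyph number after the /index prefix
            names.append(f"/index{glyph_index}")
    header = f"/IndexEncoding{block:02X} ["
    return "\n".join([header, *names, "] def"]) + "\n"


if __name__ == "__main__":
    toy_cmap = {0x52E0: 4710, 0x52E1: 4711}  # invented glyph numbers
    print(index_block_encoding(0x52, toy_cmap))
```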

adam
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
  Adam T. Lindsay, Computing Dept.     atl@comp.lancs.ac.uk
  Lancaster University, InfoLab21        +44(0)1524/510.514
  Lancaster, LA1 4WA, UK             Fax:+44(0)1524/510.492
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

* your mails at go
  2005-12-13 11:26         ` sjoerd siebinga
@ 2005-12-13 13:02           ` Hans Hagen
  0 siblings, 0 replies; 11+ messages in thread
From: Hans Hagen @ 2005-12-13 13:02 UTC (permalink / raw)


sjoerd siebinga wrote:

>
> On 13 Dec 2005, at 11:34, Hans Hagen wrote:
>
>> sjoerd siebinga wrote:
>>
>>> I have made a Ruby-script (for personal use loosely based on Adam's
>>> xsl-files) which generates all the encoding- and symbolfiles from a
>>> given cmapfile. If someone could send me the ttf-font, I can generate
>>> all the necessary encodingfiles for you.
>>
>> the chinese fonts mentioned in the context garden qualify for such a
>> treatment (htsong cum suis)
>>
>
> Ok. Where can I send the chinese encodingfiles?

you can send me a zip

maybe we should start thinking on how to set up a repository at

  https://foundry.supelec.fr/

taco and patrick have more experience in this area than i have so maybe 
they have some ideas on how to organize this

Hans

* Re: Re: Chinese
  2005-12-13 12:33     ` Adam Lindsay
@ 2005-12-13 15:12       ` Hans Hagen
  2005-12-13 15:29         ` Adam Lindsay
  0 siblings, 1 reply; 11+ messages in thread
From: Hans Hagen @ 2005-12-13 15:12 UTC (permalink / raw)


Adam Lindsay wrote:

> Hans Hagen wrote:
>
>> what we need is a set of encoding files like
>>
>> /UniEncoding52 [
>> ....
>> /uni52DF
>> /uni52E0
>
>
> I hate to be negative, but I have doubts about how generic this 
> approach may be. In some tentative experiments, I discovered that many 
> (most?) CJK fonts don't use traditional postscript names, but rather 
> map from unicode to an indexed glyph number.
>
> Fortunately, ttf2tfm's -w enco@Unicode@ notation seems to address this 
> in most of the old test cases I tried.

afaik pdftex can handle the indexXXXX and uniXXXX entries as 
alternatives for glyphnames

Hans

* Re: Re: Chinese
  2005-12-13 15:12       ` Hans Hagen
@ 2005-12-13 15:29         ` Adam Lindsay
  0 siblings, 0 replies; 11+ messages in thread
From: Adam Lindsay @ 2005-12-13 15:29 UTC (permalink / raw)


Hans Hagen wrote:
> Adam Lindsay wrote:
>> Fortunately, ttf2tfm's -w enco@Unicode@ notation seems to address this 
>> in most of the old test cases I tried.
> 
> 
> afaik pdftex can handle the indexXXXX and uniXXXX entries as 
> alternatives for glyphnames

Yes. Sorry I wasn't clear on that.
It's just that ttf2tfm is the tool that does a good job at extracting 
those entries when other tools fail.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
  Adam T. Lindsay, Computing Dept.     atl@comp.lancs.ac.uk
  Lancaster University, InfoLab21        +44(0)1524/510.514
  Lancaster, LA1 4WA, UK             Fax:+44(0)1524/510.492
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
