From: "Wolfgang Schuster"
Newsgroups: gmane.comp.tex.context
Subject: Re: MKIV Chinese typesetting
Date: Mon, 28 Jan 2008 17:26:51 +0100
To: "Mailing list for ConTeXt users"

On Jan 28, 2008 3:17 AM, Arthur Reutenauer wrote:
> Hello,
>
> Thanks for this comprehensive review. If I'm not mistaken, there is
> no specific code for CJKV typesetting in Mark IV; the examples in mk.pdf
> seem to use the generic font loading mechanism.

This is wrong: font-otf contains a few Lua macros for line breaking, and
char-def has information about character width (full width, half width,
...) and other properties such as opening punctuation and parentheses,
but none of this is finished.

> I would like to answer more completely, but don't have much time for
> the moment. About some of your remarks:
>
> > so I think a new feature should be added to map all the Chinese puncts
> > into english while at the same time, a space should be added after the
> > English punct marks.
>
> Would it not be better to automatically add shrinkable glue after
> Chinese punctuation, rather than replacing the character by force?
> This would be very much in line with the general TeX philosophy of setting
> text (and would probably suppress the need for half-width forms in the
> font altogether).
>
> > - pp118, penultimate example, box 2, line1, the ' punct mark should
> > not appear at the end of the line
>
> This should be taken care of by adding an appropriate penalty before
> the character.
>
> > - pp118, ultimate example, box 2, line2, in fact, if you want do
> > perfect Chinese typesetting, all the puncts which begin a line or end
> > a line should be closed to the margin line
>
> Do you mean simply closer to the margin, or in the margin itself
> (protruding)? Protruding is already possible in pdfTeX; I believe it is
> available in LuaTeX as well, although it might be broken for the moment
> (Taco?). Setting the character closer to the margin should be possible
> as well, as a modified form of protruding, I trust.
>
> > A small skip should be left between Chinese and English which makes
> > the result much better. usually the space is a quarter of a chinese
> > character width. A TeX expression should like:
> > \hspace{0.25em plus 0.125em minus 0.08em}
>
> Again, this can be taken care of by automatically adding this glue
> between pairs of character of the appropriate category.
>
> > The last important thing for English and Chinese bi-lingual
> > typesetting is that: do not use English glyphs in Chinese fonts
>
> Sure, there should be a possibility of specifying a Western font to be
> used inside Chinese text.

This could be done with virtual fonts, but we need an interface.

> > - the following script produce an error: Invalid field id penalty for
> > node type glyph (1).
>
> I don't have that error here. This is very big font; are you sure it
> has been read entirely and correctly written to the cache?
> Lua crashed on my machine when I first compiled your example, and only a
> partial font hash was written to the cache (ConTeXt didn't crash, so the first
> compilation apparently ended well, but the cache was already filled with
> a partial font). I can imagine that problems will arise in the presence
> of a partially hashed font in the cache.
>
> Anyway, the code looks quite weird to me:
>
> > \definefontfeature[chinese][mode=node,script=hang,lang=zht,script=hani,lang=dlft]
>
> This means that you activate two different scripts at the same time
> (hang == Hangul and hani == Han ideographs), and also two languages at
> the same time (zht == Chinese Traditional and dlft is probably a typo
> for dflt == default). I can't imagine what that is supposed to mean,
> and activating Traditional Chinese is probably wrong with Adobe Song Std
> which is a Simplified Chinese font. A saner definition of that feature
> would be in my opinion:
>
> \definefontfeature[chinese-traditional][mode=node,script=hani,lang=zhs]

You need the hang script; it takes care of the line breaking.

> I know this code comes from mk.pdf, but I think it is a mistake.
>
> Finally, there is an interesting article by Jin-Hwan Cho (the dvipdfmx
> author) and Haruhiko Okumura about CJKV typesetting with Omega a couple
> of years ago. They have implemented all of the rules you mention above
> and a bit more; and although they used OTPs at the time, it should be
> quite straightforward to transpose it in Lua code (actually, I've done it
> a couple of months ago, but I have used plain LuaTeX, and in ConTeXt it
> should probably be done using node processors or something).
>
> http://project.ktug.or.kr/omega-cjk/tug2004-preprint.pdf

This is currently done in font-otf.lua.

Greetings,
Wolfgang