From: Simon Pepping
Newsgroups: gmane.comp.tex.context
Subject: Re: DocBookInContext & multi-languages (newbie)
Date: Mon, 2 Dec 2002 20:46:52 +0100
Message-ID: <20021202194652.GB651@scaprea>
References: <20021129072039.GA7792@mail.inet.hr> <20021129191840.GB909@scaprea> <20021130201545.GA31855@mail.inet.hr>
Reply-To: ntg-context@ntg.nl

On Sat, Nov 30, 2002 at 09:15:45PM +0100, Gour wrote:
> Simon Pepping (spepping@scaprea.hobby.nl) wrote:
>
> > I would like to know that too :-) I have not yet found the time to
> > find out how Context deals with encodings. I only have a note that
> > says that one should do \useXMLfilter [utf], and that I should have a
> > look at the xtag-utf (which is input by the above command) or enco
> > files.
>
> As far as I can see ConTeXt does not understand utf-8 encoding.
>
> Where did you find this note mentioning utf?

On my computer :-) I collected remarks made on this list in that
document.

> Some time ago I saw a post on DocBook list from Sebastian Rahtz who is
> considering to rewrite PassiveTex with ConTeXt support instead of LaTeX.

That would be very good; much better than just doing DocBook.
Sometimes I think I would do better to spend my time on such an
effort, but I am afraid it is a huge task.

> The question remains, how to do it with multi-lingual document
> encoded in utf-8?
>
> Any hint?

As is often the case in open source: do it yourself. Hans has not
taken part in this discussion, so I think he does not feel like
embarking on an effort in this area.

The basic mechanism to make TeX work with encodings is to declare all
characters above 127 active, and map each of them to a suitable
control sequence. But that only works with single-byte encodings.

xmltex, David Carlisle's XML parser in TeX, which is used by
PassiveTeX, can swallow and interpret UTF-8 encoding. I think he
applies the UTF-8 rules to the sequences of single bytes. It should be
easy to transfer this to ConTeXt, because it should not be macro
package dependent.

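To make "applying the UTF-8 rules to sequences of single bytes"
concrete, here is a minimal sketch of the byte-level rules. It is
written in Python rather than TeX macros for readability, and it is
not xmltex's actual code. The function names are hypothetical. The
sketch only shows that the lead byte determines how many following
bytes belong to the same character.

```python
from typing import List


def utf8_sequence_length(lead: int) -> int:
    """Return the number of bytes in the sequence that starts with `lead`."""
    if lead < 0x80:
        return 1  # plain ASCII, a single byte
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte: {lead:#04x}")


def decode_utf8(data: bytes) -> List[int]:
    """Decode bytes into Unicode code points, looking at one byte at a time.

    Simplified: it checks lead and continuation bytes, but it does not
    reject every ill-formed sequence (for example overlong 3-byte forms).
    """
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        length = utf8_sequence_length(lead)
        if i + length > len(data):
            raise ValueError("truncated UTF-8 sequence")
        if length == 1:
            value = lead
        else:
            # Keep only the payload bits of the lead byte (5, 4 or 3 bits).
            value = lead & (0xFF >> (length + 1))
            for byte in data[i + 1 : i + length]:
                if byte & 0xC0 != 0x80:
                    raise ValueError(f"invalid continuation byte: {byte:#04x}")
                # Each continuation byte contributes 6 more bits.
                value = (value << 6) | (byte & 0x3F)
        code_points.append(value)
        i += length
    return code_points


if __name__ == "__main__":
    # "a" followed by the two-byte sequence C5 A1 (U+0161).
    print([hex(cp) for cp in decode_utf8(b"a\xc5\xa1")])
```

In a macro package the same idea would presumably be expressed by
making each lead byte an active character that absorbs the right
number of following bytes, and then looking up the resulting code
point in a table of control sequences.
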
The other options are: use an input filter, like the program that was
mentioned in this thread. Or use NTS, the Java-based TeX
implementation. Currently it does not deal with multibyte encodings
because it is artificially restricted to 256 characters (if I remember
correctly) and because there are no input encoding macro packages for
higher character codes.

Sebastian's PassiveTeX has long mapping tables from Unicode to LaTeX
control sequences. These can be translated to ConTeXt. (And they could
be made to work with NTS.)

While I am writing this, I am beginning to think that copying xmltex's
algorithm to ConTeXt is the best way to go.

Regards, Simon

--
Simon Pepping
email: spepping@scaprea.hobby.nl