ntg-context - mailing list for ConTeXt users
From: Simon Pepping <spepping@scaprea.hobby.nl>
Subject: Re: DocBookInContext & multi-languages (newbie)
Date: Mon, 2 Dec 2002 20:46:52 +0100
Message-ID: <20021202194652.GB651@scaprea>
In-Reply-To: <20021130201545.GA31855@mail.inet.hr>

On Sat, Nov 30, 2002 at 09:15:45PM +0100, Gour wrote:
> Simon Pepping (spepping@scaprea.hobby.nl) wrote:
> 
> > I would like to know that too :-) I have not yet found the time to
> > find out how ConTeXt deals with encodings. I only have a note that
> > says that one should do \useXMLfilter [utf], and that I should have a
> > look at the xtag-utf (which is input by the above command) or enco
> > files.
> 
> As far as I can see ConTeXt does not understand utf-8 encoding.
> 
> Where did you find this note mentioning utf?

On my computer :-) I collected remarks made on this list in that
document.

> Some time ago I saw a post on the DocBook list from Sebastian Rahtz, who is
> considering rewriting PassiveTeX with ConTeXt support instead of LaTeX.

That would be very good; much better than just doing
DocBook. Sometimes I think my time would be better spent on such an
effort, but I am afraid it is a huge task.
 
> The question remains: how to do it with a multilingual document
> encoded in utf-8?
>
> Any hint?

As is often the case in open source: do it yourself. Hans has not
taken part in this discussion, so I think he does not feel like
embarking on an effort in this area.

The basic mechanism to make TeX work with encodings is to declare all
characters above 127 active, and map them to a suitable control
sequence. But that only works with single-byte encodings.
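
To make the single-byte mechanism concrete, here is a small sketch in
Python rather than TeX. It is not ConTeXt's actual code: the table
LATIN1_TO_TEX and the function translate_bytes are hypothetical names,
and the table holds only a few ISO-8859-1 entries for illustration.

```python
# Hypothetical table: a few ISO-8859-1 bytes above 127 and the TeX
# control sequences they might be mapped to. Illustration only.
LATIN1_TO_TEX = {
    0xE9: r"\'e",    # e with acute accent
    0xE8: r"\`e",    # e with grave accent
    0xFC: r'\"u',    # u with diaeresis
    0xDF: r"\ss{}",  # sharp s
}


def translate_bytes(data: bytes) -> str:
    """Replace each byte above 127 by its control sequence, one byte at a time."""
    out = []
    for b in data:
        if b < 128:
            out.append(chr(b))            # plain ASCII passes through
        else:
            out.append(LATIN1_TO_TEX[b])  # "active" byte; KeyError if unmapped
    return "".join(out)
```

The same e-acute saved as utf-8 is the two bytes 0xC3 0xA9, so a
per-byte table like this one cannot handle it.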

xmltex, David Carlisle's XML parser in TeX, which is used by
PassiveTeX, can swallow and interpret utf-8 encoding. I think he
applies the utf-8 rules to the sequences of single bytes. It should be
easy to transfer this to ConTeXt, because it should not be macro
package dependent.
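
As a rough illustration of what applying the utf-8 rules to single
bytes amounts to, here is a sketch in Python rather than TeX macros.
It follows the standard UTF-8 rule (the lead byte says how many
continuation bytes follow) and is not taken from xmltex's source;
decode_utf8 is a hypothetical name.

```python
def decode_utf8(data: bytes) -> list:
    """Decode UTF-8 by looking at one byte at a time; returns code points.

    Simplified sketch: it does not reject overlong three- and four-byte
    forms or surrogate code points.
    """
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:               # 0xxxxxxx: plain ASCII
            extra, value = 0, lead
        elif 0xC2 <= lead <= 0xDF:    # 110xxxxx: one continuation byte
            extra, value = 1, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:    # 1110xxxx: two continuation bytes
            extra, value = 2, lead & 0x0F
        elif 0xF0 <= lead <= 0xF4:    # 11110xxx: three continuation bytes
            extra, value = 3, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte {lead:#x} at offset {i}")
        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1 : i + 1 + extra]:
            if (b & 0xC0) != 0x80:    # continuation bytes look like 10xxxxxx
                raise ValueError(f"invalid continuation byte {b:#x}")
            value = (value << 6) | (b & 0x3F)
        code_points.append(value)
        i += extra + 1
    return code_points
```

In a TeX setting the lead bytes would be the active characters, each
one reading the following bytes as arguments before looking up the
resulting code point.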

The other options are to use an input filter, like the program that was
mentioned in this thread, or to use NTS, the Java-based TeX
implementation. Currently NTS does not deal with multibyte encodings
because it is artificially restricted to 256 characters (if I remember
correctly) and because there are no input encoding macro packages for
higher character codes.

Sebastian's PassiveTeX has long mapping tables from Unicode to LaTeX
control sequences. These can be translated to ConTeXt. (And they
could be made to work with NTS.)

While I am writing this, I am beginning to think that copying xmltex's
algorithm to ConTeXt is the best way to go.

Regards, Simon

-- 
Simon Pepping
email: spepping@scaprea.hobby.nl
