Re: Basic question on Unicode and ConTeXt

From: Mojca Miklavec <mojca.miklavec.lists@gmail.com>
Subject: Re: Basic question on Unicode and ConTeXt
Date: Thu, 21 Jul 2005 02:52:31 +0200
Message-ID: <6faad9f005072017524f914147@mail.gmail.com>
In-Reply-To: <42DEB5AC.8000806@creutzig.de>

Christopher Creutzig wrote:
> Hans Hagen wrote:
> >> So why not mapping the characters to unicode first and defining the
> >> mapping from unicode to \TeXcommand only once? regi-* files (at least
> >> in the meaning they have now) could be prepared automatically by a
> >> script, less error-prone and without the need to say "Some more
> >> definitions will be added later."
> >>
> > you mean ...
> >
> > \defineactivetoken 123 {\uchar{...}{...}}
> >
> > it is an option but it's much slower and take much more memory
> 
>   I may be wrong, of course, but I think Mojca proposed something
> different (and something that should be really easy to implement):  Have
> the unicode vectors stored in a format easily parsed by an external ruby
> script and create the regi-* files from that, using the conversion
> tables provided by your operating system or iconv or wherever ruby gets
> them from.

Yes, I had something different in mind.

A1.) prepare the files to be used as a source of transformation from
"any" character set to utf and prepare a list of synonyms for
encodings

(example: a file that says that in ISO-8859-2, character 0xA3
represents the Unicode character 0x0141 (lstroke), with one such entry
for every character of every Mac/Windows/ISO/[...] encoding that we
want to support)

A2.) write a script which automatically generates regi-* files from
those files, but the regi-* files would contain only the mapping to
Unicode numbers

(example:
\startregime[iso-8859-2]
...
\somecommandtomapacharactertounicode {163}{1}{65} % lstroke
...
\stopregime)
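
Steps A1 and A2 could be sketched roughly as follows. This is only an
illustration in Python (the thread suggests Ruby for the real script),
assuming the conversion tables that ship with Python's codecs are an
acceptable source for A1. The macro name is the placeholder used in
this message, not an existing ConTeXt command, and the helper names
byte_to_unicode and regime_lines are made up for the example.

```python
# Sketch of steps A1/A2: derive a byte -> Unicode table from Python's
# built-in codec and emit regime lines that hold only Unicode numbers.
# The macro name below is the placeholder from this message, not a real
# ConTeXt command.

def byte_to_unicode(encoding):
    """Return {byte value: code point} for the upper half of an 8-bit encoding."""
    table = {}
    for byte in range(0x80, 0x100):
        try:
            char = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            continue  # this byte is undefined in the encoding
        table[byte] = ord(char)
    return table


def regime_lines(encoding):
    """Format each entry as {byte}{high byte}{low byte} of the code point."""
    lines = [f"\\startregime[{encoding}]"]
    for byte, codepoint in sorted(byte_to_unicode(encoding).items()):
        high, low = divmod(codepoint, 256)
        lines.append(
            f"\\somecommandtomapacharactertounicode {{{byte}}}{{{high}}}{{{low}}}"
        )
    lines.append("\\stopregime")
    return lines


if __name__ == "__main__":
    print("\n".join(regime_lines("iso-8859-2")))
```

For ISO-8859-2 the generated lines include the {163}{1}{65} entry from
the example above, because 0x0141 splits into the bytes 1 and 65.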

A3.) prepare a huge file with mapping from unicode numbers to ConTeXt commands

(example:
...
\somecommandtomapfromunicodetocontext {1}{65}{\lstroke}
...)

A4.) ... I don't mind what ConTeXt does with this \lstroke afterwards,
but it seems it is already clever enough to produce the (proper) glyph
at the end

What should ConTeXt do with that?
B1.) The file under A3 should be processed at the beginning. As it may
become really huge, exotic definitions should be only preloaded if
asked for (\usemodule[korean]), while there is probably no harm if
(accented) latin, greek, cyrillic and punctuation (TM, copyright, ..)
are preloaded by default

B2.) Once \enableregime[iso-8859-2] or any other regime is
requested, the file with the corresponding regime definitions is
processed. However, when \somecommandtomapacharactertounicode
{163}{1}{65} is processed, the character '163' is not stored as
\uchar{1}{65}, but as \lstroke. '\somecommandtomapacharactertounicode'
would first look up which ConTeXt command is saved under
\uchar{1}{65} and then call
\defineactivetoken 163 {\lstroke} as a result.
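
The lookup in B2 can be modelled outside TeX to make the idea
concrete. The following Python sketch is hypothetical: the table
contents, the function name enable_regime and the fallback to \uchar
for unmapped code points are assumptions made for the illustration,
not a description of how ConTeXt works internally.

```python
# Hypothetical model of step B2: when a regime is enabled, each entry
# {byte}{high}{low} is resolved through the Unicode -> command table (A3)
# and turned into a \defineactivetoken line.

UNICODE_TO_COMMAND = {
    (1, 65): r"\lstroke",  # pairing taken from the example in this message
}


def enable_regime(regime_entries, unicode_to_command=UNICODE_TO_COMMAND):
    """Return the definitions that would be issued for the given entries."""
    definitions = []
    for byte, high, low in regime_entries:
        command = unicode_to_command.get((high, low))
        if command is None:
            # Assumption: fall back to the raw Unicode reference when
            # no named command is known for this code point.
            command = f"\\uchar{{{high}}}{{{low}}}"
        definitions.append(f"\\defineactivetoken {byte} {{{command}}}")
    return definitions


if __name__ == "__main__":
    for line in enable_regime([(163, 1, 65), (161, 1, 4)]):
        print(line)
```

The regime file stays free of command names in this model, so
renaming a command only touches the table.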

I don't know the details of the ConTeXt internal stuff, but I think
(hope) that it should be possible to do it this way. B1 (preloading
mapping from unicode to tex commands) is probably the only "hungry"
step in the whole story.

I think that it doesn't make any sense to ask the user to "\input
regi-whatever". \enableregime and some additional definitions should
be clever enough to find out which file to process in order to enable
the proper regime.

%%%%%%%%%%%%%%%%%%%%%

Christopher's idea is actually yet another alternative, which combines
steps A2 and A3. If the Unicode->ConTeXt mapping is in some
easy-to-parse format, it takes no additional effort for the script to
write the ConTeXt commands directly into the regi-* files instead of
Unicode numbers, so that B2 has less work to do. As long as it is
guaranteed that nobody will change these files manually, this is OK.
The only drawback is that if someone notices that "\textellipsis" is
more suitable than "\dots", the script has to be changed and the files
have to be generated once more. If the character is mapped to
(0x2026 HORIZONTAL ELLIPSIS) instead, only one line in the file with
the Unicode->ConTeXt mapping (A3) has to be changed.

If B2 cannot work as described, Christopher's proposal would be
the only proper way to go.
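
For comparison, the combined variant (A2 and A3 merged) might look
like the Python sketch below. It is an illustration only: the function
name baked_regime is made up, the conversion table comes from Python's
cp1252 codec (where byte 0x85 is the ellipsis), and the output format
simply follows the \defineactivetoken line quoted earlier in this
thread.

```python
# Sketch of the combined variant: the script writes ConTeXt commands
# straight into the generated regime lines, so a changed command name
# only takes effect after the files are generated again.

def baked_regime(encoding, unicode_to_command):
    """Emit one definition per byte whose code point has a known command."""
    lines = []
    for byte in range(0x80, 0x100):
        try:
            codepoint = ord(bytes([byte]).decode(encoding))
        except UnicodeDecodeError:
            continue  # this byte is undefined in the encoding
        command = unicode_to_command.get(codepoint)
        if command is not None:
            lines.append(f"\\defineactivetoken {byte} {{{command}}}")
    return lines


if __name__ == "__main__":
    # First run with \dots, then a regenerated run with \textellipsis.
    print(baked_regime("cp1252", {0x2026: r"\dots"}))
    print(baked_regime("cp1252", {0x2026: r"\textellipsis"}))
```

Both runs yield a single line for byte 133. Only the command inside
the braces differs, which is why every generated file has to be
rebuilt after such a change.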

%%%%%%%%%%%%%%%%%%%%%

I wanted to test \showcharacters on live.contextgarden.net (as
Hans suggested that my map files are probably not OK), but it didn't
compile there. (I hope it's not because of my buggy contributions in
the last few days.)

Is there any tool or macro to visualize all the glyphs available in a
font? \showcharacters (if it works) shows only the glyphs that ConTeXt
is aware of. What about the rest?

Mojca
