zsh-workers
From: Peter Stephenson <pws@csr.com>
To: zsh-workers@sunsite.dk (Zsh hackers list)
Subject: UTF-8 fonts
Date: Thu, 19 Sep 2002 17:56:57 +0100
Message-ID: <14747.1032454617@csr.com>

See http://www.cl.cam.ac.uk/~mgk25/unicode.html for a nice summary of
the subject.

My first thought about using UTF-8 instead of eight bit characters was
that we would have to replace the current `Meta' system.  However, I
don't think we do since the current system will seamlessly translate
from UTF-8 input to UTF-8 output.

Therefore, all we have to do is modify the shell's internals at the
point where it actually compares characters --- or, more generally,
tries to turn metafied sequences into a single character --- to use the
normal UTF-8 rules.  There may also be some extra places where counting
the length needs changing.

UTF-8 encoded characters are up to 6 bytes, so either with 64-bit integers we
can do a direct comparison with some bit arithmetic, or we can just use
strncmp.  (I don't fancy relying on internationalisation support for
this but in principle that's probably the right thing to do.)
Hence I don't see the necessity for actually decoding UTF-8 into Unicode
at any point, just deciding the number of bytes.  Not doing this avoids
problems with overlong encodings (ones which illegally represent a
character using too many bytes): an overlong encoding will always
compare differently to the standard encoding.
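
As a rough illustration of the byte-level comparison described above, here
is a minimal sketch in C.  The function names utf8_seq_len and
utf8_char_equal are hypothetical and not taken from the zsh source; the
sketch assumes NUL-terminated, unmetafied input and the 1-to-6-byte form
of UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte c,
 * using the 1-to-6-byte scheme.  Returns 0 if c cannot start a
 * sequence (a continuation byte 10xxxxxx, or 0xFE/0xFF).
 * Hypothetical helper, not part of zsh. */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;   /* 0xxxxxxx: plain ASCII */
    if (c < 0xC0) return 0;   /* 10xxxxxx: continuation byte */
    if (c < 0xE0) return 2;   /* 110xxxxx */
    if (c < 0xF0) return 3;   /* 1110xxxx */
    if (c < 0xF8) return 4;   /* 11110xxx */
    if (c < 0xFC) return 5;   /* 111110xx */
    if (c < 0xFE) return 6;   /* 1111110x */
    return 0;                 /* 0xFE and 0xFF never appear in UTF-8 */
}

/* Compare the characters at the start of a and b without decoding
 * them: only the byte count is determined, then the bytes are
 * compared.  Returns nonzero if the two characters are byte-identical.
 * Hypothetical helper, not part of zsh. */
static int utf8_char_equal(const char *a, const char *b)
{
    size_t la = utf8_seq_len((unsigned char)a[0]);
    size_t lb = utf8_seq_len((unsigned char)b[0]);

    if (la == 0 || la != lb)
        return 0;
    return strncmp(a, b, la) == 0;
}
```

With this approach the overlong two-byte form 0xC0 0xAF never compares
equal to a plain "/", since the byte lengths already differ.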

Probably we need a configuration option to switch this on or off.

Zle might be a bit more of a problem.  The web page I referred to above
gives the hopeful message that all encoding to/decoding from UTF-8 at
the terminal is handled by the terminal driver.  So for zle we have to
worry about things like
- determining whether the terminal is actually in UTF-8 mode, probably
  from the locale
- how UTF-8 encoded characters interfere with meta-bindings.  May be
  good enough simply not to use these, at least while we work out what's
  what
- reading multi-byte characters --- timeouts and the like
- getting the right length for displaying, deleting, copying
  etc. multi-byte characters.  Apart from counting continuation
  bytes, we may be stuck with using wcwidth for display.  This is a pain
  because it involves explicit wchar_t's, and I have no experience at
  all with these (except that they mess up compilation of otherwise trivial
  string-handling functions).
- all the stuff I've forgotten.
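
To make the "counting continuation bytes" point concrete, here is a small
sketch in C of how character counts and backward cursor movement might be
derived from the bytes alone.  The names utf8_char_count and
utf8_prev_char are hypothetical, not existing zle functions, and the code
assumes well-formed, unmetafied UTF-8.

```c
#include <stddef.h>

/* Number of characters in a NUL-terminated UTF-8 string, found by
 * counting every byte that is not a continuation byte (10xxxxxx).
 * Assumes well-formed input.  Hypothetical helper, not part of zle. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Byte offset of the start of the character that precedes byte
 * offset pos, as a backward-delete or cursor-left operation would
 * need.  Returns 0 when pos is already 0.
 * Hypothetical helper, not part of zle. */
static size_t utf8_prev_char(const char *s, size_t pos)
{
    while (pos > 0) {
        pos--;
        if (((unsigned char)s[pos] & 0xC0) != 0x80)
            break;            /* found a lead byte or ASCII byte */
    }
    return pos;
}
```

This gives a count of characters, not of screen columns; wide and
combining characters would still need something like wcwidth.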

Any comments?

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR Ltd., Science Park, Milton Road,
Cambridge, CB4 0WH, UK                          Tel: +44 (0)1223 692070



