zsh-workers
From: Peter Stephenson <pws@csr.com>
To: Zsh-workers <zsh-workers@sunsite.dk>
Subject: Re: UTF-8 support
Date: Tue, 05 Oct 2004 12:32:01 +0100
Message-ID: <200410051132.i95BW1qv007200@news01.csr.com>
In-Reply-To: <29214.1096974092@trentino.logica.co.uk>

Oliver Kiddle wrote:
> If you want to find a short string in a long string you can surely
> metafy the short string instead of unmetafying the long string.

Both strings are likely to be metafied anyway, internally, but that
doesn't help if you're using the library routines for comparisons, since
they don't know about meta characters; and because you don't know where
a character ends, you also don't know at what byte two characters differ
without using library functions.  Unless you guess where a character
ends, you need the entire string, from the first multibyte character
onwards, in the representation used by the library.
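To make the point about library routines concrete, here is a minimal
sketch of decoding a metafied string.  It assumes the Meta marker is the
byte 0x83 and that the byte following it is stored XORed with 32; the
names META_BYTE and unmetafy_sketch are illustrative, not the shell's
own routines.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the Meta marker is byte 0x83 and the byte after it is
 * stored XORed with 32.  META_BYTE is a name made up for this sketch. */
#define META_BYTE 0x83u

/* Decode a metafied, NUL-terminated string into `out` as raw bytes.
 * Returns the number of raw bytes written.  `out` must have room for
 * at least as many bytes as the input holds. */
size_t unmetafy_sketch(const unsigned char *in, unsigned char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in) {
        if (*in == META_BYTE && in[1] != '\0') {
            /* Escaped byte: undo the XOR and skip the marker. */
            out[n++] = (unsigned char)(in[1] ^ 32);
            in += 2;
        } else {
            /* Ordinary byte: copy through unchanged. */
            out[n++] = *in++;
        }
    }
    return n;
}
```

Under these assumptions the UTF-8 sequence C3 83 would be held
internally as C3 83 A3, because its second byte collides with the
marker.  A library routine handed the metafied bytes would see a
three-byte string that is not the character the user typed, which is
why the decoded form is needed before calling it.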

Indeed, unless we start with some assumption about the encoding we have
to compare every single character with library functions on an
unmetafied string.  This is very messy if we have to support systems
where the library functions aren't available (and we break quite a lot
unless we do that).  So, while I can't say for sure, I strongly suspect
we're going to end up having to make some of the assumptions which
are already encoded into the library.  Thus some kind of hybrid is
forced on us for practical reasons.  Given this, I suspect that assuming
UTF-8 and avoiding the library functions where we don't need them is
actually going to be the neatest approach.  However, this remains to be seen.
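As an illustration of what "assuming UTF-8 and avoiding the library"
could look like, the sketch below finds where a character ends by
inspecting the lead byte directly instead of calling mblen().  The
function name utf8_seq_len is hypothetical, and the check is
deliberately shallow: it validates the lead byte and the continuation
bytes but does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Return the byte length (1 to 4) of the UTF-8 sequence starting at s,
 * or 0 if s does not start a structurally valid sequence that fits in
 * `avail` bytes.  Overlong encodings and surrogates are NOT rejected. */
size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    size_t len, i;

    assert(s != NULL);
    if (avail == 0)
        return 0;

    if (s[0] < 0x80)
        len = 1;                    /* plain ASCII */
    else if ((s[0] & 0xE0) == 0xC0)
        len = 2;                    /* 110xxxxx */
    else if ((s[0] & 0xF0) == 0xE0)
        len = 3;                    /* 1110xxxx */
    else if ((s[0] & 0xF8) == 0xF0)
        len = 4;                    /* 11110xxx */
    else
        return 0;                   /* stray continuation or bad lead */

    if (len > avail)
        return 0;                   /* sequence is truncated */

    /* Every following byte must look like 10xxxxxx. */
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 0;

    return len;
}
```

With character boundaries known this way, two strings can be compared
byte by byte and the first differing character located without
converting either string, at the cost of being wrong for any encoding
other than UTF-8.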

I can't see an advantage in assuming UTF-8 and then relying on the
library for comparisons etc.  This seems to give the worst of both
worlds.

> The approach I was suggesting has the big advantage that we can add
> support in isolated areas without first breaking the entire shell.

That can be done however we decide, at least if we keep the current Meta
scheme.  Indeed, that's probably the way to go; we can experiment with
different methods locally before altering the rest of the shell.  The
pattern code is probably the most time-critical for comparing multibyte
characters.  Maybe this is a good time to look at removing the
requirement for NUL-terminated strings after all.
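For readers unfamiliar with the alternative, dropping the terminator
requirement usually means carrying an explicit length alongside the
bytes.  The sketch below is only an illustration of that idea; the type
counted_str and the function counted_str_equal are invented here and do
not correspond to anything in the shell's source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-counted string: the bytes may contain NUL. */
struct counted_str {
    const char *data;
    size_t len;
};

/* Compare two counted strings for byte-wise equality.  Unlike
 * strcmp(), this does not stop at an embedded NUL byte. */
int counted_str_equal(struct counted_str a, struct counted_str b)
{
    assert(a.data != NULL && b.data != NULL);
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

With such a representation the strings { "a\0b", 3 } and { "a\0c", 3 }
compare as different, whereas strcmp() would stop at the first NUL and
call them equal.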

> mblen may be easy to reimplement but wcwidth is not so we'd end up
> with a mixture.

Yes, we certainly need library calls in zle.  However, formatting
strings for interactive output doesn't need to go particularly fast.
As I said, I think that in practice we're stuck with a mixture anyway.
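The kind of library use that seems unavoidable for display can be
sketched as follows: convert each multibyte character with mbrtowc()
and ask wcwidth() how many columns it occupies.  Both are standard
POSIX calls; the wrapper display_width is a made-up name for this
example, and the result depends on the locale the program has selected
(typically via setlocale(LC_ALL, "") at startup).

```c
#define _XOPEN_SOURCE 700   /* must precede all includes; exposes wcwidth() */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Return the number of terminal columns needed to show the
 * NUL-terminated multibyte string s in the current locale, or -1 if
 * s contains an invalid sequence or a non-printable character. */
int display_width(const char *s)
{
    mbstate_t st;
    size_t left;
    int cols = 0;

    assert(s != NULL);
    memset(&st, 0, sizeof st);      /* start in the initial shift state */
    left = strlen(s);

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);
        int w;

        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;              /* invalid or incomplete sequence */
        if (n == 0)
            break;                  /* defensive: decoded a NUL */

        w = wcwidth(wc);
        if (w < 0)
            return -1;              /* non-printable character */

        cols += w;
        s += n;
        left -= n;
    }
    return cols;
}
```

This does one library call pair per character, which is acceptable for
formatting interactive output but is exactly the cost one would want to
avoid in a hot comparison loop.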

pws

