From: Peter Stephenson <pws@csr.com>
To: Zsh hackers list <zsh-workers@sunsite.dk>
Subject: Re: PATCH: multibyte characters in patterns.
Date: Mon, 10 Apr 2006 17:00:38 +0100
Message-ID: <EXCHANGE03nTo4X943o000076e6@exchange03.csr.com>

Vincent Lefevre wrote:
> Could you give examples of what it does exactly?
> Do you mean that "?" can now match a multibyte character?

Yes.  If X is a multibyte character consisting of two bytes (say a with
a grave accent) in the current locale, the following are both true,
since (#U) matches byte by byte and (#u) matches whole characters:

[[ X = (#U)?? ]]
[[ X = (#u)? ]]

> Will it also match a UTF-8 character while being in ISO-8859-1 locales?
> (The reason could be to be able to handle data that use another encoding
> than the locales, mainly when data are shared amongst different users
> who use different locales, in which case these data are encoded in UTF-8
> in general.)

You should be able to do this by locally altering the locale, since the
various variables (LANG, LC_*) are special in zsh and will perform the
appropriate setlocale() calls---as long as the system library supports
the locale, obviously.  Making the variable local should be good enough
since specials are set and restored with the correct function calls.
However, I haven't tried this.  (This ability is already present---the
only relevant thing I've changed is that patterns will obey the locale.)
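
A minimal sketch of the local-locale idea, under some assumptions: the helper names are hypothetical, and the C locale is used only because every system provides it, which makes this the byte-wise direction rather than the UTF-8-in-ISO-8859-1 case asked about. It relies on an assignment to LC_ALL taking effect immediately and being undone when the function returns.

```shell
# Hypothetical helpers; the names are illustrative, not part of zsh.

# Print the length of $1 in bytes.  LC_ALL is local, so the byte-wise
# C locale applies only inside this function and is restored on return.
byte_len() {
  local LC_ALL=C
  printf '%s\n' "${#1}"
}

# Succeed if $1 is exactly two bytes long, whatever the caller's locale.
is_two_bytes() {
  local LC_ALL=C
  case $1 in
    ??) return 0 ;;
    *)  return 1 ;;
  esac
}

a_grave=$(printf '\303\240')   # UTF-8 encoding of a-grave (0xC3 0xA0)
byte_len "b${a_grave}r"
is_two_bytes "$a_grave" && echo "two bytes in the C locale"
```

Going the other way (treating data as UTF-8 while the session uses another encoding) would mean assigning a UTF-8 locale name instead of C; whether that works depends on which locales the system has installed.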

> How about that in UTF-8 locales?
> 
> dixsept:~> foo="bàr"
> dixsept:~> echo $foo[2]

I haven't done anything with parameters yet, so that currently operates
on bytes, but this will be fixed eventually.  The MULTIBYTE option will
apply and we'll presumably need parameter flags equivalent to the
globbing flags; unfortunately this time even (u) and (U) are taken.

> Couldn't an "unused" area of Unicode be used for arbitrary bytes?

I suppose that's possible, but it's not actually guaranteed (and we
don't require) that a wchar_t is a Unicode character at all; if I've
finally understood the __STDC_ISO_10646__ stuff there seem to be quite
a lot of systems like this.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070

