To: Zsh hackers list
Subject: Re: PATCH: multibyte characters in patterns.
Date: Mon, 10 Apr 2006 17:00:38 +0100
From: Peter Stephenson
Message-ID:
X-Seq: zsh-workers 22411

Vincent Lefevre wrote:
> Could you give examples of what it does exactly?
> Do you mean that "?" can now match a multibyte character?

Yes. If X is a multibyte character consisting of two bytes (say a with a
grave accent) in the current locale, the following are both true:

  [[ X = (#U)?? ]]
  [[ X = (#u)? ]]

> Will it also match a UTF-8 character while being in ISO-8859-1 locales?
> (The reason could be to be able to handle data that use another encoding
> than the locales, mainly when data are shared amongst different users
> who use different locales, in which case these data are encoded in UTF-8
> in general.)

You should be able to do this by locally altering the locale, since the
various variables (LANG, LC_*) are special in zsh and will perform the
appropriate setlocale() calls---as long as the system library supports the
locale, obviously. Making the variable local should be good enough since
specials are set and restored with the correct function calls. However, I
haven't tried this. (This ability is already present---the only relevant
thing I've changed is that patterns will obey the locale.)

> How about that in UTF-8 locales?
>
> dixsept:~> foo="bār"
> dixsept:~> echo $foo[2]

I haven't done anything with parameters yet, so that currently operates on
bytes, but this will be fixed eventually. The MULTIBYTE option will apply
and we'll presumably need parameter flags equivalent to the globbing flags;
unfortunately this time even (u) and (U) are taken.

> Couldn't an "unused" area of Unicode be used for arbitrary bytes?
I suppose that's possible, but it's not guaranteed (and we don't require)
that a wchar_t is actually a Unicode character at all; if I've finally
understood the __STDC_ISO_10646__ stuff, there seem to be quite a lot of
systems like this.

--
Peter Stephenson
Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK
Tel: +44 (0)1223 692070
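The byte-versus-character distinction running through this thread can be sketched at the C library level, where setlocale(), mbrlen(), wchar_t and __STDC_ISO_10646__ are defined. The following is a minimal sketch, not zsh's implementation. The helper names (count_bytes, count_mb_chars, nth_mb_char, count_mb_chars_in, wchar_is_iso10646) are hypothetical. It assumes the caller has already switched LC_CTYPE to a UTF-8 locale before using the locale-aware helpers, because a C program starts in the "C" locale. A name such as "C.UTF-8" is an assumption here, and which locale names exist varies by system.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Byte-oriented view, as with (#U): every byte counts as one character. */
size_t count_bytes(const char *s)
{
    return strlen(s);
}

/* Locale-aware view, as with (#u): step through s one multibyte
 * character at a time using the current LC_CTYPE encoding.
 * Returns (size_t)-1 if s is not valid in that encoding. */
size_t count_mb_chars(const char *s)
{
    mbstate_t st;
    size_t n = 0, len = strlen(s);

    memset(&st, 0, sizeof st);
    while (len > 0) {
        size_t k = mbrlen(s, len, &st);
        /* -1: invalid sequence; -2: truncated sequence; 0 would mean a
         * NUL character, which cannot occur before the terminator but
         * would otherwise loop forever, so treat it as an error too. */
        if (k == (size_t)-1 || k == (size_t)-2 || k == 0)
            return (size_t)-1;
        s += k;
        len -= k;
        n++;
    }
    return n;
}

/* Find the idx-th (1-based) character of s, as a multibyte-aware
 * $foo[2] might. Stores its byte offset and byte length; returns 0 on
 * success, -1 if idx is out of range or s is invalid in the locale. */
int nth_mb_char(const char *s, size_t idx, size_t *off, size_t *clen)
{
    mbstate_t st;
    size_t pos = 0, len = strlen(s);

    memset(&st, 0, sizeof st);
    for (size_t i = 1; pos < len; i++) {
        size_t k = mbrlen(s + pos, len - pos, &st);
        if (k == (size_t)-1 || k == (size_t)-2 || k == 0)
            return -1;
        if (i == idx) {
            *off = pos;
            *clen = k;
            return 0;
        }
        pos += k;
    }
    return -1;
}

/* Count characters of s under another LC_CTYPE locale, then restore the
 * caller's locale (analogous to making LC_CTYPE local in a zsh function).
 * Returns (size_t)-1 if the locale is unavailable or s is invalid in it. */
size_t count_mb_chars_in(const char *s, const char *locname)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    size_t n = (size_t)-1;
    char *saved;

    if (cur == NULL)
        return n;
    /* Copy the name: later setlocale() calls may overwrite it. */
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return n;
    strcpy(saved, cur);

    if (setlocale(LC_CTYPE, locname) != NULL)
        n = count_mb_chars(s);
    setlocale(LC_CTYPE, saved);
    free(saved);
    return n;
}

/* Nonzero if the implementation promises wchar_t holds ISO 10646
 * (Unicode) code points; the C standard does not require this. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

For example, with LC_CTYPE set to a UTF-8 locale, count_bytes("\xc3\xa0") is 2 while count_mb_chars("\xc3\xa0") is 1, mirroring (#U)?? and (#u)? above. For the "bār" string "b\xc4\x81r", nth_mb_char(..., 2, ...) reports the two-byte ā at byte offset 1, whereas a purely byte-based $foo[2] sees only its first byte. wchar_is_iso10646() only reports what the compiler promises; code that, like zsh, does not require Unicode wchar_t should not depend on its result.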