zsh-workers
* Silent UTF-8 assumption?
@ 2007-05-10  7:56 Andrey Borzenkov
  2007-05-10  9:46 ` Peter Stephenson
From: Andrey Borzenkov @ 2007-05-10  7:56 UTC (permalink / raw)
  To: zsh-workers


This caught my attention:

static wchar_t
charref(char *x, char *y)
{
    wchar_t wc;
    size_t ret;

    if (!(patglobflags & GF_MULTIBYTE) || !(STOUC(*x) & 0x80))
        return (wchar_t) STOUC(*x);

Well, this is definitely not valid for an arbitrary multibyte character set. I am
just curious: is it possible to consistently assume that UTF-8 is in use?
That would definitely simplify things.
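To make the concern concrete, here is a small C sketch. It assumes Shift_JIS as the counterexample encoding: there, the trail byte of a double-byte character can fall in the ASCII range, so a test on the eighth bit alone (like the one in charref()) misclassifies it. The helper names are hypothetical and are not part of zsh.

```c
#include <assert.h>
#include <stddef.h>

/* Naive test mirroring the fast path in charref(): a byte with the
 * eighth bit clear is taken to be a complete single-byte character. */
static int looks_like_ascii(unsigned char b)
{
    return !(b & 0x80);
}

/* Count bytes that the naive test would treat as a literal backslash. */
static size_t naive_backslash_count(const unsigned char *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (looks_like_ascii(s[i]) && s[i] == '\\')
            n++;
    return n;
}

/* Shift_JIS encoding of KATAKANA LETTER SO (U+30BD): lead byte 0x83,
 * trail byte 0x5C, which is also the ASCII code for backslash. */
static const unsigned char sjis_so[] = { 0x83, 0x5C };

/* UTF-8 encoding of the same character: every octet is >= 0x80. */
static const unsigned char utf8_so[] = { 0xE3, 0x82, 0xBD };
```

The naive test finds a spurious backslash in the Shift_JIS bytes but none in the UTF-8 bytes for the same character.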


* Re: Silent UTF-8 assumption?
  2007-05-10  7:56 Silent UTF-8 assumption? Andrey Borzenkov
@ 2007-05-10  9:46 ` Peter Stephenson
From: Peter Stephenson @ 2007-05-10  9:46 UTC (permalink / raw)
  To: zsh-workers

Andrey Borzenkov wrote:
> This caught my attention:
> 
> static wchar_t
> charref(char *x, char *y)
> {
>     wchar_t wc;
>     size_t ret;
> 
>     if (!(patglobflags & GF_MULTIBYTE) || !(STOUC(*x) & 0x80))
>         return (wchar_t) STOUC(*x);
> 
> Well, this is definitely not valid for an arbitrary multibyte character
> set.

We're not using an arbitrary character set; we're using one that has the
portable character set (i.e. ASCII) as a 7-bit subset and that shares the
property of UTF-8 that every octet of a genuine multibyte character has
the eighth bit set.  That's entirely for the practical reason that, if we
don't make that assumption, all hell will break loose, because we would
have to make *every* part of the shell that ever tests a character, even
an ASCII character, multibyte aware.
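A minimal C sketch of why the shortcut is safe under that assumption, using UTF-8 as the example encoding. decode_first_char() is a hypothetical helper, not zsh code: it takes the same ASCII fast path as charref() and otherwise checks that each continuation byte has the form 10xxxxxx. For brevity it does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: decode the first character of a UTF-8 buffer of
 * avail (>= 1) bytes. Returns the code point and stores the number of
 * bytes consumed in *len, or returns -1 on a malformed sequence. */
static int32_t decode_first_char(const unsigned char *s, size_t avail,
                                 size_t *len)
{
    unsigned char b = s[0];
    int32_t cp;
    size_t need;

    /* Fast path, as in charref(): eighth bit clear means plain ASCII.
     * Safe for UTF-8 because no byte of a multibyte sequence is < 0x80. */
    if (!(b & 0x80)) {
        *len = 1;
        return b;
    }
    if ((b & 0xE0) == 0xC0) {
        cp = b & 0x1F;
        need = 2;
    } else if ((b & 0xF0) == 0xE0) {
        cp = b & 0x0F;
        need = 3;
    } else if ((b & 0xF8) == 0xF0) {
        cp = b & 0x07;
        need = 4;
    } else {
        return -1;  /* stray continuation byte or invalid lead byte */
    }

    if (avail < need)
        return -1;
    for (size_t i = 1; i < need; i++) {
        /* Every continuation byte must be 10xxxxxx: eighth bit set. */
        if ((s[i] & 0xC0) != 0x80)
            return -1;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *len = need;
    return cp;
}
```

A byte below 0x80 in the middle of a sequence is rejected as malformed rather than silently read as ASCII, which is exactly the property the fast path relies on.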

There's a good chance the multibyte character set in question is UTF-8,
but it doesn't necessarily have to be.
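Since the codeset need not be UTF-8, code that wants to know which one is in use can ask the locale. This is a sketch assuming a POSIX system, where nl_langinfo(CODESET) reports the codeset of the current LC_CTYPE locale; codeset_is_utf8() and locale_is_utf8() are hypothetical helpers, and the name matching is only a heuristic, because codeset spellings vary between platforms.

```c
#include <assert.h>
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>

/* Return 1 if a codeset name denotes UTF-8 ("UTF-8", "utf8", "UTF_8"),
 * ignoring case and '-'/'_' separators; 0 otherwise. */
static int codeset_is_utf8(const char *name)
{
    const char *want = "utf8";

    while (*name) {
        unsigned char c = (unsigned char)*name++;
        if (c == '-' || c == '_')
            continue;
        if (*want == '\0' || tolower(c) != *want)
            return 0;
        want++;
    }
    return *want == '\0';
}

/* Query the codeset of the current LC_CTYPE locale. The caller should
 * have run setlocale(LC_CTYPE, "") first; otherwise the program is
 * still in the "C" locale. */
static int locale_is_utf8(void)
{
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

Matching on the name is deliberately loose; a shell would still need the multibyte-aware paths for non-UTF-8 codesets either way.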

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/