From: Andrey Borzenkov
To: zsh-workers@sunsite.dk
Subject: Re: [PATCH] zle_refresh multibyte fix
Date: Fri, 25 Feb 2005 22:17:45 +0300
Message-Id: <200502252217.46201.arvidjaar@newmail.ru>
In-Reply-To: <200502231457.j1NEvfBI032390@news01.csr.com>
References: <200502231727.58923.arvidjaar@newmail.ru> <200502231457.j1NEvfBI032390@news01.csr.com>

On Wednesday 23 February 2005 17:57, Peter Stephenson wrote:
>
> > Actually I find wc stuff very easy and suitable for using as internal
> > representation in zsh core. But this is separate topic.
>
> Apart from the inefficiency of extending every byte that comes into the
> shell into (typically) a four-byte integer, we can't rely on input and
> output bytes being a valid wide character in the current locale at all.
> I think the shell has to handle arbitrary strings of bytes without
> mutilating them.  Consider, for example:
>
>   # Pass secret byte to my utility
>   my_utility $'\xff'
>
> (or any other string you like, the only point being that it isn't a
> valid multibyte character string).  I don't see why we should
> arbitrarily decide that doesn't work because it doesn't convert to a
> wide character.  It will simply break far too many things.
>

I do not have an easy answer, but what would be the semantics of

- regexps ([[:print:]] et al.)?
- $foo[n,m] for a scalar?
- upper/lower case conversion?
- comparison (collating)?

Apparently at least bash treats things as byte-oriented:

bash-3.00$ echo к$(zsh -c "echo $'\xff\xff\xff'") | xxd
0000000: d0ba ffff ff0a                           ......
bash-3.00$ echo к$(zsh -c "echo $'\xff\xff\xff'") | { read foo; case "$foo" in ???? ) echo yes;; * ) echo no; esac; }
no

but

bash-3.00$ zsh -c "echo $'\xff\xff\xff'" | { read foo; case "$foo" in ??? ) echo yes;; * ) echo no; esac; }
yes

and it has a definite problem editing a non-ASCII input line (actually the
same one zsh had: the cursor position is wrong).

But that is hardly acceptable for zsh. It depends too much on correct
character handling (the completion code in the first place).

OTOH, does mb -> wc -> mb preserve the original byte sequence? It is true for
UTF-8 (at least in the current revision), but I am not sure that it is true
for an arbitrary encoding.

-andrey
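
The round-trip question above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's codecs as a stand-in for the C library's multibyte/wide-character conversions, so it says nothing definite about any particular libc or about zsh itself, and the helper name round_trips is hypothetical. It probes three cases: valid UTF-8, the invalid byte string from the thread, and a stateful encoding (ISO-2022-JP) in which a redundant escape sequence is accepted on input but not reproduced on output.

```python
def round_trips(raw: bytes, encoding: str) -> bool:
    """Return True if decoding and re-encoding gives back the same bytes.

    Hypothetical helper: models mb -> wc -> mb with Python codecs,
    not with the C library conversions a shell would actually call.
    """
    try:
        text = raw.decode(encoding)       # mb -> wc; strict, invalid input raises
    except UnicodeDecodeError:
        return False                      # no wide-character form exists at all
    return text.encode(encoding) == raw   # wc -> mb, compared with the input


# Valid UTF-8 (the Cyrillic letter from the thread) survives the round trip.
print(round_trips(b"\xd0\xba", "utf-8"))

# 0xff never occurs in UTF-8, so the string from the thread cannot be converted.
print(round_trips(b"\xd0\xba\xff\xff\xff", "utf-8"))

# ISO-2022-JP is stateful: a redundant "switch to ASCII" escape decodes fine,
# but the encoder does not emit it again, so the bytes differ.
print(round_trips(b"\x1b(Ba", "iso2022_jp"))
```

If the last case behaves the same way in a C library's ISO-2022-JP locale (an assumption, not something checked here), it would support the suspicion that byte-exact preservation cannot be relied on for arbitrary encodings.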