From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 19 Oct 2006 04:17:23 +0400
From: Alexey Tourbin
To: zsh-workers@sunsite.dk
Subject: Re: compaudit problem
Message-ID: <20061019001723.GT11317@localhost.localdomain>
Mail-Followup-To: zsh-workers@sunsite.dk
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
X-Seq: 22885
References: <20060819115030.GE25959@localhost.localdomain> <060819103035.ZM28413@torch.brasslantern.com> <20061017190546.GB11317@localhost.localdomain> <061017204150.ZM8127@torch.brasslantern.com>
 <20061018120019.GK11317@localhost.localdomain> <20061018143113.035276ba.pws@csr.com> <20061018162028.GP11317@localhost.localdomain> <20061018182019.62809029.pws@csr.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="f6M9UaX53EEZorp0"
Content-Disposition: inline
In-Reply-To: <20061018182019.62809029.pws@csr.com>

--f6M9UaX53EEZorp0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Oct 18, 2006 at 06:20:19PM +0100, Peter Stephenson wrote:
> Alexey Tourbin wrote:
> > Thanks for the clue. git-bisect now blames 22544.
>
> That patch made the shell smarter about finding the end of
> special types of string known to the shell (identifiers in particular),
> using the multibyte code.
>
> I wonder if it's part of the problem Andrey noted? At some points the
> string we apply this to may contain tokenized characters, which
> aren't valid multibyte characters. Since the string must be metafied,
> these are easy to detect.
>
> The simplest fix is just to ensure we don't try to handle these as
> multibyte characters, telling the caller they're invalid. Most callers
> will just handle it as a single-byte character and move on, which
> is the right thing to do; some callers which really need valid characters
> will abort, but they shouldn't be getting a tokenized string. So
> this might actually work. If not, we need to be smarter, but probably at a
> higher level.
>
> We need some fix like this even if it isn't the root of the present
> problem. (If I could reproduce that it ought now to be easy to trace.)
>
> Index: Src/utils.c
> ===================================================================
> RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
> retrieving revision 1.142
> diff -u -r1.142 utils.c
> --- Src/utils.c 10 Oct 2006 09:37:19 -0000 1.142
> +++ Src/utils.c 18 Oct 2006 17:09:16 -0000
> @@ -4003,6 +4003,21 @@
>          *wcp = (wint_t)(*s == Meta ? s[1] ^ 32 : *s);
>          return 1 + (*s == Meta);
>      }
> +    /*
> +     * We have to handle tokens here, since we may be looking
> +     * through a tokenized input. Obviously this isn't
> +     * a valid multibyte character, so just return WEOF
> +     * and let the caller handle it as a single character.
> +     *
> +     * TODO: I've a sneaking suspicion we could do more here
> +     * to prevent the caller always needing to handle invalid
> +     * characters specially, but sometimes it may need to know.
> +     */
> +    if (itok(*s)) {
> +        if (wcp)
> +            *wcp = EOF;
> +        return 1;
> +    }
>
>      ret = MB_INVALID;
>      for (ptr = s; *ptr; ) {

Thanks Peter! This patch resolves the problem.
(I quote the whole message because apparently it was not CC'ed to zsh-workers.)

Unfortunately I don't quite understand Unicode issues in zsh. I build the
zsh rpm package because I use it (and a few others use it, too). The latest
stable 4.2 release had problems in a utf8 console, so I decided to move to
the then-current cvs snapshot. I got my first decently working utf8-enabled
zsh with the 20050926 snapshot. So for now, about the only thing I can
provide is feedback. This will change as I grok the zsh code.

BTW, a git archive is available at
git://git.altlinux.org/people/at/packages/zsh.git
The 'master' branch is for my own cooking, but the "cvs" branch, as well as
"zsh-4_0-patches" and "zsh-4_2-patches", have pristine zsh sources.
I verified the "cvs" branch against a checkout, and it's almost zero-diff (the
only exception is that a very old Completion/Core/_closequotes is in
there but not in the checkout). I used Keith Packard's "parsecvs"
(with my changes, some of which are already merged into mainline).

> --
> Peter Stephenson                  Software Engineer
> CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
> Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070
>
>
> To access the latest news from CSR copy this link into a web browser: http://www.csr.com/email_sig.php

--f6M9UaX53EEZorp0
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)

iD8DBQFFNsQTfBKgtDjnu0YRAmuyAKCFQS5Gpq37Wd23Ut0OuDkP39urUACdHkWP
MlFI7fSnoexIysT8Vit8kIc=
=h5/6
-----END PGP SIGNATURE-----

--f6M9UaX53EEZorp0--