From: Alexey Tourbin <at@altlinux.ru>
To: zsh-workers@sunsite.dk
Subject: Re: compaudit problem
Date: Thu, 19 Oct 2006 04:17:23 +0400
Message-ID: <20061019001723.GT11317@localhost.localdomain>
In-Reply-To: <20061018182019.62809029.pws@csr.com>

On Wed, Oct 18, 2006 at 06:20:19PM +0100, Peter Stephenson wrote:
> Alexey Tourbin <at@altlinux.ru> wrote:
> > Thanks for the clue.  git-bisect now blames 22544.
> 
> That patch made the shell smarter about finding the end of
> special types of string known to the shell (identifiers in particular),
> using the multibyte code.
> 
> I wonder if it's part of the problem Andrey noted?  At some points the
> string we apply this to may contain tokenized characters, which
> aren't valid multibyte characters.  Since the string must be metafied,
> these are easy to detect.
> 
> The simplest fix is just to ensure we don't try to handle these as
> multibyte characters, telling the caller they're invalid.  Most callers
> will just handle it as a single-byte character and move on, which
> is the right thing to do; some callers which really need valid characters
> will abort, but they shouldn't be getting a tokenized string.  So
> this might actually work.  If not, we need to be smarter, but probably at a
> higher level.
> 
> We need some fix like this even if it isn't the root of the present
> problem.  (If I could reproduce that it ought now to be easy to trace.)
> 
> Index: Src/utils.c
> ===================================================================
> RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
> retrieving revision 1.142
> diff -u -r1.142 utils.c
> --- Src/utils.c	10 Oct 2006 09:37:19 -0000	1.142
> +++ Src/utils.c	18 Oct 2006 17:09:16 -0000
> @@ -4003,6 +4003,21 @@
>  	    *wcp = (wint_t)(*s == Meta ? s[1] ^ 32 : *s);
>  	return 1 + (*s == Meta);
>      }
> +    /*
> +     * We have to handle tokens here, since we may be looking
> +     * through a tokenized input.  Obviously this isn't
> +     * a valid multibyte character, so just return WEOF
> +     * and let the caller handle it as a single character.
> +     *
> +     * TODO: I've a sneaking suspicion we could do more here
> +     * to prevent the caller always needing to handle invalid
> +     * characters specially, but sometimes it may need to know.
> +     */
> +    if (itok(*s)) {
> +	if (wcp)
> +	    *wcp = EOF;
> +	return 1;
> +    }
>  
>      ret = MB_INVALID;
>      for (ptr = s; *ptr; ) {

Thanks Peter!  This patch resolves the problem.
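
As a rough illustration of the approach in the quoted patch (this is not
zsh's code): a decoder can check for token bytes before attempting a
multibyte conversion, report them as WEOF, and consume exactly one byte so
the caller keeps moving.  The names is_token(), next_char() and
count_chars() are hypothetical, and the token byte range is an assumption
made for the sketch; zsh's real itok() uses its own internal tables.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * ASSUMPTION: hypothetical stand-in for zsh's itok().  The byte range
 * below is illustrative only and is not zsh's real token table.
 */
int is_token(unsigned char c)
{
    return c >= 0x84 && c <= 0xa1;
}

/*
 * Decode one character starting at s (which must not be empty).
 * Returns the number of bytes consumed.  For a token byte or an
 * invalid/incomplete sequence, stores WEOF in *wcp and consumes
 * a single byte, mirroring the "treat it as one character" idea.
 */
size_t next_char(const char *s, wint_t *wcp, mbstate_t *st)
{
    wchar_t wc;
    size_t n;

    assert(s != NULL && *s != '\0');

    /* Tokens are never valid multibyte characters: skip the decoder. */
    if (is_token((unsigned char)*s)) {
        if (wcp)
            *wcp = WEOF;
        return 1;
    }

    n = mbrtowc(&wc, s, strlen(s), st);
    if (n == (size_t)-1 || n == (size_t)-2) {
        /* Invalid or incomplete: reset the state, step over one byte. */
        memset(st, 0, sizeof *st);
        if (wcp)
            *wcp = WEOF;
        return 1;
    }
    if (wcp)
        *wcp = (wint_t)wc;
    return n;
}

/*
 * Example caller: counts characters, treating each invalid byte as a
 * single character and reporting how many such bytes were seen.
 */
size_t count_chars(const char *s, size_t *ninvalid)
{
    mbstate_t st;
    size_t count = 0;
    wint_t wc;

    memset(&st, 0, sizeof st);
    if (ninvalid)
        *ninvalid = 0;
    while (*s) {
        s += next_char(s, &wc, &st);
        if (wc == WEOF && ninvalid)
            ++*ninvalid;
        count++;
    }
    return count;
}
```

A caller that only needs to find the end of a string can ignore the WEOF
results, as count_chars() does; a caller that really needs valid
characters can stop at the first one instead.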

(I quote the whole message because apparently it was not CC'ed to
zsh-workers.)

Unfortunately I don't quite understand the Unicode issues in zsh.  I build
the zsh rpm package because I use it (and a few others use it, too).  The
latest stable 4.2 release had problems in a utf8 console, so I decided
to move to the then-current cvs snapshot.  I got my first decently working
utf8-enabled zsh with the 20050926 snapshot.

So for now, feedback is just about the only thing I can provide.
This will change as I grok the zsh code.

BTW, a git archive is available at
git://git.altlinux.org/people/at/packages/zsh.git
The "master" branch is for my own cooking, but the "cvs" branch, as well
as "zsh-4_0-patches" and "zsh-4_2-patches", have pristine zsh sources.
I verified the "cvs" branch against a checkout, and it's almost zero-diff
(the only exception is that a very old Completion/Core/_closequotes
is in there, but is not in the checkout).  I used Keith Packard's "parsecvs"
(with my changes, some of which are already merged into mainline).

> -- 
> Peter Stephenson <pws@csr.com>                  Software Engineer
> CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
> Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070
> 
> 
> To access the latest news from CSR copy this link into a web browser:  http://www.csr.com/email_sig.php
