From: Peter Stephenson <p.stephenson@samsung.com>
To: Zsh hackers list <zsh-workers@zsh.org>
Subject: Re: zsh/complist colours improperly handle multibyte characters
Date: Tue, 25 Oct 2016 11:44:25 +0100
Message-ID: <20161025114425.64574aff@pwslap01u.europe.root.pri>
In-Reply-To: <161023123416.ZM3982@torch.brasslantern.com>

On Sun, 23 Oct 2016 12:34:16 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Oct 23,  7:59pm, Peter Stephenson wrote:
> } Subject: Re: zsh/complist colours improperly handle multibyte characters
> }
> } On Sun, 23 Oct 2016 10:56:52 -0700
> } Bart Schaefer <schaefer@brasslantern.com> wrote:
> } > No, sorry, this is a UTF-8 full-line-height vertical-bar, not ascii pipe.
> } > It's incorrectly interpreted as a left angle bracket pattern character,
> } > if that BUG message is accurate.
> } 
> } Ah, then there's a good chance this is indeed a problem with
> } zshtokenize.  We probably ought at least to pass through metafied
> } characters.  I don't know that fits this particular case, but it's the
> } obvious problem.
> 
> Nope, the zshtokenize patch doesn't help in this case at all.  I still
> get the BUG: message.  Strangely (?) I do NOT get that message if I use
> the character directly in a pattern expression such as [[ ... ]], so
> it has something to do with the way compdescribe is passing it around.

For me, just metafying the string in complist before it goes to
patcompile() was enough to make this work.  The zshtokenize patch I
posted makes this a bit safer in theory, though I don't think we
actually hit the problem in practice.

In the previous code, the input string is

* E2 94 82 *

(bytes in hex).  That 94 has the same value as the token for '<', so it
looks like a token.  On tokenisation we get

87 E2 94 82 87

The * has become a token, but 94 still looks like a token because it's
not protected.  So the pattern compiler turns it back into the
corresponding string form, '<', when it gets an incomplete multibyte
pattern.  This makes the pattern look invalid, so it gives up.  Later you
get "<" as a token, which doesn't work as there's no numeric expression.
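
As a rough illustration of the collision (a toy sketch, not the zsh
source: the enum names and helper functions are made up for this
example, only the byte values 87 and 94 come from the dumps above),
tokenising the unprotected string leaves a byte that can't be told apart
from the '<' token:

```c
#include <assert.h>
#include <stddef.h>

/* Byte values quoted in the message; the enum names are illustrative only. */
enum { TOK_STAR = 0x87, TOK_INANG = 0x94 };

/* Toy tokeniser: turns ASCII '*' into its token, leaves other bytes alone. */
static void toy_tokenize(unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] == '*')
            s[i] = TOK_STAR;
}

/* Count bytes a later stage could not distinguish from the '<' token. */
static size_t count_inang_lookalikes(const unsigned char *s, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i] == TOK_INANG)
            hits++;
    return hits;
}

/* The unmetafied input from the message: '*' E2 94 82 '*'. */
static void demo_unprotected(void)
{
    unsigned char buf[] = { '*', 0xE2, 0x94, 0x82, '*' };
    toy_tokenize(buf, sizeof buf);
    assert(buf[0] == TOK_STAR && buf[4] == TOK_STAR);
    /* The middle byte of the UTF-8 sequence still reads as a token. */
    assert(count_inang_lookalikes(buf, sizeof buf) == 1);
}
```

Calling demo_unprotected() passes its assertions, i.e. the tokenised
buffer is 87 E2 94 82 87 with the 94 still exposed.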

To fix this safely, we first need to metafy the input string,

* E2 83 B4 82 *

then tokenise it with the change I previously posted to skip Meta,

87 E2 83 B4 82 87

The extra change makes sure that 83 B4 goes through as is: a metafied
character is by definition escaped from tokenisation.
However, because this only happens when bit 7 is set, and we'll never
tokenise such a character, I don't think it actually makes a
difference.  But I've left it in as it respects the intention.
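
The two steps can be sketched roughly as follows.  This is a simplified
stand-in, not the real metafy() or zshtokenize(): the function names are
hypothetical, and the byte range treated as "needs metafying" is an
assumption chosen to cover this example (zsh itself consults its
character type table).  The Meta byte 83 and the XOR with 0x20 match the
94 -> 83 B4 transformation shown above.

```c
#include <assert.h>
#include <stddef.h>

enum { META = 0x83, TOK_STAR = 0x87 };

/* ASSUMPTION: bytes 0x83..0xA2 need metafying; the real test differs. */
static int toy_imeta(unsigned char c)
{
    return c >= 0x83 && c <= 0xA2;
}

/* Write the metafied form of in[0..n) to out (needs up to 2*n bytes). */
static size_t toy_metafy(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (toy_imeta(in[i])) {
            out[j++] = META;
            out[j++] = in[i] ^ 0x20;   /* 94 becomes B4 */
        } else {
            out[j++] = in[i];
        }
    }
    return j;
}

/* Toy tokeniser that skips both Meta and the byte following it. */
static void toy_tokenize_skip_meta(unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] == META && i + 1 < n)
            i++;                       /* leave the escaped byte alone */
        else if (s[i] == '*')
            s[i] = TOK_STAR;
    }
}

static void demo_metafied(void)
{
    const unsigned char in[] = { '*', 0xE2, 0x94, 0x82, '*' };
    const unsigned char want[] = { 0x87, 0xE2, 0x83, 0xB4, 0x82, 0x87 };
    unsigned char buf[2 * sizeof in];
    size_t len = toy_metafy(in, sizeof in, buf);
    toy_tokenize_skip_meta(buf, len);
    assert(len == sizeof want);
    for (size_t i = 0; i < len; i++)
        assert(buf[i] == want[i]);
}
```

After both steps no byte in the buffer has the value 94, so nothing in
the multibyte character can be mistaken for the '<' token.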

pws

diff --git a/Src/Zle/complist.c b/Src/Zle/complist.c
index 39ac782..d4672a1 100644
--- a/Src/Zle/complist.c
+++ b/Src/Zle/complist.c
@@ -415,6 +415,7 @@ getcoldef(char *s)
 		break;
 	    *s++ = '\0';
 	}
+	p = metafy(p, strlen(p), META_USEHEAP);
 	tokenize(p);
 	if ((prog = patcompile(p, 0, NULL))) {
 	    Patcol pc, po;
diff --git a/Src/glob.c b/Src/glob.c
index a845c5f..50f6dce 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -3499,6 +3499,10 @@ zshtokenize(char *s, int flags)
     for (; *s; s++) {
       cont:
 	switch (*s) {
+	case Meta:
+	    /* skip both Meta and following character */
+	    s++;
+	    break;
 	case Bnull:
 	case Bnullkeep:
 	case '\\':

