From mboxrd@z Thu Jan  1 00:00:00 1970
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
X-Seq: 22217
Date: Sat, 11 Feb 2006 02:19:05 -0800
From: Wayne Davison <wayned@users.sourceforge.net>
To: Peter Stephenson <pws@csr.com>
Cc: Zsh hackers list <zsh-workers@sunsite.dk>
Subject: Re: Another idea on how to insert illegal multibyte characters
Message-ID: <20060211101905.GA21506@dot.blorf.net>
References: <20060112034200.GB28221@dot.blorf.net> <EXCHANGE03YFhwWJr5700002900@exchange03.csr.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="9amGYk9869ThD9tj"
Content-Disposition: inline
In-Reply-To: <EXCHANGE03YFhwWJr5700002900@exchange03.csr.com>
User-Agent: Mutt/1.5.11


--9amGYk9869ThD9tj
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Jan 12, 2006 at 09:23:19AM +0000, Peter Stephenson wrote:
> The completion system is a bit more quoting aware: it knows whether or
> not it needs to insert a backslash before special characters because of
> quotes earlier on the line.  Ideally it should handle unprintable
> characters at the same point where it tries to do that.  That doesn't
> need to be done at the same time, though.  (I would hope it could be
> done independently and prevent the equivalent code inside zle kicking
> in.)
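
Peter's point is that the right escaping depends on the quoting context at the insertion point.  As a rough illustration (not zsh's actual code; quote_context() and enum quote_ctx are hypothetical names), a scanner that works out that context for a command-line prefix might look like this:

```c
/* Hypothetical quoting contexts at the end of a command-line prefix. */
enum quote_ctx { QC_NONE, QC_SINGLE, QC_DOUBLE };

/* Deliberately simplified scan: it ignores $'...', backquotes, $(...),
 * comments and options such as RC_QUOTES, but it tracks plain single
 * and double quotes and skips backslash-escaped characters outside
 * single quotes. */
enum quote_ctx quote_context(const char *prefix)
{
    enum quote_ctx q = QC_NONE;

    for (const char *p = prefix; *p; p++) {
        switch (q) {
        case QC_NONE:
            if (*p == '\\' && p[1])
                p++;                    /* skip the escaped character */
            else if (*p == '\'')
                q = QC_SINGLE;
            else if (*p == '"')
                q = QC_DOUBLE;
            break;
        case QC_SINGLE:
            if (*p == '\'')             /* no escapes inside '...' */
                q = QC_NONE;
            break;
        case QC_DOUBLE:
            if (*p == '\\' && p[1])
                p++;
            else if (*p == '"')
                q = QC_NONE;
            break;
        }
    }
    return q;
}
```

One possible strategy (an assumption, not something zsh does today): in QC_NONE a bare $'\351' can be inserted directly, while in QC_SINGLE or QC_DOUBLE the open quote would be closed first, giving e.g. 'foo'$'\351' or "foo"$'\351'.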

The attached patch is an alternative to my older patch that changed
stringaszleline().  This one changes add_match_data() instead, so the
conversion happens early enough that zsh could be taught to insert the
$'\123' sequences correctly into single- or double-quoted strings
(though it does not do this yet).  This patch also fixes the updating
glitch that I reported in my last patch.
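
The core of the add_match_data() change can be pulled out into a standalone sketch.  The names escape_invalid_mb() and append_octal_escape() are hypothetical, and the literals (size_t)-1 and (size_t)-2 stand in for zsh's MB_INVALID and MB_INCOMPLETE.  Like the patch, it allocates for the worst case of seven output bytes per input byte and escapes every remaining byte once an incomplete sequence is seen.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Write the 7-byte sequence $'\ooo' for one raw byte; return the new end. */
static char *append_octal_escape(char *t, unsigned char byte)
{
    *t++ = '$';
    *t++ = '\'';
    *t++ = '\\';
    *t++ = (char)('0' + ((byte >> 6) & 7));
    *t++ = (char)('0' + ((byte >> 3) & 7));
    *t++ = (char)('0' + (byte & 7));
    *t++ = '\'';
    return t;
}

/* Hypothetical helper: return a malloc'd copy of the len bytes at str in
 * which each byte that mbrtowc() rejects in the current LC_CTYPE locale,
 * and every byte from an incomplete trailing sequence onward, is replaced
 * by $'\ooo'.  Returns NULL on allocation failure; the caller frees. */
char *escape_invalid_mb(const char *str, size_t len)
{
    char *out = malloc(len * 7 + 1);   /* worst case: all bytes escaped */
    char *t = out;
    const char *f = str;
    size_t fl = len;
    int eol = 0;
    mbstate_t mbs;

    if (!out)
        return NULL;
    memset(&mbs, 0, sizeof mbs);
    while (fl > 0) {
        wchar_t wc;
        size_t cnt = eol ? (size_t)-1 : mbrtowc(&wc, f, fl, &mbs);

        if (cnt == (size_t)-2)
            eol = 1;                   /* truncated: escape the rest */
        if (cnt == (size_t)-1 || cnt == (size_t)-2) {
            memset(&mbs, 0, sizeof mbs);   /* leave the undefined state */
            t = append_octal_escape(t, (unsigned char)*f);
            f++;
            fl--;
            continue;
        }
        if (cnt == 0)
            cnt = 1;                   /* an embedded NUL consumes one byte */
        memcpy(t, f, cnt);
        t += cnt;
        f += cnt;
        fl -= cnt;
    }
    *t = '\0';
    return out;
}
```

The result depends on LC_CTYPE, so a program would normally call setlocale(LC_CTYPE, "") first.  In a UTF-8 locale the Latin-1 name "caf\xe9" should come back as caf$'\351', because the trailing 0xE9 byte starts a sequence that never completes.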

I think this is good enough to include in the next release.  At the
very least it makes it possible to complete filenames that contain
invalid charset sequences, which is better than the current behavior
of truncating them.  Thoughts?
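
Escaping beats truncation because it is lossless: when the inserted word is executed, the shell turns each $'\ooo' group back into the original byte, so the completed name still matches the file on disk.  A hypothetical decoder for only the narrow seven-character form that the patch generates (real $'...' parsing in zsh handles far more) shows the round trip:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical inverse of the patch's escaping: turn each exact
 * $'\ooo' group (first digit 0-3, so the value fits in a byte) back
 * into that raw byte and copy everything else unchanged.  Returns a
 * malloc'd buffer and stores the decoded length in *outlen, since a
 * decoded \000 would otherwise hide the rest of the string. */
char *decode_octal_escapes(const char *s, size_t *outlen)
{
    size_t len = strlen(s);
    char *out = malloc(len + 1);       /* decoding never grows the text */
    size_t o = 0;

    if (!out)
        return NULL;
    for (size_t i = 0; i < len; ) {
        if (len - i >= 7 && s[i] == '$' && s[i + 1] == '\''
            && s[i + 2] == '\\'
            && s[i + 3] >= '0' && s[i + 3] <= '3'
            && s[i + 4] >= '0' && s[i + 4] <= '7'
            && s[i + 5] >= '0' && s[i + 5] <= '7'
            && s[i + 6] == '\'') {
            out[o++] = (char)(((s[i + 3] - '0') << 6)
                              | ((s[i + 4] - '0') << 3)
                              | (s[i + 5] - '0'));
            i += 7;
        } else {
            out[o++] = s[i++];
        }
    }
    out[o] = '\0';
    *outlen = o;
    return out;
}
```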

One caveat about renaming "sl" to "stl":  add_match_data() had two
variables named "sl" (one in a more deeply nested block), so I renamed
the outer one, which holds the length of "str", to "stl".
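
The clash being avoided is ordinary C shadowing.  A minimal illustration (the function and its parameters are hypothetical, not taken from compcore.c):

```c
#include <string.h>

/* Hypothetical illustration: an inner "sl" shadows the outer one, so
 * inside the nested block "sl" no longer means strlen(str). */
int outer_length(const char *str, const char *pre, int *inner_len)
{
    int sl = (int)strlen(str);          /* outer: length of str */

    if (pre) {
        int sl = (int)strlen(pre);      /* inner: shadows the outer sl */
        *inner_len = sl;                /* refers to the inner sl */
    }
    return sl;                          /* the outer sl again */
}
```

GCC and Clang warn about the inner declaration with -Wshadow; giving the outer variable a distinct name such as "stl" removes the ambiguity.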

..wayne..

--9amGYk9869ThD9tj
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="multibyte.patch"

--- Src/Zle/compcore.c	15 Nov 2005 08:44:18 -0000	1.78
+++ Src/Zle/compcore.c	11 Feb 2006 09:44:45 -0000
@@ -2227,10 +2227,15 @@ add_match_data(int alt, char *str, char 
 	       char *psuf, Cline sline,
 	       char *suf, int flags, int exact)
 {
+#ifdef MULTIBYTE_SUPPORT
+    mbstate_t mbs;
+    char *t, *f, *new_str = NULL;
+    int fl, eol = 0;
+#endif
     Cmatch cm;
     Aminfo ai = (alt ? fainfo : ainfo);
     int palen, salen, qipl, ipl, pl, ppl, qisl, isl, psl;
-    int sl, lpl, lsl, ml;
+    int stl, lpl, lsl, ml;
 
     palen = salen = qipl = ipl = pl = ppl = qisl = isl = psl = 0;
 
@@ -2445,6 +2450,59 @@ add_match_data(int alt, char *str, char 
 	    line = p;
 	}
     }
+
+    stl = strlen(str);
+#ifdef MULTIBYTE_SUPPORT
+    /* If "str" contains a character that won't convert into a wide
+     * character, change it into a $'\123' sequence. */
+    memset(&mbs, '\0', sizeof mbs);
+    for (t = f = str, fl = stl; fl > 0; ) {
+	wchar_t wc;
+	size_t cnt = eol ? MB_INVALID : mbrtowc(&wc, f, fl, &mbs);
+	switch (cnt) {
+	case MB_INCOMPLETE:
+	    eol = 1;
+	    /* FALL THROUGH */
+	case MB_INVALID:
+	    /* Get mbs out of its undefined state. */
+	    memset(&mbs, '\0', sizeof mbs);
+	    if (!new_str) {
+		/* Be very pessimistic about how much space we'll need. */
+		new_str = zhalloc(stl*7 + 1);
+		memcpy(new_str, str, t - str);
+		t = new_str + (t - str);
+	    }
+	    *t++ = '$';
+	    *t++ = '\'';
+	    *t++ = '\\';
+	    *t++ = '0' + ((STOUC(*f) >> 6) & 7);
+	    *t++ = '0' + ((STOUC(*f) >> 3) & 7);
+	    *t++ = '0' + (STOUC(*f) & 7);
+	    *t++ = '\'';
+	    f++;
+	    fl--;
+	    break;
+	case 0:
+	    /* Converting '\0' returns 0, but a '\0' is a real
+	     * character for us, so we should consume 1 byte
+	     * (certainly true for Unicode and unlikely to be false
+	     * in any non-pathological multibyte representation). */
+	    cnt = 1;
+	    /* FALL THROUGH */
+	default:
+	    fl -= cnt;
+	    while (cnt--)
+		*t++ = *f++;
+	    break;
+	}
+    }
+    if (new_str) {
+	*t = '\0';
+	str = new_str;
+	stl = strlen(str);
+    }
+#endif
+
     /* Allocate and fill the match structure. */
     cm = (Cmatch) zhalloc(sizeof(struct cmatch));
     cm->str = str;
@@ -2539,10 +2597,9 @@ add_match_data(int alt, char *str, char 
     if (!ai->firstm)
 	ai->firstm = cm;
 
-    sl = strlen(str);
     lpl = (cm->ppre ? strlen(cm->ppre) : 0);
     lsl = (cm->psuf ? strlen(cm->psuf) : 0);
-    ml = sl + lpl + lsl;
+    ml = stl + lpl + lsl;
 
     if (ml < minmlen)
 	minmlen = ml;
@@ -2566,7 +2623,7 @@ add_match_data(int alt, char *str, char 
 		    e += lpl;
 		}
 		strcpy(e, str);
-		e += sl;
+		e += stl;
 		if (cm->psuf)
 		    strcpy(e, cm->psuf);
 		comp_setunset(0, 0, CP_EXACTSTR, 0);

--9amGYk9869ThD9tj--