From mboxrd@z Thu Jan 1 00:00:00 1970
X-Seq: 22217
Date: Sat, 11 Feb 2006 02:19:05 -0800
From: Wayne Davison
To: Peter Stephenson
Cc: Zsh hackers list
Subject: Re: Another idea on how to insert illegal multibyte characters
Message-ID: <20060211101905.GA21506@dot.blorf.net>
References: <20060112034200.GB28221@dot.blorf.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="9amGYk9869ThD9tj"
Content-Disposition: inline
User-Agent: Mutt/1.5.11

--9amGYk9869ThD9tj
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Jan 12, 2006 at 09:23:19AM +0000, Peter Stephenson wrote:
> The completion system is a bit more quoting aware: it knows whether or
> not it needs to insert a backslash before special characters because of
> quotes earlier on the line.  Ideally it should handle unprintable
> characters at the same point where it tries to do that.  That doesn't
> need to be done at the same time, though.  (I would hope it could be
> done independently and prevent the equivalent code inside zle kicking
> in.)

The attached patch is an alternative to my older patch that changed
stringaszleline().  This one changes add_match_data(), which means that
it is happening early enough that zsh could be made to figure out how to
insert the $'\123' sequences into single- or double-quoted strings
(though it does not yet do this).  This patch also fixes the updating
glitch that I mentioned my last patch had.

I think this would be good enough to include in the next release.  It
would at least make the completion of filenames with invalid charset
sequences possible, which is better than the current truncating.
Thoughts?

One caveat about my renaming of "sl" to "stl":  add_match_data() had two
variables with the same name (one more deeply nested), so I changed the
outer one (which holds the length of "str") to be "stl".

..wayne..

--9amGYk9869ThD9tj
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="multibyte.patch"

--- Src/Zle/compcore.c	15 Nov 2005 08:44:18 -0000	1.78
+++ Src/Zle/compcore.c	11 Feb 2006 09:44:45 -0000
@@ -2227,10 +2227,15 @@ add_match_data(int alt, char *str, char 
                char *psuf, Cline sline,
                char *suf, int flags, int exact)
 {
+#ifdef MULTIBYTE_SUPPORT
+    mbstate_t mbs;
+    char *t, *f, *new_str = NULL;
+    int fl, eol = 0;
+#endif
     Cmatch cm;
     Aminfo ai = (alt ? fainfo : ainfo);
     int palen, salen, qipl, ipl, pl, ppl, qisl, isl, psl;
-    int sl, lpl, lsl, ml;
+    int stl, lpl, lsl, ml;
 
     palen = salen = qipl = ipl = pl = ppl = qisl = isl = psl = 0;
 
@@ -2445,6 +2450,59 @@ add_match_data(int alt, char *str, char 
             line = p;
         }
     }
+
+    stl = strlen(str);
+#ifdef MULTIBYTE_SUPPORT
+    /* If "str" contains a character that won't convert into a wide
+     * character, change it into a $'\123' sequence. */
+    memset(&mbs, '\0', sizeof mbs);
+    for (t = f = str, fl = stl; fl > 0; ) {
+        wchar_t wc;
+        size_t cnt = eol ? MB_INVALID : mbrtowc(&wc, f, fl, &mbs);
+        switch (cnt) {
+        case MB_INCOMPLETE:
+            eol = 1;
+            /* FALL THROUGH */
+        case MB_INVALID:
+            /* Get mbs out of its undefined state. */
+            memset(&mbs, '\0', sizeof mbs);
+            if (!new_str) {
+                /* Be very pessimistic about how much space we'll need. */
+                new_str = zhalloc(stl*7 + 1);
+                memcpy(new_str, str, t - str);
+                t = new_str + (t - str);
+            }
+            *t++ = '$';
+            *t++ = '\'';
+            *t++ = '\\';
+            *t++ = '0' + ((STOUC(*f) >> 6) & 7);
+            *t++ = '0' + ((STOUC(*f) >> 3) & 7);
+            *t++ = '0' + (STOUC(*f) & 7);
+            *t++ = '\'';
+            f++;
+            fl--;
+            break;
+        case 0:
+            /* Converting '\0' returns 0, but a '\0' is a real
+             * character for us, so we should consume 1 byte
+             * (certainly true for Unicode and unlikely to be false
+             * in any non-pathological multibyte representation). */
+            cnt = 1;
+            /* FALL THROUGH */
+        default:
+            fl -= cnt;
+            while (cnt--)
+                *t++ = *f++;
+            break;
+        }
+    }
+    if (new_str) {
+        *t = '\0';
+        str = new_str;
+        stl = strlen(str);
+    }
+#endif
+
     /* Allocate and fill the match structure. */
     cm = (Cmatch) zhalloc(sizeof(struct cmatch));
     cm->str = str;
@@ -2539,10 +2597,9 @@ add_match_data(int alt, char *str, char 
     if (!ai->firstm)
         ai->firstm = cm;
 
-    sl = strlen(str);
     lpl = (cm->ppre ? strlen(cm->ppre) : 0);
     lsl = (cm->psuf ? strlen(cm->psuf) : 0);
-    ml = sl + lpl + lsl;
+    ml = stl + lpl + lsl;
 
     if (ml < minmlen)
         minmlen = ml;
@@ -2566,7 +2623,7 @@ add_match_data(int alt, char *str, char 
         e += lpl;
     }
     strcpy(e, str);
-    e += sl;
+    e += stl;
     if (cm->psuf)
         strcpy(e, cm->psuf);
     comp_setunset(0, 0, CP_EXACTSTR, 0);

--9amGYk9869ThD9tj--
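
For anyone who wants to try the conversion outside of zsh, here is a
rough standalone sketch of the same loop.  It is not the patch itself
and rests on some assumptions: escape_invalid_mb() is a made-up name,
malloc() stands in for zhalloc(), a plain unsigned char cast stands in
for STOUC(), and the raw mbrtowc() return values (size_t)-1 and
(size_t)-2 stand in for what the patch calls MB_INVALID and
MB_INCOMPLETE.  Unlike the patch, it always writes into a fresh buffer
instead of copying only once the first bad byte turns up.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of zsh): return a malloc'd copy of
 * src[0..len) in which every byte that mbrtowc() cannot decode in the
 * current locale is rewritten as a $'\ooo' octal sequence. */
char *escape_invalid_mb(const char *src, size_t len)
{
    mbstate_t mbs;
    const char *f = src;
    size_t left = len;
    int eol = 0;
    /* Worst case: each input byte becomes the 7 bytes of $'\ooo'. */
    char *out = malloc(len * 7 + 1);
    char *t = out;

    if (!out)
        return NULL;
    memset(&mbs, 0, sizeof mbs);
    while (left > 0) {
        wchar_t wc;
        /* Once an incomplete sequence is seen at the end of the
         * input, treat every remaining byte as invalid. */
        size_t cnt = eol ? (size_t)-1 : mbrtowc(&wc, f, left, &mbs);

        if (cnt == (size_t)-2)
            eol = 1;
        if (cnt == (size_t)-1 || cnt == (size_t)-2) {
            unsigned char c = (unsigned char)*f;

            /* The state is undefined after an error, so reset it. */
            memset(&mbs, 0, sizeof mbs);
            *t++ = '$';
            *t++ = '\'';
            *t++ = '\\';
            *t++ = '0' + ((c >> 6) & 7);
            *t++ = '0' + ((c >> 3) & 7);
            *t++ = '0' + (c & 7);
            *t++ = '\'';
            f++;
            left--;
        } else {
            /* mbrtowc() returns 0 for '\0'; it still used one byte. */
            if (cnt == 0)
                cnt = 1;
            memcpy(t, f, cnt);
            t += cnt;
            f += cnt;
            left -= cnt;
        }
    }
    *t = '\0';
    return out;
}
```

The result depends on the locale's LC_CTYPE setting, so a caller would
want setlocale(LC_CTYPE, "") or similar first.  In a UTF-8 locale a
lone 0xff byte should come back as $'\377', while plain ASCII input is
copied through unchanged.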