* Another idea on how to insert illegal multibyte characters
@ 2006-01-12  3:42 Wayne Davison
  2006-01-12  9:23 ` Peter Stephenson
  2006-01-13  1:00 ` Clint Adams
  0 siblings, 2 replies; 5+ messages in thread
From: Wayne Davison @ 2006-01-12  3:42 UTC
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 680 bytes --]

Here's another idea on how filenames with illegal byte sequences could
be inserted in the command line:  insert a $'\321' string for each one.
Since this idiom uses plain ASCII, it inserts into the line just fine.
It also has the advantage that zsh will interpret the sequence back into
the appropriate character.

I created an initial patch for this.  It works to insert the necessary
letters into the command-line, but has a bug where tab completion will
not remove enough characters when moving from match to match if the
previous match had one or more expanded $'\321' sequences.  If folks
like this idea, I imagine this bug wouldn't be too hard to fix.

Opinions?

..wayne..

[-- Attachment #2: another-try.diff --]
[-- Type: text/plain, Size: 3036 bytes --]

--- Src/Zle/zle_utils.c  12 Jan 2006 01:04:17 -0000  1.36
+++ Src/Zle/zle_utils.c  12 Jan 2006 03:12:34 -0000
@@ -233,8 +233,9 @@ mod_export ZLE_STRING_T
 stringaszleline(char *instr, int incs, int *outll, int *outsz, int *outcs)
 {
     ZLE_STRING_T outstr;
-    int ll, sz;
+    int ll;
 #ifdef MULTIBYTE_SUPPORT
+    int eol = 0;
     mbstate_t mbs;
 #endif

@@ -256,17 +257,15 @@ stringaszleline(char *instr, int incs, i
     }
     unmetafy(instr, &ll);

-    /*
-     * ll is the maximum number of characters there can be in
-     * the output string; the closer to ASCII the string, the
-     * better the guess.  For the 2 see above.
-     */
-    sz = (ll + 2) * ZLE_CHAR_SIZE;
+#ifdef MULTIBYTE_SUPPORT
+    /* Compute the maximum amount of memory we'll need, which takes the
+     * pessimistic view that every character in the input needs to turn
+     * into a $'\321' string in the output.  For the reason for the +2,
+     * see the function comments. */
     if (outsz)
-        *outsz = ll;
-    outstr = (ZLE_STRING_T)zalloc(sz);
+        *outsz = ll * 7;
+    outstr = (ZLE_STRING_T)zalloc((ll*7 + 2) * ZLE_CHAR_SIZE);

-#ifdef MULTIBYTE_SUPPORT
     if (ll) {
         char *inptr = instr;
         wchar_t *outptr = outstr;
@@ -275,22 +274,36 @@ stringaszleline(char *instr, int incs, i
         memset(&mbs, '\0', sizeof mbs);

         while (ll > 0) {
-            size_t cnt = mbrtowc(outptr, inptr, ll, &mbs);
+            size_t cnt = eol ? MB_INVALID : mbrtowc(outptr, inptr, ll, &mbs);

-            /*
-             * At this point we don't handle either incomplete (-2) or
-             * invalid (-1) multibyte sequences.  Use the current length
-             * and return.
-             */
-            if (cnt == MB_INCOMPLETE || cnt == MB_INVALID)
+            switch (cnt) {
+            case MB_INCOMPLETE:
+                eol = 1;
+                /* FALL THROUGH */
+            case MB_INVALID:
+                /* Get mbs out of its undefined state. */
+                memset(&mbs, '\0', sizeof mbs);
+                /* Transform invalid character sequences into $'\321'
+                 * strings that will be converted by the shell into
+                 * the appropriate character. */
+                *outptr++ = L'$';
+                *outptr++ = L'\'';
+                *outptr++ = L'\\';
+                *outptr++ = L'0' + (STOUC(*inptr) / 0100);
+                *outptr++ = L'0' + ((STOUC(*inptr) / 010) & 07);
+                *outptr++ = L'0' + (STOUC(*inptr) & 07);
+                *outptr = L'\'';
+                cnt = 1;
                 break;
-
-            if (cnt == 0) {
+            case 0:
                 /* Converting '\0' returns 0, but a '\0' is a real
                  * character for us, so we should consume 1 byte
                  * (certainly true for Unicode and unlikely to be false
                  * in any non-pathological multibyte representation). */
                 cnt = 1;
+                /* FALL THROUGH */
+            default:
+                break;
             }

             if (outcs) {
@@ -311,7 +324,15 @@ stringaszleline(char *instr, int incs, i
         if (outcs)
             *outcs = 0;
     }
-#else
+
+#else /* !MULTIBYTE_SUPPORT */
+
+    if (outsz)
+        *outsz = ll;
+    /* ll is the number of characters in the unmetafied string.  For the
+     * reason for the +2, see the function comments. */
+    outstr = (ZLE_STRING_T)zalloc(ll + 2);
+
     memcpy(outstr, instr, ll);
     *outll = ll;
     if (outcs)
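The "ll * 7" sizing in the patch follows from the escape format: each byte that fails to convert becomes the seven characters $'\ooo' (dollar, quote, backslash, three octal digits, quote). The following is a minimal standalone sketch of that arithmetic, not zsh code; octal_escape_byte, worst_case_len and OCTAL_ESCAPE_LEN are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters in one escape:  $  '  \  o  o  o  '  */
#define OCTAL_ESCAPE_LEN 7

/* Hypothetical helper (not part of zsh): write the $'\ooo' form of one
 * byte into out, which must hold OCTAL_ESCAPE_LEN + 1 chars.
 * Returns the number of characters written, not counting the NUL. */
static size_t
octal_escape_byte(unsigned char byte, char *out)
{
    assert(out != NULL);
    out[0] = '$';
    out[1] = '\'';
    out[2] = '\\';
    out[3] = (char)('0' + ((byte >> 6) & 7));   /* top two bits */
    out[4] = (char)('0' + ((byte >> 3) & 7));   /* middle three bits */
    out[5] = (char)('0' + (byte & 7));          /* low three bits */
    out[6] = '\'';
    out[7] = '\0';
    return OCTAL_ESCAPE_LEN;
}

/* Pessimistic output length for len input bytes: every byte escaped. */
static size_t
worst_case_len(size_t len)
{
    return len * OCTAL_ESCAPE_LEN;
}
```

For the byte 0xD1 (octal 321) the helper produces the text $'\321', which is the example sequence used throughout the thread. The worst case is deliberately loose: a mostly-ASCII filename uses far less than seven characters per byte.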
* Re: Another idea on how to insert illegal multibyte characters
  2006-01-12  3:42 Another idea on how to insert illegal multibyte characters Wayne Davison
@ 2006-01-12  9:23 ` Peter Stephenson
  2006-02-11 10:19   ` Wayne Davison
  2006-01-13  1:00 ` Clint Adams
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Stephenson @ 2006-01-12  9:23 UTC
  To: Zsh hackers list

Wayne Davison wrote:
> Here's another idea on how filenames with illegal byte sequences could
> be inserted in the command line:  insert a $'\321' string for each one.

That ought to work quite well, although to do it completely consistently
you'd have to worry about quoting, which is difficult at that point
inside zle.  Filenames aren't usually quoted, except using backslashes,
so this will work most of the time, but every now and then it won't.  I
certainly think it's good enough for now.

The completion system is a bit more quoting aware: it knows whether or
not it needs to insert a backslash before special characters because of
quotes earlier on the line.  Ideally it should handle unprintable
characters at the same point where it tries to do that.  That doesn't
need to be done at the same time, though.  (I would hope it could be
done independently and prevent the equivalent code inside zle kicking
in.)

--
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070
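To illustrate the quoting concern raised above: zsh only interprets $'...' outside other quotes, so an escape that lands inside a single- or double-quoted word would have to close the open quote, emit the escape, and reopen the quote. The sketch below shows that idea in isolation. It is an assumption about how a quoting-aware version might behave, not something either patch in this thread implements; enum quote_ctx, emit_escaped_byte and ESCAPED_BYTE_MAX are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the quoting in effect at the insert point. */
enum quote_ctx { QUOTE_NONE, QUOTE_SINGLE, QUOTE_DOUBLE };

/* Closing quote + $'\ooo' + reopening quote + NUL. */
#define ESCAPED_BYTE_MAX 10

/* Hypothetical helper (not part of zsh): write one escaped byte so that
 * it is still interpreted when the surrounding word is quoted.
 * Returns the number of characters written, not counting the NUL. */
static size_t
emit_escaped_byte(unsigned char byte, enum quote_ctx ctx, char *out)
{
    char *t = out;
    char q = (ctx == QUOTE_SINGLE) ? '\'' :
             (ctx == QUOTE_DOUBLE) ? '"' : '\0';

    assert(out != NULL);
    if (q)
        *t++ = q;                       /* close the open quote */
    *t++ = '$';
    *t++ = '\'';
    *t++ = '\\';
    *t++ = (char)('0' + ((byte >> 6) & 7));
    *t++ = (char)('0' + ((byte >> 3) & 7));
    *t++ = (char)('0' + (byte & 7));
    *t++ = '\'';
    if (q)
        *t++ = q;                       /* reopen the same kind of quote */
    *t = '\0';
    return (size_t)(t - out);
}
```

With this scheme a single-quoted word containing the byte 0321 between "foo" and "bar" would come out as 'foo'$'\321''bar', which the shell reads as three adjacent quoted pieces.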
* Re: Another idea on how to insert illegal multibyte characters
  2006-01-12  9:23 ` Peter Stephenson
@ 2006-02-11 10:19   ` Wayne Davison
  2006-02-13 10:54     ` Peter Stephenson
  0 siblings, 1 reply; 5+ messages in thread
From: Wayne Davison @ 2006-02-11 10:19 UTC
  To: Peter Stephenson; +Cc: Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 1326 bytes --]

On Thu, Jan 12, 2006 at 09:23:19AM +0000, Peter Stephenson wrote:
> The completion system is a bit more quoting aware: it knows whether or
> not it needs to insert a backslash before special characters because of
> quotes earlier on the line.  Ideally it should handle unprintable
> characters at the same point where it tries to do that.  That doesn't
> need to be done at the same time, though.  (I would hope it could be
> done independently and prevent the equivalent code inside zle kicking
> in.)

The attached patch is an alternative to my older patch that changed
stringaszleline().  This one changes add_match_data(), which means that
it is happening early enough that zsh could be made to figure out how
to insert the $'\123' sequences into single- or double-quoted strings
(though it does not yet do this).  This patch also fixes the updating
glitch that I mentioned my last patch had.

I think this would be good enough to include in the next release.  It
would at least make the completion of filenames with invalid charset
sequences possible, which is better than the current truncating.
Thoughts?

One caveat about my renaming of "sl" to "stl":  add_match_data() had two
variables with the same name (one more deeply nested), so I changed the
outer one (which holds the length of "str") to be "stl".

..wayne..

[-- Attachment #2: multibyte.patch --]
[-- Type: text/plain, Size: 2746 bytes --]

--- Src/Zle/compcore.c  15 Nov 2005 08:44:18 -0000  1.78
+++ Src/Zle/compcore.c  11 Feb 2006 09:44:45 -0000
@@ -2227,10 +2227,15 @@ add_match_data(int alt, char *str, char
                char *psuf, Cline sline,
                char *suf, int flags, int exact)
 {
+#ifdef MULTIBYTE_SUPPORT
+    mbstate_t mbs;
+    char *t, *f, *new_str = NULL;
+    int fl, eol = 0;
+#endif
     Cmatch cm;
     Aminfo ai = (alt ? fainfo : ainfo);
     int palen, salen, qipl, ipl, pl, ppl, qisl, isl, psl;
-    int sl, lpl, lsl, ml;
+    int stl, lpl, lsl, ml;

     palen = salen = qipl = ipl = pl = ppl = qisl = isl = psl = 0;

@@ -2445,6 +2450,59 @@ add_match_data(int alt, char *str, char
             line = p;
         }
     }
+
+    stl = strlen(str);
+#ifdef MULTIBYTE_SUPPORT
+    /* If "str" contains a character that won't convert into a wide
+     * character, change it into a $'\123' sequence. */
+    memset(&mbs, '\0', sizeof mbs);
+    for (t = f = str, fl = stl; fl > 0; ) {
+        wchar_t wc;
+        size_t cnt = eol ? MB_INVALID : mbrtowc(&wc, f, fl, &mbs);
+        switch (cnt) {
+        case MB_INCOMPLETE:
+            eol = 1;
+            /* FALL THROUGH */
+        case MB_INVALID:
+            /* Get mbs out of its undefined state. */
+            memset(&mbs, '\0', sizeof mbs);
+            if (!new_str) {
+                /* Be very pessimistic about how much space we'll need. */
+                new_str = zhalloc(stl*7 + 1);
+                memcpy(new_str, str, t - str);
+                t = new_str + (t - str);
+            }
+            *t++ = '$';
+            *t++ = '\'';
+            *t++ = '\\';
+            *t++ = '0' + ((STOUC(*f) >> 6) & 7);
+            *t++ = '0' + ((STOUC(*f) >> 3) & 7);
+            *t++ = '0' + (STOUC(*f) & 7);
+            *t++ = '\'';
+            f++;
+            fl--;
+            break;
+        case 0:
+            /* Converting '\0' returns 0, but a '\0' is a real
+             * character for us, so we should consume 1 byte
+             * (certainly true for Unicode and unlikely to be false
+             * in any non-pathological multibyte representation). */
+            cnt = 1;
+            /* FALL THROUGH */
+        default:
+            fl -= cnt;
+            while (cnt--)
+                *t++ = *f++;
+            break;
+        }
+    }
+    if (new_str) {
+        *t = '\0';
+        str = new_str;
+        stl = strlen(str);
+    }
+#endif
+
     /* Allocate and fill the match structure. */
     cm = (Cmatch) zhalloc(sizeof(struct cmatch));
     cm->str = str;
@@ -2539,10 +2597,9 @@ add_match_data(int alt, char *str, char
     if (!ai->firstm)
         ai->firstm = cm;

-    sl = strlen(str);
     lpl = (cm->ppre ? strlen(cm->ppre) : 0);
     lsl = (cm->psuf ? strlen(cm->psuf) : 0);
-    ml = sl + lpl + lsl;
+    ml = stl + lpl + lsl;

     if (ml < minmlen)
         minmlen = ml;
@@ -2566,7 +2623,7 @@ add_match_data(int alt, char *str, char
             e += lpl;
         }
         strcpy(e, str);
-        e += sl;
+        e += stl;
         if (cm->psuf)
             strcpy(e, cm->psuf);
         comp_setunset(0, 0, CP_EXACTSTR, 0);
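The scan in the patch above can be tried outside zsh with only the standard C library. The sketch below is a simplified standalone version under stated assumptions, not the zsh implementation: escape_invalid_bytes is a hypothetical name, it uses malloc() where the patch uses zhalloc(), it always copies rather than allocating lazily, and it writes (size_t)-1 and (size_t)-2 where zsh uses MB_INVALID and MB_INCOMPLETE. Which bytes count as invalid depends on the current locale, so the result for non-ASCII input varies between systems.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of zsh): return a malloc()ed copy of
 * str[0..len) in which every byte that mbrtowc() cannot convert in the
 * current locale is replaced by a $'\ooo' escape.  The caller frees the
 * result.  Returns NULL if allocation fails. */
static char *
escape_invalid_bytes(const char *str, size_t len)
{
    mbstate_t mbs;
    const char *f = str;                /* read position */
    size_t fl = len;                    /* bytes left to read */
    int eol = 0;                        /* set after an incomplete sequence */
    char *out = malloc(len * 7 + 1);    /* worst case: every byte escaped */
    char *t = out;                      /* write position */

    if (!out)
        return NULL;
    memset(&mbs, 0, sizeof mbs);
    while (fl > 0) {
        wchar_t wc;
        size_t cnt = eol ? (size_t)-1 : mbrtowc(&wc, f, fl, &mbs);

        if (cnt == (size_t)-2)
            eol = 1;                    /* escape the rest byte by byte */
        if (cnt == (size_t)-1 || cnt == (size_t)-2) {
            unsigned char b = (unsigned char)*f;

            /* The state is unspecified after an error, so reset it. */
            memset(&mbs, 0, sizeof mbs);
            *t++ = '$';
            *t++ = '\'';
            *t++ = '\\';
            *t++ = (char)('0' + ((b >> 6) & 7));
            *t++ = (char)('0' + ((b >> 3) & 7));
            *t++ = (char)('0' + (b & 7));
            *t++ = '\'';
            f++;
            fl--;
            continue;
        }
        if (cnt == 0)
            cnt = 1;                    /* an embedded NUL consumes one byte */
        assert(cnt <= fl);
        fl -= cnt;
        while (cnt--)
            *t++ = *f++;                /* copy the valid character as is */
    }
    *t = '\0';
    return out;
}
```

Pure ASCII input comes back unchanged in any locale. To see escapes for a name such as "\321a", the program would typically call setlocale(LC_CTYPE, "") under a UTF-8 locale first; in the default "C" locale some C libraries accept high bytes and others reject them.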
* Re: Another idea on how to insert illegal multibyte characters
  2006-02-11 10:19   ` Wayne Davison
@ 2006-02-13 10:54     ` Peter Stephenson
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Stephenson @ 2006-02-13 10:54 UTC
  To: Zsh hackers list

Wayne Davison wrote:
> On Thu, Jan 12, 2006 at 09:23:19AM +0000, Peter Stephenson wrote:
> > The completion system is a bit more quoting aware: it knows whether or
> > not it needs to insert a backslash before special characters because of
> > quotes earlier on the line.  Ideally it should handle unprintable
> > characters at the same point where it tries to do that.  That doesn't
> > need to be done at the same time, though.  (I would hope it could be
> > done independently and prevent the equivalent code inside zle kicking
> > in.)
>
> The attached patch is an alternative to my older patch that changed
> stringaszleline().  This one changes add_match_data(), which means that
> it is happening early enough that zsh could be made to figure out how
> to insert the $'\123' sequences into single- or double-quoted strings
> (though it does not yet do this).  This patch also fixes the updating
> glitch that I mentioned my last patch had.

Any reasonably consistent attempt to do this, even if it's incomplete,
strikes me as a good thing.  4.3.1 isn't going to be completely
multibyte-aware anyway.

--
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070
* Re: Another idea on how to insert illegal multibyte characters
  2006-01-12  3:42 Another idea on how to insert illegal multibyte characters Wayne Davison
  2006-01-12  9:23 ` Peter Stephenson
@ 2006-01-13  1:00 ` Clint Adams
  1 sibling, 0 replies; 5+ messages in thread
From: Clint Adams @ 2006-01-13  1:00 UTC
  To: Wayne Davison; +Cc: zsh-workers

> I created an initial patch for this.  It works to insert the necessary
> letters into the command-line, but has a bug where tab completion will
> not remove enough characters when moving from match to match if the
> previous match had one or more expanded $'\321' sequences.  If folks
> like this idea, I imagine this bug wouldn't be too hard to fix.

Sounds better than the status quo.