zsh-workers
 help / color / mirror / code / Atom feed
* Re: Bug#419832: zsh: expanding non-ASCII filenames with <TAB>
       [not found] ` <200708170905.l7H9521T1534406@shell01.TheWorld.com>
@ 2007-08-17 12:08   ` Clint Adams
  2007-08-17 14:22     ` Peter Stephenson
  0 siblings, 1 reply; 5+ messages in thread
From: Clint Adams @ 2007-08-17 12:08 UTC (permalink / raw)
  To: zsh-workers; +Cc: Alan Curry, 419832-forwarded

On Fri, Aug 17, 2007 at 05:05:02AM -0400, Alan Curry wrote:
> >> In the following demonstration, the first <TAB> keypress inserted the $'\300'
> >> for me. The second <TAB> keypress, typed immediately after the asterisk,
> >> should expand the glob into $'\300' also, but instead it just erases the
> >> asterisk, replacing it with nothing at all. If Return is pressed after the
> >> tab, the cat is executed with no arguments and reads from the tty.

> >> Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

> >Non-ASCII characters don't exist in the C locale; maybe you want to pick
> >a better one.

> That's a pretty lame brush-off.

> My locale is set correctly (to be precise, it is unset correctly; none of
> those environment variables are set). It represents the type of output I want
> to get from all programs that recognize locales: text in English if possible,
> and traditional sort order, not that new-fangled chaotic LANG=en order, where
> ls hides your Makefile in the middle of all your lowercase source files! (Why
> do you think they made make(1) recognize Makefiles with a capital M? Because
> it belongs at the start of the listing, that's why.)

> If you think this behavior is justified, for what am I being punished? Using
> the default ("C") locale? It accurately describes what language I can read.
> Having a file that is not a valid sequence of characters in that locale?
> Maybe I should go file bug reports on all the programs that allow me to
> create a file with such a name. That will be a lot of bug reports.

> Or maybe we could admit that regardless of one's preferred locale, it is
> inevitable that one will occasionally obtain files whose names are not valid
> character strings in that locale. It would be nice if our tools would not
> choke on those, would it not?

> The $'\300' notation is a vast improvement over what older zsh versions did,
> just dump the wacky bytes directly to the terminal. The current version
> already automatically inserts $'\300' when completing; I only suggest that it
> behave identically when expanding.

> Expanding a glob to an empty list, when in fact it matched something, surely
> can't be considered acceptable behavior. Even worse if it matched several
> things and only one of them had a nasty byte and got omitted, you might not
> notice and then go ahead and act on the wrong set of files.
> 
> Come on.

There is apparently a bit of inconsistency here.  expand-or-complete and
expand-word will eat the asterisk, but _expand_word won't.

What's worse is that if you `touch a b$'\300' c` in an empty directory,
cat *<TAB> will only expand to "a b".


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Bug#419832: zsh: expanding non-ASCII filenames with <TAB>
  2007-08-17 12:08   ` Bug#419832: zsh: expanding non-ASCII filenames with <TAB> Clint Adams
@ 2007-08-17 14:22     ` Peter Stephenson
  2007-08-17 14:55       ` Bart Schaefer
  2007-08-18  4:29       ` Clint Adams
  0 siblings, 2 replies; 5+ messages in thread
From: Peter Stephenson @ 2007-08-17 14:22 UTC (permalink / raw)
  To: zsh-workers; +Cc: Alan Curry, 419832-forwarded

On Fri, 17 Aug 2007 08:08:44 -0400
Clint Adams <schizo@debian.org> wrote:
> There is apparently a bit of inconsistency here.  expand-or-complete and
> expand-word will eat the asterisk, but _expand_word won't.
> 
> What's worse is that if you `touch a b$'\300' c` in an empty directory,
> cat *<TAB> will only expand to "a b".

I can believe there's some logic missing here, but it's not currently
clear to me here what.  Could you post an explicit recipe for
getting from an unconfigured shell to an expansion that doesn't
(somehow) display all the elements?

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Bug#419832: zsh: expanding non-ASCII filenames with <TAB>
  2007-08-17 14:22     ` Peter Stephenson
@ 2007-08-17 14:55       ` Bart Schaefer
  2007-08-20 21:18         ` Peter Stephenson
  2007-08-18  4:29       ` Clint Adams
  1 sibling, 1 reply; 5+ messages in thread
From: Bart Schaefer @ 2007-08-17 14:55 UTC (permalink / raw)
  To: zsh-workers; +Cc: 419832-forwarded, Alan Curry

On Aug 17,  3:22pm, Peter Stephenson wrote:
} Subject: Re: Bug#419832: zsh: expanding non-ASCII filenames with <TAB>
}
} On Fri, 17 Aug 2007 08:08:44 -0400
} Clint Adams <schizo@debian.org> wrote:
} > What's worse is that if you `touch a b$'\300' c` in an empty directory,
} > cat *<TAB> will only expand to "a b".
} 
} I can believe there's some logic missing here, but it's not currently
} clear to me here what.  Could you post an explicit recipe for
} getting from an unconfigured shell to an expansion that doesn't
} (somehow) display all the elements?

This is a problem with old compctl, not compsys.

With compinit run and LANG=C and/or "unset LANG" I get:

schaefer<504> cat *<TAB>
schaefer<504> cat a
Completing expansions
a         b$'\300'  c       
Completing all expansions
a b$'\300' c
Completing original
*

With LANG=en_US I get:

schaefer<506> cat *<TAB>
schaefer<506> cat a
Completing expansions
a   bÀ  c 
Completing all expansions
a bÀ c
Completing original
*

With "compsys" turned off (old compctl completion) I get the something
sensible when LANG=en_US:

torch% cat *<TAB>
torch% cat a bÀ c

But the bug appears with LANG unset:

torch% cat*<TAB>
torch% cat a b

So the recipe PWS wants is just to run "zsh -f".


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Bug#419832: zsh: expanding non-ASCII filenames with <TAB>
  2007-08-17 14:22     ` Peter Stephenson
  2007-08-17 14:55       ` Bart Schaefer
@ 2007-08-18  4:29       ` Clint Adams
  1 sibling, 0 replies; 5+ messages in thread
From: Clint Adams @ 2007-08-18  4:29 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-workers, Alan Curry, 419832

On Fri, Aug 17, 2007 at 03:22:10PM +0100, Peter Stephenson wrote:
> I can believe there's some logic missing here, but it's not currently
> clear to me here what.  Could you post an explicit recipe for
> getting from an unconfigured shell to an expansion that doesn't
> (somehow) display all the elements?

% zsh -f
percebes% autoload -U compinit;compinit
percebes% mkdir /tmp/blah
percebes% cd !$
cd /tmp/blah
percebes% export LANG=en_US.UTF-8
percebes% touch a b$'\300' c
percebes% cat *<TAB>

(expands to)
percebes% cat a b


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Bug#419832: zsh: expanding non-ASCII filenames with <TAB>
  2007-08-17 14:55       ` Bart Schaefer
@ 2007-08-20 21:18         ` Peter Stephenson
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Stephenson @ 2007-08-20 21:18 UTC (permalink / raw)
  To: zsh-workers, 419832-forwarded, Alan Curry

On Fri, 17 Aug 2007 07:55:15 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> torch% cat*<TAB>
> torch% cat a b
> 
> So the recipe PWS wants is just to run "zsh -f".

This and the version Clint posted both go through quotestring() with
QT_BACKSLASH.  So the fix is to improve handling of unprintable
characters in that case.  All the tests still pass, but I could
have messed up some unusual case.

By the way, I don't speak octal and I'd personally prefer \x, but
it doesn't matter that much.

Index: Src/utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
retrieving revision 1.164
diff -u -r1.164 utils.c
--- Src/utils.c	10 May 2007 11:36:24 -0000	1.164
+++ Src/utils.c	20 Aug 2007 21:15:46 -0000
@@ -4124,6 +4124,51 @@
     return 0;
 }
 
+
+static char *
+addunprintable(char *v, const char *u, const char *uend)
+{
+    for (; u < uend; u++) {
+	/*
+	 * Just do this byte by byte; there's no great
+	 * advantage in being clever with multibyte
+	 * characters if we don't think they're printable.
+	 */
+	int c;
+	if (*u == Meta)
+	    c = STOUC(*++u ^ 32);
+	else
+	    c = STOUC(*u);
+	switch (c) {
+	case '\0':
+	    *v++ = '\\';
+	    *v++ = '0';
+	    if ('0' <= u[1] && u[1] <= '7') {
+		*v++ = '0';
+		*v++ = '0';
+	    }
+	    break;
+
+	case '\007': *v++ = '\\'; *v++ = 'a'; break;
+	case '\b': *v++ = '\\'; *v++ = 'b'; break;
+	case '\f': *v++ = '\\'; *v++ = 'f'; break;
+	case '\n': *v++ = '\\'; *v++ = 'n'; break;
+	case '\r': *v++ = '\\'; *v++ = 'r'; break;
+	case '\t': *v++ = '\\'; *v++ = 't'; break;
+	case '\v': *v++ = '\\'; *v++ = 'v'; break;
+
+	default:
+	    *v++ = '\\';
+	    *v++ = '0' + ((c >> 6) & 7);
+	    *v++ = '0' + ((c >> 3) & 7);
+	    *v++ = '0' + (c & 7);
+	    break;
+	}
+    }
+
+    return v;
+}
+
 /*
  * Quote the string s and return the result.
  *
@@ -4142,8 +4187,16 @@
 {
     const char *u, *tt;
     char *v;
-    char *buf = hcalloc(4 * strlen(s) + 1);
+    /*
+     * With QT_BACKSLASH we may need to use $'\300' stuff.
+     * Keep memory usage within limits by allocating temporary
+     * storage and using heap for correct size at end.
+     */
+    int alloclen = (instring == QT_BACKSLASH ? 7 : 4) * strlen(s) + 1;
+    char *buf = zshcalloc(alloclen);
     int sf = 0;
+    convchar_t cc;
+    const char *uend;
 
     DPUTS(instring < QT_BACKSLASH || instring > QT_DOLLARS,
 	  "BUG: bad quote type in quotestring");
@@ -4154,10 +4207,9 @@
 	 * As we test for printability here we need to be able
 	 * to look for multibyte characters.
 	 */
-	convchar_t cc;
 	MB_METACHARINIT();
 	while (*u) {
-	    const char *uend = u + MB_METACHARLENCONV(u, &cc);
+	    uend = u + MB_METACHARLENCONV(u, &cc);
 
 	    if (e && !sf && *e <= u) {
 		*e = v;
@@ -4183,53 +4235,19 @@
 		    *v++ = *u++;
 	    } else {
 		/* Not printable */
-		for (; u < uend; u++) {
-		    /*
-		     * Just do this byte by byte; there's no great
-		     * advantage in being clever with multibyte
-		     * characters if we don't think they're printable.
-		     */
-		    int c;
-		    if (*u == Meta)
-			c = STOUC(*++u ^ 32);
-		    else
-			c = STOUC(*u);
-		    switch (c) {
-		    case '\0':
-			*v++ = '\\';
-			*v++ = '0';
-			if ('0' <= u[1] && u[1] <= '7') {
-			    *v++ = '0';
-			    *v++ = '0';
-			}
-			break;
-
-		    case '\007': *v++ = '\\'; *v++ = 'a'; break;
-		    case '\b': *v++ = '\\'; *v++ = 'b'; break;
-		    case '\f': *v++ = '\\'; *v++ = 'f'; break;
-		    case '\n': *v++ = '\\'; *v++ = 'n'; break;
-		    case '\r': *v++ = '\\'; *v++ = 'r'; break;
-		    case '\t': *v++ = '\\'; *v++ = 't'; break;
-		    case '\v': *v++ = '\\'; *v++ = 'v'; break;
-
-		    default:
-			*v++ = '\\';
-			*v++ = '0' + ((c >> 6) & 7);
-			*v++ = '0' + ((c >> 3) & 7);
-			*v++ = '0' + (c & 7);
-			break;
-		    }
-		}
+		v = addunprintable(v, u, uend);
+		u = uend;
 	    }
 	}
     }
     else
     {
 	/*
-	 * Here the only special characters are syntactic, so
-	 * we can go through bytewise.
+	 * Here there are syntactic special characters, so
+	 * we start by going through bytewise.
 	 */
-	for (; *u; u++) {
+	while (*u) {
+	    int dobackslash = 0;
 	    if (e && *e == u)
 		*e = v, sf = 1;
 	    if (*u == Tick || *u == Qtick) {
@@ -4239,8 +4257,6 @@
 		while (*u && *u != c)
 		    *v++ = *u++;
 		*v++ = c;
-		if (!*u)
-		    u--;
 		continue;
 	    } else if ((*u == Qstring || *u == '$') && u[1] == '\'' &&
 		       instring == QT_DOUBLE) {
@@ -4268,9 +4284,7 @@
 		    *v++ = *u++;
 		}
 		if (*u)
-		    *v++ = *u;
-		else
-		    u--;
+		    *v++ = *u++;
 		continue;
 	    }
 	    else if (ispecial(*u) &&
@@ -4296,13 +4310,51 @@
 			*v++ = '"', *v++ = '\n', *v++ = '"';
 		    else
 			*v++ = '\'', *v++ = '\'';
+		    u++;
 		    continue;
-		} else
-		    *v++ = '\\';
+		} else {
+		    /*
+		     * We'll need a backslash, but don't add it
+		     * yet since if the character isn't printable
+		     * we'll have to upgrade it to $'...'.
+		     */
+		    dobackslash = 1;
+		}
 	    }
-	    if(*u == Meta)
+
+	    if (itok(*u) || instring != QT_BACKSLASH) {
+		/* Needs to be passed straight through. */
+		if (dobackslash)
+		    *v++ = '\\';
 		*v++ = *u++;
-	    *v++ = *u;
+		continue;
+	    }
+
+	    /*
+	     * Now check if the output is unprintable in the
+	     * current character set.
+	     */
+	    uend = u + MB_METACHARLENCONV(u, &cc);
+	    if (
+#ifdef MULTIBYTE_SUPPORT
+		cc != WEOF &&
+#endif
+		WC_ISPRINT(cc)) {
+		if (dobackslash)
+		    *v++ = '\\';
+		while (u < uend) {
+		    if (*u == Meta)
+			*v++ = *u++;
+		    *v++ = *u++;
+		}
+	    } else {
+		/* Not printable */
+		*v++ = '$';
+		*v++ = '\'';
+		v = addunprintable(v, u, uend);
+		*v++ = '\'';
+		u = uend;
+	    }
 	}
     }
     *v = '\0';
@@ -4311,7 +4363,9 @@
 	*e = v, sf = 1;
     DPUTS(e && !sf, "BUG: Wild pointer *e in quotestring()");
 
-    return buf;
+    v = dupstring(buf);
+    zfree(buf, alloclen);
+    return v;
 }
 
 /* Unmetafy and output a string, quoted if it contains special characters. */

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2007-08-20 23:05 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20070817001222.GA19399@scowler.net>
     [not found] ` <200708170905.l7H9521T1534406@shell01.TheWorld.com>
2007-08-17 12:08   ` Bug#419832: zsh: expanding non-ASCII filenames with <TAB> Clint Adams
2007-08-17 14:22     ` Peter Stephenson
2007-08-17 14:55       ` Bart Schaefer
2007-08-20 21:18         ` Peter Stephenson
2007-08-18  4:29       ` Clint Adams

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).