zsh-workers
 help / color / mirror / code / Atom feed
* [bug] backslash stripped in sh/ksh emulation
@ 2005-10-11  8:38 Stephane Chazelas
  2005-10-11  8:53 ` Stephane Chazelas
  2005-10-11 11:36 ` Peter Stephenson
  0 siblings, 2 replies; 9+ messages in thread
From: Stephane Chazelas @ 2005-10-11  8:38 UTC (permalink / raw)
  To: Zsh hackers list

Hi guys,

$ ARGV0=ksh zsh -xc 'a="\\*"; case $a in *\\*) echo a;; esac'
+ a='\*'
+ case * (*\*)

Can anyone explain it? It's OK if $a is quoted as in <case "$a">

After investigation, it appears it is triggered by globsubst.

Best regards,
Stéphane


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug] backslash stripped in sh/ksh emulation
  2005-10-11  8:38 [bug] backslash stripped in sh/ksh emulation Stephane Chazelas
@ 2005-10-11  8:53 ` Stephane Chazelas
  2005-10-11 11:36 ` Peter Stephenson
  1 sibling, 0 replies; 9+ messages in thread
From: Stephane Chazelas @ 2005-10-11  8:53 UTC (permalink / raw)
  To: Zsh hackers list

On Tue, Oct 11, 2005 at 09:38:42AM +0100, Stephane Chazelas wrote:
> Hi guys,
> 
> $ ARGV0=ksh zsh -xc 'a="\\*"; case $a in *\\*) echo a;; esac'
> + a='\*'
> + case * (*\*)
[...]

Same problem with:

~$ zsh -c 'setopt globsubst; a="\\\\\\$"; b=${a#?}; print -r "$a" "$b"'
\\\$ \$
~$ zsh -c 'setopt globsubst; a="\\\\\\$"; b="${a#?}"; print -r "$a" "$b"'
\\\$ \\$

best regards,
Stéphane


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug] backslash stripped in sh/ksh emulation
  2005-10-11  8:38 [bug] backslash stripped in sh/ksh emulation Stephane Chazelas
  2005-10-11  8:53 ` Stephane Chazelas
@ 2005-10-11 11:36 ` Peter Stephenson
  2005-10-11 11:43   ` Peter Stephenson
                     ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: Peter Stephenson @ 2005-10-11 11:36 UTC (permalink / raw)
  To: zsh-workers

Stephane Chazelas <Stephane_Chazelas@yahoo.fr> wrote:
> $ ARGV0=ksh zsh -xc 'a="\\*"; case $a in *\\*) echo a;; esac'
> + a='\*'
> + case * (*\*)
> 
> Can anyone explain it? It's OK if $a is quoted as in <case "$a">
> 
> After investigation, it appears it is triggered by globsubst.

You're basically pointing out that in:

a='\\*'
print -r ${~a}

the output is

\*

with only a single backslash (I'm assuming nonomatch is set and the pattern
doesn't match a file --- everything is OK if it does).  Yes, this does seem
to be a bug: the string shouldn't be altered if it doesn't match a pattern,
and the use is inconsistent with other shells, which is bad here since
glob_subst is partly designed to provide compatibility.

Fixing it isn't that trivial, given the constraints:

- Nothing else should change, i.e. the myriad other points at which
  quoting takes place shouldn't be affected.
- Also, the quoting effect of the first backslash should be retained
  in pattern matching (in other words, ${~a} in the example should match a
  single backslash).
- zsh removes the so-called "null" arguments, essentially the ghosts
  of quoting characters that are only needed to recreate a printed
  representation of the input string, before pattern matching takes
  place, so restoring it after isn't natural in the current code.

The last one is the really tricky one.  Possible a complete rewrite of the
way quoting is handled would make this all work, but I'm not going down
that road.

So I've introduced a variant of Bnull, the ghost of a backslash, called
Bnullkeep.  This is only inserted in the code used for globsubst, isn't
removed by remnulargs(), and is explicitly ignored by pattern matching.  If
the pattern match failed then untokenize() will restore the backslash to
output the original string.

It's possible there are inconsistencies in that there are places that Bnull
and Bnullkeep are treated the same when they should be different, or vice
versa.  It's also possible that there's a point in the code that expects a
tokenized string but hasn't been taught properly about Bnullkeep, since the
pattern matching is the only part I've modified.  However, none of the
existing tests fail, and the new test I've added succeeds.

Nonetheless, this doesn't give me a warm, fuzzy feeling.  If anyone can
think of places where this might have the wrong effect...

I will not be applying this to 4.2.

(It's just possible we could get away with only using Bnullkeep by suitable
changes, simplifying the code a bit.)

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


This message has been scanned for viruses by BlackSpider MailControl - www.blackspider.com


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug] backslash stripped in sh/ksh emulation
  2005-10-11 11:36 ` Peter Stephenson
@ 2005-10-11 11:43   ` Peter Stephenson
  2005-10-11 12:57   ` Stephane Chazelas
  2005-10-11 14:43   ` Bart Schaefer
  2 siblings, 0 replies; 9+ messages in thread
From: Peter Stephenson @ 2005-10-11 11:43 UTC (permalink / raw)
  To: zsh-workers

Peter Stephenson <pws@csr.com> wrote:
> So I've introduced a variant of Bnull, the ghost of a backslash, called
> Bnullkeep.  This is only inserted in the code used for globsubst, isn't
> removed by remnulargs(), and is explicitly ignored by pattern matching.  If
> the pattern match failed then untokenize() will restore the backslash to
> output the original string.

Er, and here's the actual code...

Note there is a minor fix to ztest.zsh which was garbling \'s in output
from diff because it used echo's backslash convention.

Index: Src/glob.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/glob.c,v
retrieving revision 1.46
diff -u -r1.46 glob.c
--- Src/glob.c	18 Aug 2005 10:17:52 -0000	1.46
+++ Src/glob.c	11 Oct 2005 11:12:35 -0000
@@ -2487,19 +2487,29 @@
 mod_export void
 tokenize(char *s)
 {
-    zshtokenize(s, 0);
+    zshtokenize(s, 0, 0);
 }
 
+/*
+ * shtokenize is used when we tokenize a string with GLOB_SUBST set.
+ * In that case we need to retain backslashes when we turn the
+ * pattern back into a string, so that the string is not
+ * modified if it failed to match a pattern.
+ *
+ * It may be modified by the effect of SH_GLOB which turns off
+ * various zsh-specific options.
+ */
+
 /**/
 mod_export void
 shtokenize(char *s)
 {
-    zshtokenize(s, isset(SHGLOB));
+    zshtokenize(s, 1, isset(SHGLOB));
 }
 
 /**/
 static void
-zshtokenize(char *s, int shglob)
+zshtokenize(char *s, int glbsbst, int shglob)
 {
     char *t;
     int bslash = 0;
@@ -2508,9 +2518,10 @@
       cont:
 	switch (*s) {
 	case Bnull:
+	case Bnullkeep:
 	case '\\':
 	    if (bslash) {
-		s[-1] = Bnull;
+		s[-1] = glbsbst ? Bnullkeep : Bnull;
 		break;
 	    }
 	    bslash = 1;
@@ -2519,7 +2530,7 @@
 	    if (shglob)
 		break;
 	    if (bslash) {
-		s[-1] = Bnull;
+		s[-1] = glbsbst ? Bnullkeep : Bnull;
 		break;
 	    }
 	    t = s;
@@ -2549,7 +2560,7 @@
 	    for (t = ztokens; *t; t++)
 		if (*t == *s) {
 		    if (bslash)
-			s[-1] = Bnull;
+			s[-1] = glbsbst ? Bnullkeep : Bnull;
 		    else
 			*s = (t - ztokens) + Pound;
 		    break;
@@ -2569,12 +2580,23 @@
 	char *o = s, c;
 
 	while ((c = *s++))
-	    if (INULL(c)) {
+	    if (c == Bnullkeep) {
+		/*
+		 * An active backslash that needs to be turned back into
+		 * a real backslash for output.  However, we don't
+		 * do that yet since we need to ignore it during
+		 * pattern matching.
+		 */
+		continue;
+	    } else if (INULL(c)) {
 		char *t = s - 1;
 
-		while ((c = *s++))
-		    if (!INULL(c))
+		while ((c = *s++)) {
+		    if (c == Bnullkeep)
+			*t++ = '\\';
+		    else if (!INULL(c))
 			*t++ = c;
+		}
 		*t = '\0';
 		if (!*o) {
 		    o[0] = Nularg;
Index: Src/lex.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/lex.c,v
retrieving revision 1.30
diff -u -r1.30 lex.c
--- Src/lex.c	10 Aug 2005 10:56:41 -0000	1.30
+++ Src/lex.c	11 Oct 2005 11:12:35 -0000
@@ -33,7 +33,7 @@
 /* tokens */
 
 /**/
-mod_export char ztokens[] = "#$^*()$=|{}[]`<>?~`,'\"\\";
+mod_export char ztokens[] = "#$^*()$=|{}[]`<>?~`,'\"\\\\";
 
 /* parts of the current token */
 
Index: Src/pattern.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/pattern.c,v
retrieving revision 1.28
diff -u -r1.28 pattern.c
--- Src/pattern.c	20 Sep 2005 15:10:27 -0000	1.28
+++ Src/pattern.c	11 Oct 2005 11:12:36 -0000
@@ -260,13 +260,13 @@
 
 static char endstr[] = {
     '/',			/* file only */
-    '\0', Bar, Outpar, Quest, Star, Inbrack, Inpar, Inang,
+    '\0', Bar, Outpar, Quest, Star, Inbrack, Inpar, Inang, Bnullkeep,
 				/* all patterns */
     Tilde, Hat, Pound		/* extended glob only */
 };
 
-#define PATENDSTRLEN_NORM 9
-#define PATENDSTRLEN_EXT  12
+#define PATENDSTRLEN_NORM 10
+#define PATENDSTRLEN_EXT  13
 
 
 /* Default size for pattern buffer */
@@ -1240,6 +1240,13 @@
 	     */
 	    return 0;
 	    break;
+	case Bnullkeep:
+	    /*
+	     * Marker for restoring a backslash in output:
+	     * does not match a character.
+	     */
+	    return patcomppiece(flagp);
+	    break;
 #ifdef DEBUG
 	default:
 	    dputs("BUG: character not handled in patcomppiece");
Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.40
diff -u -r1.40 subst.c
--- Src/subst.c	7 Dec 2004 16:55:03 -0000	1.40
+++ Src/subst.c	11 Oct 2005 11:12:36 -0000
@@ -1945,7 +1945,7 @@
 	     */
 	    for (ptr = s; (c = *ptr) && c != '/'; ptr++)
 	    {
-		if ((c == Bnull || c == '\\') && ptr[1])
+		if ((c == Bnull || c == Bnullkeep || c == '\\') && ptr[1])
 		{
 		    if (ptr[1] == '/')
 			chuck(ptr);
@@ -2846,11 +2846,11 @@
 		}
 		zsfree(hsubr);
 		for (tt = hsubl; *tt; tt++)
-		    if (INULL(*tt))
+		    if (INULL(*tt) && *tt != Bnullkeep)
 			chuck(tt--);
 		untokenize(hsubl);
 		for (tt = hsubr = ztrdup(ptr2); *tt; tt++)
-		    if (INULL(*tt))
+		    if (INULL(*tt) && *tt != Bnullkeep)
 			chuck(tt--);
 		ptr2[-1] = del;
 		if (sav)
Index: Src/zsh.h
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/zsh.h,v
retrieving revision 1.76
diff -u -r1.76 zsh.h
--- Src/zsh.h	8 Aug 2005 16:49:11 -0000	1.76
+++ Src/zsh.h	11 Oct 2005 11:12:36 -0000
@@ -120,7 +120,10 @@
 
 #define DEFAULT_IFS	" \t\n\203 "
 
-/* Character tokens */
+/*
+ * Character tokens.
+ * These should match the characters in ztokens, defined in lex.c
+ */
 #define Pound		((char) 0x84)
 #define String		((char) 0x85)
 #define Hat		((char) 0x86)
@@ -141,15 +144,33 @@
 #define Tilde		((char) 0x95)
 #define Qtick		((char) 0x96)
 #define Comma		((char) 0x97)
+/*
+ * Null arguments: placeholders for single and double quotes
+ * and backslashes.
+ */
 #define Snull		((char) 0x98)
 #define Dnull		((char) 0x99)
 #define Bnull		((char) 0x9a)
-#define Nularg		((char) 0x9b)
+/*
+ * Backslash which will be returned to "\" instead of being stripped
+ * when we turn the string into a printable format.
+ */
+#define Bnullkeep       ((char) 0x9b)
+/*
+ * Null argument that does not correspond to any character.
+ * This should be last as it does not appear in ztokens and
+ * is used to initialise the IMETA type in inittyptab().
+ */
+#define Nularg		((char) 0x9c)
 
-#define INULL(x)	(((x) & 0xfc) == 0x98)
+#define INULL(x)	(((x) & 0xf8) == 0x98)
 
+/*
+ * Take care to update the use of IMETA appropriately when adding
+ * tokens here.
+ */
 /* Marker used in paramsubst for rc_expand_param */
-#define Marker		((char) 0x9c)
+#define Marker		((char) 0xa0)
 
 /* chars that need to be quoted if meant literally */
 
Index: Test/D04parameter.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D04parameter.ztst,v
retrieving revision 1.12
diff -u -r1.12 D04parameter.ztst
--- Test/D04parameter.ztst	22 Aug 2005 11:43:36 -0000	1.12
+++ Test/D04parameter.ztst	11 Oct 2005 11:12:36 -0000
@@ -196,6 +196,20 @@
 >* boringfile evenmoreboringfile boringfile evenmoreboringfile
 >boringfile evenmoreboringfile
 
+# The following tests a bug where globsubst didn't preserve
+# backslashes when printing out the original string.
+  str1='\\*\\'
+  (
+  setopt globsubst nonomatch
+  [[ \\\\ = $str1 ]] && print -r '\\ matched by' $str1
+  [[ \\foo\\ = $str1 ]] && print -r '\\foo matched by' $str1
+  [[ a\\b\\ = $str1 ]] || print -r 'a\\b not matched by' $str1
+  )
+0:globsubst with backslashes
+>\\ matched by \\*\\
+>\\foo matched by \\*\\
+>a\\b not matched by \\*\\
+
   print -l "${$(print one word)}" "${=$(print two words)}"
 0:splitting of $(...) inside ${...}
 >one word
Index: Test/ztst.zsh
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/ztst.zsh,v
retrieving revision 1.22
diff -u -r1.22 ztst.zsh
--- Test/ztst.zsh	9 Aug 2005 06:51:40 -0000	1.22
+++ Test/ztst.zsh	11 Oct 2005 11:12:36 -0000
@@ -280,7 +280,7 @@
   diff_out=$(diff "$@")
   diff_ret="$?"
   if [[ "$diff_ret" != "0" ]]; then
-    echo "$diff_out"
+    print -r "$diff_out"
   fi
 
   return "$diff_ret"


-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


This message has been scanned for viruses by BlackSpider MailControl - www.blackspider.com


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug] backslash stripped in sh/ksh emulation
  2005-10-11 11:36 ` Peter Stephenson
  2005-10-11 11:43   ` Peter Stephenson
@ 2005-10-11 12:57   ` Stephane Chazelas
  2005-10-11 13:20     ` Peter Stephenson
  2005-10-11 14:43   ` Bart Schaefer
  2 siblings, 1 reply; 9+ messages in thread
From: Stephane Chazelas @ 2005-10-11 12:57 UTC (permalink / raw)
  To: Zsh hackers list

On Tue, Oct 11, 2005 at 12:36:24PM +0100, Peter Stephenson wrote:
> Stephane Chazelas <Stephane_Chazelas@yahoo.fr> wrote:
> > $ ARGV0=ksh zsh -xc 'a="\\*"; case $a in *\\*) echo a;; esac'
> > + a='\*'
> > + case * (*\*)
> > 
> > Can anyone explain it? It's OK if $a is quoted as in <case "$a">
> > 
> > After investigation, it appears it is triggered by globsubst.
> 
> You're basically pointing out that in:
> 
> a='\\*'
> print -r ${~a}
> 
> the output is
> 
> \*
[...]

Not really. I could be fine with that (though other shells
behave differently) as leaving a variable unquoted in list
context (as in arguments to a command above) is known not to be
reliable anyway.

My concern was more in non-list contexts.

In

emulate sh
var='\\'
newvar=$var

I expect newvar to be the exact copy of var.

-- 
Stéphane


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug] backslash stripped in sh/ksh emulation
  2005-10-11 12:57   ` Stephane Chazelas
@ 2005-10-11 13:20     ` Peter Stephenson
  2005-10-11 14:45       ` Stephane Chazelas
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Stephenson @ 2005-10-11 13:20 UTC (permalink / raw)
  To: Zsh hackers list

Stephane Chazelas wrote:
> In
> 
> emulate sh
> var='\\'
> newvar=$var
> 
> I expect newvar to be the exact copy of var.

Yes, that's basically what I was fixing.

pws


This message has been scanned for viruses by BlackSpider MailControl - www.blackspider.com


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug] backslash stripped in sh/ksh emulation
  2005-10-11 11:36 ` Peter Stephenson
  2005-10-11 11:43   ` Peter Stephenson
  2005-10-11 12:57   ` Stephane Chazelas
@ 2005-10-11 14:43   ` Bart Schaefer
  2005-10-11 15:06     ` Peter Stephenson
  2 siblings, 1 reply; 9+ messages in thread
From: Bart Schaefer @ 2005-10-11 14:43 UTC (permalink / raw)
  To: zsh-workers

On Oct 11, 12:36pm, Peter Stephenson wrote:
}
} So I've introduced a variant of Bnull, the ghost of a backslash, called
} Bnullkeep.

I was a bit skeptical of this when I read the description, but looking at
the patch I think it's actually OK.  This shouldn't change anything when
GLOB_SUBST is not set, right?

} Nonetheless, this doesn't give me a warm, fuzzy feeling.  If anyone can
} think of places where this might have the wrong effect...

subst.c and exec.c seem to be the only source files that mention both
INULL and GLOBSUBST, or call shtokenize() [exec.c only when GLOBSUBST
is set].  So on first blush I think you've covered it nicely.
 
} (It's just possible we could get away with only using Bnullkeep by suitable
} changes, simplifying the code a bit.)

Did you mean "only using Bnull"?  Otherwise I don't follow.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug] backslash stripped in sh/ksh emulation
  2005-10-11 13:20     ` Peter Stephenson
@ 2005-10-11 14:45       ` Stephane Chazelas
  0 siblings, 0 replies; 9+ messages in thread
From: Stephane Chazelas @ 2005-10-11 14:45 UTC (permalink / raw)
  To: Zsh hackers list

On Tue, Oct 11, 2005 at 02:20:25PM +0100, Peter Stephenson wrote:
> Stephane Chazelas wrote:
> > In
> > 
> > emulate sh
> > var='\\'
> > newvar=$var
> > 
> > I expect newvar to be the exact copy of var.
> 
> Yes, that's basically what I was fixing.
[...]

Thanks,

I'm always impressed by the time it takes (well it doesn't take,
should I say) zsh bugs to be fixed.

Best regards, and thanks again for zsh,
Stéphane


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug] backslash stripped in sh/ksh emulation
  2005-10-11 14:43   ` Bart Schaefer
@ 2005-10-11 15:06     ` Peter Stephenson
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Stephenson @ 2005-10-11 15:06 UTC (permalink / raw)
  To: zsh-workers

Bart Schaefer wrote:
> On Oct 11, 12:36pm, Peter Stephenson wrote:
> }
> } So I've introduced a variant of Bnull, the ghost of a backslash, called
> } Bnullkeep.
> 
> I was a bit skeptical of this when I read the description, but looking at
> the patch I think it's actually OK.  This shouldn't change anything when
> GLOB_SUBST is not set, right?

Yes, that's why I picked shtokenize(): it seems to be tied to
GLOB_SUBST.

> } (It's just possible we could get away with only using Bnullkeep by suitable
> } changes, simplifying the code a bit.)
> 
> Did you mean "only using Bnull"?  Otherwise I don't follow.

What I meant is that it may be possible to use only the new code,
so that all Bnull-style efforts are only removed right at the last
minute, including any which are currently Bnull.  However, I haven't
thought this through.

pws


This message has been scanned for viruses by BlackSpider MailControl - www.blackspider.com


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2005-10-11 15:06 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-10-11  8:38 [bug] backslash stripped in sh/ksh emulation Stephane Chazelas
2005-10-11  8:53 ` Stephane Chazelas
2005-10-11 11:36 ` Peter Stephenson
2005-10-11 11:43   ` Peter Stephenson
2005-10-11 12:57   ` Stephane Chazelas
2005-10-11 13:20     ` Peter Stephenson
2005-10-11 14:45       ` Stephane Chazelas
2005-10-11 14:43   ` Bart Schaefer
2005-10-11 15:06     ` Peter Stephenson

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).