zsh-workers
 help / color / mirror / code / Atom feed
* PATCH: improve ${(q)...}
@ 2009-04-28 16:46 Peter Stephenson
  2009-04-28 22:37 ` Bart Schaefer
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Stephenson @ 2009-04-28 16:46 UTC (permalink / raw)
  To: Zsh hackers list

It's annoyed me for a while now that single-quoting a value uses
single quotes even where they're not needed, and with existing single
quotes in the value at the start or end you get something quite messy.

This fixes the problem.  Is there a good reason for making this a
different flag, or is altering (q) good enough?  I couldn't think of a
reason why you would need it to be verbose, i.e. why you need

% v=foo
% print ${(qq)v}
'foo'

rather than just

foo

although I can think of arguments the other way.  We've never made any
guarantee over the final form of the quoting, just that it protects it
in the way the shell expects.

Note that the new form takes care that you still get the quotes at the
start and end if quotes are necessary (up to subtleties with embedded
single quotes which are also supposed to be gracefully handled).
This could be optimised in the simple case (no \' at the start), but
that can wait.

I won't commit this before 4.3.10.  New tests needed, too.

Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.105
diff -u -r1.105 expn.yo
--- Doc/Zsh/expn.yo	23 Mar 2009 12:17:33 -0000	1.105
+++ Doc/Zsh/expn.yo	28 Apr 2009 16:21:12 -0000
@@ -830,9 +830,13 @@
 quotes for each octet.  If this flag is given
 twice, the resulting words are quoted in single quotes and if it is
 given three times, the words are quoted in double quotes; in these forms
-no special handling of unprintable or invalid characters is attempted.  If
-the flag is given four times, the words are quoted in single quotes
+no special handling of unprintable or invalid characters is attempted.
+If the flag is given four times, the words are quoted in single quotes
 preceded by a tt($).
+
+If the context does not require a particular form of quoting, use
+of single quotes (tt(qq)) is recommended as this takes care to
+emit the minimum number of quotes required.
 )
 item(tt(Q))(
 Remove one level of quotes from the resulting words.
Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.97
diff -u -r1.97 subst.c
--- Src/subst.c	23 Mar 2009 12:17:33 -0000	1.97
+++ Src/subst.c	28 Apr 2009 16:21:12 -0000
@@ -2800,8 +2800,29 @@
      * the repetitions of the (q) flag.
      */
     if (quotemod) {
+	int pre = 0, post = 0;
+
 	if (quotetype > QT_DOLLARS)
 	    quotetype = QT_DOLLARS;
+	if (quotemod > 0 && quotetype > QT_BACKSLASH) {
+	    switch (quotetype)
+	    {
+	    case QT_DOLLARS:
+		/* space for "$" */
+		pre = 2;
+		post = 1;
+		break;
+
+	    case QT_SINGLE:
+		quotetype = QT_SINGLE_OPTIONAL;
+		/* quotes will be added for us */
+		break;
+
+	    default:
+		pre = post = 1;
+		break;
+	    }
+	}
 	if (isarr) {
 	    char **ap;
 
@@ -2815,13 +2836,13 @@
 		    char *tmp;
 
 		    for (; *ap; ap++) {
-			int pre = quotetype != QT_DOLLARS ? 1 : 2;
 			tmp = quotestring(*ap, NULL, quotetype);
 			sl = strlen(tmp);
-			*ap = (char *) zhalloc(pre + sl + 2);
+			*ap = (char *) zhalloc(pre + sl + post + 1);
 			strcpy((*ap) + pre, tmp);
-			ap[0][pre - 1] = ap[0][pre + sl] =
-			    (quotetype != QT_DOUBLE ? '\'' : '"');
+			if (pre)
+			    ap[0][pre - 1] = ap[0][pre + sl] =
+				(quotetype != QT_DOUBLE ? '\'' : '"');
 			ap[0][pre + sl + 1] = '\0';
 			if (quotetype == QT_DOLLARS)
 			  ap[0][0] = '$';
@@ -2852,15 +2873,15 @@
 		val = dupstring(val), copied = 1;
 	    if (quotemod > 0) {
 		if (quotetype > QT_BACKSLASH) {
-		    int pre = quotetype != QT_DOLLARS ? 1 : 2;
 		    int sl;
 		    char *tmp;
 		    tmp = quotestring(val, NULL, quotetype);
 		    sl = strlen(tmp);
 		    val = (char *) zhalloc(pre + sl + 2);
 		    strcpy(val + pre, tmp);
-		    val[pre - 1] = val[pre + sl] =
-			(quotetype != QT_DOUBLE ? '\'' : '"');
+		    if (pre)
+			val[pre - 1] = val[pre + sl] =
+			    (quotetype != QT_DOUBLE ? '\'' : '"');
 		    val[pre + sl + 1] = '\0';
 		    if (quotetype == QT_DOLLARS)
 		      val[0] = '$';
Index: Src/utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
retrieving revision 1.221
diff -u -r1.221 utils.c
--- Src/utils.c	6 Apr 2009 09:06:36 -0000	1.221
+++ Src/utils.c	28 Apr 2009 16:21:12 -0000
@@ -4401,6 +4401,11 @@
  * 
  * The last argument is a QT_ value defined in zsh.h other than QT_NONE.
  *
+ * Most quote styles other than backslash assume the quotes are to
+ * be added outside quotestring().  QT_SINGLE_OPTIONAL is different:
+ * the single quotes are only added where necessary, so the
+ * whole expression is handled here.
+ *
  * The string may be metafied and contain tokens.
  */
 
@@ -4410,20 +4415,50 @@
 {
     const char *u, *tt;
     char *v;
+    int alloclen;
+    char *buf;
+    int sf = 0;
     /*
-     * With QT_BACKSLASH we may need to use $'\300' stuff.
-     * Keep memory usage within limits by allocating temporary
-     * storage and using heap for correct size at end.
+     * quotesub is used with QT_SINGLE_OPTIONAL.
+     * quotesub = 0:  mechanism not active
+     * quotesub = 1:  mechanism pending, no "'" yet;
+     *                needs adding at quotestart.
+     * quotesub = 2:  mechanism active, added opening "'"; need
+     *                closing "'".
      */
-    int alloclen = (instring == QT_BACKSLASH ? 7 : 4) * strlen(s) + 1;
-    char *buf = zshcalloc(alloclen);
-    int sf = 0;
+    int quotesub = 0;
+    char *quotestart;
     convchar_t cc;
     const char *uend;
 
-    DPUTS(instring < QT_BACKSLASH || instring > QT_DOLLARS,
+    switch (instring)
+    {
+    case QT_BACKSLASH:
+	/*
+	 * With QT_BACKSLASH we may need to use $'\300' stuff.
+	 * Keep memory usage within limits by allocating temporary
+	 * storage and using heap for correct size at end.
+	 */
+	alloclen = strlen(s) * 7 + 1;
+	break;
+
+    case QT_SINGLE_OPTIONAL:
+	/*
+	 * Here, we may need to add single quotes.
+	 */
+	alloclen = strlen(s) * 4 + 3;
+	quotesub = 1;
+	break;
+
+    default:
+	alloclen = strlen(s) * 4 + 1;
+	break;
+    }
+    tt = quotestart = v = buf = zshcalloc(alloclen);
+
+    DPUTS(instring < QT_BACKSLASH || instring == QT_BACKTICK ||
+	  instring > QT_SINGLE_OPTIONAL,
 	  "BUG: bad quote type in quotestring");
-    tt = v = buf;
     u = s;
     if (instring == QT_DOLLARS) {
 	/*
@@ -4519,12 +4554,64 @@
 		       (u[-1] == '=' || u[-1] == ':')) ||
 		      (*u == '~' && isset(EXTENDEDGLOB))) &&
 		     (instring == QT_BACKSLASH ||
+		      instring == QT_SINGLE_OPTIONAL ||
 		      (isset(BANGHIST) && *u == (char)bangchar &&
 		       instring != QT_SINGLE) ||
 		      (instring == QT_DOUBLE &&
 		       (*u == '$' || *u == '`' || *u == '\"' || *u == '\\')) ||
 		      (instring == QT_SINGLE && *u == '\''))) {
-		if (*u == '\n' || (instring == QT_SINGLE && *u == '\'')) {
+		if (instring == QT_SINGLE_OPTIONAL) {
+		    if (quotesub == 1) {
+			/*
+			 * We haven't yet had to quote at the start.
+			 */
+			if (*u == '\'') {
+			    /*
+			     * We don't need to.
+			     */
+			    *v++ = '\\';
+			} else {
+			    /*
+			     * It's now time to add quotes.
+			     */
+			    if (v > quotestart)
+			    {
+				char *addq;
+
+				for (addq = v; addq > quotestart; addq--)
+				    *addq = addq[-1];
+			    }
+			    *quotestart = '\'';
+			    v++;
+			    quotesub = 2;
+			}
+			*v++ = *u++;
+			/*
+			 * Next place to start quotes is here.
+			 */
+			quotestart = v;
+		    } else if (*u == '\'') {
+			if (unset(RCQUOTES)) {
+			    *v++ = '\'';
+			    *v++ = '\\';
+			    *v++ = '\'';
+			    /* Don't restart quotes unless we need them */
+			    quotesub = 1;
+			    quotestart = v;
+			} else {
+			    /* simplest just to use '' always */
+			    *v++ = '\'';
+			    *v++ = '\'';
+			}
+			/* dealt with */
+			u++;
+		    } else {
+			/* else already quoting, just add */
+			*v++ = *u++;
+		    }
+		    continue;
+		} else if (*u == '\n' ||
+			   (instring == QT_SINGLE && *u == '\'')) {
 		    if (unset(RCQUOTES)) {
 			*v++ = '\'';
 			if (*u == '\'')
@@ -4582,6 +4669,8 @@
 	    }
 	}
     }
+    if (quotesub == 2)
+	*v++ = '\'';
     *v = '\0';
 
     if (e && *e == u)
Index: Src/zsh.h
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/zsh.h,v
retrieving revision 1.155
diff -u -r1.155 zsh.h
--- Src/zsh.h	24 Mar 2009 12:52:08 -0000	1.155
+++ Src/zsh.h	28 Apr 2009 16:21:13 -0000
@@ -210,8 +210,15 @@
      * in those cases where we need to represent a complete set.
      */
     QT_BACKTICK,
+    /*
+     * Single quotes, but the default is not to quote unless necessary.
+     * This is only useful as an argument to quotestring().
+     */
+    QT_SINGLE_OPTIONAL
 };
 
+#define QT_IS_SINGLE(x)	((x) == QT_SINGLE || (x) == QT_SINGLE_OPTIONAL)
+
 /*
  * Lexical tokens: unlike the character tokens above, these never
  * appear in strings and don't necessarily represent a single character.
Index: Test/D04parameter.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D04parameter.ztst,v
retrieving revision 1.36
diff -u -r1.36 D04parameter.ztst
--- Test/D04parameter.ztst	9 Oct 2008 13:46:46 -0000	1.36
+++ Test/D04parameter.ztst	28 Apr 2009 16:21:13 -0000
@@ -321,7 +321,7 @@
   print -r ${(qqqq)foo}
 0:${(q...)...}
 >playing\ \'stupid\'\ \"games\"\ \\w\\i\\t\\h\ \$quoting.
->'playing '\''stupid'\'' "games" \w\i\t\h $quoting.'
+>'playing '\'stupid\'' "games" \w\i\t\h $quoting.'
 >"playing 'stupid' \"games\" \\w\\i\\t\\h \$quoting."
 >$'playing \'stupid\' "games" \\w\\i\\t\\h $quoting.'
 



-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PATCH: improve ${(q)...}
  2009-04-28 16:46 PATCH: improve ${(q)...} Peter Stephenson
@ 2009-04-28 22:37 ` Bart Schaefer
  2009-04-29  8:51   ` Peter Stephenson
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Schaefer @ 2009-04-28 22:37 UTC (permalink / raw)
  To: Zsh hackers list

On Apr 28,  5:46pm, Peter Stephenson wrote:
}
} It's annoyed me for a while now that single-quoting a value uses
} single quotes even where they're not needed, and with existing single
} quotes in the value at the start or end you get something quite messy.

Hmm.

} This fixes the problem.  Is there a good reason for making this a
} different flag, or is altering (q) good enough?  I couldn't think of a
} reason why you would need it to be verbose

The proposed patch changes the behavior of nested (q) options rather
significantly.  E.g., ${(qqq)${(qq)v}} becomes very different.  This
could be important if the resulting string is going to be processed
with "eval".

Also keep in mind what happens when strings are catenated by adjacency.

x=\$foo
y=bar
foo=this
foobar=that
eval print ${x}${(qq)y}

Would you really expect "that" rather than "thisbar"?  (Admittedly an
edge case.)

Although it's true that we never "guaranteed" the quoting behavior,
it's implicit in the doc:

       ... If this flag is given twice, the resulting words
       are quoted in single quotes and if it is given three times,
       the words are quoted in double quotes ...

Note it *doesn't* say "as if in <mumble> quotes".  Wrapping the entire
string is also consistent with (qqqq), which is distinct from (q) which
applies $'...' only where needed.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PATCH: improve ${(q)...}
  2009-04-28 22:37 ` Bart Schaefer
@ 2009-04-29  8:51   ` Peter Stephenson
  2009-04-29 16:35     ` Bart Schaefer
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Stephenson @ 2009-04-29  8:51 UTC (permalink / raw)
  To: Zsh hackers list

Bart Schaefer wrote:
> } This fixes the problem.  Is there a good reason for making this a
> } different flag, or is altering (q) good enough?  I couldn't think of a
> } reason why you would need it to be verbose
> 
> The proposed patch changes the behavior of nested (q) options rather
> significantly.  E.g., ${(qqq)${(qq)v}} becomes very different.  This
> could be important if the resulting string is going to be processed
> with "eval".

I don't follow that.  The quotation is present if the inner quotation
would cause the word to behave differently when unquoted.  Stripping
multiple levels of quotation should therefore still work.  Or are you
thinking of things like the example below?

> Also keep in mind what happens when strings are catenated by adjacency.
> 
> x=\$foo
> y=bar
> foo=this
> foobar=that
> eval print ${x}${(qq)y}
> 
> Would you really expect "that" rather than "thisbar"?  (Admittedly an
> edge case.)

I won't rely on my expectation here---it's different with backslash
quoting anyway.  However, you've certainly spotted a case I hadn't where
this changes the behaviour.  I can look for different letters...
Hmm, we could use qQ which is currently meaningless as a sort of
shorthand for "minimal quoting"...?

> Although it's true that we never "guaranteed" the quoting behavior,
> it's implicit in the doc:
> 
>        ... If this flag is given twice, the resulting words
>        are quoted in single quotes and if it is given three times,
>        the words are quoted in double quotes ...
> 
> Note it *doesn't* say "as if in <mumble> quotes".

OK, I see you can claim the annoying inconsistency with backslash is a
feature.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PATCH: improve ${(q)...}
  2009-04-29  8:51   ` Peter Stephenson
@ 2009-04-29 16:35     ` Bart Schaefer
  2009-04-29 16:42       ` Peter Stephenson
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Schaefer @ 2009-04-29 16:35 UTC (permalink / raw)
  To: Zsh hackers list

On Apr 29,  9:51am, Peter Stephenson wrote:
}
} > The proposed patch changes the behavior of nested (q) options rather
} > significantly.  E.g., ${(qqq)${(qq)v}} becomes very different.  This
} > could be important if the resulting string is going to be processed
} > with "eval".
} 
} I don't follow that.  The quotation is present if the inner quotation
} would cause the word to behave differently when unquoted.  Stripping
} multiple levels of quotation should therefore still work.  Or are you
} thinking of things like the example below?

I'm not necessarily thinking of any specific example.  With the current
code, ${(qqq)${(qq)v}} produces "'foo'" which still has single quotes
after one level of quotes is stripped.  If that's combined with another
string that also contains single quotes, whether by directly adjacent
concatenation or by interpolation inside a double-quoted string, etc.,
those quotes might mean something.
 
} I won't rely on my expectation here---it's different with backslash
} quoting anyway.  However, you've certainly spotted a case I hadn't where
} this changes the behaviour.  I can look for different letters...
} Hmm, we could use qQ which is currently meaningless as a sort of
} shorthand for "minimal quoting"...?

What does (qqQQ) mean, then?  Or (qQq)?  It'd probably be better to pick
something that currently isn't even valid syntax, like (q-) ... I was
all set to suggest (q1) (q2) etc. but (q0) already means "split at NULL
bytes and then quote".  However, (q-) raises the question of what (-q)
means, and whether (-) by itself means anything; the only character that
is currently "attached" to a preceding flag is ":" and (q:EXPR:) seems
like overkill.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PATCH: improve ${(q)...}
  2009-04-29 16:35     ` Bart Schaefer
@ 2009-04-29 16:42       ` Peter Stephenson
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Stephenson @ 2009-04-29 16:42 UTC (permalink / raw)
  To: Zsh hackers list

Bart Schaefer wrote:
> On Apr 29,  9:51am, Peter Stephenson wrote:
> I'm not necessarily thinking of any specific example.  With the current
> code, ${(qqq)${(qq)v}} produces "'foo'" which still has single quotes
> after one level of quotes is stripped.  If that's combined with another
> string that also contains single quotes, whether by directly adjacent
> concatenation or by interpolation inside a double-quoted string, etc.,
> those quotes might mean something.

I still can't see how this can have an effect, but I think it's moot
anyway.

> } I won't rely on my expectation here---it's different with backslash
> } quoting anyway.  However, you've certainly spotted a case I hadn't where
> } this changes the behaviour.  I can look for different letters...
> } Hmm, we could use qQ which is currently meaningless as a sort of
> } shorthand for "minimal quoting"...?
> 
> What does (qqQQ) mean, then?  Or (qQq)?  It'd probably be better to pick
> something that currently isn't even valid syntax, like (q-) ... I was
> all set to suggest (q1) (q2) etc. but (q0) already means "split at NULL
> bytes and then quote".  However, (q-) raises the question of what (-q)
> means, and whether (-) by itself means anything; the only character that
> is currently "attached" to a preceding flag is ":" and (q:EXPR:) seems
> like overkill.

(q-) seems harmless and - on its own can be a syntax error as at
present.  I can't see any problem with that.  All this means as a
precedent is it would probably be better never to add - as a flag on its
own, which I can live with.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2009-04-29 16:42 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-28 16:46 PATCH: improve ${(q)...} Peter Stephenson
2009-04-28 22:37 ` Bart Schaefer
2009-04-29  8:51   ` Peter Stephenson
2009-04-29 16:35     ` Bart Schaefer
2009-04-29 16:42       ` Peter Stephenson

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).