zsh-workers
 help / color / mirror / code / Atom feed
* [PATCH] _arguments: Escape colons and backslashes in $opt_args unambiguously.
@ 2016-09-04 18:26 Daniel Shahaf
  2016-09-07  7:03 ` Bart Schaefer
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Shahaf @ 2016-09-04 18:26 UTC (permalink / raw)
  To: zsh-workers

---
tl;dr: the encoding of values into $opt_args is ambiguous.

For a longer description of the issue being fixed, see the README hunk
of the patch.  There's a README entry because this incompatibly affects
completion functions that use $opt_args[-x] where the argument to -x is
expected to contain backslashes (as in «foo -c 'hello\world'»).

I found this while looking into the _libvirt backslashed colons, see
39158 (I'm replying to it separately).

I verified this patch by hand, using:
.
    _f () {
        local context state state_descr line
        typeset -A opt_args
        _arguments : -c:foo:-\>state:bar:-\>state2
        typeset -p opt_args > /dev/tty
    }

Is it okay to refer to NEWS from the manual?

In the manual I wrote "See NEWS" even though the pointed-to text is in
README because (a) NEWS includes README by reference, (b) I doubted
readers would follow a reference to README in that context.

A good future improvement would be to teach _arguments to use NUL rather
than colon as the join character of $opt_args values: this would make it
a *lot* easier to split the value, and would relieve $opt_args users
that don't use the (somewhat obscure) option-with-multiple-arguments
syntax from having to manually unescape colons and backslashes.

Cheers,

Daniel

 Doc/Zsh/compsys.yo               |  6 ++++--
 README                           |  9 +++++++++
 Src/Zle/computil.c               | 12 +++++++++---
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/Doc/Zsh/compsys.yo b/Doc/Zsh/compsys.yo
index 8c7ef0f..b2cc392 100644
--- a/Doc/Zsh/compsys.yo
+++ b/Doc/Zsh/compsys.yo
@@ -3948,8 +3948,10 @@ command line after the command name excluding all options and their
 arguments.  Options are stored in the associative array
 `tt(opt_args)' with option names as keys and their arguments as
 the values.  For options that have more than one argument these are
-given as one string, separated by colons.  All colons in the original
-arguments are preceded with backslashes.
+given as one string, separated by colons.  All colons and backslashes
+in the original arguments are preceded with backslashes.  (Note:
+Zsh 5.2 and older did not escape backslashes in the original string;
+see the tt(NEWS) file for details.)
 
 The parameter `tt(context)' is set when returning to the calling function
 to perform an action of the form `tt(->)var(string)'.  It is set to an
diff --git a/README b/README
index 9de5eb4..019294e 100644
--- a/README
+++ b/README
@@ -87,6 +87,15 @@ The "f" qualifier has for many years been the documented way of testing
 file modes; it allows the "and" test ("*(f+1)" is the documented
 equivalent of "*(1)") as well as many other forms.
 
+5) The completion helper function _arguments now escapes both backslashes
+and colons in the values of option arguments when populating the $opt_args
+associative array.  Previously, colons were escaped with a backslash but
+backslashes were not themselves escaped with a backslash, which lead to
+ambiguity: if the -x option took two arguments (as in
+    _arguments : -x:foo:${action}:bar:$action
+), it would be impossible to tell from $opt_args whether the command-line
+was '-x foo\:bar' or '-x foo\\ bar'.
+
 Incompatibilities between 5.0.8 and 5.2
 ---------------------------------------
 
diff --git a/Src/Zle/computil.c b/Src/Zle/computil.c
index ecfa2bc..1c90a54 100644
--- a/Src/Zle/computil.c
+++ b/Src/Zle/computil.c
@@ -2310,7 +2310,10 @@ ca_parse_line(Cadef d, int multi, int first)
     return 0;
 }
 
-/* Build a colon-list from a list. */
+/* Build a colon-list from a list.
+ *
+ * This is only used to populate values of $opt_args.
+ */
 
 static char *
 ca_colonlist(LinkList l)
@@ -2320,16 +2323,19 @@ ca_colonlist(LinkList l)
 	int len = 0;
 	char *p, *ret, *q;
 
+	/* Compute the length to be allocated. */
 	for (n = firstnode(l); n; incnode(n)) {
 	    len++;
 	    for (p = (char *) getdata(n); *p; p++)
-		len += (*p == ':' ? 2 : 1);
+		len += (*p == ':' || *p == '\\') ? 2 : 1;
 	}
 	ret = q = (char *) zalloc(len);
 
+	/* Join L into RET, joining with colons and escaping colons and
+	 * backslashes. */
 	for (n = firstnode(l); n;) {
 	    for (p = (char *) getdata(n); *p; p++) {
-		if (*p == ':')
+		if (*p == ':' || *p == '\\')
 		    *q++ = '\\';
 		*q++ = *p;
 	    }


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] _arguments: Escape colons and backslashes in $opt_args unambiguously.
  2016-09-04 18:26 [PATCH] _arguments: Escape colons and backslashes in $opt_args unambiguously Daniel Shahaf
@ 2016-09-07  7:03 ` Bart Schaefer
  2016-09-07 22:07   ` Daniel Shahaf
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Schaefer @ 2016-09-07  7:03 UTC (permalink / raw)
  To: zsh-workers

On Sep 4,  6:26pm, Daniel Shahaf wrote:
}
} Is it okay to refer to NEWS from the manual?

Are there any other examples of the manual referring directly to the
behavior of previous versions?

It's not the kind of thing it's easy to search for, but I can't think
of any, so my inclination would be to drop your parenthetical Note:.

} In the manual I wrote "See NEWS" even though the pointed-to text is in
} README because (a) NEWS includes README by reference, (b) I doubted
} readers would follow a reference to README in that context.

I doubt readers will follow a NEWS reference either.  If this is
important enough to reference -- which I don't think it is -- then
it's important enough to say it here, not cross-reference somewhere
outside of the regular documentation tree, and certainly it shouldn't
require going to NEWS, finding nothing, and then having to realize
it might be in README.

} +ambiguity: if the -x option took two arguments (as in
} +    _arguments : -x:foo:${action}:bar:$action
} +), it would be impossible to tell from $opt_args whether the command-line
} +was '-x foo\:bar' or '-x foo\\ bar'.

Is this example correct?  Isn't the actual ambiguity between
[[[ -x foo\:bar ]]] (one arg) and [[[ -x foo bar ]]] (two args)?  If
I'm wrong, what is it about your explanation that confused me?

There's probably still an ambiguity between [[[ -x foo bar ]]] and
[[[ -x foo -x bar ]]] ...


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] _arguments: Escape colons and backslashes in $opt_args unambiguously.
  2016-09-07  7:03 ` Bart Schaefer
@ 2016-09-07 22:07   ` Daniel Shahaf
  2016-09-08  0:13     ` Bart Schaefer
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Shahaf @ 2016-09-07 22:07 UTC (permalink / raw)
  To: zsh-workers

Bart Schaefer wrote on Wed, Sep 07, 2016 at 00:03:20 -0700:
> On Sep 4,  6:26pm, Daniel Shahaf wrote:
> }
> } Is it okay to refer to NEWS from the manual?
> 
> Are there any other examples of the manual referring directly to the
> behavior of previous versions?

Yes: CONTINUE_ON_ERROR in zshoptions(1).  (And the usage examples under
is-at-least, but that doesn't count.)

> It's not the kind of thing it's easy to search for,

I found that with:
.
    ag -C 3 -G 'yo$' '[345][.][0-9]' Doc/Zsh
.
which is equivalent to:
.
    grep -C 3 '[345][.][0-9]' Doc/Zsh/**/*yo
.
where the regexp is designed to catch references to zsh version numbers
between 3.0 and the present day.  'ag' is usually packaged as
"silver_searcher" in distros.

> but I can't think of any, so my inclination would be to drop your
> parenthetical Note:.

> } In the manual I wrote "See NEWS" even though the pointed-to text is in
> } README because (a) NEWS includes README by reference, (b) I doubted
> } readers would follow a reference to README in that context.
> 
> I doubt readers will follow a NEWS reference either.  If this is
> important enough to reference -- which I don't think it is -- then
> it's important enough to say it here, not cross-reference somewhere
> outside of the regular documentation tree, and certainly it shouldn't
> require going to NEWS, finding nothing, and then having to realize
> it might be in README.
> 

I'm happy to drop the note entirely, or to drop the reference to NEWS
and keep the sentence stating the behaviour difference.

> } +ambiguity: if the -x option took two arguments (as in
> } +    _arguments : -x:foo:${action}:bar:$action
> } +), it would be impossible to tell from $opt_args whether the command-line
> } +was '-x foo\:bar' or '-x foo\\ bar'.
> 
> Is this example correct?  Isn't the actual ambiguity between
> [[[ -x foo\:bar ]]] (one arg) and [[[ -x foo bar ]]] (two args)?

The example is correct, and the two cases you give are not ambiguous
with each other:

% echo $ZSH_PATCHLEVEL                                                                                                                                       
zsh-5.2-dev-1-335-g831a336
% _f() { local context state state_descr line; typeset -A opt_args; _arguments : -x:foo:-\>state:bar:-\>state2;  >/dev/tty { echo; typeset -p opt_args } }
% compdef _f f
% f -x foo\:bar<TAB><^C>
typeset -A opt_args=( -x 'foo\\:bar' )
% f -x foo\\ bar<TAB><^C>
typeset -A opt_args=( -x 'foo\\:bar' )
% f -x foo bar<TAB><^C>
typeset -A opt_args=( -x foo:bar )

What's happening here is that (a) every colon in the word gets preceded
by a backslash, (b) colons are inserted between words.  No other
transformation happens.  Consequently:

$BUFFER    colons to be escaped?  colons inserted?   result in ${opt_args[-x]}  which character(s) were inserted
=======    =====================  ================   =========================  ================================
foo\:bar   yes                    no                 foo\\:bar                  second backslash
foo\\ bar  no                     yes                foo\\:bar                  colon
foo bar    no                     yes                foo:bar                    colon

Again, that's before this patch.  With this patch the first two cases
are subjected to backslash doubling at the same time as colon escaping,
so they result in «foo\\\:bar» [the first and third slashes were added]
and «foo\\\\:bar» [ditto] respectively.

> If I'm wrong, what is it about your explanation that confused me?

I don't know.

Perhaps you assumed words were joined with backslash-escaped colons.

Perhaps you misparsed the phrase «was '-x foo\:bar' or '-x foo\\ bar'».
There are three distinct levels of quoting at play here: interactive
input quoting, ca_colonlist() quoting, and the English prose quoting,
where single quotes were used for the function tt() provides in yodl
documents.

How might the README text be improved?

> There's probably still an ambiguity between [[[ -x foo bar ]]] and
> [[[ -x foo -x bar ]]] ...

This depends on the spec of -x.

If -x takes two mandatory arguments, then the first case should parse as
as [opt_args=(-x foo:bar)] and the second one as [opt_args=(-x foo:-x)
with 'bar' as a positional argument].

However, if -x is repeatable and takes either one or two arguments — as in
«_arguments : '*-x:foo:->foo::bar:->bar'» — then I'd say the
command-line in the second case is ambiguous.

Good catch.

Cheers,

Daniel


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] _arguments: Escape colons and backslashes in $opt_args unambiguously.
  2016-09-07 22:07   ` Daniel Shahaf
@ 2016-09-08  0:13     ` Bart Schaefer
  2016-09-08 11:01       ` Daniel Shahaf
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Schaefer @ 2016-09-08  0:13 UTC (permalink / raw)
  To: zsh-workers

On Sep 7, 10:07pm, Daniel Shahaf wrote:
} Subject: Re: [PATCH] _arguments: Escape colons and backslashes in $opt_arg
}
} Bart Schaefer wrote on Wed, Sep 07, 2016 at 00:03:20 -0700:
} > On Sep 4,  6:26pm, Daniel Shahaf wrote:
} > }
} > } Is it okay to refer to NEWS from the manual?
} > 
} > Are there any other examples of the manual referring directly to the
} > behavior of previous versions?
} 
} Yes: CONTINUE_ON_ERROR in zshoptions(1).  (And the usage examples under
} is-at-least, but that doesn't count.)
} 
} > It's not the kind of thing it's easy to search for,
} 
} I found that with:
} .
}     ag -C 3 -G 'yo$' '[345][.][0-9]' Doc/Zsh
} .

I suspect "ag" is the same thing I have installed as "ack".

  ack -C 1 -G .yo$ version Doc | grep -i /Zsh/.\*zsh

which should have found all mentions of "version" and "zsh" on the same
or adjacent lines.  Found the 5.0.1 reference, and a remark about "the
restriction in older versions of zsh" in params.yo discussion of export.

} I'm happy to drop the note entirely, or to drop the reference to NEWS
} and keep the sentence stating the behaviour difference.

CONINUE_ON_ERROR was a *lot* more significant change than this -- I'd
classify backslashing the backslashes as a bug fix, and we don't usually
call those out in the manual.

Anyway this has probably now had more thought than it was worth.

} > } +ambiguity: if the -x option took two arguments (as in
} > } +    _arguments : -x:foo:${action}:bar:$action
} > } +), it would be impossible to tell from $opt_args whether the command-line
} > } +was '-x foo\:bar' or '-x foo\\ bar'.
} > 
} > Is this example correct?  Isn't the actual ambiguity between
} > [[[ -x foo\:bar ]]] (one arg) and [[[ -x foo bar ]]] (two args)?
} 
} > If I'm wrong, what is it about your explanation that confused me?
} 
} I don't know.

Sometime after I'd sent that message, it occurred to me that the phrase
"two words where the first ends in a backslash cannot be distinguished
from one word with a backslashed colon in the middle" was what you were
getting at all along, and that indeed the example was correct.  Maybe
just write that out?


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] _arguments: Escape colons and backslashes in $opt_args unambiguously.
  2016-09-08  0:13     ` Bart Schaefer
@ 2016-09-08 11:01       ` Daniel Shahaf
  2016-09-08 15:49         ` Bart Schaefer
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Shahaf @ 2016-09-08 11:01 UTC (permalink / raw)
  To: zsh-workers

Bart Schaefer wrote on Wed, Sep 07, 2016 at 17:13:56 -0700:
> I suspect "ag" is the same thing I have installed as "ack".

"ag" and "ack" are distinct tools.

Applied both of your suggestions, and added a paragraph:

diff --git a/Doc/Zsh/compsys.yo b/Doc/Zsh/compsys.yo
index b2cc392..ecf17e7 100644
--- a/Doc/Zsh/compsys.yo
+++ b/Doc/Zsh/compsys.yo
@@ -3949,9 +3949,7 @@ arguments.  Options are stored in the associative array
 `tt(opt_args)' with option names as keys and their arguments as
 the values.  For options that have more than one argument these are
 given as one string, separated by colons.  All colons and backslashes
-in the original arguments are preceded with backslashes.  (Note:
-Zsh 5.2 and older did not escape backslashes in the original string;
-see the tt(NEWS) file for details.)
+in the original arguments are preceded with backslashes.
 
 The parameter `tt(context)' is set when returning to the calling function
 to perform an action of the form `tt(->)var(string)'.  It is set to an
diff --git a/README b/README
index 019294e..d92fb61 100644
--- a/README
+++ b/README
@@ -91,10 +91,15 @@ equivalent of "*(1)") as well as many other forms.
 and colons in the values of option arguments when populating the $opt_args
 associative array.  Previously, colons were escaped with a backslash but
 backslashes were not themselves escaped with a backslash, which lead to
-ambiguity: if the -x option took two arguments (as in
+ambiguity: '-x foo\:bar' (one argument with a backslashed colon) and
+'-x foo\\ bar' (two argumnets, and the first one ends in a backslash) would
+both set $opt_args[-x] to the same value.  This example assumes the -x
+option's spec declared two arguments, as in:
     _arguments : -x:foo:${action}:bar:$action
-), it would be impossible to tell from $opt_args whether the command-line
-was '-x foo\:bar' or '-x foo\\ bar'.
+
+For the more common case of non-repeatable options that take a single
+argument, completion functions now have to unescape not only colons but
+also backslashes when obtaining the option's argument from $opt_args.
 
 Incompatibilities between 5.0.8 and 5.2
 ---------------------------------------


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] _arguments: Escape colons and backslashes in $opt_args unambiguously.
  2016-09-08 11:01       ` Daniel Shahaf
@ 2016-09-08 15:49         ` Bart Schaefer
  0 siblings, 0 replies; 6+ messages in thread
From: Bart Schaefer @ 2016-09-08 15:49 UTC (permalink / raw)
  To: zsh-workers

On Sep 8, 11:01am, Daniel Shahaf wrote:
}
} Applied both of your suggestions, and added a paragraph:

Thanks.  Typo:

} +ambiguity: '-x foo\:bar' (one argument with a backslashed colon) and
} +'-x foo\\ bar' (two argumnets, and the first one ends in a backslash) would

"argumnets"


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-09-08 15:49 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-04 18:26 [PATCH] _arguments: Escape colons and backslashes in $opt_args unambiguously Daniel Shahaf
2016-09-07  7:03 ` Bart Schaefer
2016-09-07 22:07   ` Daniel Shahaf
2016-09-08  0:13     ` Bart Schaefer
2016-09-08 11:01       ` Daniel Shahaf
2016-09-08 15:49         ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).