zsh-users
* Suggestion:  Option to ignore unmatched quotes when (Q) parameter-expansion flag
@ 2022-04-13 21:12 dg1727
  2022-04-17 19:43 ` Daniel Shahaf
  2022-04-20  0:46 ` Bart Schaefer
  0 siblings, 2 replies; 4+ messages in thread
From: dg1727 @ 2022-04-13 21:12 UTC (permalink / raw)
  To: zsh-users

I have been in brief discussions about this on IRC, and was asked to inquire of the mailing list:


*** First I will describe my application, so that my suggestion for improvement of the shell doesn't seem contrived, and because I think this syntax situation is one that other shell users might benefit from learning about.

In my application, the user will be presented a list of strings that might contain any printable character that the user can reasonably generate from the keyboard.  These strings might *not* be any sort of valid zsh syntax.

I want the user to be able to type a shell pattern (as opposed to a regex), composed mentally by the user, that matches whichever subset of those strings the user wants to select.

I prefer that the user be able to use not only backslash quoting, but also other forms of quoting (double quotes, single quotes, dollar single-quoting) to disable the pattern-matching meaning of characters the user may type, such as [].

Suppose the user wants to enter *[abc]* in which both '*' are pattern-matching characters, but the user adds double quotes to indicate that the [] should be matched literally.  If variable ${u_input} contains the user's input, we start with something equivalent to the following (with the GLOB_SUBST option set, so that the expansion of ${u_input} is used as a pattern rather than as a literal string):

u_input='*"[abc]"*'
a_string='one[abc]two'
if [[ "${a_string}" == ${u_input} ]] then
  print "it matches"
fi

The first problem is that the [[ ]] test above does not match; instead it would match a string such as:
a_string='one"a"two'
... since [abc] is taken as a character class that matches one character, rather than as the 5-character literal string the user wants, and the " characters are taken literally rather than as quote characters.

If the double quotes are instead written directly into the pattern in the script:

a_string='one[abc]two'                    # Same string as above
if [[ "${a_string}" == *"[abc]"* ]] then  # Same pattern as in ${u_input} above!
  print "it matches"
fi

... Now it matches, which is what we wish would happen when a variable expansion ${...} is on the right-hand side of the == operator.

Even with shell option GLOB_SUBST enabled, the only quoting honoured when substituting the contents of a variable into a shell pattern is '\' backslash.
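A minimal zsh sketch of that claim, assuming GLOB_SUBST is switched on with setopt (the variable name pat is illustrative): the backslashes in the first pattern quote [ and ], whereas the double quotes in the second pattern are ordinary characters that must appear in the string being tested.

```shell
#!/usr/bin/env zsh
setopt globsubst   # characters coming from $pat may act as pattern characters

pat='*\[abc\]*'    # backslash quoting is honoured: [ and ] are literal
[[ 'one[abc]two' == $pat ]] && print "backslash is honoured"

pat='*"[abc]"*'    # double quotes have no quoting effect: " is matched literally
[[ 'one"a"two' == $pat ]] && print "double quotes are matched literally"
```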

The solution found so far is to use parameter-expansion flags, first (b), then (Q), as follows:

u_input='*"[abc]"*'  # Same shell pattern as above
u_input="${(Q)${(b)u_input}}"

# u_input is now *\[abc\]*
# ... the [] are quoted with \, the " are removed, the * are unquoted

a_string='one[abc]two'
if [[ "${a_string}" == ${u_input} ]] then
  print "it matches"  # Now it matches!
fi

How it works:

The (b) parameter-expansion flag uses backslashes to quote all the pattern-matching characters including those that were already quoted by the user.

(Q) removes the outermost level of quoting.  A backslash that sits inside "" in front of a character such as [, or anywhere inside '', is not quoting anything at that level, so (Q) leaves it in place.

Characters that were backslashed by (b) but NOT quoted by the user are unquoted again by (Q);

... characters that WERE quoted by the user have the "" or '' removed by (Q), but the backslashes that (b) added are left in place.

The technique also works when the user quotes with '\' instead of '' or "".


*** The current problem is the shell's handling of unbalanced quotation marks:

a_string='one[abc]"two'  # unbalanced doublequote

u_input='*[abc]"*'
# the user tries to match that doublequote and the [], all 3 literally

u_input="${(Q)${(b)u_input}}"

# u_input is now \*\[abc\]"\*
# the (Q) was skipped

If we add the (X) flag to make (QX) so the shell will print an error message, we find that the shell detects the unbalanced (").

The zshexpn documentation for X says "Without the [X] flag, errors are silently ignored."  It seems that, without (X), the unbalanced (") isn't 'ignored,' but rather causes the (Q) flag to fail entirely.


*** I looked at the source code a little bit, and I have the following suggestion, whose details are NOT terribly firm:

[I wrote these in the order in which the functions are called in C.  The outline of an idea for ignoring unbalanced quote marks is listed under gettokstr().]

-- subst.c, paramsubst(), starting ~line 1619:
~line 2028 decrements variable 'quotemod' if (Q) flag
So we could state in the documentation that issuing (Q) twice, in other words (QQ), will activate the suggested error-handling mode.

~line 3807 calls parse_subst_string() for arrays
~line 3847 calls parse_subst_string() for scalars
Those 2 blocks of code seem like good places to check for variable 'quotemod' < -1

-- lex.c, parse_subst_string(), starting ~line 1728:
  Add a parameter to this function:  ignore_unbalanced_quotes
Back in paramsubst() in subst.c, if quotemod < -1, then pass 2 to parse_subst_string() as ignore_unbalanced_quotes
  (All other calls to parse_subst_string() will pass 0 for ignore_unbalanced_quotes)
In parse_subst_string() in lex.c again:
~line 1744 currently calls gettokstr(, sub=1)
  If ignore_unbalanced_quotes != 0 then call gettokstr(, sub=ignore_unbalanced_quotes)

-- lex.c, gettokstr(), starting ~line 937:
~line 1314, case LX2_QUOTE, detects unbalanced (')
  if sub == 2 then:
    change the Snull that was added ~line 1284 to be a (')
~line 1335, case LX2_DQUOTE, detects unbalanced (")
  if sub == 2 then:
    instead of adding a Dnull (~line 1333), add a (")
~line 1377, case LX2_BQUOTE, detects unbalanced (`)
  if sub == 2 then:
    instead of adding a Tick (~line 1347), add a (`)
Using literal (')(") instead of Snull,Dnull will keep Snull,Dnull from being removed by remnulargs() or altered by untokenize()
Using literal (`) instead of Tick will keep Tick from being altered by untokenize() [Tick wouldn't be removed by remnulargs() ]

I've not found how variable 'quoteerr' in paramsubst() in subst.c [which records the (X) flag, and so determines how errors such as unbalanced quotes are handled] is propagated through parse_subst_string() in lex.c to gettokstr() in lex.c; perhaps it is by means of the 'lexflags' variable.  The error message containing the word 'unmatched' is passed to zerr() in gettokstr(), which suggests that the effect of 'quoteerr' does reach gettokstr() somehow.  It might be suitable to propagate a (QQ) flag by the same means, since flags are parsed in paramsubst() but the different error handling that we want lives in gettokstr().

I suggest that QQX should act like QX:  If quoteerr == 1, then the existing code in gettokstr() that calls zerr() would still run, even if quotemod == -2.  This is so a script author can simply change (QQ) to (QQX) to get diagnostic output.

Manpage zshexpn, section Parameter Expansion Flags:
  change last sentence of description of X to:
  Without the flag, errors silently cause the current processing step to be skipped.

Add to the description of Q flag:
  Handling of unbalanced quotes depends on whether the X flag is present.
  Also, if the Q flag is given twice and the X flag is *not* given, then unbalanced quotation marks are silently ignored; other forms of quoting are still removed.  (For example, if a string contains an unbalanced double-quote but the outermost level of quoting within the string includes balanced single-quotes, then the single-quotes will be removed.)

I don't think I'm up to making these changes myself, but I'd appreciate feedback that might result in a solution.

Many thanks for your time.

-dg1727



* Re: Suggestion:  Option to ignore unmatched quotes when (Q) parameter-expansion flag
  2022-04-13 21:12 Suggestion: Option to ignore unmatched quotes when (Q) parameter-expansion flag dg1727
@ 2022-04-17 19:43 ` Daniel Shahaf
  2022-04-20  0:46 ` Bart Schaefer
  1 sibling, 0 replies; 4+ messages in thread
From: Daniel Shahaf @ 2022-04-17 19:43 UTC (permalink / raw)
  To: dg1727; +Cc: zsh-users

dg1727 wrote on Wed, Apr 13, 2022 at 21:12:13 +0000:
> If variable ${u_input} contains the user's input, we start with
> something equivalent to:
> a_string='one[abc]two'
> if [[ "${a_string}" == ${u_input} ]] then
>   print "it matches"  # Now it matches!
> fi
> *** The current problem is the shell's handling of unbalanced
> quotation marks:
> 
> a_string='one[abc]"two'  # unbalanced doublequote
> 
> u_input='*[abc]"*'
> # the user tries to match that doublequote and the [], all 3 literally
> 
> u_input="${(Q)${(b)u_input}}"
> 
> # u_input is now \*\[abc\]"\*
> # the (Q) was skipped
> 
> If we add the (X) flag to make (QX) so the shell will print an error
> message, we find that the shell detects the unbalanced (").
> 
> The zshexpn documentation for X says "Without the [X] flag, errors are
> silently ignored."  It seems that, without (X), the unbalanced (")
> isn't 'ignored,' but rather causes the (Q) flag to fail entirely.
> 
> 
> *** I looked at the source code a little bit, and I have the following
> suggestion, whose details are NOT terribly firm:
> Manpage zshexpn, section Parameter Expansion Flags:
>   change last sentence of description of X to:
>   Without the flag, errors silently cause the current processing step to be skipped.
> 
> Add to the description of Q flag:
>   Handling of unbalanced quotes depends on whether the X flag is
>   present.
>   Also, if the Q flag is given twice and the X flag is *not* given,
>   then unbalanced quotation marks are silently ignored; other forms of
>   quoting are still removed.  (For example, if a string contains an
>   unbalanced double-quote but the outermost level of quoting within
>   the string includes balanced single-quotes, then the single-quotes
>   will be removed.)

A couple of points:

- Wrap your lines to 80 columns.  It's hard to read otherwise.

- It's premature to go to this level of implementation details at this
  point in the discussion.  More precisely, it can cause tunnel vision.
  We should consider all possible implementations of the proposed
  functionality, and all possible functionalities that may address the
  original use-case.

- The best way to show proposed changes is by posting a unidiff (the
  output of `git diff`, attached in a file named *.txt).  That's true
  even if the changes are alpha quality or even not meant to be applied
  at all.

- I don't know that it makes sense to tie your program's glob syntax to
  zsh's input syntax (as opposed to zsh's pattern syntax, which is what
  plain «[[ $foo == $bar ]]» does).  The fact that it's hard to
  implement is a clue in and of itself.  You're essentially inventing
  your own string matching library here.

- Now, if you actually wanted to use zsh's _input_ syntax, the obvious
  idea would be to use «eval» — but it's not trivial to implement that
  in a way that doesn't risk bobby tables bugs (e.g., if the pattern
  is «& pwd»).

- Couldn't you simply append a «"» character to the user input and then
  use zsh unmodified?  (You'd have to try three times: with no append,
  with an appended «"», with an appended «'».)

Cheers,

Daniel

> Many thanks for your time.
> 
> -dg1727
> 



* Re: Suggestion: Option to ignore unmatched quotes when (Q) parameter-expansion flag
  2022-04-13 21:12 Suggestion: Option to ignore unmatched quotes when (Q) parameter-expansion flag dg1727
  2022-04-17 19:43 ` Daniel Shahaf
@ 2022-04-20  0:46 ` Bart Schaefer
  2022-04-20  1:28   ` Bart Schaefer
  1 sibling, 1 reply; 4+ messages in thread
From: Bart Schaefer @ 2022-04-20  0:46 UTC (permalink / raw)
  To: dg1727; +Cc: zsh-users

On Wed, Apr 13, 2022 at 7:01 PM dg1727 <dg1727@protonmail.com> wrote:
>
> I prefer that the user be able to use not only backslash quoting, but also other forms of quoting (double quotes, single quotes, dollar single-quoting) to disable the pattern-matching meaning of characters the user may type, such as [].

The problem, as Daniel already touched upon, is that you're trying to
enable a limited form of input syntax rather than using pattern
syntax.  The parameter flags for handling patterns and quoting are not
designed for that.

Jumping ahead a bit ...

> The zshexpn documentation for X says "Without the [X] flag, errors are silently ignored."  It seems that, without (X), the unbalanced (") isn't 'ignored,' but rather causes the (Q) flag to fail entirely.

It's the error that is ignored, not the value that produced the error.
That is, yes, the Q flag failed, because it could not remove quoting,
but that didn't cause the surrounding command context to
perceive/report an error state.

A possible way to detect this is to use a test something like

  [[ "${var}" == "${(Q)var}" ]]

which is true only if (Q) did nothing.

> Even with shell option GLOB_SUBST enabled, the only quoting honoured when substituting the contents of a variable into a shell pattern is '\' backslash.

Yes, because that's how patterns are defined.  You're trying to
translate between a different syntax and pattern syntax ... so what
you first need is a parser for your other syntax.  Unmatched quotes
are probably the least of your problems.

> u_input='*"[abc]"*'
> a_string='one[abc]two'

Let's think about what constitutes "a parser for your other syntax".
The [[ ]] operator fits the bill, but it has to parse the contents of
$u_input as the expression, rather than first parsing the expression
and then expanding $u_input.

Fortunately zsh has a trick up its sleeve:  You can create and modify
function definitions by assignment to fields in the $functions special
parameter.  Thus something like this:

zmodload zsh/parameter
functions[funkymatcher]='[[ $1 == '"${u_input}"' ]]'

This will even throw parse errors on most "Bobby Tables" inputs,
although if you want to prevent $(command) substitutions (and
backticks) you'll need to figure that out yourself.  Anyway, with that
you can now call

funkymatcher "${a_string}"

and it will return 0 for a match and nonzero otherwise in exactly the
way you want.



* Re: Suggestion: Option to ignore unmatched quotes when (Q) parameter-expansion flag
  2022-04-20  0:46 ` Bart Schaefer
@ 2022-04-20  1:28   ` Bart Schaefer
  0 siblings, 0 replies; 4+ messages in thread
From: Bart Schaefer @ 2022-04-20  1:28 UTC (permalink / raw)
  To: zsh-users

On Tue, Apr 19, 2022 at 5:46 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> zmodload zsh/parameter
> functions[funkymatcher]='[[ $1 == '"${u_input}"' ]]'
>
> This will even throw parse errors on most "Bobby Tables" inputs

I was pondering over dinner how to make this more robust ... it
occurred to me that

funkybody='[[ $1 == '"${u_input}"' ]]'
if [[ ${#${(z)funkybody}} -ne 5 ]]
then return 1
else functions[funkymatcher]=$funkybody
fi

should handle it?

> although if you want to prevent $(command) substitutions (and
> backticks) you'll need to figure that out yourself.

That continues to be left as an exercise.



Code repositories for project(s) associated with this inbox:

	https://git.vuxu.org/mirror/zsh/
