Date: Wed, 13 Apr 2022 21:12:13 +0000
To: zsh-users@zsh.org
From: dg1727
Subject: Suggestion: Option to ignore unmatched quotes when (Q) parameter-expansion flag

I have been in brief discussions about this on IRC, and was asked to inquire of the mailing list:

*** First I will describe my application, so that my suggestion for improvement of the shell doesn't seem contrived, and because I think this syntax situation is one that other shell users might benefit from learning about.

In my application, the user will be presented a list of strings that might contain any printable character that the user can reasonably generate from the keyboard. These strings might *not* be any sort of valid zsh syntax.

I want the user to be able to type a shell pattern (as opposed to a regex), composed mentally by the user, that matches any group of those strings.

I prefer that the user be able to use not only backslash quoting, but also other forms of quoting (double quotes, single quotes, dollar single-quoting) to disable the pattern-matching meaning of characters the user may type, such as [].

Suppose the user wants to enter *[abc]* in which both of the '*' are pattern-matching characters; but the user will give double-quotes to indicate that the [] should be matched literally.
If variable ${u_input} contains the user's input, we start with something equivalent to:

    u_input='*"[abc]"*'
    a_string='one[abc]two'
    if [[ "${a_string}" == ${u_input} ]]
    then
      print "it matches"
    fi

The first problem was that the above [[ ]] pattern-matching statement didn't match, but instead would match something such as:

    a_string='one"a"two'

... since the [abc] is taken as a character class that matches 1 character, rather than taken as the 5-character literal string that the user wants; and the "" are taken literally rather than as being quote characters.

If the double-quotes are put into the pattern comparison literally:

    a_string='one[abc]two'   # Same string as above
    if [[ "${a_string}" == *"[abc]"* ]]
    then   # Same pattern as in ${u_input} above!
      print "it matches"
    fi

... Now it matches, which is what we wish would happen when a variable expansion ${...} is on the right-hand side of the == operator.

Even with shell option GLOB_SUBST enabled, the only quoting honoured when substituting the contents of a variable into a shell pattern is '\' backslash.

The solution found so far is to use parameter-expansion flags, first (b), then (Q), as follows:

    u_input='*"[abc]"*'   # Same shell pattern as above
    u_input="${(Q)${(b)u_input}}"
    # u_input is now *\[abc\]*
    # ... the [] are quoted with \, the " are removed, the * are unquoted
    a_string='one[abc]two'
    if [[ "${a_string}" == ${u_input} ]]
    then
      print "it matches"   # Now it matches!
    fi

How it works:

The (b) parameter-expansion flag uses backslashes to quote all the pattern-matching characters including those that were already quoted by the user.

The definition of (Q) is that (Q) removes the outermost level of quotes; backslash is always treated as being inside "" or ''.

Chars that were backslashed by (b) and NOT quoted by the user get unquoted by (Q);
... chars that WERE quoted by the user have the "" or '' removed by (Q) but the backslashes [that were added by (b)] left in place.

This technique works if the user gave '\' as well as '' "".

*** The current problem is the shell's handling of unbalanced quotation marks:

    a_string='one[abc]"two'   # unbalanced doublequote
    u_input='*[abc]"*'
    # the user tries to match that doublequote and the [], all 3 literally
    u_input="${(Q)${(b)u_input}}"
    # u_input is now \*\[abc\]"\*
    # the (Q) was skipped

If we add the (X) flag to make (QX) so the shell will print an error message, we find that the shell detects the unbalanced (").

The zshexpn documentation for X says "Without the [X] flag, errors are silently ignored." It seems that, without (X), the unbalanced (") isn't 'ignored,' but rather causes the (Q) flag to fail entirely.

*** I looked at the source code a little bit, and I have the following suggestion, whose details are NOT terribly firm:

[I wrote these in the order in which the functions are called in C. The outline of an idea for ignoring unbalanced quote marks is listed under gettokstr().]

-- subst.c, paramsubst(), starting ~line 1619:
  ~line 2028 decrements variable 'quotemod' if (Q) flag
    So we could state in the documentation that issuing (Q) twice, in other words (QQ), will activate the suggested error-handling mode.
  ~line 3807 calls parse_subst_string() for arrays
  ~line 3847 calls parse_subst_string() for scalars
    Those 2 blocks of code seem like good places to check for variable 'quotemod' < -1

-- lex.c, parse_subst_string(), starting ~line 1728:
  Add a parameter to this function: ignore_unbalanced_quotes
  Back in paramsubst() in subst.c, if quotemod < -1, then pass 2 to parse_subst_string() as ignore_unbalanced_quotes
  (All other calls to parse_subst_string() will pass 0 for ignore_unbalanced_quotes)
  In parse_subst_string() in lex.c again:
    ~line 1744 currently calls gettokstr(, sub=1)
    If ignore_unbalanced_quotes != 0 then call gettokstr(, sub=ignore_unbalanced_quotes)

-- lex.c, gettokstr(), starting ~line 937:
  ~line 1314, case LX2_QUOTE, detects unbalanced (')
    if sub == 2 then: change the Snull that was added ~line 1284 to be a (')
  ~line 1335, case LX2_DQUOTE, detects unbalanced (")
    if sub == 2 then: instead of adding a Dnull (~line 1333), add a (")
  ~line 1377, case LX2_BQUOTE, detects unbalanced (`)
    if sub == 2 then: instead of adding a Tick (~line 1347), add a (`)
  Using literal (')(") instead of Snull,Dnull will keep Snull,Dnull from being removed by remnulargs() or altered by untokenize()
  Using literal (`) instead of Tick will keep Tick from being altered by untokenize() [Tick wouldn't be removed by remnulargs() ]

I've not found how variable 'quoteerr' [which means the (X) flag, which determines how errors such as unbalanced quotes are handled] in paramsubst() in subst.c is propagated through parse_subst_string() in lex.c to gettokstr() in lex.c - perhaps by means of the 'lexflags' variable. The error message including the word 'unmatched' is sent to zerr() in gettokstr(), thus being evidence that 'quoteerr' must currently be propagated to gettokstr() somehow. It might be suitable to use the same means of propagating a (QQ) flag, since flags are parsed in paramsubst() but the different error-handling that we want is in gettokstr()

I suggest that QQX should act like QX: If quoteerr == 1, then the existing code in gettokstr() that calls zerr() would still run, even if quotemod == -2. This is so a script author can simply change (QQ) to (QQX) to get diagnostic output.

Manpage zshexpn, section Parameter Expansion Flags:
  change last sentence of description of X to:
    Without the flag, errors silently cause the current processing step to be skipped.
  Add to the description of Q flag:
    Handling of unbalanced quotes depends on whether the X flag is present.
    Also, if the Q flag is given twice and the X flag is *not* given, then unbalanced quotation marks are silently ignored; other forms of quoting are still removed. (For example, if a string contains an unbalanced double-quote but the outermost level of quoting within the string includes balanced single-quotes, then the single-quotes will be removed.)

I don't think I'm up to making these changes myself, but I'd appreciate feedback that might result in a solution.

Many thanks for your time.

-dg1727
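
P.S. To make the intended (QQ) semantics concrete, here is a minimal standalone C sketch. It is only a toy model and not a patch against lex.c: the function name unquote_lenient() is hypothetical, it knows nothing about Snull, Dnull, Tick or lexflags, and it handles only '', "" and backslash (no backquotes, no dollar single-quoting). The rule that a backslash inside "" escapes only (") and (\) is a simplifying assumption of the sketch. The point it illustrates is the one above: a quote character with no partner is kept as an ordinary character, while balanced quoting elsewhere in the string is still removed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy model of the suggested "(QQ)" behaviour; NOT zsh source code.
 * Removes one level of '...', "..." and backslash quoting from `in` and
 * writes the result to `out`, which must hold at least strlen(in) + 1 bytes.
 * A quote character without a matching closer is copied through literally
 * instead of making the whole operation fail.
 */
void unquote_lenient(const char *in, char *out)
{
    while (*in) {
        if (*in == '\\' && in[1]) {
            /* Backslash outside quotes: drop it, keep the next character. */
            *out++ = in[1];
            in += 2;
        } else if (*in == '\'') {
            const char *end = strchr(in + 1, '\'');
            if (end) {
                /* Balanced '...': copy the contents verbatim. */
                size_t n = (size_t)(end - (in + 1));
                memcpy(out, in + 1, n);
                out += n;
                in = end + 1;
            } else {
                /* Unbalanced ('): keep it as an ordinary character. */
                *out++ = *in++;
            }
        } else if (*in == '"') {
            /* Search for a closing double quote that is not backslashed. */
            const char *end = in + 1;
            while (*end && *end != '"') {
                if (*end == '\\' && end[1])
                    end++;
                end++;
            }
            if (*end == '"') {
                /* Balanced "...": a backslash here only escapes " and \ */
                const char *p = in + 1;
                while (p < end) {
                    if (*p == '\\' && (p[1] == '"' || p[1] == '\\'))
                        p++;
                    *out++ = *p++;
                }
                in = end + 1;
            } else {
                /* Unbalanced ("): keep it as an ordinary character. */
                *out++ = *in++;
            }
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Fed the (b)-quoted form of the balanced example, \*"\[abc\]"\*, this model yields *\[abc\]*, the same result shown earlier for (Q). Fed \*\[abc\]"\*, it yields *[abc]"* rather than leaving the string untouched, which is the behaviour being asked for.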