Date: Sat, 24 Feb 2024 09:47:22 +0000
From: Stephane Chazelas
To: Bart Schaefer
Cc: zsh workers
Subject: Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})
Message-ID: <20240224094722.hnullrzrb6gsswnm@chazelas.org>
X-Seq: 52589

2024-02-23 14:32:49 -0800, Bart Schaefer:
[...]
> > zsh: bad math expression: operand expected at `|aM-^C c'
>
> You're missing part of my point here.
>
> % printf '%d\n' $(( 1+|a\x83 c ))
> zsh: bad math expression: operand expected at `|a\x83 c '
>
> That is IMO more useful than either of "^@" or "M-^C " and is down to
> the difference between using printf on a $'...' string (which is
> interpreted before printf even gets its mitts on it, and two layers
> before the math parser does) vs. using the actual math parser
> directly. This has nothing to do with how the string is passed to
> zerr() and everything to do with how printf and parsing interpret the
> input -- by the time the math parser actually calls zerr() it can't
> know how to unwind that, and the internals of zerr() are even further
> removed.
>
> I would therefore argue that these examples are out of scope for this
> discussion -- these examples are not about how zerr() et al. should
> receive strings, they're about how the math parser should receive
> them, and needs to be fixed upstream e.g. in bin_print().
[...]

I agree the bug is in printf, which forgets to metafy the input
before passing it to the math parser.

This can be seen with:

$ typeset -A a
$ printf '%d\n' 'a[ÃÃÃÃÃÃ]=1'
1
$ (( a[ÃÃÃÃÃÃ] = 2 ))
typeset -A a=( [ÃÃÃÃÃÃ]=2 [$'\M-C\M-c\M-c\M-c\M-c\M-c\M-\C-C']=1 )

> More relevant to this discussion is that math errors are one of the
> two existing callers using the %l format, so any attempt to improve
> this is going to require changing those calls anyway.

I don't see why we'd need to change the call to zerr in those
cases. Just fix printf.

$ a=$'\x83 foobarbaz' b='\x83 foobarbaz'
$ (( 1+|$b ))
zsh: bad math expression: operand expected at `|\x83 foob...'
$ (( 1+|$a ))
zsh: bad math expression: operand expected at `|\M-^C foobarb...'

Both are correct: we do want the 0x83 byte, which is not
printable, to be rendered as \M-^C.

> >
> > For 1, IMO, when the error message is generated by zsh, it
> > should go through nicezputs(). zsh should decide of the
> > formatting, have it pass escape sequences as-is would make it
> > hard to understand and diagnose the error.
>
> Agreed in concept, but there's a difference between errors actually
> generated BY zsh, and errors with user input that zsh is reporting.
> For example, the same literal string might be a file name generated by
> globbing, or it might be something the user typed out in a
> syntactically invalid command. There's no way to put intelligence
> about how to format those into the guts of zerr().

I don't think that's a point of contention. All of those are
cases where we need to make the non-printable characters in the
user data visible with nicezputs.

The question is not about user input vs. no user input in the
displayed error, but, for those errors that do include user
input, whether that input is meant to be an error message
formatted by the user.

I can only think of ${var[:]?user-supplied-error}, and I imagine
that at least 99% of the 499 other cases are not about printing
a user-supplied error message.

> There's already a way to pass text not containing NUL (%s) and a way
> to pass text as ptr+len (%l). There are a vanishingly small number of
> uses of the latter (2 callers out of the ~500 total call examples).
> There's exactly one case so far of wanting output to contain NUL, and
> per the "only caller can interpret" assertion, it seems worthwhile to
> use %l for the NUL case and let the other 3 callers decide to "nice"
> the strings they pass (or not).
>
> This not only skips extra metafication needed to use the proposed %S,
> but also simplifies the implementation of %l, and requires the
> addition of only 1 or 2 lines of code to each of the two existing
> callers using %l (maybe zero lines of code in the case of yyerror()).
>
> > %S also passed metafied, but no nicezputs.
>
> That requires metafy in the caller followed by unmetafy in zerr().
> Much easier to remove code from %l than to add it to a new %S,
> especially given that we're editing the solitary caller where %S would
> be used.

But in the case of ${var?err}, the err is already metafied, so
if you make %l take unmetafied input, you're just moving the
unmetafication to the caller, which is counterproductive as it
makes it break on NULs.

Also, %l is intended (at least in the one case where I saw it
used) to truncate user input, so it should be nicezputs'ed.

> >
> > Now, my previous message was showing there were quite a few
> > issue with the metafication and possibly with the nicezputs'ing
> > and/or multibyte handling.
>
> Fine, but not fixable in zerr() and friends.

Sorry for the confusion, I didn't mean to say that's where it
was to be fixed. I agree these are all cases where it's the
caller failing to do the metafication (in the case of printf,
the metafication was missing from much earlier).

[...]
> % printf '%d\n' '1+ÃÃÃÃÃÃ'
> BUG: unexpected end of string in ztrlen()
> zsh: bad math expression: operand expected at `\M-C\M-c\M-c\M-c\M-c\M-c'
> 0
> % printf '%d\n' $((1+ÃÃÃÃÃÃ))
> 1
>
> (Also a bit weird that the first \M-C is capitalized and the rest are
> not?) Still not a problem to be resolved in zerr().

\M-C is the visual representation of 0xc3, \M-c of 0xe3, and ÃÃ
is c3 83 c3 83. It's just that unmetafy treated each 83 as Meta
and turned 83 c3 into e3 (0xc3 ^ 0x20).

> > > $ ((1+|ÃÃÃÃÃÃ))
> > > zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...'
> >
> > In that case, metafication OK, but character cut in the middle.
>
> Still not zerr()'s fault and needs to be addressed where the number of
> bytes for %l is being calculated in checkunary().

zerr could try to decode the string as text and truncate to the
specified number of *characters* instead of bytes, but like I
said, that may be overkill as we can live with the odd character
cut in the middle.

> > > % ((1+|ÃÃÃÃÃÃ))
> > > zsh: bad math expression: operand expected at `|Ã?Ã?Ã?...'
> >
> > It seems rather worse to me.
>
> That's because of the way I chose to lift nice-ifying up into
> checkunary() for testing the approach. It's hard to be consistent
> there, given the foregoing business about different formats being sent
> down from printf vs. $((...)), and it's also why I said "no patch
> without feedback".

I guess those ? are some 0x83 bytes added by metafication, and
we're missing the corresponding unmetafy.

To me, the only things to do are:

1. Add a %S for raw output (expects metafied input like
   everything else) to be used by ${var[:]?error} and likely
   only those.

2. Add the missing metafy in bin_print (and possibly elsewhere)
   before calling the math parser.

3. Fix those cases where zerrmsg is called with %s/%l/%S
   arguments non-metafied, like in that "bad interpreter" case
   above.

4. (optional): Improve %l usages to truncate based on number of
   characters rather than bytes, or at least avoid cutting
   characters in the middle.

-- 
Stephane
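
P.S. For anyone wanting to check the byte arithmetic behind the
\M-C / \M-c output discussed above, here is a rough model. It is
Python rather than zsh's C, and only a sketch under stated
assumptions: the names META, metafy and unmetafy are illustrative
and not the zsh functions themselves, the set of escaped bytes is
reduced to NUL and 0x83 (the real set in zsh is wider), and a
trailing 0x83 with nothing after it is simply passed through.

```python
META = 0x83  # the Meta marker byte discussed in this thread


def metafy(raw: bytes) -> bytes:
    """Escape special bytes as META followed by (byte XOR 0x20).

    Assumption: only NUL and META itself are escaped in this sketch.
    """
    out = bytearray()
    for b in raw:
        if b == 0x00 or b == META:
            out += bytes([META, b ^ 0x20])
        else:
            out.append(b)
    return bytes(out)


def unmetafy(s: bytes) -> bytes:
    """Reverse metafy: drop each META and XOR the following byte with 0x20."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] == META and i + 1 < len(s):
            out.append(s[i + 1] ^ 0x20)
            i += 2
        else:
            # Ordinary byte, or a trailing META with nothing after it
            # (kept as-is here, which is a simplification).
            out.append(s[i])
            i += 1
    return bytes(out)


if __name__ == "__main__":
    raw = bytes([0xC3, 0x83]) * 6  # six UTF-8 encoded U+00C3 characters

    # Unmetafying input that was never metafied mangles it:
    # each "83 c3" pair collapses to 0xc3 ^ 0x20 = 0xe3.
    print(unmetafy(raw).hex())  # c3e3e3e3e3e383

    # Metafying first makes the round trip lossless.
    print(unmetafy(metafy(raw)) == raw)  # True
```

The mangled result (c3, five e3, then a lone 83) lines up with the
$'\M-C\M-c\M-c\M-c\M-c\M-c\M-\C-C' key shown by typeset earlier,
which is what one would expect if the math parser is handed
unmetafied bytes and they are later unmetafied anyway.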