From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 17605 invoked by alias); 6 Dec 2015 15:55:34 -0000
Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Id: Zsh Workers List
List-Post:
List-Help:
X-Seq: 37326
Received: (qmail 3674 invoked from network); 6 Dec 2015 15:55:32 -0000
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on f.primenet.com.au
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.0
X-Originating-IP: [86.6.158.222]
X-Spam: 0
X-Authority: v=2.1 cv=YMdiskyx c=1 sm=1 tr=0 a=2SBOh4l1h08DI0L+aujZyQ==:117 a=2SBOh4l1h08DI0L+aujZyQ==:17 a=NLZqzBF-AAAA:8 a=kj9zAlcOel0A:10 a=pGLkceISAAAA:8 a=01cg2oMlqwYufZgSAFIA:9 a=l1iLtEU0l-yLGZqH:21 a=5T_L2y7bwiPzGifA:21 a=CjuIK1q_8ugA:10
Date: Sun, 6 Dec 2015 15:49:56 +0000
From: Peter Stephenson
To: Zsh hackers list
Subject: Re: num_in_chars incremented after each mbrtowc()
Message-ID: <20151206154956.104b10c6@ntlworld.com>
In-Reply-To:
References:
X-Mailer: Claws Mail 3.11.1 (GTK+ 2.24.28; x86_64-redhat-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Sun, 6 Dec 2015 10:37:21 +0100
Sebastian Gniazdowski wrote:
> Hello,
> while working my hands off on implementing display width handling in
> params.c rather than subst.c I encountered a bug in mb_metastrlenend.
> It will reveal itself only on improper unicode strings.

I don't understand your patch: the change is to increment num_in_char
in exactly the cases where it is deliberately set to 0 later to reflect
the fact we've now got a complete character and the effect is included
in num instead.

Somebody complained about this function a couple of months ago and I
explained, then, too; it suggests it needs some more comments, so I've
added some.

It may be the real difficulty is with the API, in which case you'll need
to say (in words, not videos, please) what you're expecting.
As long as this is consistently dealt with in callers of the function it
might be possible to change --- I guess you're only worried about the
case for returning a width, which is uncommon in the code and indeed
doesn't really have a well defined result for incomplete/invalid
characters.  Maybe you have a particular strategy in mind.

Consequently I haven't looked at your other patch.

pws

diff --git a/Src/utils.c b/Src/utils.c
index d131383..45f8286 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5179,6 +5179,17 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
         ret = mbrtowc(&wc, &inchar, 1, &mb_shiftstate);
 
         if (ret == MB_INCOMPLETE) {
+            /*
+             * "num_in_char" is only used for incomplete characters.  The
+             * assumption is that we will output this octet as a single
+             * character (of single width) if we don't get a complete
+             * character; if we do get a complete character, num_in_char
+             * becomes irrelevant and is set to zero.
+             *
+             * This is in contrast to "num" which counts the characters
+             * or widths in complete characters.  The two are summed,
+             * so we don't count characters twice.
+             */
             num_in_char++;
         } else {
             if (ret == MB_INVALID) {
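
For anyone following the thread, the counting scheme described in the
new comment can be sketched as a small standalone model.  This is not
the code in Src/utils.c: it assumes UTF-8 input and replaces mbrtowc()
with a hypothetical hand-rolled decoder (feed_octet, struct utf8_state
and count_chars are all made-up names), it ignores Meta encoding and
the width case, and its handling of invalid sequences (count the
pending octets one each, then look at the offending octet afresh) is
an assumption of the sketch, not a statement about what zsh does.

```c
#include <stddef.h>

/* Hypothetical outcomes, standing in for the three mbrtowc() cases. */
enum step { STEP_COMPLETE, STEP_INCOMPLETE, STEP_INVALID };

/* Hypothetical decoder state: continuation octets still expected. */
struct utf8_state { int remaining; };

/* Feed one octet to a minimal UTF-8 decoder (no overlong checks). */
static enum step feed_octet(struct utf8_state *st, unsigned char c)
{
    if (st->remaining > 0) {
        if ((c & 0xC0) != 0x80) {
            st->remaining = 0;          /* reset, like clearing the shift state */
            return STEP_INVALID;
        }
        return --st->remaining == 0 ? STEP_COMPLETE : STEP_INCOMPLETE;
    }
    if (c < 0x80)
        return STEP_COMPLETE;
    if ((c & 0xE0) == 0xC0) { st->remaining = 1; return STEP_INCOMPLETE; }
    if ((c & 0xF0) == 0xE0) { st->remaining = 2; return STEP_INCOMPLETE; }
    if ((c & 0xF8) == 0xF0) { st->remaining = 3; return STEP_INCOMPLETE; }
    return STEP_INVALID;                /* stray continuation or bad lead octet */
}

/* Count characters; octets of unfinished characters count one each. */
static size_t count_chars(const char *s, size_t len)
{
    struct utf8_state st = { 0 };
    size_t num = 0;          /* complete characters seen so far */
    size_t num_in_char = 0;  /* octets of the character still in progress */
    size_t i = 0;

    while (i < len) {
        enum step r = feed_octet(&st, (unsigned char)s[i]);

        if (r == STEP_INCOMPLETE) {
            num_in_char++;
            i++;
        } else if (r == STEP_COMPLETE) {
            num++;
            num_in_char = 0;  /* those octets are now covered by num */
            i++;
        } else if (num_in_char > 0) {
            /* Assumption: pending octets become single characters and
             * the offending octet is fed again from a clean state. */
            num += num_in_char;
            num_in_char = 0;
        } else {
            num++;            /* lone invalid octet counts as one character */
            i++;
        }
    }
    /* The two are summed, so nothing is counted twice. */
    return num + num_in_char;
}
```

With this model, count_chars("\xC3\xA9", 2) gives 1 because num_in_char
is reset once the character completes, while count_chars("a\xC3", 2)
gives 2 because the trailing lead octet is still pending and is added
in at the end.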