zsh-workers
 help / color / mirror / code / Atom feed
From: Peter Stephenson <p.w.stephenson@ntlworld.com>
To: Zsh hackers list <zsh-workers@zsh.org>
Subject: Re: num_in_chars incremented after each mbrtowc()
Date: Sun, 6 Dec 2015 15:49:56 +0000	[thread overview]
Message-ID: <20151206154956.104b10c6@ntlworld.com> (raw)
In-Reply-To: <CAKc7PVCmApbPrfcArqxJcDX0nHN9NmuE7zqcrSZb51bSfGyuuQ@mail.gmail.com>

On Sun, 6 Dec 2015 10:37:21 +0100
Sebastian Gniazdowski <sgniazdowski@gmail.com> wrote:
> Hello,
> while working my hands off on implementing display width handling in
> params.c rather than subst.c I encountered a bug in mb_metastrlenend.
> It will reveal itself only on improper unicode strings.

I don't understand your patch: the change is to increment num_in_char in
exactly the cases where it is deliberately set to 0 later to reflect the
fact we've now got a complete character and the effect is included in
num instead.

Somebody complained about this function a couple of months ago and I
explained, then, too; it suggests it needs some more comments, so I've
added some.

It may be the real difficulty is with the API, in which case you'll need
to say (in words, not videos, please) what you're expecting.  As long as
this is consistently dealt with in callers of the function it might be
possible to change --- I guess you're only worried about the case for
returning a width, which is uncommon in the code and indeed doesn't
really have a well defined result for incomplete/invalid characters.
Maybe you have a particular strategy in mind.

Consequently I haven't looked at your other patch.

pws

diff --git a/Src/utils.c b/Src/utils.c
index d131383..45f8286 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5179,6 +5179,17 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
 	ret = mbrtowc(&wc, &inchar, 1, &mb_shiftstate);
 
 	if (ret == MB_INCOMPLETE) {
+	    /*
+	     * "num_in_char" is only used for incomplete characters.  The
+	     * assumption is that we will output this octet as a single
+	     * character (of single width) if we don't get a complete
+	     * character; if we do get a complete character, num_in_char
+	     * becomes irrelevant and is set to zero.
+	     *
+	     * This is in contrast to "num" which counts the characters
+	     * or widths in complete characters.  The two are summed,
+	     * so we don't count characters twice.
+	     */
 	    num_in_char++;
 	} else {
 	    if (ret == MB_INVALID) {


  parent reply	other threads:[~2015-12-06 15:55 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-06  9:37 Sebastian Gniazdowski
2015-12-06 10:24 ` Sebastian Gniazdowski
2015-12-06 15:49 ` Peter Stephenson [this message]
2015-12-06 16:40   ` Sebastian Gniazdowski
2015-12-06 17:13     ` Peter Stephenson
2015-12-06 17:03   ` Sebastian Gniazdowski
2015-12-06 17:33     ` Peter Stephenson
2015-12-06 17:40       ` Peter Stephenson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151206154956.104b10c6@ntlworld.com \
    --to=p.w.stephenson@ntlworld.com \
    --cc=zsh-workers@zsh.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).