List-Id: Zsh Workers List
X-Seq: 37331
Date: Sun, 6 Dec 2015 17:40:37 +0000
From: Peter Stephenson
To: Zsh hackers list
Subject: Re: num_in_chars incremented after each mbrtowc()
Message-ID: <20151206174037.7c371fd3@ntlworld.com>
In-Reply-To: <20151206173355.5b7cce8b@ntlworld.com>
References: <20151206154956.104b10c6@ntlworld.com> <20151206173355.5b7cce8b@ntlworld.com>

On Sun, 6 Dec 2015 17:33:55 +0000
Peter Stephenson wrote:
> Are you saying, for example, that a trailing set of characters that are
> MB_INCOMPLETE appear as a single output (albeit invalid) character (I
> guess with a single width)? That would mean the right return value was
>
>     return num + (num_in_char > 0 ? 1 : 0);
>
> (perhaps that was even what you meant above?)

Ah, reading your previous message in the light of the above, I think
that *is* what you're saying.

OK, as I said, there isn't really a "right" answer here, just a
convenient one. So if this is what works for you, let's go with that.
pws

diff --git a/Src/utils.c b/Src/utils.c
index 45f8286..fc2b192 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5180,11 +5180,15 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
 
         if (ret == MB_INCOMPLETE) {
             /*
-             * "num_in_char" is only used for incomplete characters. The
-             * assumption is that we will output this ocatet as a single
+             * "num_in_char" is only used for incomplete characters.
+             * The assumption is that we will output all trailing octets
+             * that form part of an incomplete character as a single
              * character (of single width) if we don't get a complete
-             * character; if we do get a complete character, num_in_char
-             * becomes irrelevant and is set to zero.
+             * character. This is purely pragmatic --- I'm not aware
+             * of a standard way of dealing with incomplete characters.
+             *
+             * If we do get a complete character, num_in_char
+             * becomes irrelevant and is set to zero
              *
              * This is in contrast to "num" which counts the characters
              * or widths in complete characters. The two are summed,
@@ -5216,8 +5220,8 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
         }
     }
 
-    /* If incomplete, treat remainder as trailing single bytes */
-    return num + num_in_char;
+    /* If incomplete, treat remainder as trailing single character */
+    return num + (num_in_char ? 1 : 0);
 }
 
 /*
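To make the effect of the change concrete, here is a minimal standalone sketch of the counting rule in C. It is not the zsh code. The names count_chars, u8_step, struct u8state and enum step are hypothetical, and u8_step is a deliberately simplified UTF-8 decoder standing in for mbrtowc(), so the example behaves the same regardless of the current locale (it only checks lead/continuation bit patterns and does not reject overlong forms or surrogates). It also ignores zsh's metafication and the width argument. As in mb_metastrlenend(), complete characters are counted in num and octets of a still-incomplete character in num_in_char; the old_rule flag switches between the pre-patch return value and the patched one.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of feeding one octet, modelled on mbrtowc()'s three cases:
 * a complete character, (size_t)-2 (incomplete), (size_t)-1 (invalid). */
enum step { STEP_COMPLETE, STEP_INCOMPLETE, STEP_INVALID };

/* Hypothetical decoder state: continuation octets still expected. */
struct u8state { int need; };

/* Simplified UTF-8 stand-in for mbrtowc(): checks lead/continuation
 * bit patterns only (no overlong or surrogate checks). */
enum step u8_step(struct u8state *st, unsigned char c)
{
    if (st->need > 0) {
        if ((c & 0xC0) != 0x80) {   /* not a continuation octet */
            st->need = 0;
            return STEP_INVALID;
        }
        return --st->need ? STEP_INCOMPLETE : STEP_COMPLETE;
    }
    if (c < 0x80)
        return STEP_COMPLETE;       /* ASCII */
    if ((c & 0xE0) == 0xC0)
        st->need = 1;               /* 2-octet sequence */
    else if ((c & 0xF0) == 0xE0)
        st->need = 2;               /* 3-octet sequence */
    else if ((c & 0xF8) == 0xF0)
        st->need = 3;               /* 4-octet sequence */
    else
        return STEP_INVALID;        /* stray continuation or bad lead */
    return STEP_INCOMPLETE;
}

/* Count characters in s[0..len). Complete characters go into num;
 * octets of a character that is still incomplete go into num_in_char.
 * old_rule != 0 gives the pre-patch result, 0 the patched one. */
int count_chars(const char *s, size_t len, int old_rule)
{
    struct u8state st = { 0 };
    int num = 0, num_in_char = 0;

    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        switch (u8_step(&st, (unsigned char)s[i])) {
        case STEP_INCOMPLETE:
            num_in_char++;
            break;
        case STEP_COMPLETE:
            num++;
            num_in_char = 0;        /* irrelevant once complete */
            break;
        case STEP_INVALID:
            /* Sketch's own assumption: the offending octet and each
             * octet of the abandoned partial character count as one
             * character apiece. */
            num += num_in_char + 1;
            num_in_char = 0;
            break;
        }
    }
    if (old_rule)
        return num + num_in_char;           /* one per trailing octet */
    return num + (num_in_char ? 1 : 0);     /* one for the whole remainder */
}
```

For example, "a\xE2\x82" (an "a" followed by the first two octets of the three-octet UTF-8 encoding of U+20AC) counts as 3 under the old rule and 2 under the new one, while the complete "a\xE2\x82\xAC" counts as 2 either way. How invalid octets are counted here is an assumption of the sketch, not something taken from the patch.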