From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 6 Dec 2015 17:33:55 +0000
From: Peter Stephenson
To: Sebastian Gniazdowski
Cc: Zsh hackers list
Subject: Re: num_in_chars incremented after each mbrtowc()
Message-ID: <20151206173355.5b7cce8b@ntlworld.com>
References: <20151206154956.104b10c6@ntlworld.com>
List-Id: Zsh Workers List
X-Seq: 37330

On Sun, 6 Dec 2015 18:03:08 +0100
> It should be:
> return num + num_in_char > 0 ? 1 : 0;

OK, here's my full explanation of what this function actually does;
it's not what you say it does, but that doesn't mean I've got it right,
of course. However, the real question is the one in the previous
message, about what API you actually need for what you're doing.

num is the "real" answer for real characters. It counts:

- 1 for MB_INVALID. We count only the first octet from the input
  string, then move down the input string for more. We assume we'll
  represent the character as a single width.

- 1 or the width for a valid character, depending on what the caller
  requested.

Under these circumstances num_in_char is irrelevant.

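The counting rules above can be sketched as follows. This is a minimal illustration, not the zsh source: the names classify(), count_cells() and the RES_* values are hypothetical, a toy UTF-8 classifier stands in for mbrtowc() so the result does not depend on the locale, and every valid character is counted as 1 rather than by its width.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoder results: stand-ins for a valid character,
 * MB_INVALID and MB_INCOMPLETE respectively. */
enum mb_result { RES_VALID, RES_INVALID, RES_INCOMPLETE };

/* Toy UTF-8 classifier (assumption: only lead and continuation bytes
 * are checked).  Sets *used to the octets consumed or left pending. */
static enum mb_result classify(const unsigned char *s, size_t left,
                               size_t *used)
{
    size_t need, i;

    assert(left > 0);
    if (s[0] < 0x80)
        need = 1;
    else if ((s[0] & 0xE0) == 0xC0)
        need = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        need = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        need = 4;
    else {
        *used = 1;                    /* impossible lead byte */
        return RES_INVALID;
    }
    for (i = 1; i < need; i++) {
        if (i == left) {              /* input ended inside the sequence */
            *used = left;
            return RES_INCOMPLETE;
        }
        if ((s[i] & 0xC0) != 0x80) {  /* not a continuation byte */
            *used = 1;
            return RES_INVALID;
        }
    }
    *used = need;
    return RES_VALID;
}

/* Returns num; stores the count of trailing incomplete octets, if any,
 * in *num_in_char. */
static size_t count_cells(const char *str, size_t len, size_t *num_in_char)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t num = 0, pos = 0, used;

    *num_in_char = 0;
    while (pos < len) {
        switch (classify(s + pos, len - pos, &used)) {
        case RES_VALID:               /* 1 here; could be the width instead */
            num++;
            pos += used;
            break;
        case RES_INVALID:             /* count 1, skip only the first octet */
            num++;
            pos += 1;
            break;
        case RES_INCOMPLETE:          /* can only happen at the very end */
            *num_in_char = used;
            pos = len;
            break;
        }
    }
    return num;
}
```

With the input "a\xE2\x82" (length 3) this sketch gives num = 1 and num_in_char = 2: the trailing two octets never contribute to num, which is why the caller has to decide separately what to do with them.
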
num_in_char is only useful if we get some number of bytes for
MB_INCOMPLETE. That can only occur at the very end of the string, with
no more characters following, since otherwise we would get MB_INVALID,
not MB_INCOMPLETE. In this case, since we're never going to produce
anything else, we *assume* (and this assumption may be wrong) that the
right way to deal with it is as individual octets. As there is no
standard (for obvious reasons) as to how to deal with incomplete
characters, this may not be the most convenient answer in practice.

If you think there is a logical error in the above, please state it.
Are you saying, for example, that a trailing set of characters that are
MB_INCOMPLETE appear as a single output (albeit invalid) character (I
guess with a single width)? That would mean the right return value was

  return num + (num_in_char > 0 ? 1 : 0);

(perhaps that was even what you meant above?)

pws
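
To make the difference between the candidate return expressions concrete, here is a small sketch in C. The function names are hypothetical, and ret_per_octet() is only my reading of "deal with it as individual octets", not a quotation of the existing code. The point is operator precedence: in C, + binds tighter than >, which binds tighter than ?:, so the unparenthesised form collapses the whole sum to 0 or 1.

```c
#include <stddef.h>

/* As quoted at the top of the message.  C parses this as
 * ((num + num_in_char) > 0) ? 1 : 0, so the result is only ever 0 or 1. */
static size_t ret_as_quoted(size_t num, size_t num_in_char)
{
    return num + num_in_char > 0 ? 1 : 0;
}

/* Parenthesised form: trailing incomplete octets, however many,
 * add a single cell to num. */
static size_t ret_one_cell(size_t num, size_t num_in_char)
{
    return num + (num_in_char > 0 ? 1 : 0);
}

/* Assumed reading of "individual octets": each trailing incomplete
 * octet adds one cell. */
static size_t ret_per_octet(size_t num, size_t num_in_char)
{
    return num + num_in_char;
}
```

For num = 5 and num_in_char = 2 the three functions return 1, 6 and 7 respectively, so the parentheses are what separates "one extra cell" from a bare yes/no flag.
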