zsh-workers
* Question about mb_metastrlen
@ 2015-10-27  8:31 Sebastian Gniazdowski
  2015-10-27  9:10 ` Peter Stephenson
  0 siblings, 1 reply; 4+ messages in thread
From: Sebastian Gniazdowski @ 2015-10-27  8:31 UTC
  To: zsh-workers

Hello,
the function counts the bytes of the last, incomplete wide character:

        ret = mbrtowc(&wc, &inchar, 1, &mb_shiftstate);
        if (ret == MB_INCOMPLETE) {
            num_in_char++;
        } else {

When returning, it makes use of the count:

    /* If incomplete, treat remainder as trailing single bytes */
    return num + num_in_char;

Strings are stored in wchar_t arrays. The incomplete character will
occupy a single index, correct? So maybe the return should be:

    return num + ( num_in_char > 0 ? 1 : 0 );

?

Best regards,
Sebastian Gniazdowski



* Re: Question about mb_metastrlen
  2015-10-27  8:31 Question about mb_metastrlen Sebastian Gniazdowski
@ 2015-10-27  9:10 ` Peter Stephenson
  2015-10-27 10:34   ` Sebastian Gniazdowski
From: Peter Stephenson @ 2015-10-27  9:10 UTC
  To: zsh-workers

On Tue, 27 Oct 2015 09:31:02 +0100
Sebastian Gniazdowski <sgniazdowski@gmail.com> wrote:
> Hello,
> the function counts the bytes of the last, incomplete wide character:
> 
>         ret = mbrtowc(&wc, &inchar, 1, &mb_shiftstate);
>         if (ret == MB_INCOMPLETE) {
>             num_in_char++;
>         } else {
> 
> When returning, it makes use of the count:
> 
>     /* If incomplete, treat remainder as trailing single bytes */
>     return num + num_in_char;
> 
> Strings are stored in wchar_t arrays. The incomplete character will
> occupy a single index, correct? So maybe the return should be:
> 
>     return num + ( num_in_char > 0 ? 1 : 0 );

The function you're talking about is for a string length, not a
character length.  num_in_char counts the number of trailing bytes that
didn't form a wide character.  Each will be treated as a single byte.
So each counts 1 for the length of the string.

I think your answer would be correct for a function that counts just the
next character (i.e. mb_metacharlenconv()), but I think that's already doing
what you'd expect in that case.

That's the intention of the function, anyway, but if you can see a use
that's inconsistent with it there could be a bug there.

pws


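To make the string-length semantics concrete, here is a minimal C sketch of the counting loop under stated assumptions; it is not zsh's code. `u8_step()` is a hypothetical, simplified one-byte UTF-8 decoder standing in for `mbrtowc()` in a UTF-8 locale (it returns -2 where zsh sees MB_INCOMPLETE and -1 for MB_INVALID, and does not reject overlong forms), and `str_len_sketch()` is likewise a hypothetical name. Backing up to one byte after the start of a broken sequence when an invalid byte arrives is an assumption of the sketch, and zsh's metafication (the Meta byte) is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-byte UTF-8 stepper standing in for
 * mbrtowc(&wc, &byte, 1, &state) in a UTF-8 locale:
 * 1 = character complete, -2 = incomplete (like MB_INCOMPLETE),
 * -1 = invalid (like MB_INVALID). Overlong forms are not rejected. */
static int u8_step(int *need, unsigned char c)
{
    if (*need == 0) {
        if (c < 0x80)
            return 1;                       /* ASCII: complete at once */
        if (c >= 0xC0 && c <= 0xF7) {       /* lead byte of a 2-4 byte form */
            *need = c >= 0xF0 ? 3 : c >= 0xE0 ? 2 : 1;
            return -2;
        }
        return -1;                          /* stray continuation or bad lead */
    }
    if ((c & 0xC0) != 0x80)
        return -1;                          /* expected a continuation byte */
    return --*need ? -2 : 1;
}

/* String length in the sense described above: each complete character
 * counts 1, each invalid byte counts 1, and each byte of a trailing
 * incomplete character counts 1. */
static size_t str_len_sketch(const char *s, size_t n)
{
    size_t i = 0, laststart = 0, num = 0, num_in_char = 0;
    int need = 0;

    assert(s != NULL || n == 0);
    while (i < n) {
        int ret = u8_step(&need, (unsigned char)s[i++]);
        if (ret == -2) {
            num_in_char++;          /* may still become a full character */
            continue;
        }
        if (ret == -1) {
            need = 0;               /* reset the decoder state */
            i = laststart + 1;      /* assumption: first byte counts alone, rescan the rest */
        }
        num++;
        laststart = i;
        num_in_char = 0;            /* nothing pending after complete/invalid */
    }
    /* If incomplete, treat remainder as trailing single bytes. */
    return num + num_in_char;
}
```

For the input "ab\xe2\x82" the sketch returns 4: two complete characters plus two trailing bytes counted singly. The alternative `num + (num_in_char > 0 ? 1 : 0)` would give 3, which is what you would want only if the incomplete tail were counted as one character.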

* Re: Question about mb_metastrlen
  2015-10-27  9:10 ` Peter Stephenson
@ 2015-10-27 10:34   ` Sebastian Gniazdowski
  2015-10-27 10:50     ` Peter Stephenson
From: Sebastian Gniazdowski @ 2015-10-27 10:34 UTC
  To: zsh-workers

On 27 October 2015 at 10:10, Peter Stephenson <p.stephenson@samsung.com> wrote:
> On Tue, 27 Oct 2015 09:31:02 +0100
> The function you're talking about is for a string length, not a
> character length.  num_in_char counts the number of trailing bytes that
> didn't form a wide character.  Each will be treated as a single byte.
> So each counts 1 for the length of the string.

There is the condition:
            if (ret == MB_INVALID) {

Isn't it the case that if there are many trailing bytes that do not
form a character, they will be caught by MB_INVALID, and only the last
"character" can remain incomplete?

Best regards,
Sebastian Gniazdowski



* Re: Question about mb_metastrlen
  2015-10-27 10:34   ` Sebastian Gniazdowski
@ 2015-10-27 10:50     ` Peter Stephenson
From: Peter Stephenson @ 2015-10-27 10:50 UTC
  To: zsh-workers

On Tue, 27 Oct 2015 11:34:35 +0100
Sebastian Gniazdowski <sgniazdowski@gmail.com> wrote:

> On 27 October 2015 at 10:10, Peter Stephenson <p.stephenson@samsung.com> wrote:
> > On Tue, 27 Oct 2015 09:31:02 +0100
> > The function you're talking about is for a string length, not a
> > character length.  num_in_char counts the number of trailing bytes that
> > didn't form a wide character.  Each will be treated as a single byte.
> > So each counts 1 for the length of the string.
> 
> There is the condition:
>             if (ret == MB_INVALID) {
> 
> Isn't it the case that if there are many trailing bytes that do not
> form a character, they will be caught by MB_INVALID, and only the last
> "character" can remain incomplete?

That's correct: only the last multibyte character can consist of
multiple individual bytes that look like part of an incomplete character
rather than simply being invalid.  Hence the note at the end of the
function about the use of num_in_char, and hence we reset num_in_char to
0 any time we get a full multibyte character or mark a byte as invalid
rather than incomplete.

pws


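Peter's point that only the final character can remain incomplete can be cross-checked from the other end of the string. The sketch below uses a different technique from the forward scan discussed in the thread: assuming UTF-8, it looks backwards over at most the last four bytes (the longest UTF-8 sequence) for the last byte that is not a continuation byte, and reports how many trailing bytes form an unfinished character. Both function names are hypothetical, and overlong or surrogate forms are not rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Expected UTF-8 sequence length for a lead byte, or 0 if c cannot
 * start a sequence. Simplified: overlong leads are not rejected. */
static size_t u8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if (c >= 0xC0 && c <= 0xDF) return 2;
    if (c >= 0xE0 && c <= 0xEF) return 3;
    if (c >= 0xF0 && c <= 0xF7) return 4;
    return 0;                       /* continuation byte or invalid lead */
}

/* Number of trailing bytes of s that form an incomplete (but so far
 * valid-looking) UTF-8 character; 0 if the string ends cleanly. */
static size_t u8_trailing_incomplete(const char *s, size_t n)
{
    assert(s != NULL || n == 0);
    /* A UTF-8 character is at most 4 bytes, so only the last 4 matter. */
    for (size_t j = 1; j <= n && j <= 4; j++) {
        unsigned char c = (unsigned char)s[n - j];
        if ((c & 0xC0) == 0x80)
            continue;               /* continuation byte: keep looking back */
        /* c is the last non-continuation byte, the only possible start
         * of an unfinished character. */
        return u8_seq_len(c) > j ? j : 0;
    }
    return 0;                       /* only stray continuation bytes at the end */
}
```

For "\xe2\xe2\x82" it returns 2: the first `\xe2` cannot be part of the final character, so only the last two bytes are pending, which is the value num_in_char would hold at the end of a forward scan that resets it on every complete or invalid byte.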


Thread overview: 4+ messages
2015-10-27  8:31 Question about mb_metastrlen Sebastian Gniazdowski
2015-10-27  9:10 ` Peter Stephenson
2015-10-27 10:34   ` Sebastian Gniazdowski
2015-10-27 10:50     ` Peter Stephenson

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/
