9fans - fans of the OS Plan 9 from Bell Labs
From: erik quanstrom <quanstro@quanstro.net>
To: 9fans@9fans.net
Subject: Re: [9fans] a strange bug in grep
Date: Sun, 30 Mar 2014 02:10:27 -0400
Message-ID: <f61d04b3ca914447fcd14ef6ada3a89c@brasstown.quanstro.net>
In-Reply-To: <13277e55555fc0e249f7ce04f144a19f@felloff.net>

On Sat Mar 29 21:46:33 EDT 2014, cinap_lenrek@felloff.net wrote:
> very good.
>
> one question about:
>
> - 		x = re2or(x, rclass(ov, Runemask));
> + 		x = re2or(x, rclass(ov, 0xffff));
>
> this seems wrong for 21 bit runes (the old is also wrong i think).
>
> shouldn't that be:
>
> + 		x = re2or(x, rclass(ov, Runemax));
>
> as Runemask (0x1fffff) is not a valid rune for 21-bit rune
>  as it is >Runemax.

yes, that's correct.  i left it at 0xffff because there was still a bug.
tab2 still needs to burst the leading bytes so we enumerate all
the cases.  i think tab2 should be

Rune	tab2[] =
{
	0x003f,
	0x0fff,
	0x07ffff,
};

since the first byte of the 21-bit rune is 0b11110xxx.

what do you think?

> as i understand it, tab1[] array contains the last valid rune
> in a range of the same utf8 encoding length.
>
> basically:
>
> 0-07f	-> 1 byte, 0x80-0x7ff -> 2 byte  etc.
>
> so adding 0xffff is right. the next would be 0x10ffff for 21 bit
> runes but there shouldn't be any runes above 0x10ffff.
>
> makes any sense?

since the tab1 array is bursting at byte boundaries, the next
burst is at 0x1fffff.  but that's in undefined territory.

- erik




Thread overview: 10+ messages
2014-03-29 23:31 arisawa
2014-03-29 23:54 ` erik quanstrom
2014-03-30  0:51   ` arisawa
2014-03-30  1:44   ` cinap_lenrek
2014-03-30  6:10     ` erik quanstrom [this message]
2014-03-30 15:40       ` cinap_lenrek
2014-03-30 16:26         ` erik quanstrom
2014-03-30 18:05           ` cinap_lenrek
2014-03-30 18:10             ` erik quanstrom
2014-03-30  6:24     ` erik quanstrom
