From: erik quanstrom <quanstro@quanstro.net>
To: 9fans@9fans.net
Subject: Re: [9fans] a strange bug in grep
Date: Sun, 30 Mar 2014 02:10:27 -0400
Message-ID: <f61d04b3ca914447fcd14ef6ada3a89c@brasstown.quanstro.net>
In-Reply-To: <13277e55555fc0e249f7ce04f144a19f@felloff.net>
On Sat Mar 29 21:46:33 EDT 2014, cinap_lenrek@felloff.net wrote:
> very good.
>
> one question about:
>
> - x = re2or(x, rclass(ov, Runemask));
> + x = re2or(x, rclass(ov, 0xffff));
>
> this seems wrong for 21-bit runes (the old one is also wrong, i think).
>
> shouldn't that be:
>
> + x = re2or(x, rclass(ov, Runemax));
>
> as Runemask (0x1fffff) is not a valid 21-bit rune:
> it is > Runemax.
yes, that's correct. i left it at 0xffff because there was still a bug:
tab2 still needs to burst the leading bytes so that we enumerate all
the cases. i think tab2 should be
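Rune tab2[] =
{
	0x003f,		/* 1 trailing byte: 6 bits */
	0x0fff,		/* 2 trailing bytes: 12 bits */
	0x03ffff,	/* 3 trailing bytes: 18 bits */
};
since the first byte of a 21-bit (4-byte) rune is 0b11110xxx, the
remaining 18 bits sit in the three continuation bytes.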
what do you think?
> as i understand it, the tab1[] array contains the last valid rune
> in each range of the same utf8 encoding length.
>
> basically:
>
> 0x00-0x7f -> 1 byte, 0x80-0x7ff -> 2 bytes, etc.
>
> so adding 0xffff is right. the next would be 0x10ffff for 21-bit
> runes, but there shouldn't be any runes above 0x10ffff.
>
> does that make sense?
since tab1 bursts the range at encoding-length boundaries, the next
burst would be at 0x1fffff, but that's in undefined territory.
- erik
Thread overview:
2014-03-29 23:31 arisawa
2014-03-29 23:54 ` erik quanstrom
2014-03-30 0:51 ` arisawa
2014-03-30 1:44 ` cinap_lenrek
2014-03-30 6:10 ` erik quanstrom [this message]
2014-03-30 15:40 ` cinap_lenrek
2014-03-30 16:26 ` erik quanstrom
2014-03-30 18:05 ` cinap_lenrek
2014-03-30 18:10 ` erik quanstrom
2014-03-30 6:24 ` erik quanstrom