From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <7cfbee84ff7ffe431e0974e4281cb0e5@felloff.net>
Date: Sun, 30 Mar 2014 20:05:27 +0200
From: cinap_lenrek@felloff.net
To: 9fans@9fans.net
In-Reply-To: <47ce4a1e83ccab956e28cc2b42fe2ba6@brasstown.quanstro.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] a strange bug in grep
Topicbox-Message-UUID: d2eda964-ead8-11e9-9d60-3106f5b1d025

no. lets try example with p1=0x3ffff and p2=0x7ffff and
assuming your m=0x7ffff mask in tab2.

p1 (0x3ffff) = 0xf0 0xbf 0xbf 0xbf (0b11110000 0b10111111 0b10111111 0b10111111)
p2 (0x7ffff) = 0xf1 0xbf 0xbf 0xbf (0b11110001 0b10111111 0b10111111 0b10111111)

for m == 0x7ffff

(p1 & ~m) == 0
(p2 & ~m) == 0

if((p1 & ~m) != (p2 & ~m)) {
	... bust
}

so this if case is never taken, tho the lead byte in the utf8 encoding
is different for this p1-p2 range and the following encoded bytes are
not comparable.

> 0xxxxxxx
> 110xxxxx 10mmmmmm                      0x3f -> 6 bits (count m's)
> 1110xxxx 10mmmmmm 10mmmmmm             0xfff -> 12 bits (count m's)
> 11110xxx 10mmmmmm 10mmmmmm 10mmmmmm    0x3ffff -> 18 bits (count m's)

repeat example with 18 bit mask m=0x3ffff we get (p2 & ~m) == 0x40000
then we bust and following encoded bytes are compared in each branch
separately.

--
cinap
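
the arithmetic in this example can be checked with a small standalone C sketch. the names enc4, bust and samelead are made up here for illustration and are not taken from grep's source; enc4 is assumed to handle only the 4-byte utf8 form, which is all this example needs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * hypothetical helper (not from grep): encode c as a 4-byte
 * utf8 sequence 11110xxx 10mmmmmm 10mmmmmm 10mmmmmm.
 */
void
enc4(uint32_t c, uint8_t b[4])
{
	assert(c >= 0x10000 && c <= 0x1fffff);
	b[0] = (uint8_t)(0xf0 | (c >> 18));		/* lead byte: top 3 bits */
	b[1] = (uint8_t)(0x80 | ((c >> 12) & 0x3f));
	b[2] = (uint8_t)(0x80 | ((c >> 6) & 0x3f));
	b[3] = (uint8_t)(0x80 | (c & 0x3f));
}

/*
 * the test quoted in the mail: nonzero means the bits above
 * the mask differ, so the p1-p2 range has to be split ("bust").
 */
int
bust(uint32_t p1, uint32_t p2, uint32_t m)
{
	return (p1 & ~m) != (p2 & ~m);
}

/* nonzero if p1 and p2 encode with the same utf8 lead byte */
int
samelead(uint32_t p1, uint32_t p2)
{
	uint8_t a[4], b[4];

	enc4(p1, a);
	enc4(p2, b);
	return a[0] == b[0];
}
```

with m=0x7ffff, bust(0x3ffff, 0x7ffff, m) is 0 even though samelead(0x3ffff, 0x7ffff) is also 0, which is the mismatch described above. with m=0x3ffff, bust() is nonzero, so the split lands where the lead byte changes.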