From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <7359f0490508310217c214f1f@mail.gmail.com>
Date: Wed, 31 Aug 2005 19:17:14 +1000
From: Rob Pike
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] tcs bug
In-Reply-To: <37F6EFDF-3C9E-48F9-A03F-6ED617BA6166@ar.aichi-u.ac.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <9D17C8E2-2DE4-4E34-A95B-59A6232B132D@ar.aichi-u.ac.jp> <37F6EFDF-3C9E-48F9-A03F-6ED617BA6166@ar.aichi-u.ac.jp>
Topicbox-Message-UUID: 81d29998-ead0-11e9-9d60-3106f5b1d025

one problem with this fix is that it assumes valid utf-8 input.
you're better off using fullrune.

-rob

On 8/31/05, arisawa@ar.aichi-u.ac.jp wrote:
> Below is a first-aid bug fix.
>
> We define a read function for UTF-8:
>
> /* read until utf boundary */
> int
> readu(int fd, char *buf, int n)
> {
> 	static char b[3];	/* bytes of an incomplete sequence, kept between calls */
> 	static int nb;		/* number of bytes held in b */
> 	int m;
> 	char *s, *e;
>
> 	if(nb)
> 		memcpy(buf, b, nb);
> 	m = read(fd, buf + nb, n - nb);
>
> 	/*
> 	01. x in [00000000.0bbbbbbb] → 0bbbbbbb
> 	10. x in [00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
> 	11. x in [bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb
> 	*/
>
> 	e = buf + m + nb;
> 	for(s = buf; s < e; s++){
> 		if((*s & 0x80) == 0)
> 			continue;
> 		if((*s & 0xe0) == 0xc0){	/* 110bbbbb: two-byte sequence */
> 			if(s+1 >= e)
> 				break;
> 			s++;
> 			continue;
> 		}
> 		/* then *s is 111bbbbb */
> 		if(s+2 >= e)
> 			break;
> 		s += 2;
> 		continue;
> 	}
> 	/* we have e - s bytes in s */
> 	nb = e - s;
> 	memcpy(b, s, nb);
> 	return s - buf;
> }
>
> and replace 'read' by 'readu' in utf.c:
>
> utf_in(int fd, long *notused, struct convert *out)
> {
>
> 	...
> 	while((n = readu(fd, buf+tot, N-tot)) >= 0){
> 	...
> }
>
> Kenji Arisawa
>
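
The fullrune suggestion above can be sketched roughly as follows. This is only an illustration, not the code that went into tcs. On Plan 9 one would call the library's own fullrune(char*, int) and chartorune; the helpers seqlen and completeprefix below are hypothetical portable stand-ins, assuming UTF-8 sequences of at most three bytes as in the quoted readu. The idea is to stop at a sequence only when it is truncated, and to step over malformed bytes one at a time so that invalid input cannot confuse the boundary search.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a fullrune-style check (not Plan 9's code).
 * Returns 0 if the n bytes at s end in the middle of a sequence and
 * more input is needed; otherwise returns how many bytes a decoder
 * would consume. Malformed input is consumed one byte at a time.
 */
static int
seqlen(const unsigned char *s, int n)
{
	int need, i;

	if(n <= 0)
		return 0;
	if(s[0] < 0x80)
		return 1;		/* 0bbbbbbb: ASCII */
	if((s[0] & 0xe0) == 0xc0)
		need = 2;		/* 110bbbbb */
	else if((s[0] & 0xf0) == 0xe0)
		need = 3;		/* 1110bbbb */
	else
		return 1;		/* stray continuation or unsupported lead byte */
	for(i = 1; i < need; i++){
		if(i >= n)
			return 0;	/* truncated: wait for more input */
		if((s[i] & 0xc0) != 0x80)
			return 1;	/* malformed: consume the lead byte alone */
	}
	return need;
}

/*
 * Number of leading bytes of buf[0..n) that can be handed to the
 * converter now; the remaining n - result bytes (at most 2) are the
 * start of a sequence that the next read must complete.
 */
int
completeprefix(const char *buf, int n)
{
	int i, w;

	for(i = 0; i < n; i += w){
		w = seqlen((const unsigned char*)buf + i, n - i);
		if(w == 0)
			break;
	}
	return i;
}
```

In a readu-style wrapper, the return value would be completeprefix(buf, nb + m), with the leftover bytes copied into the static buffer for the next call.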