From mboxrd@z Thu Jan 1 00:00:00 1970
Mime-Version: 1.0 (Apple Message framework v734)
In-Reply-To: <9D17C8E2-2DE4-4E34-A95B-59A6232B132D@ar.aichi-u.ac.jp>
References: <9D17C8E2-2DE4-4E34-A95B-59A6232B132D@ar.aichi-u.ac.jp>
Content-Type: text/plain; charset=ISO-2022-JP; format=flowed
Message-Id: <37F6EFDF-3C9E-48F9-A03F-6ED617BA6166@ar.aichi-u.ac.jp>
Content-Transfer-Encoding: 7bit
From: arisawa@ar.aichi-u.ac.jp
Subject: Re: [9fans] tcs bug
Date: Wed, 31 Aug 2005 18:11:49 +0900
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Topicbox-Message-UUID: 81cedd1c-ead0-11e9-9d60-3106f5b1d025

Below is a first-aid bug fix.

We define a read function for UTF-8 that stops at a character boundary:

/* read until utf boundary */
int
readu(int fd, char *buf, int n)
{
	static char b[3];	/* incomplete sequence kept from the previous call */
	static int nb;		/* number of bytes held in b */
	int m;
	char *s, *e;

	if(nb)
		memcpy(buf, b, nb);
	m = read(fd, buf + nb, n - nb);
	if(m < 0)
		return m;	/* read error: keep the pending bytes in b */
	/*
	01. x in [00000000.0bbbbbbb] → 0bbbbbbb
	10. x in [00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
	11. x in [bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb
	*/
	e = buf + m + nb;
	for(s = buf; s < e; s++){
		if((*s & 0x80) == 0)
			continue;
		if((*s & 0xe0) == 0xc0){	/* 110bbbbb: one more byte follows */
			if(s+1 >= e)
				break;
			s++;
			continue;
		}
		/* then *s is 111bbbbb: two more bytes follow */
		if(s+2 >= e)
			break;
		s += 2;
		continue;
	}
	/* we have e - s bytes in s */
	nb = e - s;
	memcpy(b, s, nb);
	return s - buf;
}

and replace 'read' with 'readu' in utf.c:

utf_in(int fd, long *notused, struct convert *out)
{
	...
	while((n = readu(fd, buf+tot, N-tot)) >= 0){
	...
}

Kenji Arisawa
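
The boundary scan in readu can also be tried on its own, without a file descriptor. The following is only a sketch in portable C, not code from tcs: the name utf8_prefix_len is made up for illustration, and it assumes well-formed input with sequences of at most three bytes, as in the table in the comment in readu. Any byte it does not recognize as a 2-byte or 3-byte lead is simply counted as one byte.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of tcs: return the length of the
 * longest prefix of buf[0..n) that does not end inside a UTF-8
 * sequence. Only sequences of up to 3 bytes are considered.
 */
int
utf8_prefix_len(const unsigned char *buf, int n)
{
	int i, len;

	assert(n >= 0 && (buf != NULL || n == 0));
	for(i = 0; i < n; i += len){
		if((buf[i] & 0xe0) == 0xc0)
			len = 2;	/* 110bbbbb: one continuation byte follows */
		else if((buf[i] & 0xf0) == 0xe0)
			len = 3;	/* 1110bbbb: two continuation bytes follow */
		else
			len = 1;	/* ASCII, or a byte we do not try to interpret */
		if(i + len > n)
			break;		/* sequence is cut off by the end of the buffer */
	}
	return i;
}
```

For the bytes 'a', e3 81 82 with a length of 3, the function returns 1: the 3-byte sequence is incomplete, so only the 'a' may be handed on, and the remaining two bytes would be carried over to the next read.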