From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <7359f04905083022574d2fcde9@mail.gmail.com>
Date: Wed, 31 Aug 2005 15:57:27 +1000
From: Rob Pike
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] tcs bug
In-Reply-To:
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <19450.1125421653@piper.nectar.cs.cmu.edu> <600308d60508301033589f9f55@mail.gmail.com>
Topicbox-Message-UUID: 81c499ec-ead0-11e9-9d60-3106f5b1d025

ah yes, the dreaded partial rune problem. lots of programs must cope
with this issue.

-rob

On 8/31/05, arisawa@ar.aichi-u.ac.jp wrote:
> Hello,
>
> tcs both for plan 9 and for unix has a bug in reading utf text.
> that comes from:
> utf_in(int fd, long *notused, struct convert *out){
>         char buf[N];
>         ...
>         while((n = read(fd, buf+tot, N-tot)) >= 0){
>         ...
> }
>
> in utf.c
>
> N is defined as 10000 in hdr.h
>
> if you set N to 10, you will see the problem more clearly:
> tcs cannot correctly handle a utf character that straddles the
> buffer boundary.
>
> for example, assume a.txt has the content:
> aaaaaaaこの
>
> term% xd -c a.txt
> 0000000   a  a  a  a  a  a  a e3 81 93 e3 81 ae \n
> 000000e
>
> tcs can handle this text because with N=10 the buffer ends exactly
> on a utf boundary, but tcs fails if the number of 'a' is 6 or 8 ...
>
> tcs is very important for me.
> Who maintains tcs?
> I might help debugging.
>
> Kenji Arisawa
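
For readers following the thread, here is a minimal sketch in portable C of one common way to cope with the partial rune problem: process only the complete UTF-8 sequences in the buffer and carry the cut-off tail to the front before the next read. This is not the actual tcs code or its eventual fix. The names utf8_seqlen, utf8_complete_prefix and count_runes_chunked are hypothetical, the read(fd, buf+tot, N-tot) call is simulated with memcpy over an in-memory input, and N is set to 10 as in the experiment above. Plan 9's own libc offers fullrune for the same kind of check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { N = 10 };	/* deliberately tiny buffer, as in the experiment above */

/*
 * Length in bytes of the UTF-8 sequence announced by lead byte c.
 * ASCII, stray continuation bytes and invalid lead bytes count as 1,
 * so malformed input is passed through instead of being held back.
 */
size_t utf8_seqlen(unsigned char c)
{
	if (c < 0x80) return 1;
	if ((c & 0xE0) == 0xC0) return 2;
	if ((c & 0xF0) == 0xE0) return 3;
	if ((c & 0xF8) == 0xF0) return 4;
	return 1;
}

/*
 * Return how many leading bytes of buf[0..n) can be processed now.
 * If the buffer ends in the middle of a sequence, the result stops
 * just before that sequence's lead byte; otherwise it is n.
 */
size_t utf8_complete_prefix(const unsigned char *buf, size_t n)
{
	size_t i = n;
	int back = 0;

	/* Step back over at most three trailing continuation bytes. */
	while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
		i--;
		back++;
	}
	if (i == 0)
		return n;	/* no lead byte in view: treat as malformed */

	size_t lead = i - 1;
	if ((buf[lead] & 0xC0) == 0x80)
		return n;	/* four or more continuation bytes: malformed */
	if (n - lead < utf8_seqlen(buf[lead]))
		return lead;	/* last sequence is cut off: hold it back */
	return n;
}

/*
 * Simulated read loop (hypothetical): fill buf+tot from the input,
 * count the code points in the complete prefix, then move the
 * partial tail to the front so the next "read" completes it.
 */
size_t count_runes_chunked(const unsigned char *in, size_t len)
{
	unsigned char buf[N];
	size_t tot = 0, off = 0, runes = 0;

	while (off < len) {
		assert(tot < (size_t)N);	/* carried tail is at most 3 bytes */
		size_t n = len - off;
		if (n > (size_t)N - tot)
			n = (size_t)N - tot;
		memcpy(buf + tot, in + off, n);	/* stands in for read() */
		off += n;
		tot += n;

		size_t ok = utf8_complete_prefix(buf, tot);
		for (size_t i = 0; i < ok; i += utf8_seqlen(buf[i]))
			runes++;
		memmove(buf, buf + ok, tot - ok);
		tot -= ok;
	}
	if (tot > 0)
		runes++;	/* truncated sequence at end of input */
	return runes;
}
```

With the bytes from the xd dump, the first 10-byte chunk is fully processable when there are seven 'a' characters, while with six or eight the helper holds back the one or two bytes of the unfinished sequence until the next chunk arrives. A real converter would emit the decoded characters where this sketch only counts them.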