From mboxrd@z Thu Jan 1 00:00:00 1970 Mime-Version: 1.0 (Apple Message framework v734) Content-Transfer-Encoding: quoted-printable Message-Id: <9D17C8E2-2DE4-4E34-A95B-59A6232B132D@ar.aichi-u.ac.jp> Content-Type: text/plain; charset=UTF-8; format=flowed To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu> From: arisawa@ar.aichi-u.ac.jp Date: Wed, 31 Aug 2005 15:07:15 +0900 Subject: [9fans] tcs bug Topicbox-Message-UUID: 81c89fec-ead0-11e9-9d60-3106f5b1d025 Sorry I should have sent previous mail using uft-8 code. The following is same as previous one except character code. Hello, tcs both for plan 9 and for unix has a bug in reading utf text. that comes from: utf_in(int fd, long *notused, struct convert *out){ char buf[N]; ... while((n =3D read(fd, buf+tot, N-tot)) >=3D 0){ ... } in utf.c N is assigned to be 10000 in hdr.h if you set N to 10, you will find the problem more clearly: tcs cannot handle correctly utf character boundary. for example, assume a.txt have the content: aaaaaaa=E3=81=93=E3=81=AE term% xd -c a.txt 0000000 a a a a a a a e3 81 93 e3 81 ae \n 000000e tcs can handle this text because N=3D10 is just uft boundary but tcs fails if 'a' are 6 or 8 ... tcs is very important for me. Who maintains tcs ? I might help debugging. Kenji Arisawa
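
To make the boundary problem concrete, here is a minimal sketch of one way a chunked reader can avoid splitting a character. It is not the tcs source and not a proposed patch: complete_prefix and feed_chunks are hypothetical names, memcpy stands in for both read() and the conversion step, and N is set to 10 as in the example above. The idea is to convert only the part of the buffer that ends on a complete UTF-8 sequence and to carry the incomplete tail over to the front of the buffer for the next read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { N = 10 };	/* small buffer, as in the example; hdr.h uses 10000 */

/*
 * Hypothetical helper, not part of tcs: return how many of the first n
 * bytes of buf end on a complete UTF-8 sequence.  An incomplete trailing
 * sequence is excluded; malformed input is passed through unchanged so
 * that the converter can report it.
 */
size_t
complete_prefix(const unsigned char *buf, size_t n)
{
	size_t i, need, have;
	unsigned char c;

	/* step back over at most 3 trailing continuation bytes (10xxxxxx) */
	i = n;
	while(i > 0 && n - i < 3 && (buf[i-1] & 0xC0) == 0x80)
		i--;
	if(i == 0)
		return n;	/* empty, or no lead byte within reach */
	c = buf[i-1];		/* candidate lead byte of the last character */
	if(c < 0x80)
		need = 1;
	else if((c & 0xE0) == 0xC0)
		need = 2;
	else if((c & 0xF0) == 0xE0)
		need = 3;
	else if((c & 0xF8) == 0xF0)
		need = 4;
	else
		return n;	/* not a valid lead byte: pass through */
	have = n - (i - 1);	/* bytes present from the lead byte onward */
	return have < need ? i - 1 : n;
}

/*
 * Hypothetical stand-in for the read loop, using an in-memory source
 * instead of a file descriptor.  Only complete characters are appended
 * to out; a partial character stays in buf for the next round.
 * out must have room for len bytes.  Returns the number of bytes written.
 */
size_t
feed_chunks(const unsigned char *src, size_t len, unsigned char *out)
{
	unsigned char buf[N];
	size_t tot = 0, pos = 0, outlen = 0, n, ok;

	while(pos < len || tot > 0){
		assert(tot < N);	/* at most 3 bytes are ever carried over */
		n = len - pos;
		if(n > N - tot)
			n = N - tot;
		memcpy(buf + tot, src + pos, n);	/* stands in for read(fd, buf+tot, N-tot) */
		pos += n;
		tot += n;
		ok = complete_prefix(buf, tot);
		if(n == 0)
			ok = tot;	/* end of input: flush whatever is left */
		memcpy(out + outlen, buf, ok);		/* stands in for the conversion */
		outlen += ok;
		memmove(buf, buf + ok, tot - ok);	/* carry the partial character over */
		tot -= ok;
	}
	return outlen;
}
```

With the sample text, a 10-byte buffer holding six 'a's followed by e3 81 93 e3 has a complete prefix of 9 bytes, seven 'a's followed by e3 81 93 gives 10, and eight 'a's followed by e3 81 gives 8. These are the three cases mentioned above: only the seven-'a' case ends exactly on a character boundary.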