* [9fans] tcs bug
From: arisawa @ 2005-08-31 6:07 UTC
To: Fans of the OS Plan 9 from Bell Labs
Sorry, I should have sent my previous mail in utf-8.
The following is the same as the previous one except for the character encoding.
Hello,
tcs, both for plan 9 and for unix, has a bug in reading utf text.
it comes from this loop in utf.c:
utf_in(int fd, long *notused, struct convert *out){
	char buf[N];
	...
	while((n = read(fd, buf+tot, N-tot)) >= 0){
		...
	}
N is defined as 10000 in hdr.h.
if you set N to 10, the problem shows up more clearly:
tcs does not handle a utf character that is split across a read boundary.
for example, assume a.txt has the content:
aaaaaaaこの
term% xd -c a.txt
0000000 a a a a a a a e3 81 93 e3 81 ae \n
000000e
tcs can handle this text because the first 10-byte read ends exactly on a utf character boundary,
but tcs fails if there are 6 or 8 'a's ...
tcs is very important for me.
Who maintains tcs?
I could help with debugging.
Kenji Arisawa
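A quick way to see which buffer sizes break on this file is to check whether any read boundary (a multiple of N) lands on a UTF-8 continuation byte, binary 10bbbbbb. The helper below is a hypothetical sketch in portable C, not code from tcs. With the 14 bytes shown by xd it finds no split for N=10; with 6 or 8 'a's, byte 10 (the first byte of the second read) is a continuation byte (81 or 93), so that read starts in the middle of a character.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of tcs: returns 1 if reading
 * p[0..len) in fixed chunks of n bytes would split a UTF-8
 * sequence, i.e. some read after the first would start on a
 * continuation byte (binary 10bbbbbb); returns 0 otherwise.
 */
int
chunk_splits_rune(const unsigned char *p, size_t len, size_t n)
{
	size_t off;

	if(n == 0)
		return 0;
	for(off = n; off < len; off += n)	/* start of each later read */
		if((p[off] & 0xC0) == 0x80)
			return 1;
	return 0;
}
```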
* Re: [9fans] tcs bug
From: arisawa @ 2005-08-31 9:11 UTC
To: Fans of the OS Plan 9 from Bell Labs
Below is a first-aid bug fix.
we define a read function for utf-8 that stops at a character boundary:
/* read until utf boundary */
int
readu(int fd, char *buf, int n)
{
	static char b[3];
	static int nb;
	int m;
	char *s, *e;

	if(nb)
		memcpy(buf, b, nb);
	m = read(fd, buf + nb, n - nb);
	if(m <= 0){	/* error or EOF; an incomplete tail is dropped */
		nb = 0;
		return m;
	}
	/*
	 * 1 byte:  x in [00000000.0bbbbbbb] → 0bbbbbbb
	 * 2 bytes: x in [00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
	 * 3 bytes: x in [bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb
	 */
	e = buf + m + nb;
	for(s = buf; s < e; s++){
		if((*s & 0x80) == 0)		/* 0bbbbbbb */
			continue;
		if((*s & 0xe0) == 0xc0){	/* 110bbbbb: 2-byte sequence */
			if(s+1 >= e)
				break;
			s++;
			continue;
		}
		/* then *s is 111bbbbb (valid input assumed) */
		if(s+2 >= e)
			break;
		s += 2;
		continue;
	}
	/* we have e - s bytes in s */
	nb = e - s;
	memcpy(b, s, nb);
	return s - buf;
}
and replace 'read' with 'readu' in utf.c:
utf_in(int fd, long *notused, struct convert *out)
{
	...
	while((n = readu(fd, buf+tot, N-tot)) >= 0){
		...
	}
Kenji Arisawa
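The table in readu's comment can be checked against the xd dump of a.txt. Below is a hypothetical encoder for the 1-, 2- and 3-byte forms (runes up to U+FFFF, the range the table covers); it is a sketch, not libc's runetochar. Encoding こ (U+3053) and の (U+306E) gives e3 81 93 and e3 81 ae, the bytes xd printed.

```c
/*
 * Hypothetical sketch of the table in readu's comment (not libc's
 * runetochar): encode a rune r <= 0xFFFF as 1, 2 or 3 UTF-8 bytes.
 * Surrogates are not rejected here. Returns the byte count.
 */
int
encode_bmp(unsigned long r, unsigned char *out)
{
	if(r < 0x80){			/* 0bbbbbbb */
		out[0] = (unsigned char)r;
		return 1;
	}
	if(r < 0x800){			/* 110bbbbb 10bbbbbb */
		out[0] = (unsigned char)(0xC0 | r >> 6);
		out[1] = (unsigned char)(0x80 | (r & 0x3F));
		return 2;
	}
	/* 1110bbbb 10bbbbbb 10bbbbbb */
	out[0] = (unsigned char)(0xE0 | r >> 12);
	out[1] = (unsigned char)(0x80 | (r >> 6 & 0x3F));
	out[2] = (unsigned char)(0x80 | (r & 0x3F));
	return 3;
}
```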
* Re: [9fans] tcs bug
From: Rob Pike @ 2005-08-31 9:17 UTC
To: Fans of the OS Plan 9 from Bell Labs
one problem with this fix is that it assumes valid utf-8 input.
you're better off using fullrune.
-rob
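Rob's suggestion can be sketched as follows, in portable POSIX C rather than Plan 9 C. fullrune_sketch is a hypothetical stand-in for libc's fullrune(2), written for sequences of up to 4 bytes (an assumption; Plan 9's UTFmax was 3 at the time). Because a lead byte only ever asks for at most 3 more bytes, the carried-over tail stays below 4 bytes even on malformed input, and stray bytes are passed through for the decoder to reject instead of being trusted. The carry lives in a caller-owned struct rather than in statics; all names here are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

enum { Umax = 4 };	/* longest UTF-8 sequence assumed here */

/* bytes in the sequence that starts with c; stray bytes count as 1 */
static size_t
utf8width(unsigned char c)
{
	if(c < 0xC0)		/* ASCII, or a stray continuation byte */
		return 1;
	if(c < 0xE0)
		return 2;
	if(c < 0xF0)
		return 3;
	return 4;
}

/* stand-in for fullrune(2): do the n bytes at s hold a whole sequence? */
static int
fullrune_sketch(const unsigned char *s, size_t n)
{
	return n > 0 && n >= utf8width(s[0]);
}

/* length of the longest prefix of buf[0..n) ending on a rune boundary */
size_t
runeprefix(const unsigned char *buf, size_t n)
{
	size_t i = 0;

	while(i < n && fullrune_sketch(buf + i, n - i))
		i += utf8width(buf[i]);
	return i;
}

typedef struct {
	unsigned char carry[Umax - 1];	/* incomplete tail of the last read */
	size_t ncarry;
} Ureader;

/*
 * Like read(2), but the bytes returned in buf always end on a rune
 * boundary. Needs n >= Umax. Returns 0 at EOF, -1 on error.
 */
ssize_t
readu_sketch(int fd, Ureader *u, unsigned char *buf, size_t n)
{
	size_t have, keep;
	ssize_t m;

	if(n < Umax)
		return -1;
	memcpy(buf, u->carry, u->ncarry);
	have = u->ncarry;
	for(;;){
		m = read(fd, buf + have, n - have);
		if(m < 0)
			return -1;
		if(m == 0){	/* EOF: hand back any leftover bytes as they are */
			u->ncarry = 0;
			return (ssize_t)have;
		}
		have += (size_t)m;
		keep = runeprefix(buf, have);
		if(keep > 0)
			break;	/* otherwise only a partial rune so far: read more */
	}
	u->ncarry = have - keep;
	memcpy(u->carry, buf + keep, u->ncarry);
	return (ssize_t)keep;
}
```

Called in a loop until it returns 0, readu_sketch hands the decoder only whole sequences, so something like utf_in could call it in place of read.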
* Re: [9fans] tcs bug
From: arisawa @ 2005-08-31 10:48 UTC
To: Fans of the OS Plan 9 from Bell Labs
> one problem with this fix is that it assumes valid utf-8 input.
> you're better off using fullrune.
>
a simpler and more robust solution,
following forsyth's suggestion:
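/* read until utf boundary */
int
readu(int fd, char *buf, int n)
{
	static char b[3];
	static int nb;
	int m;
	char *s, *e;

	if(nb)
		memcpy(buf, b, nb);
	m = read(fd, buf + nb, n - nb);
	if(m <= 0){	/* error or EOF; an incomplete tail is dropped */
		nb = 0;
		return m;
	}
	/*
	 * 1 byte:  x in [00000000.0bbbbbbb] → 0bbbbbbb
	 * 2 bytes: x in [00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
	 * 3 bytes: x in [bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb
	 */
	e = buf + m + nb;
	/* only the last two bytes can start an incomplete sequence */
	for(s = (e - buf >= 2) ? e - 2 : buf; s < e; s++){
		if((*s & 0xc0) != 0xc0)	/* ASCII or 10bbbbbb continuation */
			continue;
		/* lead byte: hold it back only if its sequence is cut off */
		if(e - s < ((*s & 0xe0) == 0xc0 ? 2 : 3))
			break;
	}
	/* we have e - s bytes in s */
	nb = e - s;
	memcpy(b, s, nb);
	return s - buf;
}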
Kenji Arisawa
* [9fans] tcs bug
From: quanstro @ 2005-09-01 0:36 UTC
To: 9fans
well, somebody's got to do it. ;-)
i guess i didn't think of using bio, having never had access to it before
p9p.
thanks, russ.
erik
* [9fans] tcs bug.
From: quanstro @ 2005-08-31 10:51 UTC
To: 9fans
i just had a similar problem a day or two ago.
i needed to change some capitalization and the
tr 'A-Z' 'a-z' idiom doesn't work on random utf.
i solved it a bit differently -- lifting the fullrune()
check into the main loop. so i don't have a readu()
function. also (unlike tcs) at the cost of 1 extra check
at the end-of-input, the output buffer is dumped only
when full. on japanese, greek or other text with
>1 byte/char, this will save calls to OUT() --
or in my case print().
okay, total overkill. i know. but it was more interesting
to do it that way.
here's upper.c. convert to upper/lower/title case:
#include <u.h>
#include <libc.h>

enum { BLOCK = 1024*4 };

typedef Rune (*Rconv)(Rune);

void output(Rune* r, int nrunes, Rconv R){
	int i;
	for(i=0; i<nrunes; i++){
		r[i] = R(r[i]);
	}
	print("%.*S", nrunes, r);
}

const char* casify(int fd, Rconv R){
	char in[BLOCK + UTFmax];
	Rune r[BLOCK + UTFmax];
	long rem_len;
	long blen;
	long j;
	long i;

	rem_len=0;
	j = 0;
again:	while (0 < (blen = read(fd, in + rem_len, BLOCK))){
		blen += rem_len;
		for(i=0; i<blen; ){
			if (!fullrune(in + i, blen - i)){
				// partial rune at the end: move it to the front, read more
				rem_len = blen - i;
				memmove(in, in + i, rem_len);	// regions may overlap
				goto again;
			}
			i += chartorune(r + j++, in + i);
			if (j > BLOCK){
				output(r, j, R);
				j=0;
			}
		}
		rem_len = 0;	// the whole buffer was consumed
	}
	if (rem_len){
		// non unicode garbage.
		fprint(2, "non-utf8 garbage %.*s at eof\n", rem_len, in);
	}
	if (j){
		output(r, j, R);
	}
	if (blen < 0){
		return "read";	// read error; blen == 0 is a normal eof
	}
	return 0;
}

void main(int argc, /* pfft const */ char** argv){
	Rconv R;
	const char* v;
	const char* status;
	const char* s;
	int fd;

	v = strrchr(argv[0], '/');
	if (v){
		v++;
	} else {
		v = argv[0];
	}
	if (0 == strcmp(v, "tolower")){
		R = tolowerrune;
	} else if (0 == strcmp(v, "totitle")){
		R = totitlerune;
	} else {
		R = toupperrune;
	}
	ARGBEGIN{
	case 'u':
		R = toupperrune;
		break;
	case 'l':
		R = tolowerrune;
		break;
	case 't':
		R = totitlerune;
		break;
	default:
		fprint(2, "%s: bad option %c\n", argv0, ARGC());
		fprint(2, "usage: %s -[ult]\n", argv0);
		exits("usage");
	} ARGEND
	status = 0;
	if (!*argv){
		status = casify(0, R);
	} else {
		for(; *argv; argv++){
			fd = open(*argv, OREAD);
			if (-1 == fd){
				if (!status){
					status = "open";
				}
				continue;
			}
			s = casify(fd, R);
			if (s && !status){
				status = s;
			}
			close(fd);
		}
	}
	exits(status ? status : "");
}
* Re: [9fans] tcs bug.
From: Russ Cox @ 2005-08-31 21:36 UTC
To: Fans of the OS Plan 9 from Bell Labs
You've invented buffered I/O.
#include <u.h>
#include <libc.h>
#include <bio.h>

void
usage(void)
{
	fprint(2, "usage: runecvt [-l | -t | -u] [file...]\n");
	exits("usage");
}

void
convert(Biobuf *bin, Biobuf *bout, Rune (*fn)(Rune))
{
	int c;

	while((c = Bgetrune(bin)) != -1)
		Bputrune(bout, fn(c));
}

void
main(int argc, char **argv)
{
	int i;
	Biobuf *b, bin, bout;
	Rune (*fn)(Rune);

	fn = toupperrune;
	ARGBEGIN{
	case 'l':
		fn = tolowerrune;
		break;
	case 't':
		fn = totitlerune;
		break;
	case 'u':
		fn = toupperrune;
		break;
	default:
		usage();
	}ARGEND

	Binit(&bout, 1, OWRITE);
	if(argc == 0){
		Binit(&bin, 0, OREAD);
		convert(&bin, &bout, fn);
	}else{
		for(i=0; i<argc; i++){
			if((b = Bopen(argv[i], OREAD)) == nil)
				sysfatal("open %s: %r", argv[i]);
			convert(b, &bout, fn);
		}
	}
	Bterm(&bout);
	exits(nil);
}
* [9fans] some Plan9 related ideas
From: Bhanu Nagendra Pisupati @ 2005-08-29 23:23 UTC
To: 9fans
Hello,
I am a PhD student at Indiana University working on applying
some Plan 9 ideas to the embedded domain. Specifically, I am looking at
embedded debugging, configuration and device management using the
distributed virtual filesystem model. I have looked at a few
different types of embedded systems, including multi-board systems
containing SoCs (systems on chips) and sensor networks. I have come up
with a few ideas (listed below) to address as part of my thesis
work, on which I hope to get feedback from the wider Plan 9
community. Any thoughts and opinions are greatly appreciated.
Thanks in advance,
-Bhanu
==========================================================================
* Downloadable namespaces
The namespace supported by a filesystem can be modified (possibly using a
configuration file provided by the filesystem) to correspond to changes in
the system that the filesystem is encapsulating. For instance, if the
filesystem were to abstract a set of sensor networks, then at startup
time the user would describe the layout of the network to the
filesystem, possibly by writing it as an XML description to the
configuration file, and so modify (initialize) the namespace. If the layout
of the network changes later, the namespace could be modified in the same
way, by using the configuration file to reflect the change.
The basic idea is to be able to describe and modify the layout of a
filesystem's namespace dynamically.
* 'Tailcall optimizations' for filesystems with other mounted filesystems
Consider a filesystem with the layout shown below:
        FS1
       / | \
     f1  f2  FS2
             / \
            f3  f4
Filesystem FS1 contains two local files (f1 and f2) and also has
filesystem FS2 mounted in it. To a client that mounts FS1, the fact
that FS2 is mounted inside it is transparent. When a client tries
to access a file in the mounted filesystem (f3 in FS2), FS1 passes the
request on to FS2, which processes the request and hands the result
back to FS1, which then returns the result to the client. However,
this operation could be made more efficient if FS2 could return the
result directly to the client rather than sending it to an upstream
filesystem. This is analogous to the tail call optimization in
compilers, where a function call made in the tail position of a
subroutine returns directly to the original caller rather than
returning to the subroutine and then having it return to the caller.
The situation obviously gets progressively worse as chains of mounted
filesystems get longer.
This idea is based on the VMTP protocol.
* Macro messages
Lightweight clients (such as microcontrollers) that communicate with a
fileserver using 9P protocol over flaky radio connections would benefit
from being able to compose several messages (eg: OPEN+READ+CLUNK)
together as a single macro packet. This is because sending one
larger packet takes much less power than sending multiple smaller
packets. Also, when multiple devices send data over radio, getting
access to a free time slot to communicate is hard, so it makes sense to
limit the number of occasions when messages have to be sent. Also, if
in most cases the number of operations performed during the time a file
is open is small, it limits the number of open files and the corresponding
state that needs to be stored for fids.
* Stateless variants of 9P
This is more hypothetical, but the basic idea is to design and use a
variant of 9P that is stateless (or uses little state) and hence
better suited for use on devices with little RAM.
==========================================================================
* [9fans] Re: some Plan9 related ideas
From: Dave Eckhardt @ 2005-08-30 17:07 UTC
To: Fans of the OS Plan 9 from Bell Labs
> 'Tailcall optimizations' for filesystems with other mounted
> filesystems
In Plan 9 all mounts are done at the client side, so this
wouldn't be an optimization--it's the only case.
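Dave's point can be pictured with a toy model: since the mount table belongs to the client, the client itself picks the server for each path, so a request for f3 goes straight to FS2 and FS1 never relays it. The sketch below is hypothetical and much simplified (a flat table with longest-prefix matching on whole path elements, with FS2 assumed mounted at /fs2); the real kernel resolves mounts element by element during a walk.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical client-side mount table entry: paths under
   mountpoint are served by server. */
typedef struct {
	const char *mountpoint;
	const char *server;
} Mount;

/*
 * Pick the server for path by longest-prefix match on whole path
 * elements ("/fs2" matches "/fs2/f3" but not "/fs2x").
 * Returns NULL if no entry matches.
 */
const char *
resolve(const Mount *tab, size_t ntab, const char *path)
{
	const char *best = NULL;
	size_t bestlen = 0, i, len;

	for(i = 0; i < ntab; i++){
		len = strlen(tab[i].mountpoint);
		if(strncmp(path, tab[i].mountpoint, len) != 0)
			continue;
		/* "/" matches everything; other mount points need '/' or end next */
		if(len > 1 && path[len] != '\0' && path[len] != '/')
			continue;
		if(best == NULL || len > bestlen){
			best = tab[i].server;
			bestlen = len;
		}
	}
	return best;
}
```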
> * Macro messages
> Lightweight clients (such as microcontrollers) that communicate
> with a fileserver using 9P protocol over flaky radio connections
> would benefit from being able to compose several messages (eg:
> OPEN+READ+CLUNK) together as a single macro packet.
9P runs over TCP, so I don't think there's a packet-boundary
problem here. Getting a client to send the three messages
at once would seem to be the problem, since at present the
open() system call won't complete until the 9P message gets
its reply... there isn't an open()+read()+close() system
call. And you'd probably need to reinvent or clone existing
work on batch RPCs to do things like fill the result of the
OPEN into the subsequent READ request.
> Also if in most cases, the number of operations performed
> during the time a file is open is small, it limits the number
> of open files and the corresponding state that needs to be
> stored for fids.
Collecting or finding somebody else's data on that "if" might
be a good first step.
Dave Eckhardt
* Re: [9fans] Re: some Plan9 related ideas
From: Francisco Ballesteros @ 2005-08-30 17:33 UTC
To: Fans of the OS Plan 9 from Bell Labs
We added two library calls, readf and writef, that resolve the name and
perform the file I/O in a single call. We found ourselves calling them a lot,
because in many cases it's very convenient. They would be
an opportunity to "batch" walk/open/read(s)/clunk, which happen
a lot. However, this would require changing the kernel (even more than we
did for Plan B).
I think the man page is at http://planb.lsub.org/magic/man2html/2/readf
We would be very interested in implementing this change for Plan 9,
because we found
that the main problem we have in Plan B
is the latency of accessing the various file servers,
caused by the individual RPC round-trip times adding up.
Any thought on this?
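For readers without Plan B at hand, here is a rough POSIX sketch of what a readf-style call bundles together. readwhole is a hypothetical name and signature, not Plan B's readf (the man page above documents the real one). The comments mark the 9P requests each step would cost on Plan 9; it is those separate round trips that a batched walk/open/read/clunk would try to collapse.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical readf-style helper, not Plan B's actual readf:
 * read up to len bytes of the named file into buf in a single call.
 * Returns the number of bytes read, or -1 on error.
 */
long
readwhole(const char *path, char *buf, long len)
{
	int fd;
	long tot = 0;
	ssize_t n = 0;

	fd = open(path, O_RDONLY);	/* on Plan 9: Twalk + Topen */
	if(fd < 0)
		return -1;
	while(tot < len && (n = read(fd, buf + tot, (size_t)(len - tot))) > 0)
		tot += n;		/* on Plan 9: one Tread per call */
	close(fd);			/* on Plan 9: Tclunk */
	return n < 0 ? -1 : tot;
}
```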
* Re: [9fans] Re: some Plan9 related ideas
From: Russ Cox @ 2005-08-30 17:46 UTC
To: Fans of the OS Plan 9 from Bell Labs
I don't know whether the change is worth doing, but
here is a simple way to do it. Define that a client may
send more than one message with the same tag, and
in that case servers must process those messages
sequentially. This is not very hard to implement on the
server side, and the single-threaded servers needn't
change at all.
Now to implement the so-called batch RPC, you just
send three messages in a row with the same tag:
	tag Topen fid name mode
	tag Twrite fid offset data
	tag Tclunk fid
and then wait for three responses to come back.
Since the client has complete control over the choice of
tags and fids, there is no information in the R messages
needed to generate the T messages. The various results
are completely distinguishable: on success you get
back Ropen, Rwrite, Rclunk. If the Topen fails, then you'll
get back Rerror, Rerror (unknown fid), Rclunk. If the Twrite
fails you'll get Ropen, Rerror, Rclunk.
I have no idea whether this is worth doing. My gut reaction
is no, but maybe someone will prove me wrong. My point
is only that the protocol need hardly change.
Russ
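Russ's rule is simple enough to model. The toy server below (hypothetical names and types, not lib9p; the enum values are not the 9P wire codes) handles messages that share a tag strictly in order and records one R-message type per T-message. As in 9P2000, the fid is assumed to exist already from an earlier Twalk, so a failed Topen leaves it unopened but still clunkable. Running the three cases gives the response sequences Russ lists.

```c
#include <stddef.h>

enum { Topen, Twrite, Tclunk };			/* T-message types modeled */
enum { Ropen, Rwrite, Rclunk, Rerror };		/* R-message types */
enum { Fnone, Fwalked, Fopen };			/* state of one fid */
enum { Nfid = 8 };

typedef struct {
	int type;	/* Topen, Twrite or Tclunk */
	int fid;
} Msg;

typedef struct {
	int fidstate[Nfid];
	int openfails;	/* make every Topen fail (e.g. permission denied) */
	int writefails;	/* make every Twrite fail (e.g. disk full) */
} Server;

/* Handle n messages carrying the same tag, strictly one after another. */
void
batch(Server *s, const Msg *m, size_t n, int *out)
{
	size_t i;
	int *st;

	for(i = 0; i < n; i++){
		if(m[i].fid < 0 || m[i].fid >= Nfid){
			out[i] = Rerror;	/* unknown fid */
			continue;
		}
		st = &s->fidstate[m[i].fid];
		switch(m[i].type){
		case Topen:
			if(*st != Fwalked || s->openfails){
				out[i] = Rerror;
				break;
			}
			*st = Fopen;
			out[i] = Ropen;
			break;
		case Twrite:
			/* fails if the earlier Topen in the batch did */
			out[i] = (*st == Fopen && !s->writefails) ? Rwrite : Rerror;
			break;
		case Tclunk:
			/* the fid exists whether or not it was opened */
			if(*st == Fnone){
				out[i] = Rerror;
				break;
			}
			*st = Fnone;
			out[i] = Rclunk;
			break;
		default:
			out[i] = Rerror;
		}
	}
}
```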
* [9fans] tcs bug
From: arisawa @ 2005-08-31 5:54 UTC
To: Fans of the OS Plan 9 from Bell Labs
Hello,
tcs, both for plan 9 and for unix, has a bug in reading utf text.
it comes from this loop in utf.c:
utf_in(int fd, long *notused, struct convert *out){
	char buf[N];
	...
	while((n = read(fd, buf+tot, N-tot)) >= 0){
		...
	}
N is defined as 10000 in hdr.h.
if you set N to 10, the problem shows up more clearly:
tcs does not handle a utf character that is split across a read boundary.
for example, assume a.txt has the content:
aaaaaaaこの
term% xd -c a.txt
0000000 a a a a a a a e3 81 93 e3 81 ae \n
000000e
tcs can handle this text because the first 10-byte read ends exactly on a utf character boundary,
but tcs fails if there are 6 or 8 'a's ...
tcs is very important for me.
Who maintains tcs?
I could help with debugging.
Kenji Arisawa
* Re: [9fans] tcs bug
From: Rob Pike @ 2005-08-31 5:57 UTC
To: Fans of the OS Plan 9 from Bell Labs
ah yes, the dreaded partial rune problem. lots of programs
must cope with this issue.
-rob