[9fans] some Plan9 related ideas

9fans - fans of the OS Plan 9 from Bell Labs

* [9fans] some Plan9 related ideas
@ 2005-08-29 23:23 Bhanu Nagendra Pisupati
  2005-08-30 12:27 ` Sape Mullender
  2005-08-30 17:07 ` [9fans] " Dave Eckhardt
  0 siblings, 2 replies; 23+ messages in thread
From: Bhanu Nagendra Pisupati @ 2005-08-29 23:23 UTC (permalink / raw)
  To: 9fans

Hello,
I am a PhD student at Indiana University working on the application of
some Plan 9 related ideas to the embedded domain. Specifically, I am looking at
embedded debugging, configuration and device management using the
distributed virtual filesystem model. I have looked at a few
different types of embedded systems, including multi-board systems
containing SoCs (systems on chips) and sensor networks. I have come up
with a few ideas (listed below) to address as part of my thesis
work, on which I hope to get feedback from the wider Plan 9
community. Any thoughts and opinions are greatly appreciated.
Thanks in advance,
-Bhanu

==========================================================================

* Downloadable namespaces
The namespace supported by a filesystem can be modified (possibly using a
configuration file provided by the filesystem) to correspond to changes in
the system that the filesystem is encapsulating. For instance, if the
filesystem were to abstract a set of sensor networks, then at start-up
time the user would describe the layout of the network to the
filesystem, possibly by writing it as an XML description to the
configuration file, and so modify (initialize) the namespace. If the
layout of the network changes in due course, the namespace could be
similarly modified through the configuration file to reflect the change.

The basic idea is to be able to describe and modify the layout of a
filesystem's namespace dynamically.

* 'Tailcall optimizations' for filesystems with other mounted filesystems

Consider a filesystem with the layout shown below:

            FS1
          /  |  \
         f1  f2  FS2
                 / \
                f3  f4

Filesystem FS1 contains two files (f1 and f2) locally, in addition to
having a mounted filesystem FS2. To a client that mounts FS1, the fact
that it has a mounted filesystem (FS2) is transparent. When a client tries
to access a file in the mounted filesystem (f3 in FS2), FS1 passes the
request on to FS2, which processes the request and hands the result back
to FS1, which then returns the result to the client. However, this
operation could be made more efficient if FS2 could return the result
directly to the client rather than sending it to an upstream filesystem.
This is analogous to the tail call optimization in compilers, where a
function call made in the tail position of a subroutine returns directly
to the original caller rather than returning to the subroutine and then
having it return to the caller. The situation obviously gets progressively
worse as the chain of mounted filesystems gets longer.

This idea is based on the VMTP protocol.

* Macro messages
Lightweight clients (such as microcontrollers) that communicate with a
fileserver using the 9P protocol over flaky radio connections would benefit
from being able to compose several messages (eg: OPEN+READ+CLUNK)
together as a single macro packet. This is because sending one
larger packet takes much less power than sending multiple smaller
packets. Also, when multiple devices send data over radio, getting
access to a free time slot to communicate is hard, so it makes sense to
limit the number of occasions when messages have to be sent. Finally, if
in most cases the number of operations performed while a file
is open is small, batching limits the number of open files and,
correspondingly, the state that needs to be stored for fids.

* Stateless variants of 9P
This is more hypothetical, but the basic idea is to design and use a
variant of 9P that is stateless (or uses little state) and hence
better suited for use on devices with little RAM.

==========================================================================

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [9fans] some Plan9 related ideas
  2005-08-29 23:23 [9fans] some Plan9 related ideas Bhanu Nagendra Pisupati
@ 2005-08-30 12:27 ` Sape Mullender
  2005-08-30 15:21   ` Francisco Ballesteros
  2005-08-30 17:07 ` [9fans] " Dave Eckhardt
  1 sibling, 1 reply; 23+ messages in thread
From: Sape Mullender @ 2005-08-30 12:27 UTC (permalink / raw)
  To: 9fans

> I am a PhD student at Indiana University working on the application of
> some Plan 9 related ideas to the embedded domain. Specifically, I am looking at
> embedded debugging, configuration and device management using the
> distributed virtual filesystem model. I have looked at a few
> different types of embedded systems, including multi-board systems
> containing SoCs (systems on chips) and sensor networks. I have come up
> with a few ideas (listed below) to address as part of my thesis
> work, on which I hope to get feedback from the wider Plan 9
> community. Any thoughts and opinions are greatly appreciated.

That's great.  At the labs we're also using Plan 9 as an embedded operating
system, and we're demonstrating to the old-fashioned folks who believe
in dedicated embedded systems from manufacturers that shall remain nameless
that we can get things going reliably in a fraction of the time.

> * Downloadable namespaces
> The namespace supported by a filesystem can be modified (possibly using a
> configuration file provided by the filesystem) to correspond to changes in
> the system that the filesystem is encapsulating.

Right now, Plan 9 interprets the file /lib/namespace when it starts up
(and when it starts up applications that respond to external requests).
That describes a tailored — but not dynamic — name space already.

> For instance, if the
> filesystem were to abstract a set of sensor networks, then at start-up
> time the user would describe the layout of the network to the
> filesystem, possibly by writing it as an XML description to the
> configuration file, and so modify (initialize) the namespace. If the
> layout of the network changes in due course, the namespace could be
> similarly modified through the configuration file to reflect the change.

XML?  You must like overkill.  First question is, does each node have its
own local file system or does it get its files from a file server?  If the latter,
I can imagine building a file system that serves different nodes with different,
tailored and dynamic, content.  If the former, I can imagine a distributed
implementation of such a service.  If your name space is highly dynamic,
mounting and unmounting may not be the most efficient or flexible path.
A file server that adapts the files it serves to the sensor node's needs may
be a more powerful way, as well as one that is easier to realize.

	Sape

* Re: [9fans] some Plan9 related ideas
  2005-08-30 12:27 ` Sape Mullender
@ 2005-08-30 15:21   ` Francisco Ballesteros
  2005-08-30 15:25     ` Francisco Ballesteros
  0 siblings, 1 reply; 23+ messages in thread
From: Francisco Ballesteros @ 2005-08-30 15:21 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> > * Downloadable namespaces
> > The namespace supported by a filesystem can be modified (possibly using a
> > configuration file provided by the filesystem) to correspond to changes in
> > the system that the filesystem is encapsulating.
> 
> Right now, Plan 9 interprets the file /lib/namespace when it starts up
> (and when it starts up applications that respond to external requests).
> That describes a tailored — but not dynamic — name space already.

Plan B changes this to provide dynamic name spaces, where you request
resources but do not directly mount them. See lsub.org for information
about Plan B. In a few days, I'll be implementing this service as a
user-level process in Plan 9 (following a suggestion from rsc). In fact,
this is part of our effort to integrate our system services back into Plan 9.

For example, we are using RFID and X10 servers (which are small enough that
you might embed them). Our client machines run:
     mount -U  /devs/x10!L136 /n/x10
to request mounting a *U*nion of file trees at /n/x10 for any server
discovered, e.g., with name /devs/x10 and location "136".

Is this what you wanted? If so, our modified Plan 9 (i.e. Plan B)
already provides that.

As Sape said, XML would be overkill (though it might be easier
to publish ;-)). We use the "/devs/x10!L136!Unemo..." strings to
specify things.
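Plan B packs a resource name and its attributes into one '!'-separated string. Going only by the example above, where `L136` means location 136, a minimal sketch of splitting such a string might look like the following. It is portable C, and `split_spec` and `spec_attr` are hypothetical helpers, not Plan B's parser; the meaning of other attribute letters (such as the `U` in `Unemo`) is not described here and is not assumed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split a Plan B-style spec such as "/devs/x10!L136"
 * in place at each '!' and store pointers to up to nf fields in f.
 * Returns the number of fields stored. Empty fields are kept.
 */
int split_spec(char *s, char **f, int nf)
{
	int n = 0;
	char *p;

	while (n < nf) {
		f[n++] = s;
		p = strchr(s, '!');
		if (p == NULL)
			break;
		*p = '\0';	/* terminate this field */
		s = p + 1;	/* next field starts after the '!' */
	}
	return n;
}

/*
 * Return the value of the first attribute field whose first character is
 * key (e.g. 'L' for location), skipping f[0], the resource name.
 * Returns NULL if no field has that key.
 */
const char *spec_attr(char **f, int n, char key)
{
	int i;

	for (i = 1; i < n; i++)
		if (f[i][0] == key)
			return f[i] + 1;
	return NULL;
}
```

For "/devs/x10!L136" this yields the name "/devs/x10", and spec_attr(f, n, 'L') returns "136".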

* Re: [9fans] some Plan9 related ideas
  2005-08-30 15:21   ` Francisco Ballesteros
@ 2005-08-30 15:25     ` Francisco Ballesteros
  0 siblings, 0 replies; 23+ messages in thread
From: Francisco Ballesteros @ 2005-08-30 15:25 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Forgot to say: we build your "layout" by mounting whatever we want.
We can mount all the devices at a single directory, or group them by
location:

for (l in $locations)
    mount -U /devs/x10!L$l /n/x10$l

You can always automate this, so there is no need to force or write
a particular spec.

* [9fans] Re: some Plan9 related ideas
  2005-08-29 23:23 [9fans] some Plan9 related ideas Bhanu Nagendra Pisupati
  2005-08-30 12:27 ` Sape Mullender
@ 2005-08-30 17:07 ` Dave Eckhardt
  2005-08-30 17:33   ` Francisco Ballesteros
  1 sibling, 1 reply; 23+ messages in thread
From: Dave Eckhardt @ 2005-08-30 17:07 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> 'Tailcall optimizations' for filesystems with other mounted
> filesystems

In Plan 9 all mounts are done at the client side, so this
wouldn't be an optimization--it's the only case.

> * Macro messages
> Lightweight clients (such as microcontrollers) that communicate
> with a fileserver using 9P protocol over flaky radio connections
> would benefit from being able to compose several messages (eg:
> OPEN+READ+CLUNK) together as a single macro packet.

9P runs over TCP, so I don't think there's a packet-boundary
problem here.  Getting a client to send the three messages
at once would seem to be the problem, since at present the
open() system call won't complete until the 9P message gets
its reply... there isn't an open()+read()+close() system
call.  And you'd probably need to reinvent or clone existing
work on batch RPCs to do things like fill the result of the
OPEN into the subsequent READ request.

> Also if in most cases, the number of operations performed
> during the time a file is open is small, it limits the number
> of open files and correspondingly the state that needs to be
> stored for fids.

Collecting or finding somebody else's data on that "if" might
be a good first step.

Dave Eckhardt

* Re: [9fans] Re: some Plan9 related ideas
  2005-08-30 17:07 ` [9fans] " Dave Eckhardt
@ 2005-08-30 17:33   ` Francisco Ballesteros
  2005-08-30 17:46     ` Russ Cox
  0 siblings, 1 reply; 23+ messages in thread
From: Francisco Ballesteros @ 2005-08-30 17:33 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

We added two (library) calls readf and writef that perform file I/O
besides resolving the name. We found ourselves calling them a lot, because
in many cases it's very convenient. They would be
an opportunity to "batch" walk/open/read(s)/clunk, which happen
a lot. However, this would require changing the kernel (even more than we
did for Plan B).

I think the man page is at http://planb.lsub.org/magic/man2html/2/readf

We would be very interested in implementing this change for Plan 9,
because we found that the main problem we have in Plan B
is the latency of accessing the various file servers,
caused by the accumulated round-trip times of the individual RPCs.

Any thought on this? 
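The real readf and writef are documented in the Plan B man page linked above. To make the walk/open/read(s)/clunk sequence concrete, here is a hypothetical POSIX analogue, `readf_sketch`; its name and signature are assumptions, not Plan B's API. On Plan 9 the same open/read/close calls roughly become a Twalk and Topen, one Tread per read, and a Tclunk, which is why the round trips add up on a high-latency link.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/*
 * Hypothetical readf-style wrapper: resolve and open name, read up to
 * n bytes into buf, and close it. Returns the number of bytes read,
 * or -1 if the open or any read fails.
 */
long readf_sketch(const char *name, char *buf, long n)
{
	int fd;
	long tot = 0;
	ssize_t m;

	fd = open(name, O_RDONLY);	/* Plan 9 equivalent: Twalk + Topen */
	if (fd < 0)
		return -1;
	while (tot < n) {
		/* Plan 9 equivalent: one Tread per call */
		m = read(fd, buf + tot, (size_t)(n - tot));
		if (m < 0) {
			close(fd);
			return -1;
		}
		if (m == 0)
			break;	/* end of file */
		tot += m;
	}
	close(fd);	/* Plan 9 equivalent: Tclunk */
	return tot;
}
```

Each step waits for the previous reply, so on a link with round-trip time t a small file costs at least three round trips (walk/open, read, clunk) plus one more for the read that sees end of file.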



* Re: [9fans] Re: some Plan9 related ideas
  2005-08-30 17:33   ` Francisco Ballesteros
@ 2005-08-30 17:46     ` Russ Cox
  2005-08-31  5:54       ` [9fans] tcs bug arisawa
  2005-10-17  7:14       ` [9fans] Re: some Plan9 related ideas Uriel
  0 siblings, 2 replies; 23+ messages in thread
From: Russ Cox @ 2005-08-30 17:46 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I don't know whether the change is worth doing, but
here is a simple way to do it.  Define that a client may
send more than one message with the same tag, and
in that case servers must process those messages
sequentially.  This is not very hard to implement on the
server side, and the single-threaded servers needn't
change at all.

Now to implement the so-called batch RPC, you just
send three messages in a row with the same tag:

  tag Topen fid name mode
  tag Twrite fid offset data
  tag Tclunk fid

and then wait for three responses to come back.
Since the client has complete control over the choice of
tags and fids, there is no information in the R messages
needed to generate the T messages.  The various results
are completely distinguishable: on success you get
back Ropen, Rwrite, Rclunk. If the Topen fails, then you'll
get back Rerror, Rerror (unknown fid), Rclunk.  If the Twrite
fails you'll get Ropen, Rerror, Rclunk.

I have no idea whether this is worth doing.  My gut reaction
is no, but maybe someone will prove me wrong.  My point
is only that the protocol need hardly change.

Russ
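To make the proposal concrete, here is a sketch in portable C that packs a Topen, Twrite and Tclunk sharing one tag into a single buffer, so the client can send them with one write. It follows the 9P2000 wire format from intro(5): little-endian integers, each message starting with size[4] type[1] tag[2]. In 9P2000, Topen carries only fid and mode (the name is resolved beforehand with Twalk), so the `name` in the sketch above is left out. The same-tag ordering rule is the proposal, not part of 9P2000; `pack_batch` and its helpers are hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum { Topen = 112, Twrite = 118, Tclunk = 120 };	/* 9P2000 type codes */

/* Little-endian encoders, as used on the 9P wire. */
static unsigned char *put2(unsigned char *p, uint16_t v)
{
	p[0] = v & 0xff;
	p[1] = v >> 8;
	return p + 2;
}

static unsigned char *putn(unsigned char *p, uint64_t v, int n)
{
	int i;
	for (i = 0; i < n; i++)
		p[i] = (v >> (8 * i)) & 0xff;
	return p + n;
}

/* Skip size[4] (filled in later), then write type[1] and tag[2]. */
static unsigned char *hdr(unsigned char *p, int type, uint16_t tag)
{
	p += 4;
	*p++ = (unsigned char)type;
	return put2(p, tag);
}

/* Store the total message length in the size[4] field at start. */
static unsigned char *finish(unsigned char *start, unsigned char *end)
{
	putn(start, (uint64_t)(end - start), 4);
	return end;
}

/*
 * Pack Topen, Twrite and Tclunk for one fid, all with the same tag.
 * buf must hold 12 + (23 + len) + 11 bytes. Returns the bytes used.
 */
size_t pack_batch(unsigned char *buf, uint16_t tag, uint32_t fid,
		  unsigned char mode, uint64_t offset,
		  const void *data, uint32_t len)
{
	unsigned char *p = buf, *m;

	m = p;	/* Topen: size[4] type[1] tag[2] fid[4] mode[1] */
	p = hdr(p, Topen, tag);
	p = putn(p, fid, 4);
	*p++ = mode;
	p = finish(m, p);

	m = p;	/* Twrite: ... fid[4] offset[8] count[4] data[count] */
	p = hdr(p, Twrite, tag);
	p = putn(p, fid, 4);
	p = putn(p, offset, 8);
	p = putn(p, len, 4);
	memcpy(p, data, len);
	p += len;
	p = finish(m, p);

	m = p;	/* Tclunk: ... fid[4] */
	p = hdr(p, Tclunk, tag);
	p = putn(p, fid, 4);
	p = finish(m, p);

	return (size_t)(p - buf);
}
```

With mode 1 (OWRITE) and 5 bytes of data the batch is 12 + 28 + 11 = 51 bytes; the client then reads back three R-messages carrying the same tag.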

* [9fans] tcs bug
  2005-08-30 17:46     ` Russ Cox
@ 2005-08-31  5:54       ` arisawa
  2005-08-31  5:57         ` Rob Pike
  2005-10-17  7:14       ` [9fans] Re: some Plan9 related ideas Uriel
  1 sibling, 1 reply; 23+ messages in thread
From: arisawa @ 2005-08-31  5:54 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Hello,

tcs, both for Plan 9 and for Unix, has a bug in reading UTF text.
It comes from:
utf_in(int fd, long *notused, struct convert *out){
     char buf[N];
     ...
     while((n = read(fd, buf+tot, N-tot)) >= 0){
         ...
}

in utf.c

N is defined as 10000 in hdr.h.

If you set N to 10, the problem shows up more clearly:
tcs does not correctly handle UTF character boundaries.

For example, assume a.txt has the content:
aaaaaaaこの

term% xd -c a.txt
0000000   a  a  a  a  a  a  a e3 81 93 e3 81 ae \n
000000e

tcs can handle this text because N=10 falls exactly on a UTF boundary,
but tcs fails if there are 6 or 8 'a's ...

tcs is very important to me.
Who maintains tcs?
I could help with debugging.

Kenji Arisawa



* Re: [9fans] tcs bug
  2005-08-31  5:54       ` [9fans] tcs bug arisawa
@ 2005-08-31  5:57         ` Rob Pike
  0 siblings, 0 replies; 23+ messages in thread
From: Rob Pike @ 2005-08-31  5:57 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

ah yes, the dreaded partial rune problem. lots of programs
must cope with this issue.

-rob
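On Plan 9 the usual cure is fullrune (see rune(2)): before decoding, check whether the bytes at hand hold a complete rune, and if not, carry them over to the next read. Since tcs also builds on Unix, here is a hypothetical portable stand-in, `utf8_fullrune`, for UTF-8 sequences of up to four bytes. As a design choice (not necessarily matching fullrune exactly), it reports a stray continuation byte as complete so the decoder consumes it as an error instead of waiting for more input.

```c
/*
 * Hypothetical portable stand-in for Plan 9's fullrune:
 * return 1 if the n bytes at s begin with a complete UTF-8 sequence
 * (or a byte that cannot start one), 0 if more bytes are needed.
 */
int utf8_fullrune(const char *s, int n)
{
	unsigned char c;

	if (n <= 0)
		return 0;
	c = (unsigned char)s[0];
	if (c < 0xC0)
		return 1;	/* ASCII, or a stray continuation byte */
	if (c < 0xE0)
		return n >= 2;	/* 110xxxxx: 2-byte sequence */
	if (c < 0xF0)
		return n >= 3;	/* 1110xxxx: 3-byte sequence */
	return n >= 4;		/* 11110xxx (or invalid): 4 bytes */
}
```

In a read loop, decode while utf8_fullrune(buf + i, n - i) is true, then move the leftover bytes to the front of the buffer and append the next read after them.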



* Re: [9fans] Re: some Plan9 related ideas
  2005-08-30 17:46     ` Russ Cox
  2005-08-31  5:54       ` [9fans] tcs bug arisawa
@ 2005-10-17  7:14       ` Uriel
  2005-10-17 11:23         ` Russ Cox
  1 sibling, 1 reply; 23+ messages in thread
From: Uriel @ 2005-10-17  7:14 UTC (permalink / raw)
  To: 9fans

I was reading old archives, and I'm probably a bit dense; but what is
the reason to use the same tag for the three messages?

Wouldn't that be able to break a server that expected tags not to be
reused until the corresponding Rmessage had been sent? Or I'm probably
misunderstanding how the tag is used...

And what are the kernel changes nemo mentioned?

I have to add that this is a big deal for connections with latencies 
of >100ms. 

I was also thinking of some library that would make it easier to add
fcp(1)-like functionality to other apps, but that is not very general.

uriel



* Re: [9fans] Re: some Plan9 related ideas
  2005-10-17  7:14       ` [9fans] Re: some Plan9 related ideas Uriel
@ 2005-10-17 11:23         ` Russ Cox
  2005-10-17 12:45           ` Uriel
  0 siblings, 1 reply; 23+ messages in thread
From: Russ Cox @ 2005-10-17 11:23 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> I was reading old archives, and I'm probably a bit dense; but what is
> the reason to use the same tag for the three messages?

The reason is you don't have to wait for the response to the first
before sending the second and third, avoiding two round trip times.

> Wouldn't that be able to break a server that expected tags not to be
> reused until the corresponding Rmessage had been sent?

Yes, but I did say I was redefining the protocol.  And single-threaded
servers (the majority of our servers by code volume) don't care.

    Define that a client may send more than one message
    with the same tag, and in that case servers must process
    those messages sequentially.  This is not very hard to
    implement on the server side, and the single-threaded
    servers needn't change at all.

Russ


* Re: [9fans] Re: some Plan9 related ideas
  2005-10-17 11:23         ` Russ Cox
@ 2005-10-17 12:45           ` Uriel
  2005-10-17 13:03             ` Russ Cox
  0 siblings, 1 reply; 23+ messages in thread
From: Uriel @ 2005-10-17 12:45 UTC (permalink / raw)
  To: 9fans

On Mon, Oct 17, 2005 at 07:23:39AM -0400, Russ Cox wrote:
> > I was reading old archives, and I'm probably a bit dense; but what is
> > the reason to use the same tag for the three messages?
> 
> The reason is you don't have to wait for the response to the first
> before sending the second and third, avoiding two round trip times.
Yes, but what I didn't understand is why you needed to use the same tag;
I thought you could do this without changing the protocol.

After some discussion in #plan9 we guessed that the reason is threaded
servers...

Could you explain with more detail how it would work from the (threaded)
server POV? I was thinking that the server could use the fid to avoid
threads stepping into each other, and still avoid having to change the
protocol at all...

And I'm still curious what kernel changes nemo was talking about.

Sorry for being dense

uriel

* Re: [9fans] Re: some Plan9 related ideas
  2005-10-17 12:45           ` Uriel
@ 2005-10-17 13:03             ` Russ Cox
  2005-10-17 13:22               ` Uriel
  0 siblings, 1 reply; 23+ messages in thread
From: Russ Cox @ 2005-10-17 13:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> > > I was reading old archives, and I'm probably a bit dense; but what is
> > > the reason to use the same tag for the three messages?

> > The reason is you don't have to wait for the response to the first
> > before sending the second and third, avoiding two round trip times.

> Yes, but what I didn't understand is why you needed to use the same tag;
> I thought you could do this without changing the protocol.

I redefined use of the same tag to mean "you have to finish the
previous message with this tag before processing this message",
so that if you send a Topen followed by a Tread and the open
blocks for whatever reason (disk i/o, say), the remote server
doesn't try to run the Tread and send back a "fid not in use"
error or some such.

> Could you explain with more detail how it would work from the (threaded)
> server POV? I was thinking that the server could use the fid to avoid
> threads stepping into each other, and still avoid having to change the
> protocol at all...

The threaded server would just have a list of requests associated with
each tag instead of a single request.  When it finishes one it can move
on to the next.

Under the current protocol you are not allowed to send a Tread request
using a fid that the server has not acknowledged via Rattach or Ropen.
So your approach still requires redefining the protocol.  Also I might have
multiple I/Os going on and not care what order they get handled.
Synchronization based on the fid changes current situations.  Basing it
on the tag uses what were previously illegal situations.

> And I'm still curious what kernel changes nemo was talking about.

Read his post where he talks about mount -U.

If you mean readf and writef, those weren't kernel changes.
They were the obvious library wrappers.

Russ
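The per-tag request list described above can be sketched as a FIFO per tag: a request whose tag is busy is queued behind the earlier ones instead of being started, and finishing a request releases the next one with that tag. This is a hypothetical single-threaded data-structure sketch in portable C (no locking, no Tflush handling); `Req`, `tagq_submit` and `tagq_done` are illustrative names, not lib9p's.

```c
#include <stddef.h>
#include <stdint.h>

enum { NTAG = 1 << 16 };	/* 9P tags are 16 bits */

typedef struct Req Req;
struct Req {
	uint16_t tag;
	int type;	/* e.g. Topen, Twrite, Tclunk */
	Req *next;	/* next request queued on the same tag */
};

/* head[t] is the running request for tag t; tail[t] the last queued. */
static Req *head[NTAG], *tail[NTAG];

/*
 * Accept a new request. Returns r if it may start now, or NULL if it
 * was queued behind an earlier request with the same tag.
 */
Req *tagq_submit(Req *r)
{
	uint16_t t = r->tag;

	r->next = NULL;
	if (head[t] == NULL) {
		head[t] = tail[t] = r;
		return r;	/* tag idle: run immediately */
	}
	tail[t]->next = r;	/* tag busy: wait in FIFO order */
	tail[t] = r;
	return NULL;
}

/*
 * Called when r, the running request for its tag, has been answered.
 * Returns the next request to start for that tag, or NULL.
 */
Req *tagq_done(Req *r)
{
	uint16_t t = r->tag;

	head[t] = r->next;
	if (head[t] == NULL)
		tail[t] = NULL;
	return head[t];
}
```

Requests with different tags never wait on each other, so ordinary concurrent I/O is unaffected; only messages deliberately sent with the same tag are serialized.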

* Re: [9fans] Re: some Plan9 related ideas
  2005-10-17 13:03             ` Russ Cox
@ 2005-10-17 13:22               ` Uriel
  2005-10-17 15:14                 ` Russ Cox
  0 siblings, 1 reply; 23+ messages in thread
From: Uriel @ 2005-10-17 13:22 UTC (permalink / raw)
  To: 9fans

On Mon, Oct 17, 2005 at 09:03:22AM -0400, Russ Cox wrote:
> I redefined use of the same tag to mean "you have to finish the
> previous message with this tag before processing this message",
> so that if you send a Topen followed by a Tread and the open
> blocks for whatever reason (disk i/o, say), the remote server
> doesn't try to run the Tread and send back a "fid not in use"
> error or some such.
Yes, that would be a problem with threaded servers, as you indicated.
 
> > Could you explain with more detail how it would work from the (threaded)
> > server POV? I was thinking that the server could use the fid to avoid
> > threads stepping into each other, and still avoid having to change the
> > protocol at all...
> 
> The threaded server would just have a list of requests associated with
> each tag instead of a single request.  When it finishes one it can move
> on to the next.
> 
> Under the current protocol you are not allowed to send a Tread request
> using a fid that the server has not acknowledged via Rattach or Ropen.
Ah! I was not aware of this restriction.

> So your approach still requires redefining the protocol.  Also I might have
> multiple I/Os going on and not care what order they get handled.
> Synchronization based on the fid changes current situations.  Basing it
> on the tag uses what were previously illegal situations.
Good point. I understand better now the reasons for your approach.
 
> > And I'm still curious what kernel changes nemo was talking about.
> 
> Read his post where he talks about mount -U.
> 
> If you mean readf and writef, those weren't kernel changes.
> They were the obvious library wrappers.
Sorry, I was not clear enough; I was referring to this:

"We added two (library) calls readf and writef that perform file I/O
besides resolving the name. We found ourselves calling them a lot, because
in many cases it's very convenient. They would be
an opportunity to "batch" walk/open/read(s)/clunk, which happen
a lot. However, this would require changing the kernel (even more than we
did for Plan B)."

(BTW, I get the feeling that readf/writef might be convenient, but you could
easily end up sending superfluous walks/opens/clunks) 

And I'm not convinced by Plan B's style of 'one value per file' for (almost?)
everything, maybe it smells to me too much of linux's sysfs :)

(Nothing wrong with 'one value per file' where it makes sense, I just don't
think it works well as a general rule)

uriel


* Re: [9fans] Re: some Plan9 related ideas
  2005-10-17 13:22               ` Uriel
@ 2005-10-17 15:14                 ` Russ Cox
  0 siblings, 0 replies; 23+ messages in thread
From: Russ Cox @ 2005-10-17 15:14 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> "We added two (library) calls readf and writef that perform file I/O
> besides resolving the name. We found ourselves calling them a lot, because
> in many cases it's very convenient. They would be
> an opportunity to "batch" walk/open/read(s)/clunk, which happen
> a lot. However, this would require changing the kernel (even more than we
> did for Plan B)."

Well, once you've changed the protocol to accommodate some
form of batching, chances are good you'll have to change
the kernel to use it.

Russ


* [9fans] tcs bug
@ 2005-09-01  0:36 quanstro
  0 siblings, 0 replies; 23+ messages in thread
From: quanstro @ 2005-09-01  0:36 UTC (permalink / raw)
  To: 9fans

well, somebody's got to do it. ;-)

i guess i didn't think of using bio, having never had access before
p9p.

thanks, russ.

erik

* Re: [9fans] tcs bug.
  2005-08-31 10:51 quanstro
@ 2005-08-31 21:36 ` Russ Cox
  0 siblings, 0 replies; 23+ messages in thread
From: Russ Cox @ 2005-08-31 21:36 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

You've invented buffered I/O.

#include <u.h>
#include <libc.h>
#include <bio.h>

void
usage(void)
{
	fprint(2, "usage: runecvt [-l | -t | -u] [file...]\n");
	exits("usage");
}

void
convert(Biobuf *bin, Biobuf *bout, Rune (*fn)(Rune))
{
	int c;
	
	while((c = Bgetrune(bin)) != -1)
		Bputrune(bout, fn(c));
}

void
main(int argc, char **argv)
{
	int i;
	Biobuf *b, bin, bout;
	Rune (*fn)(Rune);
	
	fn = toupperrune;
	ARGBEGIN{
	case 'l':
		fn = tolowerrune;
		break;
	case 't':
		fn = totitlerune;
		break;
	case 'u':
		fn = toupperrune;
		break;
	default:
		usage();
	}ARGEND
	
	Binit(&bout, 1, OWRITE);
	if(argc == 0){
		Binit(&bin, 0, OREAD);
		convert(&bin, &bout, fn);
	}else{
		for(i=0; i<argc; i++){
			if((b = Bopen(argv[i], OREAD)) == nil)
				sysfatal("open %s: %r", argv[i]);
			convert(b, &bout, fn);
			Bterm(b);	/* close the file and free its Biobuf */
		}
	}
	Bterm(&bout);
	exits(nil);
}


* Re: [9fans] tcs bug
  2005-08-31 10:48     ` arisawa
@ 2005-08-31 11:22       ` arisawa
  0 siblings, 0 replies; 23+ messages in thread
From: arisawa @ 2005-08-31 11:22 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


>         for(s = e - 2; s < e; s++){
>                 if((*s & 0xc0) == 0x80)
>                         continue;
>                 if((*s & 0xc0) == 0xc0)
>                         break;
>         }
>

this is redundant;
replace it with:

         for(s = e - 2; s < e; s++)
                 if((*s & 0xc0) == 0xc0)
                         break;

Kenji Arisawa




* [9fans] tcs bug.
@ 2005-08-31 10:51 quanstro
  2005-08-31 21:36 ` Russ Cox
  0 siblings, 1 reply; 23+ messages in thread
From: quanstro @ 2005-08-31 10:51 UTC (permalink / raw)
  To: 9fans

i just had a similar problem a day or two ago.

i needed to change some capitalization and the 
tr 'A-Z' 'a-z' idiom doesn't work on random utf.

i solved it a bit differently -- lifting the fullrune()
check into the main loop. so i don't have a readu() 
function. also (unlike tcs) at the cost of 1 extra check 
at the end-of-input, the output buffer is dumped only 
when full. on japanese, greek or other text with more than
1 byte/char, this will save calls to OUT() --
or in my case print().

okay, total overkill. i know. but it was more interesting
to do it that way.

here's upper.c. convert to upper/lower/title case:



#include <u.h>
#include <libc.h>

enum { BLOCK = 1024*4 };

typedef Rune (*Rconv)(Rune);

void output(Rune* r, int nrunes, Rconv R){
	int i;

	for(i=0; i<nrunes; i++){
		r[i] = R(r[i]);
	}
	print("%.*S", nrunes, r);
}

const char* casify(int fd, Rconv R){
	char in[BLOCK + UTFmax];
	Rune r[BLOCK + UTFmax];
	long rem_len;
	long blen;
	long j;
	long i;

	rem_len=0;
	j = 0;
again:	while (0 < (blen = read(fd, in + rem_len, BLOCK))){
		blen += rem_len;
		rem_len = 0;	/* carried-over bytes are now counted in blen */

		for(i=0; i<blen; ){
			if (!fullrune(in + i, blen - i)){
				/* save the partial rune for the next read */
				rem_len = blen - i;
				memmove(in, in + i, rem_len);
				goto again;
			}
			i += chartorune(r + j++, in + i);
			if (j > BLOCK){
				output(r, j, R);
				j=0;
			}
		}
	}

	if (rem_len){
		// non unicode garbage.
		fprint(2, "non-utf8 garbage %.*s at eof\n", (int)rem_len, in);
	}

	if (j){
		output(r, j, R);
	}

	/* read returns 0 at end of file and -1 on error */
	if (blen < 0){
		return "read";
	}
	return 0;
}

void main(int argc, /* pfft const */ char** argv){
	Rconv R;
	const char* v;
	const char* status;
	const char* s;
	int fd;

	v = strrchr(argv[0], '/');
	if (v){
		v++;
	} else {
		v = argv[0];
	}
	
	if (0 == strcmp(v, "tolower")){
		R = tolowerrune;
	} else if (0 == strcmp(v, "totitle")){
		R = totitlerune;
	} else {
		R = toupperrune;
	}

	ARGBEGIN{
	case 'u':
		R = toupperrune;
		break;
	case 'l':
		R = tolowerrune;
		break;
	case 't':
		R = totitlerune;
		break;
	default:
		fprint(2, "%s: bad option %c\n", argv0, ARGC());
		fprint(2, "usage: %s -[ult]\n", argv0);
		exits("usage");
	} ARGEND

	if (!*argv){
		s = casify(0, R);
	} else {
		for(status = 0; *argv; argv++){
			fd = open(*argv, OREAD);
			if (-1 == fd){
				if (s && !status){
					status = "open";
				}
				continue;
			}
			s = casify(fd, R);
			if (s && !status){
				status = s;
			}
			close(fd);
		}
	}

	exits(status ? status : "");
}





^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [9fans] tcs bug
  2005-08-31  9:17   ` Rob Pike
@ 2005-08-31 10:48     ` arisawa
  2005-08-31 11:22       ` arisawa
  0 siblings, 1 reply; 23+ messages in thread
From: arisawa @ 2005-08-31 10:48 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


> one problem with this fix is that it assumes valid utf-8 input.
> you're better off using fullrune.
>

here is a simpler and more robust solution,
following forsyth's suggestion:


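/* read until utf boundary (assumes runes of at most 3 bytes, as in the table below) */
int
readu(int fd, char *buf, int n)
{
         static char b[UTFmax];
         static int nb;
         int m;
         char *s, *e;
         if(nb)
                 memmove(buf, b, nb);
         m = read(fd, buf + nb, n - nb);
         if(m < 0)
                 return m;
         if(m == 0){
                 /* eof: return whatever was held over */
                 m = nb;
                 nb = 0;
                 return m;
         }

         /*
         01.   x in [00000000.0bbbbbbb] → 0bbbbbbb
         10.   x in [00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
         11.   x in [bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb
         */

         e = buf + m + nb;
         /* look for a lead byte (11bbbbbb) among the last two bytes */
         for(s = (e - buf >= 2) ? e - 2 : buf; s < e; s++){
                 if((*s & 0xc0) == 0xc0)
                         break;
         }
         /* a lead byte whose rune is already complete need not be held back */
         if(s < e && fullrune(s, e - s))
                 s = e;

         /* we have e - s bytes in s     */
         nb = e - s;
         memmove(b, s, nb);
         return s - buf;
}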

Kenji Arisawa
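The readu() above finds the bytes to hold back by looking at the tail of the buffer. Below is a minimal portable C sketch of that back-scan, generalized to any maximum rune length. It backs up over continuation bytes and then lets a fullrune()-style test decide. That test also keeps a complete 2-byte rune such as é from being held back.

UTF_MAX = 3 is an assumption. full_rune() and utf8_boundary() are hypothetical names, not Plan 9 libc.

```c
#include <assert.h>

enum { UTF_MAX = 3 };	/* longest rune, in bytes (assumed) */

/* Hypothetical stand-in for Plan 9 fullrune(), runes of at most 3 bytes. */
static int full_rune(const unsigned char *s, long n)
{
	if (n <= 0)
		return 0;
	if (s[0] < 0x80)
		return 1;
	if (s[0] < 0xE0)
		return n >= 2;
	return n >= 3;
}

/*
 * Return the length of the longest prefix of buf[0..n) that ends on a
 * rune boundary; the remaining n - result bytes (fewer than UTF_MAX)
 * are the start of a rune to hold back for the next read.
 */
static long utf8_boundary(const char *buf, long n)
{
	const unsigned char *b = (const unsigned char *)buf;
	long s;
	int k;

	assert(n >= 0);
	if (n == 0)
		return 0;
	/* back up over at most UTF_MAX-1 continuation bytes (10bbbbbb) */
	s = n - 1;
	for (k = 1; k < UTF_MAX && s > 0 && (b[s] & 0xC0) == 0x80; k++)
		s--;
	return full_rune(b + s, n - s) ? n : s;
}
```

Take a 10-byte buffer holding 6 'a's, こ, and the first byte of の. utf8_boundary returns 9, so the lone e3 byte is held back.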




* Re: [9fans] tcs bug
  2005-08-31  9:11 ` arisawa
@ 2005-08-31  9:17   ` Rob Pike
  2005-08-31 10:48     ` arisawa
  0 siblings, 1 reply; 23+ messages in thread
From: Rob Pike @ 2005-08-31  9:17 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

one problem with this fix is that it assumes valid utf-8 input.
you're better off using fullrune.

-rob
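fullrune() looks only at the lead byte and the byte count, so it never has to trust the bytes that follow. Below is a minimal portable C sketch of a decoder built that way. It is in the spirit of chartorune(), not its actual code. It makes these assumptions:

- The names are hypothetical.
- Runes are at most 3 bytes long.
- Overlong forms are not rejected.

The decoder returns 0 when more input is needed. It turns any malformed byte into U+FFFD and consumes exactly one byte, so bad input can neither stall the scan nor run it past the end of the buffer.

```c
#include <assert.h>

enum { RUNE_ERROR = 0xFFFD };

/* Hypothetical stand-in for Plan 9 fullrune(): decided by the lead byte and n alone. */
static int full_rune(const unsigned char *s, long n)
{
	if (n <= 0)
		return 0;
	if (s[0] < 0x80)
		return 1;
	if (s[0] < 0xE0)
		return n >= 2;
	return n >= 3;
}

/*
 * Decode one rune from s[0..n). Returns the number of bytes consumed,
 * or 0 if s holds only the start of a rune and more input is needed.
 * A malformed sequence yields RUNE_ERROR and consumes exactly one byte.
 * Like fullrune(), a single stray byte at the very end still reports
 * "need more", so the caller must deal with leftovers at eof.
 */
static int decode_rune(const unsigned char *s, long n, long *r)
{
	int len, i;
	long v;

	assert(n >= 0);
	if (!full_rune(s, n))
		return 0;
	if (s[0] < 0x80) {
		*r = s[0];
		return 1;
	}
	if (s[0] < 0xC0 || s[0] >= 0xF0) {	/* stray continuation, or lead longer than 3 bytes */
		*r = RUNE_ERROR;
		return 1;
	}
	len = s[0] < 0xE0 ? 2 : 3;
	v = s[0] & (len == 2 ? 0x1F : 0x0F);
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80) {	/* expected 10bbbbbb */
			*r = RUNE_ERROR;
			return 1;
		}
		v = (v << 6) | (s[i] & 0x3F);
	}
	*r = v;
	return len;
}
```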




* Re: [9fans] tcs bug
  2005-08-31  6:07 [9fans] tcs bug arisawa
@ 2005-08-31  9:11 ` arisawa
  2005-08-31  9:17   ` Rob Pike
  0 siblings, 1 reply; 23+ messages in thread
From: arisawa @ 2005-08-31  9:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Below is a first-aid bug fix.

we define a read function for utf-8:

/* read until utf boundary */
int
readu(int fd, char *buf, int n)
{
     static char b[3];
     static int nb;
     int m;
     char *s, *e;
     if(nb)
         memcpy(buf, b, nb);
     m = read(fd, buf + nb, n - nb);
     if(m < 0)
         return m;

     /*
     01.   x in [00000000.0bbbbbbb] → 0bbbbbbb
     10.   x in [00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
     11.   x in [bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb,10bbbbbb
     */

     e = buf + m + nb;
     for(s = buf; s < e; s++){
         if((*s & 0x80) == 0)        /* 0bbbbbbb: 1-byte rune */
             continue;
         if((*s & 0xe0) == 0xc0){    /* 110bbbbb: 2-byte rune */
             if(s+1 >= e)
                 break;
             s++;
             continue;
         }
         /* then *s is 111bbbbb: 3-byte rune */
         if(s+2 >= e)
             break;
         s += 2;
     }
     /* we have e - s bytes in s    */
     nb = e - s;
     memcpy(b, s, nb);
     return s - buf;
}

and replace 'read' with 'readu' in utf.c:

utf_in(int fd, long *notused, struct convert *out)
{

     ...
     while((n = readu(fd, buf+tot, N-tot)) >= 0){
         ...
}

Kenji Arisawa
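The table in the readu() comment is the UTF-8 encoding for 16-bit runes. The minimal portable C encoder below follows its three rows literally, as a check against the xd dump in the original report (こ = e3 81 93, の = e3 81 ae). encode16() is a hypothetical name, and runes above U+FFFF are out of scope.

```c
#include <assert.h>

/*
 * Encode a 16-bit rune following the three rows of the table:
 *   01. 00000000.0bbbbbbb -> 0bbbbbbb
 *   10. 00000bbb.bbbbbbbb -> 110bbbbb 10bbbbbb
 *   11. bbbbbbbb.bbbbbbbb -> 1110bbbb 10bbbbbb 10bbbbbb
 * Returns the number of bytes written to out.
 */
static int encode16(unsigned r, unsigned char *out)
{
	assert(r <= 0xFFFF);
	if (r < 0x80) {				/* row 01 */
		out[0] = (unsigned char)r;
		return 1;
	}
	if (r < 0x800) {			/* row 10 */
		out[0] = (unsigned char)(0xC0 | (r >> 6));
		out[1] = (unsigned char)(0x80 | (r & 0x3F));
		return 2;
	}
	out[0] = (unsigned char)(0xE0 | (r >> 12));	/* row 11 */
	out[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
	out[2] = (unsigned char)(0x80 | (r & 0x3F));
	return 3;
}
```

Encoding U+3053 (こ) writes e3 81 93, matching the dump. U+00E9 (é) falls in the second row and becomes c3 a9.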




* [9fans] tcs bug
@ 2005-08-31  6:07 arisawa
  2005-08-31  9:11 ` arisawa
  0 siblings, 1 reply; 23+ messages in thread
From: arisawa @ 2005-08-31  6:07 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Sorry, I should have sent my previous mail in utf-8.
The following is the same as the previous one except for the character encoding.

Hello,

tcs, both for plan 9 and for unix, has a bug in reading utf text.
it comes from:
utf_in(int fd, long *notused, struct convert *out){
     char buf[N];
     ...
     while((n = read(fd, buf+tot, N-tot)) >= 0){
         ...
}

in utf.c

N is set to 10000 in hdr.h.

if you set N to 10, the problem becomes clearer:
tcs does not handle utf character boundaries correctly.

for example, assume a.txt has the content:
aaaaaaaこの

term% xd -c a.txt
0000000   a  a  a  a  a  a  a e3 81 93 e3 81 ae \n
000000e

tcs can handle this text because N=10 falls exactly on a utf boundary,
but tcs fails if there are 6 or 8 'a's ...
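Whether a fixed-size read splits a rune can be checked from the first byte after the cut. If that byte is a continuation byte (10bbbbbb), the cut falls inside a rune. The small portable C sketch below reproduces the example above with N = 10. cut_splits_rune() and make_sample() are hypothetical helpers.

With 7 'a's the cut lands on the lead byte of の. With 6 'a's it lands inside の, and with 8 'a's it lands inside こ.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if cutting buf[0..len) at byte offset `at` would split a
 * multi-byte UTF-8 rune, i.e. the first byte of the next chunk is a
 * continuation byte (10bbbbbb).
 */
static int cut_splits_rune(const char *buf, size_t len, size_t at)
{
	assert(at <= len);
	return at < len && ((unsigned char)buf[at] & 0xC0) == 0x80;
}

/* Hypothetical test helper: fill buf with na 'a's followed by "この\n". */
static size_t make_sample(char *buf, int na)
{
	memset(buf, 'a', (size_t)na);
	memcpy(buf + na, "\xe3\x81\x93\xe3\x81\xae\n", 7);
	return (size_t)na + 7;
}
```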

tcs is very important for me.
Who maintains tcs?
I can help with debugging.

Kenji Arisawa


Thread overview: 23+ messages
2005-08-29 23:23 [9fans] some Plan9 related ideas Bhanu Nagendra Pisupati
2005-08-30 12:27 ` Sape Mullender
2005-08-30 15:21   ` Francisco Ballesteros
2005-08-30 15:25     ` Francisco Ballesteros
2005-08-30 17:07 ` [9fans] " Dave Eckhardt
2005-08-30 17:33   ` Francisco Ballesteros
2005-08-30 17:46     ` Russ Cox
2005-08-31  5:54       ` [9fans] tcs bug arisawa
2005-08-31  5:57         ` Rob Pike
2005-10-17  7:14       ` [9fans] Re: some Plan9 related ideas Uriel
2005-10-17 11:23         ` Russ Cox
2005-10-17 12:45           ` Uriel
2005-10-17 13:03             ` Russ Cox
2005-10-17 13:22               ` Uriel
2005-10-17 15:14                 ` Russ Cox
2005-08-31  6:07 [9fans] tcs bug arisawa
2005-08-31  9:11 ` arisawa
2005-08-31  9:17   ` Rob Pike
2005-08-31 10:48     ` arisawa
2005-08-31 11:22       ` arisawa
2005-08-31 10:51 quanstro
2005-08-31 21:36 ` Russ Cox
2005-09-01  0:36 quanstro
