caml-list - the Caml user's mailing list
* Select on channels (again)
@ 2006-08-15  0:46 Nathaniel Gray
  2006-08-21 22:47 ` Nathaniel Gray
  0 siblings, 1 reply; 21+ messages in thread
From: Nathaniel Gray @ 2006-08-15  0:46 UTC (permalink / raw)
  To: Caml Mailing List

Like others before me, I'm hitting the frustrating limitation that
it's impossible to select on a channel.  This limitation forces one to
reimplement buffered IO in many situations where using channels would
otherwise be easy and sensible.  There has been discussion on this
list in the past about this limitation[1], and there have been two
requests filed in mantis for this feature[2,3], but the ocaml dev team
has been silent on the matter.  Is this feature objectionable to the
dev team or is it just a case of "too much work, not enough time?"  If
somebody was to submit a quality patch implementing the feature would
it be accepted?

Cheers,
-n8

[1] <http://caml.inria.fr/pub/ml-archives/caml-list/2005/03/8aae2e3c54cfb976fe52664ab1c84994.en.html>
[2] <http://caml.inria.fr/mantis/view.php?id=3075>
[3] <http://caml.inria.fr/mantis/view.php?id=3579>

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->



* Re: Select on channels (again)
  2006-08-15  0:46 Select on channels (again) Nathaniel Gray
@ 2006-08-21 22:47 ` Nathaniel Gray
  2006-08-22  0:42   ` [Caml-list] " Jonathan Roewen
  0 siblings, 1 reply; 21+ messages in thread
From: Nathaniel Gray @ 2006-08-21 22:47 UTC (permalink / raw)
  To: Caml Mailing List

Tap... tap...

Hello?  Is this thing on?


-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-21 22:47 ` Nathaniel Gray
@ 2006-08-22  0:42   ` Jonathan Roewen
  2006-08-22  6:27     ` Nathaniel Gray
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Roewen @ 2006-08-22  0:42 UTC (permalink / raw)
  To: Nathaniel Gray; +Cc: Caml Mailing List

Why can't you just use the unix file opening functions since you're
using unix select? And if you need the ocaml in/out channels, convert
the unix file descriptors to ocaml ones instead of the other way
around. Seems simple enough to me.




* Re: [Caml-list] Re: Select on channels (again)
  2006-08-22  0:42   ` [Caml-list] " Jonathan Roewen
@ 2006-08-22  6:27     ` Nathaniel Gray
  2006-08-22  6:41       ` Jonathan Roewen
                         ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Nathaniel Gray @ 2006-08-22  6:27 UTC (permalink / raw)
  To: Jonathan Roewen; +Cc: Caml Mailing List

On 8/21/06, Jonathan Roewen <jonathan.roewen@gmail.com> wrote:
> Why can't you just use the unix file opening functions since you're
> using unix select? And if you need the ocaml in/out channels, convert
> the unix file descriptors to ocaml ones instead of the other way
> around. Seems simple enough to me.

It sounds simple but doesn't work.  If select tells you a file
descriptor doesn't have data waiting you can't be sure there isn't
still data in the corresponding channel's buffer.  See the thread that
I referenced for a good discussion of why this is annoying.  For one
thing, it makes it impossible to use Marshal.from_channel without
potentially blocking.
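
The effect can be reproduced with a pipe.  This is only a minimal
sketch, assuming the unix library is linked and that the first refill
of the channel buffer drains everything pending on the descriptor;
`demo` is a hypothetical name used for illustration.

```ocaml
(* Requires the unix library.  [demo] is a hypothetical name. *)
let demo () =
  let rd, wr = Unix.pipe () in
  (* Two lines arrive on the descriptor in a single write. *)
  let msg = "first\nsecond\n" in
  ignore (Unix.write_substring wr msg 0 (String.length msg));
  let ic = Unix.in_channel_of_descr rd in
  (* Reading the first line refills the channel buffer, which is expected
     to pull in everything currently available on the descriptor. *)
  let l1 = input_line ic in
  (* select only sees the (now empty) descriptor, not the channel buffer. *)
  let readable, _, _ = Unix.select [rd] [] [] 0.0 in
  (* The second line is served from the buffer without touching the fd. *)
  let l2 = input_line ic in
  Unix.close wr;
  close_in ic;
  (l1, readable <> [], l2)

let () =
  let l1, fd_ready, l2 = demo () in
  Printf.printf "read %S; select says readable: %b; then read %S\n"
    l1 fd_ready l2
```

A program that trusted select alone here would conclude that nothing
is left to read while a complete line is still waiting in the buffer.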

Cheers,
-n8

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-22  6:27     ` Nathaniel Gray
@ 2006-08-22  6:41       ` Jonathan Roewen
  2006-08-22  8:15         ` skaller
  2006-08-23  5:12         ` Nathaniel Gray
  2006-08-22  8:10       ` Olivier Andrieu
  2006-08-22  8:21       ` Jacques Garrigue
  2 siblings, 2 replies; 21+ messages in thread
From: Jonathan Roewen @ 2006-08-22  6:41 UTC (permalink / raw)
  To: Nathaniel Gray; +Cc: Caml Mailing List

> It sounds simple but doesn't work.  If select tells you a file
> descriptor doesn't have data waiting you can't be sure there isn't
> still data in the corresponding channel's buffer.  See the thread that
> I referenced for a good discussion of why this is annoying.  For one
> thing, it makes it impossible to use Marshal.from_channel without
> potentially blocking.

Either one of us is misunderstanding the other....

Instead of using Pervasives.open_xxx, use Unix.openfile which returns
Unix.file_descr, and also doesn't use internal ocaml buffering.

Then, presumably, Unix.select would do what you expect, and then you
can use Unix.in_channel_of_descr to get an ocaml in_channel to read
from.

And if I'm misunderstanding you, then perhaps the problem isn't really
Unix.select...

Jonathan



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-22  6:27     ` Nathaniel Gray
  2006-08-22  6:41       ` Jonathan Roewen
@ 2006-08-22  8:10       ` Olivier Andrieu
  2006-08-23  5:27         ` Nathaniel Gray
  2006-08-22  8:21       ` Jacques Garrigue
  2 siblings, 1 reply; 21+ messages in thread
From: Olivier Andrieu @ 2006-08-22  8:10 UTC (permalink / raw)
  To: Nathaniel Gray; +Cc: Caml Mailing List

 Nathaniel Gray [Monday 21 August 2006] :
 >
 > On 8/21/06, Jonathan Roewen <jonathan.roewen@gmail.com> wrote:
 > > Why can't you just use the unix file opening functions since you're
 > > using unix select? And if you need the ocaml in/out channels, convert
 > > the unix file descriptors to ocaml ones instead of the other way
 > > around. Seems simple enough to me.
 > 
 > It sounds simple but doesn't work.  If select tells you a file
 > descriptor doesn't have data waiting you can't be sure there isn't
 > still data in the corresponding channel's buffer.  See the thread that
 > I referenced for a good discussion of why this is annoying.  For one
 > thing, it makes it impossible to use Marshal.from_channel without
 > potentially blocking.

Indeed, Marshal.from_channel would block, but it's not the only way to
read a marshalled value: cf. Marshal.header_size and
Marshal.data_size. 

With these, you can read your marshalled value from file_descr into a
buffer in a non-blocking, select-compatible way and then use
Marshal.from_string. 
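
A rough sketch of that framing logic, independent of how the bytes are
obtained.  The `pending` type and the `create`, `feed` and `take`
functions are hypothetical names, and Marshal.from_bytes is used as the
Bytes counterpart of Marshal.from_string in current OCaml versions.
The caller is assumed to call `feed` with whatever a non-blocking read
returned, and `take` afterwards.

```ocaml
(* Accumulates bytes as they arrive and extracts a marshalled value once
   a complete one has been buffered.  All names here are hypothetical. *)
type pending = { mutable data : Bytes.t; mutable len : int }

let create () = { data = Bytes.create 256; len = 0 }

(* Append [n] bytes of [src] starting at [off], growing the store if needed. *)
let feed p src off n =
  if p.len + n > Bytes.length p.data then begin
    let bigger = Bytes.create (max (p.len + n) (2 * Bytes.length p.data)) in
    Bytes.blit p.data 0 bigger 0 p.len;
    p.data <- bigger
  end;
  Bytes.blit src off p.data p.len n;
  p.len <- p.len + n

(* Return [Some v] and drop the consumed bytes if a whole value is
   buffered; return [None] if more input is needed.  As with any use of
   Marshal, the result type is not checked. *)
let take p =
  if p.len < Marshal.header_size then None
  else begin
    (* The header alone is enough to learn the size of the payload. *)
    let total = Marshal.header_size + Marshal.data_size p.data 0 in
    if p.len < total then None
    else begin
      let v = Marshal.from_bytes p.data 0 in
      Bytes.blit p.data total p.data 0 (p.len - total);
      p.len <- p.len - total;
      Some v
    end
  end
```

Since `take` never reads from a descriptor it cannot block; the price
is that the accumulation buffer has to be managed by hand.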

-- 
   Olivier



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-22  6:41       ` Jonathan Roewen
@ 2006-08-22  8:15         ` skaller
  2006-08-22 21:15           ` Mike Lin
  2006-08-23  5:12         ` Nathaniel Gray
  1 sibling, 1 reply; 21+ messages in thread
From: skaller @ 2006-08-22  8:15 UTC (permalink / raw)
  To: Jonathan Roewen; +Cc: Nathaniel Gray, Caml Mailing List

On Tue, 2006-08-22 at 18:41 +1200, Jonathan Roewen wrote:
> > It sounds simple but doesn't work.  If select tells you a file
> > descriptor doesn't have data waiting you can't be sure there isn't
> > still data in the corresponding channel's buffer.  See the thread that
> > I referenced for a good discussion of why this is annoying.  For one
> > thing, it makes it impossible to use Marshal.from_channel without
> > potentially blocking.
> 
> Either one of us is misunderstanding the other....

You missed the first half of the discussion:

> Instead of using Pervasives.open_xxx, use Unix.openfile which returns
> Unix.file_descr, and also doesn't use internal ocaml buffering.
> 
> Then, presumably, Unix.select would do what you expect, and then you
> can use Unix.in_channel_of_descr to get an ocaml in_channel to read
> from.
> 
> And if I'm misunderstanding you, then perhaps the problem isn't really
> Unix.select...

The problem is that this defeats the use of all the formatting
and buffering functions that work on buffered I/O channels.

What's required is something that tells:

(a) there is some data in the buffer OR
(b) there is some data on the descriptor

so that in either case some progress can be made.

Unfortunately .. there's a reason this makes no sense:

For raw byte streams .. you can just use the file descriptors
already with select.

Otherwise, there's no way to predict if an input will block,
whether or not there is data in the buffer, and whether
or not the file descriptor is ready, because the input
operation can read some data THEN block.

The same argument applies to output.

Therefore .. there is no choice but to replace all the
buffering anyhow, and in general the whole programming
paradigm needs to be replaced.

The Felix demux system already does this, I think, for
both read/write n bytes, and for read/write a line.
More difficult cases should be handled by in-core
formatting eg:

	print_string (string_of_int i)

is correct and

	print_int i

is wrong. The former cannot block on formatting, the latter can.
(assuming nonblocking line I/O is available).

You're stuck between a rock and hard place here :)

The read/write functions of a system are designed to
provide control inversion: data coming in or going out
is naturally interrupt (callback) driven, but it is
inconvenient to program with callbacks (I would say
it more strongly -- it is *untenable* to use callbacks).

Therefore the scheduler provides blocking I/O, and switches
out programmatic demands for I/O, effecting control inversion.

You can try to work around this with non-blocking I/O,
but it is really a hack because doing so is tantamount
to writing your own scheduler to provide control inversion,
in other words, inventing your own operating system.
It is even worse if you use event notifications to avoid
polling (I mean, it is even more complex).

In general the only really sound solution is indeed to 
provide a full scale operating system abstraction layer,
which requires the underlying programming language computational
model be designed to work with it.

Several systems can work this way: Felix and Haskell both
have continuations, which seem to be the pre-requisite.
MLton may also cope with this.

The Ocaml computational model doesn't provide the required
resources natively, although of course they could be implemented
in Ocaml .. but then you would be programming with, for example,
suitable monadic combinators, rather than arbitrary raw Ocaml code.

Just so it is clear: given two sockets, you want to read integers
off them. You can do this with two threads, both of which block.
Or you can block, and invoke a callback when one conversion
finally completes.

The two techniques are control inverse. The only difference
is that the thread model uses OS control inversion and the
callback model uses hand written control inversion.

BOTH techniques suck. The only way to do this properly is the
way Felix does it: you write threaded code, but language
control inverts it into callback driven code systematically,
and provides its own OS abstraction layer: this gives you
the responsiveness and performance of user space callback
driven code, but the illusion of using threads.

You will note this is not a magical silver bullet: it
only works because the user code handles more specialised
cases than a general purpose OS can handle well: if one tried
to do this with full generality you'd just end up with yet
another low performance operating system. IMHO the key here
is that application specific information .. perhaps embodied
in the type system .. can be used by the user program and
language translator, but not the underlying OS.

Just to see, in Felix you'd do it something like:

	var ich = mk_schannel[int]();

	spawn_sthread { 
		forever { 
			var x : int;
			read_int (sock1, &x);
			write ich, x;
		}
	}

	spawn_sthread { 
		forever { 
			var x : int;
			read_int (sock2, &x);
			write ich, x;
		}
	}

	forever {
		var x:int;
		read (ich, &x);
		print x; endl;
	}

The two 'threads' spawned here are NOT pre-emptive threads.
They're actually continuations, which are resumed by
the underlying demux library notification mechanism
starting them up again based on epoll/poll/kqueue/select etc.
The interaction along the channel 'ich' is entirely synchronous.

Ocaml can do this now using Event module .. but it only works
across pthread boundaries.
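
For comparison, a rough OCaml counterpart of the Felix code using
system threads and the Event module.  This is a sketch under stated
assumptions: the threads library is linked, each input carries one
integer per line, and `spawn_reader` and `merge_two` are hypothetical
names.  Unlike the Felix version, the readers here are real pre-emptive
threads that block in the OS.

```ocaml
(* Requires the threads library (Thread, Event). *)

(* Each reader blocks on its own channel and forwards every integer it
   parses.  Assumes one integer per line; the thread ends at end of file. *)
let spawn_reader (ic : in_channel) (out : int Event.channel) : Thread.t =
  Thread.create
    (fun () ->
      try
        while true do
          let x = int_of_string (String.trim (input_line ic)) in
          (* Synchronous rendezvous: waits until the consumer receives. *)
          Event.sync (Event.send out x)
        done
      with End_of_file -> ())
    ()

(* Collect [count] integers from both inputs, in arrival order.
   Blocks forever if fewer than [count] integers ever arrive. *)
let merge_two ~count ic1 ic2 =
  let out = Event.new_channel () in
  let _t1 = spawn_reader ic1 out in
  let _t2 = spawn_reader ic2 out in
  List.init count (fun _ -> Event.sync (Event.receive out))
```

The consumer only ever sees complete integers, so a half-received
value on one input cannot stall progress on the other.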

Strangely .. the Ocaml VM system does this stuff for the
bytecode interpreter already, interleaving bytecode to
emulate threads, and forwarding blocking operations
so the emulated threads block .. but the actual pre-emptive
thread (process) does not.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-22  6:27     ` Nathaniel Gray
  2006-08-22  6:41       ` Jonathan Roewen
  2006-08-22  8:10       ` Olivier Andrieu
@ 2006-08-22  8:21       ` Jacques Garrigue
  2006-08-23  5:16         ` Nathaniel Gray
  2 siblings, 1 reply; 21+ messages in thread
From: Jacques Garrigue @ 2006-08-22  8:21 UTC (permalink / raw)
  To: n8gray; +Cc: caml-list

From: "Nathaniel Gray" <n8gray@gmail.com>
> On 8/21/06, Jonathan Roewen <jonathan.roewen@gmail.com> wrote:
> > Why can't you just use the unix file opening functions since you're
> > using unix select? And if you need the ocaml in/out channels, convert
> > the unix file descriptors to ocaml ones instead of the other way
> > around. Seems simple enough to me.
> 
> It sounds simple but doesn't work.  If select tells you a file
> descriptor doesn't have data waiting you can't be sure there isn't
> still data in the corresponding channel's buffer.  See the thread that
> I referenced for a good discussion of why this is annoying.  For one
> thing, it makes it impossible to use Marshal.from_channel without
> potentially blocking.

(I didn't follow the discussion, so I may misunderstand...)

The problem with Marshal.from_channel seems independent of channel
buffering. The point is that Marshal.from_channel cannot work in
non-blocking mode, as it doesn't know in advance how many bytes it
will need to obtain a well-formed value. The only way I see is to do
the buffering yourself, and extract the data using Marshal.from_string
and Marshal.total_size.

Jacques Garrigue



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-22  8:15         ` skaller
@ 2006-08-22 21:15           ` Mike Lin
  0 siblings, 0 replies; 21+ messages in thread
From: Mike Lin @ 2006-08-22 21:15 UTC (permalink / raw)
  To: Caml Mailing List


>
> Several systems can work this way: Felix and Haskell both
> have continuations, which seem to be the pre-requisite.
> MLton may also cope with this.
>
> The Ocaml computational model doesn't provide the required
> resources natively, although of course they could be implemented
> in Ocaml .. but then you would be programming with, for example,
> suitable monadic combinators, rather than arbitrary raw Ocaml code.


It is not a completely impractical idea to just write everything in CPS. At
least, it's been done before :-)



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-22  6:41       ` Jonathan Roewen
  2006-08-22  8:15         ` skaller
@ 2006-08-23  5:12         ` Nathaniel Gray
  1 sibling, 0 replies; 21+ messages in thread
From: Nathaniel Gray @ 2006-08-23  5:12 UTC (permalink / raw)
  To: Jonathan Roewen; +Cc: Caml Mailing List

On 8/21/06, Jonathan Roewen <jonathan.roewen@gmail.com> wrote:
> > It sounds simple but doesn't work.  If select tells you a file
> > descriptor doesn't have data waiting you can't be sure there isn't
> > still data in the corresponding channel's buffer.  See the thread that
> > I referenced for a good discussion of why this is annoying.  For one
> > thing, it makes it impossible to use Marshal.from_channel without
> > potentially blocking.
>
> Either one of us is misunderstanding the other....
>
> Instead of using Pervasives.open_xxx, use Unix.openfile which returns
> Unix.file_descr, and also doesn't use internal ocaml buffering.
>
> Then, presumably, Unix.select would do what you expect, and then you
> can use Unix.in_channel_of_descr to get an ocaml in_channel to read
> from.

... which is buffered.  All ocaml channels are buffered, regardless of
their origins.

Cheers,
-n8

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-22  8:21       ` Jacques Garrigue
@ 2006-08-23  5:16         ` Nathaniel Gray
  2006-08-23  6:35           ` skaller
  0 siblings, 1 reply; 21+ messages in thread
From: Nathaniel Gray @ 2006-08-23  5:16 UTC (permalink / raw)
  To: Jacques Garrigue; +Cc: caml-list

On 8/22/06, Jacques Garrigue <garrigue@math.nagoya-u.ac.jp> wrote:
> (I didn't follow the discussion, so I may misunderstand...)
>
> The problem with Marshal.from_channel seems independent of channel
> buffering. The point is that Marshal.from_channel cannot work in
> non-blocking mode, as it doesn't know in advance how many bytes it
> will need to obtain a well-formed value. The only way I see is to do
> the buffering yourself, and extract the data using Marshal.from_string
> and Marshal.total_size.

You're right -- *truly* non-blocking Marshal.from_channel requires
more than select on channels, but it's often good enough to know if
there's any data in the channel at all.  It's often acceptable to
block long enough for the *rest* of a partial message to arrive but
unacceptable to block indefinitely waiting for a new message to
arrive.

In any case, Marshal.from_channel isn't the only reason one might want
select on channels.  Is this something the dev team would be willing
to accept as a patch?  It's a pretty trivial thing to implement.

Cheers,
-n8

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-22  8:10       ` Olivier Andrieu
@ 2006-08-23  5:27         ` Nathaniel Gray
  0 siblings, 0 replies; 21+ messages in thread
From: Nathaniel Gray @ 2006-08-23  5:27 UTC (permalink / raw)
  To: Olivier Andrieu; +Cc: Caml Mailing List

On 8/22/06, Olivier Andrieu <oandrieu@nerim.net> wrote:
> Indeed, Marshal.from_channel would block, but it's not the only way to
> read a marshalled value: cf. Marshal.header_size and
> Marshal.data_size.
>
> With these, you can read your marshalled value from file_descr into a
> buffer in a non-blocking, select-compatible way and then use
> Marshal.from_string.

Yes, as I said, it's possible to work around this limitation by
creating yet another implementation of buffered I/O.  My point is that
there's already a good buffered I/O implementation in ocaml that could
suit many (but not all) needs -- channels.  Adding channel_select
would make channels a lot more useful at very little expense.  Heck, I
would be satisfied with in/out_channel_is_ready, which would be even
easier!

Cheers,
-n8

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-23  5:16         ` Nathaniel Gray
@ 2006-08-23  6:35           ` skaller
  2006-08-23 19:31             ` Nathaniel Gray
  0 siblings, 1 reply; 21+ messages in thread
From: skaller @ 2006-08-23  6:35 UTC (permalink / raw)
  To: Nathaniel Gray; +Cc: Jacques Garrigue, caml-list

On Tue, 2006-08-22 at 22:16 -0700, Nathaniel Gray wrote:

> You're right -- *truly* non-blocking Marshal.from_channel requires
> more than select on channels, but it's often good enough to know if
> there's any data in the channel at all.  It's often acceptable to
> block long enough for the *rest* of a partial message to arrive but
> unacceptable to block indefinitely waiting for a new message to
> arrive.
> 
> In any case, Marshal.from_channel isn't the only reason one might want
> select on channels.  Is this something the dev team would be willing
> to accept as a patch?  It's a pretty trivial thing to implement.

The problem would be the semantics you indicate above.
You're going to get non-blocking behaviour when the
channels are quiet, but once there is some data,
you get blocking. 

It sounds hard to reason about the
properties of a system with these semantics:
you can already get the fd of a channel and select
on it, and have a similar problem (fd is quiet,
the buffer has data).

An alternative may be a pair of functions:

	bytes_of_in_channel: in_channel -> int
	bytes_of_out_channel: out_channel -> int

which tell how much data remains in the buffers,
you could use that in conjunction with Unix.select.
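
A sketch of how such a function might be combined with Unix.select.
Note that bytes_of_in_channel is only a proposal and does not exist in
the standard library, so it is passed in here as the `buffered`
parameter; `in_channel_ready` is likewise a hypothetical name.

```ocaml
(* Requires the unix library.
   [buffered] stands in for the proposed bytes_of_in_channel, which is
   NOT part of the standard library; the caller has to supply it. *)
let in_channel_ready ~(buffered : in_channel -> int) (ic : in_channel) : bool =
  (* Data already sitting in the channel buffer can be read at once... *)
  buffered ic > 0
  ||
  (* ...otherwise poll the underlying descriptor with a zero timeout. *)
  (match Unix.select [Unix.descr_of_in_channel ic] [] [] 0.0 with
   | [], _, _ -> false
   | _ :: _, _, _ -> true)
```

This only answers "can at least one byte be read right now"; it says
nothing about whether a complete value or line is available.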

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-23  6:35           ` skaller
@ 2006-08-23 19:31             ` Nathaniel Gray
  2006-08-24  5:37               ` skaller
  0 siblings, 1 reply; 21+ messages in thread
From: Nathaniel Gray @ 2006-08-23 19:31 UTC (permalink / raw)
  To: skaller; +Cc: Jacques Garrigue, caml-list

On 8/22/06, skaller <skaller@users.sourceforge.net> wrote:
> On Tue, 2006-08-22 at 22:16 -0700, Nathaniel Gray wrote:
>
> > You're right -- *truly* non-blocking Marshal.from_channel requires
> > more than select on channels, but it's often good enough to know if
> > there's any data in the channel at all.  It's often acceptable to
> > block long enough for the *rest* of a partial message to arrive but
> > unacceptable to block indefinitely waiting for a new message to
> > arrive.
> >
> > In any case, Marshal.from_channel isn't the only reason one might want
> > select on channels.  Is this something the dev team would be willing
> > to accept as a patch?  It's a pretty trivial thing to implement.
>
> The problem would be the semantics you indicate above.
> You're going to get non-blocking behaviour when the
> channels are quiet, but once there is some data,
> you get blocking.

I don't follow you.  Are you talking about input or output channels?
Can you give an example?

I suppose that on output channels you might not want to try to write
to the buffer if the fd is blocked, but you can always use plain old
Unix.select to check for that condition.

> It sounds hard to reason about the
> properties of a system with these semantics:
> you can already get the fd of a channel and select
> on it, and have a similar problem (fd is quiet,
> the buffer has data).

The semantics would be the same as the normal select.  If an
in_channel is ready it means at least one byte can be read without
blocking.  If an out_channel is ready then at least one byte can be
written without blocking.

> An alternative may be a pair of functions:
>
>         bytes_of_in_channel: in_channel -> int
>         bytes_of_out_channel: out_channel -> int
>
> which tell how much data remains in the buffers,
> you could use that in conjunction with Unix.select.

That would be sufficient, but it would end up being a bit deceptive --
both sizes would be limited by the buffer sizes involved.

There are lots of ways to skin this particular cat.  I chose
channel_select because it reveals very little about implementation
details like buffer sizes, which seems in keeping with the rest of the
channel API.  Another option would be {in,out}_channel_ready, upon
which one could easily build channel_select.

Cheers,
-n8

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-23 19:31             ` Nathaniel Gray
@ 2006-08-24  5:37               ` skaller
  2006-08-24 19:06                 ` Nathaniel Gray
  0 siblings, 1 reply; 21+ messages in thread
From: skaller @ 2006-08-24  5:37 UTC (permalink / raw)
  To: Nathaniel Gray; +Cc: Jacques Garrigue, caml-list

On Wed, 2006-08-23 at 12:31 -0700, Nathaniel Gray wrote:
> On 8/22/06, skaller <skaller@users.sourceforge.net> wrote:

> > The problem would be the semantics you indicate above.
> > You're going to get non-blocking behaviour when the
> > channels are quiet, but once there is some data,
> > you get blocking.
> 
> I don't follow you.  Are you talking about input or output channels?
> Can you give an example?

Sure .. two sockets, with integers coming down them encoded as
characters .. perhaps a CSV file.

The sockets are quiet. Then some data comes down one,
and you go off and try to read an integer .. but half way
through you run out of data and block.

Now a whole stream of integers comes down the OTHER socket.
Bad luck, your thread is blocked waiting on data from the
first socket.

Select or not .. buffering or not: you might as well have just 
picked a socket and done a blocking read on it.

Unless you can *guarantee* some invariant is preserved,
you've achieved nothing.

The correct way to use select is in conjunction
with a polling loop that does non-blocking I/O. 

Polling all your channels without blocking is what you want,
that way you read or write data efficiently.

But such a polling loop wastes cycles and isn't very responsive,
so you use select() to skip over polling channels that cannot
be ready. If that happens to be all of them, you might as well
block until the situation changes; if some channels are
ready you can go off and try I/O on them, and not bother
with the non-ready channels.

The key thing here is that select is an optimisation:
the IMPORTANT point is that the polling must use
non-blocking I/O.

Note: just because select() indicates an fd is ready
doesn't mean it is. There's no assurance even one byte
can be read or written.

Select() is just like a condition variable -- you have
to assume spurious wakeups. Both provide a way to know
when and what to check which avoids excessive polling.
Select, like a condition variable, is just a hint to
improve response time and resource utilisation:
it has no semantics. (More precisely, the delivery
of a signal MUST wake up the client, but the converse
isn't required.)

The important semantics are in YOUR code: the condition 
check must not block, similarly, I/O must not block.

So basically doing blocking I/O in conjunction
with select makes no sense.
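
In OCaml terms, one round of such a loop might look roughly like the
sketch below.  It assumes the unix library and that every descriptor
has already been put in non-blocking mode with Unix.set_nonblock;
`read_nonblocking`, `poll_once` and the `handle` callback are
hypothetical names.

```ocaml
(* Requires the unix library.  All descriptors are assumed to have been
   switched to non-blocking mode with Unix.set_nonblock beforehand. *)

(* Attempt one read.  `Retry covers the case where select's hint turned
   out to be wrong and nothing can be read after all. *)
let read_nonblocking fd buf =
  match Unix.read fd buf 0 (Bytes.length buf) with
  | 0 -> `Eof
  | n -> `Data n
  | exception
      Unix.Unix_error ((Unix.EAGAIN | Unix.EWOULDBLOCK | Unix.EINTR), _, _) ->
      `Retry

(* One round: sleep in select until some descriptor looks ready (a
   negative timeout means wait indefinitely), then try I/O only on the
   candidates.  Returns the descriptors that have not reached EOF. *)
let poll_once ?(timeout = -1.0) fds handle =
  let buf = Bytes.create 4096 in
  let ready, _, _ = Unix.select fds [] [] timeout in
  List.filter
    (fun fd ->
      if not (List.mem fd ready) then true
      else
        match read_nonblocking fd buf with
        | `Data n -> handle fd (Bytes.sub_string buf 0 n); true
        | `Retry -> true
        | `Eof -> false)
    fds
```

The blocking happens only inside select; the reads themselves return
immediately whether or not the readiness hint was accurate.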

IF you had some known semantics such as that
messages were of bounded length and transmitted
in bounded time .. select would be useful, since
blocking operations would block only for bounded time,
in which case they're not blocking at all.

So you are right, a channel based select COULD be useful,
given such a constraint, however it doesn't break the
rule above, since blocking for a bounded time is 
semantically a non-blocking operation: an operation
is only blocking if it can block for an unbounded time.

For this reason most programmers consider all disk I/O
as non-blocking .. even though the OS kernel may
not agree with this interpretation :)


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-24  5:37               ` skaller
@ 2006-08-24 19:06                 ` Nathaniel Gray
  2006-08-25  1:55                   ` skaller
  0 siblings, 1 reply; 21+ messages in thread
From: Nathaniel Gray @ 2006-08-24 19:06 UTC (permalink / raw)
  To: skaller; +Cc: Jacques Garrigue, caml-list

On 8/23/06, skaller <skaller@users.sourceforge.net> wrote:
>
> IF you had some known semantics such as that
> messages were of bounded length and transmitted
> in bounded time .. select would be useful, since
> blocking operations would block only for bounded time,
> in which case they're not blocking at all.

That's exactly what I'm talking about.  There are plenty of cases
where such assumptions are justified, including IPC on a single
machine or communication on a small trusted network.  Select on
channels is no more or less useful than regular select, other than the
fact that it makes it possible to use OCaml functions that only
operate on channels.
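
As a rough sketch of what this looks like when done by hand under those
assumptions: select on the descriptor underneath each channel, then call an
ordinary channel function. The helper name select_line is hypothetical. It
is only safe when at most one message is outstanding per channel, because
lines that have already been pulled into a channel's buffer are invisible
to Unix.select, which is the gap a real select on channels would close.

```ocaml
(* Requires the unix library. *)

(* Wait up to [timeout] seconds for input on any channel in [ics] and
   return the channel together with one line read from it.
   Assumption: each message is a single line that arrives whole, so
   [input_line] blocks only for a bounded time.
   [End_of_file] is left to propagate to the caller. *)
let select_line ?(timeout = 1.0) (ics : in_channel list) =
  let fds = List.map Unix.descr_of_in_channel ics in
  match Unix.select fds [] [] timeout with
  | [], _, _ -> None
  | fd :: _, _, _ ->
    let ic = List.find (fun ic -> Unix.descr_of_in_channel ic = fd) ics in
    Some (ic, input_line ic)
```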

Cheers,
-n8

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-24 19:06                 ` Nathaniel Gray
@ 2006-08-25  1:55                   ` skaller
  2006-08-25 22:19                     ` Nathaniel Gray
  0 siblings, 1 reply; 21+ messages in thread
From: skaller @ 2006-08-25  1:55 UTC (permalink / raw)
  To: Nathaniel Gray; +Cc: Jacques Garrigue, caml-list

On Thu, 2006-08-24 at 12:06 -0700, Nathaniel Gray wrote:
> On 8/23/06, skaller <skaller@users.sourceforge.net> wrote:
> >
> > IF you had some known semantics such as that
> > messages were of bounded length and transmitted
> > in bounded time .. select would be useful, since
> > blocking operations would block only for bounded time,
> > in which case they're not blocking at all.
> 
> That's exactly what I'm talking about.  There are plenty of cases
> where such assumptions are justified, including IPC on a single
> machine or communication on a small trusted network.  Select on
> channels is no more or less useful than regular select, other than the
> fact that it makes it possible to use OCaml functions that only
> operate on channels.

This is true, and in C it is the same:

	printf(...)
	scanf(... )

are both going to block, possibly for a bounded time
if they're only called when you know the underlying
fd has, at least temporarily, enough throughput to allow
the formatting algorithms to terminate.

The thing is, you can already do this, guaranteed!
It is rather heavy though: you make two pthreads
and read one fd in each, then send the
results down a single Event channel: the user code
can then read the parsed objects in sequence,
blocking until the first is available, then blocking
until the second is available.

The point is that although this is heavy, it is guaranteed
to work, so using Event module with pthreads is actually
easier to reason about.
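
A sketch of that arrangement, assuming the threads and unix libraries and
line-oriented input. The names reader and merge_lines are made up for the
illustration, and "parsing" is reduced to input_line; it generalises from
two descriptors to a list of channels:

```ocaml
(* Requires the threads library (Thread, Event). *)

(* One reader thread per channel: send each line down [out] as [Some line],
   then a single [None] when the channel reaches end of file. *)
let reader (ic : in_channel) (out : string option Event.channel) () =
  let rec loop () =
    match input_line ic with
    | line -> Event.sync (Event.send out (Some line)); loop ()
    | exception End_of_file -> Event.sync (Event.send out None)
  in
  loop ()

(* Collect lines from all channels in arrival order. The consumer only
   ever blocks on the Event channel, never on a file descriptor. *)
let merge_lines (ics : in_channel list) : string list =
  let out = Event.new_channel () in
  let threads = List.map (fun ic -> Thread.create (reader ic out) ()) ics in
  let rec collect live acc =
    if live = 0 then List.rev acc
    else
      match Event.sync (Event.receive out) with
      | Some line -> collect live (line :: acc)
      | None -> collect (live - 1) acc     (* one reader finished *)
  in
  let lines = collect (List.length ics) [] in
  List.iter Thread.join threads;
  lines
```

The order in which lines from different channels interleave depends on
thread scheduling; only the order within one channel is preserved.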

If you would like to do this without the cost of spawning
pthreads .. then you need to use Felix, not Ocaml :)

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-25  1:55                   ` skaller
@ 2006-08-25 22:19                     ` Nathaniel Gray
  0 siblings, 0 replies; 21+ messages in thread
From: Nathaniel Gray @ 2006-08-25 22:19 UTC (permalink / raw)
  To: skaller; +Cc: Jacques Garrigue, caml-list

On 8/24/06, skaller <skaller@users.sourceforge.net> wrote:
>
> The thing is, you can already do this, guaranteed!
> It is rather heavy though: you make two pthreads
> and read one fd in each, then send the
> results down a single Event channel: the user code
> can then read the parsed objects in sequence,
> blocking until the first is available, then blocking
> until the second is available.
>
> The point is that although this is heavy, it is guaranteed
> to work, so using Event module with pthreads is actually
> easier to reason about.

I understand this perfectly, but I don't think it obviates the need
for select on channels.  The programmer should be free to choose the
approach that suits him best, not have it forced upon him by
limitations in the standard library.

> If you would like to do this without the cost of spawning
> pthreads .. then you need to use Felix, not Ocaml :)

Unfortunately that's not an option for me, but maybe in the future...

Cheers,
-n8

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-23  8:29 Christoph Bauer
  2006-08-23 17:35 ` Robert Roessler
@ 2006-08-24  8:18 ` Robert Roessler
  1 sibling, 0 replies; 21+ messages in thread
From: Robert Roessler @ 2006-08-24  8:18 UTC (permalink / raw)
  To: Caml-list

Sigh - this was first sent 15 hours ago, and a single "Mail Delivery 
Subsystem" notice about a non-fatal error came back at the 4-hour 
mark... but as it has never shown up, here it is again.
--------
Christoph Bauer wrote:
>>> ...
>>> I did this, but on windows with two programs communicating over a pipe
>>> this isn't enough. select on windows and on a pipe doesn't work. 
>>> Therefore I wrote a stub for PeekNamedPipe():
>> "Select on windows" certainly does work... and why not use a 
>> socket pair, just as one might on a *nix system?  That way, 
>> it will work on both.
> 
> Select doesn't work on Windows pipes. In retrospect sockets
> would be the better choice. I stumbled into a strange deadlock
> with these pipes, because under Windows the pipe buffer
> is set to 1024 bytes (otherlibs/win32unix/pipe.c); once it is full,
> the writer blocks until the reader reads the contents. IMO this
> value (SIZEBUF) should be zero to let the system choose the best
> buffer size. [1]
> 
>> And the fact that socketpair has been left out of the Windows 
>> version of the Unix module is not an impediment - it is easy 
>> to write a useful implementation in OCaml (I can supply one 
>> if needed).
> 
> Please supply one.

Here is what I use:

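open Unix
module U2 = struct
   (* Emulate socketpair with a loopback TCP connection.
      Only meaningful for af = PF_INET and typ = SOCK_STREAM, since the
      listener binds to the IPv4 loopback address and calls listen. *)
   let socketpair af typ proto =
     let listener = socket af typ proto in
     (* Port 0 lets the system pick a free port. *)
     let listener_addr = ADDR_INET(inet_addr_loopback, 0) in
     bind listener listener_addr;
     listen listener 1;
     (* Recover the address and port that were actually assigned. *)
     let listener_name = getsockname listener in
     let connector = socket af typ proto in
     connect connector listener_name;
     let acceptor_full = accept listener in
     close listener;
     (* assert ((getsockname connector) = (snd acceptor_full)); *)
     (connector, (fst acceptor_full))
end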

... and then select this socketpair or the one from the Unix module at
runtime.

Robert Roessler
roessler@rftp.com
http://www.rftp.com



* Re: [Caml-list] Re: Select on channels (again)
  2006-08-23  8:29 Christoph Bauer
@ 2006-08-23 17:35 ` Robert Roessler
  2006-08-24  8:18 ` Robert Roessler
  1 sibling, 0 replies; 21+ messages in thread
From: Robert Roessler @ 2006-08-23 17:35 UTC (permalink / raw)
  To: Caml-list

Christoph Bauer wrote:
>>> ...
>>> I did this, but on windows with two programs communicating over a pipe
>>> this isn't enough. select on windows and on a pipe doesn't work. 
>>> Therefore I wrote a stub for PeekNamedPipe():
>> "Select on windows" certainly does work... and why not use a 
>> socket pair, just as one might on a *nix system?  That way, 
>> it will work on both.
> 
> Select doesn't work on Windows pipes. In retrospect sockets
> would be the better choice. I stumbled into a strange deadlock
> with these pipes, because under Windows the pipe buffer
> is set to 1024 bytes (otherlibs/win32unix/pipe.c); once it is full,
> the writer blocks until the reader reads the contents. IMO this
> value (SIZEBUF) should be zero to let the system choose the best
> buffer size. [1]
> 
>> And the fact that socketpair has been left out of the Windows 
>> version of the Unix module is not an impediment - it is easy 
>> to write a useful implementation in OCaml (I can supply one 
>> if needed).
> 
> Please supply one.

Here is what I use:

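open Unix
module U2 = struct
   (* Emulate socketpair with a loopback TCP connection.
      Only meaningful for af = PF_INET and typ = SOCK_STREAM, since the
      listener binds to the IPv4 loopback address and calls listen. *)
   let socketpair af typ proto =
     let listener = socket af typ proto in
     (* Port 0 lets the system pick a free port. *)
     let listener_addr = ADDR_INET(inet_addr_loopback, 0) in
     bind listener listener_addr;
     listen listener 1;
     (* Recover the address and port that were actually assigned. *)
     let listener_name = getsockname listener in
     let connector = socket af typ proto in
     connect connector listener_name;
     let acceptor_full = accept listener in
     close listener;
     (* assert ((getsockname connector) = (snd acceptor_full)); *)
     (connector, (fst acceptor_full))
end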

... and then select this socketpair or the one from the Unix module at 
runtime.
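
One way the runtime choice could look, as a sketch only. The names
loopback_socketpair and socketpair are illustrative, the emulation is
fixed to PF_INET/SOCK_STREAM, and testing Sys.os_type for "Win32" is an
assumption about how the platform is detected:

```ocaml
(* Requires the unix library. *)

(* Loopback emulation along the lines of U2.socketpair, specialised to
   a TCP connection over the IPv4 loopback address. *)
let loopback_socketpair () =
  let open Unix in
  let listener = socket PF_INET SOCK_STREAM 0 in
  bind listener (ADDR_INET (inet_addr_loopback, 0));
  listen listener 1;
  let connector = socket PF_INET SOCK_STREAM 0 in
  connect connector (getsockname listener);
  let acceptor, _peer = accept listener in
  close listener;
  (connector, acceptor)

(* Use the emulation on native Windows, the system call elsewhere. *)
let socketpair () =
  if Sys.os_type = "Win32" then loopback_socketpair ()
  else Unix.socketpair Unix.PF_UNIX Unix.SOCK_STREAM 0
```

Either way both ends are sockets, so Unix.select can be applied to them on
both platforms.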

Robert Roessler
roessler@rftp.com
http://www.rftp.com



* Re: [Caml-list] Re: Select on channels (again)
@ 2006-08-23  8:29 Christoph Bauer
  2006-08-23 17:35 ` Robert Roessler
  2006-08-24  8:18 ` Robert Roessler
  0 siblings, 2 replies; 21+ messages in thread
From: Christoph Bauer @ 2006-08-23  8:29 UTC (permalink / raw)
  To: Robert Roessler, Caml-list


> > ...
> > I did this, but on windows with two programs communicating over a pipe
> > this isn't enough. select on windows and on a pipe doesn't work. 
> > Therefore I wrote a stub for PeekNamedPipe():
> 
> "Select on windows" certainly does work... and why not use a 
> socket pair, just as one might on a *nix system?  That way, 
> it will work on both.

Select doesn't work on Windows pipes. In retrospect sockets
would be the better choice. I stumbled into a strange deadlock
with these pipes, because under Windows the pipe buffer
is set to 1024 bytes (otherlibs/win32unix/pipe.c); once it is full,
the writer blocks until the reader reads the contents. IMO this
value (SIZEBUF) should be zero to let the system choose the best
buffer size. [1]
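
To see how much a pipe accepts before a writer would block, a small sketch
(the helper name pipe_capacity and the 512-byte chunk size are arbitrary).
It targets POSIX systems, on the assumption that non-blocking mode is
available for pipes there; it is not expected to work with the Windows
pipes discussed here:

```ocaml
(* Requires the unix library. *)

(* Fill a fresh pipe with non-blocking writes and return the number of
   bytes accepted before a further write would have blocked. *)
let pipe_capacity () =
  let r, w = Unix.pipe () in
  Unix.set_nonblock w;
  let chunk = Bytes.make 512 'x' in
  let rec fill total =
    match Unix.write w chunk 0 (Bytes.length chunk) with
    | n -> fill (total + n)
    | exception Unix.Unix_error ((Unix.EAGAIN | Unix.EWOULDBLOCK), _, _) ->
      total                              (* the buffer is full *)
  in
  let total = fill 0 in
  Unix.close r;
  Unix.close w;
  total
```

With a blocking descriptor and no reader, the write that hits this limit is
the one that hangs.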


> And the fact that socketpair has been left out of the Windows 
> version of the Unix module is not an impediment - it is easy 
> to write a useful implementation in OCaml (I can supply one 
> if needed).

Please supply one.

Christoph Bauer

[1]
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/ipc/base/createpipe.asp



Thread overview: 21+ messages
2006-08-15  0:46 Select on channels (again) Nathaniel Gray
2006-08-21 22:47 ` Nathaniel Gray
2006-08-22  0:42   ` [Caml-list] " Jonathan Roewen
2006-08-22  6:27     ` Nathaniel Gray
2006-08-22  6:41       ` Jonathan Roewen
2006-08-22  8:15         ` skaller
2006-08-22 21:15           ` Mike Lin
2006-08-23  5:12         ` Nathaniel Gray
2006-08-22  8:10       ` Olivier Andrieu
2006-08-23  5:27         ` Nathaniel Gray
2006-08-22  8:21       ` Jacques Garrigue
2006-08-23  5:16         ` Nathaniel Gray
2006-08-23  6:35           ` skaller
2006-08-23 19:31             ` Nathaniel Gray
2006-08-24  5:37               ` skaller
2006-08-24 19:06                 ` Nathaniel Gray
2006-08-25  1:55                   ` skaller
2006-08-25 22:19                     ` Nathaniel Gray
2006-08-23  8:29 Christoph Bauer
2006-08-23 17:35 ` Robert Roessler
2006-08-24  8:18 ` Robert Roessler
