caml-list - the Caml user's mailing list
* [Caml-list] Unix file descriptors vs. in/out channels
@ 2014-08-18 14:42 Thomas Braibant
  2014-08-18 16:10 ` Adrien Nader
  2014-08-18 16:33 ` Xavier Leroy
  0 siblings, 2 replies; 9+ messages in thread
From: Thomas Braibant @ 2014-08-18 14:42 UTC (permalink / raw)
  To: OCaML Mailing List

Hi list,

[summary] I would like to open a file in read-write mode, and use it
(mainly) to stream a big data structure into it and (sometimes) to read
back the content of this data structure.

[problem] I am a bit puzzled w.r.t. the interplay between the Pervasives
functions that operate on in/out channels and the Unix functions that
operate on file descriptors. From the documentation, I assume that it
is not possible to close an (input) channel that was created using
Unix.in_channel_of_descr without closing the associated file
descriptor. Therefore, I assume that I cannot use
Unix.in_channel_of_descr and Unix.out_channel_of_descr more than once
for my file descriptor (because otherwise, these channels would not be
reclaimed). But is it safe to use both kinds of channels?

Btw, while playing with this problem, I found the following strange
behavior: if I uncomment the second line in debug (see below), I can
read data from the input channel, while if that line is commented out,
reading from the channel yields an End_of_file exception. Is this
expected?

let debug msg i o =
  (* Print both channel positions in decimal. *)
  Printf.printf "[%s] posi:%i poso:%i\n%!" msg (pos_in i) (pos_out o);
  (* Printf.printf "[%s] leni:%i leno:%i\n%!" msg
       (in_channel_length i) (out_channel_length o); *)
  ()

let test =
  let open Unix in
  let fd = openfile "foo.bar" [O_RDWR; O_TRUNC; O_CREAT] 0o640 in
  Printf.printf "openfile\n%!";
  let o = out_channel_of_descr fd in
  Printf.printf "out_channel_of_descr\n%!";
  let i = in_channel_of_descr fd in
  Printf.printf "in_channel_of_descr\n%!";
  debug "1" i o;
  let _ = Printf.fprintf o "test1\n%!" in
  debug "2" i o;
  (* write_substring: Unix.write takes bytes in current OCaml *)
  assert (write_substring fd "test2" 0 5 = 5);
  debug "3" i o;
  let _ = input_char i  in
  debug "4" i o;
  let _ = close_out o in
  Printf.printf "closeout\n%!";
  try
    ignore (write_substring fd "test3" 0 5);
    close fd;
    Printf.printf "success\n%!"
  with
    _ -> Printf.printf "fail\n%!"

Best,
Thomas


* Re: [Caml-list] Unix file descriptors vs. in/out channels
  2014-08-18 14:42 [Caml-list] Unix file descriptors vs. in/out channels Thomas Braibant
@ 2014-08-18 16:10 ` Adrien Nader
  2014-08-18 16:15   ` Edouard Evangelisti
  2014-08-18 16:29   ` Thomas Braibant
  2014-08-18 16:33 ` Xavier Leroy
  1 sibling, 2 replies; 9+ messages in thread
From: Adrien Nader @ 2014-08-18 16:10 UTC (permalink / raw)
  To: Thomas Braibant; +Cc: OCaML Mailing List

Hi,

You cannot safely mix buffered (in/out_channel) and un-buffered
(file_descr) uses of the same underlying resource.

IIRC an in_channel or out_channel has a buffer in OCaml memory.
If you close the underlying file_descr of an out_channel with
functions operating on file descriptors directly, it is possible that
some data will still be buffered.
If you read alternately through the file_descr and the in_channel, you
might skip some data if reading with the in_channel reads more than just
"n chars" (it could read 4K for instance, I'm not completely sure).

As for using in/out_channel_of_descr more than once, I don't know
offhand: if it creates new buffers each time (likely), it will be an
issue.
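
A minimal sketch of the read-ahead effect described above, assuming a
small regular file and the unix library. The name read_ahead_demo and
the temporary file are illustrative only. Asking the channel for one
character is expected to pull the whole (tiny) file into the channel's
buffer, so a direct Unix.read on the same descriptor afterwards finds
nothing left:

```ocaml
(* Requires the unix library. read_ahead_demo is a hypothetical name. *)
let read_ahead_demo () =
  let path = Filename.temp_file "mix" ".txt" in
  (* Create a small file with known contents. *)
  let oc = open_out_bin path in
  output_string oc "0123456789";
  close_out oc;
  let fd = Unix.openfile path [Unix.O_RDONLY] 0 in
  let ic = Unix.in_channel_of_descr fd in
  (* Reading one character makes the channel refill its buffer, which
     consumes far more than one byte from the descriptor. *)
  let c = input_char ic in
  (* A direct read now starts where the channel's refill stopped,
     not just after the first character. *)
  let buf = Bytes.create 4 in
  let n = Unix.read fd buf 0 4 in
  close_in ic;
  Sys.remove path;
  (c, n)
```

With this 10-byte file the function should return ('0', 0): the nine
remaining bytes sit in the channel's buffer and are invisible to
Unix.read.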

-- 
Adrien Nader


* Re: [Caml-list] Unix file descriptors vs. in/out channels
  2014-08-18 16:10 ` Adrien Nader
@ 2014-08-18 16:15   ` Edouard Evangelisti
  2014-08-18 16:29   ` Thomas Braibant
  1 sibling, 0 replies; 9+ messages in thread
From: Edouard Evangelisti @ 2014-08-18 16:15 UTC (permalink / raw)
  To: Adrien Nader; +Cc: Thomas Braibant, OCaML Mailing List

Dear Paolo and Anders,

Thank you for your messages. I tried to use Paolo's code and the exception
Timeout was raised. I was not able to read char by char without
O_NONBLOCK: the program just waits forever for something to read.

I was able to retrieve some data using Unix.in_channel_of_descr, but the
data are not transmitted synchronously, which is required in my case. Thus,
I will probably rewrite this part in C using the same strategy and see if
it works. I cannot understand why this strategy works for HyperTerminal and
not for me with the same material.

Thank you again.

Edouard




-- 
Edouard Evangelisti
Post doctoral Research Associate
Sainsbury Laboratory, Cambridge University (SLCU)
Bateman Street
Cambridge CB2 1LR (United Kingdom)


* Re: [Caml-list] Unix file descriptors vs. in/out channels
  2014-08-18 16:10 ` Adrien Nader
  2014-08-18 16:15   ` Edouard Evangelisti
@ 2014-08-18 16:29   ` Thomas Braibant
  1 sibling, 0 replies; 9+ messages in thread
From: Thomas Braibant @ 2014-08-18 16:29 UTC (permalink / raw)
  To: Adrien Nader; +Cc: OCaML Mailing List

Hi Adrien,

Thanks a lot for clarifying that one should not mix accesses through
the file descriptors and the in/out_channels.

I am pretty sure that using in/out_channel_of_descr more than once
will be an issue too, looking at the code in io.c [1] (at least
because the close function would be called twice on the same fd, which
might refer to something else at that point in time, as Goswin pointed
out recently).

The only part that I still do not fully understand is why the
following code, which does not mix accesses through the file_descr and
the channels, has a behavior that depends on whether I compute
the length of the input channel. I would expect the out_channel to be
flushed by the call to Printf.fprintf with "%!", wouldn't it?

let test =
  let open Unix in
  let fd = openfile "foo.bar" [O_RDWR; O_TRUNC; O_CREAT] 0o640 in
  Printf.printf "openfile\n%!";
  let o = out_channel_of_descr fd in
  Printf.printf "out_channel_of_descr\n%!";
  let i = in_channel_of_descr fd in
  Printf.printf "in_channel_of_descr\n%!";
  debug "1" i o;
  let _ = Printf.fprintf o "test1\n%!" in
  debug "2" i o;
  let _ = input_char i  in
  close_in i;
  Printf.printf "Ok\n%!"



[1] https://github.com/ocaml/ocaml/blob/774e30e138dc22a5acd6cfac03ae25194ae8cd6e/byterun/io.c


* Re: [Caml-list] Unix file descriptors vs. in/out channels
  2014-08-18 14:42 [Caml-list] Unix file descriptors vs. in/out channels Thomas Braibant
  2014-08-18 16:10 ` Adrien Nader
@ 2014-08-18 16:33 ` Xavier Leroy
  2014-08-18 16:52   ` Thomas Braibant
  1 sibling, 1 reply; 9+ messages in thread
From: Xavier Leroy @ 2014-08-18 16:33 UTC (permalink / raw)
  To: caml-list

Hi Thomas,

> [problem] I am a bit puzzled w.r.t. the interplay between Pervasives
> functions that operate on in/out channels and the Unix functions that
> operate on file descriptors. From the documentation, I assume that it
> is not possible to close an (input) channel that was created using
> Unix.in_channel_of_descr without closing the associated file
> descriptor. 

Correct.

> Therefore, I assume that I cannot use
> Unix.in_channel_of_descr and Unix.out_channel_of_descr more than once
> for my file-descriptor (because otherwise, these channels would not be
> reclaimed). 

I don't quite understand your remark.  You need to close (explicitly
and at once) all in_channels and out_channels associated with your
file_descr, once you're done with it.  The first close_in/close_out
will close the underlying FD, and the others will ignore the fact that
the FD is already closed.

> But is it safe to use both kinds of channels?

Sometimes :-)  An example is Unix.open_connection, which gives you a
pair of in/out channels on the same socket.  The only caveat is that
writes on out_channels are buffered, so you need to flush explicitly
to make sure the data is actually sent over the socket.
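
A minimal sketch of that caveat, assuming the unix library and using
Unix.socketpair in place of a real network connection so that it is
self-contained. The name echo_over_socketpair is hypothetical. Two
channels share one socket, as with Unix.open_connection, and the
explicit flush is what makes the data reach the peer:

```ocaml
(* Requires the unix library. echo_over_socketpair is a hypothetical name. *)
let echo_over_socketpair () =
  let a, b = Unix.socketpair Unix.PF_UNIX Unix.SOCK_STREAM 0 in
  (* Two channels on the same socket [a], as open_connection would give. *)
  let ic = Unix.in_channel_of_descr a in
  let oc = Unix.out_channel_of_descr a in
  output_string oc "ping\n";
  (* Without this flush the line stays in [oc]'s buffer and the peer's
     read below would block. *)
  flush oc;
  (* The peer echoes what it received, using raw descriptor calls. *)
  let buf = Bytes.create 16 in
  let n = Unix.read b buf 0 16 in
  ignore (Unix.write b buf 0 n);
  let reply = input_line ic in
  (* close_out closes [a]; [ic] shares that descriptor, so its close is
     expected to fail and only Sys_error is ignored. *)
  close_out oc;
  (try close_in ic with Sys_error _ -> ());
  Unix.close b;
  reply
```

Here echo_over_socketpair () should return "ping".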

> [summary] I would like to open a file in read-write mode, and use it
> (mainly) to stream a big data structure into it and (sometimes) to read
> back the content of this data structure.

For a file opened in RW mode, the problem is that reads through
the in_channel may not see the data written through the out_channel,
even if you religiously flush the out_channel before reading anything.
The reason is that in_channels are also buffered, and may hold stale
data corresponding to the state of the file before recent writes.  And
there is no flush operation for in_channels...

> Btw, while playing with this problem, I found the following strange
> behavior: if I uncomment the second line in debug (see below), I can
> read data from the input channel, while if that line is commented out,
> reading from the channel yields an End_of_file exception. Is this
> expected?

This is another gotcha :-)  in_channels and out_channels maintain
(their idea of) the current position in the file.  This helps avoid
unnecessary "lseek" operations to determine the current position and
length.  However, if you share a FD between two channels, the
channels' idea of the current position is inconsistent with the
actual position of the FD.
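
The shared position can be seen with the descriptor alone. The sketch
below assumes the unix library; shared_offset_demo is a hypothetical
name. After writing 6 bytes the descriptor's offset is at the end of
the file, so a read on the same descriptor hits end of file until the
offset is moved back. One plausible reading of the End_of_file in the
original test (an assumption about the runtime, not something stated
in this thread) is that in_channel_length, while measuring the file,
seeks the descriptor back to where the in_channel believes it is.

```ocaml
(* Requires the unix library. shared_offset_demo is a hypothetical name. *)
let shared_offset_demo () =
  let path = Filename.temp_file "offset" ".bin" in
  let fd =
    Unix.openfile path [Unix.O_RDWR; Unix.O_TRUNC; Unix.O_CREAT] 0o640 in
  let data = Bytes.of_string "test1\n" in
  let written = Unix.write fd data 0 (Bytes.length data) in
  let buf = Bytes.create 1 in
  (* The write left the offset at 6, the end of the file, so a read on
     the same descriptor returns 0 bytes. *)
  let at_end = Unix.read fd buf 0 1 in
  (* Moving the shared offset back to the start makes the data visible. *)
  ignore (Unix.lseek fd 0 Unix.SEEK_SET);
  let after_rewind = Unix.read fd buf 0 1 in
  Unix.close fd;
  Sys.remove path;
  (written, at_end, after_rewind, Bytes.get buf 0)
```

The expected result is (6, 0, 1, 't').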

Bottom line: for your intended application, it's better to use Unix
functions exclusively.  The trick with an (in_channel, out_channel) pair
does work pretty well for sockets, named pipes and terminals, though.
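
A minimal sketch of the Unix-only approach, assuming the unix library.
The helpers append and read_at and the driver roundtrip are
hypothetical names, not part of any library. Every access seeks
explicitly, so there is no channel buffer or cached position to go
stale:

```ocaml
(* Requires the unix library. All three names below are hypothetical. *)

(* Append [s] at the end of the file and return the offset it starts at. *)
let append fd s =
  let off = Unix.lseek fd 0 Unix.SEEK_END in
  (* Unix.write_substring keeps writing until all bytes are written. *)
  ignore (Unix.write_substring fd s 0 (String.length s));
  off

(* Read exactly [len] bytes starting at offset [off]. *)
let read_at fd off len =
  ignore (Unix.lseek fd off Unix.SEEK_SET);
  let b = Bytes.create len in
  let rec go pos =
    if pos < len then begin
      let n = Unix.read fd b pos (len - pos) in
      if n = 0 then raise End_of_file;
      go (pos + n)
    end
  in
  go 0;
  Bytes.to_string b

let roundtrip () =
  let path = Filename.temp_file "rw" ".bin" in
  let fd =
    Unix.openfile path [Unix.O_RDWR; Unix.O_TRUNC; Unix.O_CREAT] 0o640 in
  ignore (append fd "hello ");
  let off = append fd "world" in
  let s = read_at fd off 5 in
  Unix.close fd;
  Sys.remove path;
  (off, s)
```

roundtrip () should return (6, "world"). The price is that every read
and write is a system call; any buffering has to be added by hand.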

- Xavier Leroy



* Re: [Caml-list] Unix file descriptors vs. in/out channels
  2014-08-18 16:33 ` Xavier Leroy
@ 2014-08-18 16:52   ` Thomas Braibant
  2014-08-18 16:57     ` Xavier Leroy
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Braibant @ 2014-08-18 16:52 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: OCaML Mailing List

Hi Xavier.

Thanks for your answer.

>> Therefore, I assume that I cannot use
>> Unix.in_channel_of_descr and Unix.out_channel_of_descr more than once
>> for my file-descriptor (because otherwise, these channels would not be
>> reclaimed).
>
> I don't quite understand your remark.  You need to close (explicitly
> and at once) all in_channels and out_channels associated with your
> file_descr, once you're done with it.  The first close_in/close_out
> will close the underlying FD, and the others will ignore the fact that
> the FD is already closed.

Well, I was thinking about the following situation

  let open Unix in
  let fd = openfile "foo.bar" [O_RDWR; O_TRUNC; O_CREAT] 0o640 in
  let o = out_channel_of_descr fd in
  let i = in_channel_of_descr fd in
  let i2 = in_channel_of_descr fd in
  Printf.printf "1\n%!";
  close_in i;
  Printf.printf "2\n%!";
  close_in i2;
  Printf.printf "3\n%!";
  close_out o;
  Printf.printf "Ok\n%!"

that raises the fatal error: exception Sys_error("Bad file
descriptor"), and now, I do not understand your remark either :(.

> For a file opened in RW mode, the problem is that reads through
> the in_channel may not see the data written through the out_channel,
> even if you religiously flush the out_channel before reading anything.
> The reason is that in_channels are also buffered, and may hold stale
> data corresponding to the state of the file before recent writes.  And
> there is no flush operation for in_channels...
>
> This is another gotcha :-)  in_channels and out_channels maintain
> (their idea of) the current position in the file.  This helps avoid
> unnecessary "lseek" operations to determine the current position and
> length.  However, if you share a FD between two channels, the
> channels' idea of the current position is inconsistent with the
> actual position of the FD.

Good to know (especially the part about the in channels not being flushed).

> Bottom line: for your intended application, it's better to use Unix
> functions exclusively.  The trick with an (in_channel, out_channel) pair
> does work pretty well for sockets, named pipes and terminals, though.

That seems like very good advice, and I will follow it. (Even if
one minor issue is that it makes interacting with some third-party
libraries that use in/out channels as an abstraction more complex.)

Best,
Thomas


* Re: [Caml-list] Unix file descriptors vs. in/out channels
  2014-08-18 16:52   ` Thomas Braibant
@ 2014-08-18 16:57     ` Xavier Leroy
  2014-08-18 17:18       ` Thomas Braibant
  0 siblings, 1 reply; 9+ messages in thread
From: Xavier Leroy @ 2014-08-18 16:57 UTC (permalink / raw)
  To: OCaML Mailing List

On 18/08/14 18:52, Thomas Braibant wrote:

> Well, I was thinking about the following situation
> 
>   let open Unix in
>   let fd = openfile "foo.bar" [O_RDWR; O_TRUNC; O_CREAT] 0o640 in
>   let o = out_channel_of_descr fd in
>   let i = in_channel_of_descr fd in
>   let i2 = in_channel_of_descr fd in
>   Printf.printf "1\n%!";
>   close_in i;
>   Printf.printf "2\n%!";
>   close_in i2;
>   Printf.printf "3\n%!";
>   close_out o;
>   Printf.printf "Ok\n%!"
> 
> that raises the fatal error: exception Sys_error("Bad file
> descriptor"), and now, I do not understand your remark either :(.

I said "close all your channels at once when you're done with the
underlying file descriptor".  What you observe is that after the first
close_in, all the other channels are unusable, because the underlying
FD is closed.

- Xavier Leroy


* Re: [Caml-list] Unix file descriptors vs. in/out channels
  2014-08-18 16:57     ` Xavier Leroy
@ 2014-08-18 17:18       ` Thomas Braibant
  2014-08-18 17:55         ` David Sheets
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Braibant @ 2014-08-18 17:18 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: OCaML Mailing List


Ah, my bad, sorry for the noise!

* Re: [Caml-list] Unix file descriptors vs. in/out channels
  2014-08-18 17:18       ` Thomas Braibant
@ 2014-08-18 17:55         ` David Sheets
  0 siblings, 0 replies; 9+ messages in thread
From: David Sheets @ 2014-08-18 17:55 UTC (permalink / raw)
  To: Thomas Braibant; +Cc: Xavier Leroy, OCaML Mailing List

On Mon, Aug 18, 2014 at 6:18 PM, Thomas Braibant
<thomas.braibant@gmail.com> wrote:
> Ah, my bad, sorry for the noise!
>
> On 18 August 2014 at 18:58, "Xavier Leroy" <Xavier.Leroy@inria.fr> wrote:
>
>> On 18/08/14 18:52, Thomas Braibant wrote:
>>
>> > Well, I was thinking about the following situation
>> >
>> >   let open Unix in
>> >   let fd = openfile "foo.bar" [O_RDWR; O_TRUNC; O_CREAT] 0o640 in
>> >   let o = out_channel_of_descr fd in
>> >   let i = in_channel_of_descr fd in
>> >   let i2 = in_channel_of_descr fd in
>> >   Printf.printf "1\n%!";
>> >   close_in i;
>> >   Printf.printf "2\n%!";
>> >   close_in i2;
>> >   Printf.printf "3\n%!";
>> >   close_out o;
>> >   Printf.printf "Ok\n%!"
>> >
>> > that raises the fatal error: exception Sys_error("Bad file
>> > descriptor"), and now, I do not understand your remark either :(.
>>
>> I said "close all your channels at once when you're done with the
>> underlying file descriptor".  What you observe is that after the first
>> close_in, all the other channels are unusable, because the underlying
>> FD is closed.

I'm sorry. I still don't understand. The following raises on the first close_in:

let open Unix in
let fd = openfile "foo.bar" [O_RDWR; O_TRUNC; O_CREAT] 0o640 in
let o = out_channel_of_descr fd in
let i = in_channel_of_descr fd in
let i2 = in_channel_of_descr fd in
Printf.printf "1\n%!";
close_out o;
close_in i;
close_in i2;
Printf.printf "Ok\n%!"

To get the error-ignoring behavior, I think you have to use
close_in_noerr (and close_out_noerr, though not in this case).

Maybe I've missed some subtlety here, though. I'm a little uneasy
ignoring *all* errors...
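
A minimal sketch of one way to shut down two channels that share a
descriptor without ignoring every error, assuming the unix library.
The names close_shared and close_demo are hypothetical. The
out_channel is closed first so that pending output is flushed; the
in_channel's close is then expected to fail because the descriptor is
already gone, and only Sys_error is swallowed:

```ocaml
(* Requires the unix library. Both names below are hypothetical. *)
let close_shared (ic : in_channel) (oc : out_channel) =
  (* close_out flushes pending output and closes the shared descriptor. *)
  close_out oc;
  (* The descriptor is already closed, so this is expected to raise
     Sys_error; any other exception still propagates. *)
  try close_in ic with Sys_error _ -> ()

let close_demo () =
  let path = Filename.temp_file "close" ".txt" in
  let fd =
    Unix.openfile path [Unix.O_RDWR; Unix.O_TRUNC; Unix.O_CREAT] 0o640 in
  let oc = Unix.out_channel_of_descr fd in
  let ic = Unix.in_channel_of_descr fd in
  output_string oc "data";
  close_shared ic oc;
  (* Check with a fresh channel that the buffered output reached the file. *)
  let ic2 = open_in_bin path in
  let s = really_input_string ic2 4 in
  close_in ic2;
  Sys.remove path;
  s
```

close_demo () should return "data". The concern raised earlier in the
thread still applies: if another descriptor is opened between the two
closes, the second close may hit an unrelated file, so the two calls
should stay adjacent.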

David

