caml-list - the Caml user's mailing list
* Re: [Caml-list] Unix.lseek versus Pervasives.pos
       [not found] <46CF368E-5912-11D7-8289-000A95773ED2@rouaix.org>
  2003-03-18 17:35 ` [Caml-list] Unix.lseek versus Pervasives.pos Shivkumar Chandrasekaran
@ 2003-03-18 17:39 ` Shivkumar Chandrasekaran
  2003-03-19 20:27   ` Xavier Leroy
  1 sibling, 1 reply; 10+ messages in thread
From: Shivkumar Chandrasekaran @ 2003-03-18 17:39 UTC (permalink / raw)
  To: caml-list

It would seem to me that it would be convenient to have 64 bit versions 
of seek_in, seek_out, pos_in, pos_out in the Pervasives module. This 
would help decouple the Pervasives I/O module a little more from the 
Unix module.

--shiv--


On Monday, March 17, 2003, at 11:21 PM, Francois Rouaix wrote:

> You may need to flush the channel. If the data is still in the 
> buffers, the fd position will not have been updated.
>
> --f
>
> On Monday, Mar 17, 2003, at 23:45 Europe/Paris, Shivkumar 
> Chandrasekaran wrote:
>
>> Hi,
>>
>> Currently I am trying to handle "LargeFiles" while marshalling caml 
>> values and I have run into this incidental problem (nothing to do 
>> with LargeFile). If I open a file with "open_out_bin", write to it 
>> using "output_value" and then try to determine the position in the 
>> file using "pos", I get the correct value. However, if I use 
>> Unix.lseek thus
>>
>> Unix.lseek (Unix.descr_of_out_channel fd_out) 0 Unix.SEEK_CUR
>>
>> I get a different value (so far always 0) than the one I get from
>>
>> pos fd_out
>>
>> The manual does not seem to help. Any advice will be appreciated. 
>> Thanks,
>>
>> --shiv--

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners



* Re: [Caml-list] Unix.lseek versus Pervasives.pos
  2003-03-18 17:39 ` Shivkumar Chandrasekaran
@ 2003-03-19 20:27   ` Xavier Leroy
  0 siblings, 0 replies; 10+ messages in thread
From: Xavier Leroy @ 2003-03-19 20:27 UTC (permalink / raw)
  To: Shivkumar Chandrasekaran; +Cc: caml-list

> It would seem to me that it would be convenient to have 64 bit versions 
> of seek_in, seek_out, pos_in, pos_out in the Pervasives module. This 
> would help decouple the Pervasives I/O module a little more from the 
> Unix module.

Your wish was granted in OCaml 3.05 and 3.06: see module Pervasives.LargeFile.

And, yes, not mixing Pervasives I/O with Unix I/O is recommended,
unless you enjoy puzzles :-)
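
For readers who want to see what this looks like in practice, here is a
minimal sketch (not taken from the thread) of querying a 64-bit channel
position with LargeFile.pos_out after output_value, without touching the
Unix module. The helper name position_after_marshal and the temporary file
are made up for illustration. In recent OCaml releases the module is reached
as Stdlib.LargeFile rather than Pervasives.LargeFile; the unqualified name
LargeFile used below should resolve in either case.

```ocaml
(* Sketch only: 64-bit channel position via LargeFile, no Unix involved.
   position_after_marshal is a hypothetical helper, not a library function. *)
let position_after_marshal (path : string) (v : 'a) : int64 =
  let oc = open_out_bin path in
  output_value oc v;
  (* pos_out accounts for bytes still sitting in the channel buffer *)
  let pos = LargeFile.pos_out oc in
  close_out oc;
  pos

let () =
  let path = Filename.temp_file "largefile_demo" ".bin" in
  let pos = position_after_marshal path [1; 2; 3] in
  Printf.printf "channel position after output_value: %Ld bytes\n" pos;
  Sys.remove path
```

The position is asked of the channel itself, so no flush is needed and the
result does not depend on what the underlying descriptor has seen so far.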

- Xavier Leroy


* Re: [Caml-list] Unix.lseek versus Pervasives.pos
  2003-03-19 18:36 cashin
  2003-03-19 18:48 ` Nicolas George
  2003-03-19 18:55 ` Ken Rose
@ 2003-03-19 19:08 ` Basile STARYNKEVITCH
  2 siblings, 0 replies; 10+ messages in thread
From: Basile STARYNKEVITCH @ 2003-03-19 19:08 UTC (permalink / raw)
  To: cashin; +Cc: caml-list

>>>>> "cashin" == cashin  <cashin@cs.uga.edu> writes:

    cashin> Sorry if this shows up as a duplicate.  Basile
    cashin> STARYNKEVITCH <basile@starynkevitch.net> writes:

    Basile>> You apparently forgot to flush the channel.

Ok, I made a stupid mistake (flushing is only for writes!) but my
intuition was right, in the sense of taking buffering into account.

    cashin> Flushes are for writes, but even when using a test program
    cashin> that just reads, zero is returned when it appears that it
    cashin> shouldn't return zero.  Compare the short ocaml program
    cashin> below to the comparable C version.


OK, but the problem is the same: the OCaml I/O subsystem manages its own
internal buffering. Channels are not Unix file descriptors but buffered
wrappers around them. See the source code (in particular
ocaml/byterun/io.c and io.h) for details. A channel is implemented
(from io.h) as

  struct channel {
    int fd;                       /* Unix file descriptor */
    file_offset offset;           /* Absolute position of fd in the file */
    char * end;                   /* Physical end of the buffer */
    char * curr;                  /* Current position in the buffer */
    char * max;                   /* Logical end of the buffer (for input) */
    void * mutex;                 /* Placeholder for mutex (for systhreads) */
    struct channel * next;        /* Linear chaining of channels (flush_all) */
    int revealed;                 /* For Cash only */
    int old_revealed;             /* For Cash only */
    int refcount;                 /* For flush_all and for Cash */
    char buff[IO_BUFFER_SIZE];    /* The buffer itself */
  };
 

where IO_BUFFER_SIZE is usually 4096 bytes.
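
The effect of that buffer can be sketched in OCaml as follows. This is an
illustration written for this archive, not code from the thread: the
function name positions_around_flush and the temporary file are
assumptions, and the program needs the unix library to be linked in.

```ocaml
(* Sketch only: compares the channel's idea of the position with the
   descriptor's offset, before and after a flush.
   Needs the unix library, e.g.
     ocamlfind ocamlopt -package unix -linkpkg demo.ml *)
let positions_around_flush (path : string) : int * int * int =
  let oc = open_out_bin path in
  let fd = Unix.descr_of_out_channel oc in
  (* 14 bytes: far less than the channel buffer, so nothing reaches fd yet *)
  output_string oc "hello, channel";
  let chan_pos = pos_out oc in
  let fd_before = Unix.lseek fd 0 Unix.SEEK_CUR in
  flush oc;
  let fd_after = Unix.lseek fd 0 Unix.SEEK_CUR in
  close_out oc;
  (chan_pos, fd_before, fd_after)

let () =
  let path = Filename.temp_file "channel_demo" ".bin" in
  let chan_pos, fd_before, fd_after = positions_around_flush path in
  Printf.printf "pos_out: %d, lseek before flush: %d, lseek after flush: %d\n"
    chan_pos fd_before fd_after;
  Sys.remove path
```

On a typical Unix system one would expect pos_out to report 14 while lseek
still reports 0 before the flush, and both to agree on 14 afterwards. Each
query is bound with let ... in so that the order of the calls is fixed.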

The equivalent in C would be to mix lseek with a <stdio.h> FILE, which
gives the same kind of mess:

  /* file main.c */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <string.h>
  
  int main(void)
  {
      FILE *f = fopen("main.c", "r");
      char buf[1024];
      int fd = fileno(f);
  
      memset(buf, '\0', sizeof(buf));
      fread(buf, 1, 10, f);
      printf("after reading \"%s\" lseek returns %d\n",
             buf, (int) lseek(fd, 0, SEEK_CUR));
  
      return 0;
  }

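When I compile and run the above file with tcc (www.tinycc.org) I get

after reading "  /* file " lseek returns 483

which is as messy as I expected: stdio read ahead into its own buffer, so
the descriptor's offset is far past the 10 bytes the program asked for.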

In short: never mix Unix.read (or other Unix I/O) with Pervasives.*
channel operations.


As usual with advice, this is a "don't do what I did" kind of advice;
shame on me :-( I must admit that I once opened a channel and then did
only Unix.read operations on it, but I commented this code (open-source
code in the Poesia monitor) with

(** IMPORTANT NOTICE: here outputxchannel_t-s are only used for their
   Unix file descriptor; no output takes actually place on the output
   channel; all output is thru Unix.write *) 

and later

(** the reply channel from filter to monitor [don't use the
   Pervasives.channel; using Unix] *)

The (bad) reasons for mixing channels and Unix file descriptors (besides
perhaps a design bug) are that I use non-blocking Unix I/O and that I
want precise control over the actual read and write system calls, so I
don't want extra buffering.

-- 

Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net 
aliases: basile<at>tunes<dot>org = bstarynk<at>nerim<dot>net
8, rue de la Faïencerie, 92340 Bourg La Reine, France


* Re: [Caml-list] Unix.lseek versus Pervasives.pos
  2003-03-19 18:48 ` Nicolas George
@ 2003-03-19 19:01   ` cashin
  0 siblings, 0 replies; 10+ messages in thread
From: cashin @ 2003-03-19 19:01 UTC (permalink / raw)
  To: caml-list

Nicolas George <nicolas.george@ens.fr> writes:

...
> So you can see that the lseek is done before the read. And indeed, your
> calls to read and lseek can occur in an unspecified order. I guess that
> if you write
>
>   let len = UnixLabels.read ... in
>   let pos = UnixLabels.lseek ... in
>   Printf.printf ...
>
> you will get the right result.

You're right!

  ecashin@meili seek-tell$ ./test 
  after reading 10 chars: "let main =", position is 10
  ecashin@meili seek-tell$ cat main.ml
  let main = 
    let fd = Unix.openfile "main.ml" [Unix.O_RDONLY] 0 
    and buf = String.create 1024 in
    let n = UnixLabels.read fd ~buf ~pos:0 ~len:10 in
    Printf.printf "after reading %d chars: \"%s\", position is %d\n"
      n
      buf
      (UnixLabels.lseek fd 0 ~mode:Unix.SEEK_CUR)
  ;;
  
  main


-- 
--Ed L Cashin            |   PGP public key:
  ecashin@uga.edu        |   http://noserose.net/e/pgp/


* Re: [Caml-list] Unix.lseek versus Pervasives.pos
  2003-03-19 18:36 cashin
  2003-03-19 18:48 ` Nicolas George
@ 2003-03-19 18:55 ` Ken Rose
  2003-03-19 19:08 ` Basile STARYNKEVITCH
  2 siblings, 0 replies; 10+ messages in thread
From: Ken Rose @ 2003-03-19 18:55 UTC (permalink / raw)
  To: cashin; +Cc: caml-list

cashin@cs.uga.edu wrote:
> 
> Flushes are for writes, but even when using a test program that just
> reads, zero is returned when it appears that it shouldn't return zero.
> Compare the short ocaml program below to the comparable C version.
> 
> The ocaml version has lseek returning position zero after reading 10
> bytes from the file.
> 
>   ecashin@meili seek-tell$ ./test
>   after reading 10 chars: "let main =", position is 0
>   ecashin@meili seek-tell$ cat main.ml
>   let main =
>     let fd = Unix.openfile "main.ml" [Unix.O_RDONLY] 0
>     and buf = String.create 1024 in
>     Printf.printf "after reading %d chars: \"%s\", position is %d\n"
>       (UnixLabels.read fd ~buf ~pos:0 ~len:10)
>       buf
>       (UnixLabels.lseek fd 0 ~mode:Unix.SEEK_CUR)
>   ;;
> 
>   main
> 

It looks like you're getting bitten by the order of evaluation of
function arguments.  

$ cat main.ml
let main = 
    let fd = Unix.openfile "main.ml" [Unix.O_RDONLY] 0 
    and buf = String.create 1024 in
    let r = (UnixLabels.read fd ~buf ~pos:0 ~len:10) in
    Printf.printf "after reading %d chars: \"%s\", position is %d\n"
      r
      buf
      (UnixLabels.lseek fd 0 ~mode:Unix.SEEK_CUR)
  ;;

main

$ ./a.out
after reading 10 chars: "let main =", position is 10
$

 - ken


* Re: [Caml-list] Unix.lseek versus Pervasives.pos
  2003-03-19 18:36 cashin
@ 2003-03-19 18:48 ` Nicolas George
  2003-03-19 19:01   ` cashin
  2003-03-19 18:55 ` Ken Rose
  2003-03-19 19:08 ` Basile STARYNKEVITCH
  2 siblings, 1 reply; 10+ messages in thread
From: Nicolas George @ 2003-03-19 18:48 UTC (permalink / raw)
  To: caml-list

On nonidi 29 Ventôse, year CCXI, cashin@cs.uga.edu wrote:
>     Printf.printf "after reading %d chars: \"%s\", position is %d\n"
>       (UnixLabels.read fd ~buf ~pos:0 ~len:10)
>       buf
>       (UnixLabels.lseek fd 0 ~mode:Unix.SEEK_CUR)

Use strace, and you'll see something like this:

open("main.ml", O_RDONLY|O_LARGEFILE)   = 3
_llseek(3, 0, [0], SEEK_CUR)            = 0
read(3, "  let main", 10)               = 10
write(1, "after reading 10 chars: \"  let m"..., 1066) = 1066

So you can see that the lseek is done before the read. And indeed, your
calls to read and lseek can occur in an unspecified order. I guess that
if you write

  let len = UnixLabels.read ... in
  let pos = UnixLabels.lseek ... in
  Printf.printf ...

you will get the right result.
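
The ordering issue can be reproduced without any file at all. The sketch
below is an illustration added to this archive, not code from the thread:
fake_call is a made-up stand-in for read and lseek that simply records the
step at which it ran, and pair plays the role of Printf.printf.

```ocaml
(* Stand-in for the descriptor: each "system call" records when it ran. *)
let clock = ref 0

let fake_call (name : string) : string * int =
  incr clock;
  (name, !clock)

(* Both calls are arguments of the same application, so the language does
   not promise which one runs first. *)
let unsequenced () =
  clock := 0;
  let pair a b = (a, b) in
  pair (fake_call "read") (fake_call "lseek")

(* let ... in forces "read" to finish before "lseek" starts. *)
let sequenced () =
  clock := 0;
  let r = fake_call "read" in
  let p = fake_call "lseek" in
  (r, p)

let () =
  let show ((n1, t1), (n2, t2)) =
    Printf.printf "%s ran at step %d, %s ran at step %d\n" n1 t1 n2 t2 in
  show (unsequenced ());
  show (sequenced ())
```

Only the second form is guaranteed to report "read" at step 1 and "lseek" at
step 2. What the first form prints depends on the compiler, which matches
the strace output above where the lseek came first.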


* Re: [Caml-list] Unix.lseek versus Pervasives.pos
@ 2003-03-19 18:36 cashin
  2003-03-19 18:48 ` Nicolas George
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: cashin @ 2003-03-19 18:36 UTC (permalink / raw)
  To: caml-list

Sorry if this shows up as a duplicate.  

Basile STARYNKEVITCH <basile@starynkevitch.net> writes:

...
> You apparently forgot to flush the channel.

Flushes are for writes, but even when using a test program that just
reads, zero is returned when it appears that it shouldn't return zero.
Compare the short ocaml program below to the comparable C version.

The ocaml version has lseek returning position zero after reading 10
bytes from the file.  

  ecashin@meili seek-tell$ ./test 
  after reading 10 chars: "let main =", position is 0
  ecashin@meili seek-tell$ cat main.ml
  let main = 
    let fd = Unix.openfile "main.ml" [Unix.O_RDONLY] 0 
    and buf = String.create 1024 in
    Printf.printf "after reading %d chars: \"%s\", position is %d\n"
      (UnixLabels.read fd ~buf ~pos:0 ~len:10)
      buf
      (UnixLabels.lseek fd 0 ~mode:Unix.SEEK_CUR)
  ;;
  
  main
  

... but in the C version you get the expected position reported.
  
  ecashin@meili seek-tell$ ./test 
  after reading "#include <" lseek returns 10
  ecashin@meili seek-tell$ cat main.c
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <string.h>
  
  int main(void)
  {
      int fd = open("main.c", O_RDONLY);
      char buf[1024];
  
      if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
      }
      memset(buf, '\0', sizeof(buf));
      read(fd, buf, 10);
      printf("after reading \"%s\" lseek returns %d\n",
             buf, (int) lseek(fd, 0, SEEK_CUR));
  
      return 0;
  }


-- 
--Ed L Cashin            |   PGP public key:
  ecashin@uga.edu        |   http://noserose.net/e/pgp/


* Re: [Caml-list] Unix.lseek versus Pervasives.pos
       [not found] <46CF368E-5912-11D7-8289-000A95773ED2@rouaix.org>
@ 2003-03-18 17:35 ` Shivkumar Chandrasekaran
  2003-03-18 17:39 ` Shivkumar Chandrasekaran
  1 sibling, 0 replies; 10+ messages in thread
From: Shivkumar Chandrasekaran @ 2003-03-18 17:35 UTC (permalink / raw)
  To: caml-list

I went back to my code and put flushes after all writes. It still did 
not help. Furthermore, once I replaced output_value by Unix.write (not 
followed by flushes) lseek worked perfectly well! So I am not sure 
whether the problem is due to non-flushing or not. Furthermore I 
observed that in the Unix module there is no way to flush/sync a file. 
Is it not needed? Apparently not.
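
One way to read this observation, sketched here as an illustration rather
than as the code actually used: Unix.write hands the bytes straight to the
descriptor, so there is no user-space buffer to flush and lseek sees the
new offset at once. The helper name write_value_unbuffered is hypothetical,
Marshal.to_bytes stands in for output_value, and the program assumes a
reasonably recent OCaml with the unix library linked in.

```ocaml
(* Sketch only: marshal to bytes, then write through the descriptor, so no
   channel buffer sits between the program and the file offset.
   Needs the unix library. *)
let write_value_unbuffered (fd : Unix.file_descr) (v : 'a) : int64 =
  let data = Marshal.to_bytes v [] in
  let len = Bytes.length data in
  let written = Unix.write fd data 0 len in
  if written <> len then failwith "short write";
  (* 64-bit offset, so large files are handled on 32-bit systems too *)
  Unix.LargeFile.lseek fd 0L Unix.SEEK_CUR

let () =
  let path = Filename.temp_file "unbuffered_demo" ".bin" in
  let fd = Unix.openfile path [Unix.O_WRONLY; Unix.O_TRUNC] 0o600 in
  let off = write_value_unbuffered fd [1; 2; 3] in
  Printf.printf "descriptor offset after Unix.write: %Ld\n" off;
  Unix.close fd;
  Sys.remove path
```

The returned offset should equal the length of the marshalled data, since
the file starts empty and every byte goes through the descriptor.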

--shiv--


On Monday, March 17, 2003, at 11:21 PM, Francois Rouaix (and similarly 
Basile STARYNKEVITCH) wrote:

> You may need to flush the channel. If the data is still in the 
> buffers, the fd position will not have been updated.
>
> --f
>
> On Monday, Mar 17, 2003, at 23:45 Europe/Paris, Shivkumar 
> Chandrasekaran wrote:
>
>> Hi,
>>
>> Currently I am trying to handle "LargeFiles" while marshalling caml 
>> values and I have run into this incidental problem (nothing to do 
>> with LargeFile). If I open a file with "open_out_bin", write to it 
>> using "output_value" and then try to determine the position in the 
>> file using "pos", I get the correct value. However, if I use 
>> Unix.lseek thus
>>
>> Unix.lseek (Unix.descr_of_out_channel fd_out) 0 Unix.SEEK_CUR
>>
>> I get a different value (so far always 0) than the one I get from
>>
>> pos fd_out
>>
>> The manual does not seem to help. Any advice will be appreciated. 
>> Thanks,
>>
>> --shiv--

* [Caml-list] Unix.lseek versus Pervasives.pos
  2003-03-17 22:45 Shivkumar Chandrasekaran
@ 2003-03-18  6:54 ` Basile STARYNKEVITCH
  0 siblings, 0 replies; 10+ messages in thread
From: Basile STARYNKEVITCH @ 2003-03-18  6:54 UTC (permalink / raw)
  To: caml-list

>>>>> "Shivkumar" == Shivkumar Chandrasekaran <shiv@ece.ucsb.edu> writes:

    Shivkumar> Hi, Currently I am trying to handle "LargeFiles" while
    Shivkumar> marshalling caml values and I have run into this
    Shivkumar> incidental problem (nothing to do with LargeFile). If I
    Shivkumar> open a file with "open_out_bin", write to it using
    Shivkumar> "output_value" 

You apparently forgot to flush the channel.

    Shivkumar> and then try to determine the position
    Shivkumar> in the file using "pos", I get the correct
    Shivkumar> value. However, if I use Unix.lseek thus

    Shivkumar> Unix.lseek (Unix.descr_of_out_channel fd_out) 0
    Shivkumar> Unix.SEEK_CUR

    Shivkumar> I get a different value (so far always 0) 


Forgetting to flush a file gives similar errors on most systems.
Flushing is not OCaml-specific but a general issue with buffered I/O (at
least under Unix, and probably Windows).

Perhaps the OCaml manual could add a hint never to forget flushing
files (the tip might be there already), but this hint is very
basic and generic, not OCaml-specific.

Regards.

-- 

Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net 
aliases: basile<at>tunes<dot>org = bstarynk<at>nerim<dot>net
8, rue de la Faïencerie, 92340 Bourg La Reine, France


* [Caml-list] Unix.lseek versus Pervasives.pos
@ 2003-03-17 22:45 Shivkumar Chandrasekaran
  2003-03-18  6:54 ` Basile STARYNKEVITCH
  0 siblings, 1 reply; 10+ messages in thread
From: Shivkumar Chandrasekaran @ 2003-03-17 22:45 UTC (permalink / raw)
  To: caml-list

Hi,

Currently I am trying to handle "LargeFiles" while marshalling caml 
values and I have run into this incidental problem (nothing to do with 
LargeFile). If I open a file with "open_out_bin", write to it using
"output_value" and then try to determine the position in the file using
"pos_out", I get the correct value. However, if I use Unix.lseek thus

Unix.lseek (Unix.descr_of_out_channel fd_out) 0 Unix.SEEK_CUR

I get a different value (so far always 0) than the one I get from

pos_out fd_out

The manual does not seem to help. Any advice will be appreciated. 
Thanks,

--shiv--
