caml-list - the Caml user's mailing list
From: Basile STARYNKEVITCH <basile@starynkevitch.net>
To: cashin@cs.uga.edu
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Unix.lseek versus Pervasives.pos
Date: Wed, 19 Mar 2003 20:08:08 +0100
Message-ID: <15992.49176.43893.768644@hector.lesours>
In-Reply-To: <877kavryp3.fsf@cs.uga.edu>

>>>>> "cashin" == cashin  <cashin@cs.uga.edu> writes:

    cashin> Sorry if this shows up as a duplicate.  Basile
    cashin> STARYNKEVITCH <basile@starynkevitch.net> writes:

    Basile>> You apparently forgot to flush the channel.

Ok, I made a stupid mistake (flushing is only for writes!) but my
intuition was right, in the sense of taking buffering into account.

    cashin> Flushes are for writes, but even when using a test program
    cashin> that just reads, zero is returned when it appears that it
    cashin> shouldn't return zero.  Compare the short ocaml program
    cashin> below to the comparable C version.


Ok; but the problem is the same: the OCaml I/O subsystem manages its own
internal buffering. Channels are not Unix file descriptors, but a buffering
layer on top of them. See the source code (in particular ocaml/byterun/io.c
and io.h) for details. In particular, a channel is implemented (from io.h) as

  struct channel {
    int fd;                       /* Unix file descriptor */
    file_offset offset;           /* Absolute position of fd in the file */
    char * end;                   /* Physical end of the buffer */
    char * curr;                  /* Current position in the buffer */
    char * max;                   /* Logical end of the buffer (for input) */
    void * mutex;                 /* Placeholder for mutex (for systhreads) */
    struct channel * next;        /* Linear chaining of channels (flush_all) */
    int revealed;                 /* For Cash only */
    int old_revealed;             /* For Cash only */
    int refcount;                 /* For flush_all and for Cash */
    char buff[IO_BUFFER_SIZE];    /* The buffer itself */
  };
 

where IO_BUFFER_SIZE is usually 4096 bytes.
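
To make the consequence of these fields concrete, here is a minimal
sketch of how a logical read position can be derived from them. It is a
simplified model, not the runtime's actual code: struct mini_channel and
logical_pos_in are hypothetical names, and only the three fields that
matter for position bookkeeping are kept.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced model of a channel: only the fields needed
   to compute a position (names mirror the struct quoted above). */
struct mini_channel {
    long offset;       /* position of the fd in the file (after the last read) */
    const char *curr;  /* next buffered byte the program will consume */
    const char *max;   /* logical end of the buffered data */
};

/* Position as the program sees it: the fd position minus the bytes
   that were already fetched into the buffer but not yet consumed. */
long logical_pos_in(const struct mini_channel *ch)
{
    assert(ch->curr <= ch->max);
    return ch->offset - (long)(ch->max - ch->curr);
}
```

With a 483-byte file read entirely into the buffer and 10 bytes
consumed, offset is 483 while the logical position is 10; a raw lseek
on the descriptor only ever sees the former.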

The equivalent in C would be to mix lseek with a <stdio.h> FILE, which
produces the same mess:

  /* file main.c */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <string.h>   /* for memset */
  
  int main(void)
  {
      FILE *f = fopen("main.c", "r");
      char buf[1024];
      int fd = fileno(f);
  
      memset(buf, '\0', sizeof(buf));
      fread(buf, 1, 10, f);
      printf("after reading \"%s\" lseek returns %d\n",
             buf, (int) lseek(fd, 0, SEEK_CUR));
  
      return 0;
  }

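When I compile and run the above file with tcc (www.tinycc.org) I get

after reading "  /* file " lseek returns 483 

which is as messy as I was expecting: only 10 bytes were consumed, but
stdio has read ahead into its own buffer, so the descriptor is far
beyond position 10.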
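
For comparison, here is a small sketch that reports both positions side
by side. The helper name positions_after_read is hypothetical; it only
assumes a POSIX system. ftell asks stdio for the buffered (logical)
position, while lseek asks the kernel for the descriptor's position.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read n bytes through stdio, then report the
   logical position (ftell) and the raw descriptor position (lseek).
   Returns 0 on success, -1 on failure. */
int positions_after_read(const char *path, size_t n,
                         long *logical, long *fd_pos)
{
    char buf[256];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (n > sizeof buf)
        n = sizeof buf;               /* keep the sketch bounded */
    if (fread(buf, 1, n, f) != n) {
        fclose(f);
        return -1;
    }
    *logical = ftell(f);                              /* what the program consumed */
    *fd_pos = (long) lseek(fileno(f), 0, SEEK_CUR);   /* what stdio fetched */
    fclose(f);
    return 0;
}
```

After reading 10 bytes, logical is 10, whereas fd_pos is wherever the
read-ahead stopped (often the end of a small file). Only the logical
value is meaningful to code that reads through the FILE.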

In short: never mix Unix.read (or other Unix I/O) with Pervasives.*
channel operations on the same file descriptor.


As usual with advice, this is "don't do what I did" advice; shame on
me :-( I must admit that I once opened a channel and then did only
Unix.read operations on it, but I commented this code (open-source code
in the Poesia monitor) with

(** IMPORTANT NOTICE: here outputxchannel_t-s are only used for their
   Unix file descriptor; no output takes actually place on the output
   channel; all output is thru Unix.write *) 

and later

(** the reply channel from filter to monitor [don't use the
   Pervasives.channel; using Unix] *)

The (bad) reasons for mixing channels and Unix file descriptors (besides
perhaps a design bug) are that I use nonblocking Unix I/O and that I
want precise control over the actual read and write system calls, so I
don't want extra buffering.

-- 

Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net 
aliases: basile<at>tunes<dot>org = bstarynk<at>nerim<dot>net
8, rue de la Faïencerie, 92340 Bourg La Reine, France

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

