Interestingly, this probably has nothing to do with the size of the buffer.
input_char actually acquires and releases a lock for every single call,
whether or not an underlying system call is required to fill the buffer.
This has always struck me as an odd aspect of the in/out channel
implementation, and means that IO is a lot more expensive in a threaded
context than it should be.

At Jane Street, performance-sensitive code tends to use other libraries that
we've built directly on top of file descriptors, which batch the IO and
don't require constant lock acquisition.
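
A minimal sketch of the batching idea, for illustration only. It is not the
Jane Street code, and it stays on standard channels rather than raw file
descriptors. The assumption is that one call to input takes the channel lock
once for the whole chunk, where input_char takes it once per character. The
names count_char_chunked and chunk_size, and the 64KB size, are made up for
this example.

```ocaml
(* Illustrative chunk size; not taken from the original thread. *)
let chunk_size = 65536

(* Count occurrences of [c] in the file at [path], reading in large chunks
   so the channel is touched once per chunk instead of once per character. *)
let count_char_chunked path c =
  let ic = open_in_bin path in
  let buf = Bytes.create chunk_size in
  let count = ref 0 in
  let rec loop () =
    (* [input] returns the number of bytes read, and 0 at end of file. *)
    let n = input ic buf 0 chunk_size in
    if n > 0 then begin
      (* This scan runs over a private buffer and needs no locking. *)
      for i = 0 to n - 1 do
        if Bytes.get buf i = c then incr count
      done;
      loop ()
    end
  in
  (try loop () with e -> close_in_noerr ic; raise e);
  close_in ic;
  !count
```

For example, count_char_chunked "test.csv" ',' would count the separators in
the file. The same loop shape should carry over to Unix.read on a file
descriptor, which bypasses the channel layer altogether.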

y

On Tue, Feb 17, 2009 at 5:07 AM, Sylvain Le Gall <sylvain@le-gall.net> wrote:

> On 17-02-2009, Rémi Dewitte <remi@gide.net> wrote:
> >
> > test.csv is a 21mo file with ~13k rows and a thousands of columns on a
> > 15rpm disk.
> >
> > ocaml version : 3.11.0
> >
>
> You are using input_char and standard IO channel. This is a good choice
> for non-threaded program. But in your case, I will use Unix.read with a
> big buffer (32KB to 4MB) and change your program to use it. As
> benchmarked by John Harrop, you are spending most of your time in
> caml_enter|leave_blocking section. I think it comes from reading using
> std IO channel which use 4k buffer. Using a bigger buffer will allow
> less call to this two functions (but you won't win time at the end, I
> think you will just reduce the difference between non-threaded and
> threaded code).
>
> Regards
> Sylvain Le Gall