Interestingly, this probably has nothing to do with the size of the buffer. input_char actually acquires and releases a lock on every single call, whether or not an underlying system call is required to fill the buffer. This has always struck me as an odd aspect of the in/out channel implementation, and it means that IO is a lot more expensive in a threaded context than it should be.

At Jane Street, performance-sensitive code tends to use other libraries that we've built directly on top of file descriptors, which batch the IO and don't require constant lock acquisition.

y

On Tue, Feb 17, 2009 at 5:07 AM, Sylvain Le Gall wrote:
> On 17-02-2009, Rémi Dewitte wrote:
> >
> > test.csv is a 21 MB file with ~13k rows and thousands of columns on a
> > 15rpm disk.
> >
> > ocaml version : 3.11.0
> >
>
> You are using input_char and the standard IO channel. This is a good
> choice for a non-threaded program. But in your case, I would use
> Unix.read with a big buffer (32KB to 4MB) and change your program to
> use it. As benchmarked by John Harrop, you are spending most of your
> time in caml_enter_blocking_section and caml_leave_blocking_section.
> I think it comes from reading through the std IO channel, which uses
> a 4k buffer. Using a bigger buffer will allow fewer calls to these two
> functions (but you won't gain time overall; I think you will just
> reduce the difference between non-threaded and threaded code).
>
> Regards
> Sylvain Le Gall
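
For concreteness, here is a minimal sketch of the approach both messages point toward: read straight from the file descriptor with Unix.read into one large buffer and walk the bytes in plain OCaml, so the per-character path never touches a channel lock. This is not the Jane Street library mentioned above. The names fold_bytes and count_newlines are hypothetical, the 64KB default is an arbitrary pick from the 32KB to 4MB range suggested in the quoted message, and it assumes a recent OCaml (Bytes, Fun.protect) linked against the unix library rather than 3.11.0.

```ocaml
(* Build assumption: link with the unix library, e.g.
   ocamlfind ocamlopt -package unix -linkpkg fold_bytes.ml *)

(* Hypothetical helper: fold [f] over every byte of the file at [path].
   Data is pulled in chunks of [buf_size] bytes with Unix.read, so there
   is one system call per chunk and no channel lock per character. *)
let fold_bytes ?(buf_size = 65536) path ~init ~f =
  let fd = Unix.openfile path [ Unix.O_RDONLY ] 0 in
  let buf = Bytes.create buf_size in
  let rec loop acc =
    match Unix.read fd buf 0 buf_size with
    | exception Unix.Unix_error (Unix.EINTR, _, _) ->
      (* Interrupted by a signal before any data arrived: retry. *)
      loop acc
    | 0 ->
      (* A zero-length read means end of file. *)
      acc
    | n ->
      (* Only the first [n] bytes of [buf] are valid for this chunk. *)
      let acc = ref acc in
      for i = 0 to n - 1 do
        acc := f !acc (Bytes.get buf i)
      done;
      loop !acc
  in
  (* Close the descriptor even if [f] raises. *)
  Fun.protect ~finally:(fun () -> Unix.close fd) (fun () -> loop init)

(* Example use: count the rows of a CSV file by counting newlines. *)
let count_newlines path =
  fold_bytes path ~init:0 ~f:(fun n c -> if c = '\n' then n + 1 else n)
```

A caller that currently loops on input_char could pass its per-character logic as ~f instead. Whether this closes the gap between threaded and non-threaded runs on the original test.csv would need to be measured; the sketch only removes the per-call channel locking described above.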