On Tue, Feb 17, 2009 at 11:26, Mark Shinwell <mshinwell@janestcapital.com> wrote:

On Tue, Feb 17, 2009 at 10:07:05AM +0000, Sylvain Le Gall wrote:
> On 17-02-2009, Rémi Dewitte <remi@gide.net> wrote:

> You are using input_char on a standard I/O channel. This is a good choice
> for a non-threaded program, but in your case I would use Unix.read with a
> big buffer (32KB to 4MB) and change your program accordingly. As
> benchmarked by Jon Harrop, you are spending most of your time in
> caml_enter|leave_blocking_section.

This isn't quite right actually -- the profile is deceiving. It is true
that there are a lot of calls to enter/leave_blocking_section, but you're
actually being killed by the overhead of an independent locking strategy
in the channel-based I/O calls. I've measured this using some hackery
with a hex editor. When you call input_char, you acquire and then release
another lock which is specific to these calls (the global runtime lock is
often not released here). This process isn't especially cheap, so it would
be better to use one of the other channel calls to read data in larger blocks.
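
To make the block-read suggestion concrete, here is a minimal sketch of the
same kind of loop written against input instead of input_char. The function
names, the 64KB block size and the newline-counting task are my own
assumptions for illustration, not taken from the original program, and it
uses Bytes from a current standard library rather than what was available
when this thread was written. The point is only that the channel is touched
once per block rather than once per character.

```ocaml
(* Assumed block size; anything from 32KB upward was suggested earlier
   in the thread. *)
let block_size = 65536

(* Count newlines by pulling whole blocks out of the channel.
   One call to [input] per block, so any per-call overhead is paid
   once per block instead of once per character. *)
let count_newlines_blockwise (ic : in_channel) : int =
  let buf = Bytes.create block_size in
  let rec loop count =
    (* [input] returns the number of bytes actually read, 0 at end of file. *)
    let n = input ic buf 0 block_size in
    if n = 0 then count
    else begin
      let c = ref count in
      for i = 0 to n - 1 do
        if Bytes.get buf i = '\n' then incr c
      done;
      loop !c
    end
  in
  loop 0

(* The character-at-a-time version, for comparison: one channel call
   per byte of input. *)
let count_newlines_charwise (ic : in_channel) : int =
  let count = ref 0 in
  (try
     while true do
       if input_char ic = '\n' then incr count
     done
   with End_of_file -> ());
  !count
```

Both functions give the same answer on the same file; timing them in a
program linked with the threads library should show whether the per-call
cost is what dominates in your case.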

Mark