From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: ** X-Spam-Status: No, score=2.0 required=5.0 tests=AWL,DNS_FROM_RFC_POST, HTML_MESSAGE,SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 599E0BB84 for ; Tue, 17 Feb 2009 13:20:25 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AikCALk6mknRVd0Vimdsb2JhbACCQTCKHYYUZT8BAQEKCQwHDwWsGgiBAI8lAQMBA4QQBoNt X-IronPort-AV: E=Sophos;i="4.38,223,1233529200"; d="scan'208";a="24200990" Received: from mail-qy0-f21.google.com ([209.85.221.21]) by mail1-smtp-roc.national.inria.fr with ESMTP; 17 Feb 2009 13:20:24 +0100 Received: by qyk14 with SMTP id 14so9635329qyk.9 for ; Tue, 17 Feb 2009 04:20:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:reply-to:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=2NYBgQQyhWYP50l4vj4BlU4NGFgsi8LgRgWiKuoQGa0=; b=ILGcEqt7qNNNf7qKb/eIPWKqGhcNsbzAGLdvfDTeyJYUl2WOUB2mjAGVU+ZqCzu90i JobZbZ2rZABbnD21W+ndYlD+f4YbY+cq+oO2XmfTH5Ao1NVeEMNITYlOnsUNdlQJKnrv YSezkrWUbuSFIkrFWw+LbqVroknOe7cyx3eas= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; b=rtrk1cfNbrSWCH5xwD2gu/1iMJx/nhqhh5+j4hraW8U/9laq6rK8Px4o9j33kFviv3 wWuitHf7p3Bsv19ZTg7QQWt6wo6olOK7pAVz63Witn/bhOheU/DcB41RU3+HJbmFfGJ+ mbg9GggbIy/Lb1tNmRVHXtpjWgIEzsnhTqgwk= MIME-Version: 1.0 Received: by 10.229.100.14 with SMTP id w14mr1771081qcn.48.1234873223883; Tue, 17 Feb 2009 04:20:23 -0800 (PST) Reply-To: yminsky@gmail.com In-Reply-To: References: <2184b2340902160715y1f935b5ehc0e6195b3f75b66b@mail.gmail.com> <891bd3390902160847p25ad3bf1pe59da620dfc667f2@mail.gmail.com> <2184b2340902160937i53b8f3fbga01eaf14ed829f8f@mail.gmail.com> <2184b2340902162340s540c5ac7g9f42b59d03f643cb@mail.gmail.com> Date: Tue, 17 Feb 2009 07:20:23 -0500 Message-ID: <891bd3390902170420i6e7647ccl44c55456952d3f5a@mail.gmail.com> Subject: Re: [Caml-list] Re: Threads performance issue. From: Yaron Minsky To: Sylvain Le Gall Cc: caml-list@inria.fr Content-Type: multipart/alternative; boundary=0016363109bbda2d1e04631c53d6 X-Spam: no; 0.00; yaron:01 minsky:01 yminsky:01 buffer:01 buffer:01 descriptors:01 le-gall:01 ocaml:01 beginner's:01 ocaml:01 bug:01 descriptors:01 le-gall:01 beginner's:01 bug:01 --0016363109bbda2d1e04631c53d6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Interestingly, this probably has nothing to do with the size of the buffer. input_char actually acquires and releases a lock for every single call, whether or not an underlying system call is required to fill the buffer. This has always struck me as an odd aspect of the in/out channel implementation, and means that IO is a lot more expensive in a threaded context than it should be. At Jane Street, performance-sensitive code tends to use other libraries tha= t we've built directly on top of file descriptors that batches the IO and doesn't require constant lock acquisition. y On Tue, Feb 17, 2009 at 5:07 AM, Sylvain Le Gall wrote= : > On 17-02-2009, R=E9mi Dewitte wrote: > > > > test.csv is a 21mo file with ~13k rows and a thousands of columns on a > 15rp=3D > > m > > disk. > > > > ocaml version : 3.11.0 > > > > You are using input_char and standard IO channel. This is a good choice > for non-threaded program. But in your case, I will use Unix.read with a > big buffer (32KB to 4MB) and change your program to use it. As > benchmarked by John Harrop, you are spending most of your time in > caml_enter|leave_blocking section. I think it comes from reading using > std IO channel which use 4k buffer. Using a bigger buffer will allow > less call to this two functions (but you won't win time at the end, I > think you will just reduce the difference between non-threaded and > threaded code). > > Regards > Sylvain Le Gall > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > --0016363109bbda2d1e04631c53d6 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Interestingly, this probably has nothing to do with the size of the buffer.=   input_char actually acquires and releases a lock for every single ca= ll, whether or not an underlying system call is required to fill the buffer= .  This has always struck me as an odd aspect of the in/out channel im= plementation, and means that IO is a lot more expensive in a threaded conte= xt than it should be. 

At Jane Street, performance-sensitive code tends to use other libraries= that we've built directly on top of file descriptors that batches the = IO and doesn't require constant lock acquisition.

y

On Tue, Feb 17, 2009 at 5:07 AM, Sylvain Le Gall <sylvain@le-gall.net> wrote= :
On 17-02-2009, R=E9mi Dewitte <remi@gid= e.net> wrote:
>
> test.csv is a 21mo file with ~13k rows and a thousands of columns on a= 15rp=3D
> m
> disk.
>
> ocaml version : 3.11.0
>

You are using input_char and standard IO channel. This is a good choi= ce
for non-threaded program. But in your case, I will use Unix.read with a
big buffer (32KB to 4MB) and change your program to use it. As
benchmarked by John Harrop, you are spending most of your time in
caml_enter|leave_blocking section. I think it comes from reading using
std IO channel which use 4k buffer. Using a bigger buffer will allow
less call to this two functions (but you won't win time at the end, I think you will just reduce the difference between non-threaded and
threaded code).

Regards
Sylvain Le Gall

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.in= ria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

--0016363109bbda2d1e04631c53d6--