From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.4 required=5.0 tests=AWL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id 38022BB84 for ; Tue, 17 Feb 2009 10:49:09 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AuMAAGQWmknUnwdjlGdsb2JhbACCPZIIAQEBAQkLCAkRBLswhBMG X-IronPort-AV: E=Sophos;i="4.38,222,1233529200"; d="scan'208";a="23069336" Received: from relay.pcl-ipout01.plus.net ([212.159.7.99]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 17 Feb 2009 10:49:08 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApsEAFMXmknUnw4T/2dsb2JhbACCPc13hBMG Received: from pih-relay06.plus.net ([212.159.14.19]) by relay.pcl-ipout01.plus.net with ESMTP; 17 Feb 2009 09:49:03 +0000 Received: from [87.114.148.153] (helo=leper.local) by pih-relay06.plus.net with esmtp (Exim) id 1LZMZb-0002GA-1L for caml-list@yquem.inria.fr; Tue, 17 Feb 2009 09:49:03 +0000 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Threads performance issue. Date: Tue, 17 Feb 2009 09:53:56 +0000 User-Agent: KMail/1.9.9 References: <2184b2340902160715y1f935b5ehc0e6195b3f75b66b@mail.gmail.com> <2184b2340902162340s540c5ac7g9f42b59d03f643cb@mail.gmail.com> <20090217085944.GC29651@janestcapital.com> In-Reply-To: <20090217085944.GC29651@janestcapital.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Message-Id: <200902170953.56911.jon@ffconsultancy.com> X-Plusnet-Relay: 1451dc9600f7fc121d9a70621a5c2aab X-Spam: no; 0.00; shinwell:01 0100,:01 ocamlopt:01 foo:01 ocamlopt:01 -thread:01 cmxa:01 cmxa:01 foo:01 uncomment:01 threading:01 threading:01 ocaml's:01 ocaml:01 byte:01 On Tuesday 17 February 2009 08:59:44 Mark Shinwell wrote: > On Tue, Feb 17, 2009 at 08:40:11AM +0100, R=E9mi Dewitte wrote: > > I have made some further experiments. > > I have a functional version of the reading algorithm. I have the origin= al > > imperative version of the algorithm. > > Either it is linked to thread (T) or not (X). Either it uses extlib (E) > > or not (X). > > Using: > > ocamlopt -o foo transf.ml > ocamlopt -thread -o foothread transf.ml unix.cmxa threads.cmxa > > on local disk with a 24Mb, approx. 500,000-line CSV file, I only see a > minor slowdown with foothread as opposed to foo. (A minor slowdown would > inded be expected.) So I'm confused as to why your results are so > different. If you uncomment R=E9mi's line of code that requires threading (or factor i= t out=20 into a different file and compile that as well when using threading) then y= ou=20 should be able to reproduce his results as I have. > You could use ocamlopt -p and run gprof on the resulting gmon.out file. Indeed the profiling results are most enlightening. Without threads everyth= ing=20 looks great: Each sample counts as 0.01 seconds. % cumulative self self total =20 time seconds seconds calls ms/call ms/call name =20 38.89 0.07 0.07 3 23.33 60.00 camlTransf__loop_109 22.22 0.11 0.04 2873940 0.00 0.00 caml_ml_input_char 11.11 0.13 0.02 2873940 0.00 0.00 =20 camlTransf__read_char_106 11.11 0.15 0.02 381696 0.00 0.00 camlTransf__code_begin =2E.. With threads, everything looks awful: Each sample counts as 0.01 seconds. % cumulative self self total =20 time seconds seconds calls ms/call ms/call name =20 22.22 0.08 0.08 =20 caml_thread_leave_blocking_section 16.67 0.14 0.06 =20 caml_thread_enter_blocking_section 13.89 0.19 0.05 3 16.67 50.00 camlTransf__loop_109 9.72 0.23 0.04 caml_io_mutex_unlock 5.56 0.24 0.02 3255698 0.00 0.00 caml_c_call =2E.. So most of the time is spent locking and unlocking OCaml's good old giant=20 global lock. Unfortunately the blocking calls are "spontaneous" so you cann= ot=20 track them down using a profile but, given the huge rise in time spent in=20 read_char, my guess is that OCaml has introduced an OS kernel lock around=20 every single byte read! If so, I'm surprised it is running *this* fast... R=E9mi: assuming I am correct, I'd recommend using memory mapping (and rewr= iting=20 that code ;-). =2D-=20 Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e