From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: ** X-Spam-Status: No, score=2.2 required=5.0 tests=AWL,DNS_FROM_RFC_POST, HTML_MESSAGE,SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id E91BBBB84 for ; Mon, 16 Feb 2009 18:38:13 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArEEAL8zmUnRVdqnkWdsb2JhbACCQy6KHIcVPwEBAQEJCQwHD65njVoBAwEDhBkGhDI X-IronPort-AV: E=Sophos;i="4.38,217,1233529200"; d="scan'208";a="23041343" Received: from mail-bw0-f167.google.com ([209.85.218.167]) by mail3-smtp-sop.national.inria.fr with ESMTP; 16 Feb 2009 18:37:49 +0100 Received: by bwz11 with SMTP id 11so1467772bwz.3 for ; Mon, 16 Feb 2009 09:37:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:in-reply-to :references:date:x-google-sender-auth:message-id:subject:from:to:cc :content-type; bh=ZpuZx4RgrOmzZQUdtXEPg0Ds2RZNrl4dCtDxLyYhFGc=; b=QRQsoe1ZHrxG4OG9fZ/gUjicOncLpIytdYWxX6/L+stmG5TZiGwtn6n6dSvW5KWALY xiwsymuMMEL2ZUkaxvO+7czUtntiOICFT8oMEMKDkQwtkyvsXFmJMCj8/Yj/AZ8oaWWY dwSVrmsRxH1mHfMNMSmavXOqBEgBcUPg75PWQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; b=oO7mII0BIFwuOwdVXPGfIYtrajPbSznUU4sIj9jDkQd6Rx7lp39jLhJxqrkJnIvZG/ jnSPk6vPFb6VraITiT0zZqMf7/sNX7yOOODZcioAuL0qg0qi8uPUK1CUQl1yMZeViVOa o7gwAoxErvVj2oXvO9B5wR3xK3fwEO1e1HVzE= MIME-Version: 1.0 Sender: remidewitte@gmail.com Received: by 10.223.106.68 with SMTP id w4mr1406326fao.19.1234805868006; Mon, 16 Feb 2009 09:37:48 -0800 (PST) In-Reply-To: <891bd3390902160847p25ad3bf1pe59da620dfc667f2@mail.gmail.com> References: <2184b2340902160715y1f935b5ehc0e6195b3f75b66b@mail.gmail.com> <891bd3390902160847p25ad3bf1pe59da620dfc667f2@mail.gmail.com> Date: Mon, 16 Feb 2009 18:37:47 +0100 X-Google-Sender-Auth: 000d6a75e35d2456 Message-ID: <2184b2340902160937i53b8f3fbga01eaf14ed829f8f@mail.gmail.com> Subject: Re: [Caml-list] Threads performance issue. From: =?UTF-8?Q?R=C3=A9mi_Dewitte?= To: yminsky@gmail.com Cc: caml-list@yquem.inria.fr Content-Type: multipart/alternative; boundary=001636c594e2210c0804630ca5ca X-Spam: no; 0.00; yaron:01 printf:01 endline:01 nread:01 nread:01 yaron:01 minsky:01 yminsky:01 pcre:01 ocaml:01 pointers:01 beginner's:01 ocaml:01 bug:01 printf:01 X-Attachments: cset="UTF-8" cset="UTF-8" --001636c594e2210c0804630ca5ca Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Yaron, I use a slightly modified version of the CSV library's load_rows . Here is the main code which is highly imperative style. I might transform it in purely functional style ? The main program is : open Printf;; open Sys;; let timed_exec start_message f =3D print_string start_message; let st1 =3D time () in let r =3D f () in print_endline ("done in " ^ (string_of_float ((time ()) -. st1)) ); r;; (* This line enabled makes the program really slow ! *) let run_threaded f =3D Thread.create (fun () -> f (); Thread.exit ()) () let () =3D timed_exec "Reading data " (fun () -> load_rows (fun _ -> ()) (open_in "file1.csv"); load_rows (fun _ -> ()) (open_in "file2.csv"); () ) The load_rows : let load_rows ?(separator =3D ',') ?(nread =3D -1) f chan =3D let nr =3D ref 0 in let row =3D ref [] in (* Current row. *) let field =3D ref [] in (* Current field. *) let state =3D ref StartField in (* Current state. *) let end_of_field () =3D let field_list =3D List.rev !field in let field_len =3D List.length field_list in let field_str =3D String.create field_len in let rec loop i =3D function [] -> () | x :: xs -> field_str.[i] <- x; loop (i+1) xs in loop 0 field_list; row :=3D (Some field_str) :: !row; field :=3D []; state :=3D StartField in let empty_field () =3D row :=3D None :: !row; field :=3D []; state :=3D StartField in let end_of_row () =3D let row_list =3D List.rev !row in f row_list; row :=3D []; state :=3D StartField; nr :=3D !nr + 1; in let rec loop () =3D let c =3D input_char chan in if c !=3D '\r' then ( (* Always ignore \r characters. *) match !state with StartField -> (* Expecting quote or other char. *) if c =3D '"' then ( state :=3D InQuotedField; field :=3D [] ) else if c =3D separator then (* Empty field. *) empty_field () else if c =3D '\n' then ( (* Empty field, end of row. *) empty_field (); end_of_row () ) else ( state :=3D InUnquotedField; field :=3D [c] ) | InUnquotedField -> (* Reading chars to end of field. *) if c =3D separator then (* End of field. *) end_of_field () else if c =3D '\n' then ( (* End of field and end of row. *) end_of_field (); end_of_row () ) else field :=3D c :: !field | InQuotedField -> (* Reading chars to end of field. *) if c =3D '"' then state :=3D InQuotedFieldAfterQuote else field :=3D c :: !field | InQuotedFieldAfterQuote -> if c =3D '"' then ( (* Doubled quote. *) field :=3D c :: !field; state :=3D InQuotedField ) else if c =3D '0' then ( (* Quote-0 is ASCII NUL. *) field :=3D '\000' :: !field; state :=3D InQuotedField ) else if c =3D separator then (* End of field. *) end_of_field () else if c =3D '\n' then ( (* End of field and end of row. *) end_of_field (); end_of_row () ) else ( (* Bad single quote in field. *) field :=3D c :: '"' :: !field; state :=3D InQuotedField ) ); (* end of match *) if( nread < 0 or !nr < nread) then loop () else () in try loop () with End_of_file -> (* Any part left to write out? *) (match !state with StartField -> if !row <> [] then ( empty_field (); end_of_row () ) | InUnquotedField | InQuotedFieldAfterQuote -> end_of_field (); end_of_row () | InQuotedField -> raise (Bad_CSV_file "Missing end quote after quoted field.") ) Thanks, R=C3=A9mi On Mon, Feb 16, 2009 at 17:47, Yaron Minsky wrote: > 2009/2/16 R=C3=A9mi Dewitte > >> Hello, >> >> I would like to read two files in two different threads. >> >> I have made a first version reading the first then the second and it tak= es >> 2.8s (native). >> >> I decided to make a threaded version and before any use of thread I >> realized that just linking no even using it to the threads library makes= my >> first version of the program to run in 12s ! > > > Do you have a short benchmark you can post? The idea that the > thread-overhead would make a difference like that, particularly for IO-bo= und > code (which I'm guessing this is) is pretty surprising. > > y > > >> >> I use pcre, extlib, csv libraries as well. >> >> I guess it might come from GC slowing down thinks here, doesn't it ? Whe= re >> can it come from otherwise ? Is there a workaround or something I should >> know ? >> >> Can ocaml use multiple cores ? >> >> Do you have few pointers on libraries to make parallel I/Os ? >> >> Thanks, >> R=C3=A9mi >> >> _______________________________________________ >> Caml-list mailing list. Subscription management: >> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list >> Archives: http://caml.inria.fr >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners >> Bug reports: http://caml.inria.fr/bin/caml-bugs >> >> > --001636c594e2210c0804630ca5ca Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Yaron,

I use a slightly modified version of the CSV library's lo= ad_rows . Here is the main code which is highly imperative style. I might t= ransform it in purely functional style ?

The main program is :

open Printf;;
open Sys;;
let timed_exec start_message f =3D
&n= bsp; print_string start_message;
  let st1 =3D time () in
 = ; let r =3D f () in
  print_endline ("done in " ^ (string= _of_float ((time ()) -. st1)) );
  r;;

(* This line enabled makes the program really slow ! *)let run_threaded f =3D Thread.create (fun () -> f (); Thread.exit ()) = ()

let () =3D timed_exec "Reading data " (fun () ->   load_rows (fun _ -> ()) (open_in "file1.csv");
  load_rows (fun _ -> ()) (open_in "file2.csv");
 = ; ()
)

The load_rows :
let load_rows ?(separator =3D ',= 9;) ?(nread =3D -1) f chan =3D
  let nr =3D ref 0 in
  let = row =3D ref [] in            (= * Current row. *)
  let field =3D ref [] in         &= nbsp;  (* Current field. *)
  let state =3D ref StartField in&= nbsp;       (* Current state. *)
  let end= _of_field () =3D
    let field_list =3D List.rev !field i= n
    let field_len =3D List.length field_list in
    let field_str =3D String.create field_len in
 &n= bsp;  let rec loop i =3D function
    [] -> ()      | x :: xs ->
     = ; field_str.[i] <- x;
      loop (i+1) xs
&nbs= p;   in
    loop 0 field_list;
  &= nbsp; row :=3D (Some field_str) :: !row;
    field :=3D [];
    state :=3D StartFie= ld
  in
  let empty_field () =3D
    row = :=3D None :: !row;
    field :=3D [];
  &nbs= p; state :=3D StartField
  in
  let end_of_row () =3D
&n= bsp;   let row_list =3D List.rev !row in
    f row_list;
    row :=3D [];
 =    state :=3D StartField;
    nr :=3D !nr + 1;<= br>  in
  let rec loop () =3D
    let c =3D = input_char chan in
    if c !=3D '\r' then ( = ;           (* Always ignore \r cha= racters. *)
      match !state with
    &nbs= p; StartField ->           = (* Expecting quote or other char. *)
      &nb= sp; if c =3D '"' then (
      &nbs= p;   state :=3D InQuotedField;
      =     field :=3D []
        )= else if c =3D separator then (* Empty field. *)
          empty_field ()
 &= nbsp;      else if c =3D '\n' then (  = ;  (* Empty field, end of row. *)
      &n= bsp;   empty_field ();
       &n= bsp;  end_of_row ()
        ) else (<= br>          state :=3D InUnquotedF= ield;
          field :=3D [c]         )
    | InUnquotedF= ield ->        (* Reading chars to end of = field. *)
        if c =3D separator then&= nbsp;   (* End of field. *)
      &nb= sp;   end_of_field ()
        el= se if c =3D '\n' then (    (* End of field and end o= f row. *)
          end_of_field ();
 = ;         end_of_row ()
  &= nbsp;     ) else
       &nb= sp;  field :=3D c :: !field
    | InQuotedField ->= ;        (* Reading chars to end of field. *)=
        if c =3D '"' then           state :=3D InQuotedFieldA= fterQuote
        else
  &nbs= p;       field :=3D c :: !field
  &nb= sp; | InQuotedFieldAfterQuote ->
       = ; if c =3D '"' then (        (* = Doubled quote. *)
          fiel= d :=3D c :: !field;
          state :=3D InQuotedField<= br>        ) else if c =3D '0' then (=     (* Quote-0 is ASCII NUL. *)
     =      field :=3D '\000' :: !field;
 &nbs= p;        state :=3D InQuotedField
 &= nbsp;      ) else if c =3D separator then (* End of fie= ld. *)
          end_of_field ()
 =        else if c =3D '\n' then ( &nbs= p;  (* End of field and end of row. *)
     &nb= sp;    end_of_field ();
      &n= bsp;   end_of_row ()
        ) e= lse (            (* Bad single= quote in field. *)
          field :=3D c :: '&quo= t;' :: !field;
          sta= te :=3D InQuotedField
        )
 &= nbsp;  ); (* end of match *)
  if( nread < 0 or !nr < nr= ead) then loop () else ()
  in
  try
    = loop ()
  with
      End_of_file ->
 &n= bsp;  (* Any part left to write out? *)
    (match != state with
         StartField -><= br>           if !row <>= [] then
         ( empty_field (); e= nd_of_row () )
       | InUnquotedField | InQuo= tedFieldAfterQuote ->
           end_of_field (); en= d_of_row ()
       | InQuotedField ->
&nb= sp;          raise (Bad_CSV_file &q= uot;Missing end quote after quoted field.")
    )

Thanks,
R=C3=A9mi

On Mon, Feb 16, 2009 at 17:47, Yaron Minsky <yminsky@gmail.com> wrote:
2009/2/16 R=C3=A9mi Dewitte <remi@gide.net>
Hello,

I would like to read two files in two different threads.
<= br>I have made a first version reading the first then the second and it tak= es 2.8s (native).

I decided to make a threaded version and before an= y use of thread I realized that just linking no even using it to the thread= s library makes my first version of the program to run in 12s !

Do you have a short benchmark you can post?  The idea t= hat the thread-overhead would make a difference like that, particularly for= IO-bound code (which I'm guessing this is) is pretty surprising.

y
 

I use pcre, extlib, csv libraries as well.
<= br>I guess it might come from GC slowing down thinks here, doesn't it ?= Where can it come from otherwise ? Is there a workaround or something I sh= ould know ?

Can ocaml use multiple cores ?

Do you have few pointers on libraries= to make parallel I/Os ?

Thanks,
R=C3=A9m= i

____________________________________= ___________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.in= ria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



--001636c594e2210c0804630ca5ca--