From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id 314FEBC69 for ; Mon, 24 Sep 2007 20:22:20 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAK6d90bAXQInemdsb2JhbACOGwIJCg X-IronPort-AV: E=Sophos;i="4.20,292,1186351200"; d="scan'208";a="3186563" Received: from concorde.inria.fr ([192.93.2.39]) by mail3-smtp-sop.national.inria.fr with ESMTP; 24 Sep 2007 20:22:19 +0200 Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l8OIMCiA008083 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=OK) for ; Mon, 24 Sep 2007 20:22:19 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAK6d90aGuIEKn2dsb2JhbACOGwIBAQcEBgcIGA X-IronPort-AV: E=Sophos;i="4.20,292,1186351200"; d="scan'208";a="1374228" Received: from smtp.vub.ac.be (HELO dorado.vub.ac.be) ([134.184.129.10]) by mail1-smtp-roc.national.inria.fr with ESMTP; 24 Sep 2007 20:22:14 +0200 Received: from exchange.etro.vub.ac.be (etro10.vub.ac.be [134.184.2.10]) by dorado.vub.ac.be (Postfix) with ESMTP id 685F4538 for ; Mon, 24 Sep 2007 20:22:16 +0200 (CEST) Received: from ssel.vub.ac.be ([10.0.4.37]) by exchange.etro.vub.ac.be with Microsoft SMTPSVC(6.0.3790.3959); Mon, 24 Sep 2007 20:22:16 +0200 Received: by ssel.vub.ac.be (Postfix, from userid 241) id 3B7C224689F5; Mon, 24 Sep 2007 20:22:16 +0200 (CEST) Received: from localhost (localhost.localdomain [127.0.0.1]) by ssel.vub.ac.be (Postfix) with ESMTP id F205124689F4; Mon, 24 Sep 2007 20:22:15 +0200 (CEST) X-Virus-Scanned: amavisd-new at ssel.vub.ac.be Received: from ssel.vub.ac.be ([127.0.0.1]) by localhost (ssel.vub.ac.be [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id q0Bv6ZBOeuo5; Mon, 24 Sep 2007 20:22:04 +0200 (CEST) Received: from [192.168.2.2] (213.219.134.88.adsl.dyn.edpnet.net [213.219.134.88]) by ssel.vub.ac.be (Postfix) with ESMTP id 76EB224689F3; Mon, 24 Sep 2007 20:22:04 +0200 (CEST) In-Reply-To: <20070427231220.GA1507@first.in-berlin.de> References: <20070427135425.GA1161@first.in-berlin.de> <20070427162911.GA10099@furbychan.cocan.org> <20070427231220.GA1507@first.in-berlin.de> Mime-Version: 1.0 (Apple Message framework v752.3) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: Cc: Oliver Bandel Content-Transfer-Encoding: 7bit From: Bruno De Fraine Subject: ocamllex speed [was Re: [Caml-list] mboxlib reloaded ;-)] Date: Mon, 24 Sep 2007 20:22:00 +0200 To: Caml-list ml X-Mailer: Apple Mail (2.752.3) X-ClamAV-Alt: OK X-OriginalArrivalTime: 24 Sep 2007 18:22:16.0309 (UTC) FILETIME=[D5E16250:01C7FED7] X-Miltered: at concorde with ID 46F80054.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; ocamllex:01 bandel:01 native-code:01 ocamllex:01 endline:01 getcwd:01 lexbuf:01 lexbuf:01 argv:01 lexing:01 argv:01 regexp:01 endline:01 getcwd:01 avoided:01 Hello, On 28 Apr 2007, at 01:12, Oliver Bandel wrote: > So, I then checked my mboxlib and saw that it is quite slow, > compared to what I expected ( expect! I did not tried it > on my development machine because I have nomutt installed there) > and even if native-code smuch faster, it's nevertheless slow... > ...so I thought I have to redesign my scanner-stage. > (I use Str-module and ocamnllex mixed together; maybe > using a plain selfwritten OCaml-scanner might be better here). I don't know if Oliver ever got to the bottom of this speed problem, but, I also noticed ocamllex can be quite slow for simple scanning. For example, I used this ocamllex source: { } rule translate = parse | "current_directory" { print_endline (Sys.getcwd ()); translate lexbuf } | _ { translate lexbuf } | eof { () } { for i = 1 to (Array.length Sys.argv - 1); do translate (Lexing.from_channel (open_in Sys.argv.(i))) done ;; } And compared it against this version using the Str module: let re = Str.regexp_string "current_directory" ;; for i = 1 to (Array.length Sys.argv - 1); do let ch = open_in Sys.argv.(i) in try while true; do let line = input_line ch in try let _ = Str.search_forward re line 0 in print_endline (Sys.getcwd ()) with Not_found -> () done with End_of_file -> close_in ch done ;; Neither version does anything useful, except print the current directory when it encounters the string "current_directory". I tested this on a 57M text file (that has only a few "current_directory" occurrences). The ocamllex-version takes about 3.5s, while the Str- version takes only 0.35s. What causes this difference? Perhaps there is a high overhead in calling the translate function for every input character in such big input files, but I don't know how this can be avoided. Thanks, Bruno