From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.7 required=5.0 tests=AWL,SPF_SOFTFAIL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id 01D3DBC37 for ; Wed, 29 Apr 2009 06:31:28 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AtMBAKZ090mp5TxXe2dsb2JhbACWbgEBFiIFplGHZIhNg3MF X-IronPort-AV: E=Sophos;i="4.40,264,1238968800"; d="scan'208";a="39187686" Received: from gateway0.eecs.berkeley.edu ([169.229.60.87]) by mail4-smtp-sop.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 29 Apr 2009 06:31:28 +0200 Received: from [10.0.1.6] (adsl-70-137-145-160.dsl.snfc21.sbcglobal.net [70.137.145.160]) (authenticated bits=0) by gateway0.EECS.Berkeley.EDU (8.14.3/8.13.5) with ESMTP id n3T4VLSv003414 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT); Tue, 28 Apr 2009 21:31:23 -0700 (PDT) Cc: OCaml List Message-Id: <59BD1B3A-B449-4963-9910-ED5E755D00E6@cs.berkeley.edu> From: Brighten Godfrey To: Markus Mottl In-Reply-To: Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v930.3) Subject: Re: [Caml-list] Strange performance bug Date: Tue, 28 Apr 2009 21:31:21 -0700 References: <324B24CA-9671-42C0-B722-C7710C0C45C7@cs.berkeley.edu> X-Mailer: Apple Mail (2.930.3) X-Spam: no; 0.00; bug:01 markus:01 mottl:01 bug:01 compiler:01 pcre:01 pcre:01 regexp:01 regexp:01 28,:98 28,:98 2009:98 wrote:01 wrote:01 parsing:01 On Apr 28, 2009, at 8:37 PM, Markus Mottl wrote: > On Tue, Apr 28, 2009 at 22:43, Brighten Godfrey > wrote: >> I've encountered a very odd performance problem which I suspect is >> not a bug >> in my code. Could it be the compiler, or maybe PCRE? > > I'm not sure it solves your problem (haven't tried the example), but > just looking at the code there is clearly a performance bug: the > pattern is passed to Pcre.pmatch "on the fly" using label "~pat". > This is ok and convenient if it is used only once, but is bad if it > happens in a loop. Precompile the regular expression outside of the > loop (let rex = Pcre.regexp "...") and pass it in with label "~rex" to > solve this problem. I can't see how this explains the problem. Why should the parsing get dramatically slower when starting to parse the file a *second* time? (Changing to the precompiled regexp does make this bug go away -- but so do many other small changes, like commenting out the last line of the code, *after* the parsing is complete.) Thanks very much, ~Brighten