From: Markus Mottl
To: Brighten Godfrey
Cc: Alain Frisch, OCaml List
Date: Wed, 29 Apr 2009 09:58:33 -0400
Subject: Re: [Caml-list] Strange performance bug
In-Reply-To: <6D9C5A68-1874-4BBC-AE3D-9CCC3614AF7C@cs.berkeley.edu>

On Wed, Apr 29, 2009 at 04:29, Brighten Godfrey wrote:
> I know nothing about the internals of these libraries.  But, the program is
> continuously reading lines from the file.  Thus, isn't there about the same
> amount of memory on the heap just before the problem starts and just after
> the problem starts?  I guess it is plausible that somehow, closing the file
> and re-opening it triggers a bad interaction with the GC...
>
> But in comparison, using Str in the same way (i.e., compiling the regexp
> every time it is used) works fine.

Note that the effect of not precompiling the regular expressions is
not just the overhead of this computation, but also vastly greater
GC-pressure.  The current GC-settings in Pcre will trigger a full
GC-cycle every 500 regular expressions allocated, i.e. would perform a
full major collection every 500 lines in your case.  This setting
works fine for just about any application I've seen, because virtually
nobody has to create patterns dynamically at rates so high that this
matters.  Thus, try hoisting out the compilation of the regexp
first...

Markus

-- 
Markus Mottl http://www.ocaml.info markus.mottl@gmail.com
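
For illustration, here is a minimal sketch of what "hoisting out the compilation" could look like. It is not the code from the original program, which does not appear in this message. The function names and the sample pattern are hypothetical, and it assumes the pcre-ocaml library is installed (findlib package "pcre"). The first function passes the pattern as a string via ~pat, which, as far as I understand the library, compiles a fresh regexp on every call. The second compiles once with Pcre.regexp and reuses the result via ~rex.

```ocaml
(* Sketch only: function names and the sample pattern are hypothetical.
   Requires pcre-ocaml, e.g. ocamlfind ocamlopt -package pcre -linkpkg *)

(* Per-line compilation: ~pat hands Pcre a pattern string, so a new
   regexp is compiled (and allocated) for every line examined. *)
let count_matches_per_line pattern lines =
  List.fold_left
    (fun n line -> if Pcre.pmatch ~pat:pattern line then n + 1 else n)
    0 lines

(* Hoisted compilation: the regexp is compiled once, before the loop,
   and the compiled value is reused for every line via ~rex. *)
let count_matches_hoisted pattern lines =
  let rex = Pcre.regexp pattern in
  List.fold_left
    (fun n line -> if Pcre.pmatch ~rex line then n + 1 else n)
    0 lines

let () =
  let lines = [ "foo 42"; "bar"; "baz 7" ] in
  Printf.printf "per-line: %d, hoisted: %d\n"
    (count_matches_per_line "\\d+" lines)
    (count_matches_hoisted "\\d+" lines)
```

Both functions return the same count. The difference is only in how many regexp values get allocated: one per line in the first case, one in total in the second, which is what keeps the allocation-triggered full collections described above from firing.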