caml-list - the Caml user's mailing list
From: Markus Mottl <markus.mottl@gmail.com>
To: Brighten Godfrey <pbg@cs.berkeley.edu>
Cc: Alain Frisch <alain@frisch.fr>, OCaml List <caml-list@yquem.inria.fr>
Subject: Re: [Caml-list] Strange performance bug
Date: Wed, 29 Apr 2009 09:58:33 -0400
Message-ID: <f8560b80904290658p6f5cacb9vb6a2cec1c77359a4@mail.gmail.com>
In-Reply-To: <6D9C5A68-1874-4BBC-AE3D-9CCC3614AF7C@cs.berkeley.edu>

On Wed, Apr 29, 2009 at 04:29, Brighten Godfrey <pbg@cs.berkeley.edu> wrote:
> I know nothing about the internals of these libraries.  But, the program is
> continuously reading lines from the file.  Thus, isn't there about the same
> amount of memory on the heap just before the problem starts and just after
> the problem starts?  I guess it is plausible that somehow, closing the file
> and re-opening it triggers a bad interaction with the GC...
>
> But in comparison, using Str in the same way (i.e., compiling the regexp
> every time it is used) works fine.

Note that the effect of not precompiling the regular expressions is
not just the overhead of this computation, but also vastly greater
GC-pressure.

The current GC settings in Pcre trigger a full GC cycle for every 500
regular expressions allocated.  Since you compile one regexp per line,
that means a full major collection every 500 lines in your case.  This
setting works fine for just about any application I've seen, because
virtually nobody has to create patterns dynamically at rates so high
that this matters.
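
One way to check whether this is what is happening is to count major
collections around the suspect code.  The following is only a sketch
using the standard Gc module; the helper name major_collections_during
is made up for illustration and is not part of Pcre:

```ocaml
(* Run [f ()] and report how many major GC cycles completed while it ran.
   Uses only the standard library; the helper name is hypothetical. *)
let major_collections_during f =
  let before = (Gc.quick_stat ()).Gc.major_collections in
  let result = f () in
  let after = (Gc.quick_stat ()).Gc.major_collections in
  (result, after - before)

(* Example: forcing a full major collection shows up in the count. *)
let () =
  let (), n = major_collections_during (fun () -> Gc.full_major ()) in
  Printf.printf "major collections: %d\n" n
```

Wrapping the line-processing loop in such a helper, once with the regexp
compiled per line and once with it compiled up front, should make the
difference in major collections visible.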

Thus, try hoisting out the compilation of the regexp first...
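
A minimal sketch of what hoisting means here, assuming the pcre-ocaml
functions Pcre.regexp and Pcre.pmatch; the pattern and the function
names are invented for illustration and are not taken from the original
program:

```ocaml
(* Sketch only: the pattern and all function names are hypothetical. *)

(* Slow variant: compiles a fresh regexp on every call, so every input
   line allocates a new Pcre regexp value. *)
let matches_slow line =
  let rex = Pcre.regexp "^[0-9]+ " in
  Pcre.pmatch ~rex line

(* Hoisted variant: the regexp is compiled once, at module
   initialisation, and reused for every line. *)
let rex = Pcre.regexp "^[0-9]+ "
let matches_fast line = Pcre.pmatch ~rex line

(* Count the matching lines of an input channel with the hoisted regexp. *)
let count_matches ic =
  let rec loop n =
    match input_line ic with
    | line -> loop (if matches_fast line then n + 1 else n)
    | exception End_of_file -> n
  in
  loop 0
```

Both variants give the same answers; only the number of regexp
allocations differs.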

Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com



Thread overview: 13+ messages
2009-04-29  2:43 Brighten Godfrey
2009-04-29  3:37 ` [Caml-list] " Markus Mottl
2009-04-29  4:31   ` Brighten Godfrey
2009-04-29  6:18     ` Alain Frisch
2009-04-29  6:27       ` Brighten Godfrey
2009-04-29  6:37         ` Alain Frisch
2009-04-29  8:29           ` Brighten Godfrey
2009-04-29 13:58             ` Markus Mottl [this message]
2009-04-29 14:48               ` Damien Doligez
2009-04-29 16:03                 ` Markus Mottl
2009-04-29 19:19                   ` Brighten Godfrey
2009-04-29 19:38                     ` Markus Mottl
2009-04-29 20:23                       ` Brighten Godfrey
