From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id JAA06331; Thu, 15 May 2003 09:57:52 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: (from xleroy@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id JAA06069 for caml-list@pauillac.inria.fr; Thu, 15 May 2003 09:57:51 +0200 (MET DST) Date: Thu, 15 May 2003 09:57:51 +0200 From: Xavier Leroy To: caml-list@pauillac.inria.fr Subject: Re: [Caml-list] Performance problem Message-ID: <20030515095751.A5384@pauillac.inria.fr> References: <20030514210332.GA27927@swordfish> <16066.49355.349534.889868@karryall.dnsalias.org> <20030514210332.GA27927@swordfish> <55DB7CE4-8659-11D7-9845-000393863F70@exomi.com> <20030514223956.GB27927@swordfish> <20030514224130.GC27927@swordfish> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <20030514224130.GC27927@swordfish>; from mgushee@havenrock.com on Wed, May 14, 2003 at 04:41:31PM -0600 X-Spam: no; 0.00; caml-list:01 buffering:01 terrible:01 ocamlopt:01 slower:01 non-trivial:01 matcher:01 regexp:01 compiler:01 ocaml:01 surprising:01 bytecode:01 syntax:02 match:02 ambiguous:02 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk > > >that occur to me are an output buffering problem or some delay in > > >interacting with the X display, but I'm not sure how to solve either of > > >those. And maybe it's something else entirely. Can anyone explain my > > >terrible results? > > > > My guess (without profiling) is that most of the time is being spent > > doing regular expression matching. Profiling (ocamlopt -p) confirms this guess: 99.9% of the time is spent in the regexp matching engine... > > Regular expressions can be much slower than you might expect for > > non-trivial cases. The expression in your program looks particularly > > nasty, since there are a lot of ambiguous cases (whether '-' should > > match '.*' or '-' depends on everything that follows). Indeed. This kind of regexp is among the worst cases for a greedy backtracking regexp matcher like that of Str. As someone else suggested, making the regexp deterministic (replace .*- with [^-]*-) helps immensely. > > 1) you regexp string is not properly escaped, all the "\(" and "\)" > > should be "\\(" and "\\)". > > Okay, I've run into that with other languages. But I didn't know you had > to do that in OCaml. Is that documented anywhere? In the syntax for strings, yes: a literal \ should be written \\, hence the \( regexp construct should be written \\( in a string literal. Perhaps this should be recalled in the docs for Str.regexp. At any rate, the compiler will nicely warn you. > By the way, why doesn't the compiler just reject regexes with single > backslashes? What is the point of issuing warnings and then accepting > the incorrect syntax? Backward compatibility, mostly. But this is the kind of warning that might become an error at some point in the future. > > 2) there are no problems using bytecode : > > 0.3s using bytecode > > 23s using native This I doubt very much, and indeed a quick test shows that bytecode and native take exactly the same time. This isn't surprising: since 99% of the time is spent is the regexp engine, and this engine is in C, it runs at the same speed in bytecode and in native-code. - Xavier Leroy ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners