From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id OAA03880; Wed, 7 Aug 2002 14:24:18 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id OAA03935 for ; Wed, 7 Aug 2002 14:24:17 +0200 (MET DST) Received: from leia.mandrakesoft.com (office.mandrakesoft.com [195.68.114.34]) by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id g77COF526615 for ; Wed, 7 Aug 2002 14:24:16 +0200 (MET DST) Received: by leia.mandrakesoft.com (Postfix, from userid 505) id 3C6E54333; Wed, 7 Aug 2002 14:23:13 +0200 (CEST) To: Jerome Vouillon Cc: John Skaller , caml-list@inria.fr Subject: Re: [Caml-list] Regarding regular expressions References: <3D50A9E8.9060605@ozemail.com.au> <20020807073600.GA31615@strontium.pps.jussieu.fr> From: Pixel Date: 07 Aug 2002 14:23:12 +0200 In-Reply-To: <20020807073600.GA31615@strontium.pps.jussieu.fr> Message-ID: User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Jerome Vouillon writes: [...] > > So, nearly half of all perl scripts on CPAN that use regular expressions > > make use of the backreference feature. IMO this argues strongly in > > favor of supporting backreferences in C++. (Backreferences can only be > > handled by a backtracking NFA engine, IIRC.) > > What he means by backreference is a way to refer to a submatch. For > instance, with the regular expression "^([^ ]*) *([^ ]*)", the > backreference "$1" will refer to the substring matched by the first > parenthesed subexpression "([^ ]*)". As long as the references do not > occur in the regular expression itself, they can be handled perfectly > well with a DFA engine. So, the numbers above do not prove anything. agreed. Here are the numbers I get (only scanning perl 5.8.0, not CPAN): % cd /usr/lib/perl5/5.8.0 % find -name "*.p[lm]" | xargs perl -ne 'print "$ARGV: $_" if /[=!]~\s*[^ty ]/' | wc -l 2640 % find -name "*.p[lm]" | xargs perl -ne 'print "$ARGV: $_" if /[=!]~\s*[^ty ]/' | perl -ne 'print if /\\[1-9]\D/' | wc -l 3 % find -name "*.p[lm]" | xargs perl -ne 'print "$ARGV: $_" if /[=!]~\s*[^ty ]/' | perl -ne 'print if /\\[1-9]\D/' ./ExtUtils/MM_Win32.pm: while ($libs =~ s/(?:^|\s)(("?)-L.+?\2)(?:\s|$)/ /) { ./Pod/Text/Overstrike.pm: $text =~ s/(.)[\b]\1/$1/g; ./Test.pm: (undef, $regex) = ($expected =~ m,^ m([^\w\s]) (.+) \1 $,sx)) { > - how often are backreferences used within a pattern? I get 3 / 2640 = 0.1% ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners