Date: Fri, 17 Sep 2004 02:21:58 +0200
From: Markus Mottl
To: Richard Jones
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Strange PCRE bug
Message-ID: <20040917002158.GA31673@fichte.ai.univie.ac.at>
References: <20040916154403.GA20490@annexia.org>
In-Reply-To: <20040916154403.GA20490@annexia.org>

On Thu, 16 Sep 2004, Richard Jones wrote:
> # #load "pcre.cma";;
> # let rex = Pcre.regexp "(:?([a-z]+)\\s+)*";;
> val rex : Pcre.regexp = <abstr>
> # Pcre.extract_all ~rex "a b c d ee ff ";;
>
> (* Hangs, rapidly consuming memory.  Killed with ^C ... *)

This is a bug concerning null patterns (i.e. patterns that can also
match the empty string).  I have fixed this now.

> On a more general point, how do I access all the strings captured by
> the inner brackets in a pattern like (:? (..) )* ?

The "(:?" should be "(?:".  Anyway, to answer your question: you can't.
The capturing subpattern "([a-z]+)" will only ever hold the last match
in a series of repetitions (as introduced by "*" in your example).

I'm not sure what you want to do, but I guess you want to extract all
words consisting of the characters a-z from a string?  In that case I'd
rather use the much simpler pattern "[a-z]+".  "extract_all" will then
return an array of arrays of strings.  Each element of the outer array
corresponds to one match and holds the substrings of that match.  Unless
you specify "~full_match:false", each inner array contains the full
match at position 0.  The full match is what we want here.  E.g.:

  let () =
    let rex = Pcre.regexp "[a-z]+" in
    let subj = "this is 1 test" in
    (* One inner array per match; index 0 holds the full match. *)
    let many_sstrs = Pcre.extract_all ~rex subj in
    let words = Array.map (fun sstrs -> sstrs.(0)) many_sstrs in
    Array.iter print_endline words

This will print:

  this
  is
  test

"extract_all" is the dual of "split".  Where "split" discards the text
matched by the pattern and returns what lies in between, "extract_all"
keeps the matches (including captured substrings) and ignores
everything else.

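As a generic illustration of why null patterns need care in any
"find all matches" loop, here is a minimal sketch using only the
standard library.  It is not the library's actual code and says nothing
about how the bug was fixed; "match_lower_star" and
"extract_all_matches" are hypothetical names, and the hand-written
matcher merely stands in for the pattern "[a-z]*".  If the loop resumed
at the end of an empty match, it would retry the same position forever
and keep accumulating empty strings, which fits the symptoms reported
above.

```ocaml
(* Hypothetical stand-in for a regex engine: matches [a-z]* at [pos]
   and returns the end offset.  Like any starred pattern, it may match
   the empty string (end offset = pos). *)
let match_lower_star (s : string) (pos : int) : int =
  let n = String.length s in
  let rec go i =
    if i < n && s.[i] >= 'a' && s.[i] <= 'z' then go (i + 1) else i
  in
  go pos

(* Collect every match from left to right.  The key step is [next]:
   after an empty match we move one character forward, so the loop
   always makes progress and terminates. *)
let extract_all_matches (s : string) : string list =
  let n = String.length s in
  let rec loop pos acc =
    if pos > n then List.rev acc
    else
      let stop = match_lower_star s pos in
      let m = String.sub s pos (stop - pos) in
      let next = if stop = pos then pos + 1 else stop in
      loop next (m :: acc)
  in
  loop 0 []

let () =
  extract_all_matches "ab 1c"
  |> List.iter (fun m -> Printf.printf "[%s]\n" m)
```

For the subject "ab 1c" this yields "ab", two empty matches (at the
space and at the digit), "c", and a final empty match at the end of the
string.
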
Regards,
Markus

-- 
Markus Mottl    http://www.oefai.at/~markus    markus@oefai.at