caml-list - the Caml user's mailing list
From: Markus Mottl <markus@oefai.at>
To: Richard Jones <rich@annexia.org>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Strange PCRE bug
Date: Fri, 17 Sep 2004 02:21:58 +0200
Message-ID: <20040917002158.GA31673@fichte.ai.univie.ac.at>
In-Reply-To: <20040916154403.GA20490@annexia.org>

On Thu, 16 Sep 2004, Richard Jones wrote:
> # #load "pcre.cma";;
> # let rex = Pcre.regexp "(:?([a-z]+)\\s+)*";;
> val rex : Pcre.regexp = <abstr>
> # Pcre.extract_all ~rex "a b c d ee ff ";;
> 
>   (* Hangs, rapidly consuming memory.  Killed with ^C ... *)

This is a bug concerning null patterns (i.e. ones that match empty
strings, too).  I have fixed this now.

> On a more general point, how do I access all the strings captured by
> the inner brackets in a pattern like (:? (..)  )*  ?

The "(:?" should be "(?:".

Anyway, to answer your question: you can't.  A capturing subpattern such
as "([a-z]+)" that sits inside a repetition (the "*" in your example)
only ever keeps the last substring it matched in the series.
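
To illustrate, here is a minimal sketch using the corrected pattern with
"Pcre.extract", which returns the substrings of the first match only.  The
helper name "last_word" is made up for this example, and it assumes the
pcre-ocaml library is loaded:

```ocaml
(* Requires pcre-ocaml, e.g. #load "pcre.cma";; in the toplevel. *)

(* Corrected pattern: "(?:" opens a non-capturing group. *)
let rex = Pcre.regexp "(?:([a-z]+)\\s+)*"

(* [last_word] is a hypothetical helper: it returns whatever group 1
   holds once the repeated group has finished matching [subj]. *)
let last_word subj =
  let sstrs = Pcre.extract ~rex subj in
  (* sstrs.(0) is the full match, sstrs.(1) is capturing group 1. *)
  sstrs.(1)

let () = print_endline (last_word "a b c d ee ff ")
```

This prints only "ff": the earlier words were matched, but each new
iteration of the group overwrote the previous capture.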

I'm not sure what you want to do, but I guess you want to extract all
words made up of the characters a-z from a string?  In that case I'd
rather use the much simpler pattern "[a-z]+".  "extract_all" will then
return an array of arrays of strings: the outer array has one entry per
match, and each inner array holds the substrings of that match.  Unless
you specify "~full_match:false", each inner array contains the full match
in position 0.  The full match is what we want here.

E.g.:

  let () =
    let rex = Pcre.regexp "[a-z]+" in
    let subj = "this is 1 test" in
    let many_sstrs = Pcre.extract_all ~rex subj in
    let words = Array.map (fun sstrs -> sstrs.(0)) many_sstrs in
    Array.iter print_endline words

This will print:

  this
  is
  test
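
If you only care about the captured groups rather than the full match,
"~full_match:false" drops position 0 from each inner array.  A small
sketch; the key=value pattern and the function name "pairs" are just
assumptions made up for illustration:

```ocaml
(* Requires pcre-ocaml, e.g. #load "pcre.cma";; in the toplevel. *)

(* [pairs] is a hypothetical helper: one inner array per match, holding
   only the two captured groups because the full match is left out. *)
let pairs subj =
  let rex = Pcre.regexp "([a-z]+)=([0-9]+)" in
  Pcre.extract_all ~full_match:false ~rex subj

let () =
  Array.iter
    (fun sstrs -> Printf.printf "%s -> %s\n" sstrs.(0) sstrs.(1))
    (pairs "a=1 bb=22")
```

With the full match omitted, index 0 is the first group and index 1 the
second, so this prints "a -> 1" and "bb -> 22".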

"extract_all" is the dual to "split".  In contrast to the latter it
does not discard the text matched by the pattern but keeps it (including
captured substrings), and ignores everything else.
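
A rough side-by-side sketch of the two on the same subject string; the
names "kept" and "between" are invented for this example, and I am not
listing the exact pieces "split" returns, since that depends on its
handling of empty leading and trailing fields:

```ocaml
(* Requires pcre-ocaml, e.g. #load "pcre.cma";; in the toplevel. *)

let rex = Pcre.regexp "[a-z]+"

(* [kept]: what extract_all keeps, i.e. the text matched by the pattern. *)
let kept subj =
  Array.to_list
    (Array.map (fun sstrs -> sstrs.(0)) (Pcre.extract_all ~rex subj))

(* [between]: what split keeps, i.e. the text between the matches. *)
let between subj = Pcre.split ~rex subj

let () =
  let subj = "this is 1 test" in
  List.iter (Printf.printf "kept: %S\n") (kept subj);
  List.iter (Printf.printf "between: %S\n") (between subj)
```

The "kept" lines show the words this, is and test; the "between" lines
show the pieces that lie between them.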

Regards,
Markus

-- 
Markus Mottl          http://www.oefai.at/~markus          markus@oefai.at
