caml-list - the Caml user's mailing list
* [Caml-list] Strange PCRE bug
@ 2004-09-16 15:44 Richard Jones
From: Richard Jones @ 2004-09-16 15:44 UTC (permalink / raw)
  To: caml-list


$ ocaml -I +pcre
        Objective Caml version 3.08.1

# #load "pcre.cma";;
# let rex = Pcre.regexp "(:?([a-z]+)\\s+)*";;
val rex : Pcre.regexp = <abstr>
# Pcre.extract_all ~rex "a b c d ee ff ";;

  (* Hangs, rapidly consuming memory.  Killed with ^C ... *)

Interrupted.
# Gc.full_major ();;
- : unit = ()

The Gc.full_major () doesn't recover any memory.

On a more general point, how do I access all the strings captured by
the inner brackets in a pattern like (:? (..)  )*  ?

Rich.

-- 
Richard Jones. http://www.annexia.org/ http://www.j-london.com/
Merjis Ltd. http://www.merjis.com/ - improving website return on investment
MOD_CAML lets you run type-safe Objective CAML programs inside the Apache
webserver. http://www.merjis.com/developers/mod_caml/


* Re: [Caml-list] Strange PCRE bug
From: Markus Mottl @ 2004-09-17  0:21 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list


On Thu, 16 Sep 2004, Richard Jones wrote:
> # #load "pcre.cma";;
> # let rex = Pcre.regexp "(:?([a-z]+)\\s+)*";;
> val rex : Pcre.regexp = <abstr>
> # Pcre.extract_all ~rex "a b c d ee ff ";;
> 
>   (* Hangs, rapidly consuming memory.  Killed with ^C ... *)

This is a bug concerning null patterns (i.e. ones that match empty
strings, too).  I have fixed this now.
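
As a rough illustration of why patterns that can match the empty string are dangerous in a "find all matches" loop, here is a minimal sketch in plain OCaml. It is not the actual fix made in the Pcre library; "find_all", "match_at" and "lower_run" are hypothetical names invented for this example. The point is only that an empty match consumes no input, so the loop has to step forward by hand or it never terminates.

```ocaml
(* [find_all match_at s] collects the (start, stop) offsets of successive
   matches in [s].  [match_at s pos] is a hypothetical matcher returning
   the first match at or after [pos], if any. *)
let find_all (match_at : string -> int -> (int * int) option) (s : string) =
  let len = String.length s in
  let rec loop pos acc =
    if pos > len then List.rev acc
    else
      match match_at s pos with
      | None -> List.rev acc
      | Some (start, stop) ->
        (* An empty match consumes nothing: advance by one character,
           otherwise the same position would be matched forever. *)
        let next = if stop = start then stop + 1 else stop in
        loop next ((start, stop) :: acc)
  in
  loop 0 []

(* Toy matcher standing in for a pattern like "[a-z]*": it matches the
   (possibly empty) run of lowercase letters starting exactly at [pos]. *)
let lower_run (s : string) (pos : int) : (int * int) option =
  let len = String.length s in
  let rec stop i =
    if i < len && s.[i] >= 'a' && s.[i] <= 'z' then stop (i + 1) else i
  in
  Some (pos, stop pos)

let offsets = find_all lower_run "ab 1"
```

Without the "stop + 1" step, the empty match at offset 2 would be found again and again, with the result list growing each time.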

> On a more general point, how do I access all the strings captured by
> the inner brackets in a pattern like (:? (..)  )*  ?

The "(:?" should be "(?:".

Anyway, to answer your question: you can't.  The capturing subpattern
"([a-z]+)" will only ever capture the last iteration of a repeated group
(the repetition introduced by "*" in your example).
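
A small sketch of this behaviour, assuming the pcre-ocaml library is available (e.g. via #load "pcre.cma" in the toplevel) and using the corrected "(?:" syntax. The subject string is made up for the example.

```ocaml
(* Requires pcre-ocaml.  The outer group is non-capturing, so the
   inner "([a-z]+)" is capture group 1. *)
let rex = Pcre.regexp "(?:([a-z]+)\\s+)*"

(* Pcre.extract returns the substrings of the first match:
   index 0 is the full match, index 1 is group 1. *)
let sstrs = Pcre.extract ~rex "a b c "

let () =
  Printf.printf "full match: %S\n" sstrs.(0);
  Printf.printf "group 1:    %S\n" sstrs.(1)
```

The full match covers "a b c ", but group 1 holds only "c", the word captured on the last pass through the repetition.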

I'm not sure what you want to do, but I guess you want to extract all
words consisting of characters from a-z in a string?  In that case I'd
rather use the much simpler pattern "[a-z]+".  "extract_all" will then
return an array of arrays of strings: one inner array per match, holding
that match's substrings.  Unless you specify "~full_match:false", each
inner array will contain the full match in position 0.  The full match
is what we want here.

E.g.:

  let () =
    let rex = Pcre.regexp "[a-z]+" in
    let subj = "this is 1 test" in
    let many_sstrs = Pcre.extract_all ~rex subj in
    let words = Array.map (fun sstrs -> sstrs.(0)) many_sstrs in
    Array.iter print_endline words

This will print:

  this
  is
  test
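
To make the array-of-arrays shape and the effect of "~full_match:false" more concrete, here is a sketch with a pattern that has two capture groups. It assumes pcre-ocaml is installed; the pattern and subject string are invented for illustration.

```ocaml
(* Requires pcre-ocaml.  Two capture groups: a key and a number. *)
let rex = Pcre.regexp "([a-z]+)=([0-9]+)"
let subj = "a=1 bb=22"

(* Default: each inner array starts with the full match at index 0. *)
let with_full = Pcre.extract_all ~rex subj

(* With ~full_match:false only the capture groups are returned. *)
let groups_only = Pcre.extract_all ~full_match:false ~rex subj

let () =
  Array.iter
    (fun sstrs -> print_endline (String.concat " | " (Array.to_list sstrs)))
    with_full
```

Here "with_full" has one inner array of length 3 per match, while "groups_only" has inner arrays of length 2.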

"extract_all" is the dual to "split".  In contrast to the latter it
does not remove the matched text but keeps it (including captured
substrings), and discards everything else.

Regards,
Markus

-- 
Markus Mottl          http://www.oefai.at/~markus          markus@oefai.at

