caml-list - the Caml user's mailing list
* [Caml-list] Str, regular expressions, longest match
@ 2014-09-29 20:04 Tom Ridge
  2014-09-30 12:47 ` Christophe Raffalli
  2014-10-01 16:28 ` Xavier Leroy
  0 siblings, 2 replies; 4+ messages in thread
From: Tom Ridge @ 2014-09-29 20:04 UTC (permalink / raw)
  To: caml-list


Dear All,

I am trying to use the Str module to match regular expressions. I want to
return the longest match.

let txt = "ab"
let reg = "a\\|ab"
let _ =
  let b = Str.string_match (Str.regexp reg) txt 0 in
  Str.matched_string txt

After the last line, the string "a" is apparently matched. But I hope that
the result should be "ab". If I reverse the order of the alternatives in
the regular expression, the string "ab" is matched. I don't want the order
of the alternatives to matter. What am I doing wrong? What can I do to
match the longest substring?

Thanks

Tom


* Re: [Caml-list] Str, regular expressions, longest match
  2014-09-29 20:04 [Caml-list] Str, regular expressions, longest match Tom Ridge
@ 2014-09-30 12:47 ` Christophe Raffalli
  2014-10-01 16:28 ` Xavier Leroy
  1 sibling, 0 replies; 4+ messages in thread
From: Christophe Raffalli @ 2014-09-30 12:47 UTC (permalink / raw)
  To: Tom Ridge; +Cc: caml-list


On 14-09-29 21:04:17, Tom Ridge wrote:
>    Dear All,
>    I am trying to use the Str module to match regular expressions. I want to
>    return the longest match.
>    let txt = "ab"
>    let reg = "a\\|ab"
>    let _ =
>      let b = Str.string_match (Str.regexp reg) txt 0 in
>      Str.matched_string txt
>    After the last line, the string "a" is apparently matched. But I hope that
>    the result should be "ab". If I reverse the order of the alternatives in
>    the regular expression, the string "ab" is matched. I don't want the order
>    of the alternatives to matter. What am I doing wrong? What can I do to
>    match the longest substring?
>    Thanks
>    Tom

Hello,

I had the same problem just last week and almost wrote the same mail.

Is this at least a bug in the documentation?

Cheers,
Christophe

-- 
Christophe Raffalli
Universite de Savoie
Batiment Le Chablais, bureau 21
73376 Le Bourget-du-Lac Cedex

tel: (33) 4 79 75 81 03
fax: (33) 4 79 75 87 42
mail: Christophe.Raffalli@univ-savoie.fr
www: http://www.lama.univ-savoie.fr/~RAFFALLI
---------------------------------------------
IMPORTANT: this mail is signed using PGP/MIME
At least Enigmail/Mozilla, mutt or evolution 
can check this signature. The public key is
stored on www.keyserver.net
---------------------------------------------


* Re: [Caml-list] Str, regular expressions, longest match
  2014-09-29 20:04 [Caml-list] Str, regular expressions, longest match Tom Ridge
  2014-09-30 12:47 ` Christophe Raffalli
@ 2014-10-01 16:28 ` Xavier Leroy
  2014-10-01 19:10   ` Tom Ridge
  1 sibling, 1 reply; 4+ messages in thread
From: Xavier Leroy @ 2014-10-01 16:28 UTC (permalink / raw)
  To: caml-list

On 29/09/14 22:04, Tom Ridge wrote:
> I am trying to use the Str module to match regular expressions. I want to
> return the longest match.[...]  I don't want the order of
> the alternatives to matter. What am I doing wrong?

Nothing: it's just that Str has first-match semantics, not longest-match.

> What can I do to match the longest substring?

In this particular example, you can sort the string patterns in
decreasing lexicographic order before putting them in ...|...|...
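
A minimal sketch of that sorting workaround, assuming every alternative
is a literal string. The helper names longest_first_regexp and
match_prefix are made up for this illustration and are not part of Str.
Sorting in decreasing order places any string ahead of its own prefixes,
so the first alternative that matches is also the longest one.

```ocaml
(* Requires the str library, e.g. ocamlfind ocamlopt -package str -linkpkg *)

(* Build an alternation of literal strings, sorted in decreasing
   lexicographic order so that any string precedes its own prefixes.
   (Hypothetical helper, not part of Str.) *)
let longest_first_regexp (literals : string list) : Str.regexp =
  literals
  |> List.sort (fun a b -> compare b a)   (* decreasing order *)
  |> List.map Str.quote                   (* escape regexp metacharacters *)
  |> String.concat "\\|"
  |> Str.regexp

(* Return the text matched at position 0, if any.
   (Hypothetical helper, not part of Str.) *)
let match_prefix (re : Str.regexp) (txt : string) : string option =
  if Str.string_match re txt 0 then Some (Str.matched_string txt) else None

let () =
  let re = longest_first_regexp ["a"; "ab"] in
  match match_prefix re "ab" with
  | Some s -> print_endline s   (* prints "ab" *)
  | None -> print_endline "no match"
```

This only helps when the alternatives are plain strings; it does not
give longest-match semantics for arbitrary sub-expressions.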

In the more general case, you might have more luck with other regexp
libraries (e.g. PCRE or Vouillon's RE, though I didn't check whether
they implement longest match).
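
If staying with Str is preferred, one possible workaround (a different
technique from the sorting trick, and only a sketch) is to compile each
top-level alternative separately, try them all at the same position, and
keep the longest result. The function name longest_match is an
assumption made for this example. Each alternative is still matched with
Str's own first-match semantics internally, so this does not handle
alternations nested inside a pattern.

```ocaml
(* Requires the str library. *)

(* Try every alternative at position [pos] and keep the longest match.
   Returns the matched substring, or None if no alternative matches.
   (Hypothetical helper, not part of Str.) *)
let longest_match (alternatives : Str.regexp list) (txt : string) (pos : int)
  : string option =
  List.fold_left
    (fun best re ->
       if Str.string_match re txt pos then begin
         (* Str.match_end gives the end offset of the last successful match. *)
         let len = Str.match_end () - pos in
         match best with
         | Some b when String.length b >= len -> best  (* keep earlier on ties *)
         | _ -> Some (String.sub txt pos len)
       end else best)
    None alternatives

let () =
  let alts = List.map Str.regexp ["a"; "ab"] in
  match longest_match alts "ab" 0 with
  | Some s -> print_endline s   (* prints "ab" *)
  | None -> print_endline "no match"
```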

Hope this helps,

- Xavier Leroy


* Re: [Caml-list] Str, regular expressions, longest match
  2014-10-01 16:28 ` Xavier Leroy
@ 2014-10-01 19:10   ` Tom Ridge
  0 siblings, 0 replies; 4+ messages in thread
From: Tom Ridge @ 2014-10-01 19:10 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list


Ah great. Thank you.

For reference (for future readers of the caml-list archive perhaps), I can
confirm that ocaml-re supports functionality to specify first or shortest
or longest match.
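
A short sketch of what that might look like with ocaml-re, assuming the
external "re" package is installed. The function name
longest_of_literals is made up for this example; Re.longest, Re.alt,
Re.str, Re.compile, Re.exec_opt and Re.Group.get are taken from the
library's interface as I understand it, so check them against the
version you have.

```ocaml
(* Requires the external "re" library (ocaml-re),
   e.g. ocamlfind ocamlopt -package re -linkpkg *)

(* Search [txt] for the longest match among a list of literal strings.
   (Hypothetical helper, not part of Re.) *)
let longest_of_literals (literals : string list) (txt : string)
  : string option =
  (* Re.longest requests longest-match semantics for the wrapped expression. *)
  let re = Re.compile (Re.longest (Re.alt (List.map Re.str literals))) in
  Option.map (fun g -> Re.Group.get g 0) (Re.exec_opt re txt)

let () =
  match longest_of_literals ["a"; "ab"] "ab" with
  | Some s -> print_endline s
  | None -> print_endline "no match"
```

Re.first and Re.shortest select the other two semantics in the same way.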

Tom

