From: Tom Ridge
To: Xavier Leroy
Cc: caml-list
Date: Wed, 1 Oct 2014 20:10:55 +0100
Subject: Re: [Caml-list] Str, regular expressions, longest match

Ah great. Thank you.

For reference (for future readers of the caml-list archive, perhaps), I can confirm that ocaml-re lets you choose first-match, shortest-match or longest-match semantics.
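To make the three policies concrete, here is a small pure-OCaml model of them over alternatives that are plain strings. It is a sketch for illustration only and does not use ocaml-re: the names `policy`, `is_prefix_at` and `match_at` are hypothetical. In ocaml-re itself the policies are, as far as I know, selected with the `Re.first`, `Re.shortest` and `Re.longest` combinators; check the library's documentation for details.

```ocaml
(* Hypothetical model of three match policies over literal alternatives.
   It only illustrates the semantics; it is not ocaml-re's implementation. *)
type policy = First | Shortest | Longest

(* [is_prefix_at s pos p] holds when literal [p] occurs in [s] at [pos]. *)
let is_prefix_at s pos p =
  let n = String.length p in
  pos + n <= String.length s && String.sub s pos n = p

(* Choose one of the alternatives that match at [pos], according to [policy].
   Candidates keep their original order, which only decides the result for
   [First] (and breaks ties for the other two policies). *)
let match_at policy alts s pos =
  match List.filter (is_prefix_at s pos) alts with
  | [] -> None
  | c :: rest ->
      (* Keep the candidate that [better] prefers; ties keep the earlier one. *)
      let pick better =
        List.fold_left (fun best x -> if better x best then x else best) c rest
      in
      Some
        (match policy with
         | First -> c
         | Shortest -> pick (fun x best -> String.length x < String.length best)
         | Longest -> pick (fun x best -> String.length x > String.length best))

let () =
  let alts = [ "ab"; "a"; "abc" ] in
  (* prints "first: ab", "shortest: a", "longest: abc" on separate lines *)
  List.iter
    (fun (name, p) ->
      match match_at p alts "abcd" 0 with
      | Some m -> Printf.printf "%s: %s\n" name m
      | None -> Printf.printf "%s: no match\n" name)
    [ ("first", First); ("shortest", Shortest); ("longest", Longest) ]
```

Only the first policy depends on the order in which the alternatives are listed, which is exactly the order sensitivity discussed in the original question.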

Tom

On 1 October 2014 17:28, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
On 29/09/14 22:04, Tom Ridge wrote:
> I am trying to use the Str module to match regular expressions. I want to
> return the longest match.[...] I don't want the order of
> the alternatives to matter. What am I doing wrong?

Nothing: it's just that Str has first-match semantics, not longest-match.
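Why order matters under first-match semantics can be seen in a toy backtracking matcher. This is a hypothetical model, not Str's actual implementation: the type `re` and the functions `try_at`, `first_match`, `longest_match`, `lit` and `alts` are made up for this sketch, which supports only characters, concatenation and alternation. A backtracking matcher tries the left alternative first and stops at the first success; getting the longest match requires it to keep exploring after a success.

```ocaml
(* A toy regular-expression AST (hypothetical; not Str's internal type). *)
type re =
  | Eps             (* matches the empty string *)
  | Chr of char     (* matches one given character *)
  | Seq of re * re  (* concatenation *)
  | Alt of re * re  (* alternation, written a\|b in Str syntax *)

(* Backtracking in continuation-passing style: [try_at r s i k] matches [r]
   in [s] from position [i] and passes each end position to [k].  When [k]
   returns [None], the matcher backtracks and tries the next possibility. *)
let rec try_at r s i (k : int -> int option) : int option =
  match r with
  | Eps -> k i
  | Chr c -> if i < String.length s && s.[i] = c then k (i + 1) else None
  | Seq (a, b) -> try_at a s i (fun j -> try_at b s j k)
  | Alt (a, b) -> (
      (* The left alternative is tried first; the right one only on failure. *)
      match try_at a s i k with
      | Some _ as res -> res
      | None -> try_at b s i k)

(* First-match semantics: accept the first end position found. *)
let first_match r s pos = try_at r s pos (fun j -> Some j)

(* Longest-match semantics: reject every end position so that all paths
   are explored, remembering the largest one seen. *)
let longest_match r s pos =
  let best = ref None in
  ignore
    (try_at r s pos (fun j ->
         (match !best with Some b when b >= j -> () | _ -> best := Some j);
         None));
  !best

(* Helpers: a literal string, and a left-to-right alternation of literals. *)
let lit str =
  let r = ref Eps in
  for i = String.length str - 1 downto 0 do
    r := Seq (Chr str.[i], !r)
  done;
  !r

let alts = function
  | [] -> invalid_arg "alts: empty list"
  | x :: rest -> List.fold_left (fun acc y -> Alt (acc, lit y)) (lit x) rest

let () =
  let r = alts [ "a"; "ab" ] in
  (* End offsets of the match starting at 0; prints "first: 1, longest: 2" *)
  Printf.printf "first: %d, longest: %d\n"
    (Option.get (first_match r "ab" 0))
    (Option.get (longest_match r "ab" 0))
```

Swapping the alternatives to `alts [ "ab"; "a" ]` changes the first-match result to 2, while the longest-match result stays 2 either way.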

> What can I do to match the longest substring?

In this particular example, you can sort the string patterns in
decreasing lexicographic order before putting them in ...|...|...
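The sorting trick works for alternatives that are plain strings: if two literals both match at the same position, one is a prefix of the other, and a proper prefix sorts before its extensions, so in decreasing order the longer one is tried first. A minimal sketch, assuming literal patterns only; `sort_for_longest`, `is_prefix_at` and `first_match` are hypothetical helpers, with a first-match scan over the list standing in for Str's behaviour on an alternation.

```ocaml
(* Sort literal patterns into decreasing lexicographic order.  OCaml's
   [compare] on strings is lexicographic, and a proper prefix compares
   smaller than its extensions, so longer overlapping literals come first. *)
let sort_for_longest pats = List.sort (fun a b -> compare b a) pats

(* Does literal [p] occur in [s] at position [pos]? *)
let is_prefix_at s pos p =
  let n = String.length p in
  pos + n <= String.length s && String.sub s pos n = p

(* First-match scan over literal alternatives, in list order. *)
let first_match pats s pos = List.find_opt (is_prefix_at s pos) pats

let () =
  let pats = [ "for"; "format"; "fo" ] in
  let show = function Some p -> p | None -> "(none)" in
  (* prints "for": the first listed pattern that matches wins *)
  print_endline (show (first_match pats "formatting" 0));
  (* prints "format": after sorting, the longest matching literal wins *)
  print_endline (show (first_match (sort_for_longest pats) "formatting" 0))
```

With Str itself, one would presumably escape each sorted pattern with `Str.quote` and join them with `"\\|"` before calling `Str.regexp`. The ordering argument relies on the alternatives being literals; it does not carry over to arbitrary sub-expressions, which is why a library with an explicit longest-match mode is the more general answer.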

In the more general case, you might have more luck with other regexp
libraries (e.g. PCRE or Vouillon's RE, but I didn't check whether they
implement longest match).

Hope this helps,

- Xavier Leroy

--
Caml-list mailing list.  Subscription management and archives:
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs
