From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=HTML_MESSAGE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id 262B0BBCA for ; Wed, 19 Mar 2008 17:39:43 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AmwCAEfe4EfY7zq9mGdsb2JhbACCPjSOEQEBAQEBCAUHCRSSLYZI X-IronPort-AV: E=Sophos;i="4.25,525,1199660400"; d="scan'208";a="23963403" Received: from gv-out-0910.google.com ([216.239.58.189]) by mail4-smtp-sop.national.inria.fr with ESMTP; 19 Mar 2008 17:39:42 +0100 Received: by gv-out-0910.google.com with SMTP id l14so193721gvf.4 for ; Wed, 19 Mar 2008 09:39:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:in-reply-to:mime-version:content-type:references:x-google-sender-auth; bh=UasIQDyYtYwhW3Z1oI27mMOSnEUPBEI+1jPRvhD9MjE=; b=gdoa/nwZr6HRNhXzTphTgrkrqO74G4oUBDGg41+UWYf82EjI/3Iuo2D+/x7XPurIvnU19/cRPCphf9se2+McvZ/k7DCe1+6KmCggdkuupVr72clQCdJxnPCrZDqYmnP0pgwoJWgBn8wQObGKl2TEe0QvYiOeaOXp3gV2oCeMnCs= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=message-id:date:from:sender:to:subject:in-reply-to:mime-version:content-type:references:x-google-sender-auth; b=il89d8R3jufRPn/9zPnwmVZNE+GNoSGZf8CvCrw0H0P9XIHRASZ4Q3mx3fAaatf6x7FWr//Bwgjz4N8r/cq7GaSkZXOvb4VXaBUDWPOal/oho5clSH5xzpWsQuJlQ2W8lmGLBqTPTWieuJ+Bu46JioMHVBXjoOHFnLfPcu6B8Ic= Received: by 10.142.72.21 with SMTP id u21mr396084wfa.65.1205944781090; Wed, 19 Mar 2008 09:39:41 -0700 (PDT) Received: by 10.70.32.5 with HTTP; Wed, 19 Mar 2008 09:39:41 -0700 (PDT) Message-ID: Date: Wed, 19 Mar 2008 09:39:41 -0700 From: "Jake Donham" Sender: jake.donham@gmail.com To: caml-list@yquem.inria.fr Subject: Re: ocamllex regexp problem In-Reply-To: MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_17837_21277982.1205944781052" References: X-Google-Sender-Auth: 5f4e74b97a79c72a X-Spam: no; 0.00; ocamllex:01 regexp:01 regexp:01 foo:01 pcre:01 ocamllex:01 pcre:01 repairs:98 repairs:98 wrote:01 wrote:01 match:02 match:02 let:03 let:03 ------=_Part_17837_21277982.1205944781052 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline On Tue, Mar 18, 2008 at 7:03 PM, Jake Donham wrote: > let ends_sq = [^']']* ']' > let ends_sq_sq = ends_sq ([^']'] ends_sq)* ']'+ > let ends_sq_sq_ang = ends_sq_sq ([^'>'] ends_sq_sq)* '>' My colleague Haoyang Wang points out that my regexp, when viewed nondeterministically, matches "foo]]]>bar]]>", since ']'+ may match only "]]", then [^'>'] matches the third "]". Changing it to [^'>'']'] repairs it. So I guess the answer is that Micmatch on PCRE treats the regexp as greedy, while ocamllex does not. Thanks to those who replied, Jake ------=_Part_17837_21277982.1205944781052 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline On Tue, Mar 18, 2008 at 7:03 PM, Jake Donham <jake.donham@skydeck.com> wrote:
  let ends_sq = [^']']* ']'
  let ends_sq_sq = ends_sq ([^']'] ends_sq)* ']'+
  let ends_sq_sq_ang = ends_sq_sq ([^'>'] ends_sq_sq)* '>'

My colleague Haoyang Wang points out that my regexp, when viewed nondeterministically, matches "foo]]]>bar]]>", since ']'+ may match only "]]", then [^'>'] matches the third "]". Changing it to [^'>'']'] repairs it. So I guess the answer is that Micmatch on PCRE treats the regexp as greedy, while ocamllex does not.

Thanks to those who replied,

Jake

------=_Part_17837_21277982.1205944781052--