From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id RAA24772; Fri, 5 Jul 2002 17:56:16 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id RAA24972 for ; Fri, 5 Jul 2002 17:56:15 +0200 (MET DST) Received: from leia.mandrakesoft.com (office.mandrakesoft.com [195.68.114.34]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id g65FuEf02919; Fri, 5 Jul 2002 17:56:14 +0200 (MET DST) Received: by leia.mandrakesoft.com (Postfix, from userid 505) id 8DC404277; Fri, 5 Jul 2002 17:56:11 +0200 (CEST) To: Xavier Leroy Cc: caml-list@inria.fr Subject: Re: [Caml-list] Regular expression library: a poll on features References: <20020705161302.C20462@pauillac.inria.fr> From: Pixel Date: 05 Jul 2002 17:56:10 +0200 In-Reply-To: <20020705161302.C20462@pauillac.inria.fr> Message-ID: User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Xavier Leroy writes: (posting on the list cuz of the wishlist below) > 1- back-references in regexps (e.g. "([a-z]+)\1", meaning any sequence > of lowercase letters followed by another occurrence of the same sequence); IMO only usefull in few cases. Could be dropped with no pb. > > 2- partial string matching as per Str.string_partial_match, i.e. > the ability to recognize that a string is a prefix of a string that > match a regexp. quite a surprising feature. I can't imagine an interesting example of its use. Here is my wishlist to have regexps easily _usable_ (ie. fill the gap with perl/ruby) - with compiler support (a la printf's format): flags = [ `CASELESS | `MULTILINE | `DOTALL | `EXTENDED | `ANCHORED ] m : ?option:flags -> regexp -> string -> string tuple option match_all : ?option:flags -> regexp -> string -> string tuple list cond_match : ?option:flags -> string -> regexp -> (string tuple -> 'a) -> regexp -> (string tuple -> 'a) ... -> 'a (with checking the last regexp is always true unless 'a is unit) subst : string -> regexp -> (string tuple -> string) -> string gsubst : string -> regexp -> (string tuple -> string) -> string try_subst : string -> regexp -> (string tuple -> string) -> string option examples match Re.m "(\w+)=(\S+)" s with | None -> ... | Some(v, val) -> ... Re.cond_match s "^\s*#" (fun() -> None) "(\w+)=(\S+)" (fun (v, val) -> Some(v, val)) "" (fun() -> failwith ("bad line " ^ s)) Re.cond_match s "warning: (.*)" (fun s -> eprintf "a problem occured: %s\n" s) Re.subst (Re.subst s "^\s+" (fun() -> "")) "\s+$" (fun() -> "") - Without compiler support flags = [ `CASELESS | `MULTILINE | `DOTALL | `EXTENDED | `ANCHORED ] re : string -> regexp m : ?option:flags -> regexp -> string -> string list option match_all : ?option:flags -> regexp -> string -> string list list cond_match : ?option:flags -> string -> (regexp * (string list -> 'a)) list -> 'a subst : string -> regexp -> (string list -> string) -> string gsubst : string -> regexp -> (string list -> string) -> string try_subst : string -> regexp -> (string list -> string) -> string option examples match m (re"(\w+)=(\S+)") s with | Some [ v ; val ] -> ... | _ -> ... cond_match s [ re"^\s*#", fun() -> None ; re"(\w+)=(\S+)", fun (v, val) -> Some(v, val) ; re"", fun() -> failwith ("bad line " ^ s) ; ] cond_match s "warning: (.*)" (fun s -> eprintf "a problem occured: %s\n" (hd s)) subst (subst s (re"^\s+") (fun _ -> "")) (re"\s+$") (fun _ -> "") ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners