From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id MAA26145; Thu, 23 Aug 2001 12:21:16 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id MAA26689 for ; Thu, 23 Aug 2001 12:21:15 +0200 (MET DST) Received: from mail.cs.uu.nl (sunset.cs.uu.nl [131.211.80.32]) by nez-perce.inria.fr (8.11.1/8.10.0) with ESMTP id f7NALEX27045 for ; Thu, 23 Aug 2001 12:21:15 +0200 (MET DST) Received: from silvester.cs.uu.nl (silvester.cs.uu.nl [131.211.80.119]) by mail.cs.uu.nl (Postfix) with SMTP id 8E5144536; Thu, 23 Aug 2001 12:21:14 +0200 (MET DST) Received: by silvester.cs.uu.nl (sSMTP sendmail emulation); Thu, 23 Aug 2001 12:21:14 +0200 Date: Thu, 23 Aug 2001 12:21:14 +0200 From: Frank Atanassow To: Neale Pickett Cc: caml-list@pauillac.inria.fr Subject: Re: [Caml-list] Str.string_match raising Invalid_argument "String.sub" in gc Message-ID: <20010823122114.A3873@cs.uu.nl> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from neale-caml@woozle.org on Wed, Aug 22, 2001 at 01:41:51PM -0700 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Neale Pickett wrote (on 22-08-01 13:41 -0700): > Alain Frisch writes: > > On 22 Aug 2001 neale-caml@woozle.org wrote: > > >> # let rec f l = > >> let sep = Str.regexp "^[ \t\n]*\\(.+\\)" in > >> match l with > >> | [] -> [] > >> | [""] -> [] > >> | s :: rest -> if (Str.string_match sep s 0) then > >> let foo = print_string ("match " ^ Str.matched_group 1 s ^ "\n") in > >> (Str.matched_group 1 s) :: (f rest) > > ^^ > > > This is wrong; with the current OCaml implementation, the right > > operand of (::) is called first; so (Str.matched_group 1 s) is called > > after subsequent calls to Str.string_match, which is obviously > > incorrect. > > Aha! Thank you. > > This makes sense, but it is certainly not obvious, especially in a > language which purports to have no side-effects. Ocaml does not purport to have no side-effects. It has plenty of side-effects. You must be thinking of Haskell or Miranda. > I can't help thinking > that s should be a different string for every invocation, but clearly it > is somehow related to the initial input string. No doubt this is a > clever optimization within OCaml which makes for drastically reduced > memory usage when processing strings, but it does make things a bit > confusing to the beginner. I'm pretty sure there is no such optimization, but I'm not sure what you're talking about here. Anyway, if an optimization affected the behavior of a program, it would not be an optimization but rather an compiler bug. > I don't have any good suggestions on how else to do it, although my base > desire is to have a regexp matching function which returns a string list > of the matched groups. There is no need to mutate the list/string(s). If I understand you correctly (but I don't think I do): let sep_list = let sep = Str.regexp "[ \t\n]+\\([^ \t\n]*\\)" in fun s -> let rec loop i = if Str.string_match sep s i then let m = Str.matched_group 1 s in m :: loop (Str.match_end ()) else [] in loop 0 # sep_list " abc def ghi j";; - : string list = ["abc"; "def"; "ghi"; "j"] But this is what the Str.split procedure does already: # Str.split (Str.regexp "[ \t\n]+") " abc def ghi j";; - : string list = ["abc"; "def"; "ghi"; "j"] Your function has type string list -> string list, and it seems like it just does the same match on every element of the list, so it's much easier: let map_match = let sep = Str.regexp "[ \t\n]*\\(.+\\)" in fun l -> let f s = Str.string_match sep s 0; Str.matched_group 1 s in List.map f l # map_match [" arf"; " barf"];; - : string list = ["arf"; "barf"] -- Frank Atanassow, Information & Computing Sciences, Utrecht University Padualaan 14, PO Box 80.089, 3508 TB Utrecht, Netherlands Tel +31 (030) 253-3261 Fax +31 (030) 251-379 ------------------- Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr