From: Matt Gushee <mgushee@havenrock.com>
To: caml-list@inria.fr
Subject: Re: [Caml-list] index of substring
Date: Wed, 26 Jan 2005 21:21:01 -0700 [thread overview]
Message-ID: <41F86C2D.8030209@havenrock.com> (raw)
In-Reply-To: <1106786478.41f838aecb7dd@www.doc.ic.ac.uk>
yjc01@doc.ic.ac.uk wrote:
> I tried to write a function to return the index of a substring in a string from
> an index. So I can use it to extract some comments (text between (* and *) in a
> file. But came across some problem when returning the value.
Well, I'm not the world's greatest OCaml expert, but I enjoy this sort
of problem, so I took a stab at it.
First, a couple of quick comments on your code:
> (* create variable to store file *)
> let comments = String.make fileSize 'c';;
I don't really understand the need for this. If, as it appears, this
string will hold the contents of the input file, why not just create a
file-reading function that returns a string?
> (* return index of a substring in a string from an index, if substring not
> matched retun -1 *)
You must be a C programmer ;-) Not that it's necessarily wrong to return
-1. But there are OCaml idioms that to me seem more natural for this
case. You could use an int option type ( None | Some x ), or you could
keep the return type as an int, raising Not_found when appropriate.
> Or is there an easier way to do this?
There are a number of possible solutions. The easiest would be to use
regexp matching (standard Str module, or 3rd-party Pcre). I'm sure you
can figure that one out on your own, so I won't pursue it further. If
you want a more robust solution, you should probably use one of the fine
parsing tools available: ocamllex/ocamlyacc or camlp4. But those are
large topics in their own right.
I worked up another solution that uses only the standard library. It's
probably not easier, but it is mostly functional where your code was
imperative, and is arguably more elegant. Maybe it will give you an idea
or two.
(* val sync_start : char -> char Stream.t -> char Stream.t -> unit *)
let sync_start chr s1 s2 =
(* "Sets" both streams to the position of the starting character.
* A Stream.Failure exception may be raised; it is deliberately
* not handled here. *)
let rec sync () =
match Stream.next s1 with
(* Since the start character is removed from s1, s2 must
* be advanced by one to match. *)
| c when c = chr -> Stream.junk s2
| _ -> sync () in
sync ()
(* val index_left : ?start:int -> string -> string -> int *)
let index_left ?(start=0) src target =
let first = target.[0]
and sstream = Stream.of_string src in
(* Set the source stream to the desired start position. *)
for i = 1 to start do
try ignore (Stream.next sstream)
with Stream.Failure -> raise Not_found
done;
(* We need a fresh target stream each time the search is
* repeated, so the following is a function. Hmm, there's
* probably a better solution, but it hasn't occurred to
* me yet. *)
let mkts () = Stream.of_string target
and advance s =
(* The return type is char option because it will be used
* in pattern matching. *)
try Some (Stream.next s)
with Stream.Failure -> None in
let rec search tstream =
( try sync_start first sstream tstream
(* If we reach the end of the source stream, then of course
* that means the substring was not found, so there's no
* point in continuing. *)
with Stream.Failure -> raise Not_found );
let startpos = Stream.count sstream - 1 in
(* Continue searching once the start character is found. *)
let rec continue () =
match advance sstream, advance tstream with
(* Everything matches up to this point. Continue
* searching. *)
| Some c1, Some c2 when c1 = c2 -> continue ()
(* Substring is not matched here. Start over. *)
| Some c1, Some c2 -> search (mkts ())
(* The target stream is exhausted, no non-matching
* characters found, so we have a complete match. Return
* the starting index. *)
| _, None -> startpos
(* The source stream is exhausted, but one or more
* characters remain in the target. No match. *)
| None, Some _ -> raise Not_found in
continue () in
search (mkts ())
It occurs to me now that if you wanted to use this approach to search
sequentially for different substrings, it would probably be best to
create the source stream outside the search function. Then you would
reuse that stream over several invocations of the function; each time
you searched, the stream would be positioned at the end of the previous
search.
Hope this helps a bit.
--
Matt Gushee
Englewood, CO USA
next prev parent reply other threads:[~2005-01-27 4:19 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-01-27 0:41 yjc01
2005-01-27 4:21 ` Matt Gushee [this message]
2005-01-27 6:44 ` [Caml-list] " Radu Grigore
2005-01-27 9:36 ` skaller
2005-01-27 10:20 ` Jean-Christophe Filliatre
2005-01-27 16:55 ` skaller
2005-01-27 17:20 ` Radu Grigore
2005-01-28 8:51 ` Jean-Christophe Filliatre
2005-01-28 13:31 ` skaller
2005-01-27 8:37 ` Oliver Bandel
2005-01-27 9:17 ` Xavier Leroy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=41F86C2D.8030209@havenrock.com \
--to=mgushee@havenrock.com \
--cc=caml-list@inria.fr \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).