caml-list - the Caml user's mailing list
From: Dawid Toton <d0@wp.pl>
To: caml-list <caml-list@yquem.inria.fr>
Subject: Re: help with regular expression
Date: Mon, 06 Dec 2010 18:31:27 +0100
Message-ID: <4CFD1DEF.6070006@wp.pl>
In-Reply-To: <926418.88102.qm@web65410.mail.ac4.yahoo.com>

On 12/06/2010 12:43 PM, zaid khalid wrote:
> I want some help in writing regular expressions in Ocaml, as I know
> how to write it in informal way but in Ocaml syntax I can not. For
> example I want to write "a* | (aba)* ".
>
> Another question if I want the string to be matched against the
> regular expression to be matched as whole string not as substring what
> symbol I need to attach to the substring, i.e if I want only concrete
> strings accepted (like (" ", a , aa , aaa, aba, abaaba), but not ab or
> not abaa).
>

I also had problems with Str (regexp descriptions being unreadable, 
error-prone and hard to generate dynamically) and decided just to stop 
using Str.
I have a tiny module [1] written with clarity in mind. It is pure OCaml
and defines operators such as $$ for constructing regexps. This way the
syntax of the expressions is checked at compile time. It is also
trivial to build them at run time.
The whole "engine" is contained in a relatively short function 
HRegex.subwords_of_subexpressions, so I believe anybody can hack it 
without much effort.

I haven't measured the performance of this implementation. I expect it
to be slow when processing long strings, but it has been good enough for
my needs so far. Anyway, the important part is the module interface. It
expresses my point of view on this topic.

The code is available in a Mercurial repository [2].

The example "a* | (aba)* " would become:

open HRegex.Operators

let rx = (!* !$ "a") +$ (!* !$ "aba")
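
To show how operators of this kind can give compile-time-checked regexps,
and how whole-string matching can be enforced, here is a minimal
self-contained sketch. It is NOT the HRegex implementation: the type rx,
the functions go and matches_whole, and the definitions of !$, !* and +$
below are assumptions made for illustration only, and the real HRegex
interface may differ.

```ocaml
(* Hypothetical mini regexp combinators, for illustration only.
   This is not the HRegex API. *)
type rx =
  | Lit of string   (* a literal string *)
  | Seq of rx * rx  (* concatenation *)
  | Alt of rx * rx  (* alternation *)
  | Star of rx      (* zero or more repetitions *)

(* Operators spelled like those in the example above; assumed meanings. *)
let ( !$ ) s = Lit s
let ( !* ) r = Star r
let ( +$ ) a b = Alt (a, b)

(* [go r s i k] tries to match [r] in [s] starting at index [i].
   On each candidate match it calls the continuation [k] with the
   end index, and backtracks if [k] returns false. *)
let rec go r s i k =
  match r with
  | Lit l ->
    let n = String.length l in
    i + n <= String.length s && String.sub s i n = l && k (i + n)
  | Seq (a, b) -> go a s i (fun j -> go b s j k)
  | Alt (a, b) -> go a s i k || go b s i k
  | Star a ->
    (* Either stop here, or match the body once more. Requiring
       progress (j > i) keeps a body that matches the empty string
       from looping forever. *)
    k i || go a s i (fun j -> j > i && go r s j k)

(* Whole-string matching: accept only if the match ends at the end of [s]. *)
let matches_whole r s = go r s 0 (fun j -> j = String.length s)

let rx = (!* !$ "a") +$ (!* !$ "aba")
```

With these definitions, matches_whole rx accepts "", "a", "aa", "aaa",
"aba" and "abaaba", and rejects "ab" and "abaa", because the final
continuation only succeeds when the whole input has been consumed.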

Dawid

[1] http://hg.ocamlcore.org/cgi-bin/hgwebdir.cgi/hlibrary/hlibrary/raw-file/tip/HRegex.mli
[2] http://hg.ocamlcore.org/cgi-bin/hgwebdir.cgi/hlibrary/hlibrary

