From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id B6DBEBBAF for ; Mon, 6 Dec 2010 18:31:38 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AoIAAKKs/EzUTWUHkWdsb2JhbACDUJ9sFQEBAQEJCwoHEQQerTSQQIEhgzVzBI11 X-IronPort-AV: E=Sophos;i="4.59,306,1288566000"; d="scan'208";a="83211332" Received: from mx3.wp.pl ([212.77.101.7]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 06 Dec 2010 18:31:38 +0100 Received: (wp-smtpd smtp.wp.pl 238 invoked from network); 6 Dec 2010 18:31:34 +0100 Received: from 18-161.2-0.pl (HELO [192.168.1.101]) (d0@[91.189.18.161]) (envelope-sender ) by smtp.wp.pl (WP-SMTPD) with AES256-SHA encrypted SMTP for ; 6 Dec 2010 18:31:34 +0100 Message-ID: <4CFD1DEF.6070006@wp.pl> Date: Mon, 06 Dec 2010 18:31:27 +0100 From: Dawid Toton User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.15) Gecko/20101027 Thunderbird/3.0.10 MIME-Version: 1.0 To: caml-list Subject: Re: help with regular expression References: <926418.88102.qm@web65410.mail.ac4.yahoo.com> In-Reply-To: <926418.88102.qm@web65410.mail.ac4.yahoo.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-WP-AV: skaner antywirusowy poczty Wirtualnej Polski S. A. X-WP-SPAM: NO 0000000 [QdOk] X-Spam: no; 0.00; ocaml:01 ocaml:01 syntax:01 regexp:01 unreadable:01 error-prone:01 regexp:01 syntax:01 trivial:01 mli:01 aba:98 aba:98 wrote:01 dynamically:01 compile:01 On 12/06/2010 12:43 PM, zaid khalid wrote: > I want some help in writing regular expressions in Ocaml, as I know how to write it in informal way but in Ocaml syntax I can not. For example I want to write "a* | (aba)* ". > > Another question if I want the string to be matched against the regular expression to be matched as whole string not as substring what symbol I need to attach to the substring, i.e if I want only concrete strings accepted (like (" ", a , aa , aaa, aba, abaaba), but not ab or not abaa). > I also had problems with Str (regexp descriptions being unreadable, error-prone and hard to generate dynamically) and decided just to stop using Str. I have a tiny module [1] made with clarity in mind. It is pure OCaml. It defines operators like $$ to be used in regexp construction. This way syntax of the expressions is checked at compile time. Also, it is trivial to build them at run time. The whole "engine" is contained in a relatively short function HRegex.subwords_of_subexpressions, so I believe anybody can hack it without much effort. I haven't measured performance of this implementation. I expect it to be slow when processing long strings. It's just OK for my needs so far. Anyway, the important part is the module interface. It expresses my point of view on this topic. The code is available in a mercurial repository [2]. The exemple "a* | (aba)* " would become: open HRegex.Operators let rx = (!* !$ "a") +$ (!* !$ "aba") Dawid [1] http://hg.ocamlcore.org/cgi-bin/hgwebdir.cgi/hlibrary/hlibrary/raw-file/tip/HRegex.mli [2] http://hg.ocamlcore.org/cgi-bin/hgwebdir.cgi/hlibrary/hlibrary