caml-list - the Caml user's mailing list
* help with regular expression
@ 2010-12-06 11:43 zaid khalid
  2010-12-06 12:03 ` [Caml-list] " David Allsopp
  2010-12-06 17:31 ` Dawid Toton
  0 siblings, 2 replies; 6+ messages in thread
From: zaid khalid @ 2010-12-06 11:43 UTC (permalink / raw)
  To: caml-list

Hi Folks

I would like some help writing regular expressions in OCaml. I know how to write them informally, but not in OCaml syntax. For example, I want to write "a* | (aba)* ".

Another question: if I want the regular expression to match the whole string rather than a substring, what symbol do I need to add to the expression? That is, I want only concrete strings accepted (like " ", a, aa, aaa, aba, abaaba), but not ab or abaa.


Hint: I am using Str.regexp.
Thanks

* RE: [Caml-list] help with regular expression
  2010-12-06 11:43 help with regular expression zaid khalid
@ 2010-12-06 12:03 ` David Allsopp
  2010-12-06 13:11   ` Sylvain Le Gall
  2010-12-06 17:31 ` Dawid Toton
  1 sibling, 1 reply; 6+ messages in thread
From: David Allsopp @ 2010-12-06 12:03 UTC (permalink / raw)
  To: zaid khalid, caml-list

zaid Khalid wrote:
> Hi Folks
>
> I want some help in writing regular expressions in Ocaml, as I know how to write it
> in informal way but in Ocaml syntax I can not. For example I want to write "a* | (aba)* ".

This question would better be posted on the beginners' list - http://caml.inria.fr/resources/forums.en.html#id2267683

Regular expressions are available in the standard distribution through the Str module (as you've found); see http://caml.inria.fr/pub/docs/manual-ocaml/libref/Str.html. Assuming you have loaded/linked str.cm[x]a, your expression above is Str.regexp "a*\\|\\(aba\\)*". The regexp syntax is described in the docs for the Str.regexp function. Remember that the regular expression is written as an OCaml string, so every backslash must be doubled (to match a literal backslash in your regexp you have to write "\\\\").

> Another question if I want the string to be matched against the regular expression
> to be matched as whole string not as substring what symbol I need to attach to the
> substring, i.e if I want only concrete strings accepted (like (" ", a , aa , aaa, 
> aba, abaaba), but not ab or not abaa).

Use ^ and $ at the beginning and end of your regexp to ensure that it matches the entire string only - "^\\(a*\\|\\(aba\\)*\\)$"
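
A minimal sketch of the anchored pattern in use, assuming the str library is linked (for example with `ocamlfind ocamlopt -package str -linkpkg`). The names `whole` and `accepts` are made up for illustration. Because `$` in Str also matches just before a newline character, the sketch additionally compares Str.match_end () with the string length; that extra check is an assumption about what "whole string" should mean here.

```ocaml
(* Assumes the str library is linked; `whole` and `accepts` are
   illustrative names, not part of Str. *)

(* The anchored pattern from the message above. *)
let whole = Str.regexp "^\\(a*\\|\\(aba\\)*\\)$"

(* string_match tries to match at position 0 only. `$` also succeeds
   just before a '\n', so we additionally require that the match ends
   exactly at the end of the string. *)
let accepts s =
  Str.string_match whole s 0 && Str.match_end () = String.length s

let () =
  List.iter
    (fun s -> Printf.printf "%S -> %b\n" s (accepts s))
    [ ""; "a"; "aaa"; "aba"; "abaaba"; "ab"; "abaa" ]
```

With these definitions, "ab" and "abaa" should be rejected while the other sample strings are accepted.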

> Hint I am using (Str.regexp)

There are other libraries (e.g. pcre-ocaml) which provide different (I would say more powerful, rather than strictly better!) implementations.


David



* Re: help with regular expression
  2010-12-06 12:03 ` [Caml-list] " David Allsopp
@ 2010-12-06 13:11   ` Sylvain Le Gall
  2010-12-06 20:41     ` [Caml-list] " Martin Jambon
  0 siblings, 1 reply; 6+ messages in thread
From: Sylvain Le Gall @ 2010-12-06 13:11 UTC (permalink / raw)
  To: caml-list

On 06-12-2010, David Allsopp <dra-news@metastack.com> wrote:
> zaid Khalid wrote:
>>
>
>> Hint I am using (Str.regexp)
>
> There are other libraries (e.g. pcre-ocaml) which provide different (I
> would say more powerful, rather than strictly better!)
> implementations.
>
>

There are also syntax extensions such as mikmatch, which let you write
regexps in a much more readable syntax:

match str with 
| RE bol "a"* | "ab"* eol ->
  true
| _ ->
  false

http://martin.jambon.free.fr/mikmatch-manual.html
http://martin.jambon.free.fr/mikmatch.html

You can use pcre and str with mikmatch.

Regards,
Sylvain Le Gall



* Re: help with regular expression
  2010-12-06 11:43 help with regular expression zaid khalid
  2010-12-06 12:03 ` [Caml-list] " David Allsopp
@ 2010-12-06 17:31 ` Dawid Toton
  1 sibling, 0 replies; 6+ messages in thread
From: Dawid Toton @ 2010-12-06 17:31 UTC (permalink / raw)
  To: caml-list

On 12/06/2010 12:43 PM, zaid khalid wrote:
 > I want some help in writing regular expressions in Ocaml, as I know
 > how to write it in informal way but in Ocaml syntax I can not. For
 > example I want to write "a* | (aba)* ".
 >
 > Another question if I want the string to be matched against the
 > regular expression to be matched as whole string not as substring what
 > symbol I need to attach to the substring, i.e if I want only concrete
 > strings accepted (like (" ", a , aa , aaa, aba, abaaba), but not ab or
 > not abaa).
 >

I also had problems with Str (regexp descriptions being unreadable, 
error-prone and hard to generate dynamically) and decided just to stop 
using Str.
I have a tiny module [1] made with clarity in mind. It is pure OCaml. It 
defines operators like $$ to be used in regexp construction. This way 
syntax of the expressions is checked at compile time. Also, it is 
trivial to build them at run time.
The whole "engine" is contained in a relatively short function 
HRegex.subwords_of_subexpressions, so I believe anybody can hack it 
without much effort.

I haven't measured performance of this implementation. I expect it to be 
slow when processing long strings. It's just OK for my needs so far. 
Anyway, the important part is the module interface. It expresses my 
point of view on this topic.

The code is available in a mercurial repository [2].

The example "a* | (aba)* " would become:

open HRegex.Operators

let rx = (!* !$ "a") +$ (!* !$ "aba")
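
The idea of building regexps from ordinary OCaml values, so that the compiler checks their structure, can be sketched without any library. The following is a hypothetical illustration using Brzozowski derivatives; the type `re` and the names `str`, `star`, `<|>` and `matches` are assumptions made for this sketch and are not the HRegex interface.

```ocaml
(* Hypothetical combinators; NOT the HRegex API. *)
type re =
  | Empty            (* matches no string at all *)
  | Eps              (* matches only the empty string *)
  | Chr of char
  | Seq of re * re
  | Alt of re * re
  | Star of re

(* A literal string as a sequence of single characters. *)
let str s =
  let rec go i =
    if i = String.length s then Eps else Seq (Chr s.[i], go (i + 1))
  in
  go 0

let ( <|> ) a b = Alt (a, b)
let star r = Star r

(* Does the expression accept the empty string? *)
let rec nullable = function
  | Empty | Chr _ -> false
  | Eps | Star _ -> true
  | Seq (a, b) -> nullable a && nullable b
  | Alt (a, b) -> nullable a || nullable b

(* Derivative: the expression for "what may follow after reading c". *)
let rec deriv c = function
  | Empty | Eps -> Empty
  | Chr d -> if c = d then Eps else Empty
  | Alt (a, b) -> Alt (deriv c a, deriv c b)
  | Star r as s -> Seq (deriv c r, s)
  | Seq (a, b) ->
    let left = Seq (deriv c a, b) in
    if nullable a then Alt (left, deriv c b) else left

(* Whole-string match: consume every character, then test nullability. *)
let matches r s =
  let r = ref r in
  String.iter (fun c -> r := deriv c !r) s;
  nullable !r

(* a* | (aba)* *)
let rx = star (str "a") <|> star (str "aba")
```

No simplification is performed on the derivatives, so the intermediate expressions grow with the input; that is acceptable for short strings but would need attention for long ones.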

Dawid

[1] 
http://hg.ocamlcore.org/cgi-bin/hgwebdir.cgi/hlibrary/hlibrary/raw-file/tip/HRegex.mli
[2] http://hg.ocamlcore.org/cgi-bin/hgwebdir.cgi/hlibrary/hlibrary



* Re: [Caml-list] Re: help with regular expression
  2010-12-06 13:11   ` Sylvain Le Gall
@ 2010-12-06 20:41     ` Martin Jambon
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Jambon @ 2010-12-06 20:41 UTC (permalink / raw)
  To: caml-list

On 12/06/10 05:11, Sylvain Le Gall wrote:
> On 06-12-2010, David Allsopp <dra-news@metastack.com> wrote:
>> zaid Khalid wrote:
>>>
>>
>>> Hint I am using (Str.regexp)
>>
>> There are other libraries (e.g. pcre-ocaml) which provide different (I
>> would say more powerful, rather than strictly better!)
>> implementations.
>>
>>
> 
> There is also syntax extension like mikmatch, that helps to write regexp
> in a very meaningful syntax:
> 
> match str with 
> | RE bol "a"* | "ab"* eol ->
>   true
> | _ ->
>   false

If I understand the original problem correctly, the solution is:

match str with
  | RE ("a"* | "aba"*) eos ->
      (* always matches at the beginning of the string;
         eos enforces a match at the end of the string;
         the vertical bar has the lowest priority,
         so parentheses are needed. *)
      true
  | _ ->
      false


> http://martin.jambon.free.fr/mikmatch-manual.html
> http://martin.jambon.free.fr/mikmatch.html
> 
> You can use pcre and str with mikmatch.

I would recommend the pcre variant mostly for one feature that is not
provided by str:  lazy quantifiers, i.e. "repeat as little as possible
before trying to match what comes next".


Martin



* Re: [Caml-list] HELP : with regular expression
  2010-12-06 23:29 HELP : " zaid khalid
@ 2010-12-07 15:55 ` Ashish Agarwal
  0 siblings, 0 replies; 6+ messages in thread
From: Ashish Agarwal @ 2010-12-07 15:55 UTC (permalink / raw)
  To: zaid khalid; +Cc: caml-list

I know you're asking for a solution with Str. Since I can't help with that,
let me give you the Pcre solution instead. Hopefully my explanations will
make up for the lack of examples to help you start using it.

pmatch is the function you care about. Ignore most of the arguments. All you
need to do is pass the regular expression and the string to match against.
The regular expression can be given in either the ~rex or ~pat named
arguments (not both). Use ~pat for quick experiments, and compile the
pattern once (with Pcre.regexp) and pass it as ~rex if you're doing lots of
matches. Notice that almost every function has the same arguments, so once
you learn this one the others will make sense too.

extract is the second useful function. pmatch just returns true or false if
the regexp matches or not. extract will give you all the matching
substrings, which is useful to see what your regexp is actually doing (and
of course if you need the matched string for later use).

After you use these two functions you can start looking at the numerous
other ones that let you do more detailed things.

The regular expression I came up with for what you want is "(a+|(aba)+)$".
Let's test it:

$ ocaml
        Objective Caml version 3.12.0

# #require "pcre";;
# open Pcre;;

# pmatch ~pat:"(a+|(aba)+)$" "abaaba";;
- : bool = true

# extract ~pat:"(a+|(aba)+)$" "abaaba";;
- : string array = [|"abaaba"; "abaaba"; "aba"|]

Note that extract gives as its first result the "full match at index 0" and
then all the substrings that match. I'm actually not sure what the use of
this is, and I always set full_match to false to avoid the duplication.

# pmatch ~pat:"(a+|(aba)+)$" "abaa";;
- : bool = true

# extract ~full_match:false ~pat:"(a+|(aba)+)$" "abaa";;
- : string array = [|"aa"; ""|]

# pmatch ~pat:"(a+|(aba)+)$" "abb";;
- : bool = false

# extract ~full_match:false ~pat:"(a+|(aba)+)$" "abb";;
Exception: Not_found.
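
The pattern above is anchored only at the end, so it accepts any string that has a matching suffix (for example "ba"). For the specific language a* | (aba)* from the question quoted below, a prefix test can be sketched in plain OCaml without a regexp library. The function names are hypothetical, and the approach is hand-written for this one language rather than a general method.

```ocaml
(* Hypothetical helper: does p hold for every (index, char) of s? *)
let for_all_i p s =
  let rec go i = i = String.length s || (p i s.[i] && go (i + 1)) in
  go 0

(* s is a prefix of some word of a* | (aba)* iff
   - every character is 'a' (prefix of a^n), or
   - character i equals "aba".[i mod 3] (prefix of (aba)^n). *)
let is_prefix_in_language s =
  for_all_i (fun _ c -> c = 'a') s
  || for_all_i (fun i c -> c = "aba".[i mod 3]) s

let () =
  List.iter
    (fun s -> Printf.printf "%S -> %b\n" s (is_prefix_in_language s))
    [ "abaaba"; "abaa"; "abb"; "ba" ]
```

Under these definitions "abaaba" and "abaa" are accepted, while "abb" and "ba" are rejected.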

Hope that helps.


On Mon, Dec 6, 2010 at 6:29 PM, zaid khalid <zaidbenaz@yahoo.com> wrote:

> Hi folks
>
> Thank you for all your replies. I am still struggling to find a
> solution to my issue using "Str.regexp", and Pcre-ocaml will take some
> time to get familiar with, as there are not enough examples and discussion
> of it.
>
> I'll state my issue again in case someone can help me find a solution
> with "Str".
>
> I want to define a regular expression and then check whether a
> particular string is a prefix of some string matched by that expression.
>
> Example: a* | (aba)*, so testing "abaaba" gives true (complete match),
> testing "abaa" also gives true, but testing "abb" gives false.
>
> I look forward to your suggestions.
>
> Cheers,
> Zaid