caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] ocamllex problem
@ 2005-08-04 23:12 Jonathan Roewen
  2005-08-04 23:53 ` Jonathan Roewen
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Roewen @ 2005-08-04 23:12 UTC (permalink / raw)
  To: caml-list

Hi,

Here's my .mll file, to match IRC strings. The problem is that it's
including spaces, which I assume it shouldn't.

let letter = [^' ']

rule token = parse
        | ':'((letter|' ')* as s)       { STRING s }
        | letter+ as s                  { STRING s }
        | [' ']+                        { token lexbuf }
        | eof                           { EOL }

(BTW, this is for matching rule 2). Maybe I'm just retarded, but this
should work, correct?

Jonathan


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] ocamllex problem
  2005-08-04 23:12 [Caml-list] ocamllex problem Jonathan Roewen
@ 2005-08-04 23:53 ` Jonathan Roewen
  2005-08-05  1:03   ` skaller
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Roewen @ 2005-08-04 23:53 UTC (permalink / raw)
  To: caml-list

Yes, I'm retarded. Ignore me ;-)

On 8/5/05, Jonathan Roewen <jonathan.roewen@gmail.com> wrote:
> Hi,
> 
> Here's my .mll file, to match IRC strings. The problem is that it's
> including spaces, which I assume it shouldn't.
> 
> let letter = [^' ']
> 
> rule token = parse
>        | ':'((letter|' ')* as s)       { STRING s }
>        | letter+ as s                  { STRING s }
>        | [' ']+                        { token lexbuf }
>        | eof                           { EOL }
> 
> (BTW, this is for matching rule 2). Maybe I'm just retarded, but this
> should work, correct?
> 
> Jonathan
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] ocamllex problem
  2005-08-04 23:53 ` Jonathan Roewen
@ 2005-08-05  1:03   ` skaller
  2005-08-05  5:11     ` Alain Frisch
  2005-08-05  6:15     ` james woodyatt
  0 siblings, 2 replies; 10+ messages in thread
From: skaller @ 2005-08-05  1:03 UTC (permalink / raw)
  To: Jonathan Roewen; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 714 bytes --]

On Fri, 2005-08-05 at 11:53 +1200, Jonathan Roewen wrote:
> Yes, I'm retarded. Ignore me ;-)

No way .. you picked something I didn't know:

> >        | ':'((letter|' ')* as s)       { STRING s }

I check the manual .. yup, you can match
subgroups like that..

How does that work??? The algorithm for handling
this for a DFA is non-trivial. Any pointers to
the algorithm used?

Alain Frisch pointed me at some nasty papers on
this, one with a regexp -> NFA conversion and the
other with a NFA-> DFA conversion, but I couldn't
figure out how to do the direct regexp->DFA conversion,
I'd sure like to find an algorithm for that..

-- 
John Skaller <skaller at users dot sourceforge dot net>


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] ocamllex problem
  2005-08-05  1:03   ` skaller
@ 2005-08-05  5:11     ` Alain Frisch
  2005-08-05  6:15     ` james woodyatt
  1 sibling, 0 replies; 10+ messages in thread
From: Alain Frisch @ 2005-08-05  5:11 UTC (permalink / raw)
  To: skaller; +Cc: caml-list

skaller wrote:
> How does that work??? The algorithm for handling
> this for a DFA is non-trivial. Any pointers to
> the algorithm used?

As you can find in the source code of ocamllex:


(* To generate directly a NFA from a regular expression.
     Confer Aho-Sethi-Ullman, dragon book, chap. 3
   Extension to tagged automata.
     Confer
       Ville Larikari
      ``NFAs with Tagged Transitions, their Conversion to Deterministic
        Automata and Application to Regular Expressions''.
       Symposium on String Processing and Information Retrieval (SPIRE
2000),
     http://kouli.iki.fi/~vlaurika/spire2000-tnfa.ps
(See also)
     http://kouli.iki.fi/~vlaurika/regex-submatch.ps.gz
*)

-- Alain


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] ocamllex problem
  2005-08-05  1:03   ` skaller
  2005-08-05  5:11     ` Alain Frisch
@ 2005-08-05  6:15     ` james woodyatt
  2005-08-05  8:35       ` skaller
  2005-08-05  9:15       ` Berke Durak
  1 sibling, 2 replies; 10+ messages in thread
From: james woodyatt @ 2005-08-05  6:15 UTC (permalink / raw)
  To: Ocaml Trade

On 04 Aug 2005, at 18:03, skaller wrote:
>
> Alain Frisch pointed me at some nasty papers on this, one with a  
> regexp -> NFA conversion and the other with a NFA-> DFA conversion,  
> but I couldn't figure out how to do the direct regexp->DFA  
> conversion, I'd sure like to find an algorithm for that..

In my OCaml NAE Core Foundation, there is a something you may find  
interesting.  See the [Cf_lex] module and its subordinate [Cf_dfa].   
Since it isn't trying to be a multi-stage programming tool like  
[ocamllex], it produces a parser monad that executes a Lazy-DFA,  
instead of a fully space-time optimized DFA.  At some point, I may  
implement a [study] function that fully evaluates the Lazy-DFA and  
optimizes it, but I don't yet see a compelling need for that.

One thing: the pattern [':'((letter|' ')* as s)] is interesting.   
You're definitely right that something non-trivial is happening  
inside the DFA.  My [Cf_dfa] module does not keep a stack of  
backtracking sequences because I did something else to resolve the  
problem.  Look at the ( $@ ) operators, which allow you to use a  
parser monad on the recognized input sequence to obtain the result of  
a lexical rule.  Using this, you can implement something like the  
feature you're interested in by defining a nested hierarchy of parsers.

I know.  This is probably not what you're looking for.  To get what  
you're looking for, I'd have to extend [Cf_dfa] to handle marker  
nodes in the NFA.  I thought that would be more appropriate for  
[ocamllex] and similar tools, so I didn't do it.  Nice to see  
[ocamllex] did.


-- 
j h woodyatt <jhw@wetware.com>
that's my village calling... no doubt, they want their idiot back.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] ocamllex problem
  2005-08-05  6:15     ` james woodyatt
@ 2005-08-05  8:35       ` skaller
  2005-08-05  9:15       ` Berke Durak
  1 sibling, 0 replies; 10+ messages in thread
From: skaller @ 2005-08-05  8:35 UTC (permalink / raw)
  To: james woodyatt; +Cc: Ocaml Trade

[-- Attachment #1: Type: text/plain, Size: 729 bytes --]

On Thu, 2005-08-04 at 23:15 -0700, james woodyatt wrote:
> On 04 Aug 2005, at 18:03, skaller wrote:

> I know.  This is probably not what you're looking for.  

Not directly: I have to generate C++ code from Ocaml.
At present, this is easy -- the Ocaml is a transliteration
of the Dragon book algorithm, and the C++ is two lines
of matrix lookup.

I would like to adapt the algorithm, which is regexp -> DFA
without constructing an NFA, to add enough information
so the C++ automata can capture groups.

Whilst a lazy automaton is nice in Ocaml, it's a bit
more challenging when the compile and run time
have to be split between Ocaml and C++ ;(

-- 
John Skaller <skaller at users dot sourceforge dot net>


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] ocamllex problem
  2005-08-05  6:15     ` james woodyatt
  2005-08-05  8:35       ` skaller
@ 2005-08-05  9:15       ` Berke Durak
  2005-08-05 11:05         ` skaller
  1 sibling, 1 reply; 10+ messages in thread
From: Berke Durak @ 2005-08-05  9:15 UTC (permalink / raw)
  To: james woodyatt; +Cc: caml-list

On Thu, Aug 04, 2005 at 11:15:48PM -0700, james woodyatt wrote:
> One thing: the pattern [':'((letter|' ')* as s)] is interesting.   
> You're definitely right that something non-trivial is happening  
> inside the DFA.  My [Cf_dfa] module does not keep a stack of  
> backtracking sequences because I did something else to resolve the  
> problem.  Look at the ( $@ ) operators, which allow you to use a  
> parser monad on the recognized input sequence to obtain the result of  
> a lexical rule.  Using this, you can implement something like the  
> feature you're interested in by defining a nested hierarchy of parsers.

You may also wish to have a look at :

  Thomas Reps, "Maximal-Munch" Tokenization in Linear Time, ACM Trans.
  Program. Lang. Syst., vol. 20, num. 2, 1998, pp.259-273
  http://www.cs.wisc.edu/wpis/papers/toplas98b.pdf

-- 
Berke Durak


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] ocamllex problem
  2005-08-05  9:15       ` Berke Durak
@ 2005-08-05 11:05         ` skaller
  2005-08-05 12:21           ` Jonathan Bryant
  0 siblings, 1 reply; 10+ messages in thread
From: skaller @ 2005-08-05 11:05 UTC (permalink / raw)
  To: Berke Durak; +Cc: james woodyatt, caml-list

[-- Attachment #1: Type: text/plain, Size: 394 bytes --]

On Fri, 2005-08-05 at 11:15 +0200, Berke Durak wrote:

> You may also wish to have a look at :
> 
>   Thomas Reps, "Maximal-Munch" Tokenization in Linear Time, ACM Trans.
>   Program. Lang. Syst., vol. 20, num. 2, 1998, pp.259-273
>   http://www.cs.wisc.edu/wpis/papers/toplas98b.pdf

Ah, that is interesting, thanks! 

-- 
John Skaller <skaller at users dot sourceforge dot net>


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] ocamllex problem
  2005-08-05 11:05         ` skaller
@ 2005-08-05 12:21           ` Jonathan Bryant
  2005-08-05 12:39             ` David MENTRE
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Bryant @ 2005-08-05 12:21 UTC (permalink / raw)
  To: skaller, caml-list

Maybe also "Modern Compiler Implementation In Java".  It's in Java, but
the examples are written in a functional style (as far as is possible).
Unfortunately, I can't remember the author's name off hand, but it is
published by Cambridge Press...

--Jonathan

On Fri, 2005-08-05 at 21:05 +1000, skaller wrote:
> On Fri, 2005-08-05 at 11:15 +0200, Berke Durak wrote:
> 
> > You may also wish to have a look at :
> > 
> >   Thomas Reps, "Maximal-Munch" Tokenization in Linear Time, ACM Trans.
> >   Program. Lang. Syst., vol. 20, num. 2, 1998, pp.259-273
> >   http://www.cs.wisc.edu/wpis/papers/toplas98b.pdf
> 
> Ah, that is interesting, thanks! 
> 
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
-- 
*=========================*
|Jonathan Bryant          |
|Valdosta State University|
|Information Technology   |
|System Operations        |
|-------------------------|
|jtbryant@valdosta.edu    |
|x6358                    |
*=========================*


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] ocamllex problem
  2005-08-05 12:21           ` Jonathan Bryant
@ 2005-08-05 12:39             ` David MENTRE
  0 siblings, 0 replies; 10+ messages in thread
From: David MENTRE @ 2005-08-05 12:39 UTC (permalink / raw)
  To: jtbryant; +Cc: skaller, caml-list

Hello,

2005/8/5, Jonathan Bryant <jtbryant@valdosta.edu>:
> Maybe also "Modern Compiler Implementation In Java".  It's in Java, but
> the examples are written in a functional style (as far as is possible).
> Unfortunately, I can't remember the author's name off hand, but it is
> published by Cambridge Press...

The book exists in 3 languages, Java, ML and C :
  http://www.cs.princeton.edu/~appel/modern/

Yours,
d.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2005-08-05 12:39 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-08-04 23:12 [Caml-list] ocamllex problem Jonathan Roewen
2005-08-04 23:53 ` Jonathan Roewen
2005-08-05  1:03   ` skaller
2005-08-05  5:11     ` Alain Frisch
2005-08-05  6:15     ` james woodyatt
2005-08-05  8:35       ` skaller
2005-08-05  9:15       ` Berke Durak
2005-08-05 11:05         ` skaller
2005-08-05 12:21           ` Jonathan Bryant
2005-08-05 12:39             ` David MENTRE

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).