caml-list - the Caml user's mailing list
From: "Jack Andrews" <effbiae@ivorykite.com>
To: caml-list@yquem.inria.fr
Subject: some comments on ocaml{lex,yacc} from a novice's POV
Date: Sat, 2 Apr 2005 15:10:04 +1000 (EST)
Message-ID: <50130.202.164.198.46.1112418604.squirrel@www.ivorykite.com>
In-Reply-To: <424DA923.7020106@tfb.com>

hi,

this is a little long.  i'm new to ocaml but, like most, i have been
educated in FLs and have experimented with and applied functional
languages and techniques.  python has been the first language i turn to
for a few years now.

i need to parse text as a sequence of records (with odd variations). i
have used ply (python lex-yacc) most recently for parsing and believe it
to be one of the more elegant mechanisms i've seen. 
http://systems.cs.uchicago.edu/ply/ply.html

elegant because there are no lex and yacc input files, but rather the
tokens and grammar rules are defined in python code -- succinctly!  eg:

# calclex.py
import lex
tokens = ( 'NUMBER', 'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'LPAREN', 'RPAREN',)
t_PLUS    = r'\+'  # in python, the r prefix makes a string literal raw
t_MINUS   = r'-'   #  (taken as-is): r'\+' in python is "\\+" in c
[snip]
def t_NUMBER(t):
    r'\d+'
    try: t.value = int(t.value)
    except ValueError:
         print "Line %d: Number %s is too large!" % (t.lineno,t.value)
         t.value = 0
    return t

by reflection/introspection, ply finds all the token definitions in
calclex.py.  the only trick here is the first line of the t_NUMBER
function.  in python, a string literal that is the first statement in a
function becomes its docstring (accessible as t_NUMBER.__doc__ in this
case).  ply uses that docstring as the token's regular expression.
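
to make the introspection concrete, here is a minimal sketch of how such a
lookup could work.  it is not ply's actual code: collect_token_rules is a
hypothetical helper, and the calclex namespace is a stand-in for the real
module.

```python
import re
import types

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# stand-in for the calclex module: a namespace holding the definitions
calclex = types.SimpleNamespace(
    tokens=('NUMBER', 'PLUS', 'MINUS'),
    t_PLUS=r'\+',
    t_MINUS=r'-',
    t_NUMBER=t_NUMBER,
)

def collect_token_rules(namespace):
    """hypothetical helper: map token names to their regex strings.

    names starting with t_ are token rules.  a plain string is the regex
    itself; for a function, the regex is read from its docstring.
    """
    rules = {}
    for name, obj in namespace.items():
        if not name.startswith('t_'):
            continue
        if isinstance(obj, str):
            rules[name[2:]] = obj
        elif callable(obj) and obj.__doc__:
            rules[name[2:]] = obj.__doc__
    return rules

rules = collect_token_rules(vars(calclex))
```

each collected string can then be handed to the re module, e.g.
re.fullmatch(rules['NUMBER'], '42').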

#!/usr/local/bin/python
import yacc
from calclex import tokens  # this is where python builds the lexer

def p_expression_plus(p):
    'expression : expression PLUS term'
    p[0] = p[1] + p[3]
[snip]
def p_factor_expr(p):
    'factor : LPAREN expression RPAREN'
    p[0] = p[2]
# this is where python builds the parser
yacc.yacc()   # or yacc.yacc(method="LALR") for alternate parsing methods
while 1:
   try: s = raw_input('calc > ')
   except EOFError: break
   if not s: continue
   result = yacc.parse(s)
   print result

once again, using the names of functions and their docstrings, ply can
build a parser.
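
as a rough sketch of that idea (again, not ply's real implementation;
collect_productions is a hypothetical helper), the grammar rules can be
recovered by splitting each p_ function's docstring at the colon:

```python
def p_expression_plus(p):
    'expression : expression PLUS term'
    p[0] = p[1] + p[3]

def p_factor_expr(p):
    'factor : LPAREN expression RPAREN'
    p[0] = p[2]

def collect_productions(functions):
    """hypothetical helper: turn p_* docstrings into (lhs, rhs, action)."""
    productions = []
    for func in functions:
        if not func.__name__.startswith('p_') or not func.__doc__:
            continue
        # 'lhs : sym1 sym2 ...' -> ('lhs', ['sym1', 'sym2', ...])
        lhs, rhs = func.__doc__.split(':', 1)
        productions.append((lhs.strip(), rhs.split(), func))
    return productions

productions = collect_productions([p_expression_plus, p_factor_expr])

# run one action by hand, the way a parser would when it reduces
# 'expression PLUS term' with the values 2, '+' and 3
lhs, rhs, action = productions[0]
p = [None, 2, '+', 3]
action(p)
```

after the call, p[0] holds the value of the reduced expression.  a real
parser generator would go on to build its parse tables from these
productions; that part is left out here.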

but i want to use ocaml, not python, because i know i need (more) speed.
after using ply, the ocaml{yacc,lex} implementation looks like it's just
glued on GNU tools.  not that there's anything wrong with that, but
integration with the language is nothing like that of ply.

don't get me wrong, i don't think ply is perfect, and i don't know enough
about parsing to be any kind of authority.  but it seems a bit odd to me
that a comment in an ocamlyacc grammar is either (* *) or /* */ depending
on context, and that in ocamllex a character set is written as
['A'-'Z' 'a'-'z' '_'] rather than the usual (succinct) regexp syntax
[A-Za-z_] (less than half the characters).  really, the .mll and .mly
files look nothing like caml.

take what i say with a grain of salt, i'm no authority on anything i've said.


jack

