caml-list - the Caml user's mailing list
From: Jon Harrop <jon@ffconsultancy.com>
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] ocamllex+ocamlyacc and not parsing properly
Date: Mon, 8 Aug 2005 00:58:11 +0100
Message-ID: <200508080058.12357.jon@ffconsultancy.com>
In-Reply-To: <ad8cfe7e050807143962166f9@mail.gmail.com>

On Sunday 07 August 2005 22:39, Jonathan Roewen wrote:
> I'm having some trouble with a lexer+parser I've written to parse IRC
> strings. Just about all strings are parsed correctly, but I'm having a
> few minor issues.
>
> Here are two strings that fail to parse correctly:
> :Sovereign.Wyldryde.org 254 dst 112 :holodeck programs running
> :
> :Sovereign.Wyldryde.org 333 dst #bfos Helio 112025589

I just added "irc_types.ml":

type command = JOIN | PART | MODE | TOPIC | NAMES | LIST | INVITE
               | KICK | PRIVMSG | NOTICE | QUIT | PING | Numeric of int

and compiled with:

ocamllex irc_lexer.mll
ocamlyacc irc_parser.mly
ocamlc -c irc_types.ml irc_parser.mli irc_parser.ml irc_lexer.ml
ocamlmktop irc_types.cmo irc_parser.cmo irc_lexer.cmo -o irc.top

I then ran the custom top-level with "./irc.top" and asked it to lex the first 
of your example strings:

# let lexbuf = Lexing.from_string ":Sovereign.Wyldryde.org 254 dst 112 :holodeck programs running";;
val lexbuf : Lexing.lexbuf =
  {Lexing.refill_buff = <fun>;
   Lexing.lex_buffer =
    ":Sovereign.Wyldryde.org 254 dst 112 :holodeck programs running";
   Lexing.lex_buffer_len = 62; Lexing.lex_abs_pos = 0;
   Lexing.lex_start_pos = 0; Lexing.lex_curr_pos = 0;
   Lexing.lex_last_pos = 0; Lexing.lex_last_action = 0;
   Lexing.lex_eof_reached = true; Lexing.lex_mem = [||];
   Lexing.lex_start_p =
    {Lexing.pos_fname = ""; Lexing.pos_lnum = 1; Lexing.pos_bol = 0;
     Lexing.pos_cnum = 0};
   Lexing.lex_curr_p =
    {Lexing.pos_fname = ""; Lexing.pos_lnum = 1; Lexing.pos_bol = 0;
     Lexing.pos_cnum = 0}}
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.STRING "Sovereign.Wyldryde.org"
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.COMMAND (Irc_types.Numeric 254)
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.STRING "dst"
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.COMMAND (Irc_types.Numeric 112)
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.STRING "holodeck programs running"
# Irc_lexer.message lexbuf;;
- : Irc_parser.token = Irc_parser.EOL
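
The repeated calls above can be wrapped in a small helper so the whole token 
stream is visible at once. This is only a sketch: "drain" and its "limit" 
argument are names made up here, and the cap is there only to stop a lexer 
that never returns its end token.

```ocaml
(* Call [rule] on [lexbuf] until [is_end] accepts a token or [limit]
   tokens have been read. Returns the tokens in the order they were
   produced, including the final end token if one was seen. *)
let drain ?(limit = 1000) rule is_end lexbuf =
  let rec go n acc =
    if n >= limit then List.rev acc
    else
      let tok = rule lexbuf in
      if is_end tok then List.rev (tok :: acc)
      else go (n + 1) (tok :: acc)
  in
  go 0 []

(* Intended use in the custom top-level built above:
     drain Irc_lexer.message (fun t -> t = Irc_parser.EOL) lexbuf *)
```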

So your lexer is emitting the tokens str, com, str, com, str, eol but your 
parser looks as though it is expecting str, com, str, str, str, eol.

I'm guessing the error is in the lexer because the grammar in the parser is 
very simple. As I read it, "message" lexes ":Sovereign.Wyldryde.org" into str, 
the following " " invokes "command", which lexes 254 into com, the next " " 
invokes "param", which lexes "dst" into str, and "param" then lexes the 
remaining parameters into strs.

However, that can't be correct because the lexer has clearly gone back into 
"command" in order to emit "Irc_types.Numeric 112".

It's just a guess, but have you assumed that each time the parser invokes the 
lexer it resumes in the rule it was left in when, in fact, the parser invokes 
the "message" rule every time?
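
If that is the problem, one way around it is to keep the "current rule" in a 
mutable variable and hand the parser a single entry point that dispatches on 
it. The following is a self-contained toy sketch of that idea, not your lexer: 
the token and command types are stand-ins for Irc_parser and Irc_types, the 
state names and "make_lexer" are made up, and it scans a plain string by hand 
rather than using ocamllex.

```ocaml
(* Toy stand-ins for the real Irc_types / Irc_parser definitions. *)
type command = Numeric of int | Other of string
type token = STRING of string | COMMAND of command | EOL

(* Which rule should handle the next token (hypothetical names). *)
type state = In_message | In_command | In_params

let command_of_word w =
  match int_of_string_opt w with
  | Some n -> Numeric n
  | None -> Other w

(* [make_lexer line] returns the single function the parser would call.
   The position and the current state live in the closure, so each call
   resumes where the previous one stopped. *)
let make_lexer (line : string) : unit -> token =
  let len = String.length line in
  let pos = ref 0 in
  let state = ref In_message in
  (* Read up to the next space and skip that space. *)
  let word () =
    let start = !pos in
    while !pos < len && line.[!pos] <> ' ' do incr pos done;
    let w = String.sub line start (!pos - start) in
    if !pos < len then incr pos;
    w
  in
  fun () ->
    if !pos >= len then EOL
    else
      match !state with
      | In_message ->
        if line.[!pos] = ':' then begin
          (* Leading ":prefix"; the command comes next. *)
          incr pos;
          state := In_command;
          STRING (word ())
        end else begin
          (* No prefix: this word is already the command. *)
          state := In_params;
          COMMAND (command_of_word (word ()))
        end
      | In_command ->
        state := In_params;
        COMMAND (command_of_word (word ()))
      | In_params ->
        if line.[!pos] = ':' then begin
          (* Trailing parameter: everything after the colon. *)
          let s = String.sub line (!pos + 1) (len - !pos - 1) in
          pos := len;
          STRING s
        end else
          STRING (word ())

(* Collect every token up to and including EOL. *)
let tokens line =
  let next = make_lexer line in
  let rec go acc =
    match next () with
    | EOL -> List.rev (EOL :: acc)
    | t -> go (t :: acc)
  in
  go []
```

Because the state survives between calls, "112" in your first string is lexed 
as a plain parameter rather than a second command, which gives the str, com, 
str, str, str, eol sequence the parser appears to expect.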

> BTW: As an aside, if the lexer doesn't cover all the bases, it doesn't
> throw an exception, just screws up my OS (Bounds check error, followed
> by seg-fault).

Any idea what is causing the segfault?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

