Typesystem and Parsers

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* Typesystem and Parsers
@ 2010-10-22 14:59 Oliver Bandel
  2010-10-22 17:41 ` [Caml-list] " Oliver Bandel
  2010-10-22 23:43 ` Norman Hardy
  0 siblings, 2 replies; 4+ messages in thread
From: Oliver Bandel @ 2010-10-22 14:59 UTC (permalink / raw)
  To: caml-list

Hello,

when reading papers or books on parsing
techniques, the parsing often is done in
different distinctive steps, where type checking
and semantic checks are done after the parse tree
is build up.

This may be the classical way, for example when doing it in C.

When using OCaml with it's strong type system,
IMHO the big advantage is, that the type system can be used to  
restrict the input data in a way that without additionally checks  
correct input is enforced, otherwise a parse error is detected.

Also with arranging a parser (e.g. with ocamlyacc) both ways can be  
walked along, either by just accepting everything and build up the  
tree, and later detect erros in syntax or type... (for example all  
scanned entities given back as strings or string lists)...

...or the parser can just accept only what the type system would accept,
which would be enforced by using sum types.

I think, both ways have their advantages, but the strongly typed  
approach seems not to be talked about in books and papers.

So somehow I'm looking for arguments and techniques on how to use  
Ocaml's type system efficiently, but maybe also take advantage of the  
flexibility of a weak type system (e.g. by exploring the tree qith the  
wrong syntax, to analyze it nevertheless).

Can you point me to some arguments on how you would do your parsing
and why and in which csituation you would chose the one or the other approach?

For making it more specific: the current program I'm sitting on, is a simple
interpreter that helps me in analyzing webpages - a DSL for doing that.
It already runs and has some nice features, but in the development stage
I toggled between using sum-types (changed forth and back between some  
solutuons) and using just simple basic types (string and string list).

I once read an interesting paper on which types to use...
... it was by... hmhhh forgot the name, but I think it was
one of the outstanding FP-programmers. The argument was: use
basic types as often as you can, and avoid specific types.

This approach has helped: string and string list can be easily used
in a parser, when uzsing recursive riiules: just prepend the new value  
to the already built list. Easy done.
But this throws out the type system's strength.

Giving the values sum types, which is more specific, would also have  
advantages.

One reason why I'm not really decided, which way to go also is, that  
at the moment it's an interpreter, but maybe lkater I want to make it  
createing in-between-code maybe even optimizing.

So, can you please elaborate on advantages and disadvantages,
when to use which way?

Ciao,
     Oliver

P.S.: Writing a compiler I also have in mind, but it has nothing to do  
with the
       web-parser-DSL, and is related to microcontroller programming... but if
       you could also elaborate on the compiler issues regarding usage of
       typesystem in parsers, this would be also very interesting too me.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] Typesystem and Parsers
  2010-10-22 14:59 Typesystem and Parsers Oliver Bandel
@ 2010-10-22 17:41 ` Oliver Bandel
  2010-10-22 23:43 ` Norman Hardy
  1 sibling, 0 replies; 4+ messages in thread
From: Oliver Bandel @ 2010-10-22 17:41 UTC (permalink / raw)
  To: caml-list

Zitat von "Oliver Bandel" <oliver@first.in-berlin.de>:

> Hello,
>
> when reading papers or books on parsing
> techniques, the parsing often is done in
> different distinctive steps, where type checking
> and semantic checks are done after the parse tree
> is build up.
>
> This may be the classical way, for example when doing it in C.
>
> When using OCaml with it's strong type system,
> IMHO the big advantage is, that the type system can be used to  
> restrict the input data in a way that without additionally checks  
> correct input is enforced, otherwise a parse error is detected.
[...]

Would something like a parse-error-token make sense, when
the typesystem and the ocamlyacc-grammar forbidds parsing
wrong sequences?

Maybe something like (simple example):

type token_t =
     Add
   | Sub
   | Int of int
   | String of string
   | ParseError of string

??


Ciao,
    Oliver


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] Typesystem and Parsers
  2010-10-22 14:59 Typesystem and Parsers Oliver Bandel
  2010-10-22 17:41 ` [Caml-list] " Oliver Bandel
@ 2010-10-22 23:43 ` Norman Hardy
  2010-10-23  2:25   ` oliver
  1 sibling, 1 reply; 4+ messages in thread
From: Norman Hardy @ 2010-10-22 23:43 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On 2010 Oct 22, at 7:59 , Oliver Bandel wrote:

> Also with arranging a parser (e.g. with ocamlyacc) both ways can be walked along, either by just accepting everything and build up the tree, and later detect erros in syntax or type... (for example all scanned entities given back as strings or string lists)...

Let me describe an advantage that I see for Scheme over OCaml which pushes in the opposite direction to your proposal.
Analyzing a Scheme  program such as (let ( ... (r ...) ...) ...) where the ellipses may be pages of code, I can put the cursor just to the right of the r and type splat B twice on at least the Mac and learn the scope of r.
There are few more frequent steps for me as I study Scheme code.
Any of several editors that are not Scheme savvy can do this and even the Mac's Terminal program (a glass TTY for the shell) knows this trick.
I miss this in OCaml.
Perhaps this is because the gross parsing of Scheme is carried out at the lexical level—parentheses first!

As I reason about the scope of an OCaml variable, the whole cache of program logic in my head is flushed!
And them I am not sure of the result.
A systematic algorithm to determine scope is complex and error prone.

I want simple syntactic rules to determine scope, rules that fairly general editors can help with.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] Typesystem and Parsers
  2010-10-22 23:43 ` Norman Hardy
@ 2010-10-23  2:25   ` oliver
  0 siblings, 0 replies; 4+ messages in thread
From: oliver @ 2010-10-23  2:25 UTC (permalink / raw)
  To: Norman Hardy; +Cc: caml-list

On Fri, Oct 22, 2010 at 04:43:46PM -0700, Norman Hardy wrote:
> 
> On 2010 Oct 22, at 7:59 , Oliver Bandel wrote:
> 
> > Also with arranging a parser (e.g. with ocamlyacc) both ways can be walked along, either by just accepting everything and build up the tree, and later detect erros in syntax or type... (for example all scanned entities given back as strings or string lists)...
> 
> Let me describe an advantage that I see for Scheme over OCaml which pushes in the opposite direction to your proposal.
> Analyzing a Scheme  program such as (let ( ... (r ...) ...) ...) where the ellipses may be pages of code, I can put the cursor just to the right of the r and type splat B twice on at least the Mac and learn the scope of r.
> There are few more frequent steps for me as I study Scheme code.
> Any of several editors that are not Scheme savvy can do this and even the Mac's Terminal program (a glass TTY for the shell) knows this trick.
> I miss this in OCaml.
> Perhaps this is because the gross parsing of Scheme is carried out at the lexical level—parentheses first!
> 
> As I reason about the scope of an OCaml variable, the whole cache of program logic in my head is flushed!
> And them I am not sure of the result.
> A systematic algorithm to determine scope is complex and error prone.
> 
> I want simple syntactic rules to determine scope, rules that fairly general editors can help with.

Hmhh, ok that's a convenience thing you like in Scheme.

But I really meant not how to work with Ocaml vs. other languages
(which tools are there and IDEs).

I meant the tradeoff between either using the type system (and how the parser is written)
to make the code robust but  that it will break the compilation on the one side
  vs.
on the other side making the parser (and used types) less rigid and rather
going a weakly typed way (or using for example reinvocation of the parser with
"error" catching in ocamlyacc for example, to just ignore errors) 
so that the tokens can be picked up, and the decision if the e.g. a found string
should have been an int.

So it's rather about parsing techniques.

Maybe I'm too much off-topic here.

Sorry if I am.

But somehow I think when using OCaml things behave different than doing it in
C, and if I want to take the advantages of OCaml and the type system (to use
the type system as constraints to program structure, pushing invariants into
the program via type system), how can I nevertheless do a complete parse - as
far as possible - and after creating a parse tree, analyze the tree (instead of
being forced to stop at the first error).

Should there be error-tokens inserted into the parse tree?
Or will this become a big mess and I better just stop at the first error
of my input language?

For example in a book I found mentioned, that type checking is a very late stage
in parsing. But I doubt this was written with OCaml in mind, even though
it was stated as general.

Ciao,
   Oliver

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2010-10-23  2:25 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-22 14:59 Typesystem and Parsers Oliver Bandel
2010-10-22 17:41 ` [Caml-list] " Oliver Bandel
2010-10-22 23:43 ` Norman Hardy
2010-10-23  2:25   ` oliver

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).