caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Eray Ozkural <exa@kablonet.com.tr>
To: skaller@users.sourceforge.net
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] Efficient C++ interfacing?
Date: Mon, 7 Jun 2004 06:04:39 +0300	[thread overview]
Message-ID: <200406070604.39686.exa@kablonet.com.tr> (raw)
In-Reply-To: <1086554511.16811.651.camel@pelican.wigram>

On Sunday 06 June 2004 23:41, skaller wrote:
> On Mon, 2004-06-07 at 04:44, Eray Ozkural wrote:
> > On Sunday 06 June 2004 10:00, skaller wrote:
> > > (1) Flxcc can't parse C++ yet
> >
> > I had started writing a modern top-down C++ parser using the
> > combinatorial parser library Parsec in Haskell, but I had failed due to
> > lack of time with the project.
> >
> > There are freely available parser models [LL(k), not LR(k)] which
> > we can build upon.
>
> C++ *interfaces* can probably be parsed with LALR
> tool like ocamlyacc. At least, I hope so!

Well, the interface language isn't that easy, but yes you are basically right. 
g++ was still using the old bison thing; they were talking about writing a 
parser from scratch.  I haven't followed the developments lately, though.

> The needs of a wrapper generator are considerably weaker
> than a compiler. For example given the construction:
>
> int x = expr;
>
> we can parse 'expr' by as a list of tokens followed by semi.
> We don't care about the RHS at all, we just need to know 'x'
> is an 'int'.

You can skip code blocks, assignment statements, etc., but that means you have 
to parse them. :) My idea was that you first needed a complete C++ syntax 
analyzer, and then you can build whatever minimal semantic analyzer you want 
to. Here, we need just a (possibly) full C++ static type analyzer. I was 
venturing in that direction and saw it was so much harder than C, naturally. 
The C++ type system and *syntax* is a mess, especially with templates.

At any rate, I strongly suggest that we (whoever is interested) don't try to 
write the parser from scratch. Just take a known, working and complete 
(supports ISO C++ syntax 100%) model, and translate it into concise 
notation... Of course it'd be great if we could have an LALR parser that we 
wouldn't care about later, but I have no idea how practical that would be. My 
impression with g++ was "ouch", for instance. And in other freely available 
C++ parsers, which are embedded in compilers, I was scrolling up and down 
100kbs of parser code. I left the whole thing a little confused :) Even if 
you have the ISO standard in front of you, you'd be in a lot of trouble 
coding a C++ parser. It's the most stinking PL syntax ever invented. 

I believe my course of action had a merit: use a combinatorial parser system 
so that you have the entire C++ syntax with a 20-30kb code or maybe even 
less. All the compilers I saw were written in the horrible C or C++ languages 
themselves, and even browsing the syntax analyzer code was a nightmare. 
Another bonus you get with these modern parser libs is that it may be easier 
to recover from errors, etc.

program = many declarationExternal

See, that was easy wasn't it? I'm now looking at the Parsec parser I've 
written and it's not quite comprehensible. However, I've at least noted that 
it is adapted from PCCTS. Are there any decent combinatorial parser packages 
for ocaml? (Or is ocaml not abstract enough to do that? *grin*)

The thing to decide is what kind of parser you want. LALR doesn't seem the 
right choice for me. It'd probably take an LR(1) [or even LR(k)] parser to 
write it in a nice way, or LL(k) which might be the more fashionable among 
compiler designers nowadays... (odd as it sounds) Then find a really good 
open source compiler which works that way, and strip its parser logic.

Here are some links about parsing C++
http://www.empathy.com/pccts/
http://www.nobugs.org/developer/parsingcpp/
http://sourceforge.net/projects/opencxx/
http://www.antlr.org/  ---> Does it support C++ now?

I've now recollected the history of my development. I started out with 
Tendra's parser, and after being frustrated with what I thought was a little 
hackish solution, I switched to the approach of PCCTS.

Funny thing, maybe the easiest would be just write the translator code in Java 
using ANTLR, but I've no idea how acceptable that is to ocaml hackers :->

> Even if we can't quite parse everything, there are
> ways a wrapper generator can handle that (eg just
> ignore the declaration, and allow the user to fix
> it by hand writing the wrapper for that function).

Yes, but I'd prefer automatic operation.

> > > (2) Flxcc can't target Ocaml yet
> >
> > This one wouldn't be hard.
>
> I agree, but I haven't tried it.
>
> > So, you do have a rudimentary C++ parser?
>
> I have a sophisticated C parser (frontc/Cil)
> which handles GNU and MSVC extensions.
>
> Extending frontc to handle C++ seems easy.

I believe I had checked FrontC, but I think it is not easy at all because in 
general you shouldn't proceed that way with C++. The two languages are 
unbelievably different! As I said, look around for commentary on writing C++ 
parsers, there are some non-trivial problems that the parser has to solve, 
and I hadn't found them fun :) [Though it was easy enough to write the lexer 
haha]

> Cil converts the representation to one
> more amenable to analysis. Hacking Cil
> to accept the new constructors is harder.
> For example it checks for redeclarations
> of functions:
>
> void f();
> void f() { printf("Hello world"); }
>
> and knows there are the same .. but of course
> C++ allows overloading.

Well, I guess Cil wouldn't be so nice for all the advanced features of C++ 
like classes, class/function templates, member templates, etc.

> > > What's needed is developers.
> >
> > Yes. However, doing this wrapper generator is considerably valuable.
>
> I agree. It would be a great boost for Ocaml I think.
> And Ocaml people helping get it to work will also be
> helping it to work for Felix ..

A parser generator that can cope with C++ adequately is most definitely a 
great boost. As I said, I lack experience with the new features of SWIG, but 
you seem to think it is inadequate. Have you tried it out with any real-world 
C++ libs, does it have serious shortcomings?

Regards,


-- 
Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr>
Comp. Sci. Dept., Bilkent University, Ankara  KDE Project: http://www.kde.org
http://www.cs.bilkent.edu.tr/~erayo  Malfunction: http://malfunct.iuma.com
GPG public key fingerprint: 360C 852F 88B0 A745 F31B  EA0F 7C07 AE16 874D 539C

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


  reply	other threads:[~2004-06-07  3:04 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-05-30  7:41 Brandon J. Van Every
2004-05-30 11:47 ` ronniec95
2004-05-30 20:17   ` Brandon J. Van Every
2004-06-05 16:45   ` Eray Ozkural
2004-06-05 19:07     ` skaller
2004-06-06  0:31       ` Eray Ozkural
2004-06-06  3:33         ` John Goerzen
2004-06-06  7:00           ` skaller
2004-06-06 16:02             ` David Fox
2004-06-06 18:44             ` Eray Ozkural
2004-06-06 20:41               ` skaller
2004-06-07  3:04                 ` Eray Ozkural [this message]
2004-06-07  7:41                   ` Benjamin Geer
2004-06-07 13:38                     ` Eray Ozkural
2004-06-07 14:18                     ` Basile Starynkevitch local
2004-06-07 14:29                       ` Eray Ozkural
2004-06-07 16:29                         ` Eray Ozkural
2004-06-07 15:46                   ` skaller
2004-06-06 16:00       ` David Fox
2004-06-05 21:39     ` Brandon J. Van Every
2004-06-06 16:18       ` David Fox
2004-06-06 18:47         ` Eray Ozkural

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200406070604.39686.exa@kablonet.com.tr \
    --to=exa@kablonet.com.tr \
    --cc=caml-list@inria.fr \
    --cc=erayo@cs.bilkent.edu.tr \
    --cc=skaller@users.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).