From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id FAA00724; Mon, 7 Jun 2004 05:04:50 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id FAA01723 for ; Mon, 7 Jun 2004 05:04:48 +0200 (MET DST) X-SPAM-Warning: Sending machine is listed in blackholes.five-ten-sg.com Received: from eposta.kablonet.com.tr ([62.248.102.66]) by nez-perce.inria.fr (8.12.10/8.12.10) with SMTP id i5734kEV017503 for ; Mon, 7 Jun 2004 05:04:47 +0200 Received: (qmail 34824 invoked by uid 1007); 7 Jun 2004 03:13:35 -0000 Received: from exa@kablonet.com.tr by eposta.kablonet.com.tr by uid 0 with qmail-scanner-1.21 (clamdscan: 0.70-rc. Clear:RC:0(81.214.24.132):. Processed in 2.729817 secs); 07 Jun 2004 03:13:35 -0000 Received: from unknown (HELO orion) (exa@kablonet.com.tr@81.214.24.132) by 0 with SMTP; 7 Jun 2004 03:13:33 -0000 From: Eray Ozkural Reply-To: erayo@cs.bilkent.edu.tr Organization: Bilkent University CS Dept. To: skaller@users.sourceforge.net Subject: Re: [Caml-list] Efficient C++ interfacing? Date: Mon, 7 Jun 2004 06:04:39 +0300 User-Agent: KMail/1.6.51 References: <40800C10000A6ABC@mk-cpfrontend-4.mail.uk.tiscali.com> <200406062144.55004.exa@kablonet.com.tr> <1086554511.16811.651.camel@pelican.wigram> In-Reply-To: <1086554511.16811.651.camel@pelican.wigram> Cc: caml-list MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200406070604.39686.exa@kablonet.com.tr> X-Miltered: at nez-perce with ID 40C3DB4E.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; eray:01 ozkural:01 caml-list:01 interfacing:01 2004:99 2004:99 44,:01 eray:01 ozkural:01 00,:99 haskell:01 lalr:01 bison:01 expr:01 'int':01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Sunday 06 June 2004 23:41, skaller wrote: > On Mon, 2004-06-07 at 04:44, Eray Ozkural wrote: > > On Sunday 06 June 2004 10:00, skaller wrote: > > > (1) Flxcc can't parse C++ yet > > > > I had started writing a modern top-down C++ parser using the > > combinatorial parser library Parsec in Haskell, but I had failed due to > > lack of time with the project. > > > > There are freely available parser models [LL(k), not LR(k)] which > > we can build upon. > > C++ *interfaces* can probably be parsed with LALR > tool like ocamlyacc. At least, I hope so! Well, the interface language isn't that easy, but yes you are basically right. g++ was still using the old bison thing; they were talking about writing a parser from scratch. I haven't followed the developments lately, though. > The needs of a wrapper generator are considerably weaker > than a compiler. For example given the construction: > > int x = expr; > > we can parse 'expr' by as a list of tokens followed by semi. > We don't care about the RHS at all, we just need to know 'x' > is an 'int'. You can skip code blocks, assignment statements, etc., but that means you have to parse them. :) My idea was that you first needed a complete C++ syntax analyzer, and then you can build whatever minimal semantic analyzer you want to. Here, we need just a (possibly) full C++ static type analyzer. I was venturing in that direction and saw it was so much harder than C, naturally. The C++ type system and *syntax* is a mess, especially with templates. At any rate, I strongly suggest that we (whoever is interested) don't try to write the parser from scratch. Just take a known, working and complete (supports ISO C++ syntax 100%) model, and translate it into concise notation... Of course it'd be great if we could have an LALR parser that we wouldn't care about later, but I have no idea how practical that would be. My impression with g++ was "ouch", for instance. And in other freely available C++ parsers, which are embedded in compilers, I was scrolling up and down 100kbs of parser code. I left the whole thing a little confused :) Even if you have the ISO standard in front of you, you'd be in a lot of trouble coding a C++ parser. It's the most stinking PL syntax ever invented. I believe my course of action had a merit: use a combinatorial parser system so that you have the entire C++ syntax with a 20-30kb code or maybe even less. All the compilers I saw were written in the horrible C or C++ languages themselves, and even browsing the syntax analyzer code was a nightmare. Another bonus you get with these modern parser libs is that it may be easier to recover from errors, etc. program = many declarationExternal See, that was easy wasn't it? I'm now looking at the Parsec parser I've written and it's not quite comprehensible. However, I've at least noted that it is adapted from PCCTS. Are there any decent combinatorial parser packages for ocaml? (Or is ocaml not abstract enough to do that? *grin*) The thing to decide is what kind of parser you want. LALR doesn't seem the right choice for me. It'd probably take an LR(1) [or even LR(k)] parser to write it in a nice way, or LL(k) which might be the more fashionable among compiler designers nowadays... (odd as it sounds) Then find a really good open source compiler which works that way, and strip its parser logic. Here are some links about parsing C++ http://www.empathy.com/pccts/ http://www.nobugs.org/developer/parsingcpp/ http://sourceforge.net/projects/opencxx/ http://www.antlr.org/ ---> Does it support C++ now? I've now recollected the history of my development. I started out with Tendra's parser, and after being frustrated with what I thought was a little hackish solution, I switched to the approach of PCCTS. Funny thing, maybe the easiest would be just write the translator code in Java using ANTLR, but I've no idea how acceptable that is to ocaml hackers :-> > Even if we can't quite parse everything, there are > ways a wrapper generator can handle that (eg just > ignore the declaration, and allow the user to fix > it by hand writing the wrapper for that function). Yes, but I'd prefer automatic operation. > > > (2) Flxcc can't target Ocaml yet > > > > This one wouldn't be hard. > > I agree, but I haven't tried it. > > > So, you do have a rudimentary C++ parser? > > I have a sophisticated C parser (frontc/Cil) > which handles GNU and MSVC extensions. > > Extending frontc to handle C++ seems easy. I believe I had checked FrontC, but I think it is not easy at all because in general you shouldn't proceed that way with C++. The two languages are unbelievably different! As I said, look around for commentary on writing C++ parsers, there are some non-trivial problems that the parser has to solve, and I hadn't found them fun :) [Though it was easy enough to write the lexer haha] > Cil converts the representation to one > more amenable to analysis. Hacking Cil > to accept the new constructors is harder. > For example it checks for redeclarations > of functions: > > void f(); > void f() { printf("Hello world"); } > > and knows there are the same .. but of course > C++ allows overloading. Well, I guess Cil wouldn't be so nice for all the advanced features of C++ like classes, class/function templates, member templates, etc. > > > What's needed is developers. > > > > Yes. However, doing this wrapper generator is considerably valuable. > > I agree. It would be a great boost for Ocaml I think. > And Ocaml people helping get it to work will also be > helping it to work for Felix .. A parser generator that can cope with C++ adequately is most definitely a great boost. As I said, I lack experience with the new features of SWIG, but you seem to think it is inadequate. Have you tried it out with any real-world C++ libs, does it have serious shortcomings? Regards, -- Eray Ozkural (exa) Comp. Sci. Dept., Bilkent University, Ankara KDE Project: http://www.kde.org http://www.cs.bilkent.edu.tr/~erayo Malfunction: http://malfunct.iuma.com GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners