Date: Fri, 31 Jan 2014 10:08:41 +0100
From: Adrien Nader
To: Ömer Sinan Ağacan
Cc: OCaml Mailing List
Message-ID: <20140131090841.GA10602@notk.org>
Subject: Re: [Caml-list] any automated FFI bindings generators?

Hi,

On Thu, Jan 30, 2014, Ömer Sinan Ağacan wrote:
> Hi all,
>
> I want to be able to use some very big C libraries from OCaml and I
> want to automate process of writing bindings as much as possible. What
> are my options for this? Do we have any tools to generate bindings?

Some time ago I started a project named "cowboy" (because it makes
bindings...) which could match what you're looking for. I've been using it
to generate lablgtk-compatible bindings to webkit-gtk and, more generally,
to glib-based libraries, since they all share the same conventions.

In a few words, it's yacfe-light, an AST-simplification layer and a custom
output.

Now, for the full explanation.

1- Yacfe-light

Yacfe-light is a parser for unpreprocessed C (and C++ and Java). This means
it can extract more programmer-level information than other parsers.

Consider these code excerpts from my /usr/include/webkit/webkitwebview.h:

    #include
    #include
    #include

This module depends on "webkitwebbackforwardlist", "webkitwebframe",
"webkitwebhistoryitem" (and a few others).

    #define WEBKIT_TYPE_WEB_VIEW (webkit_web_view_get_type())

We can also see that the short name of the module is "WEB_VIEW".

    struct _WebKitWebViewClass {
        GtkContainerClass parent_class;

        /*< public >*/
        [...]

        /* internal */
        void (* set_scroll_adjustments) (WebKitWebView *web_view, [...]

        /* Padding for future expansion */
        void (*_webkit_reserved0) (void);
        [...]
    }

struct _WebKitWebViewClass is the central object of this library. It has a
"parent_class" which is used for inheritance in glib-based libraries.
Some of the elements in that class are for internal use and some of them
are merely padding, so that the object can be extended in the future
without changing its ABI.

    #if !defined(WEBKIT_DISABLE_DEPRECATED)
    WEBKIT_API GdkPixbuf *
    webkit_web_view_get_icon_pixbuf (WebKitWebView *web_view);
    #endif

One last bit: this is an API (as seen by the "WEBKIT_API" attribute) which
is available but deprecated.

Most of these bits would go away with a regular C parser, which requires
the code to be preprocessed through 'cpp' first: there would be many more
lines of code, names and comments wouldn't be preserved, macros which mean
something to the programmer would be expanded to an unreadable form and
some lines would be dropped.

2- AST simplification layer

Yacfe-light is great. But it's a parser for quite a large language and it
tries to extract as much information as possible. It also cares about the
implementation of functions, which doesn't matter for binding generation.
That's why cowboy has a layer to strip most of it and offer something
simpler.

3- Output

The last step is to output the actual bindings. I believe that large and
mature libraries require a specific backend. They have their own API style
and it shouldn't be handled in a generic way, since that would make the
bindings much lower-level and less pleasant to use.
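
To make sections 2 and 3 a bit more concrete, here is a minimal sketch in
OCaml of what a simplified declaration type and a tiny generic backend
could look like. Everything in it is made up for illustration and is an
assumption: the "ctype" and "fundecl" types, "simplify", "emit_external",
the "ml_" stub prefix and the leading-underscore heuristic for padding are
not taken from cowboy or yacfe-light.

```ocaml
(* Hypothetical simplified view of a C declaration: only what a binding
   generator needs, with no function bodies. Not cowboy's actual types. *)
type ctype =
  | Void
  | Named of string            (* e.g. "GdkPixbuf" *)
  | Pointer of ctype

type fundecl = {
  name : string;
  ret : ctype;
  args : (string * ctype) list;
  deprecated : bool;           (* found under #if !defined(..._DISABLE_DEPRECATED) *)
}

(* Assumed heuristic: names such as _webkit_reserved0 are ABI padding. *)
let is_padding d =
  String.length d.name > 0 && d.name.[0] = '_'

(* Keep only the declarations that should end up in the bindings. *)
let simplify decls =
  List.filter (fun d -> not (is_padding d) && not d.deprecated) decls

(* Deliberately generic backend: a pointer is mapped to an abstract OCaml
   type named after the pointee. *)
let rec ocaml_type = function
  | Void -> "unit"
  | Named n -> String.uncapitalize_ascii n
  | Pointer t -> ocaml_type t

(* Print an "external" declaration pointing at an assumed ml_<name> stub. *)
let emit_external d =
  let args =
    match d.args with
    | [] -> [ "unit" ]
    | l -> List.map (fun (_, t) -> ocaml_type t) l
  in
  Printf.sprintf "external %s : %s -> %s = \"ml_%s\""
    d.name (String.concat " -> " args) (ocaml_type d.ret) d.name
```

Given the deprecated getter from the excerpt above (returning a pointer to
"GdkPixbuf" and taking a pointer to "WebKitWebView"), "emit_external" would
produce a flat signature along the lines of "webKitWebView -> gdkPixbuf",
and "simplify" would drop the declaration altogether. This is the kind of
low-level result a generic backend gives, which is the argument for
library-specific backends.
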
In practice, for my glib backend which outputs code to be used with
lablgtk, this means (output of 'wc -l *.ml'):

  128 glib.ml         -> main module which calls the others
   29 glibAnnots.ml   -> don't remember
   71 glibC.ml        -> .c file with the low-level code
   98 glibFixes.ml    -> work-arounds for inconsistencies in the C libs
  158 glibG.ml        -> g${Library}.ml file (lablgtk convention)
   68 glibGtk.ml      -> gtk${Library}.ml file (lablgtk convention)
   23 glibGtkTypes.ml -> outputs type declarations
   27 glibH.ml        -> .h file with type conversion macros (Val_foo())
  190 glibLasso.ml    -> parsing of names following the glib conventions
   14 glibMETA.ml     -> outputs a META file for use with ocamlfind
   32 glibOCaml.ml    -> translates between C and OCaml type names
   45 glibOasis.ml    -> outputs a _oasis file
   27 glibObjects.ml  -> don't remember
  240 glibProps.ml    -> outputs a ".props" file which is parsed by a
                         lablgtk tool which then outputs several files
   83 glibVar.ml      -> outputs a ".var" file which is parsed by a
                         lablgtk tool which then outputs several files

Glib and lablgtk compat are probably more than what most large libraries
would require, though, and you can get something useful in far fewer lines
than that.

The most annoying bit was definitely the names in glib-based libraries:
alternating between "WEBKIT_WEB_VIEW", "WebKitWebView",
"webkit_web_view_*", ... and telling the code whether to understand that as
"web" and "kit" or as "webkit".

I also haven't had much trouble updating to newer versions of the C
libraries and have been fairly happy to have spent some time working on the
automation.

Caveats:

I haven't been able to work on it recently and the code could most probably
be improved, but the codebase isn't huge either and it doesn't have dead
corpses in it.

Yacfe-light doesn't enjoy C++ that much. This means that a .cpp file will
make it choke even though the only functions you're interested in are C
ones.

Automation is useful for large and/or evolving libraries which have
conventions.
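
The naming problem mentioned above can be sketched as follows. This is only
an illustration in OCaml under assumptions, not the code from glibLasso.ml:
the "special" table and the functions "words_of_macro", "camel" and
"prefix" are hypothetical names.

```ocaml
(* Irregular capitalisations have to be listed by hand: nothing in the
   word "webkit" says that it is written "WebKit" in type names. *)
let special = [ "webkit", "WebKit" ]

(* "WEBKIT_WEB_VIEW" -> ["webkit"; "web"; "view"] *)
let words_of_macro s =
  List.map String.lowercase_ascii (String.split_on_char '_' s)

(* ["webkit"; "web"; "view"] -> "WebKitWebView" *)
let camel words =
  String.concat ""
    (List.map
       (fun w ->
          match List.assoc_opt w special with
          | Some c -> c
          | None -> String.capitalize_ascii w)
       words)

(* ["webkit"; "web"; "view"] -> "webkit_web_view_" *)
let prefix words = String.concat "_" words ^ "_"
```

Starting from the macro form is the easy direction, since the underscores
give the word boundaries; going from "WebKitWebView" back to words is where
the "web" + "kit" versus "webkit" ambiguity shows up. A table like
"special" presumably only stays small when the library sticks to its
conventions.
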
For instance, for glib, the headers all have the same shape, they use the
same macros and the "object" always goes in the first argument of the
function. PHP's API, on the other hand, is anything but consistent.

I think that should cover most of it. As I've said, I haven't updated
cowboy recently. I haven't had much time and I will be horribly busy until
the end of FOSDEM since I'm presenting there (which is Sunday :) ).

Regards,
Adrien Nader