caml-list - the Caml user's mailing list
From: Adrien Nader <adrien@notk.org>
To: "Ömer Sinan Ağacan" <omeragacan@gmail.com>
Cc: OCaml Mailing List <caml-list@inria.fr>
Subject: Re: [Caml-list] any automated FFI bindings generators?
Date: Fri, 31 Jan 2014 10:08:41 +0100
Message-ID: <20140131090841.GA10602@notk.org>
In-Reply-To: <CAMQQO3nfqhpL+tq+tTiHaDirDL+ayP1ucodN4f3amnT5+4u+gA@mail.gmail.com>

Hi,

On Thu, Jan 30, 2014, Ömer Sinan Ağacan wrote:
> Hi all,
> 
> I want to be able to use some very big C libraries from OCaml and I
> want to automate process of writing bindings as much as possible. What
> are my options for this? Do we have any tools to generate bindings?

Some time ago I started a project named "cowboy" (because it makes
bindings...) which could match what you're looking for. I've been using
it to generate lablgtk-compatible bindings to webkit-gtk and more
generally glib-based libraries since they all share the same
conventions.

In a few words, it's yacfe-light, an AST-simplification layer and a
custom output.

Now, for the full explanation.

1- Yacfe-light

Yacfe-light is a parser for unpreprocessed C (and C++ and Java). This
means it can extract more programmer-level information than other
parsers. Consider these code excerpts from my
/usr/include/webkit/webkitwebview.h:

  #include <webkit/webkitwebbackforwardlist.h>
  #include <webkit/webkitwebframe.h>
  #include <webkit/webkitwebhistoryitem.h>

This module depends on "webkitwebbackforwardlist", "webkitwebframe",
"webkitwebhistoryitem" (and a few others).

  #define WEBKIT_TYPE_WEB_VIEW (webkit_web_view_get_type())

We can also see that the short name of the module is "WEB_VIEW".

  struct _WebKitWebViewClass {
      GtkContainerClass parent_class;

      /*< public >*/
      [...]

      /* internal */
      void                       (* set_scroll_adjustments) (WebKitWebView        *web_view,
      [...]

      /* Padding for future expansion */
      void (*_webkit_reserved0) (void);
      [...]
  }

struct _WebKitWebViewClass is the central object of this library. It has
a "parent_class" which is used for inheritance in glib-based libraries.
Some of the elements in that class are for internal use and some of
them are merely padding for future extension of the object without
changing its ABI.
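
The two mechanisms mentioned here (a parent struct embedded as the first
member, and reserved slots used as padding) can be sketched in plain C.
The types below are simplified stand-ins invented for this illustration;
they are not the real GLib, GTK or WebKit definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a parent class such as GtkContainerClass. */
typedef struct {
    int type_id;
} ParentClass;

/* First release: one public slot and two reserved slots. */
typedef struct {
    ParentClass parent_class;   /* must be the first member */
    void (*load)(void);
    void (*_reserved0)(void);   /* padding for future expansion */
    void (*_reserved1)(void);
} ChildClassV1;

/* Later release: "reload" takes over the slot of _reserved0, so the
 * size of the struct and the offsets of the other members do not move. */
typedef struct {
    ParentClass parent_class;
    void (*load)(void);
    void (*reload)(void);
    void (*_reserved1)(void);
} ChildClassV2;

/* A pointer to a struct may be converted to a pointer to its first
 * member, which is what makes the "inheritance" cast legal C. */
ParentClass *as_parent(ChildClassV1 *klass)
{
    assert(klass != NULL);
    return (ParentClass *)klass;
}
```

For a binding generator the practical consequence is that the parent
member gives the inheritance relation, while the internal and reserved
members should not end up in the generated API.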

  #if !defined(WEBKIT_DISABLE_DEPRECATED)
  WEBKIT_API GdkPixbuf *
  webkit_web_view_get_icon_pixbuf (WebKitWebView *web_view);
  #endif

One last bit: this function is part of the public API (as seen by the
"WEBKIT_API" attribute) and is still available but deprecated, since it
is only declared when WEBKIT_DISABLE_DEPRECATED is not defined.

Most of these bits would go away with a regular C parser which requires
the code to be preprocessed through 'cpp' first: there would be many
more lines of code, names and comments wouldn't be preserved, macros
which mean something to the programmer would be expanded to an
unreadable form and some lines would be dropped.
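
As a rough illustration of the information that only exists before
preprocessing, here is a small line-based sketch in C. It is not how
yacfe-light works (yacfe-light is a real parser); the two helper
functions and their names are made up for this example. The first one
recovers a dependency name from an #include line, the second one
recovers the short module name from a *_TYPE_* macro:

```c
#include <assert.h>
#include <string.h>

/* Skip spaces and tabs. */
static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Hypothetical helper: for `#include <dir/name.h>`, copy "name" into
 * `out` and return 1; return 0 for any other line. */
int include_module_name(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL && out_size > 0);
    const char *p = skip_blanks(line);
    if (strncmp(p, "#include", 8) != 0)
        return 0;
    p = skip_blanks(p + 8);
    if (*p != '<')
        return 0;
    p++;
    const char *end = strchr(p, '>');
    if (end == NULL)
        return 0;
    /* Keep only what follows the last '/' inside the brackets. */
    const char *name = p;
    for (const char *q = p; q < end; q++)
        if (*q == '/')
            name = q + 1;
    size_t len = (size_t)(end - name);
    /* Drop a trailing ".h". */
    if (len >= 2 && name[len - 2] == '.' && name[len - 1] == 'h')
        len -= 2;
    if (len == 0 || len >= out_size)
        return 0;
    memcpy(out, name, len);
    out[len] = '\0';
    return 1;
}

/* Hypothetical helper: for `#define FOO_TYPE_BAR_BAZ ...`, copy
 * "BAR_BAZ" into `out` and return 1; return 0 for any other line. */
int type_macro_short_name(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL && out_size > 0);
    const char *p = skip_blanks(line);
    if (strncmp(p, "#define", 7) != 0 || (p[7] != ' ' && p[7] != '\t'))
        return 0;
    p = skip_blanks(p + 7);
    /* Length of the macro name (upper-case identifier). */
    size_t id_len = strspn(p, "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_");
    const char *marker = strstr(p, "_TYPE_");
    if (marker == NULL || marker + 6 >= p + id_len)
        return 0;
    size_t len = (size_t)((p + id_len) - (marker + 6));
    if (len >= out_size)
        return 0;
    memcpy(out, marker + 6, len);
    out[len] = '\0';
    return 1;
}
```

Given a buffer `char buf[64]`, calling
`include_module_name("#include <webkit/webkitwebframe.h>", buf, sizeof buf)`
fills `buf` with "webkitwebframe", and `type_macro_short_name` on the
WEBKIT_TYPE_WEB_VIEW line above fills it with "WEB_VIEW". Both lines are
gone or expanded once 'cpp' has run.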

2- AST simplification layer
Yacfe-light is great. But it's a parser for quite a large language and
it tries to extract as much information as possible. It also cares about
the implementation of functions while this doesn't matter for binding
generation.

That's why cowboy has a layer to strip most of it and offer something
simpler.

3- Output
The last step is to output the actual bindings. I believe that large and
mature libraries require a specific backend. They have their own
API-style and it shouldn't be handled in a generic way since it would
make the bindings much lower-level and less pleasant to use.

In practice, for my glib backend which outputs code to be used with
lablgtk, this means (output of 'wc -l *.ml'):
  128 glib.ml -> main module which calls the others
   29 glibAnnots.ml -> don't remember
   71 glibC.ml -> .c file with the low-level code
   98 glibFixes.ml -> work-arounds for inconsistencies in the C libs
  158 glibG.ml -> g${Library}.ml file (lablgtk convention)
   68 glibGtk.ml -> gtk${Library}.ml file (lablgtk convention)
   23 glibGtkTypes.ml -> outputs type declarations
   27 glibH.ml -> .h file with type conversion macros (Val_foo())
  190 glibLasso.ml -> parsing of names following the glib conventions
   14 glibMETA.ml -> outputs a META file for use with ocamlfind
   32 glibOCaml.ml -> translate between C and ocaml type names
   45 glibOasis.ml -> outputs a _oasis file
   27 glibObjects.ml -> don't remember
  240 glibProps.ml -> outputs a ".props" file which is parsed by a
                      lablgtk tool which then outputs several files
   83 glibVar.ml -> outputs a ".var" file which is parsed by a
                    lablgtk tool which then outputs several files

Glib and lablgtk compatibility probably require more than most large
libraries would, though, and you can get something useful in far fewer
lines than that. The most annoying bit was definitely the names in
glib-based libraries: they alternate between "WEBKIT_WEB_VIEW",
"WebKitWebView", "webkit_web_view_*", ... and the code has to be told
whether to read the prefix as "web" and "kit" or as "webkit".
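
The naming problem can be shown with a minimal sketch in C. This is not
the code from glibLasso.ml; the function name and the table of
namespaces are assumptions made for the example. Splitting a CamelCase
name at every upper-case letter gives the wrong function prefix unless
the namespace is known to be a single word:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical table of namespaces that must be kept as one word. */
static const char *const known_namespaces[] = { "WebKit", "Gtk", "Gdk" };

/* Convert "WebKitWebView" to snake_case in `out`. When `use_namespaces`
 * is non-zero, a leading known namespace is lower-cased as one word.
 * Returns 1 on success, 0 if `out` is too small. */
int camel_to_snake(const char *camel, char *out, size_t out_size,
                   int use_namespaces)
{
    assert(camel != NULL && out != NULL && out_size > 0);
    size_t n = 0; /* characters written to out */
    size_t i = 0; /* characters consumed from camel */

    if (use_namespaces) {
        size_t count = sizeof known_namespaces / sizeof known_namespaces[0];
        for (size_t k = 0; k < count; k++) {
            size_t len = strlen(known_namespaces[k]);
            if (strncmp(camel, known_namespaces[k], len) == 0) {
                if (len >= out_size)
                    return 0;
                for (; i < len; i++)
                    out[n++] = (char)tolower((unsigned char)camel[i]);
                break;
            }
        }
    }

    for (; camel[i] != '\0'; i++) {
        unsigned char c = (unsigned char)camel[i];
        /* Start a new word at each upper-case letter except the first. */
        if (isupper(c) && n > 0) {
            if (n + 1 >= out_size)
                return 0;
            out[n++] = '_';
        }
        if (n + 1 >= out_size)
            return 0;
        out[n++] = (char)tolower(c);
    }
    out[n] = '\0';
    return 1;
}
```

With `use_namespaces` set to 0, "WebKitWebView" becomes
"web_kit_web_view", which matches no function in the library; with it
set to 1 the result is "webkit_web_view", the actual prefix of the C
functions.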

I also haven't had much trouble updating to newer versions of the
C libraries and have been fairly happy to have spent some time working
on the automation.


Caveats:
I haven't been able to work on it recently and the code could most
probably be improved, but the codebase isn't huge either and it
doesn't have dead corpses in it.

Yacfe-light doesn't enjoy C++ that much. This means that a .cpp file
will make it choke even though the only functions you're interested in
are C ones.

Automation is useful for large and/or evolving libraries which have
conventions. For instance, for glib, the headers all have the same
shapes, they use the same macros and the "object" always goes in the
first argument of the function. PHP's API on the other hand is
anything but consistent.


I think that should cover most of it. As I've said, I haven't updated
cowboy recently. I haven't had much time and I will be horribly busy
until the end of FOSDEM since I'm presenting there (which is Sunday :) ).


Regards,
Adrien Nader

