Re: [Caml-list] How to write an efficient interpreter

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Gabriel Scherer <gabriel.scherer@gmail.com>
To: Diego Olivier Fernandez Pons <dofp.ocaml@gmail.com>
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] How to write an efficient interpreter
Date: Mon, 24 Oct 2011 11:58:49 +0200	[thread overview]
Message-ID: <CAPFanBEwEgmXo6+dGy8ScMbJdOdVu+SP8dzGL2Wk5s1EKq5kkA@mail.gmail.com> (raw)
In-Reply-To: <CAHqiZ-J15s9PiVnvT+rw8KF--OooFLyP8YRk6x+e31dTEGX_SQ@mail.gmail.com>

Even staying at the "interpreter" stage, you can probably improve your
performance seriously by being careful about your implementation : use
efficient data structure (what is your implementation of variable
lookup?), avoid unnecessary allocation, use tail-recursive functions
where possible, etc. Try to find a time-consuming script example that
exercise the type of computations you'll usually perform, and profile
your interpreter with gprof (compiled natively), try to optimize the
parts that show high in the profile, rinse and repeat.

The natural next step is to use a bytecode instead of interpreting an
AST directly. You can probably find something relatively simple and
yet have a net efficiency gain over direct interpretation, because the
bytecode structure is more linear and can be interpreted more
efficiently; in a way, you compiled away the tree traversal. At this
level, you become quite sensible to micro-optimisation : bytecode
interpreters written in C have access to efficiency tricks (direct
threaded code and the like, or even pinning registers directly) that
are difficult or impossible to emulate in OCaml, even if you can get
something reasonable with mutually tail-recursive functions. As a
single point of measure, I spent a few hours hacking an OCaml bytecode
interpreter for a very small subset of Caml recently, and got within
5x-10x slower than the OCaml bytecode interpreter itself for favorable
cases. It's not very good, or not very bad, depending on what you
expect (but definitely slower than the ocaml native compiler, or even
the various JIT projects).

Of course, if possible, you could also target an existing bytecode or
intermediate representation, or even compile to an existing language
with a rich enough runtime. If you're familiar, for example, with
either JVM or LLVM, those are the popular choices these days. I don't
know how that would mix with inline Javascript, though (I suppose both
environments have a javascript interpreter). You could also compile to
OCaml code or Javascript directly, and reuse their implementation, GC,
etc. With the current Javascript engines, you can get very good
performances if you compile in a way that resemble the ugly Javascript
code they're trying to optimize. For example, js_of_ocaml compiles
OCaml bytecode into Javascript and the result performs better than
using OCaml bytecode interpreter (written in C and well designed for
efficiency) directly. That's already quite a lot more work, however,
than a good interpreter or a simple home-made bytecode.

On Mon, Oct 24, 2011 at 11:10 AM, Diego Olivier Fernandez Pons
<dofp.ocaml@gmail.com> wrote:
>     Caml-list,
> I have to write an interpreter for a datatype rich purely applicative
> language. I have already written a naive interpreter (like in programming
> languages class) and was wondering what where the options for writing
> something that would perform better while keeping it maintainable by a
> single person < 5% dedicated and preferably only in core-ML (no C code or
> fancy ML extensions).
> The language could be described as a rich datastructure typed SQL with a
> programming language syntax
> - first class sets, arrays, dictionaries, lists and their corresponding
> comprehensions
> - tuples and records merged into a single concept (accessible per position
> like in (x, y) = ... or per label like in for t in tupleSet if t.label == 3
> then)
> - only applicative functions (no lambda operator, no partial application)
> - simple types are int, double and string
> - only user declared types are tuples-records
> It is mainly used for data transformation : take a list of countries,
> extract from an database the international airports of those countries,
> geolocalize them using city/location table, generate a distance table using
> a great-circle distance, assign to each size of plane the legs they can do
> based on their maximum fight range, etc.
> The language has a JavaScript inline capability
>     execute JavaScript {
>         //write your javascript code here
>     }
> that's typically used to define functions, unroll comprehensions to make
> them more efficient and to call external libraries (JavaScript has full
> visibility on all the language objects and can read/write directly inside,
> probably the existing interpreter was written in JavaScript), so I am
> considering allowing those features in the core language and only supporting
> a very slow JavaScript deprecated compatibility mode.
>
>          Diego Olivier

next prev parent reply	other threads:[~2011-10-24  9:59 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-24  9:10 Diego Olivier Fernandez Pons
2011-10-24  9:58 ` Gabriel Scherer [this message]
2011-10-24 10:57   ` Gerd Stolpmann
2011-10-24 11:28 ` Xavier Leroy
2011-10-24 11:50   ` Diego Olivier Fernandez Pons
2011-10-24 12:33     ` Jérémie Dimino
2011-10-24 12:40     ` Gerd Stolpmann
2011-10-24 12:46       ` oliver
2011-10-24 12:58         ` Gerd Stolpmann
2011-10-24 21:01           ` oliver
2011-10-26  9:27             ` Diego Olivier Fernandez Pons
2011-11-07  6:45               ` Jon Harrop

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPFanBEwEgmXo6+dGy8ScMbJdOdVu+SP8dzGL2Wk5s1EKq5kkA@mail.gmail.com \
    --to=gabriel.scherer@gmail.com \
    --cc=caml-list@inria.fr \
    --cc=dofp.ocaml@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).