From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82])
	by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p9OAvOSL004657
	for <caml-list@sympa-roc.inria.fr>; Mon, 24 Oct 2011 12:57:24 +0200
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AjQBAC9EpU7U4367kWdsb2JhbABDhHWkHCIBAQEBCQsLBxQDIoFuAQEEASNJCAIDBQsLGAICJgICITYGEwmHdwIGoX2RIYEwhXyBFASMT4xmhQeHKg
X-IronPort-AV: E=Sophos;i="4.69,397,1315173600"; 
   d="scan'208";a="125556722"
Received: from moutng.kundenserver.de ([212.227.126.187])
  by mail1-smtp-roc.national.inria.fr with ESMTP; 24 Oct 2011 12:57:18 +0200
Received: from office1.lan.sumadev.de (dslb-094-219-221-089.pools.arcor-ip.net [94.219.221.89])
	by mrelayeu.kundenserver.de (node=mreu4) with ESMTP (Nemesis)
	id 0LkWsS-1Qi7z73dvo-00aSZ2; Mon, 24 Oct 2011 12:57:18 +0200
Received: from [192.168.178.19] (546BF816.cm-12-4d.dynamic.ziggo.nl [84.107.248.22])
	by office1.lan.sumadev.de (Postfix) with ESMTPSA id 5A4B2C00C7;
	Mon, 24 Oct 2011 12:57:17 +0200 (CEST)
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: Gabriel Scherer <gabriel.scherer@gmail.com>
Cc: Diego Olivier Fernandez Pons <dofp.ocaml@gmail.com>,
        caml-list <caml-list@inria.fr>
In-Reply-To: <CAPFanBEwEgmXo6+dGy8ScMbJdOdVu+SP8dzGL2Wk5s1EKq5kkA@mail.gmail.com>
References: 
	 <CAHqiZ-J15s9PiVnvT+rw8KF--OooFLyP8YRk6x+e31dTEGX_SQ@mail.gmail.com>
	 <CAPFanBEwEgmXo6+dGy8ScMbJdOdVu+SP8dzGL2Wk5s1EKq5kkA@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Date: Mon, 24 Oct 2011 12:57:16 +0200
Message-ID: <1319453836.18639.82.camel@thinkpad>
Mime-Version: 1.0
X-Mailer: Evolution 2.28.3 
Content-Transfer-Encoding: 7bit
X-Provags-ID: V02:K0:TmE9oF3k3uwM/ZS6203rvWJSQv/zUEONm2Ql2idjmPE
 5AJJvOblpmCJRbDFaBTP3Xr1nvilQeR07HbrHTx3cSOvm5k3RM
 XsPgBS9ePpB9169MiRCQ1qUOoribXTSvdZnb3aXKKqYKU9hwLQ
 1+goWjL0fTSD0tV/sWB7FFlKNmcSUsL5Gsx0yabKEzgMYd4vno
 iAN2CR1hxIOFex8lWhX9YWsSdbyhBi1ub6RdBOiai53zGRtXly
 MoqwJ4OTGIftDrAa1YvyfuTOKbV/fks5pAru1Z+OwEPDGlLRt/
 IyXgE+eeggGGqpwCp6n+VSj3bLtFn1QKK1PuFrOsORTSUUwDG5
 jDn5r7UM0OtoUaQ2HKYrKFjGNERfskXIEAV2A51fz
Subject: Re: [Caml-list] How to write an efficient interpreter

Am Montag, den 24.10.2011, 11:58 +0200 schrieb Gabriel Scherer:
> Even staying at the "interpreter" stage, you can probably improve your
> performance seriously by being careful about your implementation : use
> efficient data structure (what is your implementation of variable
> lookup?), avoid unnecessary allocation, use tail-recursive functions
> where possible, etc. Try to find a time-consuming script example that
> exercise the type of computations you'll usually perform, and profile
> your interpreter with gprof (compiled natively), try to optimize the
> parts that show high in the profile, rinse and repeat.
> 
> The natural next step is to use a bytecode instead of interpreting an
> AST directly. You can probably find something relatively simple and
> yet have a net efficiency gain over direct interpretation, because the
> bytecode structure is more linear and can be interpreted more
> efficiently; in a way, you compiled away the tree traversal. 

Well, I wouldn't be too optimistic that this also works for an
interpreter written in Ocaml. Bytecode profits a lot from cache
locality, i.e. all bytecode instructions in a single cache line are
loaded at once from RAM. Cache lines are short, though, usually only 64
bytes. If the bytecode is an int array, this translates to 8 or 16 array
elements (64/32 bit platforms). If you stick to the int array, you
probably get the performance benefit, but you have to live with the
integer representation, which will make other parts of the interpreter
difficult (e.g. representation of values). If you use an instruction
array, where "instruction" is a variant type with and without arguments,
the performance benefit is probably completely lost, because you deviate
too much from the linear memory representation.

Also, anything in the direction of good bytecode is really complicated.
What about this alternative: It is much simpler to "JIT"-compile to
ocaml (i.e. dump your AST in a different syntax), compile the ocaml code
by invoking ocamlc or ocamlopt, and to load the compiled code
dynamically (I'm just guessing that you have a dynamic environment).
There are only a few drawbacks of this approach, namely that you need
ocamlc or ocamlopt at runtime (plus "helper files", i.e. the cmi's of
the standard library), and that you cannot delete code from RAM once it
is loaded. The latter is usually not too problematic if there is a
possibility to restart the system now and then.

Gerd

> At this
> level, you become quite sensible to micro-optimisation : bytecode
> interpreters written in C have access to efficiency tricks (direct
> threaded code and the like, or even pinning registers directly) that
> are difficult or impossible to emulate in OCaml, even if you can get
> something reasonable with mutually tail-recursive functions. As a
> single point of measure, I spent a few hours hacking an OCaml bytecode
> interpreter for a very small subset of Caml recently, and got within
> 5x-10x slower than the OCaml bytecode interpreter itself for favorable
> cases. It's not very good, or not very bad, depending on what you
> expect (but definitely slower than the ocaml native compiler, or even
> the various JIT projects).
> 
> Of course, if possible, you could also target an existing bytecode or
> intermediate representation, or even compile to an existing language
> with a rich enough runtime. If you're familiar, for example, with
> either JVM or LLVM, those are the popular choices these days. I don't
> know how that would mix with inline Javascript, though (I suppose both
> environments have a javascript interpreter). You could also compile to
> OCaml code or Javascript directly, and reuse their implementation, GC,
> etc. With the current Javascript engines, you can get very good
> performances if you compile in a way that resemble the ugly Javascript
> code they're trying to optimize. For example, js_of_ocaml compiles
> OCaml bytecode into Javascript and the result performs better than
> using OCaml bytecode interpreter (written in C and well designed for
> efficiency) directly. That's already quite a lot more work, however,
> than a good interpreter or a simple home-made bytecode.
> 
> On Mon, Oct 24, 2011 at 11:10 AM, Diego Olivier Fernandez Pons
> <dofp.ocaml@gmail.com> wrote:
> >     Caml-list,
> > I have to write an interpreter for a datatype rich purely applicative
> > language. I have already written a naive interpreter (like in programming
> > languages class) and was wondering what where the options for writing
> > something that would perform better while keeping it maintainable by a
> > single person < 5% dedicated and preferably only in core-ML (no C code or
> > fancy ML extensions).
> > The language could be described as a rich datastructure typed SQL with a
> > programming language syntax
> > - first class sets, arrays, dictionaries, lists and their corresponding
> > comprehensions
> > - tuples and records merged into a single concept (accessible per position
> > like in (x, y) = ... or per label like in for t in tupleSet if t.label == 3
> > then)
> > - only applicative functions (no lambda operator, no partial application)
> > - simple types are int, double and string
> > - only user declared types are tuples-records
> > It is mainly used for data transformation : take a list of countries,
> > extract from an database the international airports of those countries,
> > geolocalize them using city/location table, generate a distance table using
> > a great-circle distance, assign to each size of plane the legs they can do
> > based on their maximum fight range, etc.
> > The language has a JavaScript inline capability
> >     execute JavaScript {
> >         //write your javascript code here
> >     }
> > that's typically used to define functions, unroll comprehensions to make
> > them more efficient and to call external libraries (JavaScript has full
> > visibility on all the language objects and can read/write directly inside,
> > probably the existing interpreter was written in JavaScript), so I am
> > considering allowing those features in the core language and only supporting
> > a very slow JavaScript deprecated compatibility mode.
> >
> >          Diego Olivier
> 
> 

-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.
------------------------------------------------------------