High-performance bytecode interpreter in OCaml

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* High-performance bytecode interpreter in OCaml
@ 2007-08-15 11:49 Joel Reymont
  2007-08-15 13:01 ` [Caml-list] " Jon Harrop
  0 siblings, 1 reply; 8+ messages in thread
From: Joel Reymont @ 2007-08-15 11:49 UTC (permalink / raw)
  To: Caml List

Folks,

I would like to write a high-performance VM in OCaml. I understand  
that OCaml itself uses either a threaded interpreter or a switch- 
style one.

What's the most efficient way to write a bytecode interpreter in  
OCaml itself, though?

Would CPS style be inlined if I were to write a threaded interpreter  
this way?

	Thanks in advance, Joel

--
http://wagerlabs.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] High-performance bytecode interpreter in OCaml
  2007-08-15 11:49 High-performance bytecode interpreter in OCaml Joel Reymont
@ 2007-08-15 13:01 ` Jon Harrop
  2007-08-15 13:20   ` Joel Reymont
  0 siblings, 1 reply; 8+ messages in thread
From: Jon Harrop @ 2007-08-15 13:01 UTC (permalink / raw)
  To: caml-list

On Wednesday 15 August 2007 12:49:31 Joel Reymont wrote:
> Folks,
>
> I would like to write a high-performance VM in OCaml. I understand
> that OCaml itself uses either a threaded interpreter or a switch-
> style one.

The performance of interpreters is heavily dependent upon what exactly you're 
evaluating (both language and program properties). If you start with a naive 
term-level interpreter or rewriter then you can get an order of magnitude in 
performance by optimizing the interpreter without leaving OCaml or moving to 
(real) bytecode compilation.

> What's the most efficient way to write a bytecode interpreter in
> OCaml itself, though?

I would start with a simple term-level interpreter, optimize that by avoiding 
lookups as much as possible. Then maybe switch to arrays of instructions. 
Finally, maybe something that generated OCaml bytecode and dynamically loaded 
it. However, you get diminishing returns and the latter is difficult and 
might well be no faster.

The main advantage of a real bytecode is locality: the instructions and data 
are stored together in a contiguous array. This is ideally suited to a simple 
C interpreter. A bytecode in OCaml would be a string and, IMHO, OCaml is not 
very efficient at string munging.

> Would CPS style be inlined if I were to write a threaded interpreter
> this way?

Quite a bit can be usefully inlined if you crank up the command line argument 
and write in a certain style. However, this is of limited use in interpreters 
of general purpose languages as their evaluation requires loops and so forth. 
Also, increasing global inlining often degrades the performance of symbolic 
code.

Depending upon your requirements, my recommendations are probably:

1. Optimize your term-level interpreter first and then consider compiling to a 
lower-level instruction array.

2. Don't bother compiling to OCaml bytecode.

3. Consider compiling to native code or doing something more sophisticated 
with your interpreter, like gathering statistics on the most common sequences 
of instructions and making the interpreter optimize itself.

If you send me the source I'll have a look over it and tell you any obvious 
optimizations I can think of.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] High-performance bytecode interpreter in OCaml
  2007-08-15 13:01 ` [Caml-list] " Jon Harrop
@ 2007-08-15 13:20   ` Joel Reymont
  2007-08-15 15:18     ` Jon Harrop
  2007-08-15 23:31     ` Grant Olson
  0 siblings, 2 replies; 8+ messages in thread
From: Joel Reymont @ 2007-08-15 13:20 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

I have an existing bytecode file that I need to execute. The bytecode  
is produced by some other compiler. Does this change anything?

I'm not sure if a term-level interpreter or rewriter applies in this  
scenario.

As for CPS, what I meant is implementing each bytecode instruction as  
a function that takes a continuation (next instruction?).

	Thanks, Joel

On Aug 15, 2007, at 2:01 PM, Jon Harrop wrote:

> The performance of interpreters is heavily dependent upon what  
> exactly you're
> evaluating (both language and program properties). If you start  
> with a naive
> term-level interpreter or rewriter then you can get an order of  
> magnitude in
> performance by optimizing the interpreter without leaving OCaml or  
> moving to
> (real) bytecode compilation.

--
http://wagerlabs.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] High-performance bytecode interpreter in OCaml
  2007-08-15 13:20   ` Joel Reymont
@ 2007-08-15 15:18     ` Jon Harrop
  2007-08-15 23:26       ` Joel Reymont
  2007-08-15 23:31     ` Grant Olson
  1 sibling, 1 reply; 8+ messages in thread
From: Jon Harrop @ 2007-08-15 15:18 UTC (permalink / raw)
  To: Joel Reymont; +Cc: caml-list

On Wednesday 15 August 2007 14:20:34 Joel wrote:
> I have an existing bytecode file that I need to execute. The bytecode
> is produced by some other compiler. Does this change anything?

Yes, that rules out most of the solutions I proposed. :-)

> I'm not sure if a term-level interpreter or rewriter applies in this
> scenario.

Indeed. If string operations turn out to be slow in OCaml and the CPS way of 
doing things is too obfuscated, you might try translating the byte array into 
an instruction array with a more OCaml-friendly notion of instruction (i.e. a 
variant type!). This would be particularly preferable if you think there are 
any simplifying transformations that you can do to reduce your workload.

> As for CPS, what I meant is implementing each bytecode instruction as
> a function that takes a continuation (next instruction?).

That should certainly work. Make sure you hoist out as much computation as 
possible from the continuation closures that you generate.

For example, make sure you change this:

  let rec compile bytecode = match bytecode with
    | 0x83::x::t ->
        (fun k ->
           print_string (String.lowercase (string_of_int x));
           k(compile t)) 

into this:

  let rec compile bytecode = match bytecode with
    | 0x83::x::t ->
        let x = String.lowercase (string_of_int x) in
        let t = compile t in
        (fun k -> print_string x; k t) 

This allows you to precompute as much as possible while you build up the 
continuation that will evaluate the program.

In particular, make sure you hoist out any lookups, such as jump labels.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] High-performance bytecode interpreter in OCaml
  2007-08-15 15:18     ` Jon Harrop
@ 2007-08-15 23:26       ` Joel Reymont
  2007-08-16  3:58         ` Jon Harrop
  0 siblings, 1 reply; 8+ messages in thread
From: Joel Reymont @ 2007-08-15 23:26 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Aug 15, 2007, at 4:18 PM, Jon Harrop wrote:

>   let rec compile bytecode = match bytecode with
>     | 0x83::x::t ->
>         let x = String.lowercase (string_of_int x) in
>         let t = compile t in
>         (fun k -> print_string x; k t)

I'm curious, would this be any faster than doing whatever the 0x83  
opcode requires and then just invoking (compile t)?

Does it actually make sense to convert the whole bytecode file into a  
chain of closures and then execute it repeatedly? Would there be a  
huge gain compared to interpreting every time?

I guess it would make sense if closures could be compiled into  
machine code "just in time" or if my VM was compiled using ocamlopt.  
I'm not sure, though, so I'm looking for input.

	Thanks, Joel

--
http://wagerlabs.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [Caml-list] High-performance bytecode interpreter in OCaml
  2007-08-15 13:20   ` Joel Reymont
  2007-08-15 15:18     ` Jon Harrop
@ 2007-08-15 23:31     ` Grant Olson
  2007-08-16 15:55       ` Jon Harrop
  1 sibling, 1 reply; 8+ messages in thread
From: Grant Olson @ 2007-08-15 23:31 UTC (permalink / raw)
  To: caml-list


> 
> As for CPS, what I meant is implementing each bytecode 
> instruction as a function that takes a continuation (next 
> instruction?).
> 
> 	Thanks, Joel
> 

Once again I made the mistake of following up on google groups! Anyway,
nothing good was on tv so I wrote up a better example.

I think you'd make things too complicated with CPS.  Why not just create a
dispatcher function and opcode instructions that are all tail recursive?
Without knowing the internals, I don't see how it could be that much slower
than CPS and is much easier for us humans to read.  Anyway, here's a trivial
vm:

(*

Stupid vm.  It is initailized with an instruction counter of zero
  and has one register that is initially zero.  There are four instructions:

    'i' - increment the register
    'd' - decrement the register
    'p' - print the value of the register
    'r' - reset the instruction counter to zero

*)

type vm = {current_inst:int;register:int;bytecode:string}

let create_vm b =
  {current_inst=0;register=0;bytecode=b}

let inc_inst v =
  {v with current_inst=v.current_inst+1}

let update_reg v i =
  {v with register=v.register+i}

let update_reg_and_inc v i =
  let new_v = update_reg v i in
    inc_inst new_v

let rec dispatch v =
  let opcode=v.bytecode.[v.current_inst] in
    begin
      match opcode with
        'i' -> inc v
      | 'd' -> dec v
      | 'p' -> print v
      | 'r' -> reset v
    end
and inc v =
  let new_v = update_reg_and_inc v 1 in
    dispatch new_v
and dec v =
  let new_v = update_reg_and_inc v 1 in
    dispatch new_v
and print v =
  Printf.printf "%i\n" v.register;
  dispatch (inc_inst v)
and reset v =
  dispatch {v with current_inst=0}

    
let vm_instance = create_vm "piiipdpdddddddpiiiprpididi"    
            
let _ = dispatch vm_instance  
  
  


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] High-performance bytecode interpreter in OCaml
  2007-08-15 23:26       ` Joel Reymont
@ 2007-08-16  3:58         ` Jon Harrop
  0 siblings, 0 replies; 8+ messages in thread
From: Jon Harrop @ 2007-08-16  3:58 UTC (permalink / raw)
  To: Joel Reymont; +Cc: caml-list

On Thursday 16 August 2007 00:26:49 Joel Reymont wrote:
> On Aug 15, 2007, at 4:18 PM, Jon Harrop wrote:
> >   let rec compile bytecode = match bytecode with
> >
> >     | 0x83::x::t ->
> >
> >         let x = String.lowercase (string_of_int x) in
> >         let t = compile t in
> >         (fun k -> print_string x; k t)
>
> I'm curious, would this be any faster than doing whatever the 0x83
> opcode requires and then just invoking (compile t)?
>
> Does it actually make sense to convert the whole bytecode file into a
> chain of closures and then execute it repeatedly? Would there be a
> huge gain compared to interpreting every time?
>
> I guess it would make sense if closures could be compiled into
> machine code "just in time" or if my VM was compiled using ocamlopt.
> I'm not sure, though, so I'm looking for input.

Yes, this is not compilation, it just hoists code from the inner loop (such as 
String.lowercase and string_of_int in that case) and replaces dynamic 
dispatch with static dispatch. The former is usually the more important of 
the two optimizations, particularly if you can hoist lookups and replace them 
with single indirections.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] High-performance bytecode interpreter in OCaml
  2007-08-15 23:31     ` Grant Olson
@ 2007-08-16 15:55       ` Jon Harrop
  0 siblings, 0 replies; 8+ messages in thread
From: Jon Harrop @ 2007-08-16 15:55 UTC (permalink / raw)
  To: caml-list

On Thursday 16 August 2007 00:31:51 Grant Olson wrote:
> I think you'd make things too complicated with CPS.  Why not just create a
> dispatcher function and opcode instructions that are all tail recursive?

Just spent this morning quantifying this out of interest. The BF language is a 
good place to start:

  http://www.muppetlabs.com/~breadbox/bf/

You can write an interpreter for BF in a small amount of code and learn how to 
optimize it without too much difficulty.

I'm going to present three different OCaml programs. I'll start by comparing 
their performance before presenting and discussing the code. I've benchmarked 
them all evaluating a BF program that solves the Towers of Hanoi (including 
ASCII art!) from that web page:

OCaml 1: 186s  41LOC
C:        79s  77LOC
OCaml 2:  59s  42LOC
OCaml 3:  29s  59LOC
OCaml 3:  25s  47LOC

Note that most of the OCaml implementations are substantially shorter and 
faster than the C.

Here is a simple OCaml1 interpreter of the BF language:

open Printf
let data = String.make 1000000 '\000'
let edit f p =
  let c = f (int_of_char data.[p]) land 0xff in
  data.[p] <- char_of_int c
let try_input_line ch = try Some(input_line ch) with End_of_file -> None
let rec read ch = match try_input_line ch with
  | Some line -> line :: read ch
  | None -> []
let code = match Sys.argv with
  | [|_; file|] ->
      let ch = open_in file in
      let code = read ch in
      close_in ch;
      code
  | _ -> read stdin
let code = String.concat "" code
let rec run p pc stack =
  if pc < 0 || pc >= String.length code then exit 0;
  match code.[pc], stack with
   | '>', _ -> run (p + 1) (pc + 1) stack
   | '<', _ -> run (p - 1) (pc + 1) stack
   | '+', _ -> edit ((+) 1) p; run p (pc + 1) stack
   | '-', _ -> edit (fun n -> n-1) p; run p (pc + 1) stack
   | '.', _ -> printf "%c%!" data.[p]; run p (pc + 1) stack
   | ',', _ -> data.[p] <- input_char stdin; run p (pc + 1) stack
   | '[', _ when data.[p] = '\000' -> ffwd 1 p (pc + 1) stack
   | '[', _ -> run p (pc + 1) (pc :: stack)
   | ']', _::stack when data.[p] = '\000' -> run p (pc + 1) stack
   | ']', pc::_ -> run p (pc + 1) stack
   | _ -> run p (pc + 1) stack
and ffwd lev p pc stack =
  match code.[pc], lev with
  | ']', 1 -> run p (pc + 1) stack
  | ']', lev -> ffwd (lev - 1) p (pc + 1) stack
  | '[', _ -> ffwd (lev + 1) p (pc + 1) stack
  | _ -> ffwd lev p (pc + 1) stack
let () = run 0 0 []

As you can see, this program dispatches over the characters directly from the 
bytecode itself. No attempt is made to make the bytecode into a friendlier 
data structure for OCaml. This implementation is over 2x slower than the 
naive C code for two main reasons:

1. Reading and writing characters is slow in OCaml.

2. Pattern matching over characters is slow in OCaml.

If you look at some typical BF source code you'll notice that it is often 
quite repetitive:

>>>>>>>>>>>>>>>>>>>>>>>>>[-]>[-]+++++++++++++++++++++++++++.++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++.-------------------.-------
--------------------------------------.+++++++++++++++++++++++++++++++++++++

This is easily exploited by run-length encoding the bytecode as it is loaded. 
The second interpreter does exactly this, converting the BF source code into 
an intermediate representation that is denser:

open Printf
let data = String.make 1000000 '\000'
let try_input_line ch = try Some(input_line ch) with End_of_file -> None
let rec read ch = match try_input_line ch with
  | Some line -> line :: read ch
  | None -> []
let code = match Sys.argv with
  | [|_; file|] ->
      let ch = open_in file in
      let code = read ch in
      close_in ch;
      code
  | _ -> read stdin
let code = String.concat "" code
let rec compile pc t =
  if pc >= String.length code then pc, List.rev t else
  match code.[pc], t with
  | '>', `Move n::t -> compile (pc+1) (`Move (n + 1)::t)
  | '<', `Move n::t -> compile (pc+1) (`Move (n - 1)::t)
  | ('>' | '<'), t -> compile pc (`Move 0::t)
  | '+', `Add n::t -> compile (pc+1) (`Add (n + 1)::t)
  | '-', `Add n::t -> compile (pc+1) (`Add (n - 1)::t)
  | ('+' | '-'), t -> compile pc (`Add 0::t)
  | '.', t -> compile (pc+1) (`Out::t)
  | ',', t -> compile (pc+1) (`In::t)
  | '[', t ->
      let pc, t' = compile (pc+1) [] in
      compile (pc+1) (`Loop t'::t)
  | ']', t -> pc, List.rev t
  | _, t -> compile (pc+1) t
let rec run p = function
  | `Move dp::t -> run (p + dp) t
  | `Add dn::t -> data.[p] <- char_of_int(int_of_char data.[p] + dn); run p t
  | `Out::t -> print_char data.[p]; run p t
  | `In::t -> data.[p] <- input_char stdin; run p t
  | `Loop t'::t ->
      let rec aux p = if data.[p] = '\000' then p else aux (run p t') in
      run (aux p) t
  | [] -> p;;
let _ =
  let _, t = compile 0 [] in
  run 0 t

Sequences such as +++++ or >>>>> are now amortized into `Add 5 and `Move 5, 
respectively. Note that this often less memory efficient but OCaml is much 
faster at handling variant types. The resulting data structure looks like:

 [`Move 121; `Loop [`Add -1]; `Move 41; `Loop [`Add -1]; `Move 41;
  `Loop [`Add -1]; `Move 41; `Loop [`Add -1]; `Move 1; `Loop [`Add -1];
  `Add 27; `Out; `Add 64; `Out; `Add -19; `Out; `Add -45; `Out; `Add 64;
  `Out; `Add -41; `Out; `Add 24; `Out; `Loop [`Add -1]; `Add 27; `Out;
  `Add 64; `Out; `Add -41; `Out; `Add 9; `Out; `Add -9; `Out; `Add 5; `Out;
  `Add 17; `Out; `Add 12; `Out; `Add 27; `Out; `Add 8; `Out; `Add -18; `Out;
  `Add 13; `Out; `Add 1; `Out; `Add -83; `Out; `Add 79; `Out; `Add -9; `Out;
  `Add -70; `Out; `Add 40; `Out; `Add 25; `Out; `Add 13; `Out; `Add 1; `Out;
  `Add -6; `Out; `Add -73; `Out; `Add 73; `Out; `Add 5; `Out; `Add -78; `Out;
  `Add 34; `Out; `Add 48; `Out; `Add -17; `Out; `Add 8; `Out; `Add 5; `Out;
  `Add -8; `Out; `Add -60; `Out; `Add 57; `Out; `Add 8; `Out;
  `Loop [`Add -1]; `Add 27; `Out; `Add 64; `Out; `Add -40; `Out; `Add 8;
  `Out; `Add -10; `Out; `Add 4; `Out; `Add 19; `Out; `Add 15; `Out; `Add 27;
  `Out; `Add -9; `Out; `Add 11; `Out; `Out; `Add -15; `Out; `Add 9; `Out;
  `Add -78; `Out; `Add 66; `Out; `Add 23; `Out; `Add -89; `Out; `Add 35;
  `Out; `Add 41; `Out; `Add -3; `Out; `Add -3; `Out; `Out; `Add 9; `Out;
  `Add 3; `Out; `Add -14; `Out; `Add -68; `Out; `Add 55; `Out; `Add 24; `Out;
  `Add -3; `Out; `Add -6; `Out; `Add -70; `Out; `Add 28; `Out; `Add 44; `Out;
  `Add 12; `Out; `Out; `Add -4; `Out; `Add -54; `Out; `Add -11; `Out; `Out;
  `Add 72; `Out; `Out; `Out; `Add -73; `Out; `Add 53; `Out; `Add 9; `Out;
  `Add -3; `Out; `Add -3; `Out; `Out; ...])

This simple optimization is enough to beat C, even though the C implementation 
goes to some lengths to run-length encode code loops (which can only buy you 
a factor of 2). This is faster for two main reasons:

1. O(n) loops have been reduced to O(1) operations.

2. Pattern matching dispatches over sum types much more efficiently than over 
characters or strings.

In order to draw a justifiable conclusion about the relative performance of 
dynamic dispatch and static continuations, we should put more effort into 
optimizing this code. As OCaml is slow at handling characters, it might be 
worth sacrificing 8x the memory consumption for an int array rather than a 
string to represent the scratch space "data". We can also use an ordinary 
variant type (these are often dispatch over more efficiently) and flatten the 
data structure to avoid using lists of variants by placing the tail in each 
constructor:

open Printf
let data = Array.make 1000000 0
let try_input_line ch = try Some(input_line ch) with End_of_file -> None
let rec read ch = match try_input_line ch with
  | Some line -> line :: read ch
  | None -> []
let code = match Sys.argv with
  | [|_; file|] ->
      let ch = open_in file in
      let code = read ch in
      close_in ch;
      code
  | _ -> read stdin
let code = String.concat "" code
type t =
  | Move of int * t
  | Add of int * t
  | Out of t
  | In of t
  | Loop of t * t
  | Nil
let rec rev_append x = function
  | Move(n, t) -> rev_append (Move(n, x)) t
  | Add(n, t) -> rev_append (Add(n, x)) t
  | Out t -> rev_append (Out x) t
  | In t -> rev_append (In x) t
  | Loop(t', t) -> rev_append (Loop(t', x)) t
  | Nil -> x
let rev = rev_append Nil
let rec compile pc t =
  if pc >= String.length code then pc, rev t else
  match code.[pc], t with
  | '>', Move(n, t) -> compile (pc+1) (Move(n + 1, t))
  | '<', Move(n, t) -> compile (pc+1) (Move(n - 1, t))
  | ('>' | '<'), t -> compile pc (Move(0, t))
  | '+', Add(n, t) -> compile (pc+1) (Add(n + 1, t))
  | '-', Add(n, t) -> compile (pc+1) (Add(n - 1, t))
  | ('+' | '-'), t -> compile pc (Add(0, t))
  | '.', t -> compile (pc+1) (Out t)
  | ',', t -> compile (pc+1) (In t)
  | '[', t ->
      let pc, t' = compile (pc+1) Nil in
      compile (pc+1) (Loop(t', t))
  | ']', t -> pc, rev t
  | _, t -> compile (pc+1) t
let add dn p =
  data.(p) <- data.(p) + dn
let rec run p = function
  | Move(dp, t) -> run (p + dp) t
  | Add(dn, t) -> add dn p; run p t
  | Out t -> printf "%c%!" (char_of_int data.(p)); run p t
  | In t -> data.(p) <- int_of_char(input_char stdin); run p t
  | Loop(t', t) ->
      let rec aux p = if data.(p) = 0 then p else aux (run p t') in
      run (aux p) t
  | Nil -> p;;
let _ =
  let _, t = compile 0 Nil in
  run 0 t

Finally, we can use continuations to avoid all dynamic dispatch by building a 
tree of closures that capture the closure they will invoke in their 
environment. The following uses this approach and is my most efficient so 
far:

open Printf
let data = Array.make 1000000 0
let try_input_line ch = try Some(input_line ch) with End_of_file -> None
let rec read ch = match try_input_line ch with
  | Some line -> line :: read ch
  | None -> []
let code = match Sys.argv with
  | [|_; file|] ->
      let ch = open_in file in
      let code = read ch in
      close_in ch;
      code
  | _ -> read stdin
let code = String.concat "" code
let rec compile pc t =
  if pc >= String.length code then pc, List.rev t else
  match code.[pc], t with
  | '>', `Move n::t -> compile (pc+1) (`Move (n + 1)::t)
  | '<', `Move n::t -> compile (pc+1) (`Move (n - 1)::t)
  | ('>' | '<'), t -> compile pc (`Move 0::t)
  | '+', `Add n::t -> compile (pc+1) (`Add (n + 1)::t)
  | '-', `Add n::t -> compile (pc+1) (`Add (n - 1)::t)
  | ('+' | '-'), t -> compile pc (`Add 0::t)
  | '.', t -> compile (pc+1) (`Out::t)
  | ',', t -> compile (pc+1) (`In::t)
  | '[', t ->
      let pc, t' = compile (pc+1) [] in
      compile (pc+1) (`Loop t'::t)
  | ']', t -> pc, List.rev t
  | _, t -> compile (pc+1) t
let rec run = function
  | [] -> (fun p -> p)
  | h::t ->
      let k = run t in
      match h with
      | `Move dp -> (fun p -> k(p + dp))
      | `Add dn -> (fun p -> data.(p) <- data.(p) + dn; k p)
      | `Out -> (fun p -> printf "%c%!" (char_of_int data.(p)); k p)
      | `In -> (fun p -> data.(p) <- int_of_char(input_char stdin); k p)
      | `Loop t' ->
          let k' = run t' in
          (fun p ->
             let rec aux p = if data.(p) = 0 then p else aux (k' p) in
             k (aux p))
let _ =
  let _, t = compile 0 [] in
  run t 0

Note that this implementation is also significantly shorter than the previous 
one and, IMHO, just as clear.

Even in the case of BF, where the dispatch is over a small number of 
possibilities, generating a tree of closures is still ~16% more efficient 
(despite the overhead of closure invocation compared to the previously 
tail-recursive call of "run"). So this approach is definitely worth 
considering.

As an aside, I've also written a MetaOCaml implementation but cannot get it to 
run anything like as fast. I was very surprised by this as I found it quite 
easy to get substantial performance improvements in term level interpreters 
using MetaOCaml.

To read more detailed information about the design and implementation of 
term-level, bytecode, staged and macro interpreters and rewriters, please 
subscribe to the OCaml Journal.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-08-16 16:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-08-15 11:49 High-performance bytecode interpreter in OCaml Joel Reymont
2007-08-15 13:01 ` [Caml-list] " Jon Harrop
2007-08-15 13:20   ` Joel Reymont
2007-08-15 15:18     ` Jon Harrop
2007-08-15 23:26       ` Joel Reymont
2007-08-16  3:58         ` Jon Harrop
2007-08-15 23:31     ` Grant Olson
2007-08-16 15:55       ` Jon Harrop

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).