From: "Jon Harrop"
To: "'Diego Olivier Fernandez Pons'" ,
Date: Mon, 7 Nov 2011 06:45:40 -0000
Message-ID: <0a4d01cc9d18$de73d1d0$9b5b7570$@ffconsultancy.com>
Subject: RE: [Caml-list] How to write an efficient interpreter

If the language you are interpreting is quite declarative then piggy-backing on OCaml's run-time, either by writing an interpreter or by compiling to OCaml code, will be a big advantage. Writing a VM with a run-time as efficient as OCaml's in this context is a *lot* of work compared to writing an interpreter.
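As a rough sketch of what piggy-backing on the run-time can look like, here is a tiny interpreter in which interpreted functions are ordinary OCaml closures and all interpreted values are ordinary OCaml values, so allocation and garbage collection come for free. The toy language and the names `expr`, `value`, `eval` and `example` are assumptions made up for illustration, not anything from the thread.

```ocaml
(* Hypothetical toy language, invented for this illustration. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Lam of string * expr
  | App of expr * expr

(* Interpreted values are plain OCaml values, so OCaml's GC manages them. *)
type value =
  | VInt of int
  | VFun of (value -> value)  (* an interpreted function is an OCaml closure *)

(* The environment is a simple association list; it is kept naive on purpose. *)
let rec eval env = function
  | Int n -> VInt n
  | Var x -> List.assoc x env
  | Add (a, b) ->
    (match eval env a, eval env b with
     | VInt m, VInt n -> VInt (m + n)
     | _ -> failwith "Add: expected integers")
  | Lam (x, body) ->
    (* Capturing [env] in an OCaml closure is the whole closure mechanism. *)
    VFun (fun v -> eval ((x, v) :: env) body)
  | App (f, a) ->
    (match eval env f with
     | VFun g -> g (eval env a)
     | VInt _ -> failwith "App: expected a function")

(* (fun x -> x + 1) 41 *)
let example = App (Lam ("x", Add (Var "x", Int 1)), Int 41)
```

Evaluating `eval [] example` yields `VInt 42`. Nothing here implements a heap, a stack or a collector, which is the work a hand-written VM would have to take on.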
Unless there is a compelling reason not to, I'd stick with an interpreter and just optimize it carefully. In particular, pay careful attention to your data representations. When writing a term-rewriter, one of the biggest performance gains I got was putting the value of each variable in the variable itself, so it could be looked up by dereferencing a pointer rather than going via a hash table. Symbol tables are useful for similar reasons: map identifiers onto small integer IDs and replace hash tables that have identifiers as keys with an array indexed by ID. Beware the write barrier though: mutating a field or array slot that holds a boxed value costs more than a plain store.

Also, when profiling I'd recommend logging the "kinds" of inputs your interpreter sees. For example, try to spot patterns in the shapes of expressions that your interpreter deals with, then add special cases to the expression type for those shapes and custom code to interpret them.

Cheers,
Jon.
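To illustrate the data-representation advice above, here is a minimal sketch of the three ideas: a variable that carries its own value, identifiers interned to small integer IDs that index an array, and a special-cased expression shape. Every name in it (`var`, `intern`, `Slot`, `AddConst`, `specialise`) is hypothetical, and the integer-only language is an assumption chosen to keep the example short; it is not the term-rewriter described in the message.

```ocaml
(* All names here are hypothetical, chosen for illustration. *)

(* 1. The variable carries its own value: a lookup is one pointer dereference. *)
type var = { name : string; mutable value : int }

(* 2. Symbol table: each identifier gets a small integer ID, assigned once.
   The hash table is consulted when resolving names, not during evaluation. *)
let ids : (string, int) Hashtbl.t = Hashtbl.create 16

let intern name =
  match Hashtbl.find_opt ids name with
  | Some id -> id
  | None ->
    let id = Hashtbl.length ids in
    Hashtbl.add ids name id;
    id

type expr =
  | Int of int
  | Var of var              (* technique 1 *)
  | Slot of int             (* technique 2: index into an environment array *)
  | Add of expr * expr
  | AddConst of expr * int  (* 3. special case for the shape "e + constant" *)

(* Rewrite once, before evaluation, so the common shape hits the fast case. *)
let rec specialise = function
  | Add (e, Int n) | Add (Int n, e) -> AddConst (specialise e, n)
  | Add (a, b) -> Add (specialise a, specialise b)
  | e -> e

let rec eval env = function
  | Int n -> n
  | Var v -> v.value
  | Slot id -> env.(id)
  | Add (a, b) -> eval env a + eval env b
  | AddConst (e, n) -> eval env e + n

(* Usage: x lives in its own record, y lives in slot 0 of the array. *)
let x = { name = "x"; value = 10 }
let y = intern "y"
let env = Array.make 8 0
let () = env.(y) <- 5
let program = specialise (Add (Add (Var x, Int 1), Slot y))
let result = eval env program
```

Here `program` becomes `Add (AddConst (Var x, 1), Slot 0)` and `result` is 16. Because the stored values are immediate integers, the assignments to `value` and `env` are plain stores; with boxed values in those mutable positions each store would go through the write barrier, which is presumably the cost the message warns about.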