From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id NAA28479; Wed, 26 May 2004 13:31:50 +0200 (MET DST)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id NAA28826 for <caml-list@pauillac.inria.fr>; Wed, 26 May 2004 13:31:48 +0200 (MET DST)
Received: from qrnik.knm.org.pl (paf87.warszawa.sdi.tpnet.pl [217.96.225.87])
	by nez-perce.inria.fr (8.12.10/8.12.10) with ESMTP id i4QBVkEV027912
	for <caml-list@inria.fr>; Wed, 26 May 2004 13:31:47 +0200
Received: from qrnik ([192.168.0.1] ident=qrczak)
	by qrnik.knm.org.pl with esmtp (Exim 3.36 #1)
	id 1BSwdK-0005To-00; Wed, 26 May 2004 13:31:42 +0200
Subject: Re: [Caml-list] handling forward references in a compiler
From: "Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl>
To: dido@imperium.ph
Cc: caml-list@inria.fr
In-Reply-To: <20040525095213.GA2655@imperium.ph>
References: <20040525095213.GA2655@imperium.ph>
Content-Type: text/plain; charset=iso-8859-2
Message-Id: <1085571101.23506.15.camel@qrnik>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.5.7 
Date: Wed, 26 May 2004 13:31:42 +0200
Content-Transfer-Encoding: quoted-printable
X-Miltered: at nez-perce with ID 40B48022.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Loop: caml-list@inria.fr
X-Spam: no; 0.00; caml-list:01 marcin:01 'qrczak':01 kowalczyk:01 qrczak:01 wto:99 dido:01 undeclared:01 resorting:01 ids:01 avoided:01 implemented:01 ids:01 marcin:01 kowalczyk:01 
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

W li=B6cie z wto, 25-05-2004, godz. 17:52 +0800, dido@imperium.ph napisa=B3=
:

> I'm writing a compiler for a language that allows undeclared forward
> references in OCaml, and I'm just wondering how one is supposed to
> handle these sorts of things without resorting to imperative features or
> multiple passes.

These are indeed two most obvious choices. I prefer the second one.

In the first pass we scan only "heads" of definitions, creating an
environment which maps strings to some unique ids. In the second pass
we know the meaning of each identifier, so we can process code normally
no matter which identifiers have their definitions before or after.

> Another option is to make a second pass through the
> generated intermediate representation, and fixing up the forward
> references after all of the declarations have been processed, but this
> seems like a second traversal that can be avoided.

There is no need to have a separate pass which only patches references.
You just lookup references in the previously constructed environment
as necessary, when processing the code tree in the next pass which also
does whatever it was supposed to do anyway.

In a compiler I implemented it was not that easy to scan heads of
definitions, because some syntactic sugar was not yet resolved, so it
would have to be resolved twice. To avoid this, the pass which gathers
defined names also produces a modified list of definitions, with
resolved toplevel syntactic sugar, with bound names represented by
unique ids, but with everything else - in particular all embedded
expressions - left unprocessed. They are processed in the next pass,
which resolves used identifiers and expression-level syntactic sugar,
and produces a new code representation for further processing.

--=20
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners