From: John Max Skaller
To: Pierre Weis
CC: caml-list@inria.fr
Date: Wed, 07 Feb 2001 12:28:59 +1100
Subject: Re: compilation of lablgl examples.
Message-ID: <3A80A4DB.E3CA3FA9@ozemail.com.au>
References: <200102061555.QAA08776@pauillac.inria.fr>

Pierre Weis wrote:
> > [...]
> > pet project over to them ... but the use of standard variant
> > constructors is extremely heavy, and I'm having trouble
> > coming to terms with the need to virtually rewrite my
> > entire source.
>
> Could you explain a bit why ``the use of standard variant
> constructors is extremely heavy'' (I thought it was on the contrary
> extremely light and elegant!) ?

Excuse my poor English: what I meant was "a very large percentage of the symbols in the source code are variant constructor names". A very large amount of the code consists of match .. with statements in which the matched expression has a variant type and the result expressions belong to another variant type.
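To illustrate the pattern, here is a minimal sketch with two much simplified expression types. The names ast_expr, dcl_expr and dcl_of_ast are hypothetical, invented for illustration in the style of the prefixes used below, and are not taken from the actual compiler. Even in this tiny conversion, almost every symbol is a constructor name.

```ocaml
(* Hypothetical, simplified types for two phases; not the real compiler's. *)
type ast_expr =
  | AST_name of string
  | AST_int of int
  | AST_apply of ast_expr * ast_expr

type dcl_expr =
  | DCL_name of string
  | DCL_int of int
  | DCL_apply of dcl_expr * dcl_expr

(* A phase conversion: match on one variant type, build another.
   Each case mostly just renames a constructor. *)
let rec dcl_of_ast (e : ast_expr) : dcl_expr =
  match e with
  | AST_name s -> DCL_name s
  | AST_int n -> DCL_int n
  | AST_apply (f, a) -> DCL_apply (dcl_of_ast f, dcl_of_ast a)
```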
For example, consider what happens to a function declaration. In the Abstract Syntax Tree its constructor is:

  | AST_function of id_t * parameter_t list * typecode_t * statement_t list

which is converted to:

  | DCL_function of parameter_t list * typecode_t * asm_t list

which is converted to:

  | SYMDEF_function of parameter_t list * typecode_t * int list * exe_t list * name_map_t

which is converted to:

  | BDCL_function of bparameter_t list * btypecode_t * int list * exe_t list * name_map_t

which is finally converted to:

  | BBDCL_function of bparameter_t list * btypecode_t * int list * bexe_t list * name_map_t

That is, there are FIVE separate types, one for each phase of the compilation, all of which represent a function declaration (and the same again for every other construction). The phases are as follows: first, the statements are desugared and split into the declarative and executable parts of a lower-level language; then the list of statements of a function is split into executable code and a map of the declarations it contains; then type names are bound; and finally variable names are bound. The final structure is then used by the back end to generate a list of non-nested functions (C++ classes, actually).

The strict 'phasing' enforced by the separate types makes it easy to find errors statically. But the verbosity makes it easy to _make_ a lot more errors, and the structure is inflexible.

Said another way, the current design consists of a sequence of functors:

  Lex -> Parse -> AST -> SYM -> DCL -> BDCL -> BBDCL -> C++

where the categories these functors connect are all distinct. An alternative would be to use a single category for the internal workspace, with each phase moving from one 'subspace' to another; this is very flexible, but the lack of structure makes static error checking weak.
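The phasing argument can be sketched in miniature. Assuming two hypothetical types, unbound and bound (invented for this sketch, with the symbol table assumed to be a plain association list), a binding pass maps string names to integer indices. Any later routine that accepts only bound terms cannot be handed an unbound one, so running phases out of order is a compile-time error. The price is a second, nearly identical type and a conversion written constructor by constructor.

```ocaml
(* Hypothetical types for two phases; not the real compiler's definitions. *)
type unbound =
  | U_name of string               (* name as written in the source *)
  | U_apply of unbound * unbound

type bound =
  | B_index of int                 (* name replaced by a symbol-table index *)
  | B_apply of bound * bound

(* Binding pass. The symbol table is assumed to be an association list;
   List.assoc raises Not_found for an unknown name. *)
let rec bind (table : (string * int) list) (e : unbound) : bound =
  match e with
  | U_name s -> B_index (List.assoc s table)
  | U_apply (f, a) -> B_apply (bind table f, bind table a)

(* A later phase that only accepts bound terms: passing an unbound
   value here is rejected by the type checker. *)
let rec size (e : bound) : int =
  match e with
  | B_index _ -> 1
  | B_apply (f, a) -> size f + size a
```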
But with polymorphic variants, types can be ascribed to the 'subspaces' even when they overlap, and this can be done _after_ the fact to check correctness, whereas standard variants require the typing to be designed first. In other words, polymorphic variants are more useful for prototyping: types need not be declared before writing the algorithms, yet declarations can still be added later, once the design is more solid, to check just how solid it really is. At least, that is my expectation.

For example, an 'expression' type could include BOTH 'name as a string' and 'name as an integer index into the symbol table', allowing a single 'print expression' routine, while it would still be possible to give a type for 'expression containing no string names' (to be used after all the names are bound).

[At present, there are three 'types' for expressions, and three almost identical print routines to display them. The three types are the same, except that the second has all lambdas removed and _also_ has type names bound (but not variable names), while the third has variable names bound as well. Even so, the first type fails to use static typing to distinguish 'expression with lambdas' from 'expression with lambdas removed', which in principle it should.]

I am not certain this will work. Comments appreciated. I am loath to try it out, since it would take many days to rewrite all the constructor names, and I would have to undo all the work if the experiment failed. Note that when doing it all manually (mainly adding a backquote in front of every constructor name everywhere), it is easy to make a spelling mistake, which leads to obscure typing problems; and since it is hard to do this conversion incrementally, it is also hard to isolate the source of such a problem. If I could add backquotes in front of all the constructor names mechanically, then everything would work 'as is', and I could begin the task of identifying formerly distinct variant components.
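A minimal sketch of the expression idea, assuming OCaml's polymorphic variants and hypothetical names (expr, bound_expr, to_string, bind and print_bound are invented here, and the symbol table is again assumed to be an association list). A single printer handles both string names and integer indices, while bound_expr describes expressions containing no string names; because it is a subtype of expr, an explicit coercion lets it reuse the general printer.

```ocaml
(* Hypothetical expression type built from polymorphic variants. *)
type expr =
  [ `Name of string             (* name as a string *)
  | `Index of int               (* name as an index into the symbol table *)
  | `Apply of expr * expr ]

(* The subset containing no string names, for use after binding. *)
type bound_expr =
  [ `Index of int
  | `Apply of bound_expr * bound_expr ]

(* One printing routine covers every expression. *)
let rec to_string (e : expr) : string =
  match e with
  | `Name s -> s
  | `Index n -> "#" ^ string_of_int n
  | `Apply (f, a) -> "(" ^ to_string f ^ " " ^ to_string a ^ ")"

(* Binding replaces names with indices; the result type excludes `Name.
   List.assoc raises Not_found for an unknown name. *)
let rec bind (table : (string * int) list) (e : expr) : bound_expr =
  match e with
  | `Name s -> `Index (List.assoc s table)
  | `Index n -> `Index n
  | `Apply (f, a) -> `Apply (bind table f, bind table a)

(* bound_expr is a subtype of expr, so coerce and reuse the printer. *)
let print_bound (e : bound_expr) : string = to_string (e : bound_expr :> expr)
```

For instance, print_bound (bind [("f", 0)] (`Apply (`Name "f", `Index 2))) should yield "(#0 #2)", while passing a term containing `Name directly to print_bound is a type error. Once the annotations are in place, a misspelled tag such as `Nmae is also rejected wherever an expr is expected, instead of silently introducing a new tag.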
-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia
voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net