From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id A2FFBBC0A for ; Fri, 8 Jun 2007 03:51:48 +0200 (CEST) Received: from ipmail03.adl2.internode.on.net (ipmail03.adl2.internode.on.net [203.16.214.135]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l581pkgf028965 for ; Fri, 8 Jun 2007 03:51:47 +0200 X-IronPort-AV: E=Sophos;i="4.16,397,1175437800"; d="scan'208";a="100530398" Received: from ppp2-129.lns1.syd7.internode.on.net (HELO [192.168.1.201]) ([59.167.2.129]) by ipmail03.adl2.internode.on.net with ESMTP; 08 Jun 2007 11:21:42 +0930 Subject: Re: [Caml-list] Re: compiling large file hogs RAM and takes a long time. From: skaller To: Jacques Garrigue Cc: sds@gnu.org, caml-list@inria.fr In-Reply-To: <20070608.100255.84973407.garrigue@math.nagoya-u.ac.jp> References: <4666E11F.6000308@podval.org> <466829A3.2090508@gnu.org> <20070608.100255.84973407.garrigue@math.nagoya-u.ac.jp> Content-Type: text/plain Date: Fri, 08 Jun 2007 11:51:40 +1000 Message-Id: <1181267500.15201.144.camel@rosella.wigram> Mime-Version: 1.0 X-Mailer: Evolution 2.10.1 Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 4668B632.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; compiler:01 compilation:01 constructors:01 variants:01 variants:01 polymorphism:01 compiler:01 constructors:01 indirection:01 camlp:01 1,000:98 100,000:98 sourceforge:01 polymorphic:01 polymorphic:01 On Fri, 2007-06-08 at 10:02 +0900, Jacques Garrigue wrote: > > > > Any chance there is some quadratic code in polymorphic variant type > > processing?! > > There is, and this is a known problem: > http://caml.inria.fr/mantis/view.php?id=4053 > > I'm sorry, but I don't see any easy way out. > At least on the basic time complexity. You mention in the ticket there is a hard way out .. using binary trees; hard because it would require changes everywhere in the compiler. Is this actually enough? Seems to reduce O(n * n * log n) to O( n * log n * log n) which is still pretty bad.. is that right? How about a one level digital lookup, that is, use the first 8 bits of the key to choose one of 256 binary trees? In commercial code O() performance isn't usually that relevant. Hybrid data structure is probably best, even if the O() performance is worse: if a one level digital lookup reduces quadratic time by 256 that reduces a 4 hour compilation to 1 minute.. :) Then 1,000-10,000 type constructors would be handled easily, although around 100,000 you'd be trouble again .. but I think you'd be in trouble anyhow if you had that many! Also .. I suspect the OP is only using polymorphic variants to work around the 246 constructor limitation on non-polymorphic variants. In this case, they could just use factored non-polymorphic variants. the point being: polymorphic variants should be used when you need polymorphism and are willing to pay the price. This is typically in a compiler or some other system where there are only a moderate number of constructors, but many ways to combine them. So maybe the correct fix isn't to modify polymorphic variants .. but to repair the 246 limitation on non-polymorphic variants by automatically factoring them .. it is, after all, entirely syntactic sugar. Of course there's be a run time penalty from double indirection .. but that's the price you pay for extremely fast single indirected non-polymorphic variants when the number of constructors is small. Hmm .. and this could 'almost' be done with camlp4.. :) -- John Skaller Felix, successor to C++: http://felix.sf.net