From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 63FF2BC48 for ; Thu, 14 Apr 2005 12:28:02 +0200 (CEST) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j3EAS25C024617 for ; Thu, 14 Apr 2005 12:28:02 +0200 Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id MAA13725 for ; Thu, 14 Apr 2005 12:28:01 +0200 (MET DST) Received: from relay.felk.cvut.cz (relay.felk.cvut.cz [147.32.80.7]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j3EAS1Tn000776 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Thu, 14 Apr 2005 12:28:01 +0200 Received: from k333.felk.cvut.cz (k333.felk.cvut.cz [147.32.87.5]) by relay.felk.cvut.cz (8.12.11/8.12.11) with ESMTP id j3EARtR2026088 for ; Thu, 14 Apr 2005 12:27:55 +0200 (CEST) (envelope-from kybic@fel.cvut.cz) Received: from K333/SpoolDir by k333.felk.cvut.cz (Mercury 1.48); 14 Apr 05 12:27:55 +0100 Received: from SpoolDir by K333 (Mercury 1.48); 14 Apr 05 12:27:55 +0100 Received: from d600-jk (147.32.84.2) by k333.felk.cvut.cz (Mercury 1.48) with ESMTP; 14 Apr 05 12:27:49 +0100 To: caml-list@inria.fr Subject: Re: [Caml-list] Impact of GC on memoized algorithm References: <424A65C4.2080507@barettadeit.com> <200503301303.46742.jon@ffconsultancy.com> <424A9CCA.7030204@barettadeit.com> <200503301509.01571.A.S.Usov@kvi.nl> <874qetdypa.fsf@qrnik.zagroda> <424ABFAE.7080309@barettadeit.com> <425A9C66.3050802@barettadeit.com> Reply-To: Jan Kybic From: Jan Kybic In-Reply-To: Date: 14 Apr 2005 12:27:49 +0200 Message-ID: User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Scanned-By: milter-sender/0.62.837 (relay.felk.cvut.cz [147.32.80.7]); Thu, 14 Apr 2005 12:27:55 +0200 X-MailScanner-felk: Found to be clean X-MailScanner-SpamCheck-felk: not spam, SpamAssassin (score=-4.9, required 5, BAYES_00 -4.90) X-Miltered: at concorde with ID 425E45B2.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at nez-perce with ID 425E45B1.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 memoized:01 hashing:01 hash:01 arbitrarily:01 hash:01 memoized:01 trade-off:01 ...:98 algorithm:01 heuristic:02 pattern:03 swap:04 depends:04 size:95 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.2 X-Spam-Level: > In my experience, it is important to pay attention to the cost of the > hashing function. If you can avoid going through the whole structure to > compute the hash, you'll save a lot of time. > > It may also be a good idea to limit the number of cache entries, instead > of just letting the cache grow arbitrarily large. I've had good results > by deleting some entries at random when the cache gets too big. Exactly. When the cache size exceeds your real RAM, it is often cheaper to recalculate the value instead of retrieving it from the disk (swap). I have good experience with deleting the oldest (by order of creation) and cheapest-to-recalculate entries when memory becomes scarce by alternatively filling several (linked) hash tables. The heuristic needs monitoring the current memory consumption, the size of the items and timing the memoized operations. On the other hand, an exact LRU (last recently used) cache based on doubly-linked lists did not work well for me, the cost of maintaining the links was too high. The best trade-off obviously depends on the size of your items and keys, cost of recalculating the items, and on the access pattern... Experimenting is probably unavoidable. Jan -- ------------------------------------------------------------------------- Jan Kybic tel. +420 2 2435 5721 http://cmp.felk.cvut.cz/~kybic ICQ 200569450