From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id SAA12112; Thu, 1 Jul 2004 18:26:04 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id SAA12111 for ; Thu, 1 Jul 2004 18:26:03 +0200 (MET DST) Received: from relay.felk.cvut.cz (relay.felk.cvut.cz [147.32.80.7]) by nez-perce.inria.fr (8.12.10/8.12.10) with ESMTP id i61GQ2EV018627 for ; Thu, 1 Jul 2004 18:26:02 +0200 Received: from k333.felk.cvut.cz (k333.felk.cvut.cz [147.32.87.5]) by relay.felk.cvut.cz (8.12.11/8.12.11) with ESMTP id i61GPr4S030538 for ; Thu, 1 Jul 2004 18:25:54 +0200 (CEST) (envelope-from kybic@fel.cvut.cz) Received: from K333/SpoolDir by k333.felk.cvut.cz (Mercury 1.48); 1 Jul 04 18:25:53 +0100 Received: from SpoolDir by K333 (Mercury 1.48); 1 Jul 04 18:25:33 +0100 Received: from localhost (147.32.84.19) by k333.felk.cvut.cz (Mercury 1.48) with ESMTP; 1 Jul 04 18:25:25 +0100 To: caml-list@pauillac.inria.fr Subject: [Caml-list] Re: Q: automatic forgetting cache, module Weak, Gc control References: <20040701.174358.116814367.garrigue@kurims.kyoto-u.ac.jp> <1088690342.2582.122.camel@pelican.wigram> From: Jan Kybic In-Reply-To: <1088690342.2582.122.camel@pelican.wigram> Date: 01 Jul 2004 18:25:24 +0200 Message-ID: User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2.93 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-MailScanner-felk: Found to be clean X-MailScanner-SpamCheck-felk: not spam, SpamAssassin (score=-4.9, required 5, BAYES_00 -4.90) X-Miltered: at nez-perce with ID 40E43B1A.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; caching:01 memoizing:01 disadvantage:01 hashtable:01 cyclic:01 buffer:01 buffer:01 hashtable:01 fifo:01 fifo:01 apropriate:01 disadvantage:01 hashtables:01 hashtables:01 tradeoff:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk [This thread is about how to implement caching (memoizing) of function values of different type with limited memory.] > A more labor intensive idea would be to manage the > caches yourself. This has the advantage -- and disadvantage -- > that you're totally in control. I tried the weak hashtable from the Weak module. It did not work well, apparently the values got garbage collected almost immediately. I do not think there is a way to avoid it. I also tried the LRU module of Dustin Sallings but it is too slow for me. > One idea -- just use a fixed sized cyclic buffer, > keep usage stats, and manually tune the buffer sizes. I think the right idea is to insert each cached values into two structures: a weak hashtable so that the value can be found fast, and another global FIFO type structure that will start to drop oldest values when there is not enough memory. For efficiency, the FIFO structure will hold blocks (arrays). As the FIFO structure is global and will have to hold different types of data, storing Obj.t seems to be apropriate. I also have to deal with the fact that I want to cache different fucntions that do not take the same effort to evaluate. I plan to implement a self tuning strategy - when evaluating the function for the first time, we will measure how much time it takes and then adjust the block size, so that each block represents about the same computational effort. I am also thinking about other strategies taking into account the sizes of the produced results. In this case the global structure would be an ordered set. The disadvantage of putting the values into two structures is the memory overhead. Maybe I should avoid using Weak hashtables alltogether and store the values in a set of normal hashtables, dropping some of them, if necessary... It is difficult to get the memory/speed tradeoff right. > Another more flexible and potentially expensive > technique is to cache everything in a thread-safe > manner, and run a separate 'reaper' thread to purge > the most useless values. This has the advantage that > the cache control is lexically isolated and so > can be worked on independently. Obviously needing > to use a thread-lock to put values in the cache > will cost big time. Yes, but what is the most useless value and how to find it efficiently? Jan -- ------------------------------------------------------------------------- Jan Kybic tel. +420 2 2435 7264 or , http://cmp.felk.cvut.cz/~kybic ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners