caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Jan Kybic <kybic@fel.cvut.cz>
To: caml-list@pauillac.inria.fr
Subject: [Caml-list] Re: Q: automatic forgetting cache, module Weak, Gc control
Date: 01 Jul 2004 18:25:24 +0200	[thread overview]
Message-ID: <m2n02j3daj.fsf_-_@fel.cvut.cz> (raw)
In-Reply-To: <1088690342.2582.122.camel@pelican.wigram>

[This thread is about how to implement caching (memoizing) of function
values of different type with limited memory.]

> A more labor intensive idea would be to manage the
> caches yourself. This has the advantage -- and disadvantage --
> that you're totally in control.

I tried the weak hashtable from the Weak module. It did not work well,
apparently the values got garbage collected almost immediately.
I do not think there is a way to avoid it.

I also tried the LRU module of Dustin Sallings but it is too slow for me.

> One idea -- just use a fixed sized cyclic buffer,
> keep usage stats, and manually tune the buffer sizes.


I think the right idea is to insert each cached values into two
structures: a weak hashtable so that the value can be found fast, and 
another global FIFO type structure that will start to drop oldest values when
there is not enough memory. For efficiency, the FIFO structure will 
hold blocks (arrays). As the FIFO structure is global and will have to
hold different types of data, storing Obj.t seems to be apropriate.

I also have to deal with the fact that I want to cache different
fucntions that do not take the same effort to evaluate. I plan to
implement a self tuning strategy - when evaluating the function for
the first time, we will measure how much time it takes and then adjust
the block size, so that each block represents about the same
computational effort. 

I am also thinking about other strategies taking into account the
sizes of the produced results. In this case the global structure would
be an ordered set.

The disadvantage of putting the values into two structures is the
memory overhead. Maybe I should avoid using Weak hashtables
alltogether and store the values in a set of normal hashtables,
dropping some of them, if necessary... It is difficult to get the
memory/speed tradeoff right.

> Another more flexible and potentially expensive
> technique is to cache everything in a thread-safe
> manner, and run a separate 'reaper' thread to purge
> the most useless values. This has the advantage that
> the cache control is lexically isolated and so 
> can be worked on independently. Obviously needing
> to use a thread-lock to put values in the cache
> will cost big time. 

Yes, but what is the most useless value and how to find it
efficiently? 

Jan

-- 
-------------------------------------------------------------------------
Jan Kybic <kybic@ieee.org>                  tel. +420 2 2435 7264
       or <kybic@fel.cvut.cz>,     http://cmp.felk.cvut.cz/~kybic

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


  reply	other threads:[~2004-07-01 16:26 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-06-30 11:05 [Caml-list] " Jan Kybic
2004-07-01  8:43 ` Jacques GARRIGUE
2004-07-01 13:59   ` skaller
2004-07-01 16:25     ` Jan Kybic [this message]
2004-07-01 18:14       ` [Caml-list] " Alex Baretta
2004-07-02  8:53       ` Hendrik Tews
2004-07-02  9:40         ` Jan Kybic

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m2n02j3daj.fsf_-_@fel.cvut.cz \
    --to=kybic@fel.cvut.cz \
    --cc=caml-list@pauillac.inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).