caml-list - the Caml user's mailing list
* Weak hashtables & aggressive caching
@ 2006-08-14 14:58 Matt Gushee
  2006-08-14 15:47 ` [Caml-list] " Richard Jones
  2006-08-14 21:23 ` Jacques Garrigue
  0 siblings, 2 replies; 12+ messages in thread
From: Matt Gushee @ 2006-08-14 14:58 UTC
  To: caml-list

Hello, all--

I wrote a LablGTK-based image viewer this past weekend; one of its 
features is an image cache--specifically, a weak hashtable that contains
values of type string * GdkPixbuf.pixbuf (the string being the file 
name). When a particular image file is requested, it is retrieved from 
the cache if it exists there; otherwise it is loaded from disk (and 
placed in the cache at the same time). This is useful if the user wants 
to quickly look back through a series of images that have already been 
loaded, but it doesn't help with loading images for the first time.
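A minimal sketch of the cache described above, using the standard library's `Weak.Make`. The `image` type, `load_from_disk` and `get_image` are hypothetical stand-ins so the code runs without LablGTK. In the real viewer the loader would decode the file, e.g. with `GdkPixbuf.from_file`. One pitfall worth noting: the weak table holds the `(name, image)` pair itself, so an entry survives only while something still references that exact pair. That is why this sketch returns the pair rather than just the image.

```ocaml
(* Hypothetical stand-in for GdkPixbuf.pixbuf so the sketch runs without LablGTK. *)
type image = { pixels : Bytes.t }

(* Entries are (file name, image) pairs; only the name is hashed and compared. *)
module Entry = struct
  type t = string * image
  let equal ((a : string), _) (b, _) = String.equal a b
  let hash ((name : string), _) = Hashtbl.hash name
end

module Cache = Weak.Make (Entry)

let cache = Cache.create 64

(* Counts disk loads so the caching effect can be observed. *)
let loads = ref 0

(* Hypothetical loader: a real viewer would decode the image file here. *)
let load_from_disk (_name : string) : image =
  incr loads;
  { pixels = Bytes.make 16 '\000' }

(* Placeholder used only as the value half of a lookup key. *)
let dummy = { pixels = Bytes.empty }

(* Return the cached pair on a hit; otherwise load, cache and return it.
   Callers should hold on to the returned pair while they want it cached. *)
let get_image name =
  match Cache.find_opt cache (name, dummy) with
  | Some entry -> entry
  | None ->
      let entry = (name, load_from_disk name) in
      Cache.add cache entry;
      entry
```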

It seems to me it might be useful to implement an aggressive caching 
strategy--i.e., since the files to be loaded are known in advance (from 
the command line), there could be a low-priority thread that would look 
ahead and load images before the user requests them. Of course, if too 
many images are loaded it might trigger the garbage collector, which 
would defeat the whole purpose. Ideally, preloading should stop somewhat 
before garbage collection starts.

From the documentation, it appears that the Gc.stat function and the
Gc.control settings could be used to regulate the caching behavior, but I
have not worked with the Gc module before. Has anyone done something like
this? Is it worth the effort? Any non-obvious pitfalls I should be aware of?
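A rough sketch of "stop preloading when the GC starts", under stated assumptions: `load` and `preload` are hypothetical, the loop runs synchronously rather than in a low-priority thread, and the preloaded entries are returned as a list instead of being handed to a cache. Because pixbuf pixel data is allocated outside the OCaml heap, heap-size figures from `Gc.quick_stat` would not measure it, so the sketch watches the `major_collections` counter instead. Note that this only detects a major GC after it has already happened.

```ocaml
(* Hypothetical loader standing in for a real image decode. *)
let load (name : string) = (name, Bytes.make 4096 'x')

(* Preload [pending] files in order, stopping after [max_items] loads or as
   soon as a major GC cycle has completed since preloading began. *)
let preload ~max_items pending =
  let start = (Gc.quick_stat ()).Gc.major_collections in
  let gc_ran () = (Gc.quick_stat ()).Gc.major_collections > start in
  let rec go n acc = function
    | name :: rest when n < max_items && not (gc_ran ()) ->
        go (n + 1) (load name :: acc) rest
    | _ -> List.rev acc
  in
  go 0 [] pending
```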

-- 
Matt Gushee
: Bantam - lightweight file manager : matt.gushee.net/software/bantam/ :
: RASCL's A Simple Configuration Language :     matt.gushee.net/rascl/ :


* Re: [Caml-list] Weak hashtables & aggressive caching
  2006-08-14 14:58 Weak hashtables & aggressive caching Matt Gushee
@ 2006-08-14 15:47 ` Richard Jones
  2006-08-14 16:28   ` Matt Gushee
  2006-08-14 21:23 ` Jacques Garrigue
  1 sibling, 1 reply; 12+ messages in thread
From: Richard Jones @ 2006-08-14 15:47 UTC
  To: Matt Gushee; +Cc: caml-list

On Mon, Aug 14, 2006 at 08:58:29AM -0600, Matt Gushee wrote:
> It seems to me it might be useful to implement an aggressive caching 
> strategy--i.e., since the files to be loaded are known in advance (from 
> the command line),[...]

Please no!  When running X remotely this will cause images to be
transferred (uncompressed) over the network and stored inside the X
server when they may not even be viewed.  This sort of thing is
already a serious problem with programs like 'eog', making them
virtually unusable remotely.

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com


* Re: [Caml-list] Weak hashtables & aggressive caching
  2006-08-14 15:47 ` [Caml-list] " Richard Jones
@ 2006-08-14 16:28   ` Matt Gushee
       [not found]     ` <44E0A8F1.8060504@janestcapital.com>
  2006-08-14 18:18     ` Richard Jones
  0 siblings, 2 replies; 12+ messages in thread
From: Matt Gushee @ 2006-08-14 16:28 UTC
  To: caml-list

Richard Jones wrote:
> On Mon, Aug 14, 2006 at 08:58:29AM -0600, Matt Gushee wrote:
>> It seems to me it might be useful to implement an aggressive caching 
>> strategy--i.e., since the files to be loaded are known in advance (from 
>> the command line),[...]
> 
> Please no!  When running X remotely this will cause images to be
> transferred (uncompressed) over the network and stored inside the X
> server when they may not even be viewed.  This sort of thing is
> already a serious problem with programs like 'eog', making them
> virtually unusable remotely.

Hmm ... well, I happen to have the heretical view that in an age of 
cheap, powerful PCs and inexpensive software, running X remotely is just 
plain absurd in most situations. Okay, yeah, there are thin clients, but 
who actually uses them--other than a few large corporations, for whom I 
have no sympathy?

However, I also know that my philosophy is on the fringe, and from a 
practical standpoint people actually do some of these absurd things, so 
... thanks for the heads-up.

Wait a minute, though. According to the Gdk reference manual, 
<http://developer.gnome.org/doc/API/2.0/gdk/gdk-Pixbufs.html#id2861842>

   Pixbufs are client-side images.

If that's true, I don't understand how loading pixbufs from files would 
affect the X server.

-- 
Matt Gushee
: Bantam - lightweight file manager : matt.gushee.net/software/bantam/ :
: RASCL's A Simple Configuration Language :     matt.gushee.net/rascl/ :


* Re: [Caml-list] Weak hashtables & aggressive caching
       [not found]     ` <44E0A8F1.8060504@janestcapital.com>
@ 2006-08-14 17:35       ` Matt Gushee
  0 siblings, 0 replies; 12+ messages in thread
From: Matt Gushee @ 2006-08-14 17:35 UTC
  To: caml-list

I will try to make this my last off-topic message on this subject.

Brian Hurt wrote:

> I'm running X remotely to access remote machines (note the plural).  One 
> of the advantages of X is that I can run GUI apps on machines that I'm 
> not sitting in front of.

And what percentage of the computer-using population do you suppose has 
*ever* done that?

> I'm also using RealVNC to log into other 
> Windows machines.  Please don't assume *your* situation is *everyone's* 
> situation, as this makes your software significantly less useful.

No. It limits the population of users for whom the software is useful, 
which is a very different matter. Don't make assumptions about what I 
assume. I know very well there are different kinds of users; where my 
thinking differs from the mainstream is that I believe it is 
impossible--or at least very difficult--to create software that delivers 
a good user experience for all types of users.

To take one example, what tool would you use to develop a Web site? Some 
people find Cold Fusion highly productive. That's fine. I find Vim to be 
far more productive than any other tool I've tried, at least for the 
kinds of Web sites I develop (mostly my own). I'd bet a large sum of 
money that either one is far better for its target users than some 
hypothetical app that tried to address both groups.

BTW, some of the leading thinkers on human-computer interaction (e.g. 
Jef Raskin and Alan Cooper) have argued--based on extensive 
research--that offering many different ways to accomplish a task is 
usually bad for usability. They're talking about user interfaces, but 
their thinking is at least consistent with my broader claim that no 
single app is suitable for all circumstances.

Anyway, if I release an app to the public, I try to be very clear--as 
clear as you can be in words and screenshots--about what it does and 
doesn't do, and what kinds of users and usage situations it is suitable 
for. If people don't want to use my software, that's fine. If I can't 
develop something that will bring in significant income--and I long ago 
gave up hope of doing that--I'll bloody well develop something I like. 
As long as I'm clear about what I like, and don't expect the whole world 
to agree with me, I don't see why that's a problem.

-- 
Matt Gushee
: Bantam - lightweight file manager : matt.gushee.net/software/bantam/ :
: RASCL's A Simple Configuration Language :     matt.gushee.net/rascl/ :


* Re: [Caml-list] Weak hashtables & aggressive caching
  2006-08-14 16:28   ` Matt Gushee
       [not found]     ` <44E0A8F1.8060504@janestcapital.com>
@ 2006-08-14 18:18     ` Richard Jones
  2006-08-14 23:25       ` Matt Gushee
  1 sibling, 1 reply; 12+ messages in thread
From: Richard Jones @ 2006-08-14 18:18 UTC
  To: Matt Gushee; +Cc: caml-list

On Mon, Aug 14, 2006 at 10:28:39AM -0600, Matt Gushee wrote:
> Wait a minute, though. According to the Gdk reference manual, 
> <http://developer.gnome.org/doc/API/2.0/gdk/gdk-Pixbufs.html#id2861842>
> 
>   Pixbufs are client-side images.

Ah right, pixbufs, pixmaps ...  In that case why bother preloading
them at all?  eog is flagrant with regards to pixmaps because the
developers believe it allows them to display images quickly (the
images are already on the X server, converted from JPEGs into raw
pixels).  In this age of fast CPUs and slow RAM this is unlikely to be
the case.

Rich.

-- 
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Team Notepad - intranets and extranets for business - http://team-notepad.com


* Re: [Caml-list] Weak hashtables & aggressive caching
  2006-08-14 14:58 Weak hashtables & aggressive caching Matt Gushee
  2006-08-14 15:47 ` [Caml-list] " Richard Jones
@ 2006-08-14 21:23 ` Jacques Garrigue
  2006-08-14 23:30   ` Matt Gushee
  2006-08-15  4:55   ` skaller
  1 sibling, 2 replies; 12+ messages in thread
From: Jacques Garrigue @ 2006-08-14 21:23 UTC
  To: matt; +Cc: caml-list

From: Matt Gushee <matt@gushee.net>

> I wrote a LablGTK-based image viewer this past weekend; one of its 
> features is an image cache--specifically, a weak hashtable that contains
> values of type string * GdkPixbuf.pixbuf (the string being the file 
> name). When a particular image file is requested, it is retrieved from 
> the cache if it exists there; otherwise it is loaded from disk (and 
> placed in the cache at the same time). This is useful if the user wants 
> to quickly look back through a series of images that have already been 
> loaded, but it doesn't help with loading images for the first time.

I wonder how you trigger the GC, to both keep the cache long enough,
and to avoid filling the memory too much, and resulting in lots of
swapping.

With ocaml data structures, the GC does a good job, as it is
triggered every time already allocated memory is filled. Hopefully this
means that the memory set should not increase. But with external data
structures like pixbufs, the GC is called in a pre-programmed way,
currently at least after every 10 pixbuf allocations. This is probably
too frequent for your scheme (you won't get more than 9 images in memory),
but collecting any less often might not be enough (big images would fill
the memory before the GC is called).

Considering the difficulty of avoiding memory overflow, the only workable
approach still seems to be an over-eager GC, one that runs much more
often than necessary. But as a result the caching effect is very
limited. Otherwise you would need to change all the parameters in lablgtk.

Jacques Garrigue


* Re: [Caml-list] Weak hashtables & aggressive caching
  2006-08-14 18:18     ` Richard Jones
@ 2006-08-14 23:25       ` Matt Gushee
  0 siblings, 0 replies; 12+ messages in thread
From: Matt Gushee @ 2006-08-14 23:25 UTC
  To: caml-list

Richard Jones wrote:
> On Mon, Aug 14, 2006 at 10:28:39AM -0600, Matt Gushee wrote:
>> Wait a minute, though. According to the Gdk reference manual, 
>> <http://developer.gnome.org/doc/API/2.0/gdk/gdk-Pixbufs.html#id2861842>
>>
>>   Pixbufs are client-side images.
> 
> Ah right, pixbufs, pixmaps ...  In that case why bother preloading
> them at all?

Well, maybe I shouldn't. That's why I asked if it was worth the effort.

> eog is flagrant with regards to pixmaps because the
> developers believe it allows them to display images quickly (the
> images are already on the X server, converted from JPEGs into raw
> pixels).  In this age of fast CPUs and slow RAM this is unlikely to be
> the case.

Thanks for your insights.

-- 
Matt Gushee
: Bantam - lightweight file manager : matt.gushee.net/software/bantam/ :
: RASCL's A Simple Configuration Language :     matt.gushee.net/rascl/ :


* Re: [Caml-list] Weak hashtables & aggressive caching
  2006-08-14 21:23 ` Jacques Garrigue
@ 2006-08-14 23:30   ` Matt Gushee
  2006-08-16  0:54     ` Jacques Garrigue
  2006-08-15  4:55   ` skaller
  1 sibling, 1 reply; 12+ messages in thread
From: Matt Gushee @ 2006-08-14 23:30 UTC
  To: caml-list

Jacques Garrigue wrote:

> I wonder how you trigger the GC, to both keep the cache long enough,
> and to avoid filling the memory too much, and resulting in lots of
> swapping.

I wasn't planning to trigger the GC explicitly. My thought was simply to 
stop preloading before GC begins (or at least *when* GC begins).

> means that the memory set should not increase. But with external data
> structures like pixbufs, the GC is called in a pre-programmed way,
> currently at least after every 10 pixbuf allocations.

You mean that LablGTK directly invokes the garbage collector after 10 
images. That's not much (unless, of course, they are big images). Sounds 
like it's a lot of trouble for a small benefit.

-- 
Matt Gushee
: Bantam - lightweight file manager : matt.gushee.net/software/bantam/ :
: RASCL's A Simple Configuration Language :     matt.gushee.net/rascl/ :


* Re: [Caml-list] Weak hashtables & aggressive caching
  2006-08-14 21:23 ` Jacques Garrigue
  2006-08-14 23:30   ` Matt Gushee
@ 2006-08-15  4:55   ` skaller
  2006-08-15 16:17     ` Matt Gushee
  1 sibling, 1 reply; 12+ messages in thread
From: skaller @ 2006-08-15  4:55 UTC
  To: Jacques Garrigue; +Cc: matt, caml-list

On Tue, 2006-08-15 at 06:23 +0900, Jacques Garrigue wrote:
> From: Matt Gushee <matt@gushee.net>
> 
> > I wrote a LablGTK-based image viewer this past weekend; one of its 
> > features is an image cache--specifically, a weak hashtable that contains
> > values of type string * GdkPixbuf.pixbuf (the string being the file 
> > name). When a particular image file is requested, it is retrieved from 
> > the cache if it exists there; otherwise it is loaded from disk (and 
> > placed in the cache at the same time). This is useful if the user wants 
> > to quickly look back through a series of images that have already been 
> > loaded, but it doesn't help with loading images for the first time.
> 
> I wonder how you trigger the GC, to both keep the cache long enough,
> and to avoid filling the memory too much, and resulting in lots of
> swapping.

I'm confused. First, a pixmap doesn't have any pointers in it,
so it doesn't need to be scanned by the GC.

Second, you'd need a LOT of images to come even close
to running out of address space (on a 64 bit machine anyhow :)

And third, there would be no swapping, unless you were 
flicking between the images .. in which case there'd
be swapping no matter what.

> Considering the difficulty of avoiding memory overflow,

I have thousands of images and I can scan them at full size
very fast with GQView .. I can only barely see the drawing
happen .. it almost keeps up with the keyboard repeat rate
at full screen size .. and that includes *scaling* the images.
Mind you .. GQView is extremely quick and it knows when to move on
(interrupts rendering when you tell it to view a new image).
(this is with a low end nVidia card on an amd64 3200 single core/1GRam)

Let's get real here: the difficulties arise editing video,
not still pictures.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


* Re: [Caml-list] Weak hashtables & aggressive caching
  2006-08-15  4:55   ` skaller
@ 2006-08-15 16:17     ` Matt Gushee
  0 siblings, 0 replies; 12+ messages in thread
From: Matt Gushee @ 2006-08-15 16:17 UTC
  To: caml-list

skaller wrote:

>> I wonder how you trigger the GC, to both keep the cache long enough,
>> and to avoid filling the memory too much, and resulting in lots of
>> swapping.
> 
> I'm confused. First, a pixmap doesn't have any pointers in it,
> so it doesn't need to be scanned by the GC.

Does that statement apply to a GdkPixbuf.pixbuf? That is the type I am 
using.

I took Jacques' statement to mean that LablGTK was explicitly invoking 
the GC--though of course I'd like to hear his answer on that point.

> Second, you'd need a LOT of images to come even close
> to running out of address space (on a 64 bit machine anyhow :)

:) Of course, many people are still using those antiquated 32-bit 
processors. I know that real software developers use overpowered 
machines to help insulate them from the constraints that face ordinary 
users. Me, I can't afford a powerful computer, so I guess I'm not a real 
developer.

> I have thousands of images and I can scan them at full size
> very fast with GQView .. I can only barely see the drawing
> happen .. it almost keeps up with the keyboard repeat rate
> at full screen size .. and that includes *scaling* the images.
> Mind you .. GQView is extremely quick

Interesting. For me it's neither fast nor slow.

> and it knows when to move on
> (interrupts rendering when you tell it to view a new image).

That's good. I would like to know (or figure out) how to do that with 
LablGTK.
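One common way to get that "move on" behaviour is a generation counter: every new navigation request bumps the counter, and long-running rendering work checks it between chunks and gives up once a newer request has arrived. The sketch below is plain OCaml with hypothetical names (`generation`, `request_new_image`, `render_in_chunks`). In a LablGTK viewer the chunks would presumably be driven from an idle handler, e.g. one registered with `GMain.Idle.add`, which is not shown or tested here.

```ocaml
(* Bumped every time the user asks for a different image. *)
let generation = ref 0

let request_new_image () =
  incr generation;
  !generation

(* Render in [chunks] steps, checking between steps whether a newer request
   has arrived; if so, stop early and report which chunk we stopped at. *)
let render_in_chunks ~gen ~chunks ~draw_chunk =
  let rec go i =
    if i >= chunks then `Finished
    else if gen <> !generation then `Abandoned i
    else begin
      draw_chunk i;
      go (i + 1)
    end
  in
  go 0
```

For example, if the user presses "next" while chunk 1 is being drawn, rendering of the old image stops before chunk 2.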

> Let's get real here: the difficulties arise editing video,
> not still pictures.

Except for those of us with really old hardware. I imagine there are a 
lot of such folks in Africa; and seeing as America is rapidly becoming a 
Third World country, maybe more than you'd expect here.

-- 
Matt Gushee
: Bantam - lightweight file manager : matt.gushee.net/software/bantam/ :
: RASCL's A Simple Configuration Language :     matt.gushee.net/rascl/ :


* Re: [Caml-list] Weak hashtables & aggressive caching
  2006-08-14 23:30   ` Matt Gushee
@ 2006-08-16  0:54     ` Jacques Garrigue
  2006-08-16  4:33       ` Matt Gushee
  0 siblings, 1 reply; 12+ messages in thread
From: Jacques Garrigue @ 2006-08-16  0:54 UTC
  To: matt; +Cc: caml-list

From: Matt Gushee <matt@gushee.net>
> > I wonder how you trigger the GC, to both keep the cache long enough,
> > and to avoid filling the memory too much, and resulting in lots of
> > swapping.
> 
> I wasn't planning to trigger the GC explicitly. My thought was simply to 
> stop preloading before GC begins (or at least *when* GC begins).

But, if you wait for the GC to begin this is too late: all your weak
references will be collected as garbage, so that your cache will be
emptied as soon as you fill it.
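A small demonstration of that point with the standard `Weak` module: a weak slot by itself does not keep its contents alive, and only a strong reference elsewhere does. `survives_full_major` is a hypothetical helper written for illustration. With `~keep:true` the slot is still filled after `Gc.full_major`. With `~keep:false` the buffer is garbage as far as the GC is concerned, so the slot may be cleared; exactly when can depend on the compiler (bytecode, for instance, may keep the local variable alive until the function returns).

```ocaml
(* Put a fresh 1 KB buffer into a one-slot weak array, optionally keep a
   strong reference to it, force a full major GC, and report whether the
   weak slot is still filled afterwards. *)
let survives_full_major ~keep =
  let w = Weak.create 1 in
  let buf = Bytes.make 1024 'x' in
  Weak.set w 0 (Some buf);
  let strong = if keep then Some buf else None in
  Gc.full_major ();
  let alive = Weak.check w 0 in
  (* Keep [strong] reachable until after the check. *)
  ignore (Sys.opaque_identity strong);
  alive
```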

> > means that the memory set should not increase. But with external data
> > structures like pixbufs, the GC is called in a pre-programmed way,
> > currently at least after every 10 pixbuf allocations.
> 
> You mean that LablGTK directly invokes the garbage collector after 10 
> images. That's not much (unless, of course, they are big images). Sounds 
> like it's a lot of trouble for a small benefit.

Again, the trouble is that there is only one allocation function for
pixbufs, and it doesn't look at their size. And it isn't aware of how
much memory is available either. So the choice was to be extremely
conservative. This is maybe a bad idea, but the intent is to avoid
keeping big garbage around, as I have seen really bad situations in
the past (programs growing to more than 100MB pretty fast.) Since weak
references are counted as garbage, there is clearly a contradiction.

I suppose more GC tuning in lablgtk would be a good thing. But I
really don't see how to do it easily with the ocaml allocation API.
The only way to interface external allocation with the GC is an
increment N you pass when calling alloc_custom. It tells ocaml to
shorten the time to next GC by N % (actually this is a ratio, so you
can provide smaller increments.) The trouble is that the GC is
triggered by the sum of all increments for all allocations. So if you
want to slow it, you need to reduce all increments everywhere...
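A toy model of the accounting described above, offered as an illustration rather than the runtime's actual code. In the C API the increment is the `mem`/`max` pair passed to `caml_alloc_custom(ops, size, mem, max)`. The model assumes each custom-block allocation adds `mem/max` to one shared counter and that a GC is forced once the counter reaches 1. With `mem = 1, max = 10` per pixbuf, that gives the "every 10 pixbuf allocations" figure. Because the counter is shared, increments from all other custom allocations also count toward the same threshold. The fraction is tracked as an integer numerator over `max` to avoid floating-point rounding.

```ocaml
(* Number of allocations, each adding mem/max to the shared counter,
   before the counter reaches 1 and a GC would be forced (toy model). *)
let allocations_before_gc ~mem ~max =
  if mem <= 0 || max <= 0 then invalid_arg "allocations_before_gc";
  let rec go n acc = if acc >= max then n else go (n + 1) (acc + mem) in
  go 0 0
```

For instance, `allocations_before_gc ~mem:1 ~max:10` is 10, and shrinking the pixbuf increment to `~mem:1 ~max:100` would allow 100 allocations before a forced GC, provided nothing else contributes to the counter.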

Jacques Garrigue


* Re: [Caml-list] Weak hashtables & aggressive caching
  2006-08-16  0:54     ` Jacques Garrigue
@ 2006-08-16  4:33       ` Matt Gushee
  0 siblings, 0 replies; 12+ messages in thread
From: Matt Gushee @ 2006-08-16  4:33 UTC
  To: caml-list

Jacques Garrigue wrote:

>>> means that the memory set should not increase. But with external data
>>> structures like pixbufs, the GC is called in a pre-programmed way,
>>> currently at least after every 10 pixbuf allocations.
>> You mean that LablGTK directly invokes the garbage collector after
>> 10 images. That's not much (unless, of course, they are big images).
>> Sounds like it's a lot of trouble for a small benefit.
>
> Again, the trouble is that there is only one allocation function for
> pixbufs, and it doesn't look at their size. And it isn't aware of how
> much memory is available either. So the choice was to be extremely
> conservative.

I'm sorry. I meant that my notion of preloading images would be a lot of 
trouble for a small benefit. I don't have sufficient expertise to judge 
your garbage collection strategy.

Anyway, thanks for the explanation.

-- 
Matt Gushee
: Bantam - lightweight file manager : matt.gushee.net/software/bantam/ :
: RASCL's A Simple Configuration Language :     matt.gushee.net/rascl/ :


end of thread

Thread overview: 12+ messages
2006-08-14 14:58 Weak hashtables & aggressive caching Matt Gushee
2006-08-14 15:47 ` [Caml-list] " Richard Jones
2006-08-14 16:28   ` Matt Gushee
     [not found]     ` <44E0A8F1.8060504@janestcapital.com>
2006-08-14 17:35       ` Matt Gushee
2006-08-14 18:18     ` Richard Jones
2006-08-14 23:25       ` Matt Gushee
2006-08-14 21:23 ` Jacques Garrigue
2006-08-14 23:30   ` Matt Gushee
2006-08-16  0:54     ` Jacques Garrigue
2006-08-16  4:33       ` Matt Gushee
2006-08-15  4:55   ` skaller
2006-08-15 16:17     ` Matt Gushee
