From mboxrd@z Thu Jan 1 00:00:00 1970
Message-Id: <200307081136.h68Ba5719800@augusta.math.psu.edu>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Rstat needs three size fields?
In-Reply-To: Your message of "Tue, 08 Jul 2003 12:09:00 BST." <3F0AA64C.8070209@proweb.co.uk>
From: Dan Cross
Date: Tue, 8 Jul 2003 07:36:05 -0400
Topicbox-Message-UUID: ee6b7cdc-eacb-11e9-9e20-41e7f4b1d025

matt writes:
> >as a reallocation strategy i take what i have and then double it.
> >this is when i'm dealing with something that's going to 'grow'
> >and i don't want to have to realloc for each new 'element'.
>
> I've recently had this challenge and was wondering what people use as
> a resize strategy.
>
> I've read and seen the "double it" advice but decided it was pretty
> likely to run out of memory.  Allocating 200MB of memory for one extra
> array index seemed a bit risky.
>
> I plucked 5 + 125% as a compromise (the 5 for a performance boost at
> the beginning).
>
> Someone must have done some research into it, but Google isn't being
> very helpful.
>
> In the end I used the "make a list and when you're done, count the
> elements and make the array" strategy.

Doubling is, I think, the most common strategy; it works well with
memory allocators based around the buddy system or variants thereof.

Another common strategy that I've used is to double up to a threshold,
and then extend by a constant chunk size after that.  For instance, you
might double things until you get to, say, a megabyte, and then add on
a megabyte at a time after that.  (There's a sketch of this at the end
of this message.)

As Boyd will no doubt tell you, everything is a tradeoff.  Here, I'm
trading some amount of speed for memory efficiency.  If you're using
smallish data sets, say anywhere from a couple of bytes up to several
megabytes, it will work reasonably well (it will take only 24
allocations to get to 5 megabytes if you start with a nil pointer).
If you're trying to hold several gigabytes, obviously you might want
to change around the starting size and chunk size, or adopt an
entirely different strategy.

For truly variable data sets, I like dynamic data structures that can
grow without bound in a natural way.  Many of these are simple to
implement and have other desirable properties.  For instance, lists are
just about the simplest thing one can build, and they work well for
linear data sets (though they have some overhead in maintaining links;
this can be amortized by storing more than one datum in a single node
of the list).

Anyway, that's just me.

	- Dan C.
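
Here's a minimal sketch in C of the double-then-extend strategy, since
it's easier to see in code.  This is only an illustration: the grow()
helper, the Threshold and Chunk constants, and the byte-at-a-time
append loop are made up for the example, not taken from any particular
library.

	#include <stdio.h>
	#include <stdlib.h>

	enum {
		Threshold = 1024*1024,	/* double up to this size... */
		Chunk = 1024*1024,	/* ...then add this much at a time */
	};

	/* smallest capacity >= n under the double-then-extend policy */
	size_t
	grow(size_t cap, size_t n)
	{
		if(cap == 0)
			cap = 1;
		while(cap < n)
			cap = cap < Threshold ? cap*2 : cap+Chunk;
		return cap;
	}

	int
	main(void)
	{
		char *buf, *p;
		size_t len, cap, i;

		buf = NULL;	/* start with a nil pointer, as above */
		len = cap = 0;
		for(i = 0; i < 5*1024*1024; i++){
			if(len+1 > cap){
				cap = grow(cap, len+1);
				p = realloc(buf, cap);
				if(p == NULL){
					free(buf);
					return 1;
				}
				buf = p;
			}
			buf[len++] = 'x';
		}
		printf("%lu bytes in a %lu-byte buffer\n",
			(unsigned long)len, (unsigned long)cap);
		free(buf);
		return 0;
	}

The nice property is that the caller only carries buf, len, and cap
around; grow() concentrates the whole policy in one place, so swapping
in a different strategy (pure doubling, 125%, whatever) is a one-line
change.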