Re: [Caml-list] Out_of_memory exception in output_value

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Eric Dahlman <edahlman@atcorp.com>
To: skaller@users.sourceforge.net
Cc: carette@mcmaster.ca, "'Richard Jones'" <rich@annexia.org>,
	caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] Out_of_memory exception in output_value
Date: Fri, 28 May 2004 15:54:19 -0500	[thread overview]
Message-ID: <3037847D-B0E9-11D8-AF42-000393914EAA@atcorp.com> (raw)
In-Reply-To: <1085773482.3036.81.camel@pelican.wigram>

On May 28, 2004, at 2:44 PM, skaller wrote:

> On Sat, 2004-05-29 at 02:47, Jacques Carette wrote:
>>> It turns out that during output_value, the OCaml code allocates a
>>> block of memory to contain the marshalled representation of the data.
>>> Each time it runs out of room, it doubles the size of that block 
>>> using
>>> realloc().
>>
>> Wouldn't it be more system-friendly to try successively factors *2, 
>> *1.5,
>> *1.1, and *1.05 before actually failing?

I am not sure that it would have that much benefit for all of the 
complexity it would introduce.  I expect the set of interesting things 
with a marshaled representation which is too large for the present 
approach but still small enough to work for a smaller factors is really 
small.  Keep in mind that when the size is grown the assumption is that 
you will have to copy the lower half of the already marshaled data into 
the new object since it is almost guaranteed that it will not be 
possible to grow the object in place as something else will almost 
surely have been allocated after it.  A better approach would be to 
have an output_really_big_object which allocated a huge buffer in a 
single go or an approach which streamed the marshaled data so as not to 
have to keep it in memory.  (I don't know if that approach is possible 
using ocaml's marshaling, i didn't look.) At any rate this still 
addresses what I believe to be a small class of useful situations, on 
the whole I suspect that the data will be small enough to work with the 
present approach or so big that the fractional approach will also fail.

> I have a published book chapter part of which deals with
> this issue in some detail, including some performance
> measurements.
>
> The general solution is much cleaner -- to use
> a user supplied allocation function something like:
>
> new_size = f old_size requested_size
>
> Doubling the size, or indeed even using a pure multiplier,
> is only one possible option: simply adding a fixed amout
> is another option, and an arbitrary function handles
> both cases and many more. So a general solution is to make
> the operation polymorphic by making the calculator function
> an argument (in practice with a sensible default).

Since you will have to copy the data on a realloc doubling has the nice 
effect of giving us a constant bound on the costs of reallocation in a 
calculation.  This does not hold for approaches like increasing in 
fixed increments which will convert an algorithm which in linear in the 
size of the data into one which is quadratic.  I think that doubling 
might well be the best "guestimate" for a general library to make.

> My experiments were with variable length arrays used
> to hold big integers, so some operations produced
> small increases (like addition) whereas other
> produced increases much faster (like multiplication).

In this domain specific case you have a wealth of information and you 
can calculate a very tight bound on the size  of the resulting array so 
you should never have to grow the result as you go.  Just allocate it 
correctly in the beginning or am I missing something?

> A quick and easy fix would be to use a global variable
> containing the function which the client can reset.
> Yeah, I hate global state but here the actual function
> chosen has no semantic impact, it only affects performance
> (unless you run out of memory .. of course). So this
> time a global might be OK :)

I don't know about this one, it really smacks of premature optimization 
and abstraction inversion.  The present approach to buffer growing has 
a constant amortized cost.  If one was in the position of being able to 
significantly improve on that for a given problem domain by carefully 
tweaking such an allocation function then I would almost guarantee that 
it would be better to create a custom solution for that domain.  This 
would allow explicitly using that information to create a better 
algorithm rather than just tweaking a general mechanism.

Just my two cents,
-Eric

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

next prev parent reply	other threads:[~2004-05-28 20:54 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-05-28  9:10 Richard Jones
2004-05-28 14:31 ` Richard Jones
2004-05-28 16:47   ` Jacques Carette
2004-05-28 19:44     ` skaller
2004-05-28 20:54       ` Eric Dahlman [this message]
2004-05-29  7:13         ` skaller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3037847D-B0E9-11D8-AF42-000393914EAA@atcorp.com \
    --to=edahlman@atcorp.com \
    --cc=caml-list@inria.fr \
    --cc=carette@mcmaster.ca \
    --cc=rich@annexia.org \
    --cc=skaller@users.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).