caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Tom Ridge <tom.j.ridge+list@googlemail.com>
To: Michael Hicks <mwh@cs.umd.edu>
Cc: OCaml Mailing List <caml-list@inria.fr>
Subject: Re: [Caml-list] Question about garbage collection and impact on performance
Date: Thu, 5 Dec 2013 11:21:14 +0000	[thread overview]
Message-ID: <CABooLwO_X2hq9n6pKJk1OKrocVmzET_A58=SdyKg8Gs8Eeej1A@mail.gmail.com> (raw)
In-Reply-To: <D643405C-11D8-4DE6-B90B-CB4BABDA7204@cs.umd.edu>

Yes, examples like this are what I'm worried about. Thanks!

On 5 December 2013 10:48, Michael Hicks <mwh@cs.umd.edu> wrote:
> I'm not sure whether this is useful or not, but Steve Blackburn recently pointed me to a talk at ISMM'10 about the limits of parallel GC based on the data structures in your program. The basic idea is that in the limit if your GC is a trace, and your data structure is a single linked list, then parallel GC will get you nowhere (unless you do something interesting). Here are the slides:
>
> http://www.cs.technion.ac.il/~erez/presentations/parallel-trace-ismm.ppt
>
> I wonder if the same is true in general: that a GC that repeatedly has to trace over a fixed-sized data structure will thwart the performance of a program that would otherwise run in in sub linear time? For example, suppose you execute M queries over a size-N data structure that each take time log N. In between each query you do a "constant" number of allocations (i.e., independent of both N and M). But in each case you do enough to force the GC to run. This run will be in time O(N) and so the overall running time of your program will be O(MN) not O(M log N) as you'd expect.
>
> This analysis seems to work for a weird program and by doing some fishy accounting (treating all of the allocations as constant), and your particular program's execution is not sub linear, so perhaps it doesn't apply to your situation. But I suspect this is the kind of thing you were thinking about? Note that my example assumes a O(live size) GC and not a O(heap size), but the latter could make the situation worse, I'd guess.
>
> -Mike
>
> On Dec 5, 2013, at 5:08 AM, Tom Ridge <tom.j.ridge+list@googlemail.com> wrote:
>
>> On 5 December 2013 01:09, Jacques Garrigue <garrigue@math.nagoya-u.ac.jp> wrote:
>>> 2013/12/04 21:20, Tom Ridge <tom.j.ridge+list@googlemail.com>:
>>>
>>>> Dear caml-list,
>>>>
>>>> I have an OCaml program which I expect to run in time O((n^3) *
>>>> (ln(n))) say. My expectations are based (unrealistically) on ignoring
>>>> garbage collection completely. As inputs get large, the program
>>>> performs worse than I expect.
>>>>
>>>> My question is: is it possible for OCaml's garbage collection to alter
>>>> the time complexity of my program?
>>>
>>> I would say that, on a program that is already O(n^3), that would be very surprising.
>>
>> Surprising, but not impossible?
>>
>>> What kind of measure did you do?
>>
>> A rather basic measurement of the time the program takes (using /usr/bin/time).
>>
>>> If your program uses lots of memory, swapping may be a major slowdown,
>>> using a GC or not (but a badly tuned GC may cause more swapping).
>>
>> Having played around with GC settings, I now get the expected
>> behaviour. I don't think the previous version was swapping, but I'm
>> not sure (and will look into this).
>>
>> I take Scherer's point that the question is not well-posed, so I am
>> thinking about how to phrase the question better.
>>
>> I am slightly worried that, if the GC were mark and sweep (and OCaml's
>> GC is partly mark and sweep, as I understand it) and if the GC is
>> invoked sufficiently often, then indeed the performance may change.
>> For example, a program that runs for n steps, and allocates a
>> fixed-size chunk of memory (that is kept live) at each step, is
>> naively O(n). If each step of the program is followed by a mark+sweep
>> GC, then the time cost of the GC is roughly O(n^2). So with GC the
>> program may perform as O(n^2). Of course, what actually happens is
>> that GC does not run after each step, and a real analysis would be
>> sensitive to the size of the live set (relative to the memory
>> available), the amount of garbage produced etc.
>>
>> Are there any pointers to analyses (not specific to OCaml) along these lines?
>>
>>
>>
>>>
>>> Jacques Garrigue
>>
>> --
>> Caml-list mailing list.  Subscription management and archives:
>> https://sympa.inria.fr/sympa/arc/caml-list
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

  reply	other threads:[~2013-12-05 11:21 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-04 12:20 Tom Ridge
2013-12-04 12:43 ` Malcolm Matalka
2013-12-04 13:48 ` Gabriel Scherer
2013-12-19 22:47   ` Richard W.M. Jones
2013-12-22  2:25     ` Jon Harrop
2013-12-22  3:04       ` David Sheets
2013-12-22 12:43         ` Jeff Schultz
2013-12-22 10:41       ` Richard W.M. Jones
2013-12-04 18:09 ` Jon Harrop
2013-12-05  1:09 ` Jacques Garrigue
2013-12-05 10:08   ` Tom Ridge
2013-12-05 10:37     ` Malcolm Matalka
2013-12-05 10:48     ` Michael Hicks
2013-12-05 11:21       ` Tom Ridge [this message]
2013-12-05 21:09       ` Jon Harrop

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABooLwO_X2hq9n6pKJk1OKrocVmzET_A58=SdyKg8Gs8Eeej1A@mail.gmail.com' \
    --to=tom.j.ridge+list@googlemail.com \
    --cc=caml-list@inria.fr \
    --cc=mwh@cs.umd.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).