caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Question about garbage collection and impact on performance
@ 2013-12-04 12:20 Tom Ridge
  2013-12-04 12:43 ` Malcolm Matalka
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Tom Ridge @ 2013-12-04 12:20 UTC (permalink / raw)
  To: caml-list

Dear caml-list,

I have an OCaml program which I expect to run in time O((n^3) *
(ln(n))) say. My expectations are based (unrealistically) on ignoring
garbage collection completely. As inputs get large, the program
performs worse than I expect.

My question is: is it possible for OCaml's garbage collection to alter
the time complexity of my program?

If the answer is "yes", then are there any expectations I might have
about how bad this alteration might be? For example, (without thinking
about it too hard) it seems unreasonable for GC to turn a polytime
program into an exponential time program, but is this actually true
for OCaml's garbage collector?

I have read some material on OCaml's GC, but I did not form any
intuitions yet about the answers to these questions.

Thanks in advance

Tom

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Question about garbage collection and impact on performance
  2013-12-04 12:20 [Caml-list] Question about garbage collection and impact on performance Tom Ridge
@ 2013-12-04 12:43 ` Malcolm Matalka
  2013-12-04 13:48 ` Gabriel Scherer
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Malcolm Matalka @ 2013-12-04 12:43 UTC (permalink / raw)
  To: Tom Ridge; +Cc: caml-list

Have you tried turning on verbose gc so you can determine how much of
your algorithm's time is actually spent in GC?

Tom Ridge <tom.j.ridge+list@googlemail.com> writes:

> Dear caml-list,
>
> I have an OCaml program which I expect to run in time O((n^3) *
> (ln(n))) say. My expectations are based (unrealistically) on ignoring
> garbage collection completely. As inputs get large, the program
> performs worse than I expect.
>
> My question is: is it possible for OCaml's garbage collection to alter
> the time complexity of my program?
>
> If the answer is "yes", then are there any expectations I might have
> about how bad this alteration might be? For example, (without thinking
> about it too hard) it seems unreasonable for GC to turn a polytime
> program into an exponential time program, but is this actually true
> for OCaml's garbage collector?
>
> I have read some material on OCaml's GC, but I did not form any
> intuitions yet about the answers to these questions.
>
> Thanks in advance
>
> Tom

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Question about garbage collection and impact on performance
  2013-12-04 12:20 [Caml-list] Question about garbage collection and impact on performance Tom Ridge
  2013-12-04 12:43 ` Malcolm Matalka
@ 2013-12-04 13:48 ` Gabriel Scherer
  2013-12-19 22:47   ` Richard W.M. Jones
  2013-12-04 18:09 ` Jon Harrop
  2013-12-05  1:09 ` Jacques Garrigue
  3 siblings, 1 reply; 15+ messages in thread
From: Gabriel Scherer @ 2013-12-04 13:48 UTC (permalink / raw)
  To: Tom Ridge; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 2334 bytes --]

This was the topic of the following discussion a few years ago:

http://cstheory.stackexchange.com/questions/2720/can-the-cost-of-gc-be-neglected-when-analyzing-the-running-time-of-worst-case-da

My personal impression is that the question is not that well-posed:
- if you assume infinite memory, you don't actually need a GC (and for any
input you can tweak the GC setting to make sure no collection happens)
- if you assume finite memory, the notion of asymptotic behaviour breaks
down unless you can prove your algorithm has a finite peak memory
consumption that is below that bound
- this is independent of whether or not your language using a GC; if your
algorithm constantly allocates heap memory, you will run into compaction
issues even with malloc/free that may degrade theoretical performances

Neelk answer in the link I gave above is that if you can tune your GC to
make sure collections happen infrequently enough, you can make collection
amortized constant-time. But that means you have to accept some memory
wasted (the runtime using significantly more memory than the size of the
actually live data), possibly proportional to the running time of your
program.


On Wed, Dec 4, 2013 at 1:20 PM, Tom Ridge
<tom.j.ridge+list@googlemail.com>wrote:

> Dear caml-list,
>
> I have an OCaml program which I expect to run in time O((n^3) *
> (ln(n))) say. My expectations are based (unrealistically) on ignoring
> garbage collection completely. As inputs get large, the program
> performs worse than I expect.
>
> My question is: is it possible for OCaml's garbage collection to alter
> the time complexity of my program?
>
> If the answer is "yes", then are there any expectations I might have
> about how bad this alteration might be? For example, (without thinking
> about it too hard) it seems unreasonable for GC to turn a polytime
> program into an exponential time program, but is this actually true
> for OCaml's garbage collector?
>
> I have read some material on OCaml's GC, but I did not form any
> intuitions yet about the answers to these questions.
>
> Thanks in advance
>
> Tom
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

[-- Attachment #2: Type: text/html, Size: 3348 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [Caml-list] Question about garbage collection and impact on performance
  2013-12-04 12:20 [Caml-list] Question about garbage collection and impact on performance Tom Ridge
  2013-12-04 12:43 ` Malcolm Matalka
  2013-12-04 13:48 ` Gabriel Scherer
@ 2013-12-04 18:09 ` Jon Harrop
  2013-12-05  1:09 ` Jacques Garrigue
  3 siblings, 0 replies; 15+ messages in thread
From: Jon Harrop @ 2013-12-04 18:09 UTC (permalink / raw)
  To: 'Tom Ridge', 'caml-list'

> is it possible for OCaml's garbage collection to alter the time complexity
of my program?

Yes.

> If the answer is "yes", then are there any expectations I might have about
how bad this alteration might be?

I observed List.map taking O(n^2) time for large "n" because it leaks stack
space and the GC traverses the O(n)-deep stack O(n) times during its
execution.

> For example, (without thinking about it too hard) it seems unreasonable
for GC to turn a polytime program into an exponential time program, but is
this actually true for OCaml's garbage collector?

I think you can just get an extra O(n). I'd also look out for very large
arrays on the heap which, IIRC, cause O(n) pause.

I haven't used OCaml in anger for a few years so take this with a grain of
salt... :-)

Cheers,
Jon.

-----Original Message-----
From: tom.j.ridge@googlemail.com [mailto:tom.j.ridge@googlemail.com] On
Behalf Of Tom Ridge
Sent: 04 December 2013 12:21
To: caml-list
Subject: [Caml-list] Question about garbage collection and impact on
performance

Dear caml-list,

I have an OCaml program which I expect to run in time O((n^3) *
(ln(n))) say. My expectations are based (unrealistically) on ignoring
garbage collection completely. As inputs get large, the program performs
worse than I expect.

My question is: is it possible for OCaml's garbage collection to alter the
time complexity of my program?

If the answer is "yes", then are there any expectations I might have about
how bad this alteration might be? For example, (without thinking about it
too hard) it seems unreasonable for GC to turn a polytime program into an
exponential time program, but is this actually true for OCaml's garbage
collector?

I have read some material on OCaml's GC, but I did not form any intuitions
yet about the answers to these questions.

Thanks in advance

Tom

--
Caml-list mailing list.  Subscription management and archives:
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Question about garbage collection and impact on performance
  2013-12-04 12:20 [Caml-list] Question about garbage collection and impact on performance Tom Ridge
                   ` (2 preceding siblings ...)
  2013-12-04 18:09 ` Jon Harrop
@ 2013-12-05  1:09 ` Jacques Garrigue
  2013-12-05 10:08   ` Tom Ridge
  3 siblings, 1 reply; 15+ messages in thread
From: Jacques Garrigue @ 2013-12-05  1:09 UTC (permalink / raw)
  To: Tom Ridge; +Cc: OCaml Mailing List

2013/12/04 21:20, Tom Ridge <tom.j.ridge+list@googlemail.com>:

> Dear caml-list,
> 
> I have an OCaml program which I expect to run in time O((n^3) *
> (ln(n))) say. My expectations are based (unrealistically) on ignoring
> garbage collection completely. As inputs get large, the program
> performs worse than I expect.
> 
> My question is: is it possible for OCaml's garbage collection to alter
> the time complexity of my program?

I would say that, on a program that is already O(n^3), that would be very surprising.
What kind of measure did you do?
If your program uses lots of memory, swapping may be a major slowdown,
using a GC or not (but a badly tuned GC may cause more swapping).

Jacques Garrigue

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Question about garbage collection and impact on performance
  2013-12-05  1:09 ` Jacques Garrigue
@ 2013-12-05 10:08   ` Tom Ridge
  2013-12-05 10:37     ` Malcolm Matalka
  2013-12-05 10:48     ` Michael Hicks
  0 siblings, 2 replies; 15+ messages in thread
From: Tom Ridge @ 2013-12-05 10:08 UTC (permalink / raw)
  To: Jacques Garrigue; +Cc: OCaml Mailing List

On 5 December 2013 01:09, Jacques Garrigue <garrigue@math.nagoya-u.ac.jp> wrote:
> 2013/12/04 21:20, Tom Ridge <tom.j.ridge+list@googlemail.com>:
>
>> Dear caml-list,
>>
>> I have an OCaml program which I expect to run in time O((n^3) *
>> (ln(n))) say. My expectations are based (unrealistically) on ignoring
>> garbage collection completely. As inputs get large, the program
>> performs worse than I expect.
>>
>> My question is: is it possible for OCaml's garbage collection to alter
>> the time complexity of my program?
>
> I would say that, on a program that is already O(n^3), that would be very surprising.

Surprising, but not impossible?

> What kind of measure did you do?

A rather basic measurement of the time the program takes (using /usr/bin/time).

> If your program uses lots of memory, swapping may be a major slowdown,
> using a GC or not (but a badly tuned GC may cause more swapping).

Having played around with GC settings, I now get the expected
behaviour. I don't think the previous version was swapping, but I'm
not sure (and will look into this).

I take Scherer's point that the question is not well-posed, so I am
thinking about how to phrase the question better.

I am slightly worried that, if the GC were mark and sweep (and OCaml's
GC is partly mark and sweep, as I understand it) and if the GC is
invoked sufficiently often, then indeed the performance may change.
For example, a program that runs for n steps, and allocates a
fixed-size chunk of memory (that is kept live) at each step, is
naively O(n). If each step of the program is followed by a mark+sweep
GC, then the time cost of the GC is roughly O(n^2). So with GC the
program may perform as O(n^2). Of course, what actually happens is
that GC does not run after each step, and a real analysis would be
sensitive to the size of the live set (relative to the memory
available), the amount of garbage produced etc.

Are there any pointers to analyses (not specific to OCaml) along these lines?



>
> Jacques Garrigue

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Question about garbage collection and impact on performance
  2013-12-05 10:08   ` Tom Ridge
@ 2013-12-05 10:37     ` Malcolm Matalka
  2013-12-05 10:48     ` Michael Hicks
  1 sibling, 0 replies; 15+ messages in thread
From: Malcolm Matalka @ 2013-12-05 10:37 UTC (permalink / raw)
  To: Tom Ridge; +Cc: Jacques Garrigue, OCaml Mailing List

Perhaps posting some experiments would help people help you.  In
particular it would be nice to see:

- The actual timings of what you consider a good run and a bad run

- Those runs with verbose gc turned on

- Amount of memory consumption

- The timings for if you jack up the amount of memory that the runtime
  system is allowed to use on both a good run and a bad run to determine
  the system

In short, the answer to your question seems to be "maybe".  But enough
information has not been provided to suggest if it's responsible for
what you're seeing or just a theoretical possibility.

Tom Ridge <tom.j.ridge+list@googlemail.com> writes:

> On 5 December 2013 01:09, Jacques Garrigue <garrigue@math.nagoya-u.ac.jp> wrote:
>> 2013/12/04 21:20, Tom Ridge <tom.j.ridge+list@googlemail.com>:
>>
>>> Dear caml-list,
>>>
>>> I have an OCaml program which I expect to run in time O((n^3) *
>>> (ln(n))) say. My expectations are based (unrealistically) on ignoring
>>> garbage collection completely. As inputs get large, the program
>>> performs worse than I expect.
>>>
>>> My question is: is it possible for OCaml's garbage collection to alter
>>> the time complexity of my program?
>>
>> I would say that, on a program that is already O(n^3), that would be very surprising.
>
> Surprising, but not impossible?
>
>> What kind of measure did you do?
>
> A rather basic measurement of the time the program takes (using /usr/bin/time).
>
>> If your program uses lots of memory, swapping may be a major slowdown,
>> using a GC or not (but a badly tuned GC may cause more swapping).
>
> Having played around with GC settings, I now get the expected
> behaviour. I don't think the previous version was swapping, but I'm
> not sure (and will look into this).
>
> I take Scherer's point that the question is not well-posed, so I am
> thinking about how to phrase the question better.
>
> I am slightly worried that, if the GC were mark and sweep (and OCaml's
> GC is partly mark and sweep, as I understand it) and if the GC is
> invoked sufficiently often, then indeed the performance may change.
> For example, a program that runs for n steps, and allocates a
> fixed-size chunk of memory (that is kept live) at each step, is
> naively O(n). If each step of the program is followed by a mark+sweep
> GC, then the time cost of the GC is roughly O(n^2). So with GC the
> program may perform as O(n^2). Of course, what actually happens is
> that GC does not run after each step, and a real analysis would be
> sensitive to the size of the live set (relative to the memory
> available), the amount of garbage produced etc.
>
> Are there any pointers to analyses (not specific to OCaml) along these lines?
>
>
>
>>
>> Jacques Garrigue

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Question about garbage collection and impact on performance
  2013-12-05 10:08   ` Tom Ridge
  2013-12-05 10:37     ` Malcolm Matalka
@ 2013-12-05 10:48     ` Michael Hicks
  2013-12-05 11:21       ` Tom Ridge
  2013-12-05 21:09       ` Jon Harrop
  1 sibling, 2 replies; 15+ messages in thread
From: Michael Hicks @ 2013-12-05 10:48 UTC (permalink / raw)
  To: Tom Ridge; +Cc: OCaml Mailing List

I'm not sure whether this is useful or not, but Steve Blackburn recently pointed me to a talk at ISMM'10 about the limits of parallel GC based on the data structures in your program. The basic idea is that in the limit if your GC is a trace, and your data structure is a single linked list, then parallel GC will get you nowhere (unless you do something interesting). Here are the slides:

http://www.cs.technion.ac.il/~erez/presentations/parallel-trace-ismm.ppt

I wonder if the same is true in general: that a GC that repeatedly has to trace over a fixed-sized data structure will thwart the performance of a program that would otherwise run in in sub linear time? For example, suppose you execute M queries over a size-N data structure that each take time log N. In between each query you do a "constant" number of allocations (i.e., independent of both N and M). But in each case you do enough to force the GC to run. This run will be in time O(N) and so the overall running time of your program will be O(MN) not O(M log N) as you'd expect. 

This analysis seems to work for a weird program and by doing some fishy accounting (treating all of the allocations as constant), and your particular program's execution is not sub linear, so perhaps it doesn't apply to your situation. But I suspect this is the kind of thing you were thinking about? Note that my example assumes a O(live size) GC and not a O(heap size), but the latter could make the situation worse, I'd guess.

-Mike

On Dec 5, 2013, at 5:08 AM, Tom Ridge <tom.j.ridge+list@googlemail.com> wrote:

> On 5 December 2013 01:09, Jacques Garrigue <garrigue@math.nagoya-u.ac.jp> wrote:
>> 2013/12/04 21:20, Tom Ridge <tom.j.ridge+list@googlemail.com>:
>> 
>>> Dear caml-list,
>>> 
>>> I have an OCaml program which I expect to run in time O((n^3) *
>>> (ln(n))) say. My expectations are based (unrealistically) on ignoring
>>> garbage collection completely. As inputs get large, the program
>>> performs worse than I expect.
>>> 
>>> My question is: is it possible for OCaml's garbage collection to alter
>>> the time complexity of my program?
>> 
>> I would say that, on a program that is already O(n^3), that would be very surprising.
> 
> Surprising, but not impossible?
> 
>> What kind of measure did you do?
> 
> A rather basic measurement of the time the program takes (using /usr/bin/time).
> 
>> If your program uses lots of memory, swapping may be a major slowdown,
>> using a GC or not (but a badly tuned GC may cause more swapping).
> 
> Having played around with GC settings, I now get the expected
> behaviour. I don't think the previous version was swapping, but I'm
> not sure (and will look into this).
> 
> I take Scherer's point that the question is not well-posed, so I am
> thinking about how to phrase the question better.
> 
> I am slightly worried that, if the GC were mark and sweep (and OCaml's
> GC is partly mark and sweep, as I understand it) and if the GC is
> invoked sufficiently often, then indeed the performance may change.
> For example, a program that runs for n steps, and allocates a
> fixed-size chunk of memory (that is kept live) at each step, is
> naively O(n). If each step of the program is followed by a mark+sweep
> GC, then the time cost of the GC is roughly O(n^2). So with GC the
> program may perform as O(n^2). Of course, what actually happens is
> that GC does not run after each step, and a real analysis would be
> sensitive to the size of the live set (relative to the memory
> available), the amount of garbage produced etc.
> 
> Are there any pointers to analyses (not specific to OCaml) along these lines?
> 
> 
> 
>> 
>> Jacques Garrigue
> 
> -- 
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Question about garbage collection and impact on performance
  2013-12-05 10:48     ` Michael Hicks
@ 2013-12-05 11:21       ` Tom Ridge
  2013-12-05 21:09       ` Jon Harrop
  1 sibling, 0 replies; 15+ messages in thread
From: Tom Ridge @ 2013-12-05 11:21 UTC (permalink / raw)
  To: Michael Hicks; +Cc: OCaml Mailing List

Yes, examples like this are what I'm worried about. Thanks!

On 5 December 2013 10:48, Michael Hicks <mwh@cs.umd.edu> wrote:
> I'm not sure whether this is useful or not, but Steve Blackburn recently pointed me to a talk at ISMM'10 about the limits of parallel GC based on the data structures in your program. The basic idea is that in the limit if your GC is a trace, and your data structure is a single linked list, then parallel GC will get you nowhere (unless you do something interesting). Here are the slides:
>
> http://www.cs.technion.ac.il/~erez/presentations/parallel-trace-ismm.ppt
>
> I wonder if the same is true in general: that a GC that repeatedly has to trace over a fixed-sized data structure will thwart the performance of a program that would otherwise run in in sub linear time? For example, suppose you execute M queries over a size-N data structure that each take time log N. In between each query you do a "constant" number of allocations (i.e., independent of both N and M). But in each case you do enough to force the GC to run. This run will be in time O(N) and so the overall running time of your program will be O(MN) not O(M log N) as you'd expect.
>
> This analysis seems to work for a weird program and by doing some fishy accounting (treating all of the allocations as constant), and your particular program's execution is not sub linear, so perhaps it doesn't apply to your situation. But I suspect this is the kind of thing you were thinking about? Note that my example assumes a O(live size) GC and not a O(heap size), but the latter could make the situation worse, I'd guess.
>
> -Mike
>
> On Dec 5, 2013, at 5:08 AM, Tom Ridge <tom.j.ridge+list@googlemail.com> wrote:
>
>> On 5 December 2013 01:09, Jacques Garrigue <garrigue@math.nagoya-u.ac.jp> wrote:
>>> 2013/12/04 21:20, Tom Ridge <tom.j.ridge+list@googlemail.com>:
>>>
>>>> Dear caml-list,
>>>>
>>>> I have an OCaml program which I expect to run in time O((n^3) *
>>>> (ln(n))) say. My expectations are based (unrealistically) on ignoring
>>>> garbage collection completely. As inputs get large, the program
>>>> performs worse than I expect.
>>>>
>>>> My question is: is it possible for OCaml's garbage collection to alter
>>>> the time complexity of my program?
>>>
>>> I would say that, on a program that is already O(n^3), that would be very surprising.
>>
>> Surprising, but not impossible?
>>
>>> What kind of measure did you do?
>>
>> A rather basic measurement of the time the program takes (using /usr/bin/time).
>>
>>> If your program uses lots of memory, swapping may be a major slowdown,
>>> using a GC or not (but a badly tuned GC may cause more swapping).
>>
>> Having played around with GC settings, I now get the expected
>> behaviour. I don't think the previous version was swapping, but I'm
>> not sure (and will look into this).
>>
>> I take Scherer's point that the question is not well-posed, so I am
>> thinking about how to phrase the question better.
>>
>> I am slightly worried that, if the GC were mark and sweep (and OCaml's
>> GC is partly mark and sweep, as I understand it) and if the GC is
>> invoked sufficiently often, then indeed the performance may change.
>> For example, a program that runs for n steps, and allocates a
>> fixed-size chunk of memory (that is kept live) at each step, is
>> naively O(n). If each step of the program is followed by a mark+sweep
>> GC, then the time cost of the GC is roughly O(n^2). So with GC the
>> program may perform as O(n^2). Of course, what actually happens is
>> that GC does not run after each step, and a real analysis would be
>> sensitive to the size of the live set (relative to the memory
>> available), the amount of garbage produced etc.
>>
>> Are there any pointers to analyses (not specific to OCaml) along these lines?
>>
>>
>>
>>>
>>> Jacques Garrigue
>>
>> --
>> Caml-list mailing list.  Subscription management and archives:
>> https://sympa.inria.fr/sympa/arc/caml-list
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [Caml-list] Question about garbage collection and impact on performance
  2013-12-05 10:48     ` Michael Hicks
  2013-12-05 11:21       ` Tom Ridge
@ 2013-12-05 21:09       ` Jon Harrop
  1 sibling, 0 replies; 15+ messages in thread
From: Jon Harrop @ 2013-12-05 21:09 UTC (permalink / raw)
  To: 'Michael Hicks', 'Tom Ridge'; +Cc: 'OCaml Mailing List'

> ...then parallel GC will get you nowhere...

The topology of the heap can impede scalability but OCaml's GC isn't
parallel so that's irrelevant.

> I wonder if the same is true in general

No. Scalability is nonsensical when you're only using one core.

OCaml's GC is incremental and it is careful to put enough effort into each
slice of major heap collection to make sure it keeps up. As I said before,
the only source of problems I'm aware of are deep stacks and arrays with
many elements.

Cheers,
Jon.

-----Original Message-----
From: caml-list-request@inria.fr [mailto:caml-list-request@inria.fr] On
Behalf Of Michael Hicks
Sent: 05 December 2013 10:49
To: Tom Ridge
Cc: OCaml Mailing List
Subject: Re: [Caml-list] Question about garbage collection and impact on
performance

I'm not sure whether this is useful or not, but Steve Blackburn recently
pointed me to a talk at ISMM'10 about the limits of parallel GC based on the
data structures in your program. The basic idea is that in the limit if your
GC is a trace, and your data structure is a single linked list, then
parallel GC will get you nowhere (unless you do something interesting). Here
are the slides:

http://www.cs.technion.ac.il/~erez/presentations/parallel-trace-ismm.ppt

I wonder if the same is true in general: that a GC that repeatedly has to
trace over a fixed-sized data structure will thwart the performance of a
program that would otherwise run in in sub linear time? For example, suppose
you execute M queries over a size-N data structure that each take time log
N. In between each query you do a "constant" number of allocations (i.e.,
independent of both N and M). But in each case you do enough to force the GC
to run. This run will be in time O(N) and so the overall running time of
your program will be O(MN) not O(M log N) as you'd expect. 

This analysis seems to work for a weird program and by doing some fishy
accounting (treating all of the allocations as constant), and your
particular program's execution is not sub linear, so perhaps it doesn't
apply to your situation. But I suspect this is the kind of thing you were
thinking about? Note that my example assumes a O(live size) GC and not a
O(heap size), but the latter could make the situation worse, I'd guess.

-Mike

On Dec 5, 2013, at 5:08 AM, Tom Ridge <tom.j.ridge+list@googlemail.com>
wrote:

> On 5 December 2013 01:09, Jacques Garrigue <garrigue@math.nagoya-u.ac.jp>
wrote:
>> 2013/12/04 21:20, Tom Ridge <tom.j.ridge+list@googlemail.com>:
>> 
>>> Dear caml-list,
>>> 
>>> I have an OCaml program which I expect to run in time O((n^3) *
>>> (ln(n))) say. My expectations are based (unrealistically) on 
>>> ignoring garbage collection completely. As inputs get large, the 
>>> program performs worse than I expect.
>>> 
>>> My question is: is it possible for OCaml's garbage collection to 
>>> alter the time complexity of my program?
>> 
>> I would say that, on a program that is already O(n^3), that would be very
surprising.
> 
> Surprising, but not impossible?
> 
>> What kind of measure did you do?
> 
> A rather basic measurement of the time the program takes (using
/usr/bin/time).
> 
>> If your program uses lots of memory, swapping may be a major 
>> slowdown, using a GC or not (but a badly tuned GC may cause more
swapping).
> 
> Having played around with GC settings, I now get the expected 
> behaviour. I don't think the previous version was swapping, but I'm 
> not sure (and will look into this).
> 
> I take Scherer's point that the question is not well-posed, so I am 
> thinking about how to phrase the question better.
> 
> I am slightly worried that, if the GC were mark and sweep (and OCaml's 
> GC is partly mark and sweep, as I understand it) and if the GC is 
> invoked sufficiently often, then indeed the performance may change.
> For example, a program that runs for n steps, and allocates a 
> fixed-size chunk of memory (that is kept live) at each step, is 
> naively O(n). If each step of the program is followed by a mark+sweep 
> GC, then the time cost of the GC is roughly O(n^2). So with GC the 
> program may perform as O(n^2). Of course, what actually happens is 
> that GC does not run after each step, and a real analysis would be 
> sensitive to the size of the live set (relative to the memory 
> available), the amount of garbage produced etc.
> 
> Are there any pointers to analyses (not specific to OCaml) along these
lines?
> 
> 
> 
>> 
>> Jacques Garrigue
> 
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs


-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs=


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Question about garbage collection and impact on performance
  2013-12-04 13:48 ` Gabriel Scherer
@ 2013-12-19 22:47   ` Richard W.M. Jones
  2013-12-22  2:25     ` Jon Harrop
  0 siblings, 1 reply; 15+ messages in thread
From: Richard W.M. Jones @ 2013-12-19 22:47 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: Tom Ridge, caml-list

On Wed, Dec 04, 2013 at 02:48:49PM +0100, Gabriel Scherer wrote:
> This was the topic of the following discussion a few years ago:
> 
> http://cstheory.stackexchange.com/questions/2720/can-the-cost-of-gc-be-neglected-when-analyzing-the-running-time-of-worst-case-da
> 
> My personal impression is that the question is not that well-posed:
> - if you assume infinite memory, you don't actually need a GC (and for any
> input you can tweak the GC setting to make sure no collection happens)

How could "infinite" memory be implemented without affecting the
runtime of programs on such a machine?

Rich.

-- 
Richard Jones
Red Hat

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [Caml-list] Question about garbage collection and impact on performance
  2013-12-19 22:47   ` Richard W.M. Jones
@ 2013-12-22  2:25     ` Jon Harrop
  2013-12-22  3:04       ` David Sheets
  2013-12-22 10:41       ` Richard W.M. Jones
  0 siblings, 2 replies; 15+ messages in thread
From: Jon Harrop @ 2013-12-22  2:25 UTC (permalink / raw)
  To: 'Richard W.M. Jones', 'Gabriel Scherer'
  Cc: 'Tom Ridge', 'caml-list'

Richard Jones wrote:
> > My personal impression is that the question is not that well-posed:
> > - if you assume infinite memory, you don't actually need a GC (and for 
> > any input you can tweak the GC setting to make sure no collection 
> > happens)
>
> How could "infinite" memory be implemented without affecting the runtime
of programs on such a machine?

I guess O(1) lookup would actually be O(n^(1/3)) due to that speed of light
thing. ;-)

Cheers,
Jon.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Question about garbage collection and impact on performance
  2013-12-22  2:25     ` Jon Harrop
@ 2013-12-22  3:04       ` David Sheets
  2013-12-22 12:43         ` Jeff Schultz
  2013-12-22 10:41       ` Richard W.M. Jones
  1 sibling, 1 reply; 15+ messages in thread
From: David Sheets @ 2013-12-22  3:04 UTC (permalink / raw)
  To: jon; +Cc: Richard W.M. Jones, Gabriel Scherer, Tom Ridge, caml-list

On Sun, Dec 22, 2013 at 2:25 AM, Jon Harrop <jon@ffconsultancy.com> wrote:
> Richard Jones wrote:
>> > My personal impression is that the question is not that well-posed:
>> > - if you assume infinite memory, you don't actually need a GC (and for
>> > any input you can tweak the GC setting to make sure no collection
>> > happens)
>>
>> How could "infinite" memory be implemented without affecting the runtime
> of programs on such a machine?
>
> I guess O(1) lookup would actually be O(n^(1/3)) due to that speed of light
> thing. ;-)

I think there are some fundamental limits to information density in
the local universe. I reply here, though, to propose that the lookup
would be O(n^(1/2)) because, locally, space is Euclidean. I'm not sure
how you got the 1/3 exponent.

This complexity analysis is a little weird, though, as some of your
infinite memory would be arbitrarily far away and thus would suffer
the square root of arbitrarily large latency. Also, your address space
is infinite? Of course, the machine words are infinitely sized.

It seems that even in theoretical computer science, arbitrarily large
memory must be modeled as a distributed system due to the laws of
physics.

Perhaps I've made an error in my understanding. Please forgive and
correct me, if so.

Happy upswing,

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Question about garbage collection and impact on performance
  2013-12-22  2:25     ` Jon Harrop
  2013-12-22  3:04       ` David Sheets
@ 2013-12-22 10:41       ` Richard W.M. Jones
  1 sibling, 0 replies; 15+ messages in thread
From: Richard W.M. Jones @ 2013-12-22 10:41 UTC (permalink / raw)
  To: Jon Harrop
  Cc: 'Gabriel Scherer', 'Tom Ridge', 'caml-list'

On Sun, Dec 22, 2013 at 02:25:16AM -0000, Jon Harrop wrote:
> Richard Jones wrote:
> > > My personal impression is that the question is not that well-posed:
> > > - if you assume infinite memory, you don't actually need a GC (and for 
> > > any input you can tweak the GC setting to make sure no collection 
> > > happens)
> >
> > How could "infinite" memory be implemented without affecting the runtime
> of programs on such a machine?
> 
> I guess O(1) lookup would actually be O(n^(1/3)) due to that speed of light
> thing. ;-)

Right.  Or if the memory requirement got really big, you'd
have to have a man to run around fetching tapes from a big
warehouse, which doesn't sound very O(1) to me ..

Rich.

-- 
Richard Jones
Red Hat

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Question about garbage collection and impact on performance
  2013-12-22  3:04       ` David Sheets
@ 2013-12-22 12:43         ` Jeff Schultz
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Schultz @ 2013-12-22 12:43 UTC (permalink / raw)
  To: jon; +Cc: caml-list

On 22/12/2013 14:04, David Sheets wrote:
> On Sun, Dec 22, 2013 at 2:25 AM, Jon Harrop <jon@ffconsultancy.com> wrote:
>> Richard Jones wrote:
>>>> My personal impression is that the question is not that well-posed:
>>>> - if you assume infinite memory, you don't actually need a GC (and for
>>>> any input you can tweak the GC setting to make sure no collection
>>>> happens)
>>>
>>> How could "infinite" memory be implemented without affecting the runtime
>> of programs on such a machine?
>>
>> I guess O(1) lookup would actually be O(n^(1/3)) due to that speed of light
>> thing. ;-)
>
> I think there are some fundamental limits to information density in
> the local universe. I reply here, though, to propose that the lookup
> would be O(n^(1/2)) because, locally, space is Euclidean. I'm not sure
> how you got the 1/3 exponent.

Eight times as much memory, uniformly packed in three dimensions, is at 
most twice as long in any one dimension.

However, IIRC, Gene Amdahl observed that the limiting characteristic of 
memory size versus speed is actually the surface area of the sphere 
containing the memory, because that controls heat dissipation.  This 
gives an asymptotic square root law.  Various assumptions could 
challenge that, but not as long as memory consumes power in proportion 
to its size.


     Jeff Schultz

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-12-22 12:43 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-04 12:20 [Caml-list] Question about garbage collection and impact on performance Tom Ridge
2013-12-04 12:43 ` Malcolm Matalka
2013-12-04 13:48 ` Gabriel Scherer
2013-12-19 22:47   ` Richard W.M. Jones
2013-12-22  2:25     ` Jon Harrop
2013-12-22  3:04       ` David Sheets
2013-12-22 12:43         ` Jeff Schultz
2013-12-22 10:41       ` Richard W.M. Jones
2013-12-04 18:09 ` Jon Harrop
2013-12-05  1:09 ` Jacques Garrigue
2013-12-05 10:08   ` Tom Ridge
2013-12-05 10:37     ` Malcolm Matalka
2013-12-05 10:48     ` Michael Hicks
2013-12-05 11:21       ` Tom Ridge
2013-12-05 21:09       ` Jon Harrop

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).