Re: Huge memory consumption on accessing large newsgroup

Gnus development mailing list
 help / color / mirror / Atom feed

From: Daniel Pittman <daniel@rimspace.net>
To: ding@gnus.org
Subject: Re: Huge memory consumption on accessing large newsgroup
Date: Tue, 02 Oct 2007 13:23:56 +1000	[thread overview]
Message-ID: <874phatd0z.fsf@enki.rimspace.net> (raw)
In-Reply-To: <m2zlz25kn1.fsf@lifelogs.com>

Ted Zlatanov <tzz@lifelogs.com> writes:
> On Mon, 01 Oct 2007 11:04:54 +1000 Daniel Pittman <daniel@rimspace.net> wrote: 
> DP> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>>> Consider using inversion lists.  
>
>>> They don't require pairs of integers; each value represents a flip.
>>> They have other nice properties too, though it's been a while since
>>> I looked at them so I can't name them.
>
> DP> Mmm.  The existing Gnus range code is pretty much as efficient,
> DP> has lower computational complexity for the operations Gnus uses
> DP> and is already existent and tested.
>
> We're talking about reducing memory consumption, not CPU usage.  For
> CPU usage the current structure is pretty good.  For memory usage, as
> you can see from the subject of the message, it's not ideal.

Ah.  I mis-expressed myself.

The issue with large memory consumption, as far as I could see from the
thread, was that the compressed range data type was expanded to a flat
list.  This caused, no surprise, huge memory use.

The original problem is that the code calls `gnus-uncompress-range' on
the data *at all* -- and, so, turns a nicely brief data structure into a
vast bloated million-number list.

The *solution* is to rebuild the algorithm to operate on the compressed
version (regardless of the internal representation), not to change the
representation.

In other words: the problem is that something calls
`gnus-uncompress-range', not the format of the data passed to that call.

The "efficient" part referred to both memory and CPU, and the
lower computational complexity is a consideration only after the memory
load is addressed -- but we wouldn't want to turn huge memory use into
hours of CPU time to enter the group. ;)

> We could use several data structures as needed to accomodate large
> data lists.  

Regarding the original problem: we already /have/ a representation of
large data lists and, in fact, the problem occurs related to them:

    (gnus-uncompress-range '(3437 . 30538699))

That second bit is the compressed range data.  The function, which
causes the memory bloat, turns that into a list that contains every
integer between those two values...

Using inversion lists rather than the Gnus range data here would have no
effect other than to (a) convert the Gnus range data into an inversion
list and (b) uncompress the inversion list with the same memory use.

We need to remove the code that deals with uncompressed values rather
than to change the representation of the (already memory efficient)
compressed data format. :)

> Inversion lists are nice in some cases and not others; I think they
> are worth at least some consideration since they don't require a pair
> of numbers to express a range and thus may save some memory.

They do, however, use a pair of numbers to express a single stand-alone
number in the sequence.

> I don't know the Emacs internals intimately, but I think (24 86 88 90)
> takes up less memory than ((24 . 86) (88 . 90)).  Am I wrong?

No, you are correct.  However, if the sequence was:

  24, 26, 28, 30-300 the output would be:

  (24 25 26 27 28 29 30 300)
  (24 26 26 (30 . 300))

Inversion lists are more efficient for long runs of range numbers by one
additional cons cell, but less efficient by one cons cell for each
stand-alone number that is not part of a range.

> Note Gaute's original message was about compressing pairs of integers,
> not general Gnus data structures, so please consider my message in
> that context.

That seems reasonable, and in general I agree: inversion lists are a
nice data structure, a (presumably) good implementation (with tests)
already exists and the cost of adding this new data type isn't /that/
high -- on those terms.

On the other hand the Gnus range code already exists, is well tested (if
not with stand-alone tests) and is the data structure used elsewhere in
the Gnus code.

The Gnus range code is also more efficient if stand alone outlier values
are more common than ranges, while inversion lists are more efficient if
they are less common.

The end problem, though, is that the compressed representation is
expanded to a flat list by the code.  That makes the exact compressed
data structure less important than finding /any/ solution that doesn't
expand it.

Oh, and using inversion lists here would require converting the existing
compressed data format into an inversion list and then operating on
that.  That is ... not really helpful. :)

> DP> I don't think that you would see sufficient benefit from
> DP> introducing the additional data type to justify spending your time
> DP> on it -- but since I am not volunteering to do the work I can't
> DP> tell you how to do it. ;)
>
> I am not sure if you are addressing me or Gaute.  I made a suggestion,
> and an implementation is already in place (I mentioned Kim Storm
> implemented it).  I'm willing to assist Gaute, but he has to be
> interested in my suggestion.  I don't have the time for this as a solo
> project.

I was addressing both of you -- I can express an opinion, and while I
feel qualified to comment on this since I have both worked on this
specific problem before[1] and addressed this conversation before[2].

Since I am not actually writing the code, though, if Gaute or yourself
feel like writing some Lisp that uses inversion lists I can't actually
stop you. ;)

I would strongly advise, however, that you are much better off ignoring
inversion lists here and focusing on converting the existing code to use
the compressed `gnus-range' data structure rather than expanding to a
flat list.

Also of note: the proposed "solution" of narrowing the range is not
going to help much; fixing the code is the right answer.

Regards,
        Daniel

(And because this has been a stupidly annoying couple of week in other
 areas, and because this is nice simple and essentially stress-free work
 I am getting tempted to fix it myself.

 So, maybe inversion lists were the way to get the code fixed after all,
 if not quite so directly as expected. ;) 

Footnotes: 
[1]  The nnimap code did, as far as I can tell, exactly what the code
     causing trouble here dose.

[2]  Inversion lists were suggested at the time, as I recall, though I
     don't have the enthusiasm to dig in the archive and find out who
     suggested them.

-- 
Daniel Pittman <daniel@cybersource.com.au>           Phone: 03 9621 2377
Level 4, 10 Queen St, Melbourne             Web: http://www.cyber.com.au
Cybersource: Australia's Leading Linux and Open Source Solutions Company

next prev parent reply	other threads:[~2007-10-02  3:23 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <87wsw4u21m.fsf@gmx.de>
2007-08-10  9:08 ` Katsumi Yamaoka
2007-08-10 11:39   ` Katsumi Yamaoka
2007-08-10 12:43     ` Sven Joachim
2007-08-13 11:44       ` Katsumi Yamaoka
2007-08-13 17:30         ` Sven Joachim
2007-08-14 11:46           ` Katsumi Yamaoka
2007-09-13 10:27             ` Katsumi Yamaoka
2007-08-10 12:42   ` Sven Joachim
2007-09-29 21:04     ` Gaute Strokkenes
2007-09-30 22:11       ` Ted Zlatanov
2007-10-01  0:29         ` Katsumi Yamaoka
2007-10-01  1:04         ` Daniel Pittman
2007-10-02  2:13           ` Ted Zlatanov
2007-10-02  3:23             ` Daniel Pittman [this message]
2007-10-02 11:11               ` Ted Zlatanov
2007-10-02 12:17                 ` Daniel Pittman
2007-10-02 16:08                   ` Ted Zlatanov
2007-10-03  0:19                     ` Daniel Pittman
2007-10-02 13:33               ` Daniel Pittman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874phatd0z.fsf@enki.rimspace.net \
    --to=daniel@rimspace.net \
    --cc=ding@gnus.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).