From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/65326 Path: news.gmane.org!not-for-mail From: Daniel Pittman Newsgroups: gmane.emacs.gnus.general Subject: Re: Huge memory consumption on accessing large newsgroup Date: Tue, 02 Oct 2007 13:23:56 +1000 Organization: Cybersource: Australia's Leading Linux and Open Source Solutions Company Message-ID: <874phatd0z.fsf@enki.rimspace.net> References: <87wsw4u21m.fsf@gmx.de> <87sl6rh886.fsf@gmx.de> <87hcldp4j1.fsf@srcf.ucam.org> <87odfjprux.fsf@enki.rimspace.net> NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: sea.gmane.org 1191295506 3946 80.91.229.12 (2 Oct 2007 03:25:06 GMT) X-Complaints-To: usenet@sea.gmane.org NNTP-Posting-Date: Tue, 2 Oct 2007 03:25:06 +0000 (UTC) To: ding@gnus.org Original-X-From: ding-owner+M13838@lists.math.uh.edu Tue Oct 02 05:25:02 2007 Return-path: Envelope-to: ding-account@gmane.org Original-Received: from util0.math.uh.edu ([129.7.128.18]) by lo.gmane.org with esmtp (Exim 4.50) id 1IcYNZ-0007UF-UJ for ding-account@gmane.org; Tue, 02 Oct 2007 05:25:02 +0200 Original-Received: from localhost ([127.0.0.1] helo=lists.math.uh.edu) by util0.math.uh.edu with smtp (Exim 4.63) (envelope-from ) id 1IcYMu-0007bX-4q; Mon, 01 Oct 2007 22:24:20 -0500 Original-Received: from mx2.math.uh.edu ([129.7.128.33]) by util0.math.uh.edu with esmtps (TLSv1:AES256-SHA:256) (Exim 4.63) (envelope-from ) id 1IcYMs-0007bI-LC for ding@lists.math.uh.edu; Mon, 01 Oct 2007 22:24:18 -0500 Original-Received: from quimby.gnus.org ([80.91.231.51]) by mx2.math.uh.edu with esmtp (Exim 4.67) (envelope-from ) id 1IcYMj-0001Ox-H0 for ding@lists.math.uh.edu; Mon, 01 Oct 2007 22:24:18 -0500 Original-Received: from 203-217-31-68.perm.iinet.net.au ([203.217.31.68] helo=anu.rimspace.net) by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian)) id 1IcYMZ-0005X2-00 for ; Tue, 02 Oct 2007 05:23:59 +0200 Original-Received: by anu.rimspace.net (Postfix, from userid 10) id F408B192002F; Tue, 2 Oct 2007 13:24:02 +1000 (EST) Original-Received: by enki.rimspace.net (Postfix, from userid 1000) id 880F62F0568; Tue, 2 Oct 2007 13:23:56 +1000 (EST) User-Agent: Gnus/5.110006 (No Gnus v0.6) Emacs/23.0.0 (gnu/linux) X-Spam-Score: -2.5 (--) List-ID: Precedence: bulk Xref: news.gmane.org gmane.emacs.gnus.general:65326 Archived-At: Ted Zlatanov writes: > On Mon, 01 Oct 2007 11:04:54 +1000 Daniel Pittman wrote: > DP> Ted Zlatanov writes: > >>> Consider using inversion lists. > >>> They don't require pairs of integers; each value represents a flip. >>> They have other nice properties too, though it's been a while since >>> I looked at them so I can't name them. > > DP> Mmm. The existing Gnus range code is pretty much as efficient, > DP> has lower computational complexity for the operations Gnus uses > DP> and is already existent and tested. > > We're talking about reducing memory consumption, not CPU usage. For > CPU usage the current structure is pretty good. For memory usage, as > you can see from the subject of the message, it's not ideal. Ah. I mis-expressed myself. The issue with large memory consumption, as far as I could see from the thread, was that the compressed range data type was expanded to a flat list. This caused, no surprise, huge memory use. The original problem is that the code calls `gnus-uncompress-range' on the data *at all* -- and, so, turns a nicely brief data structure into a vast bloated million-number list. The *solution* is to rebuild the algorithm to operate on the compressed version (regardless of the internal representation), not to change the representation. In other words: the problem is that something calls `gnus-uncompress-range', not the format of the data passed to that call. The "efficient" part referred to both memory and CPU, and the lower computational complexity is a consideration only after the memory load is addressed -- but we wouldn't want to turn huge memory use into hours of CPU time to enter the group. ;) > We could use several data structures as needed to accomodate large > data lists. Regarding the original problem: we already /have/ a representation of large data lists and, in fact, the problem occurs related to them: (gnus-uncompress-range '(3437 . 30538699)) That second bit is the compressed range data. The function, which causes the memory bloat, turns that into a list that contains every integer between those two values... Using inversion lists rather than the Gnus range data here would have no effect other than to (a) convert the Gnus range data into an inversion list and (b) uncompress the inversion list with the same memory use. We need to remove the code that deals with uncompressed values rather than to change the representation of the (already memory efficient) compressed data format. :) > Inversion lists are nice in some cases and not others; I think they > are worth at least some consideration since they don't require a pair > of numbers to express a range and thus may save some memory. They do, however, use a pair of numbers to express a single stand-alone number in the sequence. > I don't know the Emacs internals intimately, but I think (24 86 88 90) > takes up less memory than ((24 . 86) (88 . 90)). Am I wrong? No, you are correct. However, if the sequence was: 24, 26, 28, 30-300 the output would be: (24 25 26 27 28 29 30 300) (24 26 26 (30 . 300)) Inversion lists are more efficient for long runs of range numbers by one additional cons cell, but less efficient by one cons cell for each stand-alone number that is not part of a range. > Note Gaute's original message was about compressing pairs of integers, > not general Gnus data structures, so please consider my message in > that context. That seems reasonable, and in general I agree: inversion lists are a nice data structure, a (presumably) good implementation (with tests) already exists and the cost of adding this new data type isn't /that/ high -- on those terms. On the other hand the Gnus range code already exists, is well tested (if not with stand-alone tests) and is the data structure used elsewhere in the Gnus code. The Gnus range code is also more efficient if stand alone outlier values are more common than ranges, while inversion lists are more efficient if they are less common. The end problem, though, is that the compressed representation is expanded to a flat list by the code. That makes the exact compressed data structure less important than finding /any/ solution that doesn't expand it. Oh, and using inversion lists here would require converting the existing compressed data format into an inversion list and then operating on that. That is ... not really helpful. :) > DP> I don't think that you would see sufficient benefit from > DP> introducing the additional data type to justify spending your time > DP> on it -- but since I am not volunteering to do the work I can't > DP> tell you how to do it. ;) > > I am not sure if you are addressing me or Gaute. I made a suggestion, > and an implementation is already in place (I mentioned Kim Storm > implemented it). I'm willing to assist Gaute, but he has to be > interested in my suggestion. I don't have the time for this as a solo > project. I was addressing both of you -- I can express an opinion, and while I feel qualified to comment on this since I have both worked on this specific problem before[1] and addressed this conversation before[2]. Since I am not actually writing the code, though, if Gaute or yourself feel like writing some Lisp that uses inversion lists I can't actually stop you. ;) I would strongly advise, however, that you are much better off ignoring inversion lists here and focusing on converting the existing code to use the compressed `gnus-range' data structure rather than expanding to a flat list. Also of note: the proposed "solution" of narrowing the range is not going to help much; fixing the code is the right answer. Regards, Daniel (And because this has been a stupidly annoying couple of week in other areas, and because this is nice simple and essentially stress-free work I am getting tempted to fix it myself. So, maybe inversion lists were the way to get the code fixed after all, if not quite so directly as expected. ;) Footnotes: [1] The nnimap code did, as far as I can tell, exactly what the code causing trouble here dose. [2] Inversion lists were suggested at the time, as I recall, though I don't have the enthusiasm to dig in the archive and find out who suggested them. -- Daniel Pittman Phone: 03 9621 2377 Level 4, 10 Queen St, Melbourne Web: http://www.cyber.com.au Cybersource: Australia's Leading Linux and Open Source Solutions Company