* Re: Huge memory consumption on accessing large newsgroup
  [not found] <87wsw4u21m.fsf@gmx.de>
From: Katsumi Yamaoka @ 2007-08-10 9:08 UTC
To: Sven Joachim; +Cc: bugs, ding

(I added the ding list to Cc.)

>>>>> Sven Joachim wrote:

> Gnus v5.11
> GNU Emacs 22.1.50.1 (i486-pc-linux-gnu, GTK+ Version 2.10.13)
>  of 2007-08-06 on debian, modified by Debian
> 200 news.motzarella.org InterNetNews NNRP server INN 2.4.4 (20060818 snapshot) ready (posting ok).

I visited http://news.motzarella.org/ and got an account on
Motzarella out of curiosity. ;-)

> When accessing comp.os.linux.misc on news.motzarella.org, a _very_
> large newsgroup with more than 30,000,000 articles, Emacs' memory
> footprint grew heavily.

Yes, now the ACTIVE of that group is:

(gnus-active "nntp+motzarella:comp.os.linux.misc")
 => (3437 . 30538699)

However, I verified that there are actually fewer than 3,000
articles.  The actual number of articles should not be the cause
of this problem.

> It took ~2 minutes to display its question
> "How many articles...?" and the memory usage was at 555 MB (RSS).
> When I answered "500" and the summary buffer finally appeared, it grew
> up to 893 MB.  Which is a bit scary, since my computer has "only" 1 GB
> of RAM and is now already paging quite a bit.

> Will this become better if I subscribe to the newsgroup and catch up?
> In any case I probably will have to kill my current Emacs session soon.

The real cause is that Gnus first expands this ACTIVE data into:

(3437 3438 3439 3440 ...... 30538696 30538697 30538698 30538699)

If you run Emacs on a supercomputer, try this:

[-- Attachment #4: Type: application/emacs-lisp, Size: 43 bytes --]

I think a possible solution is to narrow the range to one that
Emacs can handle lightly.  Here it is:

(defadvice gnus-uncompress-range (before narrow-range (ranges) activate)
  "Narrow the range if it is unreasonably wide."
  (setcar ranges (max (car ranges) (- (cdr ranges) 10000))))

The 10000 will probably need to be a customizable variable.  I'm
going to do it next week.

BTW, I needed to set the server variable `nntp-authinfo-force'
to t in order to let No Gnus v0.7 send the AUTHINFO data to the
Motzarella server as follows:

(nntp "motzarella"
      (nntp-address "news.motzarella.org")
      (nntp-authinfo-user "yamaoka")
      (nntp-authinfo-password "********")
      (nntp-authinfo-force t)
      ...)

I guess you have the FORCE element in the ~/.authinfo file, since
such a server variable has not been implemented yet in Gnus v5.11.

Regards,

^ permalink raw reply [flat|nested] 19+ messages in thread
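The size of the flat list described above follows directly from the two endpoints of the ACTIVE pair, so it can be computed without building the list at all. A minimal sketch; `my-active-article-count' is a hypothetical helper, not part of Gnus.

```elisp
;; Hypothetical helper (not part of Gnus): count how many article
;; numbers an ACTIVE pair (LOW . HIGH) would expand into, without
;; allocating the list that `gnus-uncompress-range' builds.
(defun my-active-article-count (active)
  "Return the number of integers in the inclusive range ACTIVE."
  (if (< (cdr active) (car active))
      0                                 ; an empty group, e.g. (1 . 0)
    (1+ (- (cdr active) (car active)))))

;; The ACTIVE pair reported for comp.os.linux.misc:
(my-active-article-count '(3437 . 30538699))
;; => 30535263
```

Since a Lisp list needs one cons cell per element, expanding this pair allocates over thirty million cells for a group that actually holds fewer than 3,000 articles.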
* Re: Huge memory consumption on accessing large newsgroup
From: Katsumi Yamaoka @ 2007-08-10 11:39 UTC
To: Sven Joachim; +Cc: bugs, ding

>>>>> Katsumi Yamaoka wrote:

> I think a possible solution is to narrow the range to one that
> Emacs can handle lightly.  Here it is:

> (defadvice gnus-uncompress-range (before narrow-range (ranges) activate)
>   "Narrow the range if it is unreasonably wide."
>   (setcar ranges (max (car ranges) (- (cdr ranges) 10000))))

I overlooked that the argument of this function can be complicated.
Try the following one instead:

(defadvice gnus-uncompress-range (before narrow-range (ranges) activate)
  "Narrow the range if it is unreasonably wide."
  (let ((ttl 10000)
        (rest (if (and (cdr ranges) (not (consp (cdr ranges))))
                  (list ranges)
                (nreverse ranges)))
        range diff)
    (setq ranges nil)
    (while rest
      (setq range (car rest)
            rest (cdr rest))
      (if (numberp range)
          (progn
            (push range ranges)
            (setq ttl (1- ttl)))
        (if (= ttl 1)
            (progn
              (push (cdr range) ranges)
              (setq ttl 0))
          (setq diff (min (- (cdr range) (car range) -1) ttl)
                ttl (- ttl diff))
          (push (cons (max (car range) (- (cdr range) diff -1))
                      (cdr range))
                ranges)))
      (when (zerop ttl)
        (setq rest nil)))))

> The 10000 will probably need to be a customizable variable.  I'm
> going to do it next week.
* Re: Huge memory consumption on accessing large newsgroup
From: Sven Joachim @ 2007-08-10 12:43 UTC
To: Katsumi Yamaoka; +Cc: bugs, ding

Katsumi Yamaoka <yamaoka@jpl.org> writes:

> I overlooked that the argument of this function can be complicated.
> Try the following one instead:
>
> (defadvice gnus-uncompress-range (before narrow-range (ranges) activate)
>   "Narrow the range if it is unreasonably wide."
>   (let ((ttl 10000)
>         (rest (if (and (cdr ranges) (not (consp (cdr ranges))))
>                   (list ranges)
>                 (nreverse ranges)))
>         range diff)
>     (setq ranges nil)
>     (while rest
>       (setq range (car rest)
>             rest (cdr rest))
>       (if (numberp range)
>           (progn
>             (push range ranges)
>             (setq ttl (1- ttl)))
>         (if (= ttl 1)
>             (progn
>               (push (cdr range) ranges)
>               (setq ttl 0))
>           (setq diff (min (- (cdr range) (car range) -1) ttl)
>                 ttl (- ttl diff))
>           (push (cons (max (car range) (- (cdr range) diff -1))
>                       (cdr range))
>                 ranges)))
>       (when (zerop ttl)
>         (setq rest nil)))))

Does not seem to help much. :-)  With this advice, Emacs' memory
footprint grew to 303 MB on accessing be.politics at
news.motzarella.org, another random example of a group with supposedly
30m+ articles.  And fetching 50 headers increased it to 574 MB.

Kind regards,
        Sven
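The second advice walks a Gnus range list from its newest end and keeps at most `ttl' article numbers. The same idea can be written as a side-effect-free function over the range format (either a bare (LOW . HIGH) pair or an ascending list of numbers and pairs). This is a sketch only: `my-range-normalize' and `my-range-keep-newest' are hypothetical names, not Gnus functions, and Sven's measurements show that narrowing at this single call site was not enough on its own.

```elisp
;; Hypothetical helpers, not Gnus functions.  A Gnus range is either a
;; single (LOW . HIGH) pair or an ascending list whose elements are
;; article numbers or (LOW . HIGH) pairs.
(defun my-range-normalize (range)
  "Return RANGE as a list of elements, wrapping a bare (LOW . HIGH) pair."
  (if (and (consp range) (numberp (car range)) (numberp (cdr range)))
      (list range)
    range))

(defun my-range-keep-newest (range n)
  "Return a new range holding only the N highest article numbers of RANGE."
  (let ((rest (reverse (my-range-normalize range))) ; newest element first
        (result nil))
    (while (and rest (> n 0))
      (let ((elt (car rest)))
        (if (numberp elt)
            (progn
              (push elt result)
              (setq n (1- n)))
          ;; Take at most N numbers from the top of the (LOW . HIGH) run.
          (let* ((size (1+ (- (cdr elt) (car elt))))
                 (take (min size n))
                 (low (1+ (- (cdr elt) take))))
            (push (if (= low (cdr elt)) low (cons low (cdr elt))) result)
            (setq n (- n take)))))
      (setq rest (cdr rest)))
    result))
```

For example, keeping the newest 4 articles of (1 3 (5 . 10) 12) yields ((8 . 10) 12), and the huge ACTIVE pair (3437 . 30538699) shrinks to the single run ((30528700 . 30538699)) of exactly 10000 numbers.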
* Re: Huge memory consumption on accessing large newsgroup
From: Katsumi Yamaoka @ 2007-08-13 11:44 UTC
To: Sven Joachim; +Cc: bugs, ding

>>>>> Sven Joachim wrote:

>> (defadvice gnus-uncompress-range (before narrow-range (ranges) activate)
>>   "Narrow the range if it is unreasonably wide."
>>   (let ((ttl 10000)
[...]

> Does not seem to help much. :-)  With this advice, Emacs'
> memory footprint grew to 303 MB on accessing be.politics at
> news.motzarella.org, another random example of a group with supposedly
> 30m+ articles.  And fetching 50 headers increased it to 574 MB.

I found other things that need to be fixed.  Could you try the patch
(for Gnus v5.11) attached below?  It still might not be complete,
so I set the default value of the new variable

  gnus-maximum-newsgroup

to nil so that it does not change the present behavior of Gnus.
Now I have

  (setq gnus-maximum-newsgroup 100000)

and Gnus runs smoothly with be.politics and comp.os.linux.misc on
news.motzarella.org (though the value 100000 is too much for those
groups, see a note in the docstring of the variable).

--- gnus.el~ 2007-07-26 07:41:56 +0000
+++ gnus.el 2007-08-13 11:43:02 +0000
@@ -1501,6 +1501,17 @@
   :type '(choice (const :tag "No limit" nil)
                  integer))
 
+(defcustom gnus-maximum-newsgroup nil
+  "The maximum number of articles a newsgroup.
+If this is a number, old articles in a newsgroup exceeding this number
+are silently ignored.  If it is nil, no article is ignored.  Note that
+setting this variable to a number might prevent you from reading very
+old articles."
+  :group 'gnus-group-select
+  :version "22.2"
+  :type '(choice (const :tag "No limit" nil)
+                 integer))
+
 (defcustom gnus-use-long-file-name (not (memq system-type '(usg-unix-v xenix)))
   "*Non-nil means that the default name of a file to save articles in is the group name.
 If it's nil, the directory form of the group name is used instead.
--- gnus-agent.el~ 2007-07-26 07:41:55 +0000
+++ gnus-agent.el 2007-08-13 11:43:02 +0000
@@ -1765,7 +1765,14 @@
                  (gnus-agent-find-parameter group 'agent-predicate)))))
          (articles (if fetch-all
-                       (gnus-uncompress-range (gnus-active group))
+                       (if gnus-maximum-newsgroup
+                           (let ((active (gnus-active group)))
+                             (gnus-uncompress-range
+                              (cons (max (car active)
+                                         (- (cdr active)
+                                            gnus-maximum-newsgroup))
+                                    (cdr active))))
+                         (gnus-uncompress-range (gnus-active group)))
                      (gnus-list-of-unread-articles group)))
          (gnus-decode-encoded-word-function 'identity)
          (gnus-decode-encoded-address-function 'identity)
--- gnus-sum.el~ 2007-07-26 07:41:56 +0000
+++ gnus-sum.el 2007-08-13 11:43:02 +0000
@@ -5472,7 +5472,13 @@
              ;; articles in the group, or (if that's nil), the
              ;; articles in the cache.
              (or
-              (gnus-uncompress-range (gnus-active group))
+              (if gnus-maximum-newsgroup
+                  (let ((active (gnus-active group)))
+                    (gnus-uncompress-range
+                     (cons (max (car active)
+                                (- (cdr active) gnus-maximum-newsgroup))
+                           (cdr active))))
+                (gnus-uncompress-range (gnus-active group)))
               (gnus-cache-articles-in-group group))
            ;; Select only the "normal" subset of articles.
            (gnus-sorted-nunion
@@ -6534,23 +6540,26 @@
   (let* ((read (gnus-info-read (gnus-get-info group)))
          (active (or (gnus-active group) (gnus-activate-group group)))
          (last (cdr active))
+         (bottom (if gnus-maximum-newsgroup
+                     (max (car active) (- last gnus-maximum-newsgroup))
+                   (car active)))
          first nlast unread)
     ;; If none are read, then all are unread.
     (if (not read)
-        (setq first (car active))
+        (setq first bottom)
       ;; If the range of read articles is a single range, then the
       ;; first unread article is the article after the last read
       ;; article.  Sounds logical, doesn't it?
       (if (and (not (listp (cdr read)))
-               (or (< (car read) (car active))
+               (or (< (car read) bottom)
                    (progn (setq read (list read))
                           nil)))
-          (setq first (max (car active) (1+ (cdr read))))
+          (setq first (max bottom (1+ (cdr read))))
         ;; `read' is a list of ranges.
         (when (/= (setq nlast (or (and (numberp (car read)) (car read))
                                   (caar read)))
                   1)
-          (setq first (car active)))
+          (setq first bottom))
         (while read
           (when first
             (while (< first nlast)
@@ -6575,7 +6584,12 @@
           (gnus-list-range-difference
            (gnus-list-range-difference
             (gnus-sorted-complement
-             (gnus-uncompress-range active)
+             (gnus-uncompress-range
+              (if gnus-maximum-newsgroup
+                  (cons (max (car active)
+                             (- (cdr active) gnus-maximum-newsgroup))
+                        (cdr active))
+                active))
              (gnus-list-of-unread-articles group))
             (cdr (assq 'dormant marked)))
            (cdr (assq 'tick marked))))))
@@ -6587,23 +6601,26 @@
   (let* ((read (gnus-info-read (gnus-get-info group)))
          (active (or (gnus-active group) (gnus-activate-group group)))
          (last (cdr active))
+         (bottom (if gnus-maximum-newsgroup
+                     (max (car active) (- last gnus-maximum-newsgroup))
+                   (car active)))
          first nlast unread)
     ;; If none are read, then all are unread.
     (if (not read)
-        (setq first (car active))
+        (setq first bottom)
       ;; If the range of read articles is a single range, then the
       ;; first unread article is the article after the last read
       ;; article.  Sounds logical, doesn't it?
       (if (and (not (listp (cdr read)))
-               (or (< (car read) (car active))
+               (or (< (car read) bottom)
                    (progn (setq read (list read))
                           nil)))
-          (setq first (max (car active) (1+ (cdr read))))
+          (setq first (max bottom (1+ (cdr read))))
         ;; `read' is a list of ranges.
         (when (/= (setq nlast (or (and (numberp (car read)) (car read))
                                   (caar read)))
                   1)
-          (setq first (car active)))
+          (setq first bottom))
         (while read
           (when first
             (push (cons first nlast) unread))
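The patch repeats the same narrowing expression at every call site it touches. Factored into one function it reads as follows; `my-clamp-active' is a hypothetical name for illustration, not something the patch defines.

```elisp
;; Hypothetical helper (not defined by the patch): narrow an ACTIVE
;; pair (LOW . HIGH) so that it starts no lower than HIGH - MAXIMUM.
;; A nil MAXIMUM means "no limit", matching the patch's default.
(defun my-clamp-active (active maximum)
  "Narrow ACTIVE, a (LOW . HIGH) pair, to start no lower than HIGH - MAXIMUM.
If MAXIMUM is nil, return ACTIVE unchanged."
  (if maximum
      (cons (max (car active) (- (cdr active) maximum))
            (cdr active))
    active))

(my-clamp-active '(3437 . 30538699) 30000)
;; => (30508699 . 30538699)
```

Because both ends are inclusive, a clamped range covers MAXIMUM + 1 article numbers whenever the group is larger than the limit; groups smaller than the limit are returned unchanged.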
* Re: Huge memory consumption on accessing large newsgroup
From: Sven Joachim @ 2007-08-13 17:30 UTC
To: Katsumi Yamaoka; +Cc: bugs, ding

Katsumi Yamaoka <yamaoka@jpl.org> writes:

> I found other things that need to be fixed.  Could you try the patch
> (for Gnus v5.11) attached below?  It still might not be complete,
> so I set the default value of the new variable
>
>   gnus-maximum-newsgroup
>
> to nil so that it does not change the present behavior of Gnus.
> Now I have
>
>   (setq gnus-maximum-newsgroup 100000)
>
> and Gnus runs smoothly with be.politics and comp.os.linux.misc on
> news.motzarella.org (though the value 100000 is too much for those
> groups, see a note in the docstring of the variable).

I applied your patch and set gnus-maximum-newsgroup to 30000.  Here's
the state of the Emacs process when doing that:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
sven      3042  0.3  4.3  56568 44776 ?        R    15:01   0:54 emacs

Then I browsed news.motzarella.org and opened comp.os.linux.misc,
fetching the headers, and Emacs' state was

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
sven      3042  0.4  4.7  60432 49188 ?        S    15:01   1:03 emacs

That seems to be OK, I think.

Cheers,
       Sven
* Re: Huge memory consumption on accessing large newsgroup
From: Katsumi Yamaoka @ 2007-08-14 11:46 UTC
To: Sven Joachim; +Cc: bugs, ding

>>>>> Sven Joachim wrote:

> Katsumi Yamaoka <yamaoka@jpl.org> writes:

>> (setq gnus-maximum-newsgroup 100000)

> I applied your patch and set gnus-maximum-newsgroup to 30000.  Here's
> the state of the Emacs process when doing that:

> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> sven      3042  0.3  4.3  56568 44776 ?        R    15:01   0:54 emacs

> Then I browsed news.motzarella.org and opened comp.os.linux.misc,
> fetching the headers, and Emacs' state was

> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> sven      3042  0.4  4.7  60432 49188 ?        S    15:01   1:03 emacs

> That seems to be OK, I think.

I've installed this change in both the trunk and the v5-10 branch,
along with Info documentation.
* Re: Huge memory consumption on accessing large newsgroup
From: Katsumi Yamaoka @ 2007-09-13 10:27 UTC
To: Sven Joachim; +Cc: bugs, ding

>>>>> Katsumi Yamaoka wrote:

>>> (setq gnus-maximum-newsgroup 100000)

I've renamed this variable to `gnus-newsgroup-maximum-articles' at
Leo's suggestion[1].  Sorry for the inconvenience.

[1] http://article.gmane.org/gmane.emacs.gnus.user/9649
* Re: Huge memory consumption on accessing large newsgroup
From: Sven Joachim @ 2007-08-10 12:42 UTC
To: Katsumi Yamaoka; +Cc: bugs, ding

Katsumi Yamaoka <yamaoka@jpl.org> writes:

> I visited http://news.motzarella.org/ and got an account on
> Motzarella out of curiosity. ;-)
>
>> When accessing comp.os.linux.misc on news.motzarella.org, a _very_
>> large newsgroup with more than 30,000,000 articles, Emacs' memory
>> footprint grew heavily.
>
> Yes, now the ACTIVE of that group is:
>
> (gnus-active "nntp+motzarella:comp.os.linux.misc")
>  => (3437 . 30538699)
>
> However, I verified that there are actually fewer than 3,000
> articles.  The actual number of articles should not be the cause
> of this problem.

I had already suspected the number is bogus and that they do not
really keep 30 million articles for several newsgroups. ;-)

> The real cause is that Gnus first expands this ACTIVE data into:
>
> (3437 3438 3439 3440 ...... 30538696 30538697 30538698 30538699)

Aha, that's not exactly a small list, I see.

> BTW, I needed to set the server variable `nntp-authinfo-force'
> to t in order to let No Gnus v0.7 send the AUTHINFO data to the
> Motzarella server as follows:
>
> (nntp "motzarella"
>       (nntp-address "news.motzarella.org")
>       (nntp-authinfo-user "yamaoka")
>       (nntp-authinfo-password "********")
>       (nntp-authinfo-force t)
>       ...)
>
> I guess you have the FORCE element in the ~/.authinfo file, since
> such a server variable has not been implemented yet in Gnus v5.11.

Yes, this is described in Motzarella's FAQ; you have to put a line

machine news.motzarella.org login xxxxxx force yes password xxxxxx

into ~/.authinfo.  Sorry for not explaining that in my bug report.

Cheers,
       Sven
* Re: Huge memory consumption on accessing large newsgroup
From: Gaute Strokkenes @ 2007-09-29 21:04 UTC
To: ding

On 10 aug 2007, svenjoac@gmx.de wrote:

> Katsumi Yamaoka <yamaoka@jpl.org> writes:
>
>> I visited http://news.motzarella.org/ and got an account on
>> Motzarella out of curiosity. ;-)
>>
>>> When accessing comp.os.linux.misc on news.motzarella.org, a _very_
>>> large newsgroup with more than 30,000,000 articles, Emacs' memory
>>> footprint grew heavily.
>>
>> Yes, now the ACTIVE of that group is:
>>
>> (gnus-active "nntp+motzarella:comp.os.linux.misc")
>> => (3437 . 30538699)
>>
>> However, I verified that there are actually fewer than 3,000
>> articles.  The actual number of articles should not be the cause
>> of this problem.
>
> I had already suspected the number is bogus and that they do not
> really keep 30 million articles for several newsgroups. ;-)
>
>> The real cause is that Gnus first expands this ACTIVE data into:
>>
>> (3437 3438 3439 3440 ...... 30538696 30538697 30538698 30538699)
>
> Aha, that's not exactly a small list, I see.

Sorry for following up on an ancient thread, but: I wonder if it would
be possible to make Gnus work solely with compressed ranges (i.e. lists
where dotted pairs are used to represent runs of consecutive integers)?

Unless there is some deeper reason why this cannot work, I might have a
stab at it (eventually).

-- 
Gaute Strokkenes
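Working solely with compressed ranges means answering questions such as "is article N in this set?" directly from the dotted-pair form. A minimal sketch of that membership test follows; `my-range-member-p' is a hypothetical name, and gnus-range.el may already provide something similar.

```elisp
;; Hypothetical helper, not the Gnus API: test whether article number
;; N falls inside a compressed Gnus range without expanding it.
(defun my-range-member-p (n range)
  "Return non-nil if article number N is contained in the Gnus RANGE."
  ;; Wrap a bare (LOW . HIGH) pair so both forms can be walked as a list.
  (when (and (consp range) (numberp (car range)) (numberp (cdr range)))
    (setq range (list range)))
  (let ((found nil))
    (while (and range (not found))
      (let ((elt (car range)))
        (setq found (if (numberp elt)
                        (= n elt)
                      (and (<= (car elt) n) (<= n (cdr elt))))))
      (setq range (cdr range)))
    found))
```

The cost grows with the number of runs in the range, not with the number of articles it covers, so asking whether article 20000000 is in (3437 . 30538699) is a single comparison.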
* Re: Huge memory consumption on accessing large newsgroup
From: Ted Zlatanov @ 2007-09-30 22:11 UTC
To: ding

On Sat, 29 Sep 2007 22:04:18 +0100 Gaute Strokkenes <gs234@srcf.ucam.org> wrote:

GS> I wonder if it would be possible to make Gnus work solely with
GS> compressed ranges (i.e. lists where dotted pairs are used to represent
GS> runs of consecutive integers)?

GS> Unless there is some deeper reason why this cannot work, I might have a
GS> stab at it (eventually).

I think this would be a good idea.

Consider using inversion lists.  They don't require pairs of integers;
each value represents a flip.  They have other nice properties too,
though it's been a while since I looked at them so I can't name them.  I
think Kim Storm posted a sample Lisp implementation of the data
structure in emacs-devel a while ago; I can dig it up if you're
interested.

Ted
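The idea that "each value represents a flip" can be made concrete with a membership test. This sketch assumes the common convention that boundaries alternate between an inclusive start and an exclusive end; Kim Storm's implementation is not shown in the thread, so its conventions may differ, and `my-inversion-member-p' is a hypothetical name.

```elisp
;; Hypothetical sketch, assuming the inversion list is an ascending list
;; of boundaries that alternate between an inclusive start and an
;; exclusive end, so every boundary "flips" membership.
(defun my-inversion-member-p (n inv)
  "Return non-nil if N lies inside the set described by inversion list INV."
  (let ((flips 0))
    ;; Count the boundaries at or below N; an odd count means "inside".
    (while (and inv (<= (car inv) n))
      (setq flips (1+ flips)
            inv (cdr inv)))
    (= (% flips 2) 1)))
```

Under this convention the Gnus range (1 3 (5 . 10) 12) corresponds to the inversion list (1 2 3 4 5 11 12 13): article 10 lies after five flips and is inside, while article 11 lies after six and is outside.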
* Re: Huge memory consumption on accessing large newsgroup
From: Katsumi Yamaoka @ 2007-10-01 0:29 UTC
To: ding

>>>>> Ted Zlatanov wrote:

> On Sat, 29 Sep 2007 22:04:18 +0100 Gaute Strokkenes <gs234@srcf.ucam.org> wrote:

GS> I wonder if it would be possible to make Gnus work solely with
GS> compressed ranges (i.e. lists where dotted pairs are used to represent
GS> runs of consecutive integers)?

GS> Unless there is some deeper reason why this cannot work, I might have a
GS> stab at it (eventually).

> I think this would be a good idea.

> Consider using inversion lists.  They don't require pairs of integers;
> each value represents a flip.  They have other nice properties too,
> though it's been a while since I looked at them so I can't name them.  I
> think Kim Storm posted a sample Lisp implementation of the data
> structure in emacs-devel a while ago; I can dig it up if you're
> interested.

I don't quite see what you are planning to do, but it would be very
nice if we could read the gnu.emacs.gnus newsgroup on
news.motzarella.org[1] with no extra work.  In that case, please
consider abolishing the variable `gnus-newsgroup-maximum-articles'.

Regards,

[1] http://article.gmane.org/gmane.emacs.gnus.user/9647
    http://news.gmane.org/group/gmane.emacs.gnus.user/thread=9638
* Re: Huge memory consumption on accessing large newsgroup
From: Daniel Pittman @ 2007-10-01 1:04 UTC
To: ding

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Sat, 29 Sep 2007 22:04:18 +0100 Gaute Strokkenes <gs234@srcf.ucam.org> wrote:
>
> GS> I wonder if it would be possible to make Gnus work solely with
> GS> compressed ranges (i.e. lists where dotted pairs are used to
> GS> represent runs of consecutive integers)?
>
> GS> Unless there is some deeper reason why this cannot work, I might
> GS> have a stab at it (eventually).
>
> I think this would be a good idea.

Ah.

> Consider using inversion lists.

This is almost certainly unnecessary, not to mention that it would
involve building an entire parallel infrastructure to handle them.

The nnimap code had a similar performance-killing "feature" where it
would expand two range lists completely, intersect them, then compress
them again.  This was trivially resolved by using the existing code from
`gnus-range.el' to process the compressed versions directly.

You should be able to find the appropriate bit of history tucked away in
the history of the nnimap.el code via CVS.  I am also happy to try and
dig up my memories of the work though, frankly, they were trivial
enough.

The one specific piece of advice I would give you: write your code so
it is "self-testing" and run with that for a long while.

I ended up having the code do the gnus-range based calculations and
compare them to the non-range calculations, then signal an error if they
disagreed.  This cost extra CPU time for the couple of weeks I used it
in production but gave me (and, I think, the rest of the list) a much
higher sense of security that the changes were, in practice, correct.

(You might want to leave that in with a debug option to turn it on so
that the rest of the Gnus CVS user base also tests this, to catch faults
that your own use doesn't show.  I didn't do that and I vaguely regret
it now.)

> They don't require pairs of integers; each value represents a flip.
> They have other nice properties too, though it's been a while since I
> looked at them so I can't name them.

Mmm.  The existing Gnus range code is pretty much as efficient, has
lower computational complexity for the operations Gnus uses and is
already existent and tested.

I don't think that you would see sufficient benefit from introducing the
additional data type to justify spending your time on it -- but since I
am not volunteering to do the work I can't tell you how to do it. ;)

Regards,
        Daniel
-- 
Daniel Pittman <daniel@cybersource.com.au>           Phone: 03 9621 2377
Level 4, 10 Queen St, Melbourne             Web: http://www.cyber.com.au
Cybersource: Australia's Leading Linux and Open Source Solutions Company
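The self-testing approach described above can be sketched as follows. A result is computed directly on the compressed ranges and, when a debug switch is on, recomputed the slow way by expanding both inputs, with an error signalled if the two disagree. Everything here (`my-range-self-test', `my-range-intersection' and the other helpers) is hypothetical and is not the nnimap or gnus-range.el code referred to in the message.

```elisp
(require 'cl-lib)

(defvar my-range-self-test t
  "Hypothetical debug switch: when non-nil, cross-check range results.")

(defun my-range-to-pairs (range)
  "Return the Gnus RANGE as an ascending list of (LOW . HIGH) pairs."
  (when (and (consp range) (numberp (car range)) (numberp (cdr range)))
    (setq range (list range)))
  (mapcar (lambda (elt) (if (numberp elt) (cons elt elt) elt)) range))

(defun my-range-intersection (a b)
  "Intersect the compressed ranges A and B run by run, without expanding them."
  (let ((a (my-range-to-pairs a))
        (b (my-range-to-pairs b))
        (result nil))
    (while (and a b)
      (let ((low (max (caar a) (caar b)))
            (high (min (cdar a) (cdar b))))
        (when (<= low high)
          (push (cons low high) result)))
      ;; Advance past whichever run ends first.
      (if (< (cdar a) (cdar b))
          (setq a (cdr a))
        (setq b (cdr b))))
    (nreverse result)))

(defun my-range-expand (range)
  "Expand RANGE into a flat list of numbers (only sensible for small ranges)."
  (apply #'append
         (mapcar (lambda (pair) (number-sequence (car pair) (cdr pair)))
                 (my-range-to-pairs range))))

(defun my-checked-intersection (a b)
  "Intersect A and B; if `my-range-self-test' is set, verify the result."
  (let ((fast (my-range-intersection a b)))
    (when my-range-self-test
      ;; Slow reference computation on fully expanded lists.
      (let ((slow (sort (cl-intersection (my-range-expand a)
                                         (my-range-expand b))
                        #'<)))
        (unless (equal (my-range-expand fast) slow)
          (error "Range intersection mismatch: %S vs %S" fast slow))))
    fast))
```

The cross-check expands both arguments, so it only suits the small ranges used while testing; with the switch off, only the run-by-run intersection is computed. The result is a list of plain pairs, which a real implementation would compress back into Gnus's mixed form.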
* Re: Huge memory consumption on accessing large newsgroup
From: Ted Zlatanov @ 2007-10-02 2:13 UTC
To: Daniel Pittman; +Cc: ding

On Mon, 01 Oct 2007 11:04:54 +1000 Daniel Pittman <daniel@rimspace.net> wrote:

DP> Ted Zlatanov <tzz@lifelogs.com> writes:
>> Consider using inversion lists.

>> They don't require pairs of integers; each value represents a flip.
>> They have other nice properties too, though it's been a while since I
>> looked at them so I can't name them.

DP> Mmm.  The existing Gnus range code is pretty much as efficient, has
DP> lower computational complexity for the operations Gnus uses and is
DP> already existent and tested.

We're talking about reducing memory consumption, not CPU usage.  For CPU
usage the current structure is pretty good.  For memory usage, as you
can see from the subject of the message, it's not ideal.

We could use several data structures as needed to accommodate large data
lists.  Inversion lists are nice in some cases and not others; I think
they are worth at least some consideration since they don't require a
pair of numbers to express a range and thus may save some memory.  I
don't know the Emacs internals intimately, but I think (24 86 88 90)
takes up less memory than ((24 . 86) (88 . 90)).  Am I wrong?

Note Gaute's original message was about compressing pairs of integers,
not general Gnus data structures, so please consider my message in that
context.

DP> I don't think that you would see sufficient benefit from introducing the
DP> additional data type to justify spending your time on it -- but since I
DP> am not volunteering to do the work I can't tell you how to do it. ;)

I am not sure if you are addressing me or Gaute.  I made a suggestion,
and an implementation is already in place (I mentioned Kim Storm
implemented it).  I'm willing to assist Gaute, but he has to be
interested in my suggestion.  I don't have the time for this as a solo
project.

Ted
* Re: Huge memory consumption on accessing large newsgroup
From: Daniel Pittman @ 2007-10-02 3:23 UTC
To: ding

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Mon, 01 Oct 2007 11:04:54 +1000 Daniel Pittman <daniel@rimspace.net> wrote:
> DP> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>>> Consider using inversion lists.
>
>>> They don't require pairs of integers; each value represents a flip.
>>> They have other nice properties too, though it's been a while since
>>> I looked at them so I can't name them.
>
> DP> Mmm.  The existing Gnus range code is pretty much as efficient,
> DP> has lower computational complexity for the operations Gnus uses
> DP> and is already existent and tested.
>
> We're talking about reducing memory consumption, not CPU usage.  For
> CPU usage the current structure is pretty good.  For memory usage, as
> you can see from the subject of the message, it's not ideal.

Ah.  I mis-expressed myself.

The issue with large memory consumption, as far as I could see from the
thread, was that the compressed range data type was expanded to a flat
list.  This caused, no surprise, huge memory use.

The original problem is that the code calls `gnus-uncompress-range' on
the data *at all* -- and, so, turns a nicely brief data structure into a
vast bloated million-number list.

The *solution* is to rebuild the algorithm to operate on the compressed
version (regardless of the internal representation), not to change the
representation.

In other words: the problem is that something calls
`gnus-uncompress-range', not the format of the data passed to that call.
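As an illustration of rebuilding an algorithm to operate on the compressed form: the unread articles of a group are, roughly, the active range minus the read range, which is what the loop patched in gnus-sum.el earlier in the thread builds up run by run. The sketch below computes that difference by walking the runs of the read range; `my-range-to-pairs' and `my-unread-ranges' are hypothetical names, not the Gnus implementation.

```elisp
(defun my-range-to-pairs (range)
  "Return the Gnus RANGE as an ascending list of (LOW . HIGH) pairs."
  (when (and (consp range) (numberp (car range)) (numberp (cdr range)))
    (setq range (list range)))
  (mapcar (lambda (elt) (if (numberp elt) (cons elt elt) elt)) range))

(defun my-unread-ranges (active read)
  "Return the articles of ACTIVE that are not in READ, as (LOW . HIGH) pairs.
ACTIVE is a (LOW . HIGH) pair and READ is a compressed Gnus range.
Nothing is expanded: the work depends on the number of runs in READ,
not on the number of articles in ACTIVE."
  (let ((next (car active))             ; lowest article not yet classified
        (high (cdr active))
        (result nil))
    (dolist (run (my-range-to-pairs read))
      ;; The gap below this read run (clipped to ACTIVE) is unread.
      (let ((gap-high (min high (1- (car run)))))
        (when (<= next gap-high)
          (push (cons next gap-high) result)))
      (setq next (max next (1+ (cdr run)))))
    ;; Whatever lies above the last read run is unread too.
    (when (<= next high)
      (push (cons next high) result))
    (nreverse result)))
```

With the ACTIVE pair (3437 . 30538699) and everything up to 30538000 read, this returns the single run ((30538001 . 30538699)) without ever building a thirty-million-element list.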
The "efficient" part referred to both memory and CPU, and the lower
computational complexity is a consideration only after the memory load
is addressed -- but we wouldn't want to turn huge memory use into hours
of CPU time to enter the group. ;)

> We could use several data structures as needed to accommodate large
> data lists.

Regarding the original problem: we already /have/ a representation of
large data lists and, in fact, the problem occurs in relation to them:

    (gnus-uncompress-range '(3437 . 30538699))

That second bit is the compressed range data.  The function, which
causes the memory bloat, turns that into a list that contains every
integer between those two values...

Using inversion lists rather than the Gnus range data here would have no
effect other than to (a) convert the Gnus range data into an inversion
list and (b) uncompress the inversion list with the same memory use.

We need to remove the code that deals with uncompressed values rather
than change the representation of the (already memory-efficient)
compressed data format. :)

> Inversion lists are nice in some cases and not others; I think they
> are worth at least some consideration since they don't require a pair
> of numbers to express a range and thus may save some memory.

They do, however, use a pair of numbers to express a single stand-alone
number in the sequence.

> I don't know the Emacs internals intimately, but I think (24 86 88 90)
> takes up less memory than ((24 . 86) (88 . 90)).  Am I wrong?

No, you are correct.  However, if the sequence was:

    24, 26, 28, 30-300

the inversion list would be:

    (24 25 26 27 28 29 30 300)

while the Gnus range would be:

    (24 26 28 (30 . 300))

Inversion lists are more efficient for long runs of range numbers by one
additional cons cell, but less efficient by one cons cell for each
stand-alone number that is not part of a range.

> Note Gaute's original message was about compressing pairs of integers,
> not general Gnus data structures, so please consider my message in
> that context.

That seems reasonable, and in general I agree: inversion lists are a
nice data structure, a (presumably) good implementation (with tests)
already exists, and the cost of adding this new data type isn't /that/
high -- on those terms.

On the other hand, the Gnus range code already exists, is well tested
(if not with stand-alone tests), and is the data structure used
elsewhere in the Gnus code.

The Gnus range code is also more efficient if stand-alone outlier values
are more common than ranges, while inversion lists are more efficient if
they are less common.

The end problem, though, is that the compressed representation is
expanded to a flat list by the code.  That makes the exact compressed
data structure less important than finding /any/ solution that doesn't
expand it.

Oh, and using inversion lists here would require converting the existing
compressed data format into an inversion list and then operating on
that.  That is ... not really helpful. :)

> DP> I don't think that you would see sufficient benefit from
> DP> introducing the additional data type to justify spending your time
> DP> on it -- but since I am not volunteering to do the work I can't
> DP> tell you how to do it. ;)
>
> I am not sure if you are addressing me or Gaute.  I made a suggestion,
> and an implementation is already in place (I mentioned Kim Storm
> implemented it).  I'm willing to assist Gaute, but he has to be
> interested in my suggestion.  I don't have the time for this as a solo
> project.

I was addressing both of you -- I can express an opinion, and I feel
qualified to comment on this since I have both worked on this specific
problem before[1] and had this same conversation before[2].  Since I am
not actually writing the code, though, if Gaute or yourself feel like
writing some Lisp that uses inversion lists I can't actually stop
you. ;)

I would strongly advise, however, that you are much better off ignoring
inversion lists here and focusing on converting the existing code to use
the compressed `gnus-range' data structure rather than expanding it to a
flat list.

Also of note: the proposed "solution" of narrowing the range is not
going to help much; fixing the code is the right answer.

Regards,
        Daniel

(And because this has been a stupidly annoying couple of weeks in other
areas, and because this is nice, simple and essentially stress-free
work, I am getting tempted to fix it myself.

So, maybe inversion lists were the way to get the code fixed after all,
if not quite so directly as expected. ;)

Footnotes:
[1]  The nnimap code did, as far as I can tell, exactly what the code
     causing trouble here does.

[2]  Inversion lists were suggested at the time, as I recall, though I
     don't have the enthusiasm to dig in the archive and find out who
     suggested them.
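The memory trade-off between the two representations can be checked by counting cons cells. The sketch below counts the cells of a Gnus range list (one per list element, plus one for each (LOW . HIGH) pair) and converts the same set into an inversion list. It assumes the inversion list stores exclusive upper boundaries, so the run 30-300 ends at 301 rather than at the 300 written in the message; both function names are hypothetical.

```elisp
(defun my-range-cons-cells (range)
  "Count the cons cells in the Gnus range list RANGE.
Each element costs one cell in the list spine, and each (LOW . HIGH)
pair costs one more.  Integers themselves need no cells."
  (let ((cells 0))
    (dolist (elt range cells)
      (setq cells (+ cells (if (consp elt) 2 1))))))

(defun my-range-to-inversion (range)
  "Convert the Gnus range list RANGE into an inversion list.
Assumes exclusive upper boundaries: the run LOW..HIGH becomes LOW, HIGH+1."
  (let ((result nil))
    (dolist (elt range)
      (let ((low (if (consp elt) (car elt) elt))
            (high (if (consp elt) (cdr elt) elt)))
        (if (and result (= (car result) low))
            ;; This run starts right where the previous one ended, so
            ;; drop the shared boundary instead of adding two flips.
            (pop result)
          (push low result))
        (push (1+ high) result)))
    (nreverse result)))

(my-range-to-inversion '(24 26 28 (30 . 300)))
;; => (24 25 26 27 28 29 30 301)
```

Under this model the sequence 24, 26, 28, 30-300 takes 5 cells as a Gnus range and 8 as an inversion list, while ((24 . 86) (88 . 90)) and its inversion list (24 87 88 91) take 4 cells each: a run costs two cells in either form, and a stand-alone number costs one cell as a Gnus range element but two in an inversion list.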
* Re: Huge memory consumption on accessing large newsgroup
  2007-10-02  3:23 ` Daniel Pittman
@ 2007-10-02 11:11 ` Ted Zlatanov
  2007-10-02 12:17 ` Daniel Pittman
  2007-10-02 13:33 ` Daniel Pittman
  1 sibling, 1 reply; 19+ messages in thread

From: Ted Zlatanov @ 2007-10-02 11:11 UTC
To: Daniel Pittman; +Cc: ding

On Tue, 02 Oct 2007 13:23:56 +1000 Daniel Pittman <daniel@rimspace.net> wrote:

DP> The issue with large memory consumption, as far as I could see from the
DP> thread, was that the compressed range data type was expanded to a flat
DP> list. This caused, no surprise, huge memory use.

DP> The original problem is that the code calls `gnus-uncompress-range' on
DP> the data *at all* -- and, so, turns a nicely brief data structure into a
DP> vast bloated million-number list.

DP> The *solution* is to rebuild the algorithm to operate on the compressed
DP> version (regardless of the internal representation), not to change the
DP> representation.

You're right, I didn't know this. I thought the memory problems were
caused by the original list. In light of your explanation, I completely
agree that the right way is to find all calls to `gnus-uncompress-range'
and fix the code around them. Thank you for explaining.

DP> (And because this has been a stupidly annoying couple of weeks in other
DP> areas, and because this is nice, simple and essentially stress-free work
DP> I am getting tempted to fix it myself.

DP> So, maybe inversion lists were the way to get the code fixed after all,
DP> if not quite so directly as expected. ;)

The `gnus-uncompress-range' function is not called in too many places:

gnus-agent.el:7
gnus-draft.el:1
gnus-group.el:2
gnus-move.el:4
gnus-nocem.el:1
gnus-range.el:2
gnus-start.el:2
gnus-sum.el:9
nnimap.el:1
nnmaildir.el:3
nnsoup.el:1
nnvirtual.el:1

In gnus-number-of-unseen-articles-in-group, for example, it's called
only to find the length of the list, and in gnus-move-group-to-server
only to see if the list is not nil.

There's room for improvement. Maybe there should be a group API that
abstracts the data structures, so subordinate code doesn't have to
know if it's a list or a compressed range? That would be a good first
step to cleaning things up.

Ted
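Both of the call sites Ted mentions only ask "how many?" or "is it empty?", and both questions can be answered by walking the runs instead of the articles. This is a minimal sketch, assuming a range is either a single (LOW . HIGH) pair, as in a group's active data, or a list of article numbers and runs; `my-range-length' and `my-range-empty-p' are hypothetical names, and inside Gnus the existing `gnus-range-length' that Daniel points to would be the natural call.

```elisp
;;; Sketch only: `my-range-length' and `my-range-empty-p' are
;;; hypothetical; inside Gnus, `gnus-range-length' already exists.
;;; A range is either one (LOW . HIGH) pair, as in a group's active
;;; data, or a list of article numbers N and runs (LOW . HIGH).

(defun my-range-length (range)
  "Return how many articles RANGE covers, without expanding it."
  (cond
   ((null range) 0)
   ;; A bare (LOW . HIGH) pair: its cdr is a number, not a list.
   ((numberp (cdr range))
    (max 0 (- (cdr range) (car range) -1)))
   (t
    (let ((sum 0))
      (dolist (elem range sum)
        (setq sum (+ sum (if (consp elem)
                             (max 0 (- (cdr elem) (car elem) -1))
                           1))))))))

(defun my-range-empty-p (range)
  "Return non-nil if RANGE covers no articles.
A plain non-nil test is not enough: a pair such as (1 . 0), with HIGH
below LOW, is non-nil but covers nothing."
  (zerop (my-range-length range)))

;; A 30-million-article active range, counted without building a list.
(my-range-length '(3437 . 30538699))     ; => 30535263
(my-range-length '((1 . 5) 7 (9 . 12)))  ; => 10
(my-range-empty-p '(1 . 0))              ; => t
```

The work here grows with the number of runs, not the number of articles, and nothing is allocated.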
* Re: Huge memory consumption on accessing large newsgroup
  2007-10-02 11:11 ` Ted Zlatanov
@ 2007-10-02 12:17 ` Daniel Pittman
  2007-10-02 16:08 ` Ted Zlatanov
  0 siblings, 1 reply; 19+ messages in thread

From: Daniel Pittman @ 2007-10-02 12:17 UTC
To: ding

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Tue, 02 Oct 2007 13:23:56 +1000 Daniel Pittman <daniel@rimspace.net> wrote:
>
> DP> The issue with large memory consumption, as far as I could see
> DP> from the thread, was that the compressed range data type was
> DP> expanded to a flat list. This caused, no surprise, huge memory
> DP> use.
>
> DP> The original problem is that the code calls
> DP> `gnus-uncompress-range' on the data *at all* -- and, so, turns a
> DP> nicely brief data structure into a vast bloated million-number
> DP> list.
>
> DP> The *solution* is to rebuild the algorithm to operate on the
> DP> compressed version (regardless of the internal representation),
> DP> not to change the representation.
>
> You're right, I didn't know this. I thought the memory problems were
> caused by the original list.

*nod* I figured as much. :)

[...]

> DP> (And because this has been a stupidly annoying couple of weeks in other
> DP> areas, and because this is nice, simple and essentially stress-free work
> DP> I am getting tempted to fix it myself.
>
> DP> So, maybe inversion lists were the way to get the code fixed after all,
> DP> if not quite so directly as expected. ;)
>
> The `gnus-uncompress-range' function is not called in too many places:

[...]

> In gnus-number-of-unseen-articles-in-group, for example, it's called
> only to find the length of the list, and in gnus-move-group-to-server
> only to see if the list is not nil.

Mmm. The not nil version should really be killed, as a guide, and the
length isn't too awful. The length version can just call
`gnus-range-length' and be done with it...

> There's room for improvement. Maybe there should be a group API that
> abstracts the data structures, so subordinate code doesn't have to
> know if it's a list or a compressed range? That would be a good first
> step to cleaning things up.

Maybe. I think in most cases it is just that the range code wasn't
pushed everywhere it could be because, in most cases, you never
/really/ see the pain.

Regards,
        Daniel
* Re: Huge memory consumption on accessing large newsgroup
  2007-10-02 12:17 ` Daniel Pittman
@ 2007-10-02 16:08 ` Ted Zlatanov
  2007-10-03  0:19 ` Daniel Pittman
  0 siblings, 1 reply; 19+ messages in thread

From: Ted Zlatanov @ 2007-10-02 16:08 UTC
To: Daniel Pittman; +Cc: ding

On Tue, 02 Oct 2007 22:17:46 +1000 Daniel Pittman <daniel@rimspace.net> wrote:

DP> Ted Zlatanov <tzz@lifelogs.com> writes:

>> The `gnus-uncompress-range' function is not called in too many places:

>> In gnus-number-of-unseen-articles-in-group, for example, it's called
>> only to find the length of the list, and in gnus-move-group-to-server
>> only to see if the list is not nil.

DP> Mmm. The not nil version should really be killed, as a guide, and the
DP> length isn't too awful. The length version can just call
DP> `gnus-range-length' and be done with it...

Those were just two examples I saw right away. Will you or Gaute do
the whole cleanup, or should it be an open task for the Gnus team
(I'll get to it when I can, or someone else will)?

Ted
* Re: Huge memory consumption on accessing large newsgroup
  2007-10-02 16:08 ` Ted Zlatanov
@ 2007-10-03  0:19 ` Daniel Pittman
  0 siblings, 0 replies; 19+ messages in thread

From: Daniel Pittman @ 2007-10-03 0:19 UTC
To: ding

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Tue, 02 Oct 2007 22:17:46 +1000 Daniel Pittman <daniel@rimspace.net> wrote:
> DP> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>>> The `gnus-uncompress-range' function is not called in too many
>>> places:
>
>>> In gnus-number-of-unseen-articles-in-group, for example, it's called
>>> only to find the length of the list, and in
>>> gnus-move-group-to-server only to see if the list is not nil.
>
> DP> Mmm. The not nil version should really be killed, as a guide, and
> DP> the length isn't too awful. The length version can just call
> DP> `gnus-range-length' and be done with it...
>
> Those were just two examples I saw right away. Will you or Gaute do
> the whole cleanup, or should it be an open task for the Gnus team
> (I'll get to it when I can, or someone else will)?

Open the task, I suggest, unless Gaute is willing to commit to finishing
the job. I ... have more than enough on my plate with having to move
offices and home and deal with a few other life issues, so I can't
promise to finish the job.

(Plus, having done all the easy bits, the rest is actually vaguely hard.)

Daniel
* Re: Huge memory consumption on accessing large newsgroup
  2007-10-02  3:23 ` Daniel Pittman
  2007-10-02 11:11 ` Ted Zlatanov
@ 2007-10-02 13:33 ` Daniel Pittman
  1 sibling, 0 replies; 19+ messages in thread

From: Daniel Pittman @ 2007-10-02 13:33 UTC
To: ding

[-- Attachment #1: Type: text/plain, Size: 1051 bytes --]

Daniel Pittman <daniel@rimspace.net> writes:

[...]

> (And because this has been a stupidly annoying couple of weeks in other
> areas, and because this is nice, simple and essentially stress-free work
> I am getting tempted to fix it myself.
>
> So, maybe inversion lists were the way to get the code fixed after all,
> if not quite so directly as expected. ;)

...and that turns out to be true. Here is a patch against current CVS
that addresses the low-hanging fruit in the area. I will tackle the
more complex examples some time soon, or someone else can, and we will
see where we go next.

This has only been looked at very briefly and isn't thoroughly tested
at this stage. It should all be correct, but it could really do with a
third-party review for correctness before being committed to CVS or
anything.

Daniel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: keep-ranges-compressed-1.patch --]
[-- Type: text/x-diff, Size: 5866 bytes --]

? keep-ranges-compressed-1.patch
Index: lisp/gnus-draft.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/gnus-draft.el,v
retrieving revision 7.16
diff -u -u -r7.16 gnus-draft.el
--- lisp/gnus-draft.el 24 Mar 2007 19:50:00 -0000 7.16
+++ lisp/gnus-draft.el 2 Oct 2007 13:32:14 -0000
@@ -212,15 +212,14 @@
     (gnus-activate-group "nndraft:queue")
     (save-excursion
       (let* ((articles (nndraft-articles))
-             (unsendable (gnus-uncompress-range
-                          (cdr (assq 'unsend
-                                     (gnus-info-marks
-                                      (gnus-get-info "nndraft:queue"))))))
+             (unsendable (cdr (assq 'unsend
+                                    (gnus-info-marks
+                                     (gnus-get-info "nndraft:queue")))))
              (gnus-posting-styles nil)
              (total (length articles))
              article)
         (while (setq article (pop articles))
-          (unless (memq article unsendable)
+          (unless (gnus-member-of-range article unsendable)
             (let ((message-sending-message
                    (format "Sending message %d of %d..."
                            (- total (length articles)) total)))
Index: lisp/gnus-group.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/gnus-group.el,v
retrieving revision 7.97
diff -u -u -r7.97 gnus-group.el
--- lisp/gnus-group.el 3 Aug 2007 06:20:58 -0000 7.97
+++ lisp/gnus-group.el 2 Oct 2007 13:32:16 -0000
@@ -1496,10 +1496,10 @@
          (active (gnus-active group)))
     (if (not active)
         0
-      (length (gnus-uncompress-range
-               (gnus-range-difference
-                (gnus-range-difference (list active) (gnus-info-read info))
-                seen))))))
+      (gnus-range-length
+       (gnus-range-difference
+        (gnus-range-difference (list active) (gnus-info-read info))
+        seen)))))
 
 ;; Moving through the Group buffer (in topic mode) e.g. with C-n doesn't
 ;; update the state (enabled/disabled) of the icon `gnus-group-describe-group'
@@ -4406,9 +4406,8 @@
             (setcar (nthcdr 3 info)
                     (gnus-delete-alist type (car marked)))
           (setcdr m (gnus-compress-sequence articles t)))
-        (setcdr m (gnus-compress-sequence
-                   (sort (nconc (gnus-uncompress-range (cdr m))
-                                (copy-sequence articles)) '<) t))))))
+        (setcdr m (gnus-range-add (cdr m)
+                                  (sort (copy-sequence articles) '<)))))))
 
 (defun gnus-add-mark (group mark article)
   "Mark ARTICLE in GROUP with MARK, whether the group is displayed or not."
Index: lisp/gnus-move.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/gnus-move.el,v
retrieving revision 7.9
diff -u -u -r7.9 gnus-move.el
--- lisp/gnus-move.el 24 Jan 2007 07:15:37 -0000 7.9
+++ lisp/gnus-move.el 2 Oct 2007 13:32:18 -0000
@@ -91,8 +91,8 @@
       ;; Then we read the headers from the `from-server'.
       (when (and (gnus-request-group group nil from-server)
                  (gnus-active group)
-                 (gnus-uncompress-range
-                  (gnus-active group))
+                 ;; Should this simply test for an empty or nil range?
+                 (gnus-range-length (gnus-active group))
                  (setq type (gnus-retrieve-headers
                              (gnus-uncompress-range
                               (gnus-active group))
Index: lisp/gnus-start.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/gnus-start.el,v
retrieving revision 7.54
diff -u -u -r7.54 gnus-start.el
--- lisp/gnus-start.el 7 Sep 2007 02:52:27 -0000 7.54
+++ lisp/gnus-start.el 2 Oct 2007 13:32:18 -0000
@@ -2376,8 +2376,7 @@
           info
           (gnus-add-to-range
            (gnus-info-read info)
-           (nconc (gnus-uncompress-range dormant)
-                  (gnus-uncompress-range ticked)))))))))
+           (gnus-range-add dormant ticked))))))))
 
 (defun gnus-load (file)
   "Load FILE, but in such a way that read errors can be reported."
Index: lisp/gnus-sum.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/gnus-sum.el,v
retrieving revision 7.201
diff -u -u -r7.201 gnus-sum.el
--- lisp/gnus-sum.el 27 Sep 2007 21:22:18 -0000 7.201
+++ lisp/gnus-sum.el 2 Oct 2007 13:32:24 -0000
@@ -5747,6 +5747,8 @@
       (setq articles (cdr articles)))
     out))
 
+;; REVISIT: Never used and, given the other memory use issues, probably best
+;; if it stays that way. Should, I think, be removed. --dp, 2007-10-02
 (defun gnus-uncompress-marks (marks)
   "Uncompress the mark ranges in MARKS."
   (let ((uncompressed '(score bookmark))
@@ -6765,6 +6767,8 @@
         (gnus-list-range-difference
          (gnus-list-range-difference
           (gnus-sorted-complement
+           ;; REVISIT: This needs `gnus-range-complement' implemented
+           ;; and tested, then we can drop the expansion of lists here.
            (gnus-uncompress-range
             (if gnus-newsgroup-maximum-articles
                 (cons (max (car active)
@@ -6772,7 +6776,7 @@
                               gnus-newsgroup-maximum-articles
                               -1))
                       (cdr active))
-              active))
+              active)
            (gnus-list-of-unread-articles group))
           (cdr (assq 'dormant marked)))
          (cdr (assq 'tick marked))))))
Index: lisp/nnimap.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/nnimap.el,v
retrieving revision 7.36
diff -u -u -r7.36 nnimap.el
--- lisp/nnimap.el 17 Aug 2007 11:09:00 -0000 7.36
+++ lisp/nnimap.el 2 Oct 2007 13:32:27 -0000
@@ -634,14 +634,13 @@
             (imap-search
              (concat "UID "
                      (imap-range-to-message-set
-                      (gnus-compress-sequence
-                       (append (gnus-uncompress-sequence
-                                (and fetch-old
-                                     (cons (if (numberp fetch-old)
-                                               (max 1 (- (car articles) fetch-old))
-                                             1)
-                                           (1- (car articles)))))
-                               articles)))))
+                      (gnus-range-add
+                       (and fetch-old
+                            (cons (if (numberp fetch-old)
+                                      (max 1 (- (car articles) fetch-old))
+                                    1)
+                                  (1- (car articles))))
+                       articles))))
           (mapcar (lambda (msgid)
                     (imap-search
                      (format "HEADER Message-Id \"%s\"" msgid)))
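The patch swaps `memq' on an expanded list for `gnus-member-of-range' on the compressed one, and its REVISIT comment notes that a range complement is still missing. The sketch below shows, under the assumption that RANGE is a sorted, compressed list of article numbers and (LOW . HIGH) runs and ACTIVE is a single (LOW . HIGH) pair, how both operations can walk the runs rather than the articles. `my-range-member-p' and `my-range-complement' are hypothetical names; this is not the `gnus-range-complement' the comment asks for, only an illustration of the technique.

```elisp
;;; Sketch only: `my-range-member-p' and `my-range-complement' are
;;; hypothetical, not the Gnus functions named in the patch.
;;; RANGE is a sorted, compressed list of article numbers N and runs
;;; (LOW . HIGH); ACTIVE is a single (LOW . HIGH) pair.

(defun my-range-member-p (number range)
  "Return non-nil if NUMBER falls inside RANGE."
  (catch 'done
    (dolist (elem range nil)
      (let ((low (if (consp elem) (car elem) elem))
            (high (if (consp elem) (cdr elem) elem)))
        (cond ((< number low) (throw 'done nil)) ; later runs are higher
              ((<= number high) (throw 'done t)))))))

(defun my-range-complement (active range)
  "Return the articles in ACTIVE that are not in RANGE, still compressed."
  (let ((next (car active))   ; lowest article not yet accounted for
        (limit (cdr active))
        result)
    (dolist (elem range)
      (let* ((low (if (consp elem) (car elem) elem))
             (high (if (consp elem) (cdr elem) elem))
             ;; The gap before this run, clipped to ACTIVE.
             (gap-high (min (1- low) limit)))
        (when (<= next gap-high)
          (push (if (= next gap-high) next (cons next gap-high)) result))
        (setq next (max next (1+ high)))))
    ;; Whatever is left after the last run.
    (when (<= next limit)
      (push (if (= next limit) next (cons next limit)) result))
    (nreverse result)))

(my-range-member-p 8 '((3 . 5) 8 (15 . 25)))            ; => t
(my-range-complement '(1 . 20) '((3 . 5) 8 (15 . 25)))
;; => ((1 . 2) (6 . 7) (9 . 14))
```

Both functions do work proportional to the number of runs in RANGE. Because the runs are sorted, storing them in a vector (as an inversion list does) would also allow a binary search for membership; that is a possible refinement, not something the patch does.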
Thread overview: 19+ messages
[not found] <87wsw4u21m.fsf@gmx.de>
2007-08-10  9:08 ` Huge memory consumption on accessing large newsgroup Katsumi Yamaoka
2007-08-10 11:39 ` Katsumi Yamaoka
2007-08-10 12:43 ` Sven Joachim
2007-08-13 11:44 ` Katsumi Yamaoka
2007-08-13 17:30 ` Sven Joachim
2007-08-14 11:46 ` Katsumi Yamaoka
2007-09-13 10:27 ` Katsumi Yamaoka
2007-08-10 12:42 ` Sven Joachim
2007-09-29 21:04 ` Gaute Strokkenes
2007-09-30 22:11 ` Ted Zlatanov
2007-10-01  0:29 ` Katsumi Yamaoka
2007-10-01  1:04 ` Daniel Pittman
2007-10-02  2:13 ` Ted Zlatanov
2007-10-02  3:23 ` Daniel Pittman
2007-10-02 11:11 ` Ted Zlatanov
2007-10-02 12:17 ` Daniel Pittman
2007-10-02 16:08 ` Ted Zlatanov
2007-10-03  0:19 ` Daniel Pittman
2007-10-02 13:33 ` Daniel Pittman