Gnus development mailing list
 help / color / mirror / Atom feed
* Ranges
@ 2004-01-05  4:45 Lars Magne Ingebrigtsen
  2004-01-05  6:49 ` Ranges Zack Weinberg
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Lars Magne Ingebrigtsen @ 2004-01-05  4:45 UTC (permalink / raw)


I've been thinking about the range thing -- using ranges instead of
lists whenever talking about articles.  And I've got some ideas
cooking about the C implementation of those, and I think it could
possibly be quite groovy.  (A new data type; super-hyper-fast
`range-memq', etc.)

But, on the other hand -- what's the impact, really?  Where does
using lists instead of ranges really represent a noticeable
performance drag?  Ok, when entering a group with 1000 articles, you
get a list that's 1000 elements long, but does creating that list
really make any noticeable difference?

I'd like to see some scenarios where this matters before going down
this road.
 
-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-05  4:45 Ranges Lars Magne Ingebrigtsen
@ 2004-01-05  6:49 ` Zack Weinberg
  2004-01-05 10:57   ` Ranges Per Abrahamsen
  2004-01-05  7:13 ` Ranges Steve Youngs
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Zack Weinberg @ 2004-01-05  6:49 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> But, on the other hand -- what's the impact, really?  Where does
> using lists instead of ranges really represent a noticeable
> performance drag?  Ok, when entering a group with 1000 articles, you
> get a list that's 1000 elements long, but does creating that list
> really make any noticeable difference?

I can't speak directly to the question you're posing, but I'd like to
point out that groups may get a *lot* bigger than that.  If Gnus (and
Courier imapd, but never mind) were efficient enough I would have an
IMAP group with 99,000 messages in it.  They aren't, so I split it up
by month, and some of them still have order of 10,000 messages.

zw



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-05  4:45 Ranges Lars Magne Ingebrigtsen
  2004-01-05  6:49 ` Ranges Zack Weinberg
@ 2004-01-05  7:13 ` Steve Youngs
  2004-01-05  7:51   ` Ranges Lars Magne Ingebrigtsen
  2004-01-05 11:01 ` Ranges Kim Minh Kaplan
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Steve Youngs @ 2004-01-05  7:13 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 901 bytes --]

|--==> "LMI" == Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

  LMI> I've been thinking about the range thing -- using ranges instead
  LMI> of lists whenever talking about articles.  And I've got some
  LMI> ideas cooking about the C implementation of those

What worries me is how it will impact on the users.  Will we have a
situation where people won't be able to use Gnus simply because they
don't want to use a bleeding edge (X)Emacs?

And what about XEmacs?  You'd be pushing a big pile of poo up a very
steep hill to get new features into 21.4, and development on 21.5 is
practically non-existent.  Would XEmacs users just get left behind?

-- 
|---<Steve Youngs>---------------<GnuPG KeyID: A94B3003>---|
|              Ashes to ashes, dust to dust.               |
|      The proof of the pudding, is under the crust.       |
|------------------------------<sryoungs@bigpond.net.au>---|

[-- Attachment #2: Type: application/pgp-signature, Size: 256 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-05  7:13 ` Ranges Steve Youngs
@ 2004-01-05  7:51   ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 24+ messages in thread
From: Lars Magne Ingebrigtsen @ 2004-01-05  7:51 UTC (permalink / raw)


Steve Youngs <sryoungs@bigpond.net.au> writes:

> What worries me is how it will impact on the users.  Will we have a
> situation where people won't be able to use Gnus simply because they
> don't want to use a bleeding edge (X)Emacs?

No, there'd be compatibility functions in elisp.  One worry is
whether they'll be majorly slow, but they shouldn't be...  Most
ranges would be so much shorter than their list equivalents that a
Lisp-based `range-memq' shouldn't be all that much slower than a
C `memq'. 

> And what about XEmacs?  You'd be pushing a big pile of poo up a very
> steep hill to get new features into 21.4, and development on 21.5 is
> practically non-existent.  Would XEmacs users just get left behind?

Nah.  Perhaps this'll prod 21.5 into existence.  :-)

But this is all kinda premature.  First we have to know whether
there's anything to gain by going to ranges.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-05  6:49 ` Ranges Zack Weinberg
@ 2004-01-05 10:57   ` Per Abrahamsen
  0 siblings, 0 replies; 24+ messages in thread
From: Per Abrahamsen @ 2004-01-05 10:57 UTC (permalink / raw)


"Zack Weinberg" <zack@codesourcery.com> writes:

> If Gnus (and
> Courier imapd, but never mind) were efficient enough I would have an
> IMAP group with 99,000 messages in it.  They aren't, so I split it up
> by month, and some of them still have order of 10,000 messages.

Me too, Gnus become unusable slow for groups with more than 10000
messages.  However, I haven't profiled it, so I don't know if ranges
are the bottleneck.  

And yes, 10000 messages groups aren't than uncommon.  Both for mailing
lists I subscribe to (and don't expire), for GMANE archives, and some
of the more active Usenet groups.  I occasionally want to read an
entire such group into a Gnus summary either to search for some
specific information, or to get a "feel" of the community before
posting.

Oh, and the "automatically detected spam" group grow to that size, if
I forget the monthly "false positives" check.




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-05  4:45 Ranges Lars Magne Ingebrigtsen
  2004-01-05  6:49 ` Ranges Zack Weinberg
  2004-01-05  7:13 ` Ranges Steve Youngs
@ 2004-01-05 11:01 ` Kim Minh Kaplan
  2004-01-05 11:19   ` Ranges Lars Magne Ingebrigtsen
  2004-01-05 19:01 ` Ranges Ted Zlatanov
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Kim Minh Kaplan @ 2004-01-05 11:01 UTC (permalink / raw)


Lars Magne Ingebrigtsen writes:

> I've been thinking about the range thing -- using ranges instead of
> lists whenever talking about articles [...]
>
> But, on the other hand -- what's the impact, really?  Where does
> using lists instead of ranges really represent a noticeable
> performance drag?  Ok, when entering a group with 1000 articles, you
> get a list that's 1000 elements long, but does creating that list
> really make any noticeable difference?
>
> I'd like to see some scenarios where this matters before going down
> this road.

Some ranges can be much much bigger, especially when using nnimap
which currently uses UIDs as article numbers.  Here the article
numbers (UID really) range from 10216 to 80802 but there are less than
1000 mails in the folder...

Kim Minh.




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-05 11:01 ` Ranges Kim Minh Kaplan
@ 2004-01-05 11:19   ` Lars Magne Ingebrigtsen
  2004-01-06  8:44     ` Ranges Kim Minh Kaplan
  0 siblings, 1 reply; 24+ messages in thread
From: Lars Magne Ingebrigtsen @ 2004-01-05 11:19 UTC (permalink / raw)


Kim Minh Kaplan <kmkaplan@selfoffice.com> writes:

> Some ranges can be much much bigger, especially when using nnimap
> which currently uses UIDs as article numbers.  Here the article
> numbers (UID really) range from 10216 to 80802 but there are less than
> 1000 mails in the folder...

And does Gnus unpack these ranges?

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-05  4:45 Ranges Lars Magne Ingebrigtsen
                   ` (2 preceding siblings ...)
  2004-01-05 11:01 ` Ranges Kim Minh Kaplan
@ 2004-01-05 19:01 ` Ted Zlatanov
  2004-01-06  4:58   ` Ranges Lars Magne Ingebrigtsen
  2004-01-05 21:39 ` Ranges Kai Grossjohann
  2004-01-10 16:13 ` Ranges Gaute B Strokkenes
  5 siblings, 1 reply; 24+ messages in thread
From: Ted Zlatanov @ 2004-01-05 19:01 UTC (permalink / raw)


On Mon, 05 Jan 2004, larsi@gnus.org wrote:

> I've been thinking about the range thing -- using ranges instead of
> lists whenever talking about articles.  And I've got some ideas
> cooking about the C implementation of those, and I think it could
> possibly be quite groovy.  (A new data type; super-hyper-fast
> `range-memq', etc.)

You may like inversion lists for this.  They may even be faster in
Lisp than the built-in ELisp lists, for the average case.

http://www-106.ibm.com/developerworks/linux/library/l-cpinv.html

Ted



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-05  4:45 Ranges Lars Magne Ingebrigtsen
                   ` (3 preceding siblings ...)
  2004-01-05 19:01 ` Ranges Ted Zlatanov
@ 2004-01-05 21:39 ` Kai Grossjohann
  2004-01-06  5:00   ` Ranges Lars Magne Ingebrigtsen
  2004-01-10 16:13 ` Ranges Gaute B Strokkenes
  5 siblings, 1 reply; 24+ messages in thread
From: Kai Grossjohann @ 2004-01-05 21:39 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

> But, on the other hand -- what's the impact, really?  Where does
> using lists instead of ranges really represent a noticeable
> performance drag?  Ok, when entering a group with 1000 articles, you
> get a list that's 1000 elements long, but does creating that list
> really make any noticeable difference?

In some cases, ranges are a problem: some servers have article numbers
that overflow the integer range, and some IMAP servers are extremely
susceptible to this problem.

Do we implement bignum support in Emacs Lisp to work around this, or
is it better to use something other than integers-for-article-numbers
to identify articles?

There is stuff to be gained by using message ids...

What do people think?

Kai



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-05 19:01 ` Ranges Ted Zlatanov
@ 2004-01-06  4:58   ` Lars Magne Ingebrigtsen
  2004-01-06  5:26     ` Ranges Paul Jarc
  0 siblings, 1 reply; 24+ messages in thread
From: Lars Magne Ingebrigtsen @ 2004-01-06  4:58 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> You may like inversion lists for this.  They may even be faster in
> Lisp than the built-in ELisp lists, for the average case.

Interesting, but I don't quite see how they would apply here...

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-05 21:39 ` Ranges Kai Grossjohann
@ 2004-01-06  5:00   ` Lars Magne Ingebrigtsen
  2004-01-06  6:34     ` Ranges Steve Youngs
  0 siblings, 1 reply; 24+ messages in thread
From: Lars Magne Ingebrigtsen @ 2004-01-06  5:00 UTC (permalink / raw)


Kai Grossjohann <kai@emptydomain.de> writes:

> In some cases, ranges are a problem: some servers have article numbers
> that overflow the integer range, and some IMAP servers are extremely
> susceptible to this problem.

Well, that's not a problem with lists/ranges, but with Emacs integers.

> Do we implement bignum support in Emacs Lisp to work around this, or
> is it better to use something other than integers-for-article-numbers
> to identify articles?

I'd *love* to have bignum support in Emacs.  Has anybody looked at
doing this?

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-06  4:58   ` Ranges Lars Magne Ingebrigtsen
@ 2004-01-06  5:26     ` Paul Jarc
  2004-01-06  6:44       ` Ranges Ted Zlatanov
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Jarc @ 2004-01-06  5:26 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
>> You may like inversion lists for this.  They may even be faster in
>> Lisp than the built-in ELisp lists, for the average case.
>
> Interesting, but I don't quite see how they would apply here...

AIUI, It would be another way to represent a list of article numbers.
For example, if the last article number in a group is 8 (or maybe even
if not):
Inversion list   Range list    Full list
(2 5 7 8)        ((2 . 4) 7)   (2 3 4 7)

I think an inversion list would use 1-2 times as many conses as the
corresponding range list.


paul



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-06  5:00   ` Ranges Lars Magne Ingebrigtsen
@ 2004-01-06  6:34     ` Steve Youngs
  0 siblings, 0 replies; 24+ messages in thread
From: Steve Youngs @ 2004-01-06  6:34 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 515 bytes --]

|--==> "LMI" == Lars Magne Ingebrigtsen <larsi@gnus.org> writes:

  LMI> I'd *love* to have bignum support in Emacs.  Has anybody looked
  LMI> at doing this?

Jerry James <james@xemacs.org> has plans for adding it to XEmacs.  No
idea how far he has gotten, though.

-- 
|---<Steve Youngs>---------------<GnuPG KeyID: A94B3003>---|
|              Ashes to ashes, dust to dust.               |
|      The proof of the pudding, is under the crust.       |
|------------------------------<sryoungs@bigpond.net.au>---|

[-- Attachment #2: Type: application/pgp-signature, Size: 256 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-06  5:26     ` Ranges Paul Jarc
@ 2004-01-06  6:44       ` Ted Zlatanov
  2004-01-07  2:35         ` Ranges Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 24+ messages in thread
From: Ted Zlatanov @ 2004-01-06  6:44 UTC (permalink / raw)


On Tue, 06 Jan 2004, prj@po.cwru.edu wrote:

> AIUI, It would be another way to represent a list of article
> numbers.  For example, if the last article number in a group is 8
> (or maybe even if not): 

> Inversion list Range list  Full list 
> (2 5 7 8)      ((2 . 4) 7) (2 3 4 7)

Right.  Inversion lists are as compact as ranges, but have some nicer
features - insertion and deletion of items from the inversion list is,
I think, simpler and faster code than ranges.  There's no
consolidation or splitting of ranges needed - you just eliminate any
duplicate numbers if they happen after an insertion or a deletion.
Also, for a range you have to examine the beginning and the end of the
range if you're searching.  With an inversion list you know it's
sorted, so you can step through the list in linear time (or O(log2(n))
if you can access elements randomly, but I don't think that's possible
with the regular single-linked lists in Emacs).

> I think an inversion list would use 1-2 times as many conses as the
> corresponding range list.

Also, more garbage collection would probably happen when the list gets
rebuilt.  It may not be the optimal solution, but I think some
benchmarking would be good.  I'll look at gnus-range.el, time
permitting.

Ted



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-05 11:19   ` Ranges Lars Magne Ingebrigtsen
@ 2004-01-06  8:44     ` Kim Minh Kaplan
  0 siblings, 0 replies; 24+ messages in thread
From: Kim Minh Kaplan @ 2004-01-06  8:44 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi-smP1P7uqpqc@public.gmane.org> writes:

> Kim Minh Kaplan <kmkaplan-AwwS6Bc0PDVoiYX5Tdu9fQ@public.gmane.org> writes:
>
>> Some ranges can be much much bigger, especially when using nnimap
>> which currently uses UIDs as article numbers.
>
> And does Gnus unpack these ranges?

I can't be completly positive on this today as I use a modified¹
version of nnimap that hides UIDs from Gnus, but I seem to remember
that `nnimap-retrieve-headers' received *huge* uncompressed articles
lists and compressed it.  I think somewhere in
`nnimap-request-update-info-internal' it uncompressed some big ranges
to.  I used to never fetch old articles from some mailgroups with Gnus
due to these (waiting time was somewhere around 5 to 10 minutes).

There is also `nnimap-request-expire-articles' that compresses ranges
but I think this one is a non issue.

Kim Minh.

¹ I now use a nnimap that keeps UIDs as strings.  It solves this range
  performance problem and more importantly the problem that Gnus could
  not be used with some IMAP servers that have UIDs that overflow
  Emacs' integers.




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-06  6:44       ` Ranges Ted Zlatanov
@ 2004-01-07  2:35         ` Lars Magne Ingebrigtsen
  2004-01-08 21:52           ` Ranges Ted Zlatanov
  2004-01-12 21:28           ` Ranges Paul Jarc
  0 siblings, 2 replies; 24+ messages in thread
From: Lars Magne Ingebrigtsen @ 2004-01-07  2:35 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

>> Inversion list Range list  Full list 
>> (2 5 7 8)      ((2 . 4) 7) (2 3 4 7)
>
> Right.  Inversion lists are as compact as ranges, but have some nicer
> features - insertion and deletion of items from the inversion list is,
> I think, simpler and faster code than ranges.

In this example, it's one cons cell more.  :-)

> There's no consolidation or splitting of ranges needed - you just
> eliminate any duplicate numbers if they happen after an insertion or
> a deletion.  Also, for a range you have to examine the beginning and
> the end of the range if you're searching.  With an inversion list
> you know it's sorted, so you can step through the list in linear
> time (or O(log2(n))

Ranges are also sorted, so the same holds true for them, really.  And
you have to split/consolidate inversions, too, don't you?  If you
remove 3 from the list above, you'll have to insert another cons cell
in the middle, there...

Inversion lists seem slightly less intuitive than ranges...

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-07  2:35         ` Ranges Lars Magne Ingebrigtsen
@ 2004-01-08 21:52           ` Ted Zlatanov
  2004-01-10 18:43             ` Ranges Dan Christensen
  2004-01-12 21:28           ` Ranges Paul Jarc
  1 sibling, 1 reply; 24+ messages in thread
From: Ted Zlatanov @ 2004-01-08 21:52 UTC (permalink / raw)


On Wed, 07 Jan 2004, larsi@gnus.org wrote:

> Ted Zlatanov <tzz@lifelogs.com> writes:
> 
>>> Inversion list Range list  Full list 
>>> (2 5 7 8)      ((2 . 4) 7) (2 3 4 7)
>>
>> Right.  Inversion lists are as compact as ranges, but have some
>> nicer features - insertion and deletion of items from the inversion
>> list is, I think, simpler and faster code than ranges.
> 
> In this example, it's one cons cell more.  :-)

For short ranges (less than 4 items per interval, I think), it's
better to use explicit article numbers than almost anything else.  The
example was not very good to show why inversion lists should be used.

I'll show the sequence in a more real-life situation, and see if you
think inversion lists may be useful.  I'll use a list of 1000 items
that can be on or off.

start with () = all items are off; below is the running state as 1 or
0; also I include the range representations

set items 5, 20, 80-100 on
(5 6 20 21 80 101)
 1100111000111000 

range: (5 20 21 (80 . 100))

set items 100-1000 on
(5 6 20 21 80 100 100 101 101 1001); consolidate to
 110011100011100001111000011110000

(5 6 20 21 80 1001)
 11001110001110000

range: (5 20 21 (80 . 1000))

set items 21-79 on
(5 6 20 21 21 80 80 1001); consolidate to
 11001110001110001110000

(5 6 20 1001)
 11001110000

range: (5 (20 . 1000))

set items 50, 500, 999 off

(5 6 20 50 51 500 501 999 1000 1001)
 1100111000111000011110000111110000

range: (5 (20 . 49) (51 . 499) (501 . 1000))

So yes, this is far less intuitive than ranges.  I think, however,
that it may be faster for operations.  If you think it's worthwhile, I
can try adding gnus-range.el functions that use inversion lists and we
can benchmark them.

For a start, here's the gnus-member-of-invlist function, much simpler
than gnus-member-of-range:

(defun gnus-member-of-invlist (number list)
  (let (state)
    (while (and list
		(>= number (car list)))
      (setq state (not state))
      (setq list (cdr list)))
    state))

I found that it was slower to search this way:

(benchmark-run-compiled 50000 (gnus-member-of-invlist 800 '(5 6 20 50 51 500 501 999 1000 1001)))
(2.5340199999999995 10 3.791519999999963)
(benchmark-run-compiled 50000 (gnus-member-of-range 800 '(5 (20 . 49) (51 . 499) (501 . 1000))))
(0.4903200000000001 10 4.371743000000038)

I may have missed something in the gnus-member-of-invlist
implementation, I'm not sure.  I'm not well-versed in the internal
Emacs implementation of lists, so maybe there's a better way of
storing the inversion list.  Vectors may work - with the advantages
of binary searching and an easy distinction from ranges, which can
still be stores as lists.

The code for the other gnus-range.el functions will also be
significantly shorter using inversion lists, though I'm not sure if
they will be faster - I'd need to benchmark.

Ted



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-05  4:45 Ranges Lars Magne Ingebrigtsen
                   ` (4 preceding siblings ...)
  2004-01-05 21:39 ` Ranges Kai Grossjohann
@ 2004-01-10 16:13 ` Gaute B Strokkenes
  5 siblings, 0 replies; 24+ messages in thread
From: Gaute B Strokkenes @ 2004-01-10 16:13 UTC (permalink / raw)


On 5 jan 2004, larsi@gnus.org wrote:

> I've been thinking about the range thing -- using ranges instead of
> lists whenever talking about articles.  And I've got some ideas
> cooking about the C implementation of those, and I think it could
> possibly be quite groovy.  (A new data type; super-hyper-fast
> `range-memq', etc.)
>
> But, on the other hand -- what's the impact, really?

The best (in my not-so-humble opinion) argument, and one that hasn't
been mentioned so far, is that consistently using ranges means that
you only use one data structure to indicate sets of articles.  I have
been confused on more than one ocassion while browsing through the
gnus source whether ranges or lists are expected.

> Where does using lists instead of ranges really represent a
> noticeable performance drag?  Ok, when entering a group with 1000
> articles, you get a list that's 1000 elements long, but does
> creating that list really make any noticeable difference?
>
> I'd like to see some scenarios where this matters before going down
> this road.

-- 
Gaute Strokkenes                        http://www.srcf.ucam.org/~gs234/
I'm in ATLANTIC CITY riding in a comfortable ROLLING CHAIR...



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-08 21:52           ` Ranges Ted Zlatanov
@ 2004-01-10 18:43             ` Dan Christensen
  2004-01-12 21:13               ` Ranges Ted Zlatanov
  0 siblings, 1 reply; 24+ messages in thread
From: Dan Christensen @ 2004-01-10 18:43 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> For a start, here's the gnus-member-of-invlist function, much simpler
> than gnus-member-of-range:
>
> (defun gnus-member-of-invlist (number list)
>   (let (state)
>     (while (and list
> 		(>= number (car list)))
>       (setq state (not state))
>       (setq list (cdr list)))
>     state))
>
> I found that it was slower to search this way:
>
> (benchmark-run-compiled 50000 (gnus-member-of-invlist 800 '(5 6 20 50 51 500 501 999 1000 1001)))
> (2.5340199999999995 10 3.791519999999963)
> (benchmark-run-compiled 50000 (gnus-member-of-range 800 '(5 (20 . 49) (51 . 499) (501 . 1000))))
> (0.4903200000000001 10 4.371743000000038)

I did some tests, and your code is actually twice as fast as
gnus-member-of-range.  Are you sure your function was byte-compiled in
your tests?  I used

  http://quimby.gnus.org/elisp/benchmark.el

which byte-compiles the control loop, but not the function being
called.  To get good performance, I had to manually byte-compile
gnus-member-of-invlist.  Here's my output (I used slightly longer
lists):

(benchmark 50000 (gnus-member-of-invlist 800 '(5 6 7 9 10 12 13 15 20 50 51 500 501 999 1000 1001)))
0.176069974899292

[Uncompiled, I got 0.9875500202178955]

(benchmark 50000 (gnus-member-of-range 800 '(5 (7 . 8) (10 . 11) (13 . 14) (20 . 49) (51 . 499) (501 . 1000))))
0.31979799270629883

Dan

PS: At first I thought that part of the slow down was because
gnus-member-of-range drops a pair ( x . y ) during each iteration.
So I wrote

(defun gnus-member-of-invlist-b (number list)
  (while (and list
	      (cdr list)
	      (>= number (cadr list)))
    (setq list (cddr list)))
  (and list
       (>= number (car list))))

which goes through two at a time, and avoids using a state variable. 
But it's only slightly faster:

(benchmark 50000 (gnus-member-of-invlist-b 800 '(5 6 7 9 10 12 13 15 20 50 51 500 501 999 1000 1001)))
0.1708989143371582

[Note that I didn't test edge cases thoroughly in my version.]



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-10 18:43             ` Ranges Dan Christensen
@ 2004-01-12 21:13               ` Ted Zlatanov
  0 siblings, 0 replies; 24+ messages in thread
From: Ted Zlatanov @ 2004-01-12 21:13 UTC (permalink / raw)


On Sat, 10 Jan 2004, jdc@uwo.ca wrote:

> I did some tests, and your code is actually twice as fast as
> gnus-member-of-range.

Excellent.  I knew something was strange because my code was so much
simpler, but I thought benchmark-run-compiled automatically did the
byte-compile of the function being tested, so I imagined the current
ranges code was magic :)

> (benchmark 50000 (gnus-member-of-invlist 800 '(5 6 7 9 10 12 13 15
> 20 50 51 500 501 999 1000 1001))) 0.176069974899292
> 
> [Uncompiled, I got 0.9875500202178955]

Sounds right.  Thank you for disbelieving my benchmarks!

> (benchmark 50000 (gnus-member-of-range 800 '(5 (7 . 8) (10 . 11) (13
> . 14) (20 . 49) (51 . 499) (501 . 1000)))) 0.31979799270629883

Cool.  Hey Lars, what do you think?  Are you interested in an invlist
implementation of the ranges functions?  I think, based on these
benchmarks, that we can get some good speedups.

Thanks
Ted



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-07  2:35         ` Ranges Lars Magne Ingebrigtsen
  2004-01-08 21:52           ` Ranges Ted Zlatanov
@ 2004-01-12 21:28           ` Paul Jarc
  2004-01-20 23:53             ` Ranges Ted Zlatanov
  1 sibling, 1 reply; 24+ messages in thread
From: Paul Jarc @ 2004-01-12 21:28 UTC (permalink / raw)


Lars Magne Ingebrigtsen <larsi@gnus.org> wrote:
> Inversion lists seem slightly less intuitive than ranges...

Also, unlike inversion lists, range lists are a generalization of
plain lists.  So changing things over from plain lists to range lists
can be done incrementally: update each piece of code that reads a
plain list, one at a time, so that they can all handle ranges as well,
and then update each piece of code that creates a plain list, one at a
time, s othat they create range lists instead.  Converting to
inversion lists has to be done all at once.  I'm not saying this
should stop anyone from doing that conversion - it's just something to
keep in mind.

It might be best to first convert everything to ranges, since that
conversion is simpler, and at the same time, update all the code to
use an opaque "article set" abstraction.  Then after everything has
been converted to that, we can play with the implementation of
"article sets" for performance.

I guess stored data like .newsrc.el will have to stick with ranges in
any case.


paul



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-12 21:28           ` Ranges Paul Jarc
@ 2004-01-20 23:53             ` Ted Zlatanov
  2004-01-21  7:15               ` Ranges Paul Jarc
  0 siblings, 1 reply; 24+ messages in thread
From: Ted Zlatanov @ 2004-01-20 23:53 UTC (permalink / raw)


On Mon, 12 Jan 2004, prj@po.cwru.edu wrote:

> It might be best to first convert everything to ranges, since that
> conversion is simpler, and at the same time, update all the code to
> use an opaque "article set" abstraction.  Then after everything has
> been converted to that, we can play with the implementation of
> "article sets" for performance.
> 
> I guess stored data like .newsrc.el will have to stick with ranges
> in any case.

How about tagging inversion lists as such, maybe if the first element
is 'invlist then you apply the inversion list algorithms?

Is the Emacs bit vector internal structure as compact as ranges and
inversion lists?  It seems like a good idea to use what's already
written in C to do a fast lookup, but the memory usage could be
prohibitive.  It might be nice to make inversions lists or ranges an
alternate implementation of the internal Emacs bit vectors.

I also think the "article set" abstraction is a good thing.
Unfortunately I don't think I'll have the time to work on it, sorry.

Ted



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-20 23:53             ` Ranges Ted Zlatanov
@ 2004-01-21  7:15               ` Paul Jarc
  2004-01-21 15:19                 ` Ranges Ted Zlatanov
  0 siblings, 1 reply; 24+ messages in thread
From: Paul Jarc @ 2004-01-21  7:15 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> wrote:
> On Mon, 12 Jan 2004, prj@po.cwru.edu wrote:
>> I guess stored data like .newsrc.el will have to stick with ranges
>> in any case.
>
> How about tagging inversion lists as such, maybe if the first element
> is 'invlist then you apply the inversion list algorithms?

That would still cause trouble when people downgrade to an earlier CVS
checkout that doesn't recognize invlists at all.


paul



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Ranges
  2004-01-21  7:15               ` Ranges Paul Jarc
@ 2004-01-21 15:19                 ` Ted Zlatanov
  0 siblings, 0 replies; 24+ messages in thread
From: Ted Zlatanov @ 2004-01-21 15:19 UTC (permalink / raw)


On Wed, 21 Jan 2004, prj@po.cwru.edu wrote:

Ted Zlatanov <tzz@lifelogs.com> wrote: > On Mon, 12 Jan 2004, prj@po.cwru.edu wrote:
>>> I guess stored data like .newsrc.el will have to stick with ranges
>>> in any case.
>>
>> How about tagging inversion lists as such, maybe if the first
>> element is 'invlist then you apply the inversion list algorithms?
> 
> That would still cause trouble when people downgrade to an earlier
> CVS checkout that doesn't recognize invlists at all.

The above mentioned conversion (using 'invlist as the first element)
can be done only when the user requests it.  Then the user knows he
has to "un-inversify" his newsrc.eld before downgrading to an earlier
CVS version.

As you said, an article set abstraction would make all this
significantly easier.  I still think a 30% - 50% speed gain is worth
the coding effort.

I'll ask on emacs-devel about using inversion lists as an alternate
internal implementation of bool-vectors.  Right now they seem to be
implemented literally with concatenated numbers, which is fast but
memory-inefficient.

Ted



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2004-01-21 15:19 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-05  4:45 Ranges Lars Magne Ingebrigtsen
2004-01-05  6:49 ` Ranges Zack Weinberg
2004-01-05 10:57   ` Ranges Per Abrahamsen
2004-01-05  7:13 ` Ranges Steve Youngs
2004-01-05  7:51   ` Ranges Lars Magne Ingebrigtsen
2004-01-05 11:01 ` Ranges Kim Minh Kaplan
2004-01-05 11:19   ` Ranges Lars Magne Ingebrigtsen
2004-01-06  8:44     ` Ranges Kim Minh Kaplan
2004-01-05 19:01 ` Ranges Ted Zlatanov
2004-01-06  4:58   ` Ranges Lars Magne Ingebrigtsen
2004-01-06  5:26     ` Ranges Paul Jarc
2004-01-06  6:44       ` Ranges Ted Zlatanov
2004-01-07  2:35         ` Ranges Lars Magne Ingebrigtsen
2004-01-08 21:52           ` Ranges Ted Zlatanov
2004-01-10 18:43             ` Ranges Dan Christensen
2004-01-12 21:13               ` Ranges Ted Zlatanov
2004-01-12 21:28           ` Ranges Paul Jarc
2004-01-20 23:53             ` Ranges Ted Zlatanov
2004-01-21  7:15               ` Ranges Paul Jarc
2004-01-21 15:19                 ` Ranges Ted Zlatanov
2004-01-05 21:39 ` Ranges Kai Grossjohann
2004-01-06  5:00   ` Ranges Lars Magne Ingebrigtsen
2004-01-06  6:34     ` Ranges Steve Youngs
2004-01-10 16:13 ` Ranges Gaute B Strokkenes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).