From: David Moore <dmoore@UCSD.EDU>
Subject: Re: Red Gnus v0.76 is released
Date: 16 Dec 1996 22:26:18 -0800 [thread overview]
Message-ID: <rvrakp660l.fsf@sdnp5.ucsd.edu> (raw)
In-Reply-To: Jim Meyering's message of 16 Dec 1996 22:36:33 -0600
Jim Meyering <meyering@asic.sc.ti.com> writes:
> Lars Magne Ingebrigtsen <larsi@ifi.uio.no> writes:
> | Thu Dec 12 18:18:11 1996 David Moore <dmoore@ucsd.edu>
> |
> | * gnus-util.el (gnus-create-hash-size): Power of 2 hashtables can
> | be _significantly_ faster than 2^x-1 tables on many risc
> | machines. Any gains of 2^x-1 are comparably small on other
> | machines.
>
> Hi.
>
> Have you considered any other implications of changing the table size
> to be a power of 2? It's pretty generally accepted that hash tables
> get better key distributions (fewer collisions) when the table size
> is a prime -- or has relatively few factors. A bad table size (and
> powers of 2 are notoriously bad) can make the normally-very-efficient
> (O(1)) insert/delete/lookup operations abyssymally slow O(n) ones.
I've looked into this yes, and it's not as large of a problem
with bucket hashs as it is with quadratic rehashing, typically. With
the ~12,864 newgroups on my site placed in both 16,384 and 16,383 size
hashtable using the xemacs hash function the number, the collision rate
was virtually the same (less than 5% fewer buckets in the power of 2,
although I can't say much about the chain size from lisp). However, on
the sparc and some other risc processors, over 75% of the cost to
search/insert into the hashtable is in the modulo operation.
> I suspect that choosing table size == 2^N (rather than the
> usually-good and sometimes-even-prime 2^N - 1) would degrade the
> efficiency due to the increased frequency of collisions.
It tends to depend on your hash function and input data,
although I do usually use a rule of thumb of going to a larger table
when using power of two. But again it only helps when your hash
function is still doing something reasonable. I didn't bother doubling
the hashtable size here, because I was mostly after reasonable O(1)
operation, rather than 1 hit accesses. I didn't think a 128k table in
memory was worth it, since the 64k table was doing fine.
--
David Moore <dmoore@ucsd.edu> | Computer Systems Lab __o
UCSD Dept. Computer Science - 0114 | Work: (619) 534-8604 _ \<,_
La Jolla, CA 92093-0114 | Fax: (619) 534-1445 (_)/ (_)
<URL:http://oj.egbt.org/dmoore/> | In a cloud bones of steel.
prev parent reply other threads:[~1996-12-17 6:26 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
1996-12-16 22:47 Lars Magne Ingebrigtsen
1996-12-17 4:36 ` Jim Meyering
1996-12-17 6:26 ` David Moore [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=rvrakp660l.fsf@sdnp5.ucsd.edu \
--to=dmoore@ucsd.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).