Gnus development mailing list
 help / color / mirror / Atom feed
From: David Moore <dmoore@UCSD.EDU>
Subject: Re: Red Gnus v0.76 is released
Date: 16 Dec 1996 22:26:18 -0800	[thread overview]
Message-ID: <rvrakp660l.fsf@sdnp5.ucsd.edu> (raw)
In-Reply-To: Jim Meyering's message of 16 Dec 1996 22:36:33 -0600

Jim Meyering <meyering@asic.sc.ti.com> writes:

> Lars Magne Ingebrigtsen <larsi@ifi.uio.no> writes:
> | Thu Dec 12 18:18:11 1996  David Moore  <dmoore@ucsd.edu>
> | 
> | 	* gnus-util.el (gnus-create-hash-size): Power of 2 hashtables can
> | 	be _significantly_ faster than 2^x-1 tables on many risc
> | 	machines.  Any gains of 2^x-1 are comparably small on other
> | 	machines. 
> 
> Hi.
> 
> Have you considered any other implications of changing the table size
> to be a power of 2?  It's pretty generally accepted that hash tables
> get better key distributions (fewer collisions) when the table size
> is a prime -- or has relatively few factors.  A bad table size (and
> powers of 2 are notoriously bad) can make the normally-very-efficient
> (O(1)) insert/delete/lookup operations abyssymally slow O(n) ones.

	I've looked into this yes, and it's not as large of a problem
with bucket hashs as it is with quadratic rehashing, typically.  With
the ~12,864 newgroups on my site placed in both 16,384 and 16,383 size
hashtable using the xemacs hash function the number, the collision rate
was virtually the same (less than 5% fewer buckets in the power of 2,
although I can't say much about the chain size from lisp).  However, on
the sparc and some other risc processors, over 75% of the cost to
search/insert into the hashtable is in the modulo operation.

> I suspect that choosing table size == 2^N (rather than the
> usually-good and sometimes-even-prime 2^N - 1) would degrade the
> efficiency due to the increased frequency of collisions.

	It tends to depend on your hash function and input data,
although I do usually use a rule of thumb of going to a larger table
when using power of two.  But again it only helps when your hash
function is still doing something reasonable.  I didn't bother doubling
the hashtable size here, because I was mostly after reasonable O(1)
operation, rather than 1 hit accesses.  I didn't think a 128k table in
memory was worth it, since the 64k table was doing fine.

-- 
David Moore <dmoore@ucsd.edu>       | Computer Systems Lab      __o
UCSD Dept. Computer Science - 0114  | Work: (619) 534-8604    _ \<,_
La Jolla, CA 92093-0114             | Fax:  (619) 534-1445   (_)/ (_)
<URL:http://oj.egbt.org/dmoore/>    | In a cloud bones of steel.


      reply	other threads:[~1996-12-17  6:26 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
1996-12-16 22:47 Lars Magne Ingebrigtsen
1996-12-17  4:36 ` Jim Meyering
1996-12-17  6:26   ` David Moore [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=rvrakp660l.fsf@sdnp5.ucsd.edu \
    --to=dmoore@ucsd.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).