From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp-2.sys.kth.se (smtp-2.sys.kth.se [130.237.32.160])
	by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id p6VC0u9d011663
	for <tech@mdocml.bsd.lv>; Sun, 31 Jul 2011 08:00:57 -0400 (EDT)
Received: from mailscan-1.sys.kth.se (mailscan-1.sys.kth.se [130.237.32.91])
	by smtp-2.sys.kth.se (Postfix) with ESMTP id 9936914DC54;
	Sun, 31 Jul 2011 14:00:50 +0200 (CEST)
X-Virus-Scanned: by amavisd-new at kth.se
Received: from smtp-2.sys.kth.se ([130.237.32.160])
	by mailscan-1.sys.kth.se (mailscan-1.sys.kth.se [130.237.32.91]) (amavisd-new, port 10024)
	with LMTP id Q0PNAbZv-TGH; Sun, 31 Jul 2011 14:00:47 +0200 (CEST)
X-KTH-Auth: kristaps [89.158.117.88]
X-KTH-mail-from: kristaps@bsd.lv
Received: from macky.local (89-158-117-88.rev.dartybox.com [89.158.117.88])
	by smtp-2.sys.kth.se (Postfix) with ESMTP id A715714DC52;
	Sun, 31 Jul 2011 14:00:46 +0200 (CEST)
Message-ID: <4E3543ED.5030103@bsd.lv>
Date: Sun, 31 Jul 2011 14:00:45 +0200
From: Kristaps Dzonsons <kristaps@bsd.lv>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.18) Gecko/20110616 Thunderbird/3.1.11
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
To: tech@mdocml.bsd.lv
CC: Ingo Schwarze <schwarze@usta.de>
Subject: Re: Need hash: uthash?
References: <4E313AC0.2010106@bsd.lv> <20110731104223.GB1831@iris.usta.de>
In-Reply-To: <20110731104223.GB1831@iris.usta.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

>> For the time being, this is implemented with the same linear search
>> of the `ds' and `de' macro keys.  This means O(mn) performance over
>> the number of words/characters (!) and keys.  In practise, when
>> profiling the code, libroff spends a lot of time running through
>> these lists. -mdoc gets away without even touching this code, but
>> -man (especially perlpod) is smacked hard.
>
> Is that still true with Joerg's suggestion?

Ingo,

No---well, not really.  As you can see, the code switches between using 
a byte-table for lookups (regular characters) and a linear search 
(escape sequences).  Cf. roff_strdup().  Again, most mdoc manuals never 
touch this code.

The topic of hashes is important and could make a big difference, but I 
don't yet have the benchmarks to substantiate this, so I'm not pushing 
it too hard.  I'm mainly bothered by the many diverse tables and methods 
that are floating around in the codebase.

The situation is as follows.

Perfect [minimal?] hashing.

   Usage: escape table, mdoc macro table, man macro table, roff macro table.

   Pros: removes a big chunk of code/space (depending on the hash 
implementation of course) and initialisation overhead (static), some 
speed-ups in lookups.

   Cons: machine-generated code, needs Yet Another Tool.  NetBSD has one 
(nmbperf), GNU has one (gperf).

   Alternatives: use a dynamic hash and eat the overhead.  Since I 
already manually "hash" these, this is an acceptable start.

Dynamic hashing.

   Usage: `de' and `ds' symbol table, `tr' symbol table.

   Pros: cleans up interfaces, significantly speeds up some areas (the 
linear lookup).  Will speed up -man documents by quite a bit.

   Cons: needs a hash mechanism.  Some space overhead for the table vs. 
the linked list.

   Alternatives: continue being slow.

What I'm going to do is implement most of these and run benchmarks to 
see what we actually get.  I'd done this before for a small part but 
have forgotten the results, so it's time to do it again.  I'm interested 
in both the single-manual improvement and the many-manual improvement.

Thanks,

Kristaps
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv