From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp-2.sys.kth.se (smtp-2.sys.kth.se [130.237.32.160]) by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id p6VC0u9d011663 for ; Sun, 31 Jul 2011 08:00:57 -0400 (EDT) Received: from mailscan-1.sys.kth.se (mailscan-1.sys.kth.se [130.237.32.91]) by smtp-2.sys.kth.se (Postfix) with ESMTP id 9936914DC54; Sun, 31 Jul 2011 14:00:50 +0200 (CEST) X-Virus-Scanned: by amavisd-new at kth.se Received: from smtp-2.sys.kth.se ([130.237.32.160]) by mailscan-1.sys.kth.se (mailscan-1.sys.kth.se [130.237.32.91]) (amavisd-new, port 10024) with LMTP id Q0PNAbZv-TGH; Sun, 31 Jul 2011 14:00:47 +0200 (CEST) X-KTH-Auth: kristaps [89.158.117.88] X-KTH-mail-from: kristaps@bsd.lv Received: from macky.local (89-158-117-88.rev.dartybox.com [89.158.117.88]) by smtp-2.sys.kth.se (Postfix) with ESMTP id A715714DC52; Sun, 31 Jul 2011 14:00:46 +0200 (CEST) Message-ID: <4E3543ED.5030103@bsd.lv> Date: Sun, 31 Jul 2011 14:00:45 +0200 From: Kristaps Dzonsons User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.18) Gecko/20110616 Thunderbird/3.1.11 X-Mailinglist: mdocml-tech Reply-To: tech@mdocml.bsd.lv MIME-Version: 1.0 To: tech@mdocml.bsd.lv CC: Ingo Schwarze Subject: Re: Need hash: uthash? References: <4E313AC0.2010106@bsd.lv> <20110731104223.GB1831@iris.usta.de> In-Reply-To: <20110731104223.GB1831@iris.usta.de> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit >> For the time being, this is implemented with the same linear search >> of the `ds' and `de' macro keys. This means O(mn) performance over >> the number of words/characters (!) and keys. In practise, when >> profiling the code, libroff spends a lot of time running through >> these lists. -mdoc gets away without even touching this code, but >> -man (especially perlpod) is smacked hard. > > Is that still true with Joerg's suggestion? Ingo, No---well, not really. As you can see, the code switches between using a byte-table for lookups (regular characters) and a linear search (escape sequences). Cf. roff_strdup(). Again, most mdoc manuals never touch this code. The topic of hashes is important and could make a big difference, but I don't yet have the benchmarks to substantiate this, so I'm not pushing it too hard. I'm mainly bothered by the many diverse tables and methods that are floating around in the codebase. The situation is as follows. Perfect [minimal?] hashing. Usage: escape table, mdoc macro table, man macro table, roff macro table. Pros: removes a big chunk of code/space (depending on the hash implementation of course) and initialisation overhead (static), some speed-ups in lookups. Cons: machine-generated code, needs Yet Another Tool. NetBSD has one (nmbperf), GNU has one (gperf). Alternatives: use a dynamic hash and eat the overhead. Since I already manually "hash" these, this is an acceptable start. Dynamic hashing. Usage: `de' and `ds' symbol table, `tr' symbol table. Pros: cleans up interfaces, significantly speeds up some areas (the linear lookup). Will speed up -man documents by quite a bit. Cons: needs a hash mechanism. Some space overhead for the table vs. the linked list. Alternatives: continue being slow. What I'm going to do is implement most of these and run benchmarks to see what we actually get. I'd done this before for a small part but have forgotten the results, so it's time to do it again. I'm interested in both the single-manual improvement and the many-manual improvement. Thanks, Kristaps -- To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv