From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from scc-mailout.scc.kit.edu (scc-mailout.scc.kit.edu [129.13.185.201]) by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id p6VAgQrp003451 for ; Sun, 31 Jul 2011 06:42:28 -0400 (EDT) Received: from hekate.usta.de (asta-nat.asta.uni-karlsruhe.de [172.22.63.82]) by scc-mailout-01 with esmtp (Exim 4.72 #1) id 1QnTTU-0007SB-EG; Sun, 31 Jul 2011 12:42:24 +0200 Received: from donnerwolke.usta.de ([172.24.96.3]) by hekate.usta.de with esmtp (Exim 4.72) (envelope-from ) id 1QnTTU-0005nh-FW for tech@mdocml.bsd.lv; Sun, 31 Jul 2011 12:42:24 +0200 Received: from iris.usta.de ([172.24.96.5] helo=usta.de) by donnerwolke.usta.de with esmtp (Exim 4.69) (envelope-from ) id 1QnTTU-0003uv-Dc for tech@mdocml.bsd.lv; Sun, 31 Jul 2011 12:42:24 +0200 Received: from schwarze by usta.de with local (Exim 4.72) (envelope-from ) id 1QnTTU-00045O-4e for tech@mdocml.bsd.lv; Sun, 31 Jul 2011 12:42:24 +0200 Date: Sun, 31 Jul 2011 12:42:23 +0200 From: Ingo Schwarze To: tech@mdocml.bsd.lv Subject: Re: Need hash: uthash? Message-ID: <20110731104223.GB1831@iris.usta.de> References: <4E313AC0.2010106@bsd.lv> X-Mailinglist: mdocml-tech Reply-To: tech@mdocml.bsd.lv MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4E313AC0.2010106@bsd.lv> User-Agent: Mutt/1.5.21 (2010-09-15) Hi Kristaps, Kristaps Dzonsons wrote on Thu, Jul 28, 2011 at 12:32:32PM +0200: > mandoc is getting a `tr' implementation*, needed primarily for > perlpod. It's astounding how fast this happened. I can't even keep up with reading the code right now, however, i'm sure it's good to have it. > This is expensive as it involves iterating over each character in > each text string, then each element in an array of `tr' characters [...] Good that Joerg's suggestions mitigated this problem. > For the time being, this is implemented with the same linear search > of the `ds' and `de' macro keys. This means O(mn) performance over > the number of words/characters (!) and keys. In practise, when > profiling the code, libroff spends a lot of time running through > these lists. -mdoc gets away without even touching this code, but > -man (especially perlpod) is smacked hard. Is that still true with Joerg's suggestion? > In short, the `ds', `de', and now `tr' macros can really benefit > from a hash table. I'm settling on using uthash** due to its > license---I can directly include it in the code. All the required tools for developing mandoc ought to be in OpenBSD base. So, *if* we settle on uthash, we should first discuss that with the other developers, and even if people agree, we will likely have to maintain it in tree ourselves. The first things we will be asked is: Why not use the existing tools? If i understand correctly, uthash is a dynamic hasher (as opposed to a perfect static hasher). What about hash(3) in /usr/src/lib/libc/db/hash/ ? Is that useable? But even if we use tools from base, i think in the current phase of development, we should only do optimizations that require additional libraries if they really cause HUGE performance gain in practice. All of the places where this might be used are still covered in dust that is hardly starting to settle down, so simplicity and flexibility are still top priorities, and will stay for some time. Yours, Ingo -- To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv