From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp-1.sys.kth.se (smtp-1.sys.kth.se [130.237.32.175])
	by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id p6SJEpk4016156
	for ; Thu, 28 Jul 2011 15:14:52 -0400 (EDT)
Received: from mailscan-1.sys.kth.se (mailscan-1.sys.kth.se [130.237.32.91])
	by smtp-1.sys.kth.se (Postfix) with ESMTP id C4D47155893
	for ; Thu, 28 Jul 2011 21:14:45 +0200 (CEST)
X-Virus-Scanned: by amavisd-new at kth.se
Received: from smtp-1.sys.kth.se ([130.237.32.175])
	by mailscan-1.sys.kth.se (mailscan-1.sys.kth.se [130.237.32.91]) (amavisd-new, port 10024)
	with LMTP id 4NgTEqlUvcWS for ; Thu, 28 Jul 2011 21:14:44 +0200 (CEST)
X-KTH-Auth: kristaps [89.158.117.88]
X-KTH-mail-from: kristaps@bsd.lv
X-KTH-rcpt-to: tech@mdocml.bsd.lv
Received: from macky.local (89-158-117-88.rev.dartybox.com [89.158.117.88])
	by smtp-1.sys.kth.se (Postfix) with ESMTP id 00A091551FC
	for ; Thu, 28 Jul 2011 21:14:42 +0200 (CEST)
Message-ID: <4E31B521.80806@bsd.lv>
Date: Thu, 28 Jul 2011 21:14:41 +0200
From: Kristaps Dzonsons
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.18) Gecko/20110616 Thunderbird/3.1.11
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
To: tech@mdocml.bsd.lv
Subject: Re: Need hash: uthash?
References: <4E313AC0.2010106@bsd.lv> <20110728150401.GA6081@britannica.bec.de> <4E317C5B.9000804@bsd.lv> <20110728151812.GA6598@britannica.bec.de>
In-Reply-To: <20110728151812.GA6598@britannica.bec.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 28/07/2011 17:18, Joerg Sonnenberger wrote:
> On Thu, Jul 28, 2011 at 05:12:27PM +0200, Kristaps Dzonsons wrote:
>> On 28/07/2011 17:04, Joerg Sonnenberger wrote:
>>> On Thu, Jul 28, 2011 at 12:32:32PM +0200, Kristaps Dzonsons wrote:
>>>> mandoc is getting a `tr' implementation*, needed primarily for
>>>> perlpod.
>>>> This is expensive as it involves iterating over each
>>>> character in each text string, then each element in an array of `tr'
>>>> characters (or escape sequences). Expect it in the next few commits
>>>> (now it's in polish phase).
>>>
>>> Shouldn't this use a simple byte lookup table for the hot path?
>>> Most of the tr processing applies to non-special sequences and unicode
>>> or other \-literals are rare.
>>
>> Joerg,
>>
>> On the contrary. perlpod (followed by GNU) is the main offender and
>> makes significant use of escape-translation. Yes, I could
>> special-case \(*W, but really would rather not.
>
> I think you misunderstand me. My point is that the really critical path
> for .tr processing is handling non-backslashed sequences, e.g. normal
> text. For that, building a byte lookup table drops the performance
> penalty to a few percent at most. The rest is relatively expensive anyway,
> but it should be only a very small part of the input. So the O(nm) drops
> down to O(normal text + m * backslash-literals).

Joerg, yes, I see what you mean---I'll put something in to that effect.
In fact, I can use the same interface for the character substitution
macros.

However, the requirement of a good hashtable for the other elements is
still very relevant. Any suggestions beyond uthash?

Thanks,

Kristaps
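
For reference, the byte lookup table idea discussed above might look
roughly like the sketch below. This is only an illustration under
stated assumptions, not the code that went into mandoc: all names
(tr_map, tr_init, tr_set_byte, tr_add_esc, tr_apply) are hypothetical,
escape parsing is reduced to three simplified forms (\c, \(xx and
\[name]), and escape mappings are kept in a small fixed array that is
scanned linearly, which is where a hash table could be substituted.

```c
#include <stddef.h>
#include <string.h>

#define TR_MAXESC 16	/* arbitrary cap, chosen for this sketch only */

struct tr_esc {
	const char	*from;	/* escape body without backslash, e.g. "(*W" */
	const char	*to;	/* replacement text */
};

struct tr_map {
	unsigned char	 byte[256];	/* identity unless remapped */
	struct tr_esc	 esc[TR_MAXESC];
	size_t		 nesc;
};

void
tr_init(struct tr_map *m)
{
	int	 i;

	for (i = 0; i < 256; i++)
		m->byte[i] = (unsigned char)i;
	m->nesc = 0;
}

void
tr_set_byte(struct tr_map *m, unsigned char from, unsigned char to)
{
	m->byte[from] = to;
}

/* Returns 1 on success, 0 if the fixed-size escape array is full. */
int
tr_add_esc(struct tr_map *m, const char *from, const char *to)
{
	if (m->nesc == TR_MAXESC)
		return 0;
	m->esc[m->nesc].from = from;
	m->esc[m->nesc].to = to;
	m->nesc++;
	return 1;
}

/* Length of the escape body after a backslash (simplified roff rules). */
static size_t
esc_len(const char *p)
{
	const char	*end;
	size_t		 n;

	if (*p == '\0')
		return 0;
	if (*p == '(') {		/* \(xx: two-character name */
		for (n = 1; n < 3 && p[n] != '\0'; n++)
			continue;
		return n;
	}
	if (*p == '[') {		/* \[name]: bracketed name */
		end = strchr(p, ']');
		return end != NULL ? (size_t)(end - p) + 1 : strlen(p);
	}
	return 1;			/* \c: single character */
}

/* Translate in to out; returns the output length, or -1 if out is too small. */
long
tr_apply(const struct tr_map *m, const char *in, char *out, size_t outsz)
{
	const char	*rep;
	size_t		 i, n, len, o = 0;

	if (outsz == 0)
		return -1;
	while (*in != '\0') {
		if (*in != '\\') {
			/* Hot path: one table lookup per ordinary byte. */
			if (o + 1 >= outsz)
				return -1;
			out[o++] = (char)m->byte[(unsigned char)*in++];
			continue;
		}
		/* Cold path: scan the escape mappings, O(m) per escape. */
		n = esc_len(in + 1);
		rep = in;		/* default: copy the escape verbatim */
		len = n + 1;
		for (i = 0; i < m->nesc; i++) {
			if (strlen(m->esc[i].from) == n &&
			    strncmp(m->esc[i].from, in + 1, n) == 0) {
				rep = m->esc[i].to;
				len = strlen(rep);
				break;
			}
		}
		if (o + len >= outsz)
			return -1;
		memcpy(out + o, rep, len);
		o += len;
		in += n + 1;
	}
	out[o] = '\0';
	return (long)o;
}
```

With tr_set_byte(&m, 'a', 'b') and tr_add_esc(&m, "(*W", "W"), ordinary
text costs one array index per byte, and only backslash sequences pay
for the scan over registered escapes. Unregistered escapes are copied
through untouched, so their bodies are not run through the byte table.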