From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp1.rz.uni-karlsruhe.de (Debian-exim@smtp1.rz.uni-karlsruhe.de [129.13.185.217]) by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id oBAKjGCx031380 for ; Fri, 10 Dec 2010 15:45:18 -0500 (EST) Received: from hekate.usta.de (asta-nat.asta.uni-karlsruhe.de [172.22.63.82]) by smtp1.rz.uni-karlsruhe.de with esmtp (Exim 4.63 #1) id 1PR9q5-0006H3-UC; Fri, 10 Dec 2010 21:45:14 +0100 Received: from donnerwolke.usta.de ([172.24.96.3]) by hekate.usta.de with esmtp (Exim 4.72) (envelope-from ) id 1PR9q5-0003QR-Oh for tech@mdocml.bsd.lv; Fri, 10 Dec 2010 21:45:13 +0100 Received: from iris.usta.de ([172.24.96.5] helo=usta.de) by donnerwolke.usta.de with esmtp (Exim 4.69) (envelope-from ) id 1PR9q5-0007v1-Nq for tech@mdocml.bsd.lv; Fri, 10 Dec 2010 21:45:13 +0100 Received: from schwarze by usta.de with local (Exim 4.72) (envelope-from ) id 1PR9q5-0007Jy-N2 for tech@mdocml.bsd.lv; Fri, 10 Dec 2010 21:45:13 +0100 Date: Fri, 10 Dec 2010 21:45:13 +0100 From: Ingo Schwarze To: tech@mdocml.bsd.lv Subject: Re: roff.c question Message-ID: <20101210204513.GB18607@iris.usta.de> References: <4CF678F0.6020304@bsd.lv> <20101201212834.GA22990@iris.usta.de> <4CF77A2B.6020702@bsd.lv> <4CF79F45.6080105@bsd.lv> <20101202225019.GD12188@iris.usta.de> <20101203214929.GB28384@iris.usta.de> <4CFBAC80.1060004@bsd.lv> <20101208010527.GA25360@iris.usta.de> <4D01F5A4.8010300@bsd.lv> X-Mailinglist: mdocml-tech Reply-To: tech@mdocml.bsd.lv MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4D01F5A4.8010300@bsd.lv> User-Agent: Mutt/1.5.21 (2010-09-15) Hi Kristaps, Kristaps Dzonsons wrote on Fri, Dec 10, 2010 at 10:40:52AM +0100: >>> Ingo wrote: >>>> 1) roff_res() should not expand \\* or \\\\*, >>>> but it should expand \* and \\\*. > I worry a bit about performance, as this is an > often-called function: Yes, the function is called L+E times, where L is the number of input lines and E is the number of expanded strings and user-defined roff macro invocations. But my algorithm is still O(N), where N is the number of characters on the line, as long as no expansion is done. So, if C is the total number of characters in the input file, and e is the mean number of expansions per line, then the total order of the search algorithm (without doing the actual expansions) is O(C*(1+e)). The work to do for the expansions is O(C*e), but that didn't change. All that is still nearly linear, which is hardly a bad order. > why the funny business looking for three > non-nils then the escaped asterisk? > > Seems (it pseudo-C) > > for (cp = buf + pos; cpp = strstr(cp, "\\*"); cp++) > if ('\0' == cpp[2]) continue; > ... > > would be much clearer and faster. No? No. Faster, no. That's O(C*(1+e)) as well. So we are talking about constant factors here. Maybe you are economizing two char comparisons per loop cycle. Assuming e=1, that will buy you perhaps 400k char comparisons on a 100k input file. That might be about a millisecond on a very old PC. Clearer, no. It's incorrect. You cannot use strstr(). The number of backslashes before "\\*" is relevant. It must be even, or you don't want to substitute. If we want to go into micro-optimization, we could do this: for (cp = *bufp + pos; *cp; cp++) { if ('\\' != *cp) continue; stesc = cp; if ('\0' == *(++cp)) break; if ('*' != *cp) continue; if ('\0' == *(++cp)) break; switch (*cp) { deferring the tests until they are really needed (untested). That's fewer tests because most of the time, the continue branches will be taken. But is that really better? It is more code, and harder to understand. Yours, Ingo -- To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv