Date: Wed, 28 Dec 2011 02:27:40 +0100
From: Ingo Schwarze
To: tech@mdocml.bsd.lv
Subject: Re: Improve mandocdb catpage/man heuristics.
Message-ID: <20111228012740.GB30880@iris.usta.de>
In-Reply-To: <4EFA5B78.5030507@bsd.lv>
References: <4EFA5B78.5030507@bsd.lv>

Hi Kristaps,

these are clear improvements, so i have committed them to OpenBSD.

Kristaps Dzonsons wrote on Wed, Dec 28, 2011 at 01:57:44AM +0200:

> This improves the mandocdb(8) catpage heuristic to, well, more or
> less as good as it's going to get.  It now reads multiple lines into
> a buffer, joining the lines with a space.

Actually, some man(7) pages have a similar problem, see for example
curs_extend(3) which contains

  .SH NAME
  \fBcurses_version\fP,
  \fBuse_extended_names\fP \- miscellaneous curses extensions

These need to take multiple lines into account, too, but maybe
a simpler algorithm than in pformatted is sufficient:
After .SH NAME, skip all lines until you find "- ",
then use everything until EOL as .Nd.

> While here, I removed the 70-character limit.

Yes, i'm not married to that.  Probably it's wrong to have this
at all.  I only put it in when first writing the code to avoid
getting distracted from the main tasks by badly formatted corner
case pages.  Now that the basic infrastructure is in place, we
can figure out whether such corner cases really exist, and how
many of them, and what to do about them.  Truncating is a very
naive - well, i hardly dare say "solution".

> I recoded this into apropos.c's and whatis.c's printf(3)
> statements.

Already an improvement.

> We should really
> consider a better way: if not 70-char, to the COLUMN limit?

Not sure yet, open for suggestions.
I'm not even sure how big the problem is...

> Lastly, I added an extra man(7) heuristic for separating names and
> descriptions, namely the \-\- I observed in some POD manuals.  This
> cleans up "apropos -s 3p ~.*" quite a lot.

Maybe man(7) can use similar heuristics as cat,
just assuming that "- " starts the description?

Thanks,
  Ingo
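
The man(7) heuristic suggested above (after .SH NAME, skip lines until
one contains "- ", then take the rest of that line as the description)
could be sketched roughly as follows.  This is only an illustration in
Python, not the mandocdb code, which is written in C.  The function
name name_section_description is hypothetical, and the sketch assumes
the page is available as a list of raw source lines.

```python
def name_section_description(lines):
    """Return the text after the first "- " found in the NAME section.

    Hypothetical helper illustrating the heuristic; returns None when
    there is no NAME section or no "- " separator inside it.
    """
    in_name = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(".SH"):
            if in_name:
                # The NAME section ended without a separator.
                break
            # Accept both `.SH NAME` and `.SH "NAME"`.
            words = [w.strip('"') for w in line.split()[1:]]
            in_name = words == ["NAME"]
            continue
        if not in_name:
            continue
        idx = line.find("- ")
        if idx != -1:
            # Everything up to end of line becomes the description.
            return line[idx + 2:].strip()
    return None


page = [
    ".TH curs_extend 3X",
    ".SH NAME",
    r"\fBcurses_version\fP,",
    r"\fBuse_extended_names\fP \- miscellaneous curses extensions",
    ".SH SYNOPSIS",
]
print(name_section_description(page))
```

Because the search is for "- " rather than "\- ", the same sketch also
matches the "\-\- " separator seen in POD-generated manuals.  A name
that itself contains "- " would be split too early, so this is a
heuristic and not a parser.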