From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from scc-mailout.scc.kit.edu (scc-mailout.scc.kit.edu [129.13.185.202])
	by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id pAQBsUdg010298
	for ; Sat, 26 Nov 2011 06:54:30 -0500 (EST)
Received: from hekate.usta.de (asta-nat.asta.uni-karlsruhe.de [172.22.63.82])
	by scc-mailout-02.scc.kit.edu with esmtp (Exim 4.72 #1)
	id 1RUGpx-0000Lr-Gx; Sat, 26 Nov 2011 12:54:29 +0100
Received: from donnerwolke.usta.de ([172.24.96.3])
	by hekate.usta.de with esmtp (Exim 4.72) (envelope-from )
	id 1RUGpx-0001St-Ig
	for tech@mdocml.bsd.lv; Sat, 26 Nov 2011 12:54:29 +0100
Received: from iris.usta.de ([172.24.96.5] helo=usta.de)
	by donnerwolke.usta.de with esmtp (Exim 4.72) (envelope-from )
	id 1RUGpx-0005H2-HC
	for tech@mdocml.bsd.lv; Sat, 26 Nov 2011 12:54:29 +0100
Received: from schwarze by usta.de with local (Exim 4.72) (envelope-from )
	id 1RUGpx-00043j-8q
	for tech@mdocml.bsd.lv; Sat, 26 Nov 2011 12:54:29 +0100
Date: Sat, 26 Nov 2011 12:54:29 +0100
From: Ingo Schwarze
To: tech@mdocml.bsd.lv
Subject: Re: mandocdb: handle formatted manuals
Message-ID: <20111126115429.GB13912@iris.usta.de>
References: <20111119005649.GA10365@iris.usta.de> <4ECE1B81.7080902@bsd.lv>
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4ECE1B81.7080902@bsd.lv>
User-Agent: Mutt/1.5.21 (2010-09-15)

Hi Kristaps,

Kristaps Dzonsons wrote on Thu, Nov 24, 2011 at 11:25:05AM +0100:
> On 11/19/11 01:56, Ingo Schwarze wrote:

>> right, extracting information from formatted manuals is a rather
>> dirty business and never going to be that reliable, but there is
>> no choice: Sometimes, nothing else is available, and we have to
>> deal with it.  Of course, on OpenBSD, we could leave that dirty
>> work to espie@'s OpenBSD::Makewhatis perl modules, but i'd rather
>> have a portable solution, and i'd rather not have makewhatis(8)
>> split into two pieces.
>> I still hope that mandocdb(8) can replace
>> makewhatis(8) completely (except for the pkg_add(1)/pkg_delete(1)/
>> pkg_create(1)-integration of course, which is not going to be
>> portable given how different pkg_add and pkgsrc are).
>>
>> So here is what i did on my train ride from the p2k11 ports hackathon
>> in Budapest back to Karlsruhe (including the one hour lockup in
>> Hegyeshalom when the locomotive stopped working, grrr):
>>
>>  * Even without -a, walk the cat* dirs in addition to man*.
>>  * Only use those cats where men^Wmans are not available
>>    because mans are just greater than cats.
>>
>> There is still a lot of room for improvement, several features of
>> OpenBSD::Makewhatis are not yet implemented.  However, this is
>> already working in most respects, and i'd like to put it in for
>> in-tree polishing.

> There's a slight problem with this: when a file is entered into
> mandocdb's databases, there's an implicit assumption that it was
> parsed.  In other words, mandocdb entries are "safe" for mandoc.
>
> We need to clearly demark which files are "safe" and which are not.
> In this way, progs interfacing with mandocdb databases can act
> accordingly.
>
> The easiest way, of course, is a bit in the index file.  Can you
> modify this patch, and the mandoc.index format, to do something like
> that?

I think that is clearly possible and may be useful.

To avoid mixing up patches across the two repositories, I'm first
going to commit my unchanged patch to bsd.lv, even if that causes the
problems you describe for a short time, and then write an additional
patch to introduce the flag you suggest.

> While we're updating the mandoc.index format, is there anything else
> that should be going in there?
Currently, we have

  recno -> filename \0 section \0 title \0 arch \0 description \0

Given that the description alone will typically be dozens of
characters long, I don't think saving a few bytes by encoding the
file type in a single byte with a set of #defines is worth the
obfuscation, so I'd just make that

  recno -> type \0 filename \0 section \0 title \0 arch \0 description \0

where type = ( mdoc | man | cat ).  Do you agree with that?

I don't think any other information is required in the index.
However, you planned to check endian-neutrality, right?

Yours,
  Ingo
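For illustration, the proposed record value layout (type string first, then the existing fields, each NUL-terminated) could be built and split roughly as in the following minimal C sketch. The names struct idx_rec, idx_encode() and idx_decode() are hypothetical and not part of mandocdb; the record-number key and the actual database calls are deliberately not modelled. Since every field is a plain byte string, the value itself has no byte-order dependence; this says nothing about how the recno keys are stored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory view of one mandoc.index record value. */
struct idx_rec {
	const char	*type;	/* "mdoc", "man" or "cat" */
	const char	*file;
	const char	*sec;
	const char	*title;
	const char	*arch;
	const char	*desc;
};

#define IDX_NFIELDS	6

/*
 * Serialise rec into a newly allocated buffer holding
 * type \0 file \0 sec \0 title \0 arch \0 desc \0 in that order.
 * The total length, including every NUL, is stored in *lenp.
 * Returns NULL if allocation fails.
 */
static char *
idx_encode(const struct idx_rec *rec, size_t *lenp)
{
	const char	*f[IDX_NFIELDS];
	size_t		 len, sz;
	char		*buf, *p;
	int		 i;

	assert(rec != NULL && lenp != NULL);
	f[0] = rec->type;
	f[1] = rec->file;
	f[2] = rec->sec;
	f[3] = rec->title;
	f[4] = rec->arch;
	f[5] = rec->desc;

	len = 0;
	for (i = 0; i < IDX_NFIELDS; i++)
		len += strlen(f[i]) + 1;
	if ((buf = malloc(len)) == NULL)
		return NULL;
	p = buf;
	for (i = 0; i < IDX_NFIELDS; i++) {
		sz = strlen(f[i]) + 1;	/* copy the terminating NUL too */
		memcpy(p, f[i], sz);
		p += sz;
	}
	*lenp = len;
	return buf;
}

/*
 * Split a record value of len bytes back into its fields.  The
 * resulting pointers point into buf, which must stay alive.
 * Returns 0 on success, or -1 if buf does not consist of exactly
 * IDX_NFIELDS NUL-terminated fields or the type is unknown.
 */
static int
idx_decode(const char *buf, size_t len, struct idx_rec *rec)
{
	const char	*f[IDX_NFIELDS];
	const char	*p, *end, *nul;
	int		 i;

	p = buf;
	end = buf + len;
	for (i = 0; i < IDX_NFIELDS; i++) {
		if (p >= end ||
		    (nul = memchr(p, '\0', (size_t)(end - p))) == NULL)
			return -1;	/* field missing or unterminated */
		f[i] = p;
		p = nul + 1;
	}
	if (p != end)			/* trailing bytes after last field */
		return -1;
	if (strcmp(f[0], "mdoc") != 0 && strcmp(f[0], "man") != 0 &&
	    strcmp(f[0], "cat") != 0)
		return -1;
	rec->type = f[0];
	rec->file = f[1];
	rec->sec = f[2];
	rec->title = f[3];
	rec->arch = f[4];
	rec->desc = f[5];
	return 0;
}
```

With such a layout, a program reading the index could, for example, treat every record whose type compares equal to "cat" as not having been parsed by mandoc, which is the "safe" distinction raised earlier in the thread. Storing the type as a short string rather than a single byte costs at most a few bytes per record, in line with the argument above.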