Date: Sat, 8 Oct 2011 23:01:37 +0200
From: Ingo Schwarze
To: tech@mdocml.bsd.lv
Cc: jmc@usta.de
Subject: Re: apropos "types" (WAS apropos(1) option naming)
Message-ID: <20111008210136.GA8119@iris.usta.de>
In-Reply-To: <4E90689C.6000206@bsd.lv>
References: <20111008142925.GB28339@iris.usta.de> <4E90689C.6000206@bsd.lv>

Hi Kristaps,

Kristaps Dzonsons wrote on Sat, Oct 08, 2011 at 05:13:32PM +0200:

> I'm tagging out a release right now; in the coming one, we can focus
> much more on getting the options Just Right.

Sure, and i'll merge the release soon.

>> Except that maybe, i still hope for something like
>>
>>   apropos -s 3 -Q Xr open and Er ENOENT
>>   apropos -s 4 -Q An Gray or An Reyk

> I'd like to spend some time on this.  I'll speak aloud because I'm
> still undecided.

So am i, so let's throw ideas at each other.

> Let's consider just the matchings in your
> statements above (e.g., An Gray) and assume logical operators exist.
> (The rest is for a different thread.)
>
> The current state of option matching is by symbolic type:

Yes, and i do want matching by symbolic type.

> "func foo" to query all functions named "foo".

However, i don't want to invent new names for symbolic types,
but instead use the existing ones, those defined by mdoc(7)
macro names.

> Functions, in this case, are
> defined by `Fo' and `Fn'.  These definitions are encoded when the
> database is created; the source macro type is lost.

Sure, i see no need to save the exact source macro type.

> I could change the database to instead encode only the mdoc macro
> name, as in your example.  Then "Fo foo AND Fn foo" would match the
> above.  The problem with this so far is that it's not user friendly
> at all.  It assumes users know about mdoc, and in general they
> don't.

Well, they don't know about "func" either, and learning "func"
is rather useless; learning "Fo" is more useful and hardly more
difficult.

By the way, i would map .Fn this way:

 - first argument, all but the last word -> .Ft
 - first argument, last word -> .Fo
 - remaining arguments -> .Fa

So in the database, you don't see which macro was actually used,
but you get maximal semantic information.  In the user interface
for searching, .Fn will be an alias for .Ft | .Fo | .Fa.

If you do insist on "func", we can provide that as an alias
for .Fn, but it looks a bit like bloat of limited usefulness.

> Furthermore, it doesn't work for -man, because now we need
> to do things like `SH foo AND Sh foo' for sections.  This gets ugly.

Well, let's just map .SH to .Sh; done.
We don't want to make anybody learn man(7) macros.
It's a legacy language that only a handful of specialists
need to understand nowadays.

> And then what happens for the -man description, or name?  It has no
> macro at all.

That's a delicate task for later.
The only way i see is heuristic guessing.
When the code clearly follows usual conventions and the engine
is confident what's going on, it will map, even without macros.
When the code looks strange and the guessing engine is unsure,
don't map at all - bogus db entries are very annoying, so be
conservative.

> Making people search for `Nd text' and having it also
> search -man, which has no `Nd', is confusing because sometimes
> there'd be a macro, sometimes not.

No, i don't think it's confusing.
Users should not worry what the actual code of a specific page is.
They don't search for pages containing the .Nd macro.
But they will learn that the Name section Description (if the parser
can find it) is .Nd, and they will search for the Name section
Description, not worrying whether the actual source code is mdoc(7)
or man(7).

> But that's ok, actually.  Because we could let apropos have some
> symbolic types, like "function", that would magically expand into
> "Fo foo AND Fn foo", hiding the types from the users.  Something
> like "section" would expand into "Sh foo AND SH foo".  And "desc"
> into `Nd' for -mdoc and the free-form description for -man.

No, my goal is not to have the user interface require knowledge
of such technicalities.  I do want symbolic types.  I just want
their names derived from mdoc(7) - because that actually makes
the interface *simpler*.  People who know what they are doing
don't need to learn anything new at all.

> But then... for something like the description, we would have a
> symbolic name but not a macro name.  This is confusing.
>
> Overall I'm still on the fence as to the best approach.  On the one
> hand we have lots of flexibility, but significant complexity.  On
> the other hand, we have a tighter database, but our choices for
> types may appear arbitrary.
>
> I slightly prefer, however, the best approach of biting the pillow
> and trying to determine the best symbolic types, which will be
> encoded directly in the database as they are now.
> If we do a good
> job, we can probably match the flexibility of `Xr open' without the
> complexity (not even to mention that many macros aren't semantically
> interesting).  But I'm open to suggestions, so please chime in!

I think i wouldn't put .Em into the database at all, because
who is going to search for underlined text?  That's of very
limited usefulness at best.

Let's look at the mdoc(7) macro overview.
I think the following are useful as search keys:

 - Document preamble and NAME section macros
   (perhaps excluding Os, at least at first)
 - Sections and cross references
   (excluding .Sx, .Pp, and .Lp)
 - .Bd, .D1, .Dl are special cases.
   Maybe they warrant *one* common search key, or none,
   or one for -literal, one for -filled.  Not sure yet.
 - .Bl is a special case.
   .Bl -tag .It is probably interesting - or maybe not.
   .Bl -bullet is probably pointless and will just be skipped.
   Not sure yet.
 - .%* is very interesting.
 - Semantic markup for command line utilities
 - Semantic markup for function libraries
 - Various semantic markup
 - Text production (maybe)

The following are irrelevant:

 - Spacing control
 - Physical markup
 - Physical enclosures (or maybe?  Brq, Aq?  Not sure yet.)

Oh, by the way.  Maybe we don't need an option at all:

  apropos An=Gray An=Reyk
  apropos Xr=open \& Er=ENOENT
  apropos 'Xr=open & Er=ENOENT'

Traditionally, multiple arguments mean "or", so the authors are or'ed.
So we need a good syntax for "and".  The Linux syntax "-a" is not
powerful enough (it just switches the whole command line to "and" logic)
and clashes badly with man -a, whatis -a, whereis -a.

The above is not yet very nice.  Well, the "or" case is, but the
"and" case - hmm...

Anyway, my point is that almost no useful .Nm/.Nd contains
the character '=', so we can just prefix the search type to
the search key, right?

Yours,
  Ingo
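
For illustration only, the .Fn mapping proposed above (first argument
split into .Ft and .Fo, remaining arguments stored as .Fa) could be
sketched roughly as follows.  The function name split_fn and the
(macro, value) pair representation are assumptions made for this sketch;
nothing here is taken from the actual database code.

```python
def split_fn(args):
    """Split the arguments of an .Fn macro into (macro, value) pairs.

    Hypothetical helper following the mapping proposed in this mail:
      first argument, all but the last word -> Ft
      first argument, last word             -> Fo
      remaining arguments                   -> Fa
    """
    if not args:
        return []
    entries = []
    words = args[0].split()
    if len(words) > 1:
        # Everything before the last word is taken as the return type.
        entries.append(("Ft", " ".join(words[:-1])))
    if words:
        # The last word of the first argument is the function name.
        entries.append(("Fo", words[-1]))
    # All further arguments are function arguments.
    entries.extend(("Fa", arg) for arg in args[1:])
    return entries


if __name__ == "__main__":
    for macro, value in split_fn(["int open", "const char *path", "int flags"]):
        print(macro, value)
```

The sketch takes "last word" literally, so a pointer star attached to
the function name (as in "char *strdup") would end up in the .Fo value;
a real implementation would probably want to handle that.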
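
As a rough sketch of the type=key idea above, the following parses
command line arguments into groups of terms: separate arguments are
or'ed, and terms joined by '&' (quoted in one argument or passed as its
own argument) are and'ed.  The name parse_query, the list-of-lists
result, and the use of None for a key without a type prefix are all
assumptions of this sketch, not a settled interface.

```python
import re


def parse_query(argv):
    """Parse apropos-style arguments into or'ed groups of and'ed terms.

    Hypothetical sketch: returns a list of groups; a page would match
    if all (type, key) terms of at least one group match.
    """
    tokens = []
    for arg in argv:
        # Split out '&' whether it is its own argument or embedded
        # in a quoted one such as 'Xr=open & Er=ENOENT'.
        for piece in re.split(r"(&)", arg):
            piece = piece.strip()
            if piece:
                tokens.append(piece)

    groups = []
    join_next = False
    for tok in tokens:
        if tok == "&":
            join_next = True
            continue
        # The search type is whatever precedes the first '='.
        macro, sep, key = tok.partition("=")
        if not sep:
            # Assumption: no '=' means no explicit search type.
            macro, key = None, tok
        if join_next and groups:
            groups[-1].append((macro, key))
        else:
            groups.append([(macro, key)])
        join_next = False
    return groups


if __name__ == "__main__":
    print(parse_query(["An=Gray", "An=Reyk"]))
    print(parse_query(["Xr=open & Er=ENOENT"]))
```

An untyped key that itself contains '=' would be misread as a typed
one here, which is exactly the case assumed above to be rare.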