From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp-2.sys.kth.se (smtp-2.sys.kth.se [130.237.32.160])
        by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id p9CL8kNq014485
        for ; Wed, 12 Oct 2011 17:08:46 -0400 (EDT)
Received: from mailscan-1.sys.kth.se (mailscan-1.sys.kth.se [130.237.32.91])
        by smtp-2.sys.kth.se (Postfix) with ESMTP id 4F34514DC48;
        Wed, 12 Oct 2011 23:08:40 +0200 (CEST)
X-Virus-Scanned: by amavisd-new at kth.se
Received: from smtp-2.sys.kth.se ([130.237.32.160])
        by mailscan-1.sys.kth.se (mailscan-1.sys.kth.se [130.237.32.91])
        (amavisd-new, port 10024) with LMTP id hyIgeJ3wp1rb;
        Wed, 12 Oct 2011 23:08:39 +0200 (CEST)
X-KTH-Auth: kristaps [83.250.3.9]
X-KTH-mail-from: kristaps@bsd.lv
Received: from macky.local (c83-250-3-9.bredband.comhem.se [83.250.3.9])
        by smtp-2.sys.kth.se (Postfix) with ESMTP id 9755014EA1F;
        Wed, 12 Oct 2011 23:08:37 +0200 (CEST)
Message-ID: <4E9601D5.2080501@bsd.lv>
Date: Wed, 12 Oct 2011 23:08:37 +0200
From: Kristaps Dzonsons
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.23) Gecko/20110920 Thunderbird/3.1.15
X-Mailinglist: mdocml-tech
Reply-To: tech@mdocml.bsd.lv
MIME-Version: 1.0
To: tech@mdocml.bsd.lv
CC: Ingo Schwarze, Jason McIntyre
Subject: Re: apropos "types" (WAS apropos(1) option naming)
References: <20111008142925.GB28339@iris.usta.de> <4E90689C.6000206@bsd.lv> <20111008210136.GA8119@iris.usta.de>
In-Reply-To: <20111008210136.GA8119@iris.usta.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi Ingo,

I thought I'd sent this mail. Well, here it goes again...

On 08/10/2011 23:01, Ingo Schwarze wrote:
> Hi Kristaps,
>
> Kristaps Dzonsons wrote on Sat, Oct 08, 2011 at 05:13:32PM +0200:
>
>> I'm tagging out a release right now; in the coming one, we can focus
>> much more on getting the options Just Right.
>
> Sure, and i'll merge the release soon.
>
>>> Except that maybe, i still hope for something like
>>>
>>>   apropos -s 3 -Q Xr open and Er ENOENT
>>>   apropos -s 4 -Q An Gray or An Reyk
>
>> I'd like to spend some time on this. I'll speak aloud because I'm
>> still undecided.
>
> So am i, so let's throw ideas at each other.
>
>> Let's consider just the matchings in your
>> statements above (e.g., An Gray) and assume logical operators exist.
>> (The rest is for a different thread.)
>>
>> The current state of option matching is by symbolic type:
>
> Yes, and i do want matching by symbolic type.
>
>> "func foo" to query all functions named "foo".
>
> However, i don't want to invent new names for symbolic types, but instead
> use the existing ones, those defined by mdoc(7) macro names.
>
>> Functions, in this case, are
>> defined by `Fo' and `Fn'. These definitions are encoded when the
>> database is created; the source macro type is lost.
>
> Sure, i see no need to save the exact source macro type.
>
>> I could change the database to instead encode only the mdoc macro
>> name, as in your example. Then "Fo foo AND Fn foo" would match the
>> above. The problem with this so far is that it's not user friendly
>> at all. It assumes users know about mdoc, and in general they
>> don't.
>
> Well, they don't know about "func" either, and learning "func" is
> rather useless, learning "Fo" is more useful and hardly more difficult.

Yes! This is exactly the bit of logic I needed to push me into being
comfortable with this idea. The way I see it now is that, well, the
user will have to consult the manpage to learn the various types anyway.
Whether it's "func" or "Fo" is the same to them. This is enough for me!

Then all your arguments below follow... `Fo' notation is more economical
and bolsters knowledge of mdoc(7), etc. I'm sold.

> By the way, i would map .Fn this way:
>  - first argument, all but the last word -> .Ft
>  - first argument, last word -> .Fo
>  - remaining arguments -> .Fa
>
> So in the database, you don't see which macro was actually used,
> but you get maximal semantic information.
>
> In the user interface for searching, .Fn will be an alias
> for .Ft | .Fo | .Fa.
>
> If you do insist on "func", we can provide that as an alias
> for .Fn, but it looks a bit like bloat of limited usefulness.
>
>> Furthermore, it doesn't work for -man, because now we need
>> to do things like `SH foo AND Sh foo' for sections. This gets ugly.
>
> Well, let's just map .SH to .Sh; done.
> We don't want to make anybody learn man(7) macros.
> It's a legacy language that only a handful of specialists need to
> understand nowadays.
>
>> And then what happens for the -man description, or name? It has no
>> macro at all.
>
> That's a delicate task for later.
> The only way i see is heuristic guessing.
> When the code clearly follows usual conventions and the engine
> is confident what's going on, it will map, even without macros.
> When the code looks strange and the guessing engine is unsure, don't
> map at all - bogus db entries are very annoying, so be conservative.

Yes, and you know I already match for most -man name/description cases,
although I haven't empirically tested the extent of "most", here.

>> Making people search for `Nd text' and having it also
>> search -man, which has no `Nd', is confusing because sometimes
>> there'd be a macro, sometimes not.
>
> No, i don't think it's confusing. Users should not worry what
> the actual code of a specific page is. They don't search for
> pages containing the .Nd macro. But they will learn that the
> Name section Description (if the parser can find it) is .Nd,
> and they will search for the Name section Description, not
> worrying whether the actual source code is mdoc(7) or man(7).
>
>> But that's ok, actually.
>> Because we could let apropos have some
>> symbolic types, like "function", that would magically expand into
>> "Fo foo AND Fn foo", hiding the types from the users. Something
>> like "section" would expand into "Sh foo AND SH foo". And "desc"
>> into `Nd' for -mdoc and the free-form description for -man.
>
> No, my goal is not to have the user interface require knowledge
> of such technicalities. I do want symbolic types. I just want
> their names derived from mdoc(7) - because that actually makes
> the interface *simpler*. People who know what they are doing
> don't need to learn anything new at all.
>
>> But then... for something like the description, we would have a
>> symbolic name but not a macro name. This is confusing.
>>
>> Overall I'm still on the fence as to the best approach. On the one
>> hand we have lots of flexibility, but significant complexity. On
>> the other hand, we have a tighter database, but our choices for
>> types may appear arbitrary.
>>
>> I slightly prefer, however, the best approach of biting the pillow
>> and trying to determine the best symbolic types, which will be
>> encoded directly in the database as they are now. If we do a good
>> job, we can probably match the flexibility of `Xr open' without the
>> complexity (not even to mention that many macros aren't semantically
>> interesting). But I'm open to suggestions, so please chime in!
>
> I think i wouldn't put .Em into the database at all, because
> who is going to search for underlined text? That's of very
> limited usefulness at best.
>
> Let's look at the mdoc(7) macro overview.
> I think the following are useful as search keys:
>
>  - Document preamble and NAME section macros
>    (perhaps excluding Os, at least at first)
>  - Sections and cross references
>    (excluding .Sx, .Pp, and .Lp)
>  - .Bd, .D1, .Dl are special cases.
>    Maybe they warrant *one* common search key, or none,
>    or one for -literal, one for -filled.
>    Not sure yet.
>  - .Bl is a special case.
>    .Bl -tag .It is probably interesting - or maybe not.
>    .Bl -bullet is probably pointless and will just be skipped.
>    Not sure yet.
>  - .%* is very interesting.
>  - Semantic markup for command line utilities
>  - Semantic markup for function libraries
>  - Various semantic markup
>  - Text production (maybe)

Ok, I'm going to put together a potential list of types for review. I
anticipate the biggest "problem", user-interfacedly, will be that of
discerning "Fo made in the SYNOPSIS" and "Fo referenced in the body".

> The following are irrelevant:
>  - Spacing control
>  - Physical markup
>  - Physical enclosures (or maybe? Brq, Aq? Not sure yet.)
>
> Oh, by the way.
> Maybe we don't need an option at all:
>
>   apropos An=Gray An=Reyk
>   apropos Xr=open \& Er=ENOENT
>   apropos 'Xr=open& Er=ENOENT'
>
> Traditionally, multiple arguments mean "or", so the authors
> are or'ed. So we need a good syntax for "and". The Linux
> syntax "-a" is not powerful enough (it just switches the whole
> command line to "and" logic) and clashes badly with man -a,
> whatis -a, whereis -a. The above is not yet very nice.
> Well, the "or" case is, but the "and" case - hmm...

Why don't we just be a little more awesome and allow

  apropos '( An Gray && Xr open ) || ...'

Meaning, the AND/OR/NOT logical operators and a notion of grouping? It
doesn't seem too hard: matching would still be pretty straightforward.
In fact, once we start to do any Boolean logic, the distance between an
arbitrary expression (such as the above) and something like Linux's dumb
"-a" is quite small. Might as well go the distance.

Thoughts?

Kristaps
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv
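
To make the grouped-expression proposal concrete, here is a minimal sketch of how a query such as '( An Gray && Xr open ) || Er ENOENT' could be parsed and matched. It is written in Python for brevity and is not the apropos implementation. Everything in it is an assumption made for illustration: the "TYPE VALUE" term form, the precedence of && over ||, case-insensitive substring matching, the alias table expanding Fn to Ft, Fo and Fa (following the mapping suggested in the thread), and the made-up page names and keys in PAGES.

```python
import re

# Tokens: parentheses, &&, ||, !, or a run of non-space, non-paren characters.
TOKEN = re.compile(r"\(|\)|&&|\|\||!|[^\s()]+")
OPERATORS = {"(", ")", "&&", "||", "!"}

# Hypothetical alias table: a search for Fn covers Ft, Fo and Fa.
ALIASES = {"Fn": ("Ft", "Fo", "Fa")}

# Made-up database: page name -> list of (macro, text) keys.
PAGES = {
    "alpha(3)": [("Fo", "alpha_open"), ("Xr", "open"), ("Er", "ENOENT")],
    "beta(3)": [("Fo", "beta_init"), ("Xr", "open")],
    "gamma(4)": [("An", "Gray"), ("Xr", "ifconfig")],
}


def parse(query):
    """Parse a query into a tree of nested tuples (recursive descent)."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        # Lowest precedence: a || b || c
        node = parse_and()
        while peek() == "||":
            take()
            node = ("or", node, parse_and())
        return node

    def parse_and():
        # Binds tighter than ||: a && b && c
        node = parse_unary()
        while peek() == "&&":
            take()
            node = ("and", node, parse_unary())
        return node

    def parse_unary():
        tok = take()
        if tok == "!":
            return ("not", parse_unary())
        if tok == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in OPERATORS:
            raise ValueError("unexpected %r" % tok)
        value = take()  # a term is two words: TYPE VALUE
        if value in OPERATORS:
            raise ValueError("missing value after %r" % tok)
        return ("term", tok, value)

    tree = parse_or()
    if peek() is not None:
        raise ValueError("trailing input: %r" % peek())
    return tree


def matches(tree, keys):
    """Evaluate a parsed tree against one page's (macro, text) keys."""
    op = tree[0]
    if op == "term":
        _, macro, value = tree
        macros = ALIASES.get(macro, (macro,))
        return any(m in macros and value.lower() in text.lower()
                   for m, text in keys)
    if op == "not":
        return not matches(tree[1], keys)
    if op == "and":
        return matches(tree[1], keys) and matches(tree[2], keys)
    return matches(tree[1], keys) or matches(tree[2], keys)


def search(query):
    """Return the sorted names of all pages matching the query."""
    tree = parse(query)
    return sorted(name for name, keys in PAGES.items() if matches(tree, keys))
```

With this layout, the Linux-style "-a" behaviour would just be the special case where every term is joined by &&, which is one way to read the point that the distance between the two is small.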