tech@mandoc.bsd.lv
* overhaul apropos(1) interface
@ 2011-11-09  1:30 Ingo Schwarze
  2011-11-09 10:19 ` Kristaps Dzonsons
  2011-11-12 23:54 ` Ingo Schwarze
  0 siblings, 2 replies; 6+ messages in thread
From: Ingo Schwarze @ 2011-11-09  1:30 UTC (permalink / raw)
  To: tech; +Cc: jmc

Hi,

the interface of the apropos utility in the mandoc package
is very different from traditional apropos, and the -t option
hinders logical extensions, as discussed previously.
So, here is a first step to fix this, doing various things:

INTERFACE CHANGES:
 * drop the -s (sort) option, it clashes with -s (section)
 * introduce -N (numerical sort) instead
 * drop the -c (cat) option, it clashes with -c (copy to stdout)
 * rename it to the usual -s (section)
 * drop the -a (arch) option, it clashes with -a (all)
 * rename it to the usual -S (subsection)
 * drop the -t (search type), it clashes with -t (use troff)
 * use a macro= syntax instead, as discussed
 * drop -e (exact) and -r (regex), the match type belongs to each query phrase
 * use a macro== and macro=~ syntax instead

The new syntax is:

  apropos [-IN] [-s section] [-S arch] query_phrase [...]

  query_phrase ::= [[macro[,...]](=|==|=~)]query_value

Multiple query phrases are not yet implemented, but they are to
be or'ed in the future.

The operators are:

 =  substring match
 == exact match
 =~ regex match

If no operator is given, = is assumed.
Multiple macros can be given, joined with commas, they are or'ed.
If no macro is given, "Nm,Nd" is assumed.
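
To illustrate the syntax above, here is a minimal sketch of how one
query phrase could be split into macro list, operator and value.  It
is not the code from the patch below; the names qphrase, qop and
qphrase_parse are made up for this example, and only the operator
semantics are taken from the description above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names; only the three operators come from the proposal. */
enum qop { QOP_SUBSTR, QOP_EXACT, QOP_REGEX };

struct qphrase {
	char		*macros;	/* comma-separated list, NULL if none given */
	char		*value;		/* the query value */
	enum qop	 op;		/* =, == or =~ */
};

/*
 * Split "[[macro[,...]](=|==|=~)]query_value" in place.
 * The first '=' in the phrase is taken as the start of the operator.
 */
static void
qphrase_parse(char *phrase, struct qphrase *qp)
{
	char	*eq;

	assert(phrase != NULL && qp != NULL);

	qp->macros = NULL;
	qp->value = phrase;
	qp->op = QOP_SUBSTR;

	if ((eq = strchr(phrase, '=')) == NULL)
		return;		/* no operator: substring match, default macros */

	*eq++ = '\0';
	if (*phrase != '\0')
		qp->macros = phrase;

	if (*eq == '=') {
		qp->op = QOP_EXACT;
		eq++;
	} else if (*eq == '~') {
		qp->op = QOP_REGEX;
		eq++;
	}
	qp->value = eq;
}
```

With this sketch, "Xr==et.4" yields the macro list "Xr", the exact
operator and the value "et.4", while a bare "mandoc" yields no macro
list, so the caller would fall back to Nm,Nd.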

STRUCTURAL CLEANUP:
 * collect the common defines for mandocdb and apropos in mandocdb.h
 * name the TYPE_ bitfield constants according to the new interface
 * drop types and match from the global struct opts
   because these are local to each query phrase
 * drop the enum sort; for an alternative, a bool is sufficient
 * drop the local dbf and idxf strings, use those in the global struct
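
As a rough sketch of how the renamed TYPE_ constants relate to the
macro= syntax: a comma-separated macro list is or'ed into one bitmask.
The helper name macros_to_mask is hypothetical, the table is only a
subset, and the values are copied from the mandocdb.h hunk below; the
actual lookup in the patch uses strsep(3) and differs in detail.

```c
#include <assert.h>
#include <string.h>

/* Subset of the constants from the mandocdb.h hunk in this patch. */
#define	TYPE_An		0x01
#define	TYPE_Nd		0x40
#define	TYPE_Nm		0x100
#define	TYPE_Xr		0x2000

struct type {
	int		 mask;
	const char	*name;
};

static const struct type typemap[] = {
	{ TYPE_An, "An" },
	{ TYPE_Nd, "Nd" },
	{ TYPE_Nm, "Nm" },
	{ TYPE_Xr, "Xr" },
	{ 0, NULL }
};

/*
 * Hypothetical helper: OR together the masks of all macros in a
 * comma-separated list.  Unknown names contribute nothing; an empty
 * result falls back to the documented default, Nm,Nd.
 */
static int
macros_to_mask(const char *list)
{
	size_t	 len;
	int	 i, mask;

	assert(list != NULL);

	mask = 0;
	while (*list != '\0') {
		len = strcspn(list, ",");
		for (i = 0; typemap[i].name != NULL; i++)
			if (strlen(typemap[i].name) == len &&
			    strncmp(typemap[i].name, list, len) == 0)
				mask |= typemap[i].mask;
		list += len;
		if (*list == ',')
			list++;
	}
	return mask ? mask : (TYPE_Nm | TYPE_Nd);
}
```

A keyword record then matches when its stored type bits and this mask
have at least one bit in common, which is the test the search loop in
the patch performs with (fl & types).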

This is lightly tested, but given that we are not yet in production
and lots of heavy changes are needed after this, i consider
light testing sufficient.

OK?

Of course, when committing, i'm going to change the manual, too.

And by the way, this patch is +144 -175 :-).

Yours,
  Ingo


ischwarze@isnote $ apropos An=Gray    
AN(4) - Aironet Communications 4500/4800 IEEE 802.11FH/b wireless network device
APS(4) - ThinkPad Active Protection System accelerometer
BWI(4) - Broadcom AirForce IEEE 802.11b/g wireless network device
ET(4) - Agere/LSI ET1310 10/100/Gigabit Ethernet device
ETPHY(4) - Agere/LSI ET1011 TruePHY Gigabit Ethernet PHY
JME(4) - JMicron JMC250/JMC260 10/100/Gigabit Ethernet device
JMPHY(4) - JMicron JMP202/JMP211 10/100/Gigabit Ethernet PHY
MOSCOM(4) - MosChip Semiconductor MCS7703 based USB serial adapter
NFE(4) - NVIDIA nForce MCP 10/100/Gigabit Ethernet device
RTW(4) - Realtek RTL8180L IEEE 802.11b wireless network device
SPDMEM(4) - Serial Presence Detect memory
UARK(4) - Arkmicro Technologies ARK3116 based USB serial adapter
UCHCOM(4) - WinChipHead CH341/340 based USB serial adapter
UDAV(4) - Davicom DM9601 10/100 USB Ethernet device
UMSM(4) - Qualcomm MSM modem device
USLCOM(4) - Silicon Laboratories CP2101/CP2102 based USB serial adapter
ZYD(4) - ZyDAS ZD1211/ZD1211B USB IEEE 802.11b/g wireless network device
ischwarze@isnote $ apropos Xr==et.4 
PCI(4) - introduction to PCI bus support
ischwarze@isnote $ apropos Nm=~^b.e$
BCE(4) - Broadcom BCM4401 10/100 Ethernet device
BGE(4) - Broadcom BCM57xx/BCM590x 10/100/Gigabit Ethernet device


--- apropos.c.orig
+++ apropos.c
@@ -1,6 +1,7 @@
 /*	$Id: apropos.c,v 1.2 2011/10/09 17:59:56 schwarze Exp $ */
 /*
-* Copyright (c) 2011 Kristaps Dzonsons <kristaps@bsd.lv>
+ * Copyright (c) 2011 Kristaps Dzonsons <kristaps@bsd.lv>
+ * Copyright (c) 2011 Ingo Schwarze <schwarze@openbsd.org>
  *
  * Permission to use, copy, modify, and distribute this software for any
  * purpose with or without fee is hereby granted, provided that the above
@@ -31,44 +32,21 @@
 #include <db.h>
 
 #include "mandoc.h"
+#include "mandocdb.h"
 
 #define	MAXRESULTS	 256
 
-/* Bit-fields.  See mandocdb.8. */
-
-#define TYPE_NAME	  0x01
-#define TYPE_FUNCTION	  0x02
-#define TYPE_UTILITY	  0x04
-#define TYPE_INCLUDES	  0x08
-#define TYPE_VARIABLE	  0x10
-#define TYPE_STANDARD	  0x20
-#define TYPE_AUTHOR	  0x40
-#define TYPE_CONFIG	  0x80
-#define TYPE_DESC	  0x100
-#define TYPE_XREF	  0x200
-#define TYPE_PATH	  0x400
-#define TYPE_ENV	  0x800
-#define TYPE_ERR	  0x1000
-
 enum	match {
 	MATCH_SUBSTR = 0,
 	MATCH_REGEX,
 	MATCH_EXACT
 };
 
-enum	sort {
-	SORT_TITLE = 0,
-	SORT_CAT,
-	SORT__MAX
-};
-
 struct	opts {
-	enum sort	 sort; /* output sorting */
+	const char	*sec; /* restrict to manual section */
 	const char	*arch; /* restrict to architecture */
-	const char	*cat; /* restrict to category */
-	int		 types; /* only types in bitmask */
 	int		 insens; /* case-insensitive match */
-	enum match	 match; /* match type */
+	int		 numerical; /* sort output by section */
 };
 
 struct	type {
@@ -78,7 +56,7 @@ struct	type {
 
 struct	rec {
 	char		*file; /* file in file-system */
-	char		*cat; /* category (3p, 3, etc.) */
+	char		*sec; /* section (3p, 3, etc.) */
 	char		*title; /* title (FOO, etc.) */
 	char		*arch; /* arch (or empty string) */
 	char		*desc; /* description (from Nd) */
@@ -86,12 +64,12 @@ struct	rec {
 };
 
 struct	res {
+	char		*sec; /* manual section */
+	char		*title; /* manual title */
 	char		*arch; /* architecture */
 	char		*desc; /* free-form description */
 	char		*keyword; /* matched keyword */
 	int	 	 types; /* bitmask of field selectors */
-	char		*cat; /* manual section */
-	char		*title; /* manual section */
 	char		*uri; /* formatted uri of file */
 	recno_t		 rec; /* unique id of underlying manual */
 	/*
@@ -111,26 +89,22 @@ struct	state {
 	const char	 *idxf; /* index name */
 };
 
-static	const char * const sorts[SORT__MAX] = {
-	"cat", /* SORT_CAT */
-	"title", /* SORT_TITLE */
-};
-
-static	const struct type types[] = {
-	{ TYPE_NAME, "name" },
-	{ TYPE_FUNCTION, "func" },
-	{ TYPE_UTILITY, "utility" },
-	{ TYPE_INCLUDES, "incl" },
-	{ TYPE_VARIABLE, "var" },
-	{ TYPE_STANDARD, "stand" },
-	{ TYPE_AUTHOR, "auth" },
-	{ TYPE_CONFIG, "conf" },
-	{ TYPE_DESC, "desc" },
-	{ TYPE_XREF, "xref" },
-	{ TYPE_PATH, "path" },
-	{ TYPE_ENV, "env" },
-	{ TYPE_ERR, "err" },
-	{ INT_MAX, "all" },
+static	const struct type typemap[] = {
+	{ TYPE_An, "An" },
+	{ TYPE_Cd, "Cd" },
+	{ TYPE_Er, "Er" },
+	{ TYPE_Ev, "Ev" },
+	{ TYPE_Fn, "Fn" },
+	{ TYPE_Fn, "Fo" },
+	{ TYPE_In, "In" },
+	{ TYPE_Nd, "Nd" },
+	{ TYPE_Nm, "Nm" },
+	{ TYPE_Pa, "Pa" },
+	{ TYPE_St, "St" },
+	{ TYPE_Va, "Va" },
+	{ TYPE_Va, "Vt" },
+	{ TYPE_Xr, "Xr" },
+	{ INT_MAX, "any" },
 	{ 0, NULL }
 };
 
@@ -138,7 +112,7 @@ static	void	 buf_alloc(char **, size_t *, size_t);
 static	void	 buf_dup(struct mchars *, char **, const char *);
 static	void	 buf_redup(struct mchars *, char **, 
 			size_t *, const char *);
-static	int	 sort_cat(const void *, const void *);
+static	int	 sort_sec(const void *, const void *);
 static	int	 sort_title(const void *, const void *);
 static	int	 state_getrecord(struct state *, 
 			recno_t, struct rec *);
@@ -153,10 +127,9 @@ int
 apropos(int argc, char *argv[])
 {
 	BTREEINFO	 info;
-	int		 ch, i, rc;
-	const char	*dbf, *idxf;
+	int		 ch, rc;
 	struct state	 state;
-	char		*q, *v;
+	char		*q;
 	struct opts	 opts;
 	extern int	 optind;
 	extern char	*optarg;
@@ -164,8 +137,8 @@ apropos(int argc, char *argv[])
 	memset(&opts, 0, sizeof(struct opts));
 	memset(&state, 0, sizeof(struct state));
 
-	dbf = "mandoc.db";
-	idxf = "mandoc.index";
+	state.dbf = MANDOC_DB;
+	state.idxf = MANDOC_IDX;
 	q = NULL;
 	rc = EXIT_FAILURE;
 
@@ -175,56 +148,20 @@ apropos(int argc, char *argv[])
 	else
 		++progname;
 
-	opts.match = MATCH_SUBSTR;
-
-	while (-1 != (ch = getopt(argc, argv, "a:c:eIrs:t:"))) 
+	while (-1 != (ch = getopt(argc, argv, "INS:s:"))) 
 		switch (ch) {
-		case ('a'):
-			opts.arch = optarg;
-			break;
-		case ('c'):
-			opts.cat = optarg;
-			break;
-		case ('e'):
-			opts.match = MATCH_EXACT;
-			break;
 		case ('I'):
 			opts.insens = 1;
 			break;
-		case ('r'):
-			opts.match = MATCH_REGEX;
+		case ('N'):
+			opts.numerical = 1;
+			break;
+		case ('S'):
+			opts.arch = optarg;
 			break;
 		case ('s'):
-			for (i = 0; i < SORT__MAX; i++) {
-				if (strcmp(optarg, sorts[i])) 
-					continue;
-				opts.sort = (enum sort)i;
-				break;
-			}
-
-			if (i < SORT__MAX)
-				break;
-
-			fprintf(stderr, "%s: Bad sort\n", optarg);
-			return(EXIT_FAILURE);
-		case ('t'):
-			while (NULL != (v = strsep(&optarg, ","))) {
-				if ('\0' == *v)
-					continue;
-				for (i = 0; types[i].mask; i++) {
-					if (strcmp(types[i].name, v))
-						continue;
-					break;
-				}
-				if (0 == types[i].mask)
-					break;
-				opts.types |= types[i].mask;
-			}
-			if (NULL == v)
-				break;
-			
-			fprintf(stderr, "%s: Bad type\n", v);
-			return(EXIT_FAILURE);
+			opts.sec = optarg;
+			break;
 		default:
 			usage();
 			return(EXIT_FAILURE);
@@ -239,9 +176,6 @@ apropos(int argc, char *argv[])
 	} else
 		q = *argv;
 
-	if (0 == opts.types)
-		opts.types = TYPE_NAME | TYPE_DESC;
-
 	/*
 	 * Configure databases.
 	 * The keyword database is a btree that allows for duplicate
@@ -252,15 +186,15 @@ apropos(int argc, char *argv[])
 	memset(&info, 0, sizeof(BTREEINFO));
 	info.flags = R_DUP;
 
-	state.db = dbopen(dbf, O_RDONLY, 0, DB_BTREE, &info);
+	state.db = dbopen(state.dbf, O_RDONLY, 0, DB_BTREE, &info);
 	if (NULL == state.db) {
-		perror(dbf);
+		perror(state.dbf);
 		goto out;
 	}
 
-	state.idx = dbopen(idxf, O_RDONLY, 0, DB_RECNO, NULL);
+	state.idx = dbopen(state.idxf, O_RDONLY, 0, DB_RECNO, NULL);
 	if (NULL == state.idx) {
-		perror(idxf);
+		perror(state.idxf);
 		goto out;
 	}
 
@@ -280,9 +214,9 @@ out:
 static int
 state_search(struct state *p, const struct opts *opts, char *q)
 {
-	int		 leaf, root, len, ch, dflag, rc;
+	int		 i, leaf, root, len, ch, dflag, rc, types;
 	struct mchars	*mc;
-	char		*buf;
+	char		*buf, *qkey, *qval;
 	size_t		 bufsz;
 	recno_t		 rec;
 	uint32_t	 fl;
@@ -292,6 +226,7 @@ state_search(struct state *p, const struct opts *opts, char *q)
 	regex_t		*regp;
 	char		 filebuf[10];
 	struct rec	 record;
+	enum match	 match_method;
 
 	rc = 0;
 	root = leaf = -1;
@@ -302,29 +237,50 @@ state_search(struct state *p, const struct opts *opts, char *q)
 	regp = NULL;
 
 	/*
+	 * Determine the search types.
+	 */
+
+	types = 0;
+	if (NULL == (qval = strchr(q, '=')))
+		qval = q;
+	else {
+		*qval++ = '\0';
+		while (NULL != (qkey = strsep(&q, ","))) {
+			i = 0;
+			while (typemap[i].mask &&
+			    strcmp(typemap[i].name, qkey))
+				i++;
+			types |= typemap[i].mask;
+		}
+	}
+	if (0 == types)
+		types = TYPE_Nm | TYPE_Nd;
+
+	/*
 	 * Configure how we scan through results to see if we match:
 	 * whether by regexp or exact matches.
 	 */
 
-	switch (opts->match) {
-	case (MATCH_REGEX):
+	switch (*qval) {
+	case ('~'):
+		match_method = MATCH_REGEX;
 		ch = REG_EXTENDED | REG_NOSUB | 
 			(opts->insens ? REG_ICASE : 0);
-
-		if (0 != regcomp(&reg, q, ch)) {
-			fprintf(stderr, "%s: Bad pattern\n", q);
+		if (0 != regcomp(&reg, ++qval, ch)) {
+			fprintf(stderr, "%s: Bad pattern\n", qval);
 			return(0);
 		}
-
 		regp = &reg;
 		dflag = R_FIRST;
 		break;
-	case (MATCH_EXACT):
-		key.data = q;
-		key.size = strlen(q) + 1;
+	case ('='):
+		match_method = MATCH_EXACT;
+		key.data = ++qval;
+		key.size = strlen(qval) + 1;
 		dflag = R_CURSOR;
 		break;
 	default:
+		match_method = MATCH_SUBSTR;
 		dflag = R_FIRST;
 		break;
 	}
@@ -357,24 +313,24 @@ state_search(struct state *p, const struct opts *opts, char *q)
 
 		fl = *(uint32_t *)val.data;
 
-		if ( ! (fl & opts->types))
+		if ( ! (fl & types))
 			continue;
 
-		switch (opts->match) {
+		switch (match_method) {
 		case (MATCH_REGEX):
 			if (regexec(regp, buf, 0, NULL, 0))
 				continue;
 			break;
 		case (MATCH_EXACT):
-			if (opts->insens && strcasecmp(buf, q))
+			if (opts->insens && strcasecmp(buf, qval))
 				goto send;
-			if ( ! opts->insens && strcmp(buf, q))
+			if ( ! opts->insens && strcmp(buf, qval))
 				goto send;
 			break;
 		default:
-			if (opts->insens && NULL == strcasestr(buf, q))
+			if (opts->insens && NULL == strcasestr(buf, qval))
 				continue;
-			if ( ! opts->insens && NULL == strstr(buf, q))
+			if ( ! opts->insens && NULL == strstr(buf, qval))
 				continue;
 			break;
 		}
@@ -391,7 +347,7 @@ state_search(struct state *p, const struct opts *opts, char *q)
 
 		/* If we're in a different section, skip... */
 
-		if (opts->cat && strcasecmp(opts->cat, record.cat))
+		if (opts->sec && strcasecmp(opts->sec, record.sec))
 			continue;
 		if (opts->arch && strcasecmp(opts->arch, record.arch))
 			continue;
@@ -431,7 +387,7 @@ state_search(struct state *p, const struct opts *opts, char *q)
 
 		buf_dup(mc, &res[len].keyword, buf);
 		buf_dup(mc, &res[len].uri, filebuf);
-		buf_dup(mc, &res[len].cat, record.cat);
+		buf_dup(mc, &res[len].sec, record.sec);
 		buf_dup(mc, &res[len].arch, record.arch);
 		buf_dup(mc, &res[len].title, record.title);
 		buf_dup(mc, &res[len].desc, record.desc);
@@ -454,8 +410,8 @@ state_search(struct state *p, const struct opts *opts, char *q)
 send:
 	/* Sort our results. */
 
-	if (SORT_CAT == opts->sort)
-		qsort(res, len, sizeof(struct res), sort_cat);
+	if (opts->numerical)
+		qsort(res, len, sizeof(struct res), sort_sec);
 	else
 		qsort(res, len, sizeof(struct res), sort_title);
 
@@ -465,7 +421,7 @@ out:
 	for (len-- ; len >= 0; len--) {
 		free(res[len].keyword);
 		free(res[len].title);
-		free(res[len].cat);
+		free(res[len].sec);
 		free(res[len].arch);
 		free(res[len].desc);
 		free(res[len].uri);
@@ -589,7 +545,7 @@ state_output(const struct res *res, int sz)
 
 	for (i = 0; i < sz; i++)
 		printf("%s(%s%s%s) - %s\n", res[i].title, 
-				res[i].cat, 
+				res[i].sec, 
 				*res[i].arch ? "/" : "",
 				*res[i].arch ? res[i].arch : "",
 				res[i].desc);
@@ -600,11 +556,9 @@ usage(void)
 {
 
 	fprintf(stderr, "usage: %s "
-			"[-eIr] "
-			"[-a arch] "
-			"[-c cat] "
-			"[-s sort] "
-			"[-t type[,...]] "
+			"[-IN] "
+			"[-S subsection] "
+			"[-s section] "
 			"key\n", progname);
 }
 
@@ -629,8 +583,8 @@ state_getrecord(struct state *p, recno_t rec, struct rec *rp)
 	if ((sz = strlen(rp->file) + 1) >= val.size)
 		goto err;
 
-	rp->cat = (char *)val.data + (int)sz;
-	if ((sz += strlen(rp->cat) + 1) >= val.size)
+	rp->sec = (char *)val.data + (int)sz;
+	if ((sz += strlen(rp->sec) + 1) >= val.size)
 		goto err;
 
 	rp->title = (char *)val.data + (int)sz;
@@ -658,12 +612,12 @@ sort_title(const void *p1, const void *p2)
 }
 
 static int
-sort_cat(const void *p1, const void *p2)
+sort_sec(const void *p1, const void *p2)
 {
 	int		 rc;
 
-	rc = strcmp(((const struct res *)p1)->cat,
-			((const struct res *)p2)->cat);
+	rc = strcmp(((const struct res *)p1)->sec,
+			((const struct res *)p2)->sec);
 
 	return(0 == rc ? sort_title(p1, p2) : rc);
 }
--- mandocdb.c.orig
+++ mandocdb.c
@@ -29,28 +29,11 @@
 #include "man.h"
 #include "mdoc.h"
 #include "mandoc.h"
+#include "mandocdb.h"
 
-#define	MANDOC_DB	 "mandoc.db"
-#define	MANDOC_IDX	 "mandoc.index"
 #define	MANDOC_BUFSZ	  BUFSIZ
 #define	MANDOC_SLOP	  1024
 
-/* Bit-fields.  See mandocdb.8. */
-
-#define TYPE_NAME	  0x01
-#define TYPE_FUNCTION	  0x02
-#define TYPE_UTILITY	  0x04
-#define TYPE_INCLUDES	  0x08
-#define TYPE_VARIABLE	  0x10
-#define TYPE_STANDARD	  0x20
-#define TYPE_AUTHOR	  0x40
-#define TYPE_CONFIG	  0x80
-#define TYPE_DESC	  0x100
-#define TYPE_XREF	  0x200
-#define TYPE_PATH	  0x400
-#define TYPE_ENV	  0x800
-#define TYPE_ERR	  0x1000
-
 /* Tiny list for files.  No need to bring in QUEUE. */
 
 struct	of {
@@ -719,7 +702,7 @@ pmdoc_An(MDOC_ARGS)
 		return;
 
 	buf_appendmdoc(buf, n->child, 0);
-	hash_put(hash, buf, TYPE_AUTHOR);
+	hash_put(hash, buf, TYPE_An);
 }
 
 static void
@@ -780,7 +763,7 @@ pmdoc_Fd(MDOC_ARGS)
 	buf_appendb(buf, start, (size_t)(end - start + 1));
 	buf_appendb(buf, "", 1);
 
-	hash_put(hash, buf, TYPE_INCLUDES);
+	hash_put(hash, buf, TYPE_In);
 }
 
 /* ARGSUSED */
@@ -792,7 +775,7 @@ pmdoc_Cd(MDOC_ARGS)
 		return;
 
 	buf_appendmdoc(buf, n->child, 0);
-	hash_put(hash, buf, TYPE_CONFIG);
+	hash_put(hash, buf, TYPE_Cd);
 }
 
 /* ARGSUSED */
@@ -806,7 +789,7 @@ pmdoc_In(MDOC_ARGS)
 		return;
 
 	buf_append(buf, n->child->string);
-	hash_put(hash, buf, TYPE_INCLUDES);
+	hash_put(hash, buf, TYPE_In);
 }
 
 /* ARGSUSED */
@@ -832,7 +815,7 @@ pmdoc_Fn(MDOC_ARGS)
 		cp++;
 
 	buf_append(buf, cp);
-	hash_put(hash, buf, TYPE_FUNCTION);
+	hash_put(hash, buf, TYPE_Fn);
 }
 
 /* ARGSUSED */
@@ -846,7 +829,7 @@ pmdoc_St(MDOC_ARGS)
 		return;
 
 	buf_append(buf, n->child->string);
-	hash_put(hash, buf, TYPE_STANDARD);
+	hash_put(hash, buf, TYPE_St);
 }
 
 /* ARGSUSED */
@@ -865,7 +848,7 @@ pmdoc_Xr(MDOC_ARGS)
 	} else
 		buf_appendb(buf, ".", 2);
 
-	hash_put(hash, buf, TYPE_XREF);
+	hash_put(hash, buf, TYPE_Xr);
 }
 
 /* ARGSUSED */
@@ -902,7 +885,7 @@ pmdoc_Vt(MDOC_ARGS)
 
 	buf_appendb(buf, start, sz);
 	buf_appendb(buf, "", 1);
-	hash_put(hash, buf, TYPE_VARIABLE);
+	hash_put(hash, buf, TYPE_Va);
 }
 
 /* ARGSUSED */
@@ -916,7 +899,7 @@ pmdoc_Fo(MDOC_ARGS)
 		return;
 
 	buf_append(buf, n->child->string);
-	hash_put(hash, buf, TYPE_FUNCTION);
+	hash_put(hash, buf, TYPE_Fn);
 }
 
 
@@ -931,7 +914,7 @@ pmdoc_Nd(MDOC_ARGS)
 	buf_appendmdoc(dbuf, n->child, 1);
 	buf_appendmdoc(buf, n->child, 0);
 
-	hash_put(hash, buf, TYPE_DESC);
+	hash_put(hash, buf, TYPE_Nd);
 }
 
 /* ARGSUSED */
@@ -943,7 +926,7 @@ pmdoc_Er(MDOC_ARGS)
 		return;
 	
 	buf_appendmdoc(buf, n->child, 0);
-	hash_put(hash, buf, TYPE_ERR);
+	hash_put(hash, buf, TYPE_Er);
 }
 
 /* ARGSUSED */
@@ -955,7 +938,7 @@ pmdoc_Ev(MDOC_ARGS)
 		return;
 	
 	buf_appendmdoc(buf, n->child, 0);
-	hash_put(hash, buf, TYPE_ENV);
+	hash_put(hash, buf, TYPE_Ev);
 }
 
 /* ARGSUSED */
@@ -967,7 +950,7 @@ pmdoc_Pa(MDOC_ARGS)
 		return;
 	
 	buf_appendmdoc(buf, n->child, 0);
-	hash_put(hash, buf, TYPE_PATH);
+	hash_put(hash, buf, TYPE_Pa);
 }
 
 /* ARGSUSED */
@@ -977,7 +960,7 @@ pmdoc_Nm(MDOC_ARGS)
 	
 	if (SEC_NAME == n->sec) {
 		buf_appendmdoc(buf, n->child, 0);
-		hash_put(hash, buf, TYPE_NAME);
+		hash_put(hash, buf, TYPE_Nm);
 		return;
 	} else if (SEC_SYNOPSIS != n->sec || MDOC_HEAD != n->type)
 		return;
@@ -986,7 +969,7 @@ pmdoc_Nm(MDOC_ARGS)
 		buf_append(buf, m->name);
 
 	buf_appendmdoc(buf, n->child, 0);
-	hash_put(hash, buf, TYPE_UTILITY);
+	hash_put(hash, buf, TYPE_Nm);
 }
 
 static void
@@ -1116,7 +1099,7 @@ pman_node(MAN_ARGS)
 				buf_appendb(buf, start, sz);
 				buf_appendb(buf, "", 1);
 
-				hash_put(hash, buf, TYPE_NAME);
+				hash_put(hash, buf, TYPE_Nm);
 
 				if (' ' == start[(int)sz]) {
 					start += (int)sz + 1;
@@ -1155,7 +1138,7 @@ pman_node(MAN_ARGS)
 			buf_appendb(dbuf, start, sz);
 			buf_appendb(buf, start, sz);
 
-			hash_put(hash, buf, TYPE_DESC);
+			hash_put(hash, buf, TYPE_Nd);
 		}
 	}
 
--- /dev/null
+++ mandocdb.h
@@ -0,0 +1,32 @@
+/*      $Id$ */
+/*
+ * Copyright (c) 2011 Kristaps Dzonsons <kristaps@bsd.lv>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#define	MANDOC_DB	"mandoc.db"
+#define	MANDOC_IDX	"mandoc.index"
+
+#define	TYPE_An		0x01
+#define	TYPE_Cd		0x02
+#define	TYPE_Er		0x04
+#define	TYPE_Ev		0x08
+#define	TYPE_Fn		0x10
+#define	TYPE_In		0x20
+#define	TYPE_Nd		0x40
+#define	TYPE_Nm		0x100
+#define	TYPE_Pa		0x200
+#define	TYPE_St		0x400
+#define	TYPE_Va		0x1000
+#define	TYPE_Xr		0x2000
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv


* Re: overhaul apropos(1) interface
  2011-11-09  1:30 overhaul apropos(1) interface Ingo Schwarze
@ 2011-11-09 10:19 ` Kristaps Dzonsons
  2011-11-12 23:54 ` Ingo Schwarze
  1 sibling, 0 replies; 6+ messages in thread
From: Kristaps Dzonsons @ 2011-11-09 10:19 UTC (permalink / raw)
  To: tech

On 09/11/2011 02:30, Ingo Schwarze wrote:
> Hi,
>
> the interface of the apropos utility in the mandoc package
> is very different from traditional apropos, and the -t option
> hinders logical extensions, as discussed previously.
> So, here is a first step to fix this, doing various things:
>
> INTERFACE CHANGES:
>   * drop the -s (sort) option, it clashes with -s (section)
>   * introduce -N (numerical sort) instead
>   * drop the -c (cat) option, it clashes with -c (copy to stdout)
>   * rename it to the usual -s (section)
>   * drop the -a (arch) option, it clashes with -a (all)
>   * rename it to the usual -S (subsection)
>   * drop the -t (search type), it clashes with -t (use troff)
>   * use a macro= syntax instead, as discussed
>   * drop -e (exact) and -r (regex), the match type belongs to each query phrase
>   * use a macro== and macro=~ syntax instead
>
> The new syntax is:
>
>    apropos [-IN] [-s section] [-S arch] query_phrase [...]
>
>    query_phrase ::= [[macro[,...]](=|==|=~)]query_value
>
> Multiple query phrases are not yet implemented, but they are to
> be or'ed in the future.
>
> The operators are:
>
>   =  substring match
>   == exact match
>   =~ regex match
>
> If no operator is given, = is assumed.
> Multiple macros can be given, joined with commas, they are or'ed.
> If no macro is given, "Nm,Nd" is assumed.
>
> STRUCTURAL CLEANUP:
>   * collect the common defines for mandocdb and apropos in mandocdb.h
>   * name the TYPE_ bitfield constants according to the new interface
>   * drop types and match from the global struct opts
>     because these are local to each query phrase
>   * drop the enum sort; for an alternative, a bool is sufficient
>   * drop the local dbf and idxf strings, use those in the global struct
>
> This is lightly tested, but given that we are not yet in production
> and lots of heavy changes are needed after this, i consider
> light testing sufficient.
>
> OK?
>
> Of course, when committing, i'm going to change the manual, too.
>
> And by the way, this patch is +144 -175 :-).

Unfortunately, this comes in after an overhaul I checked in last night.
However, that overhaul overwhelmingly concerns the search function
itself, not the interface, so there are few conflicts -- just name
changes.  I'll piece this into the new version soon.

K.


* Re: overhaul apropos(1) interface
  2011-11-09  1:30 overhaul apropos(1) interface Ingo Schwarze
  2011-11-09 10:19 ` Kristaps Dzonsons
@ 2011-11-12 23:54 ` Ingo Schwarze
  2011-11-13  0:07   ` Kristaps Dzonsons
  1 sibling, 1 reply; 6+ messages in thread
From: Ingo Schwarze @ 2011-11-12 23:54 UTC (permalink / raw)
  To: tech

Hi,

here is a new version of my overhaul after Kristaps' changes.
The patch given below is not likely to apply cleanly to anybody's
repo, it requires the apropos_db.* rename first.  It is meant
for quick review; if you agree with the direction, i'll figure
out how to get this in cleanly.

So, here is what the overhaul does:

 * Remove -I from the main program.
   Logically, that's not a global option,
   but a per-expression thingy.
 * Sort the arguments of exprcomp.
   All the world uses (argc, argv),
   why should we suddenly go for (argv, argc)?
 * I agree with dropping MATCH_EXACT:
   That's rarely needed and can easily be constructed
   as MATCH_REGEX with ^...$.
 * For the same reason, let's drop case-sensitive MATCH_STR:
   It's rarely needed and can easily be constructed with MATCH_REGEX.
 * Keeping MATCH_STR seems OK: It is needed as the default
   when the type is unspecified.
 * The MATCH_REGEXCASE enum item is already unused,
   since the iflag is compiled into the regex object.
   So drop that one as well.
 * Only two enum items remain; that's better and more easily
   expressed by a single boolean integer (expr.regex).
 * The exprexec() function requires a mask argument,
   or all search keys will act as "any" (important bug fix
   along the way!).
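
To make the ^...$ argument concrete, here is a small sketch using the
POSIX regcomp(3) and regexec(3) calls with the same flags the patch
uses.  The wrapper name re_match is hypothetical and exists only for
this illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if the extended regex pat matches
 * buf, 0 if it does not, and -1 if pat fails to compile.
 */
static int
re_match(const char *pat, const char *buf, int icase)
{
	regex_t	 re;
	int	 rc;

	assert(pat != NULL && buf != NULL);

	if (regcomp(&re, pat, REG_EXTENDED | REG_NOSUB |
	    (icase ? REG_ICASE : 0)) != 0)
		return -1;
	rc = regexec(&re, buf, 0, NULL, 0) == 0;
	regfree(&re);
	return rc;
}
```

An anchored pattern such as "^bge$" behaves like an exact match, and
passing REG_ICASE covers the case-insensitive variant.  One caveat of
this approach: a literal value containing regex metacharacters would
have to be escaped by the user.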

The most massive changes are in exprcomp().
I strongly dislike the proposed interface.
It is cumbersome and requires too much typing.
The -eq and -re arguments are exceedingly ugly, and the
implementation is hard to get right - if i remember
correctly, not all cases work properly right now.

Thus, i have simplified my interface proposal to just this:

  apropos [-s section] [-S arch] query_phrase [...]

  query_phrase ::= [[macro[,...]](=|~)]query_value

So, the value to be searched for can optionally
be preceded by '=' (for string search) or '~' (for regex search),
and that can optionally be preceded by one or more macro names,
joined by commas.  Including "i" among the macros switches
regex searches to case-insensitive and has no effect on
string searches.
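
A minimal sketch of how such a phrase could be taken apart, assuming
the first '=' or '~' is the operator.  The names qphrase2 and
qphrase2_parse are invented for this example; the exprcomp() in the
patch below additionally builds the type mask and compiles the regex.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result type for this illustration. */
struct qphrase2 {
	const char	*value;	/* the query value */
	int		 regex;	/* 1 for '~', 0 for '=' or no operator */
	int		 icase;	/* 1 if "i" appears among the macros */
};

/*
 * Split "[[macro[,...]](=|~)]query_value" in place and note whether
 * the pseudo-macro "i" was given.  Real macro names are skipped here.
 */
static void
qphrase2_parse(char *phrase, struct qphrase2 *qp)
{
	char	*op, *key, *next;

	assert(phrase != NULL && qp != NULL);

	qp->value = phrase;
	qp->regex = 0;
	qp->icase = 0;

	if ((op = strpbrk(phrase, "=~")) == NULL)
		return;		/* bare value: string search */

	qp->regex = (*op == '~');
	*op = '\0';
	qp->value = op + 1;

	/* Walk the comma-separated keys in front of the operator. */
	for (key = phrase; key != NULL; key = next) {
		if ((next = strchr(key, ',')) != NULL)
			*next++ = '\0';
		if (strcmp(key, "i") == 0)
			qp->icase = 1;
	}
}
```

For example, "Nm,i~^b.e$" comes out as a case-insensitive regex search
for "^b.e$", while "An=Gray" is a plain string search for "Gray".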

OK?
  Ingo


--- apropos.c.orig
+++ apropos.c
@@ -33,7 +33,7 @@ static	char	*progname;
 int
 apropos(int argc, char *argv[])
 {
-	int		 ch, cs;
+	int		 ch;
 	struct opts	 opts;
 	struct expr	*e;
 	extern int	 optind;
@@ -47,9 +47,7 @@ apropos(int argc, char *argv[])
 	else
 		++progname;
 
-	cs = 0;
-
-	while (-1 != (ch = getopt(argc, argv, "S:s:I"))) 
+	while (-1 != (ch = getopt(argc, argv, "S:s:"))) 
 		switch (ch) {
 		case ('S'):
 			opts.arch = optarg;
@@ -57,9 +55,6 @@ apropos(int argc, char *argv[])
 		case ('s'):
 			opts.cat = optarg;
 			break;
-		case ('I'):
-			cs = 1;
-			break;
 		default:
 			usage();
 			return(EXIT_FAILURE);
@@ -71,7 +66,7 @@ apropos(int argc, char *argv[])
 	if (0 == argc) 
 		return(EXIT_SUCCESS);
 
-	if (NULL == (e = exprcomp(cs, argv, argc))) {
+	if (NULL == (e = exprcomp(argc, argv))) {
 		fprintf(stderr, "Bad expression\n");
 		return(EXIT_FAILURE);
 	}
--- apropos_db.c.orig
+++ apropos_db.c
@@ -1,6 +1,7 @@
 /*	$Id: db.c,v 1.1 2011/11/09 01:24:23 kristaps Exp $ */
 /*
  * Copyright (c) 2011 Kristaps Dzonsons <kristaps@bsd.lv>
+ * Copyright (c) 2011 Ingo Schwarze <schwarze@openbsd.org>
  *
  * Permission to use, copy, modify, and distribute this software for any
  * purpose with or without fee is hereby granted, provided that the above
@@ -31,15 +32,8 @@
 #include "apropos_db.h"
 #include "mandoc.h"
 
-enum	match {
-	MATCH_REGEX,
-	MATCH_REGEXCASE,
-	MATCH_STR,
-	MATCH_STRCASE
-};
-
 struct	expr {
-	enum match	 match;
+	int		 regex;
 	int	 	 mask;
 	char		*v;
 	regex_t	 	 re;
@@ -71,7 +65,7 @@ static	const struct type types[] = {
 
 static	DB	*btree_open(void);
 static	int	 btree_read(const DBT *, const struct mchars *, char **);
-static	int	 exprexec(const struct expr *, char *);
+static	int	 exprexec(const struct expr *, char *, int);
 static	DB	*index_open(void);
 static	int	 index_read(const DBT *, const DBT *, 
 			const struct mchars *, struct rec *);
@@ -368,7 +362,7 @@ apropos_search(const struct opts *opts, const struct expr *expr,
 		if ( ! btree_read(&key, mc, &buf))
 			break;
 
-		if ( ! exprexec(expr, buf))
+		if ( ! exprexec(expr, buf, *(int *)val.data))
 			continue;
 
 		memcpy(&rec, val.data + 4, sizeof(recno_t));
@@ -460,55 +454,55 @@ out:
 }
 
 struct expr *
-exprcomp(int cs, char *argv[], int argc)
+exprcomp(int argc, char *argv[])
 {
 	struct expr	*p;
 	struct expr	 e;
-	int		 i, pos, ch;
+	char		*key;
+	int		 i, icase;
 
-	pos = 0;
-
-	if (pos > argc)
+	if (0 >= argc)
 		return(NULL);
 
-	for (i = 0; 0 != types[i].mask; i++)
-		if (0 == strcmp(types[i].name, argv[pos]))
-			break;
-
-	if (0 == (e.mask = types[i].mask))
-		return(NULL);
-
-	if (++pos > argc--)
-		return(NULL);
+	/*
+	 * Choose regex or substring match.
+	 */
 
-	if ('-' != *argv[pos]) 
-		e.match = cs ? MATCH_STRCASE : MATCH_STR;
-	else if (0 == strcmp("-eq", argv[pos]))
-		e.match = cs ? MATCH_STRCASE : MATCH_STR;
-	else if (0 == strcmp("-ieq", argv[pos]))
-		e.match = MATCH_STRCASE;
-	else if (0 == strcmp("-re", argv[pos]))
-		e.match = cs ? MATCH_REGEXCASE : MATCH_REGEX;
-	else if (0 == strcmp("-ire", argv[pos]))
-		e.match = MATCH_REGEXCASE;
-	else
-		return(NULL);
+	if (NULL == (e.v = strpbrk(*argv, "=~"))) {
+		e.regex = 0;
+		e.v = *argv;
+	} else {
+		e.regex = '~' == *e.v;
+		*e.v++ = '\0';
+	}
 
-	if ('-' == *argv[pos])
-		pos++;
+	/*
+	 * Determine the record types to search for.
+	 */
+
+	icase = 0;
+	e.mask = 0;
+	if (*argv < e.v) {
+		while (NULL != (key = strsep(argv, ","))) {
+			if ('i' == key[0] && '\0' == key[1]) {
+				icase = REG_ICASE;
+				continue;
+			}
+			i = 0;
+			while (types[i].mask &&
+			    strcmp(types[i].name, key))
+				i++;
+			e.mask |= types[i].mask;
+		}
+	}
+	if (0 == e.mask)
+		e.mask = TYPE_Nm | TYPE_Nd;
 
-	if (pos > argc--)
+	if (e.regex &&
+	    regcomp(&e.re, e.v, REG_EXTENDED | REG_NOSUB | icase))
 		return(NULL);
 
-	e.v = mandoc_strdup(argv[pos]);
-
-	if (MATCH_REGEX == e.match || MATCH_REGEXCASE == e.match) {
-		ch = REG_EXTENDED | REG_NOSUB;
-		if (MATCH_REGEXCASE == e.match)
-			ch |= REG_ICASE;
-		if (regcomp(&e.re, e.v, ch))
-			return(NULL);
-	}
+	e.v = mandoc_strdup(e.v);
 
 	p = mandoc_calloc(1, sizeof(struct expr));
 	memcpy(p, &e, sizeof(struct expr));
@@ -522,7 +516,7 @@ exprfree(struct expr *p)
 	if (NULL == p)
 		return;
 
-	if (MATCH_REGEX == p->match)
+	if (p->regex)
 		regfree(&p->re);
 
 	free(p->v);
@@ -530,14 +524,14 @@ exprfree(struct expr *p)
 }
 
 static int
-exprexec(const struct expr *p, char *cp)
+exprexec(const struct expr *p, char *cp, int mask)
 {
 
-	if (MATCH_STR == p->match)
-		return(0 == strcmp(p->v, cp));
-	else if (MATCH_STRCASE == p->match)
-		return(0 == strcasecmp(p->v, cp));
+	if ( ! (mask & p->mask))
+		return(0);
 
-	assert(MATCH_REGEX == p->match);
-	return(0 == regexec(&p->re, cp, 0, NULL, 0));
+	if (p->regex)
+		return(0 == regexec(&p->re, cp, 0, NULL, 0));
+	else
+		return(NULL != strcasestr(cp, p->v));
 }
--- apropos_db.h.orig
+++ apropos_db.h
@@ -49,7 +49,7 @@ void	 	 apropos_search(const struct opts *,
 			const struct expr *, void *, 
 			void (*)(struct rec *, size_t, void *));
 
-struct	expr	*exprcomp(int, char *[], int);
+struct	expr	*exprcomp(int, char *[]);
 void		 exprfree(struct expr *);
 
 __END_DECLS

* Re: overhaul apropos(1) interface
  2011-11-12 23:54 ` Ingo Schwarze
@ 2011-11-13  0:07   ` Kristaps Dzonsons
  2011-11-13  0:39     ` Ingo Schwarze
  0 siblings, 1 reply; 6+ messages in thread
From: Kristaps Dzonsons @ 2011-11-13  0:07 UTC (permalink / raw)
  To: tech; +Cc: Ingo Schwarze

On 13/11/2011 00:54, Ingo Schwarze wrote:
> Hi,
>
> here is a new version of my overhaul after Kristaps' changes.
> The patch given below is not likely to apply cleanly to anybody's
> repo, it requires the apropos_db.* rename first.  It is meant
> for quick review; if you agree with the direction, i'll figure
> out how to get this in cleanly.
>
> So, here is what the overhaul does:
>
>   * Remove -I from the main program.
>     Logically, that's not a global option,
>     but a per-expression thingy.
>   * Sort the arguments of exprcomp.
>     All the world uses (argc, argv),
>     why should we suddenly go for (argv, argc)?
>   * I agree with dropping MATCH_EXACT:
>     That's rarely needed and can easily be constructed
>     as MATCH_REGEX with ^...$.
>   * For the same reason, let's drop case-sensitive MATCH_STR:
>     It's rarely needed and can easily be constructed with MATCH_REGEX.
>   * Keeping MATCH_STR seems OK: It is needed as the default
>     when the type is unspecified.
>   * The MATCH_REGEXCASE enum item is unused already now,
>     since the iflag is compiled into the regex object.
>     So drop that one as well.
>   * Only two enum items remain; that's better and more easily
>     expressed by a single boolean integer (expr.regex).
>   * The exprexec() function requires a mask argument,
>     or all search keys will act as "any" (important bug fix
>     along the way!).
>
> The most massive changes are in exprcomp().
> I strongly dislike the proposed interface.
> It is cumbersome and requires too much typing.
> The -eq and -re arguments are exceedingly ugly, and the
> implementation is hard to get right - if i remember
> correctly, not all cases work properly right now.
>
> Thus, i have simplified my interface proposal to just this:
>
>    apropos [-s section] [-S arch] query_phrase [...]
>
>    query_phrase ::= [[macro[,...]](=|~)]query_value
>
> So, the value to be searched for can optionally
> be preceded by '=' (for string search) or '~' (for regex search),
> and that can optionally be preceded by one or more macro names,
> joined by commas.  Including "i" among the macros switches
> regex searches to case-insensitive and has no effect on
> string searches.

Ingo,

I'm working on the final parts of this check-in, so please hold off on
this file!  It's by no means finished; as mentioned in the source
checkin, I'll post to tech@ when the implementation is feature complete.
Consider, to gauge the complexity:

  apropos Ar == foo -a Ar =~ baz

(I don't care at all about the connecting syntax, ==, etc., so long as 
it's regular.  Your notation is not extensible to case-insensitive 
matching: can one extend to include these?)

Anyway, the "AND" is tricky: each file's evaluation state must be 
retained for all keyword entries (which have no guarantee on ordering) 
then post-operated.  Thus, I'm maintaining evaluation trees during the 
parse.  The goal is to make the trivial case -- "Ar foo", say -- as fast 
as in the simple implementation before.  (I guarantee the trivial case 
by the partial evaluation.)
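
To make the bookkeeping concrete, here is a minimal sketch of the simplest
form of "retain state per file, then post-operate".  It is not the
evaluation-tree code described above: it keeps one bit per term for each
record and checks afterwards that all bits are set.  The names `term`,
`key`, `and_accumulate`, `and_satisfied`, `MAXREC` and the TYPE_ values
are hypothetical, and only exact string terms are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the TYPE_ bits defined in mandocdb.h. */
#define TYPE_Nm 0x01
#define TYPE_Ar 0x04
#define MAXREC  16		/* assumed upper bound on record numbers */

struct term {			/* one "macro == value" test */
	int		 mask;
	const char	*v;
};

struct key {			/* one keyword entry read from the database */
	size_t		 rec;	/* record (manual page) it belongs to */
	int		 type;	/* TYPE_ bit of the macro it came from */
	const char	*v;
};

/*
 * Keyword entries may arrive in any order, so remember for every
 * record which terms have matched so far: bit j of state[rec] is
 * set once term j has matched some key of that record.
 */
static void
and_accumulate(const struct term *terms, size_t nterms,
    const struct key *keys, size_t nkeys, unsigned *state)
{
	size_t	 i, j;

	assert(nterms < 32);
	for (i = 0; i < nkeys; i++) {
		assert(keys[i].rec < MAXREC);
		for (j = 0; j < nterms; j++)
			if ((keys[i].type & terms[j].mask) &&
			    strcmp(keys[i].v, terms[j].v) == 0)
				state[keys[i].rec] |= 1u << j;
	}
}

/* Post-operation: the AND holds only if every term bit is set. */
static int
and_satisfied(unsigned state, size_t nterms)
{
	unsigned	 all = (1u << nterms) - 1;

	return (state & all) == all;
}
```

A single-term query needs no state at all, which is presumably why the
trivial case can stay as fast as before.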

Anyway, I anticipate a few days til I get the final checkins.  The code 
is not tricky in implementation (not much, anyway) and guarantees 
arbitrary expressions with well-defined compute time.

I'm also not at all married to the filenames, but let's hold off for a 
bit more as I get these chunks into place.  apropos_db.c is fine by me, 
for the record.

Thanks again,

Kristaps

* Re: overhaul apropos(1) interface
  2011-11-13  0:07   ` Kristaps Dzonsons
@ 2011-11-13  0:39     ` Ingo Schwarze
  2011-11-13  9:23       ` Kristaps Dzonsons
  0 siblings, 1 reply; 6+ messages in thread
From: Ingo Schwarze @ 2011-11-13  0:39 UTC (permalink / raw)
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Sun, Nov 13, 2011 at 01:07:20AM +0100:

> I'm working on the final parts of this check-in, so please hold off
> on this file!

I cannot; i need to get this working in the OpenBSD tree ASAP,
i.e. tomorrow.  The logical and and or are not critical for me
right now, i can do without those for some time, but i need a
working mandocdb-apropos toolchain in-tree to build upon.

Ports hackathons are short, and i want to get on as fast as
possible with the following critical path:

 1) get mandocdb working on single directories
    to produce databases in the macro-format,
    i.e. with types like TYPE_An, TYPE_Cd
 2) get apropos to work with that, such that the
    databases can be used
 3) get rid of the most glaring bugs
    and complete backward compatibility
 4) integrate the man.conf parser into mandocdb
    such that mandocdb can walk the MANPATH
    just like makewhatis(8) does
 5) integrate the man.conf parser into apropos such that
    mandoc-apropos gets useable as a real apropos replacement
 6) rudimentary formatted page parsing in mandocdb
 7) integrate mandocdb into pkg_add such that it gets
    useable as a real makewhatis replacement

I realized in Ljubljana that this is more work than i thought
before, and i can't hold off, especially not this week, or i will
surely miss the 5.1 release.

Maybe you can hold off and bring in the and/or stuff
afterwards, i.e. after the patches i have posted so far?
You need not wait long, i would *gladly* push all my stuff
tomorrow in the morning, and then you have clean earth to
till and to adjust your work to it.

> It's by no means finished; as mentioned in the source checkin,
> I'll post to tech@ when the implementation is feature complete.

Sure, no doubt, but blocking system integration at a critical
time to implement optional, fancy features is a bad idea!

> Consider, to gauge the complexity:
> 
>  apropos Ar == foo -a Ar =~ baz
> 
> (I don't care at all about the connecting syntax, ==, etc., so long
> as it's regular.  Your notation is not extensible to
> case-insensitive matching: can one extend to include these?)

It is.

ischwarze@isnote $ cd /usr/share/man                                           
ischwarze@isnote $ apropos.m Nm~^b.e$
BCE(4) - Broadcom BCM4401 10/100 Ethernet device
BGE(4) - Broadcom BCM57xx/BCM590x 10/100/Gigabit Ethernet device
ischwarze@isnote $ apropos.m Nm~^B.e$ 
ischwarze@isnote $ apropos.m Nm,i~^B.e$
BCE(4) - Broadcom BCM4401 10/100 Ethernet device
BGE(4) - Broadcom BCM57xx/BCM590x 10/100/Gigabit Ethernet device

That's not a mockup, that's what i'm running right now.

However, i propose that substring match always be case-insensitive.
Substring match is a simplification for daily wear and tear
and doesn't need such complexity.  If case-insensitive substring
match drowns you in noise, just use regex matching - done.
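
A minimal sketch of how such a phrase could be split and matched, assuming
the grammar `[[macro[,...]](=|~)]query_value` and the "i" pseudo-macro
from the session above.  The names `phrase`, `phrase_parse` and
`phrase_match_name` are hypothetical, and the use of REG_EXTENDED is an
assumption; this is not the code from the patch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

struct phrase {
	char		 macros[64];	/* comma-separated list, "" if none */
	int		 regex;		/* 1 for '~', 0 for '=' or no operator */
	int		 icase;		/* 1 if "i" is among the macros */
	const char	*value;		/* points into the caller's string */
};

/* Split "Nm,i~^B.e$" into macro list, operator and value. */
static int
phrase_parse(const char *arg, struct phrase *out)
{
	const char	*p;
	size_t		 n, len;

	assert(arg != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	n = strcspn(arg, "=~");
	if (arg[n] == '\0') {
		out->value = arg;	/* no operator: plain substring */
		return 0;
	}
	if (n >= sizeof(out->macros))
		return -1;		/* macro list too long */
	memcpy(out->macros, arg, n);
	out->macros[n] = '\0';
	out->regex = (arg[n] == '~');
	out->value = arg + n + 1;

	/* Walk the comma-separated list, looking for the "i" flag. */
	for (p = out->macros; *p != '\0'; p += len + (p[len] == ',')) {
		len = strcspn(p, ",");
		if (len == 1 && *p == 'i')
			out->icase = 1;
	}
	return 0;
}

/* Regex phrases only: 1 on match, 0 on no match, -1 on error. */
static int
phrase_match_name(const struct phrase *ph, const char *name)
{
	regex_t	 re;
	int	 flags, rc;

	if (!ph->regex)
		return -1;
	flags = REG_EXTENDED | REG_NOSUB | (ph->icase ? REG_ICASE : 0);
	if (regcomp(&re, ph->value, flags) != 0)
		return -1;
	rc = (regexec(&re, name, 0, NULL, 0) == 0);
	regfree(&re);
	return rc;
}
```

With these definitions, "Nm,i~^B.e$" matches the name "bce" while
"Nm~^B.e$" does not, which is the behaviour shown in the session.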

> Anyway, the "AND" is tricky: each file's evaluation state must be
> retained for all keyword entries (which have no guarantee on
> ordering) then post-operated.  Thus, I'm maintaining evaluation
> trees during the parse.  The goal is to make the trivial case -- "Ar
> foo", say -- as fast as in the simple implementation before.  (I
> guarantee the trivial case by the partial evaluation.)
> 
> Anyway, I anticipate a few days til I get the final checkins.  The
> code is not tricky in implementation (not much, anyway) and
> guarantees arbitrary expressions with well-defined compute time.
> 
> I'm also not at all married to the filenames, but let's hold off for
> a bit more as I get these chunks into place.  apropos_db.c is fine
> by me, for the record.

Sounds good, except that a hackathon is a bad time to make me wait.

Thanks for your understanding (and your work, above all!),
  Ingo

* Re: overhaul apropos(1) interface
  2011-11-13  0:39     ` Ingo Schwarze
@ 2011-11-13  9:23       ` Kristaps Dzonsons
  0 siblings, 0 replies; 6+ messages in thread
From: Kristaps Dzonsons @ 2011-11-13  9:23 UTC (permalink / raw)
  To: tech

> I cannot; i need to get this working in the OpenBSD tree ASAP,
> i.e. tomorrow.  The logical and and or are not critical for me
> right now, i can do without those for some time, but i need a
> working mandocdb-apropos toolchain in-tree to build upon.

Ingo,

Oi oi, a hackathon!  In this case, I'm fine with your patch -- I wasn't 
aware of the conditions.  If exprexec() simply does the type-check and 
the strcmp/regexec, then it works for the single-case expression: Ar == 
foo, Ar =~ foo, etc.  This can then be further built up, later. 
Further, if everything's kept within exprcomp() and exprexec(), it's 
nice and neat to upgrade later.
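
For reference, a minimal self-contained sketch of such a single-expression
check, in the shape of the patched exprexec() from the diff: a type check
against the mask, then either regexec(3) or a case-insensitive substring
test.  The struct layout, the TYPE_ values and the helper `substr_nocase`
(a portable stand-in for strcasestr(3)) are assumptions made for
illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TYPE_ bits defined in mandocdb.h. */
#define TYPE_Nm 0x01
#define TYPE_Nd 0x02
#define TYPE_Ar 0x04

struct expr {
	int		 regex;	/* non-zero: match with re; zero: substring */
	int		 mask;	/* OR of the TYPE_ bits this phrase covers */
	const char	*v;	/* search string */
	regex_t		 re;	/* compiled regex, valid only if regex != 0 */
};

/* Case-insensitive substring test, standing in for strcasestr(3). */
static int
substr_nocase(const char *hay, const char *needle)
{
	size_t	 i;

	for ( ; ; hay++) {
		for (i = 0; needle[i] != '\0'; i++)
			if (tolower((unsigned char)hay[i]) !=
			    tolower((unsigned char)needle[i]))
				break;
		if (needle[i] == '\0')
			return 1;	/* whole needle matched here */
		if (*hay == '\0')
			return 0;	/* haystack exhausted */
	}
}

/* Return 1 if key cp of type mask satisfies the expression, else 0. */
static int
exprexec_sketch(const struct expr *p, const char *cp, int mask)
{
	assert(p != NULL && cp != NULL);

	/* Type check: the key must come from a requested macro. */
	if ((mask & p->mask) == 0)
		return 0;

	if (p->regex)
		return regexec(&p->re, cp, 0, NULL, 0) == 0;
	return substr_nocase(cp, p->v);
}
```

Without the mask test every phrase would match keys of any type, which is
the bug the mask argument is meant to fix.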

Better yet, as we've agreed upon the tests (==, =~, etc.), the upgrade 
to allow logical statements (and, or, etc.) is backwards-compatible.

So please go ahead with these changes; I'll merge and check my 
improvements (they'll more or less re-write exprcomp() and exprexec(), 
and add structs etc.) afterward.

Is that alright with you?

Thanks,

Kristaps


