From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp-1.sys.kth.se (smtp-1.sys.kth.se [130.237.32.175])
	by krisdoz.my.domain (8.14.3/8.14.3) with ESMTP id p2GAsaPG008043
	for <discuss@mdocml.bsd.lv>; Wed, 16 Mar 2011 06:54:37 -0400 (EDT)
Received: from mailscan-1.sys.kth.se (mailscan-1.sys.kth.se [130.237.32.91])
	by smtp-1.sys.kth.se (Postfix) with ESMTP id 8D1781563D3
	for <discuss@mdocml.bsd.lv>; Wed, 16 Mar 2011 11:47:24 +0100 (CET)
X-Virus-Scanned: by amavisd-new at kth.se
Received: from smtp-1.sys.kth.se ([130.237.32.175])
	by mailscan-1.sys.kth.se (mailscan-1.sys.kth.se [130.237.32.91]) (amavisd-new, port 10024)
	with LMTP id 7z-MC037haQq for <discuss@mdocml.bsd.lv>;
	Wed, 16 Mar 2011 11:47:21 +0100 (CET)
X-KTH-Auth: kristaps [85.8.61.126]
X-KTH-mail-from: kristaps@bsd.lv
X-KTH-rcpt-to: discuss@mdocml.bsd.lv
Received: from h85-8-61-126.dynamic.se.alltele.net (h85-8-61-126.dynamic.se.alltele.net [85.8.61.126])
	by smtp-1.sys.kth.se (Postfix) with ESMTP id A5A27156402
	for <discuss@mdocml.bsd.lv>; Wed, 16 Mar 2011 11:47:19 +0100 (CET)
Message-ID: <4D809537.7090201@bsd.lv>
Date: Wed, 16 Mar 2011 11:47:19 +0100
From: Kristaps Dzonsons <kristaps@bsd.lv>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.14) Gecko/20110221 Thunderbird/3.1.8
X-Mailinglist: mdocml-discuss
Reply-To: discuss@mdocml.bsd.lv
MIME-Version: 1.0
To: discuss@mdocml.bsd.lv
Subject: [PATCH] Being crazy: -Tindex
Content-Type: multipart/mixed;
 boundary="------------000702040403090407020005"

This is a multi-part message in MIME format.
--------------000702040403090407020005
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

I'm accustomed to being able to search for documentation quickly and am 
not satisfied with apropos, whatis, and man -K.

I propose a new output mode, -Tindex.  It would crawl through the NAME
and SYNOPSIS sections of a manual and index function names, variable
types, header files (being able to find every manual that mentions a
particular header file would be golden), utility names, and so on.  It
writes these to a Berkeley database file.  Each record is keyed by the
keyword, and its value is a bit-field (the type of record) followed by
the corresponding file name.

Other utilities can then mine this data...

Enclosed is a quick sketch.  It dumps manual names (Nm in NAME), utility
names (Nm in SYNOPSIS), and function names (Fn and Fo in SYNOPSIS) into
the index.  It only handles -mdoc, but for -man we could heuristically
extract at least the name by matching ^[[:alpha:]]+ in the NAME section.
It requires some changes to mdoc.h and mdoc.c to associate a file name
with a parse.

This is just a sketch; a "real" version would need to be [at least] much 
more careful about stripping non-character escapes and so on.

I'm still not sure whether it's a good idea to have this /in/ mandoc.
The BSD db.h is not standard across Unices.  I've started cleaning up
main.c to push the main file-reading routines into a utility module,
which would let other front-ends use the entire backend library.  Lots
of things to think about.

Thoughts?

Kristaps

--------------000702040403090407020005
Content-Type: text/plain;
 name="patch.index.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="patch.index.txt"

Index: Makefile
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/Makefile,v
retrieving revision 1.312
diff -u -r1.312 Makefile
--- Makefile	24 Feb 2011 14:30:15 -0000	1.312
+++ Makefile	15 Mar 2011 22:35:46 -0000
@@ -66,15 +66,15 @@
 MAINLNS	   = main.ln mdoc_term.ln chars.ln term.ln tree.ln \
 	     compat.ln man_term.ln html.ln mdoc_html.ln \
 	     man_html.ln out.ln term_ps.ln term_ascii.ln \
-	     tbl_term.ln tbl_html.ln
+	     tbl_term.ln tbl_html.ln index.ln
 
 MAINOBJS   = main.o mdoc_term.o chars.o term.o tree.o compat.o \
 	     man_term.o html.o mdoc_html.o man_html.o out.o \
-	     term_ps.o term_ascii.o tbl_term.o tbl_html.o
+	     term_ps.o term_ascii.o tbl_term.o tbl_html.o index.o
 
 MAINSRCS   = main.c mdoc_term.c chars.c term.c tree.c compat.c \
 	     man_term.c html.c mdoc_html.c man_html.c out.c \
-	     term_ps.c term_ascii.c tbl_term.c tbl_html.c
+	     term_ps.c term_ascii.c tbl_term.c tbl_html.c index.c
 
 LLNS	   = llib-llibmdoc.ln llib-llibman.ln llib-lmandoc.ln \
 	     llib-llibmandoc.ln llib-llibroff.ln
Index: index.c
===================================================================
RCS file: index.c
diff -N index.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ index.c	15 Mar 2011 22:35:46 -0000
@@ -0,0 +1,382 @@
+/*	$Id$ */
+/*
+ * Copyright (c) 2011 Kristaps Dzonsons <kristaps@bsd.lv>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+
+#include <sys/types.h>
+
+#include <assert.h>
+#include <db.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "mandoc.h"
+#include "mdoc.h"
+#include "man.h"
+#include "main.h"
+
+enum	indexflags {
+	DBNAME		= 0x01, /* manual name(s) (NAME, Nm) */
+	DBUTIL		= 0x02, /* utility (SYNOPSIS, Nm) */
+	DBFUNC		= 0x04, /* function (SYNOPSIS, Fn, Fo) */
+};
+
+struct	index {
+	DB		*db; /* btree database */
+	char		*curbuf; /* buffer of flag,filename */
+	size_t	 	 curbufsz; /* size of curbuf */
+	const char	*file; /* database file */
+};
+
+typedef	void		(*mdoc_indexh)(struct index *, const struct mdoc_node *);
+static	void	 	 scan_mdoc(struct index *, const struct mdoc_node *);
+static	void	 	 scan_fn(struct index *, const struct mdoc_node *);
+static	void	 	 scan_fo(struct index *, const struct mdoc_node *);
+static	void	 	 scan_nm(struct index *, const struct mdoc_node *);
+static	void	 	 scan_put(struct index *, char *, int);
+
+static	mdoc_indexh mdocs[MDOC_MAX] = {
+	NULL, /* Ap */
+	NULL, /* Dd */
+	NULL, /* Dt */
+	NULL, /* Os */
+	NULL, /* Sh */
+	NULL, /* Ss */ 
+	NULL, /* Pp */ 
+	NULL, /* D1 */
+	NULL, /* Dl */
+	NULL, /* Bd */
+	NULL, /* Ed */
+	NULL, /* Bl */
+	NULL, /* El */
+	NULL, /* It */
+	NULL, /* Ad */ 
+	NULL, /* An */
+	NULL, /* Ar */
+	NULL, /* Cd */
+	NULL, /* Cm */
+	NULL, /* Dv */ 
+	NULL, /* Er */ 
+	NULL, /* Ev */ 
+	NULL, /* Ex */
+	NULL, /* Fa */ 
+	NULL, /* Fd */ 
+	NULL, /* Fl */
+	scan_fn, /* Fn */ 
+	NULL, /* Ft */ 
+	NULL, /* Ic */ 
+	NULL, /* In */ 
+	NULL, /* Li */
+	NULL, /* Nd */ 
+	scan_nm, /* Nm */ 
+	NULL, /* Op */
+	NULL, /* Ot */
+	NULL, /* Pa */
+	NULL, /* Rv */
+	NULL, /* St */ 
+	NULL, /* Va */
+	NULL, /* Vt */ 
+	NULL, /* Xr */
+	NULL, /* %A */
+	NULL, /* %B */
+	NULL, /* %D */
+	NULL, /* %I */
+	NULL, /* %J */
+	NULL, /* %N */
+	NULL, /* %O */
+	NULL, /* %P */
+	NULL, /* %R */
+	NULL, /* %T */
+	NULL, /* %V */
+	NULL, /* Ac */
+	NULL, /* Ao */
+	NULL, /* Aq */
+	NULL, /* At */
+	NULL, /* Bc */
+	NULL, /* Bf */ 
+	NULL, /* Bo */
+	NULL, /* Bq */
+	NULL, /* Bsx */
+	NULL, /* Bx */
+	NULL, /* Db */
+	NULL, /* Dc */
+	NULL, /* Do */
+	NULL, /* Dq */
+	NULL, /* Ec */
+	NULL, /* Ef */
+	NULL, /* Em */ 
+	NULL, /* Eo */
+	NULL, /* Fx */
+	NULL, /* Ms */
+	NULL, /* No */
+	NULL, /* Ns */
+	NULL, /* Nx */
+	NULL, /* Ox */
+	NULL, /* Pc */
+	NULL, /* Pf */
+	NULL, /* Po */
+	NULL, /* Pq */
+	NULL, /* Qc */
+	NULL, /* Ql */
+	NULL, /* Qo */
+	NULL, /* Qq */
+	NULL, /* Re */
+	NULL, /* Rs */
+	NULL, /* Sc */
+	NULL, /* So */
+	NULL, /* Sq */
+	NULL, /* Sm */ 
+	NULL, /* Sx */
+	NULL, /* Sy */
+	NULL, /* Tn */
+	NULL, /* Ux */
+	NULL, /* Xc */
+	NULL, /* Xo */
+	scan_fo, /* Fo */ 
+	NULL, /* Fc */ 
+	NULL, /* Oo */
+	NULL, /* Oc */
+	NULL, /* Bk */
+	NULL, /* Ek */
+	NULL, /* Bt */
+	NULL, /* Hf */
+	NULL, /* Fr */
+	NULL, /* Ud */
+	NULL, /* Lb */
+	NULL, /* Lp */ 
+	NULL, /* Lk */ 
+	NULL, /* Mt */ 
+	NULL, /* Brq */ 
+	NULL, /* Bro */ 
+	NULL, /* Brc */ 
+	NULL, /* %C */ 
+	NULL, /* Es */ 
+	NULL, /* En */
+	NULL, /* Dx */ 
+	NULL, /* %Q */ 
+	NULL, /* br */
+	NULL, /* sp */ 
+	NULL, /* %U */ 
+	NULL, /* Ta */ 
+};
+
+/* ARGSUSED */
+void
+index_man(void *arg, const struct man *m)
+{
+
+	/* Do nothing. */
+}
+
+/* ARGSUSED */
+void *
+index_alloc(char *arg)
+{
+	struct index	*db;
+	BTREEINFO	 info;
+
+	db = calloc(1, sizeof(struct index));
+	if (NULL == db) {
+		perror(NULL);
+		exit((int)MANDOCLEVEL_SYSERR);
+	}
+
+	memset(&info, 0, sizeof(BTREEINFO));
+	info.flags = R_DUP;
+
+	db->file = "man.btree";
+	db->db = dbopen(db->file, O_CREAT | O_RDWR, 0644, DB_BTREE, &info);
+
+	if (NULL == db->db) {
+		perror(db->file);
+		exit((int)MANDOCLEVEL_SYSERR);
+	}
+
+	return(db);
+}
+
+void
+index_free(void *arg)
+{
+	struct index	*db;
+
+	db = (struct index *)arg;
+
+	if (-1 == (*db->db->close)(db->db))
+		perror(db->file);
+	if (db->curbuf)
+		free(db->curbuf);
+
+	free(db);
+}
+
+/* ARGSUSED */
+void
+index_mdoc(void *arg, const struct mdoc *m)
+{
+	struct index	*db;
+	const char	*file;
+	size_t		 filesz;
+
+	db = (struct index *)arg;
+
+	if (NULL == (file = mdoc_meta(m)->file))
+		return;
+
+	/*
+	 * Create enough storage space for the record type and the
+	 * file-name of the record.
+	 */
+
+	filesz = strlen(file);
+
+	db->curbufsz = sizeof(int) + filesz + 1; /* Flag, name, nil. */
+	db->curbuf = realloc(db->curbuf, db->curbufsz);
+
+	if (NULL == db->curbuf) {
+		perror(NULL);
+		exit((int)MANDOCLEVEL_SYSERR);
+	}
+
+	strlcpy(db->curbuf + sizeof(int), file, db->curbufsz - sizeof(int));
+	scan_mdoc(db, mdoc_node(m));
+}
+
+static void
+scan_put(struct index *db, char *buf, int flag)
+{
+	DBT		 key, val;
+
+	if ('\0' == buf[0])
+		return;
+
+	key.data = buf;
+	key.size = strlen(buf);
+
+	/*
+	 * The value of the record is the entry type (host-byte integer
+	 * bit-field) followed by the nil-terminated filename.
+	 */
+
+	memcpy(db->curbuf, &flag, sizeof(int));
+
+	val.data = db->curbuf;
+	val.size = db->curbufsz;
+
+	if (-1 == (*db->db->put)(db->db, &key, &val, 0)) {
+		perror(db->file);
+		exit((int)MANDOCLEVEL_SYSERR);
+	}
+}
+
+/*
+ * Accept function names (`Fn') in the SYNOPSIS section.  `Fn' has
+ * strange syntax: the name may carry a type prefix ("int *foo").
+ */
+static void
+scan_fn(struct index *db, const struct mdoc_node *n)
+{
+	char		*cp;
+
+	if (SEC_SYNOPSIS != n->sec || MDOC_ELEM != n->type)
+		return;
+	if (NULL == (n = n->child) || MDOC_TEXT != n->type)
+		return;
+
+	/* Skip any type prefix, then any pointer stars. */
+	cp = strrchr(n->string, ' ');
+	for (cp = cp ? cp + 1 : n->string; '*' == *cp; cp++)
+		/* Do nothing. */ ;
+	scan_put(db, cp, DBFUNC);
+}
+
+/*
+ * Accept function names (`Fo') in the SYNOPSIS section.  `Fo' opens
+ * a block, so the name is the first child of the block's head.
+ */
+static void
+scan_fo(struct index *db, const struct mdoc_node *n)
+{
+
+	if (SEC_SYNOPSIS != n->sec)
+		return;
+	if (MDOC_HEAD != n->type)
+		return;
+
+	/* Only the first argument is the name; ignore the rest. */
+	n = n->child;
+	if (NULL == n || MDOC_TEXT != n->type)
+		return;
+	scan_put(db, n->string, DBFUNC);
+}
+
+static void
+scan_nm(struct index *db, const struct mdoc_node *n)
+{
+	int		 flag;
+	char		 buf[BUFSIZ];
+
+	if (SEC_NAME != n->sec && SEC_SYNOPSIS != n->sec)
+		return;
+	if (MDOC_ELEM != n->type && MDOC_HEAD != n->type)
+		return;
+
+	flag = SEC_NAME == n->sec ? DBNAME : DBUTIL;
+
+	for (buf[0] = '\0', n = n->child; n; n = n->next) {
+		if (MDOC_TEXT != n->type)
+			continue;
+		strlcat(buf, n->string, BUFSIZ);
+		if (n->next)
+			strlcat(buf, " ", BUFSIZ);
+	}
+
+	scan_put(db, buf, flag);
+}
+
+static void
+scan_mdoc(struct index *db, const struct mdoc_node *n)
+{
+
+	switch (n->type) {
+	case (MDOC_ELEM):
+		/* FALLTHROUGH */
+	case (MDOC_BLOCK):
+		/* FALLTHROUGH */
+	case (MDOC_TAIL):
+		/* FALLTHROUGH */
+	case (MDOC_BODY):
+		/* FALLTHROUGH */
+	case (MDOC_HEAD):
+		if (NULL == mdocs[(int)n->tok])
+			break;
+		(*mdocs[(int)n->tok])(db, n);
+		break;
+	default:
+		break;
+	}
+
+	if (n->child)
+		scan_mdoc(db, n->child);
+	if (n->next)
+		scan_mdoc(db, n->next);
+}
+
Index: main.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/main.c,v
retrieving revision 1.150
diff -u -r1.150 main.c
--- main.c	15 Mar 2011 16:23:51 -0000	1.150
+++ main.c	15 Mar 2011 22:35:46 -0000
@@ -73,7 +73,8 @@
 	OUTT_XHTML,
 	OUTT_LINT,
 	OUTT_PS,
-	OUTT_PDF
+	OUTT_PDF,
+	OUTT_INDEX
 };
 
 struct	curparse {
@@ -560,6 +561,10 @@
 			curp->outdata = ascii_alloc(curp->outopts);
 			curp->outfree = ascii_free;
 			break;
+		case (OUTT_INDEX):
+			curp->outdata = index_alloc(curp->outopts);
+			curp->outfree = index_free;
+			break;
 		case (OUTT_PDF):
 			curp->outdata = pdf_alloc(curp->outopts);
 			curp->outfree = pspdf_free;
@@ -584,6 +589,10 @@
 			curp->outman = tree_man;
 			curp->outmdoc = tree_mdoc;
 			break;
+		case (OUTT_INDEX):
+			curp->outman = index_man;
+			curp->outmdoc = index_mdoc;
+			break;
 		case (OUTT_PDF):
 			/* FALLTHROUGH */
 		case (OUTT_ASCII):
@@ -900,6 +909,9 @@
 pset(const char *buf, int pos, struct curparse *curp)
 {
 	int		 i;
+	const char	*file;
+
+	file = STDIN_FILENO == curp->fd ? NULL : curp->file;
 
 	/*
 	 * Try to intuit which kind of manual parser should be used.  If
@@ -927,6 +939,7 @@
 				(&curp->regs, curp, mmsg);
 		assert(curp->pmdoc);
 		curp->mdoc = curp->pmdoc;
+		mdoc_startparse(curp->mdoc, file);
 		return;
 	case (INTT_MAN):
 		if (NULL == curp->pman) 
@@ -945,6 +958,7 @@
 				(&curp->regs, curp, mmsg);
 		assert(curp->pmdoc);
 		curp->mdoc = curp->pmdoc;
+		mdoc_startparse(curp->mdoc, file);
 		return;
 	} 
 
@@ -992,6 +1006,8 @@
 		curp->outtype = OUTT_PS;
 	else if (0 == strcmp(arg, "pdf"))
 		curp->outtype = OUTT_PDF;
+	else if (0 == strcmp(arg, "index"))
+		curp->outtype = OUTT_INDEX;
 	else {
 		fprintf(stderr, "%s: Bad argument\n", arg);
 		return(0);
Index: main.h
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/main.h,v
retrieving revision 1.10
diff -u -r1.10 main.h
--- main.h	31 Jul 2010 23:52:58 -0000	1.10
+++ main.h	15 Mar 2011 22:35:46 -0000
@@ -41,6 +41,11 @@
 void		  tree_mdoc(void *, const struct mdoc *);
 void		  tree_man(void *, const struct man *);
 
+void		 *index_alloc(char *);
+void		  index_free(void *);
+void		  index_mdoc(void *, const struct mdoc *);
+void		  index_man(void *, const struct man *);
+
 void		 *ascii_alloc(char *);
 void		  ascii_free(void *);
 
Index: mdoc.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mdoc.c,v
retrieving revision 1.183
diff -u -r1.183 mdoc.c
--- mdoc.c	15 Mar 2011 13:23:33 -0000	1.183
+++ mdoc.c	15 Mar 2011 22:35:46 -0000
@@ -138,6 +138,8 @@
 		free(mdoc->meta.vol);
 	if (mdoc->meta.msec)
 		free(mdoc->meta.msec);
+	if (mdoc->meta.file)
+		free(mdoc->meta.file);
 	if (mdoc->meta.date)
 		free(mdoc->meta.date);
 }
@@ -207,6 +209,15 @@
 	return(p);
 }
 
+void
+mdoc_startparse(struct mdoc *m, const char *file)
+{
+
+	assert(NULL == m->meta.file);
+	if (NULL == file)
+		return;
+	m->meta.file = mandoc_strdup(file);
+}
 
 /*
  * Climb back up the parse tree, validating open scopes.  Mostly calls
Index: mdoc.h
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mdoc.h,v
retrieving revision 1.118
diff -u -r1.118 mdoc.h
--- mdoc.h	7 Mar 2011 01:35:51 -0000	1.118
+++ mdoc.h	15 Mar 2011 22:35:46 -0000
@@ -229,6 +229,7 @@
  * Information from prologue. 
  */
 struct	mdoc_meta {
+	char		 *file; /* filename (NULL if stdin) */
 	char		 *msec; /* `Dt' section (1, 3p, etc.) */
 	char		 *vol; /* `Dt' volume (implied) */
 	char		 *arch; /* `Dt' arch (i386, etc.) */
@@ -430,6 +431,7 @@
 const struct mdoc_node *mdoc_node(const struct mdoc *);
 const struct mdoc_meta *mdoc_meta(const struct mdoc *);
 int		  mdoc_endparse(struct mdoc *);
+void		  mdoc_startparse(struct mdoc *, const char *);
 int		  mdoc_addspan(struct mdoc *,
 			const struct tbl_span *);
 int		  mdoc_addeqn(struct mdoc *,

--------------000702040403090407020005--