From: Kristaps Dzonsons
Date: Wed, 16 Mar 2011 11:47:19 +0100
To: discuss@mdocml.bsd.lv
Reply-To: discuss@mdocml.bsd.lv
Subject: [PATCH] Being crazy: -Tindex
Message-ID: <4D809537.7090201@bsd.lv>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.14) Gecko/20110221 Thunderbird/3.1.8
X-Mailinglist: mdocml-discuss
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------------000702040403090407020005"

This is a multi-part message in MIME format.
--------------000702040403090407020005
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

I'm accustomed to being able to search for documentation quickly and am
not satisfied with apropos, whatis, and man -K.

I propose a new output mode, -Tindex.  This will crawl through the NAME
and SYNOPSIS sections of a manual, indexing function names, variable
types, header files (being able to search for all manuals that mention a
particular header file would be golden), utility names, etc.  It writes
these to a Berkeley database file.
Each record consists of the keyword, then a bit-field (the type of
record), then the corresponding file.  Other utilities can then mine
this data...

Enclosed is a quick sketch.  It dumps manual names (Nm in NAME), utility
names (Nm in SYNOPSIS), and function names (Fo, Fn in SYNOPSIS) into the
index.  It only handles -mdoc; for -man, it could heuristically grab at
least the name by matching ^[[:alpha:]]+ in the NAME section.  It
requires some modifications to mdoc.h to associate a filename with a
parse.

This is just a sketch; a "real" version would need to be (at least) much
more careful about stripping non-character escapes and so on.

I'm still not sure whether it's a good idea to have this /in/ mandoc.
The BSD db.h is not standard across Unices.  I've started cleaning up
main.c to push the main file-reading routines into a separate utility
module, which would let other front-ends use the entire backend library.
Lots of things to think about.

Thoughts?

Kristaps

--------------000702040403090407020005
Content-Type: text/plain;
 name="patch.index.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="patch.index.txt"

Index: Makefile
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/Makefile,v
retrieving revision 1.312
diff -u -r1.312 Makefile
--- Makefile	24 Feb 2011 14:30:15 -0000	1.312
+++ Makefile	15 Mar 2011 22:35:46 -0000
@@ -66,15 +66,15 @@
 MAINLNS = main.ln mdoc_term.ln chars.ln term.ln tree.ln \
 	  compat.ln man_term.ln html.ln mdoc_html.ln \
 	  man_html.ln out.ln term_ps.ln term_ascii.ln \
-	  tbl_term.ln tbl_html.ln
+	  tbl_term.ln tbl_html.ln index.ln

 MAINOBJS = main.o mdoc_term.o chars.o term.o tree.o compat.o \
 	  man_term.o html.o mdoc_html.o man_html.o out.o \
-	  term_ps.o term_ascii.o tbl_term.o tbl_html.o
+	  term_ps.o term_ascii.o tbl_term.o tbl_html.o index.o

 MAINSRCS = main.c mdoc_term.c chars.c term.c tree.c compat.c \
 	  man_term.c html.c mdoc_html.c man_html.c out.c \
-	  term_ps.c term_ascii.c tbl_term.c tbl_html.c
+	  term_ps.c term_ascii.c tbl_term.c tbl_html.c index.c

 LLNS = llib-llibmdoc.ln llib-llibman.ln llib-lmandoc.ln \
 	  llib-llibmandoc.ln llib-llibroff.ln
Index: index.c
===================================================================
RCS file: index.c
diff -N index.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ index.c	15 Mar 2011 22:35:46 -0000
@@ -0,0 +1,382 @@
+/*	$Id: tree.c,v 1.36 2011/02/09 09:18:15 kristaps Exp $ */
+/*
+ * Copyright (c) 2011 Kristaps Dzonsons
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "mandoc.h"
+#include "mdoc.h"
+#include "man.h"
+#include "main.h"
+
+enum	indexflags {
+	DBNAME = 0x01, /* manual name(s) (NAME, Nm) */
+	DBUTIL = 0x02, /* utility (SYNOPSIS, Nm) */
+	DBFUNC = 0x04, /* function (SYNOPSIS, Fn, Fo) */
+};
+
+struct	index {
+	DB		*db; /* btree database */
+	char		*curbuf; /* buffer of flag,filename */
+	size_t		 curbufsz; /* size of curbuf */
+	const char	*file; /* database file */
+};
+
+typedef	void	(*mdoc_indexh)(struct index *, const struct mdoc_node *);
+static	void	  scan_mdoc(struct index *, const struct mdoc_node *);
+static	void	  scan_fn(struct index *, const struct mdoc_node *);
+static	void	  scan_fo(struct index *, const struct mdoc_node *);
+static	void	  scan_nm(struct index *, const struct mdoc_node *);
+static	void	  scan_put(struct index *, char *, int);
+
+static	mdoc_indexh mdocs[MDOC_MAX] = {
+	NULL, /* Ap */
+	NULL, /* Dd */
+	NULL, /* Dt */
+	NULL, /* Os */
+	NULL, /* Sh */
+	NULL, /* Ss */
+	NULL, /* Pp */
+	NULL, /* D1 */
+	NULL, /* Dl */
+	NULL, /* Bd */
+	NULL, /* Ed */
+	NULL, /* Bl */
+	NULL, /* El */
+	NULL, /* It */
+	NULL, /* Ad */
+	NULL, /* An */
+	NULL, /* Ar */
+	NULL, /* Cd */
+	NULL, /* Cm */
+	NULL, /* Dv */
+	NULL, /* Er */
+	NULL, /* Ev */
+	NULL, /* Ex */
+	NULL, /* Fa */
+	NULL, /* Fd */
+	NULL, /* Fl */
+	scan_fn, /* Fn */
+	NULL, /* Ft */
+	NULL, /* Ic */
+	NULL, /* In */
+	NULL, /* Li */
+	NULL, /* Nd */
+	scan_nm, /* Nm */
+	NULL, /* Op */
+	NULL, /* Ot */
+	NULL, /* Pa */
+	NULL, /* Rv */
+	NULL, /* St */
+	NULL, /* Va */
+	NULL, /* Vt */
+	NULL, /* Xr */
+	NULL, /* %A */
+	NULL, /* %B */
+	NULL, /* %D */
+	NULL, /* %I */
+	NULL, /* %J */
+	NULL, /* %N */
+	NULL, /* %O */
+	NULL, /* %P */
+	NULL, /* %R */
+	NULL, /* %T */
+	NULL, /* %V */
+	NULL, /* Ac */
+	NULL, /* Ao */
+	NULL, /* Aq */
+	NULL, /* At */
+	NULL, /* Bc */
+	NULL, /* Bf */
+	NULL, /* Bo */
+	NULL, /* Bq */
+	NULL, /* Bsx */
+	NULL, /* Bx */
+	NULL, /* Db */
+	NULL, /* Dc */
+	NULL, /* Do */
+	NULL, /* Dq */
+	NULL, /* Ec */
+	NULL, /* Ef */
+	NULL, /* Em */
+	NULL, /* Eo */
+	NULL, /* Fx */
+	NULL, /* Ms */
+	NULL, /* No */
+	NULL, /* Ns */
+	NULL, /* Nx */
+	NULL, /* Ox */
+	NULL, /* Pc */
+	NULL, /* Pf */
+	NULL, /* Po */
+	NULL, /* Pq */
+	NULL, /* Qc */
+	NULL, /* Ql */
+	NULL, /* Qo */
+	NULL, /* Qq */
+	NULL, /* Re */
+	NULL, /* Rs */
+	NULL, /* Sc */
+	NULL, /* So */
+	NULL, /* Sq */
+	NULL, /* Sm */
+	NULL, /* Sx */
+	NULL, /* Sy */
+	NULL, /* Tn */
+	NULL, /* Ux */
+	NULL, /* Xc */
+	NULL, /* Xo */
+	scan_fo, /* Fo */
+	NULL, /* Fc */
+	NULL, /* Oo */
+	NULL, /* Oc */
+	NULL, /* Bk */
+	NULL, /* Ek */
+	NULL, /* Bt */
+	NULL, /* Hf */
+	NULL, /* Fr */
+	NULL, /* Ud */
+	NULL, /* Lb */
+	NULL, /* Lp */
+	NULL, /* Lk */
+	NULL, /* Mt */
+	NULL, /* Brq */
+	NULL, /* Bro */
+	NULL, /* Brc */
+	NULL, /* %C */
+	NULL, /* Es */
+	NULL, /* En */
+	NULL, /* Dx */
+	NULL, /* %Q */
+	NULL, /* br */
+	NULL, /* sp */
+	NULL, /* %U */
+	NULL, /* Ta */
+};
+
+/* ARGSUSED */
+void
+index_man(void *arg, const struct man *m)
+{
+
+	/* Do nothing. */
+}
+
+void *
+index_alloc(char *arg)
+{
+	struct index	*db;
+	const char	*file;
+	BTREEINFO	 info;
+
+	db = calloc(1, sizeof(struct index));
+	if (NULL == db) {
+		perror(NULL);
+		exit((int)MANDOCLEVEL_SYSERR);
+	}
+
+	memset(&info, 0, sizeof(BTREEINFO));
+	info.flags = R_DUP;
+
+	db->file = "man.btree";
+	db->db = dbopen(db->file, O_CREAT | O_RDWR, 0644, DB_BTREE, &info);
+
+	if (NULL == db->db) {
+		perror(db->file);
+		exit((int)MANDOCLEVEL_SYSERR);
+	}
+
+	return(db);
+}
+
+void
+index_free(void *arg)
+{
+	struct index	*db;
+
+	db = (struct index *)arg;
+
+	if (-1 == (*db->db->close)(db->db))
+		perror(db->file);
+	if (db->curbuf)
+		free(db->curbuf);
+
+	free(db);
+}
+
+/* ARGSUSED */
+void
+index_mdoc(void *arg, const struct mdoc *m)
+{
+	struct index	*db;
+	const char	*file;
+	size_t		 filesz;
+
+	db = (struct index *)arg;
+
+	if (NULL == (file = mdoc_meta(m)->file))
+		return;
+
+	/*
+	 * Create enough storage space for the record type and the
+	 * file-name of the record.
+	 */
+
+	filesz = strlen(file);
+
+	db->curbufsz = filesz + 5; /* Four bytes (int) and nil. */
+	db->curbuf = realloc(db->curbuf, db->curbufsz);
+
+	if (NULL == db->curbuf) {
+		perror(NULL);
+		exit((int)MANDOCLEVEL_SYSERR);
+	}
+
+	strlcpy(db->curbuf + 4, file, db->curbufsz - 4);
+	scan_mdoc(db, mdoc_node(m));
+}
+
+static void
+scan_put(struct index *db, char *buf, int flag)
+{
+	DBT		 key, val;
+
+	if ('\0' == buf[0])
+		return;
+
+	key.data = buf;
+	key.size = strlen(buf);
+
+	/*
+	 * The value of the record is the entry type (host-byte integer
+	 * bit-field) followed by the nil-terminated filename.
+	 */
+
+	memcpy(db->curbuf, &flag, 4);
+
+	val.data = db->curbuf;
+	val.size = db->curbufsz;
+
+	if (-1 == (*db->db->put)(db->db, &key, &val, 0)) {
+		perror(db->file);
+		exit((int)MANDOCLEVEL_SYSERR);
+	}
+}
+
+/*
+ * Accept function names (`Fn') in the SYNOPSIS section.  `Fn' has
+ * strange syntax, so make sure we get the actual name and not the type.
+ */
+static void
+scan_fn(struct index *db, const struct mdoc_node *n)
+{
+
+	if (SEC_SYNOPSIS != n->sec)
+		return;
+	if (MDOC_ELEM != n->type)
+		return;
+
+	if (n->nchild > 1)
+		n = n->child->next;
+	else
+		n = n->child;
+
+	scan_put(db, n->string, DBFUNC);
+}
+
+static void
+scan_fo(struct index *db, const struct mdoc_node *n)
+{
+	char		 buf[BUFSIZ];
+
+	if (SEC_SYNOPSIS != n->sec)
+		return;
+	if (MDOC_HEAD != n->type)
+		return;
+
+	for (buf[0] = '\0', n = n->child; n; n = n->next) {
+		assert(MDOC_TEXT == n->type);
+		strlcat(buf, n->string, BUFSIZ);
+		if (n->next)
+			strlcat(buf, " ", BUFSIZ);
+	}
+
+	scan_put(db, buf, DBFUNC);
+}
+
+static void
+scan_nm(struct index *db, const struct mdoc_node *n)
+{
+	int		 flag;
+	char		 buf[BUFSIZ];
+
+	if (SEC_NAME != n->sec && SEC_SYNOPSIS != n->sec)
+		return;
+	if (MDOC_ELEM != n->type && MDOC_HEAD != n->type)
+		return;
+
+	flag = SEC_NAME == n->sec ? DBNAME : DBUTIL;
+
+	for (buf[0] = '\0', n = n->child; n; n = n->next) {
+		if (MDOC_TEXT != n->type)
+			continue;
+		strlcat(buf, n->string, BUFSIZ);
+		if (n->next)
+			strlcat(buf, " ", BUFSIZ);
+	}
+
+	scan_put(db, buf, flag);
+}
+
+static void
+scan_mdoc(struct index *db, const struct mdoc_node *n)
+{
+
+	switch (n->type) {
+	case (MDOC_ELEM):
+		/* FALLTHROUGH */
+	case (MDOC_BLOCK):
+		/* FALLTHROUGH */
+	case (MDOC_TAIL):
+		/* FALLTHROUGH */
+	case (MDOC_BODY):
+		/* FALLTHROUGH */
+	case (MDOC_HEAD):
+		if (NULL == mdocs[(int)n->tok])
+			break;
+		(*mdocs[(int)n->tok])(db, n);
+		break;
+	default:
+		break;
+	}
+
+	if (n->child)
+		scan_mdoc(db, n->child);
+	if (n->next)
+		scan_mdoc(db, n->next);
+}
+
Index: main.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/main.c,v
retrieving revision 1.150
diff -u -r1.150 main.c
--- main.c	15 Mar 2011 16:23:51 -0000	1.150
+++ main.c	15 Mar 2011 22:35:46 -0000
@@ -73,7 +73,8 @@
 	OUTT_XHTML,
 	OUTT_LINT,
 	OUTT_PS,
-	OUTT_PDF
+	OUTT_PDF,
+	OUTT_INDEX
 };

 struct	curparse {
@@ -560,6 +561,10 @@
 		curp->outdata = ascii_alloc(curp->outopts);
 		curp->outfree = ascii_free;
 		break;
+	case (OUTT_INDEX):
+		curp->outdata = index_alloc(curp->outopts);
+		curp->outfree = index_free;
+		break;
 	case (OUTT_PDF):
 		curp->outdata = pdf_alloc(curp->outopts);
 		curp->outfree = pspdf_free;
@@ -584,6 +589,10 @@
 		curp->outman = tree_man;
 		curp->outmdoc = tree_mdoc;
 		break;
+	case (OUTT_INDEX):
+		curp->outman = index_man;
+		curp->outmdoc = index_mdoc;
+		break;
 	case (OUTT_PDF):
 		/* FALLTHROUGH */
 	case (OUTT_ASCII):
@@ -900,6 +909,9 @@
 pset(const char *buf, int pos, struct curparse *curp)
 {
 	int		 i;
+	const char	*file;
+
+	file = STDIN_FILENO == curp->fd ? NULL : curp->file;

 	/*
 	 * Try to intuit which kind of manual parser should be used.  If
@@ -927,6 +939,7 @@
 			(&curp->regs, curp, mmsg);
 		assert(curp->pmdoc);
 		curp->mdoc = curp->pmdoc;
+		mdoc_startparse(curp->mdoc, file);
 		return;
 	case (INTT_MAN):
 		if (NULL == curp->pman)
@@ -945,6 +958,7 @@
 			(&curp->regs, curp, mmsg);
 		assert(curp->pmdoc);
 		curp->mdoc = curp->pmdoc;
+		mdoc_startparse(curp->mdoc, file);
 		return;
 	}

@@ -992,6 +1006,8 @@
 		curp->outtype = OUTT_PS;
 	else if (0 == strcmp(arg, "pdf"))
 		curp->outtype = OUTT_PDF;
+	else if (0 == strcmp(arg, "index"))
+		curp->outtype = OUTT_INDEX;
 	else {
 		fprintf(stderr, "%s: Bad argument\n", arg);
 		return(0);
Index: main.h
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/main.h,v
retrieving revision 1.10
diff -u -r1.10 main.h
--- main.h	31 Jul 2010 23:52:58 -0000	1.10
+++ main.h	15 Mar 2011 22:35:46 -0000
@@ -41,6 +41,11 @@
 void		  tree_mdoc(void *, const struct mdoc *);
 void		  tree_man(void *, const struct man *);

+void		 *index_alloc(char *);
+void		  index_free(void *);
+void		  index_mdoc(void *, const struct mdoc *);
+void		  index_man(void *, const struct man *);
+
 void		 *ascii_alloc(char *);
 void		  ascii_free(void *);

Index: mdoc.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mdoc.c,v
retrieving revision 1.183
diff -u -r1.183 mdoc.c
--- mdoc.c	15 Mar 2011 13:23:33 -0000	1.183
+++ mdoc.c	15 Mar 2011 22:35:46 -0000
@@ -138,6 +138,8 @@
 		free(mdoc->meta.vol);
 	if (mdoc->meta.msec)
 		free(mdoc->meta.msec);
+	if (mdoc->meta.file)
+		free(mdoc->meta.file);
 	if (mdoc->meta.date)
 		free(mdoc->meta.date);
 }
@@ -207,6 +209,15 @@
 	return(p);
 }

+void
+mdoc_startparse(struct mdoc *m, const char *file)
+{
+
+	assert(NULL == m->meta.file);
+	if (NULL == file)
+		return;
+	m->meta.file = mandoc_strdup(file);
+}

 /*
  * Climb back up the parse tree, validating open scopes.  Mostly calls
Index: mdoc.h
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mdoc.h,v
retrieving revision 1.118
diff -u -r1.118 mdoc.h
--- mdoc.h	7 Mar 2011 01:35:51 -0000	1.118
+++ mdoc.h	15 Mar 2011 22:35:46 -0000
@@ -229,6 +229,7 @@
  * Information from prologue.
  */
 struct	mdoc_meta {
+	char		 *file; /* filename (NULL if stdin) */
 	char		 *msec; /* `Dt' section (1, 3p, etc.) */
 	char		 *vol; /* `Dt' volume (implied) */
 	char		 *arch; /* `Dt' arch (i386, etc.) */
@@ -430,6 +431,7 @@
 const struct mdoc_node *mdoc_node(const struct mdoc *);
 const struct mdoc_meta *mdoc_meta(const struct mdoc *);
 int		  mdoc_endparse(struct mdoc *);
+void		  mdoc_startparse(struct mdoc *, const char *);
 int		  mdoc_addspan(struct mdoc *,
 			const struct tbl_span *);
 int		  mdoc_addeqn(struct mdoc *,

--------------000702040403090407020005--

-- 
To unsubscribe send an email to discuss+unsubscribe@mdocml.bsd.lv