tech@mandoc.bsd.lv
* mandocdb tools, sqlite3, and ohash
@ 2012-06-07 15:15 Kristaps Dzonsons
From: Kristaps Dzonsons @ 2012-06-07 15:15 UTC
  To: tech; +Cc: Marc Espie

[-- Attachment #1: Type: text/plain, Size: 1461 bytes --]

Hi,

I've now kicked out uthash in favour of OpenBSD's native ohash.  (I've 
included espie@ in this just in case he has any ohashy wisdom.)  This 
requires that I add a compat_ohash.c file, but this is trivial.

The difference is as small as expected.  In fact, pretty much all that 
needed tweaking was making struct of's file pointer an array (since 
ohash requires that keys be within the structures--something that needs 
stating twice because I forgot) and using a dummy array declaration for 
struct str for the same reason:

   struct  str {
         ...stuff here...
         char key[1];
   };

...with allocation in the regular way (the struct size plus the string 
length).  Is this known to play nicely with all compilers?  I can't 
think of a better way without modifying the ohash routines to work with 
key pointers.
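
A minimal sketch of the allocation described above, for illustration only.
The refs member is a hypothetical placeholder for the elided fields, and
str_alloc is a name chosen here, not a function from the patch:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for struct str: "refs" replaces the elided
 * members; only the trailing key array comes from the mail.
 */
struct str {
	size_t	 refs;
	char	 key[1];	/* first byte of the inline key */
};

/*
 * Allocate one block holding the struct and a copy of the key.
 * sizeof(struct str) already counts key[0], which pays for the
 * terminating NUL, so only strlen(key) extra bytes are requested.
 * Returns NULL if allocation fails.
 */
struct str *
str_alloc(const char *key)
{
	size_t		 len;
	struct str	*s;

	len = strlen(key);
	if (NULL == (s = calloc(1, sizeof(struct str) + len)))
		return(NULL);
	memcpy(s->key, key, len + 1);
	return(s);
}
```

C99 also defines a flexible array member (char key[];) for this layout;
with that declaration sizeof(struct str) no longer counts any key bytes,
so the request would be sizeof(struct str) + len + 1 instead.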

The biggest bottleneck right now is the database.  Try running mandocdb 
with -n to see the magnitude of the difference.  Ugh!  The -n flag is 
checked directly before the database work ("nodb" in the source), so 
nearly all of the extra time is sqlite3.  I don't do any fancy sqlitey 
stuff, so there's room for improvement.

Note that Marc's hash-growth is less greedy than uthash's, so I can run 
it as a normal user without running out of memory.  Also, I initially 
thought I'd have to switch from K&R's hash (ohash's default), but it 
does a pretty good job, especially considering the size of the sqlite3 
bottleneck.
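
For readers unfamiliar with it, the hash usually called K&R's is the one
from section 6.6 of The C Programming Language, second edition.  The
sketch below is that textbook form, not code taken from ohash (whose own
function may differ in detail); kr_hash and nbuckets are names chosen
here for illustration:

```c
/*
 * Textbook string hash from K&R (2nd ed., section 6.6):
 * multiply the running value by 31 and add the next character.
 * nbuckets is a caller-chosen table size and must be non-zero.
 */
unsigned int
kr_hash(const char *s, unsigned int nbuckets)
{
	unsigned int	 h;

	for (h = 0; '\0' != *s; s++)
		h = (unsigned char)*s + 31 * h;
	return(h % nbuckets);
}
```

It does one multiply and one add per byte, which is why it is unlikely
to matter next to the database cost described above.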

Best,

Kristaps

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 154161 bytes --]

Index: Makefile
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/Makefile,v
retrieving revision 1.395
diff -u -p -r1.395 Makefile
--- Makefile	24 Mar 2012 01:54:43 -0000	1.395
+++ Makefile	7 Jun 2012 15:09:33 -0000
@@ -24,17 +24,16 @@ VDATE		 = 23 March 2012
 CFLAGS	 	+= -DUSE_WCHAR
 
 # If your system has manpath(1), uncomment this.  This is most any
-# system that's not OpenBSD or NetBSD.  If uncommented, apropos(1),
-# mandocdb(8), and man.cgi will popen(3) manpath(1) to get the MANPATH
-# variable.
+# system that's not OpenBSD or NetBSD.  If uncommented, manpage(1) and
+# mandocdb(8) will use manpath(1) to get the MANPATH variable.
 #CFLAGS		+= -DUSE_MANPATH
 
 # If your system supports static binaries only, uncomment this.  This
 # appears only to be BSD UNIX systems (Mac OS X has no support and Linux
 # requires -pthreads for static libdb).
-STATIC		 = -static
+#STATIC		 = -static
 
-CFLAGS		+= -g -DHAVE_CONFIG_H -DVERSION="\"$(VERSION)\""
+CFLAGS		+= -I/usr/local/include -g -DHAVE_CONFIG_H -DVERSION="\"$(VERSION)\""
 CFLAGS     	+= -W -Wall -Wstrict-prototypes -Wno-unused-parameter -Wwrite-strings
 PREFIX		 = /usr/local
 WWWPREFIX	 = /var/www
@@ -52,29 +51,17 @@ INSTALL_LIB	 = $(INSTALL) -m 0644
 INSTALL_SOURCE	 = $(INSTALL) -m 0644
 INSTALL_MAN	 = $(INSTALL_DATA)
 
-# Non-BSD systems (Linux, etc.) need -ldb to compile mandocdb and
-# apropos.
-# However, if you don't have -ldb at all (or it's not native), then
-# comment out apropos and mandocdb. 
-#
-#DBLIB		 = -ldb
-DBBIN		 = apropos mandocdb man.cgi catman whatis
-DBLN		 = llib-lapropos.ln llib-lmandocdb.ln llib-lman.cgi.ln llib-lcatman.ln
+DBLIB		 = -L/usr/local/lib -lsqlite3
+DBBIN		 = mandocdb manpage apropos
 
 all: mandoc preconv demandoc $(DBBIN)
 
 SRCS		 = Makefile \
 		   TODO \
-		   apropos.1 \
-		   apropos.c \
-		   apropos_db.c \
-		   apropos_db.h \
 		   arch.c \
 		   arch.in \
 		   att.c \
 		   att.in \
-		   catman.8 \
-		   catman.c \
 		   cgi.c \
 		   chars.c \
 		   chars.in \
@@ -106,7 +93,6 @@ SRCS		 = Makefile \
 		   main.h \
 		   man.7 \
 		   man.c \
-		   man.cgi.7 \
 		   man-cgi.css \
 		   man.h \
 		   man_hash.c \
@@ -166,17 +152,12 @@ SRCS		 = Makefile \
 		   test-strptime.c \
 		   tree.c \
 		   vol.c \
-		   vol.in \
-		   whatis.1
+		   vol.in
 
 LIBMAN_OBJS	 = man.o \
 		   man_hash.o \
 		   man_macro.o \
 		   man_validate.o
-LIBMAN_LNS	 = man.ln \
-		   man_hash.ln \
-		   man_macro.ln \
-		   man_validate.ln
 
 LIBMDOC_OBJS	 = arch.o \
 		   att.o \
@@ -188,16 +169,6 @@ LIBMDOC_OBJS	 = arch.o \
 		   mdoc_validate.o \
 		   st.o \
 		   vol.o
-LIBMDOC_LNS	 = arch.ln \
-		   att.ln \
-		   lib.ln \
-		   mdoc.ln \
-		   mdoc_argv.ln \
-		   mdoc_hash.ln \
-		   mdoc_macro.ln \
-		   mdoc_validate.ln \
-		   st.ln \
-		   vol.ln
 
 LIBROFF_OBJS	 = eqn.o \
 		   roff.o \
@@ -205,12 +176,6 @@ LIBROFF_OBJS	 = eqn.o \
 		   tbl_data.o \
 		   tbl_layout.o \
 		   tbl_opts.o
-LIBROFF_LNS	 = eqn.ln \
-		   roff.ln \
-		   tbl.ln \
-		   tbl_data.ln \
-		   tbl_layout.ln \
-		   tbl_opts.ln
 
 LIBMANDOC_OBJS	 = $(LIBMAN_OBJS) \
 		   $(LIBMDOC_OBJS) \
@@ -219,52 +184,35 @@ LIBMANDOC_OBJS	 = $(LIBMAN_OBJS) \
 		   mandoc.o \
 		   msec.o \
 		   read.o
-LIBMANDOC_LNS	 = $(LIBMAN_LNS) \
-		   $(LIBMDOC_LNS) \
-		   $(LIBROFF_LNS) \
-		   chars.ln \
-		   mandoc.ln \
-		   msec.ln \
-		   read.ln
 
 COMPAT_OBJS	 = compat_fgetln.o \
 		   compat_getsubopt.o \
 		   compat_strlcat.o \
 		   compat_strlcpy.o
-COMPAT_LNS	 = compat_fgetln.ln \
-		   compat_getsubopt.ln \
-		   compat_strlcat.ln \
-		   compat_strlcpy.ln
-
-arch.o arch.ln: arch.in
-att.o att.ln: att.in
-chars.o chars.ln: chars.in
-lib.o lib.ln: lib.in
-msec.o msec.ln: msec.in
-roff.o roff.ln: predefs.in
-st.o st.ln: st.in
-vol.o vol.ln: vol.in
-
-$(LIBMAN_OBJS) $(LIBMAN_LNS): libman.h
-$(LIBMDOC_OBJS) $(LIBMDOC_LNS): libmdoc.h
-$(LIBROFF_OBJS) $(LIBROFF_LNS): libroff.h
-$(LIBMANDOC_OBJS) $(LIBMANDOC_LNS): mandoc.h mdoc.h man.h libmandoc.h config.h
 
-$(COMPAT_OBJS) $(COMPAT_LNS): config.h
+arch.o: arch.in
+att.o: att.in
+chars.o: chars.in
+lib.o: lib.in
+msec.o: msec.in
+roff.o: predefs.in
+st.o: st.in
+vol.o: vol.in
+
+$(LIBMAN_OBJS): libman.h
+$(LIBMDOC_OBJS): libmdoc.h
+$(LIBROFF_OBJS): libroff.h
+$(LIBMANDOC_OBJS): mandoc.h mdoc.h man.h libmandoc.h config.h
+$(COMPAT_OBJS): config.h
 
 MANDOC_HTML_OBJS = eqn_html.o \
 		   html.o \
 		   man_html.o \
 		   mdoc_html.o \
 		   tbl_html.o
-MANDOC_HTML_LNS	 = eqn_html.ln \
-		   html.ln \
-		   man_html.ln \
-		   mdoc_html.ln \
-		   tbl_html.ln
+$(MANDOC_HTML_OBJS): html.h
 
 MANDOC_MAN_OBJS  = mdoc_man.o
-MANDOC_MAN_LNS   = mdoc_man.ln
 
 MANDOC_TERM_OBJS = eqn_term.o \
 		   man_term.o \
@@ -273,13 +221,7 @@ MANDOC_TERM_OBJS = eqn_term.o \
 		   term_ascii.o \
 		   term_ps.o \
 		   tbl_term.o
-MANDOC_TERM_LNS	 = eqn_term.ln \
-		   man_term.ln \
-		   mdoc_term.ln \
-		   term.ln \
-		   term_ascii.ln \
-		   term_ps.ln \
-		   tbl_term.ln
+$(MANDOC_TERM_OBJS): term.h
 
 MANDOC_OBJS	 = $(MANDOC_HTML_OBJS) \
 		   $(MANDOC_MAN_OBJS) \
@@ -287,73 +229,24 @@ MANDOC_OBJS	 = $(MANDOC_HTML_OBJS) \
 		   main.o \
 		   out.o \
 		   tree.o
-MANDOC_LNS	 = $(MANDOC_HTML_LNS) \
-		   $(MANDOC_MAN_LNS) \
-		   $(MANDOC_TERM_LNS) \
-		   main.ln \
-		   out.ln \
-		   tree.ln
-
-$(MANDOC_HTML_OBJS) $(MANDOC_HTML_LNS): html.h
-$(MANDOC_TERM_OBJS) $(MANDOC_TERM_LNS): term.h
-$(MANDOC_OBJS) $(MANDOC_LNS): main.h mandoc.h mdoc.h man.h config.h out.h
+$(MANDOC_OBJS): main.h mandoc.h mdoc.h man.h config.h out.h
 
 MANDOCDB_OBJS	 = mandocdb.o manpath.o
-MANDOCDB_LNS	 = mandocdb.ln manpath.ln
-
-$(MANDOCDB_OBJS) $(MANDOCDB_LNS): mandocdb.h mandoc.h mdoc.h man.h config.h manpath.h
+$(MANDOCDB_OBJS): mandocdb.h mandoc.h mdoc.h man.h config.h manpath.h
 
 PRECONV_OBJS	 = preconv.o
-PRECONV_LNS	 = preconv.ln
-
-$(PRECONV_OBJS) $(PRECONV_LNS): config.h
-
-APROPOS_OBJS	 = apropos.o apropos_db.o manpath.o
-APROPOS_LNS	 = apropos.ln apropos_db.ln manpath.ln
-
-$(APROPOS_OBJS) $(APROPOS_LNS): config.h mandoc.h apropos_db.h manpath.h mandocdb.h
-
-CGI_OBJS	 = $(MANDOC_HTML_OBJS) \
-		   $(MANDOC_MAN_OBJS) \
-		   $(MANDOC_TERM_OBJS) \
-		   cgi.o \
-		   apropos_db.o \
-		   manpath.o \
-		   out.o \
-		   tree.o
+$(PRECONV_OBJS): config.h
 
-CGI_LNS	 	 = $(MANDOC_HTML_LNS) \
-		   $(MANDOC_MAN_LNS) \
-		   $(MANDOC_TERM_LNS) \
-		   cgi.ln \
-		   apropos_db.ln \
-		   manpath.ln \
-		   out.ln \
-		   tree.ln
+APROPOS_OBJS	 = apropos.o mansearch.o manpath.o
+$(APROPOS_OBJS): config.h manpath.h mandocdb.h mansearch.h
 
-$(CGI_OBJS) $(CGI_LNS): main.h mdoc.h man.h out.h config.h mandoc.h apropos_db.h manpath.h mandocdb.h
-
-CATMAN_OBJS	 = catman.o manpath.o
-CATMAN_LNS 	 = catman.ln manpath.ln
-
-$(CATMAN_OBJS) $(CATMAN_LNS): config.h mandoc.h manpath.h mandocdb.h
+MANPAGE_OBJS	 = manpage.o mansearch.o manpath.o
+$(MANPAGE_OBJS): config.h manpath.h mandocdb.h mansearch.h
 
 DEMANDOC_OBJS	 = demandoc.o
-DEMANDOC_LNS	 = demandoc.ln
-
-$(DEMANDOC_OBJS) $(DEMANDOC_LNS): config.h
+$(DEMANDOC_OBJS): config.h
 
-INDEX_MANS	 = apropos.1.html \
-		   apropos.1.xhtml \
-		   apropos.1.ps \
-		   apropos.1.pdf \
-		   apropos.1.txt \
-		   catman.8.html \
-		   catman.8.xhtml \
-		   catman.8.ps \
-		   catman.8.pdf \
-		   catman.8.txt \
-		   demandoc.1.html \
+INDEX_MANS	 = demandoc.1.html \
 		   demandoc.1.xhtml \
 		   demandoc.1.ps \
 		   demandoc.1.pdf \
@@ -363,11 +256,6 @@ INDEX_MANS	 = apropos.1.html \
 		   mandoc.1.ps \
 		   mandoc.1.pdf \
 		   mandoc.1.txt \
-		   whatis.1.html \
-		   whatis.1.xhtml \
-		   whatis.1.ps \
-		   whatis.1.pdf \
-		   whatis.1.txt \
 		   mandoc.3.html \
 		   mandoc.3.xhtml \
 		   mandoc.3.ps \
@@ -383,11 +271,6 @@ INDEX_MANS	 = apropos.1.html \
 		   man.7.ps \
 		   man.7.pdf \
 		   man.7.txt \
-		   man.cgi.7.html \
-		   man.cgi.7.xhtml \
-		   man.cgi.7.ps \
-		   man.cgi.7.pdf \
-		   man.cgi.7.txt \
 		   mandoc_char.7.html \
 		   mandoc_char.7.xhtml \
 		   mandoc_char.7.ps \
@@ -430,38 +313,18 @@ INDEX_OBJS	 = $(INDEX_MANS) \
 
 www: index.html
 
-lint: llib-lmandoc.ln llib-lpreconv.ln llib-ldemandoc.ln $(DBLN)
-
 clean:
 	rm -f libmandoc.a $(LIBMANDOC_OBJS)
-	rm -f llib-llibmandoc.ln $(LIBMANDOC_LNS)
+	rm -f apropos $(APROPOS_OBJS)
 	rm -f mandocdb $(MANDOCDB_OBJS)
-	rm -f llib-lmandocdb.ln $(MANDOCDB_LNS)
 	rm -f preconv $(PRECONV_OBJS)
-	rm -f llib-lpreconv.ln $(PRECONV_LNS)
-	rm -f apropos whatis $(APROPOS_OBJS)
-	rm -f llib-lapropos.ln $(APROPOS_LNS)
-	rm -f man.cgi $(CGI_OBJS)
-	rm -f llib-lman.cgi.ln $(CGI_LNS)
-	rm -f catman $(CATMAN_OBJS)
-	rm -f llib-lcatman.ln $(CATMAN_LNS)
+	rm -f manpage $(MANPAGE_OBJS)
 	rm -f demandoc $(DEMANDOC_OBJS)
-	rm -f llib-ldemandoc.ln $(DEMANDOC_LNS)
 	rm -f mandoc $(MANDOC_OBJS)
-	rm -f llib-lmandoc.ln $(MANDOC_LNS)
-	rm -f config.h config.log $(COMPAT_OBJS) $(COMPAT_LNS)
-	rm -f mdocml.tar.gz mdocml-win32.zip mdocml-win64.zip mdocml-macosx.zip
+	rm -f config.h config.log $(COMPAT_OBJS)
+	rm -f mdocml.tar.gz
 	rm -f index.html $(INDEX_OBJS)
-	rm -rf test-fgetln.dSYM
-	rm -rf test-strlcpy.dSYM
-	rm -rf test-strlcat.dSYM 
-	rm -rf test-strptime.dSYM 
-	rm -rf test-mmap.dSYM 
-	rm -rf test-getsubopt.dSYM
-	rm -rf apropos.dSYM
-	rm -rf catman.dSYM
-	rm -rf mandocdb.dSYM
-	rm -rf whatis.dSYM
+	rm -rf *.dSYM
 
 install: all
 	mkdir -p $(DESTDIR)$(BINDIR)
@@ -482,7 +345,7 @@ install: all
 installcgi: all
 	mkdir -p $(DESTDIR)$(CGIBINDIR)
 	mkdir -p $(DESTDIR)$(HTDOCDIR)
-	$(INSTALL_PROGRAM) man.cgi $(DESTDIR)$(CGIBINDIR)
+	#$(INSTALL_PROGRAM) man.cgi $(DESTDIR)$(CGIBINDIR)
 	$(INSTALL_DATA) example.style.css $(DESTDIR)$(HTDOCDIR)/man.css
 	$(INSTALL_DATA) man-cgi.css $(DESTDIR)$(HTDOCDIR)
 
@@ -500,54 +363,24 @@ installwww: www
 libmandoc.a: $(COMPAT_OBJS) $(LIBMANDOC_OBJS)
 	$(AR) rs $@ $(COMPAT_OBJS) $(LIBMANDOC_OBJS)
 
-llib-llibmandoc.ln: $(COMPAT_LNS) $(LIBMANDOC_LNS)
-	$(LINT) $(LINTFLAGS) -Clibmandoc $(COMPAT_LNS) $(LIBMANDOC_LNS)
-
 mandoc: $(MANDOC_OBJS) libmandoc.a
 	$(CC) $(LDFLAGS) -o $@ $(MANDOC_OBJS) libmandoc.a
 
-llib-lmandoc.ln: $(MANDOC_LNS) llib-llibmandoc.ln
-	$(LINT) $(LINTFLAGS) -Cmandoc $(MANDOC_LNS) llib-llibmandoc.ln
-
 mandocdb: $(MANDOCDB_OBJS) libmandoc.a
 	$(CC) $(LDFLAGS) -o $@ $(MANDOCDB_OBJS) libmandoc.a $(DBLIB)
 
-llib-lmandocdb.ln: $(MANDOCDB_LNS) llib-llibmandoc.ln
-	$(LINT) $(LINTFLAGS) -Cmandocdb $(MANDOCDB_LNS) llib-llibmandoc.ln
-
 preconv: $(PRECONV_OBJS)
 	$(CC) $(LDFLAGS) -o $@ $(PRECONV_OBJS)
 
-llib-lpreconv.ln: $(PRECONV_LNS) llib-llibmandoc.ln
-	$(LINT) $(LINTFLAGS) -Cpreconv $(PRECONV_LNS) llib-llibmandoc.ln
-
-whatis: apropos
-	cp -f apropos whatis
+manpage: $(MANPAGE_OBJS) libmandoc.a
+	$(CC) $(LDFLAGS) -o $@ $(MANPAGE_OBJS) libmandoc.a $(DBLIB)
 
 apropos: $(APROPOS_OBJS) libmandoc.a
 	$(CC) $(LDFLAGS) -o $@ $(APROPOS_OBJS) libmandoc.a $(DBLIB)
 
-llib-lapropos.ln: $(APROPOS_LNS) llib-llibmandoc.ln
-	$(LINT) $(LINTFLAGS) -Capropos $(APROPOS_LNS) llib-llibmandoc.ln
-
-catman: $(CATMAN_OBJS) libmandoc.a
-	$(CC) $(LDFLAGS) -o $@ $(CATMAN_OBJS) libmandoc.a $(DBLIB)
-
-llib-lcatman.ln: $(CATMAN_LNS) llib-llibmandoc.ln
-	$(LINT) $(LINTFLAGS) -Ccatman $(CATMAN_LNS) llib-llibmandoc.ln
-
-man.cgi: $(CGI_OBJS) libmandoc.a
-	$(CC) $(LDFLAGS) $(STATIC) -o $@ $(CGI_OBJS) libmandoc.a $(DBLIB)
-
-llib-lman.cgi.ln: $(CGI_LNS) llib-llibmandoc.ln
-	$(LINT) $(LINTFLAGS) -Cman.cgi $(CGI_LNS) llib-llibmandoc.ln
-
 demandoc: $(DEMANDOC_OBJS) libmandoc.a
 	$(CC) $(LDFLAGS) -o $@ $(DEMANDOC_OBJS) libmandoc.a
 
-llib-ldemandoc.ln: $(DEMANDOC_LNS) llib-llibmandoc.ln
-	$(LINT) $(LINTFLAGS) -Cdemandoc $(DEMANDOC_LNS) llib-llibmandoc.ln
-
 mdocml.md5: mdocml.tar.gz
 	md5 mdocml.tar.gz >$@
 
@@ -556,37 +389,6 @@ mdocml.tar.gz: $(SRCS)
 	$(INSTALL_SOURCE) $(SRCS) .dist/mdocml-$(VERSION)
 	( cd .dist/ && tar zcf ../$@ ./ )
 	rm -rf .dist/
-
-mdocml-win32.zip: $(SRCS)
-	mkdir -p .win32/mdocml-$(VERSION)/
-	$(INSTALL_SOURCE) $(SRCS) .win32
-	cp .win32/Makefile .win32/Makefile.old
-	egrep -v -e DUSE_WCHAR -e ^DBBIN .win32/Makefile.old >.win32/Makefile
-	( cd .win32; \
-		CC=i686-w64-mingw32-gcc AR=i686-w64-mingw32-ar CFLAGS='-DOSNAME=\"Windows\"' make; \
-		make install PREFIX=mdocml-$(VERSION) ; \
-		zip -r ../$@ mdocml-$(VERSION) )
-	rm -rf .win32
-
-mdocml-win64.zip: $(SRCS)
-	mkdir -p .win64/mdocml-$(VERSION)/
-	$(INSTALL_SOURCE) $(SRCS) .win64
-	cp .win64/Makefile .win64/Makefile.old
-	egrep -v -e DUSE_WCHAR -e ^DBBIN .win64/Makefile.old >.win64/Makefile
-	( cd .win64; \
-		CC=x86_64-w64-mingw32-gcc AR=x86_64-w64-mingw32-ar CFLAGS='-DOSNAME=\"Windows\"' make; \
-		make install PREFIX=mdocml-$(VERSION) ; \
-		zip -r ../$@ mdocml-$(VERSION) )
-	rm -rf .win64
-
-mdocml-macosx.zip: $(SRCS)
-	mkdir -p .macosx/mdocml-$(VERSION)/
-	$(INSTALL_SOURCE) $(SRCS) .macosx
-	( cd .macosx; \
-		CFLAGS="-arch i386 -arch x86_64 -arch ppc" LDFLAGS="-arch i386 -arch x86_64 -arch ppc" make; \
-		make install PREFIX=mdocml-$(VERSION) ; \
-		zip -r ../$@ mdocml-$(VERSION) )
-	rm -rf .macosx
 
 index.html: $(INDEX_OBJS)
 
Index: apropos.1
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/apropos.1,v
retrieving revision 1.17
diff -u -p -r1.17 apropos.1
--- apropos.1	24 Mar 2012 01:46:25 -0000	1.17
+++ apropos.1	7 Jun 2012 15:09:33 -0000
@@ -42,13 +42,11 @@ By default,
 searches for
 .Xr mandocdb 8
 databases in the default paths stipulated by
-.Xr man 1 ,
-parses terms as case-sensitive regular expressions
+.Xr man 1
+and
+parses terms as case-sensitive words
 over manual names and descriptions.
-Multiple terms imply pairwise
-.Fl o .
-If standard output is a TTY, a result may be selected from a list and
-its manual displayed with the pager.
+Multiple terms are OR'd.
 .Pp
 Its arguments are as follows:
 .Bl -tag -width Ds
@@ -81,41 +79,8 @@ for a listing of manual sections.
 .Pp
 An
 .Ar expression
-consists of search terms joined by logical operators
-.Fl a
-.Pq and
-and
-.Fl o
-.Pq or .
-The
-.Fl a
-operator has precedence over
-.Fl o
-and both are evaluated left-to-right.
-.Bl -tag -width Ds
-.It \&( Ar expr No \&)
-True if the subexpression
-.Ar expr
-is true.
-.It Ar expr1 Fl a Ar expr2
-True if both
-.Ar expr1
-and
-.Ar expr2
-are true (logical
-.Qq and ) .
-.It Ar expr1 Oo Fl o Oc Ar expr2
-True if
-.Ar expr1
-and/or
-.Ar expr2
-evaluate to true (logical
-.Qq or ) .
-.It Ar term
-True if
-.Ar term
-is satisfied.
-This has syntax
+consists of type and keyword pairs.
+This pair syntax
 .Li [key[,key]*(=~)]?val ,
 where operand
 .Cm key
@@ -129,22 +94,15 @@ See
 for a list of available keys.
 Operator
 .Li \&=
-evaluates a substring, while
+evaluates a full string, while
 .Li \&~
-evaluates a regular expression.
-.It Fl i Ar term
-If
-.Ar term
-is a regular expression, it
-is evaluated case-insensitively.
-Has no effect on substring terms.
-.El
+evaluates a
+.Xr glob 7
+pattern.
 .Pp
 Results are sorted by manual title, with output formatted as
-.Pp
-.D1 title(sec) \- description
-.Pp
-Where
+.Qq title(sec) \- description
+where
 .Qq title
 is the manual's title (note multiple manual names may exist for one
 title),
@@ -153,24 +111,7 @@ is the manual section, and
 .Qq description
 is the manual's short description.
 If an architecture is specified for the manual, it is displayed as
-.Pp
-.D1 title(cat/arch) \- description
-.Pp
-If on a TTY, results are prefixed with a numeric identifier.
-.Pp
-.D1 [index] title(cat) \- description
-.Pp
-One may choose a manual be entering the index at the prompt.
-Valid choices are displayed using
-.Ev MANPAGER ,
-or failing that ,
-.Ev PAGER
-or just
-.Xr more 1 .
-Source pages are formatted with
-.Xr mandoc 1 ;
-preformatted pages with
-.Xr cat 1 .
+.Qq title(cat/arch) \- description .
 .Ss Macro Keys
 Queries evaluate over a subset of
 .Xr mdoc 7
@@ -248,14 +189,6 @@ Text production:
 .El
 .Sh ENVIRONMENT
 .Bl -tag -width Ds
-.It Ev MANPAGER
-Default pager for manuals.
-If this is unset, falls back to
-.Ev Pager .
-.It Ev PAGER
-The second choice for a manual pager.
-If this is unset, use
-.Xr more 1 .
 .It Ev MANPATH
 Colon-separated paths modifying the default list of paths searched for
 manual databases.
@@ -294,31 +227,30 @@ configuration file
 .Sh EXAMPLES
 Search for
 .Qq mdoc
-as a substring and regular expression
-within each manual name and description:
+as a word or
+.Xr glob 7
+expression:
 .Pp
 .Dl $ apropos mdoc
-.Dl $ apropos ~^mdoc$
+.Dl $ apropos any~mdoc*
 .Pp
 Include matches for
 .Qq roff
 and
 .Qq man
-for the regular expression case:
+using
+.Xr glob 7
+expressions:
 .Pp
-.Dl $ apropos ~^mdoc$ roff man
-.Dl $ apropos ~^mdoc$ \-o roff \-o man
+.Dl $ apropos ~*mdoc* ~*roff*
 .Pp
 Search for
-.Qq optind
-and
 .Qq optarg
-as variable names in the library category:
+as a variable name in the library category:
 .Pp
-.Dl $ apropos \-s 3 Va~^optind \-a Va~^optarg$
+.Dl $ apropos \-s 3 Va=optarg
 .Sh SEE ALSO
-.Xr more 1
-.Xr re_format 7 ,
+.Xr glob 7 ,
 .Xr mandocdb 8
 .Sh AUTHORS
 The
Index: apropos.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/apropos.c,v
retrieving revision 1.30
diff -u -p -r1.30 apropos.c
--- apropos.c	24 Mar 2012 02:18:51 -0000	1.30
+++ apropos.c	7 Jun 2012 15:09:33 -0000
@@ -1,7 +1,6 @@
-/*	$Id: apropos.c,v 1.30 2012/03/24 02:18:51 kristaps Exp $ */
+/*	$Id: mandocdb.c,v 1.46 2012/03/23 06:52:17 kristaps Exp $ */
 /*
- * Copyright (c) 2011, 2012 Kristaps Dzonsons <kristaps@bsd.lv>
- * Copyright (c) 2011 Ingo Schwarze <schwarze@openbsd.org>
+ * Copyright (c) 2012 Kristaps Dzonsons <kristaps@bsd.lv>
  *
  * Permission to use, copy, modify, and distribute this software for any
  * purpose with or without fee is hereby granted, provided that the above
@@ -27,38 +26,21 @@
 #include <string.h>
 #include <unistd.h>
 
-#include "apropos_db.h"
-#include "mandoc.h"
 #include "manpath.h"
-
-#define	SINGLETON(_res, _sz) \
-	((_sz) && (_res)[0].matched && \
-	 (1 == (_sz) || 0 == (_res)[1].matched))
-#define	EMPTYSET(_res, _sz) \
-	((0 == (_sz)) || 0 == (_res)[0].matched)
-
-static	int	 cmp(const void *, const void *);
-static	void	 list(struct res *, size_t, void *);
-static	void	 usage(void);
-
-static	char	*progname;
+#include "mansearch.h"
 
 int
 main(int argc, char *argv[])
 {
-	int		 ch, rc, whatis, usecat;
-	struct res	*res;
+	int		 ch;
+	size_t		 i, sz;
+	struct manpage	*res;
+	char		*conf_file, *defpaths, *auxpaths,
+			*arch, *sec;
 	struct manpaths	 paths;
-	const char	*prog;
-	pid_t		 pid;
-	char		 path[PATH_MAX];
-	int		 fds[2];
-	size_t		 terms, ressz, sz;
-	struct opts	 opts;
-	struct expr	*e;
-	char		*defpaths, *auxpaths, *conf_file, *cp;
-	extern int	 optind;
+	char		*progname;
 	extern char	*optarg;
+	extern int	 optind;
 
 	progname = strrchr(argv[0], '/');
 	if (progname == NULL)
@@ -66,18 +48,8 @@ main(int argc, char *argv[])
 	else
 		++progname;
 
-	whatis = 0 == strncmp(progname, "whatis", 6);
-
+	auxpaths = defpaths = conf_file = arch = sec = NULL;
 	memset(&paths, 0, sizeof(struct manpaths));
-	memset(&opts, 0, sizeof(struct opts));
-
-	usecat = 0;
-	ressz = 0;
-	res = NULL;
-	auxpaths = defpaths = NULL;
-	conf_file = NULL;
-	e = NULL;
-	path[0] = '\0';
 
 	while (-1 != (ch = getopt(argc, argv, "C:M:m:S:s:")))
 		switch (ch) {
@@ -91,149 +63,42 @@ main(int argc, char *argv[])
 			auxpaths = optarg;
 			break;
 		case ('S'):
-			opts.arch = optarg;
+			arch = optarg;
 			break;
 		case ('s'):
-			opts.cat = optarg;
+			sec = optarg;
 			break;
 		default:
-			usage();
-			return(EXIT_FAILURE);
+			goto usage;
 		}
 
 	argc -= optind;
 	argv += optind;
 
-	if (0 == argc) 
-		return(EXIT_SUCCESS);
-
-	rc = 0;
+	if (0 == argc)
+		goto usage;
 
 	manpath_parse(&paths, conf_file, defpaths, auxpaths);
-
-	e = whatis ? termcomp(argc, argv, &terms) :
-		     exprcomp(argc, argv, &terms);
-		
-	if (NULL == e) {
-		fprintf(stderr, "%s: Bad expression\n", progname);
-		goto out;
-	}
-
-	rc = apropos_search
-		(paths.sz, paths.paths, &opts, 
-		 e, terms, NULL, &ressz, &res, list);
-
-	terms = 1;
-
-	if (0 == rc) {
-		fprintf(stderr, "%s: Bad database\n", progname);
-		goto out;
-	} else if ( ! isatty(STDOUT_FILENO) || EMPTYSET(res, ressz))
-		goto out;
-
-	if ( ! SINGLETON(res, ressz)) {
-		printf("Which manpage would you like [1]? ");
-		fflush(stdout);
-		if (NULL != (cp = fgetln(stdin, &sz)) && 
-				sz > 1 && '\n' == cp[--sz]) {
-			if ((ch = atoi(cp)) <= 0)
-				goto out;
-			terms = (size_t)ch;
-		}
-	}
-
-	if (--terms < ressz && res[terms].matched) {
-		chdir(paths.paths[res[terms].volume]);
-		strlcpy(path, res[terms].file, PATH_MAX);
-		usecat = RESTYPE_CAT == res[terms].type;
-	}
-out:
+	ch = mansearch(&paths, arch, sec, argc, argv, &res, &sz);
 	manpath_free(&paths);
-	resfree(res, ressz);
-	exprfree(e);
 
-	if ('\0' == path[0])
-		return(rc ? EXIT_SUCCESS : EXIT_FAILURE);
+	if (0 == ch)
+		goto usage;
 
-	if (-1 == pipe(fds)) {
-		perror(NULL);
-		exit(EXIT_FAILURE);
+	for (i = 0; i < sz; i++) {
+		printf("%s - %s\n", res[i].file, res[i].desc);
+		free(res[i].desc);
 	}
 
-	if (-1 == (pid = fork())) {
-		perror(NULL);
-		exit(EXIT_FAILURE);
-	} else if (pid > 0) {
-		dup2(fds[0], STDIN_FILENO);
-		close(fds[1]);
-		prog = NULL != getenv("MANPAGER") ? 
-			getenv("MANPAGER") :
-			(NULL != getenv("PAGER") ? 
-			 getenv("PAGER") : "more");
-		execlp(prog, prog, (char *)NULL);
-		perror(prog);
-		return(EXIT_FAILURE);
-	}
-
-	dup2(fds[1], STDOUT_FILENO);
-	close(fds[0]);
-	prog = usecat ? "cat" : "mandoc";
-	execlp(prog, prog, path, (char *)NULL);
-	perror(prog);
+	free(res);
+	return(sz ? EXIT_SUCCESS : EXIT_FAILURE);
+usage:
+	fprintf(stderr, "usage: %s [-C conf] "
+			 	  "[-M paths] "
+				  "[-m paths] "
+				  "[-S arch] "
+				  "[-s section] "
+			          "expr ...\n", 
+				  progname);
 	return(EXIT_FAILURE);
-}
-
-/* ARGSUSED */
-static void
-list(struct res *res, size_t sz, void *arg)
-{
-	size_t		 i;
-
-	qsort(res, sz, sizeof(struct res), cmp);
-
-	if (EMPTYSET(res, sz) || SINGLETON(res, sz))
-		return;
-
-	if ( ! isatty(STDOUT_FILENO))
-		for (i = 0; i < sz && res[i].matched; i++)
-			printf("%s(%s%s%s) - %.70s\n", 
-					res[i].title, res[i].cat,
-					*res[i].arch ? "/" : "",
-					*res[i].arch ? res[i].arch : "",
-					res[i].desc);
-	else
-		for (i = 0; i < sz && res[i].matched; i++)
-			printf("[%zu] %s(%s%s%s) - %.70s\n", i + 1,
-					res[i].title, res[i].cat,
-					*res[i].arch ? "/" : "",
-					*res[i].arch ? res[i].arch : "",
-					res[i].desc);
-}
-
-static int
-cmp(const void *p1, const void *p2)
-{
-	const struct res *r1 = p1;
-	const struct res *r2 = p2;
-
-	if (0 == r1->matched)
-		return(1);
-	else if (0 == r2->matched)
-		return(1);
-
-	return(strcasecmp(r1->title, r2->title));
-}
-
-static void
-usage(void)
-{
-
-	fprintf(stderr, "usage: %s "
-			"[-C file] "
-			"[-M manpath] "
-			"[-m manpath] "
-			"[-S arch] "
-			"[-s section] "
-			"expression ...\n",
-			progname);
 }
Index: apropos_db.c
===================================================================
RCS file: apropos_db.c
diff -N apropos_db.c
--- apropos_db.c	25 Mar 2012 00:48:47 -0000	1.32
+++ /dev/null	1 Jan 1970 00:00:00 -0000
@@ -1,879 +0,0 @@
-/*	$Id: apropos_db.c,v 1.32 2012/03/25 00:48:47 kristaps Exp $ */
-/*
- * Copyright (c) 2011, 2012 Kristaps Dzonsons <kristaps@bsd.lv>
- * Copyright (c) 2011 Ingo Schwarze <schwarze@openbsd.org>
- *
- * Permission to use, copy, modify, and distribute this software for any
- * purpose with or without fee is hereby granted, provided that the above
- * copyright notice and this permission notice appear in all copies.
- *
- * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
- * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
- * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
- * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
- * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
- * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
- * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
- */
-#ifdef HAVE_CONFIG_H
-#include "config.h"
-#endif
-
-#include <sys/param.h>
-
-#include <assert.h>
-#include <fcntl.h>
-#include <regex.h>
-#include <stdarg.h>
-#include <stdint.h>
-#include <stdlib.h>
-#include <string.h>
-#include <unistd.h>
-
-#if defined(__linux__)
-# include <endian.h>
-# include <db_185.h>
-#elif defined(__APPLE__)
-# include <libkern/OSByteOrder.h>
-# include <db.h>
-#else
-# include <db.h>
-#endif
-
-#include "mandocdb.h"
-#include "apropos_db.h"
-#include "mandoc.h"
-
-#define	RESFREE(_x) \
-	do { \
-		free((_x)->file); \
-		free((_x)->cat); \
-		free((_x)->title); \
-		free((_x)->arch); \
-		free((_x)->desc); \
-		free((_x)->matches); \
-	} while (/*CONSTCOND*/0)
-
-struct	expr {
-	int		 regex; /* is regex? */
-	int		 index; /* index in match array */
-	uint64_t 	 mask; /* type-mask */
-	int		 and; /* is rhs of logical AND? */
-	char		*v; /* search value */
-	regex_t	 	 re; /* compiled re, if regex */
-	struct expr	*next; /* next in sequence */
-	struct expr	*subexpr;
-};
-
-struct	type {
-	uint64_t	 mask;
-	const char	*name;
-};
-
-struct	rectree {
-	struct res	*node; /* record array for dir tree */
-	int		 len; /* length of record array */
-};
-
-static	const struct type types[] = {
-	{ TYPE_An, "An" },
-	{ TYPE_Ar, "Ar" },
-	{ TYPE_At, "At" },
-	{ TYPE_Bsx, "Bsx" },
-	{ TYPE_Bx, "Bx" },
-	{ TYPE_Cd, "Cd" },
-	{ TYPE_Cm, "Cm" },
-	{ TYPE_Dv, "Dv" },
-	{ TYPE_Dx, "Dx" },
-	{ TYPE_Em, "Em" },
-	{ TYPE_Er, "Er" },
-	{ TYPE_Ev, "Ev" },
-	{ TYPE_Fa, "Fa" },
-	{ TYPE_Fl, "Fl" },
-	{ TYPE_Fn, "Fn" },
-	{ TYPE_Fn, "Fo" },
-	{ TYPE_Ft, "Ft" },
-	{ TYPE_Fx, "Fx" },
-	{ TYPE_Ic, "Ic" },
-	{ TYPE_In, "In" },
-	{ TYPE_Lb, "Lb" },
-	{ TYPE_Li, "Li" },
-	{ TYPE_Lk, "Lk" },
-	{ TYPE_Ms, "Ms" },
-	{ TYPE_Mt, "Mt" },
-	{ TYPE_Nd, "Nd" },
-	{ TYPE_Nm, "Nm" },
-	{ TYPE_Nx, "Nx" },
-	{ TYPE_Ox, "Ox" },
-	{ TYPE_Pa, "Pa" },
-	{ TYPE_Rs, "Rs" },
-	{ TYPE_Sh, "Sh" },
-	{ TYPE_Ss, "Ss" },
-	{ TYPE_St, "St" },
-	{ TYPE_Sy, "Sy" },
-	{ TYPE_Tn, "Tn" },
-	{ TYPE_Va, "Va" },
-	{ TYPE_Va, "Vt" },
-	{ TYPE_Xr, "Xr" },
-	{ UINT64_MAX, "any" },
-	{ 0, NULL }
-};
-
-static	DB	*btree_open(void);
-static	int	 btree_read(const DBT *, const DBT *,
-			const struct mchars *,
-			uint64_t *, recno_t *, char **);
-static	int	 expreval(const struct expr *, int *);
-static	void	 exprexec(const struct expr *,
-			const char *, uint64_t, struct res *);
-static	int	 exprmark(const struct expr *,
-			const char *, uint64_t, int *);
-static	struct expr *exprexpr(int, char *[], int *, int *, size_t *);
-static	struct expr *exprterm(char *, int);
-static	DB	*index_open(void);
-static	int	 index_read(const DBT *, const DBT *, int,
-			const struct mchars *, struct res *);
-static	void	 norm_string(const char *,
-			const struct mchars *, char **);
-static	size_t	 norm_utf8(unsigned int, char[7]);
-static	int	 single_search(struct rectree *, const struct opts *,
-			const struct expr *, size_t terms,
-			struct mchars *, int);
-
-/*
- * Open the keyword mandoc-db database.
- */
-static DB *
-btree_open(void)
-{
-	BTREEINFO	 info;
-	DB		*db;
-
-	memset(&info, 0, sizeof(BTREEINFO));
-	info.lorder = 4321;
-	info.flags = R_DUP;
-
-	db = dbopen(MANDOC_DB, O_RDONLY, 0, DB_BTREE, &info);
-	if (NULL != db)
-		return(db);
-
-	return(NULL);
-}
-
-/*
- * Read a keyword from the database and normalise it.
- * Return 0 if the database is insane, else 1.
- */
-static int
-btree_read(const DBT *k, const DBT *v, const struct mchars *mc,
-		uint64_t *mask, recno_t *rec, char **buf)
-{
-	uint64_t	 vbuf[2];
-
-	/* Are our sizes sane? */
-	if (k->size < 2 || sizeof(vbuf) != v->size)
-		return(0);
-
-	/* Is our string nil-terminated? */
-	if ('\0' != ((const char *)k->data)[(int)k->size - 1])
-		return(0);
-
-	norm_string((const char *)k->data, mc, buf);
-	memcpy(vbuf, v->data, v->size);
-	*mask = betoh64(vbuf[0]);
-	*rec  = betoh64(vbuf[1]);
-	return(1);
-}
-
-/*
- * Take a Unicode codepoint and produce its UTF-8 encoding.
- * This isn't the best way to do this, but it works.
- * The magic numbers are from the UTF-8 packaging.
- * They're not as scary as they seem: read the UTF-8 spec for details.
- */
-static size_t
-norm_utf8(unsigned int cp, char out[7])
-{
-	int		 rc;
-
-	rc = 0;
-
-	if (cp <= 0x0000007F) {
-		rc = 1;
-		out[0] = (char)cp;
-	} else if (cp <= 0x000007FF) {
-		rc = 2;
-		out[0] = (cp >> 6  & 31) | 192;
-		out[1] = (cp       & 63) | 128;
-	} else if (cp <= 0x0000FFFF) {
-		rc = 3;
-		out[0] = (cp >> 12 & 15) | 224;
-		out[1] = (cp >> 6  & 63) | 128;
-		out[2] = (cp       & 63) | 128;
-	} else if (cp <= 0x001FFFFF) {
-		rc = 4;
-		out[0] = (cp >> 18 & 7) | 240;
-		out[1] = (cp >> 12 & 63) | 128;
-		out[2] = (cp >> 6  & 63) | 128;
-		out[3] = (cp       & 63) | 128;
-	} else if (cp <= 0x03FFFFFF) {
-		rc = 5;
-		out[0] = (cp >> 24 & 3) | 248;
-		out[1] = (cp >> 18 & 63) | 128;
-		out[2] = (cp >> 12 & 63) | 128;
-		out[3] = (cp >> 6  & 63) | 128;
-		out[4] = (cp       & 63) | 128;
-	} else if (cp <= 0x7FFFFFFF) {
-		rc = 6;
-		out[0] = (cp >> 30 & 1) | 252;
-		out[1] = (cp >> 24 & 63) | 128;
-		out[2] = (cp >> 18 & 63) | 128;
-		out[3] = (cp >> 12 & 63) | 128;
-		out[4] = (cp >> 6  & 63) | 128;
-		out[5] = (cp       & 63) | 128;
-	} else
-		return(0);
-
-	out[rc] = '\0';
-	return((size_t)rc);
-}
-
-/*
- * Normalise strings from the index and database.
- * These strings are escaped as defined by mandoc_char(7) along with
- * other goop in mandoc.h (e.g., soft hyphens).
- * This function normalises these into a nice UTF-8 string.
- * Returns 0 if the database is fucked.
- */
-static void
-norm_string(const char *val, const struct mchars *mc, char **buf)
-{
-	size_t		  sz, bsz;
-	char		  utfbuf[7];
-	const char	 *seq, *cpp;
-	int		  len, u, pos;
-	enum mandoc_esc	  esc;
-	static const char res[] = { '\\', '\t',
-				ASCII_NBRSP, ASCII_HYPH, '\0' };
-
-	/* Pre-allocate by the length of the input */
-
-	bsz = strlen(val) + 1;
-	*buf = mandoc_realloc(*buf, bsz);
-	pos = 0;
-
-	while ('\0' != *val) {
-		/*
-		 * Halt on the first escape sequence.
-		 * This also halts on the end of string, in which case
-		 * we just copy, fallthrough, and exit the loop.
-		 */
-		if ((sz = strcspn(val, res)) > 0) {
-			memcpy(&(*buf)[pos], val, sz);
-			pos += (int)sz;
-			val += (int)sz;
-		}
-
-		if (ASCII_HYPH == *val) {
-			(*buf)[pos++] = '-';
-			val++;
-			continue;
-		} else if ('\t' == *val || ASCII_NBRSP == *val) {
-			(*buf)[pos++] = ' ';
-			val++;
-			continue;
-		} else if ('\\' != *val)
-			break;
-
-		/* Read past the slash. */
-
-		val++;
-		u = 0;
-
-		/*
-		 * Parse the escape sequence and see if it's a
-		 * predefined character or special character.
-		 */
-
-		esc = mandoc_escape(&val, &seq, &len);
-		if (ESCAPE_ERROR == esc)
-			break;
-
-		/*
-		 * XXX - this just does UTF-8, but we need to know
-		 * beforehand whether we should do text substitution.
-		 */
-
-		switch (esc) {
-		case (ESCAPE_SPECIAL):
-			if (0 != (u = mchars_spec2cp(mc, seq, len)))
-				break;
-			/* FALLTHROUGH */
-		default:
-			continue;
-		}
-
-		/*
-		 * If we have a Unicode codepoint, try to convert that
-		 * to a UTF-8 byte string.
-		 */
-
-		cpp = utfbuf;
-		if (0 == (sz = norm_utf8(u, utfbuf)))
-			continue;
-
-		/* Copy the rendered glyph into the stream. */
-
-		sz = strlen(cpp);
-		bsz += sz;
-
-		*buf = mandoc_realloc(*buf, bsz);
-
-		memcpy(&(*buf)[pos], cpp, sz);
-		pos += (int)sz;
-	}
-
-	(*buf)[pos] = '\0';
-}
-
-/*
- * Open the filename-index mandoc-db database.
- * Returns NULL if opening failed.
- */
-static DB *
-index_open(void)
-{
-	DB		*db;
-
-	db = dbopen(MANDOC_IDX, O_RDONLY, 0, DB_RECNO, NULL);
-	if (NULL != db)
-		return(db);
-
-	return(NULL);
-}
-
-/*
- * Safely unpack from an index file record into the structure.
- * Returns 1 if an entry was unpacked, 0 if the database is insane.
- */
-static int
-index_read(const DBT *key, const DBT *val, int index,
-		const struct mchars *mc, struct res *rec)
-{
-	size_t		 left;
-	char		*np, *cp;
-	char		 type;
-
-#define	INDEX_BREAD(_dst) \
-	do { \
-		if (NULL == (np = memchr(cp, '\0', left))) \
-			return(0); \
-		norm_string(cp, mc, &(_dst)); \
-		left -= (np - cp) + 1; \
-		cp = np + 1; \
-	} while (/* CONSTCOND */ 0)
-
-	if (0 == (left = val->size))
-		return(0);
-
-	cp = val->data;
-	assert(sizeof(recno_t) == key->size);
-	memcpy(&rec->rec, key->data, key->size);
-	rec->volume = index;
-
-	if ('d' == (type = *cp++))
-		rec->type = RESTYPE_MDOC;
-	else if ('a' == type)
-		rec->type = RESTYPE_MAN;
-	else if ('c' == type)
-		rec->type = RESTYPE_CAT;
-	else
-		return(0);
-
-	left--;
-	INDEX_BREAD(rec->file);
-	INDEX_BREAD(rec->cat);
-	INDEX_BREAD(rec->title);
-	INDEX_BREAD(rec->arch);
-	INDEX_BREAD(rec->desc);
-	return(1);
-}
-
-/*
- * Search mandocdb databases in paths for expression "expr".
- * Filter out by "opts".
- * Call "res" with the results, which may be zero.
- * Return 0 if there was a database error, else return 1.
- */
-int
-apropos_search(int pathsz, char **paths, const struct opts *opts,
-		const struct expr *expr, size_t terms, void *arg,
-		size_t *sz, struct res **resp,
-		void (*res)(struct res *, size_t, void *))
-{
-	struct rectree	 tree;
-	struct mchars	*mc;
-	int		 i, rc;
-
-	memset(&tree, 0, sizeof(struct rectree));
-
-	rc = 0;
-	mc = mchars_alloc();
-	*sz = 0;
-	*resp = NULL;
-
-	/*
-	 * Main loop.  Change into the directory containing manpage
-	 * databases.  Run our expession over each database in the set.
-	 */
-
-	for (i = 0; i < pathsz; i++) {
-		assert('/' == paths[i][0]);
-		if (chdir(paths[i]))
-			continue;
-		if (single_search(&tree, opts, expr, terms, mc, i))
-			continue;
-
-		resfree(tree.node, tree.len);
-		mchars_free(mc);
-		return(0);
-	}
-
-	(*res)(tree.node, tree.len, arg);
-	*sz = tree.len;
-	*resp = tree.node;
-	mchars_free(mc);
-	return(1);
-}
-
-static int
-single_search(struct rectree *tree, const struct opts *opts,
-		const struct expr *expr, size_t terms,
-		struct mchars *mc, int vol)
-{
-	int		 root, leaf, ch;
-	DBT		 key, val;
-	DB		*btree, *idx;
-	char		*buf;
-	struct res	*rs;
-	struct res	 r;
-	uint64_t	 mask;
-	recno_t		 rec;
-
-	root	= -1;
-	leaf	= -1;
-	btree	= NULL;
-	idx	= NULL;
-	buf	= NULL;
-	rs	= tree->node;
-
-	memset(&r, 0, sizeof(struct res));
-
-	if (NULL == (btree = btree_open()))
-		return(1);
-
-	if (NULL == (idx = index_open())) {
-		(*btree->close)(btree);
-		return(1);
-	}
-
-	while (0 == (ch = (*btree->seq)(btree, &key, &val, R_NEXT))) {
-		if ( ! btree_read(&key, &val, mc, &mask, &rec, &buf))
-			break;
-
-		/*
-		 * See if this keyword record matches any of the
-		 * expressions we have stored.
-		 */
-		if ( ! exprmark(expr, buf, mask, NULL))
-			continue;
-
-		/*
-		 * O(log n) scan for prior records.  Since a record
-		 * number is unbounded, this has decent performance over
-		 * a complex hash function.
-		 */
-
-		for (leaf = root; leaf >= 0; )
-			if (rec > rs[leaf].rec &&
-					rs[leaf].rhs >= 0)
-				leaf = rs[leaf].rhs;
-			else if (rec < rs[leaf].rec &&
-					rs[leaf].lhs >= 0)
-				leaf = rs[leaf].lhs;
-			else
-				break;
-
-		/*
-		 * If we find a record, see if it has already evaluated
-		 * to true.  If it has, great, just keep going.  If not,
-		 * try to evaluate it now and continue anyway.
-		 */
-
-		if (leaf >= 0 && rs[leaf].rec == rec) {
-			if (0 == rs[leaf].matched)
-				exprexec(expr, buf, mask, &rs[leaf]);
-			continue;
-		}
-
-		/*
-		 * We have a new file to examine.
-		 * Extract the manpage's metadata from the index
-		 * database, then begin partial evaluation.
-		 */
-
-		key.data = &rec;
-		key.size = sizeof(recno_t);
-
-		if (0 != (*idx->get)(idx, &key, &val, 0))
-			break;
-
-		r.lhs = r.rhs = -1;
-		if ( ! index_read(&key, &val, vol, mc, &r))
-			break;
-
-		/* XXX: this should be elsewhere, I guess? */
-
-		if (opts->cat && strcasecmp(opts->cat, r.cat))
-			continue;
-
-		if (opts->arch && *r.arch)
-			if (strcasecmp(opts->arch, r.arch))
-				continue;
-
-		tree->node = rs = mandoc_realloc
-			(rs, (tree->len + 1) * sizeof(struct res));
-
-		memcpy(&rs[tree->len], &r, sizeof(struct res));
-		memset(&r, 0, sizeof(struct res));
-		rs[tree->len].matches =
-			mandoc_calloc(terms, sizeof(int));
-
-		exprexec(expr, buf, mask, &rs[tree->len]);
-
-		/* Append to our tree. */
-
-		if (leaf >= 0) {
-			if (rec > rs[leaf].rec)
-				rs[leaf].rhs = tree->len;
-			else
-				rs[leaf].lhs = tree->len;
-		} else
-			root = tree->len;
-
-		tree->len++;
-	}
-
-	(*btree->close)(btree);
-	(*idx->close)(idx);
-
-	free(buf);
-	RESFREE(&r);
-	return(1 == ch);
-}
-
-void
-resfree(struct res *rec, size_t sz)
-{
-	size_t		 i;
-
-	for (i = 0; i < sz; i++)
-		RESFREE(&rec[i]);
-	free(rec);
-}
-
-/*
- * Compile a list of straight-up terms.
- * The arguments are re-written into ~[[:<:]]term[[:>:]], or "term"
- * surrounded by word boundaries, then pumped through exprterm().
- * Terms are case-insensitive.
- * This emulates whatis(1) behaviour.
- */
-struct expr *
-termcomp(int argc, char *argv[], size_t *tt)
-{
-	char		*buf;
-	int		 pos;
-	struct expr	*e, *next;
-	size_t		 sz;
-
-	buf = NULL;
-	e = NULL;
-	*tt = 0;
-
-	for (pos = argc - 1; pos >= 0; pos--) {
-		sz = strlen(argv[pos]) + 18;
-		buf = mandoc_realloc(buf, sz);
-		strlcpy(buf, "Nm~[[:<:]]", sz);
-		strlcat(buf, argv[pos], sz);
-		strlcat(buf, "[[:>:]]", sz);
-		if (NULL == (next = exprterm(buf, 0))) {
-			free(buf);
-			exprfree(e);
-			return(NULL);
-		}
-		next->next = e;
-		e = next;
-		(*tt)++;
-	}
-
-	free(buf);
-	return(e);
-}
-
-/*
- * Compile a sequence of logical expressions.
- * See apropos.1 for a grammar of this sequence.
- */
-struct expr *
-exprcomp(int argc, char *argv[], size_t *tt)
-{
-	int		 pos, lvl;
-	struct expr	*e;
-
-	pos = lvl = 0;
-	*tt = 0;
-
-	e = exprexpr(argc, argv, &pos, &lvl, tt);
-
-	if (0 == lvl && pos >= argc)
-		return(e);
-
-	exprfree(e);
-	return(NULL);
-}
-
-/*
- * Compile an array of tokens into an expression.
- * An informal expression grammar is defined in apropos(1).
- * Return NULL if we fail doing so.  All memory will be cleaned up.
- * Return the root of the expression sequence if alright.
- */
-static struct expr *
-exprexpr(int argc, char *argv[], int *pos, int *lvl, size_t *tt)
-{
-	struct expr	*e, *first, *next;
-	int		 log;
-
-	first = next = NULL;
-
-	for ( ; *pos < argc; (*pos)++) {
-		e = next;
-
-		/*
-		 * Close out a subexpression.
-		 */
-
-		if (NULL != e && 0 == strcmp(")", argv[*pos])) {
-			if (--(*lvl) < 0)
-				goto err;
-			break;
-		}
-
-		/*
-		 * Small note: if we're just starting, don't let "-a"
-		 * and "-o" be considered logical operators: they're
-		 * just tokens unless pairwise joining, in which case we
-		 * record their existence (or assume "OR").
-		 */
-		log = 0;
-
-		if (NULL != e && 0 == strcmp("-a", argv[*pos]))
-			log = 1;
-		else if (NULL != e && 0 == strcmp("-o", argv[*pos]))
-			log = 2;
-
-		if (log > 0 && ++(*pos) >= argc)
-			goto err;
-
-		/*
-		 * Now we parse the term part.  This can begin with
-		 * "-i", in which case the expression is case
-		 * insensitive.
-		 */
-
-		if (0 == strcmp("(", argv[*pos])) {
-			++(*pos);
-			++(*lvl);
-			next = mandoc_calloc(1, sizeof(struct expr));
-			next->subexpr = exprexpr(argc, argv, pos, lvl, tt);
-			if (NULL == next->subexpr) {
-				free(next);
-				next = NULL;
-			}
-		} else if (0 == strcmp("-i", argv[*pos])) {
-			if (++(*pos) >= argc)
-				goto err;
-			next = exprterm(argv[*pos], 0);
-		} else
-			next = exprterm(argv[*pos], 1);
-
-		if (NULL == next)
-			goto err;
-
-		next->and = log == 1;
-		next->index = (int)(*tt)++;
-
-		/* Append to our chain of expressions. */
-
-		if (NULL == first) {
-			assert(NULL == e);
-			first = next;
-		} else {
-			assert(NULL != e);
-			e->next = next;
-		}
-	}
-
-	return(first);
-err:
-	exprfree(first);
-	return(NULL);
-}
-
-/*
- * Parse a terminal expression with the grammar as defined in
- * apropos(1).
- * Return NULL if we fail the parse.
- */
-static struct expr *
-exprterm(char *buf, int cs)
-{
-	struct expr	 e;
-	struct expr	*p;
-	char		*key;
-	int		 i;
-
-	memset(&e, 0, sizeof(struct expr));
-
-	/* Choose regex or substring match. */
-
-	if (NULL == (e.v = strpbrk(buf, "=~"))) {
-		e.regex = 0;
-		e.v = buf;
-	} else {
-		e.regex = '~' == *e.v;
-		*e.v++ = '\0';
-	}
-
-	/* Determine the record types to search for. */
-
-	e.mask = 0;
-	if (buf < e.v) {
-		while (NULL != (key = strsep(&buf, ","))) {
-			i = 0;
-			while (types[i].mask &&
-					strcmp(types[i].name, key))
-				i++;
-			e.mask |= types[i].mask;
-		}
-	}
-	if (0 == e.mask)
-		e.mask = TYPE_Nm | TYPE_Nd;
-
-	if (e.regex) {
-		i = REG_EXTENDED | REG_NOSUB | (cs ? 0 : REG_ICASE);
-		if (regcomp(&e.re, e.v, i))
-			return(NULL);
-	}
-
-	e.v = mandoc_strdup(e.v);
-
-	p = mandoc_calloc(1, sizeof(struct expr));
-	memcpy(p, &e, sizeof(struct expr));
-	return(p);
-}
-
-void
-exprfree(struct expr *p)
-{
-	struct expr	*pp;
-
-	while (NULL != p) {
-		if (p->subexpr)
-			exprfree(p->subexpr);
-		if (p->regex)
-			regfree(&p->re);
-		free(p->v);
-		pp = p->next;
-		free(p);
-		p = pp;
-	}
-}
-
-static int
-exprmark(const struct expr *p, const char *cp,
-		uint64_t mask, int *ms)
-{
-
-	for ( ; p; p = p->next) {
-		if (p->subexpr) {
-			if (exprmark(p->subexpr, cp, mask, ms))
-				return(1);
-			continue;
-		} else if ( ! (mask & p->mask))
-			continue;
-
-		if (p->regex) {
-			if (regexec(&p->re, cp, 0, NULL, 0))
-				continue;
-		} else if (NULL == strcasestr(cp, p->v))
-			continue;
-
-		if (NULL == ms)
-			return(1);
-		else
-			ms[p->index] = 1;
-	}
-
-	return(0);
-}
-
-static int
-expreval(const struct expr *p, int *ms)
-{
-	int		 match;
-
-	/*
-	 * AND has precedence over OR.  Analysis is left-right, though
-	 * it doesn't matter because there are no side-effects.
-	 * Thus, step through pairwise ANDs and accumulate their Boolean
-	 * evaluation.  If we encounter a single true AND collection or
-	 * standalone term, the whole expression is true (by definition
-	 * of OR).
-	 */
-
-	for (match = 0; p && ! match; p = p->next) {
-		/* Evaluate a subexpression, if applicable. */
-		if (p->subexpr && ! ms[p->index])
-			ms[p->index] = expreval(p->subexpr, ms);
-
-		match = ms[p->index];
-		for ( ; p->next && p->next->and; p = p->next) {
-			/* Evaluate a subexpression, if applicable. */
-			if (p->next->subexpr && ! ms[p->next->index])
-				ms[p->next->index] =
-					expreval(p->next->subexpr, ms);
-			match = match && ms[p->next->index];
-		}
-	}
-
-	return(match);
-}
-
-/*
- * First, update the array of terms for which this expression evaluates
- * to true.
- * Second, logically evaluate all terms over the updated array of truth
- * values.
- * If this evaluates to true, mark the expression as satisfied.
- */
-static void
-exprexec(const struct expr *e, const char *cp,
-		uint64_t mask, struct res *r)
-{
-
-	assert(0 == r->matched);
-	exprmark(e, cp, mask, r->matches);
-	r->matched = expreval(e, r->matches);
-}
Index: apropos_db.h
===================================================================
RCS file: apropos_db.h
diff -N apropos_db.h
--- apropos_db.h	24 Mar 2012 01:46:25 -0000	1.13
+++ /dev/null	1 Jan 1970 00:00:00 -0000
@@ -1,73 +0,0 @@
-/*	$Id: apropos_db.h,v 1.13 2012/03/24 01:46:25 kristaps Exp $ */
-/*
- * Copyright (c) 2011, 2012 Kristaps Dzonsons <kristaps@bsd.lv>
- *
- * Permission to use, copy, modify, and distribute this software for any
- * purpose with or without fee is hereby granted, provided that the above
- * copyright notice and this permission notice appear in all copies.
- *
- * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
- * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
- * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
- * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
- * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
- * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
- * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
- */
-#ifndef APROPOS_H
-#define APROPOS_H
-
-enum	restype {
-	RESTYPE_MAN, /* man(7) file */
-	RESTYPE_MDOC, /* mdoc(7) file */
-	RESTYPE_CAT /* pre-formatted file */
-};
-
-struct	res {
-	enum restype	 type; /* input file type */
-	char		*file; /* file in file-system */
-	char		*cat; /* category (3p, 3, etc.) */
-	char		*title; /* title (FOO, etc.) */
-	char		*arch; /* arch (or empty string) */
-	char		*desc; /* description (from Nd) */
-	unsigned int	 rec; /* record in index */
-	/*
-	 * The index volume.  This indexes into the array of directories
-	 * searched for manual page databases.
-	 */
-	unsigned int	 volume;
-	/*
-	 * The following fields are used internally.
-	 *
-	 * Maintain a binary tree for checking the uniqueness of `rec'
-	 * when adding elements to the results array.
-	 * Since the results array is dynamic, use offset in the array
-	 * instead of a pointer to the structure.
-	 */
-	int		 lhs;
-	int		 rhs;
-	int		 matched; /* expression is true */
-	int		*matches; /* partial truth evaluations */
-};
-
-struct	opts {
-	const char	*arch; /* restrict to architecture */
-	const char	*cat; /* restrict to manual section */
-};
-
-__BEGIN_DECLS
-
-struct	expr;
-
-int		 apropos_search(int, char **, const struct opts *,
-			const struct expr *, size_t, 
-			void *, size_t *, struct res **,
-			void (*)(struct res *, size_t, void *));
-struct	expr	*exprcomp(int, char *[], size_t *);
-void		 exprfree(struct expr *);
-void	 	 resfree(struct res *, size_t);
-struct	expr	*termcomp(int, char *[], size_t *);
-
-__END_DECLS
-
-#endif /*!APROPOS_H*/
Index: catman.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/catman.c,v
retrieving revision 1.10
diff -u -p -r1.10 catman.c
--- catman.c	3 Jan 2012 15:17:20 -0000	1.10
+++ catman.c	7 Jun 2012 15:09:33 -0000
@@ -380,7 +380,8 @@ manup(const struct manpaths *dirs, char 
 	char		 dst[MAXPATHLEN],
 			 src[MAXPATHLEN];
 	const char	*path;
-	int		 i, c;
+	size_t		 i;
+	int		 c;
 	size_t		 sz;
 	FILE		*f;
 
Index: config.h.post
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/config.h.post,v
retrieving revision 1.5
diff -u -p -r1.5 config.h.post
--- config.h.post	24 Mar 2012 06:23:14 -0000	1.5
+++ config.h.post	7 Jun 2012 15:09:33 -0000
@@ -15,20 +15,6 @@
 #  endif
 #endif
 
-#if defined(__APPLE__)
-# define htobe32(x) OSSwapHostToBigInt32(x)
-# define betoh32(x) OSSwapBigToHostInt32(x)
-# define htobe64(x) OSSwapHostToBigInt64(x)
-# define betoh64(x) OSSwapBigToHostInt64(x)
-#elif defined(__linux__) || defined(__FreeBSD__) || defined(__NetBSD__) || defined(__DragonFly__)
-# define betoh32(x) be32toh(x)
-# define betoh64(x) be64toh(x)
-#elif defined(__OpenBSD__)
-/* Nothing */
-#else
-/* XXX Fallback */
-#endif
-
 #ifndef HAVE_STRLCAT
 extern	size_t	  strlcat(char *, const char *, size_t);
 #endif
Index: mandocdb.8
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandocdb.8,v
retrieving revision 1.17
diff -u -p -r1.17 mandocdb.8
--- mandocdb.8	25 Dec 2011 21:00:23 -0000	1.17
+++ mandocdb.8	7 Jun 2012 15:09:33 -0000
@@ -1,6 +1,6 @@
 .\"	$Id: mandocdb.8,v 1.17 2011/12/25 21:00:23 schwarze Exp $
 .\"
-.\" Copyright (c) 2011 Kristaps Dzonsons <kristaps@bsd.lv>
+.\" Copyright (c) 2011, 2012 Kristaps Dzonsons <kristaps@bsd.lv>
 .\"
 .\" Permission to use, copy, modify, and distribute this software for any
 .\" purpose with or without fee is hereby granted, provided that the above
@@ -22,17 +22,17 @@
 .Nd index UNIX manuals
 .Sh SYNOPSIS
 .Nm
-.Op Fl avW
+.Op Fl anvW
 .Op Fl C Ar file
 .Nm
-.Op Fl avW
+.Op Fl anvW
 .Ar dir ...
 .Nm
-.Op Fl vW
+.Op Fl nvW
 .Fl d Ar dir
 .Op Ar
 .Nm
-.Op Fl vW
+.Op Fl nvW
 .Fl u Ar dir
 .Op Ar
 .Nm
@@ -42,21 +42,15 @@ The
 .Nm
 utility extracts keywords from
 .Ux
-manuals and indexes them in a
-.Sx Keyword Database
-and
-.Sx Index Database
-for fast retrieval by
+manuals and indexes them in a database for fast retrieval by
 .Xr apropos 1 ,
 .Xr whatis 1 ,
 and
-.Xr man 1 Ns 's
-.Fl k
-option.
+.Xr man 1 .
 .Pp
 By default,
 .Nm
-creates databases in each
+creates a database in each
 .Ar dir
 using the files
 .Sm off
@@ -70,14 +64,16 @@ and
 .Op Ar arch Li /
 .Ar title . Sy 0
 .Sm on
-in that directory;
-existing databases are truncated.
+in that directory.
+Existing databases are replaced.
 If
 .Ar dir
 is not provided,
 .Nm
 uses the default paths stipulated by
-.Xr man 1 .
+.Xr manpath 1 ,
+or
+.Xr man.conf 5 .
 .Pp
 The arguments are as follows:
 .Bl -tag -width "-C file"
@@ -94,15 +90,17 @@ format.
 Merge (remove and re-add)
 .Ar
 to the database in
-.Ar dir
-without truncating it.
+.Ar dir .
+.It Fl n
+Do not create or modify any database;
+scan and parse only.
 .It Fl t Ar
 Check the given
 .Ar files
 for potential problems.
-No databases are modified.
 Implies
-.Fl a
+.Fl a ,
+.Fl n ,
 and
 .Fl W .
 All diagnostic messages are printed to the standard output;
@@ -111,8 +109,7 @@ the standard error output is not used.
 Remove
 .Ar
 from the database in
-.Ar dir
-without truncating it.
+.Ar dir .
 .It Fl v
 Display all files added or removed to the index.
 .It Fl W
@@ -123,171 +120,33 @@ to the standard error output.
 If fatal parse errors are encountered while parsing, the offending file
 is printed to stderr, omitted from the index, and the parse continues
 with the next input file.
-.Ss Index Database
-The index database,
-.Pa whatis.index ,
-is a
-.Xr recno 3
-database with record values consisting of
-.Pp
-.Bl -enum -compact
-.It
-the character
-.Cm d ,
-.Cm a ,
-or
-.Cm c
-to indicate the file type
-.Po
-.Xr mdoc 7 ,
-.Xr man 7 ,
-and post-formatted, respectively
-.Pc ,
-.It
-the filename relative to the databases' path,
-.It
-the manual section,
-.It
-the manual title,
-.It
-the architecture
-.Pq often empty ,
-.It
-and the description.
-.El
-.Pp
-Each of the above is NUL-terminated.
-.Pp
-If the record value is zero-length, it is unassigned.
-.Ss Keyword Database
-The keyword database,
-.Pa whatis.db ,
-is a
-.Xr btree 3
-database of NUL-terminated keywords (record length is non-zero string
-length plus one) mapping to a 16-byte binary field consisting of the
-64-bit keyword type and the 64-bit
-.Sx Index Database
-record number, both in network-byte order.
-.Pp
-The type bit-mask consists of the following
-values mapping into
-.Xr mdoc 7
-macro identifiers:
-.Pp
-.Bl -column "x0x0000000000000001ULLx" "xLix" -offset indent -compact
-.It Li 0x0000000000000001ULL Ta \&An
-.It Li 0x0000000000000002ULL Ta \&Ar
-.It Li 0x0000000000000004ULL Ta \&At
-.It Li 0x0000000000000008ULL Ta \&Bsx
-.It Li 0x0000000000000010ULL Ta \&Bx
-.It Li 0x0000000000000020ULL Ta \&Cd
-.It Li 0x0000000000000040ULL Ta \&Cm
-.It Li 0x0000000000000080ULL Ta \&Dv
-.It Li 0x0000000000000100ULL Ta \&Dx
-.It Li 0x0000000000000200ULL Ta \&Em
-.It Li 0x0000000000000400ULL Ta \&Er
-.It Li 0x0000000000000800ULL Ta \&Ev
-.It Li 0x0000000000001000ULL Ta \&Fa
-.It Li 0x0000000000002000ULL Ta \&Fl
-.It Li 0x0000000000004000ULL Ta \&Fn
-.It Li 0x0000000000008000ULL Ta \&Ft
-.It Li 0x0000000000010000ULL Ta \&Fx
-.It Li 0x0000000000020000ULL Ta \&Ic
-.It Li 0x0000000000040000ULL Ta \&In
-.It Li 0x0000000000080000ULL Ta \&Lb
-.It Li 0x0000000000100000ULL Ta \&Li
-.It Li 0x0000000000200000ULL Ta \&Lk
-.It Li 0x0000000000400000ULL Ta \&Ms
-.It Li 0x0000000000800000ULL Ta \&Mt
-.It Li 0x0000000001000000ULL Ta \&Nd
-.It Li 0x0000000002000000ULL Ta \&Nm
-.It Li 0x0000000004000000ULL Ta \&Nx
-.It Li 0x0000000008000000ULL Ta \&Ox
-.It Li 0x0000000010000000ULL Ta \&Pa
-.It Li 0x0000000020000000ULL Ta \&Rs
-.It Li 0x0000000040000000ULL Ta \&Sh
-.It Li 0x0000000080000000ULL Ta \&Ss
-.It Li 0x0000000100000000ULL Ta \&St
-.It Li 0x0000000200000000ULL Ta \&Sy
-.It Li 0x0000000400000000ULL Ta \&Tn
-.It Li 0x0000000800000000ULL Ta \&Va
-.It Li 0x0000001000000000ULL Ta \&Vt
-.It Li 0x0000002000000000ULL Ta \&Xr
-.El
-.Sh IMPLEMENTATION NOTES
-The time to construct a new database pair grows linearly with the
-number of keywords in the input files.
-However, removing or updating entries with
-.Fl u
-or
-.Fl d ,
-respectively, grows as a multiple of the index length and input size.
 .Sh FILES
 .Bl -tag -width Ds
-.It Pa whatis.db
-A
-.Xr btree 3
-keyword database mapping keywords to a type and file reference in
-.Pa whatis.index .
-.It Pa whatis.index
-A
-.Xr recno 3
-database of indexed file-names.
-.It Pa /etc/man.conf
-The default
-.Xr man 1
-configuration file.
+.It Pa mandocdb.db
+A database of manpages relative to the directory of the file.
+This file is portable across architectures and systems, so long as the
+manpage hierarchy it indexes does not change.
+.It Pa mandocdb.db~
+A temporary database used by
+.Xr makewhatis 8
+during scanning and parsing.
+This will be atomically renamed as
+.Pa mandocdb.db
+when the parse has completed.
 .El
 .Sh EXIT STATUS
-The
-.Nm
-utility exits with one of the following values:
-.Pp
-.Bl -tag -width Ds -compact
-.It 0
-No errors occurred.
-.It 5
-Invalid command line arguments were specified.
-No input files have been read.
-.It 6
-An operating system error occurred, for example memory exhaustion or an
-error accessing input files.
-Such errors cause
-.Nm
-to exit at once, possibly in the middle of parsing or formatting a file.
-The output databases are corrupt and should be removed.
-.El
-.Sh DIAGNOSTICS
-If the following errors occur, the
-.Nm
-databases should be rebuilt.
-.Bl -diag
-.It "%s: Corrupt database"
-The keyword database file indicated by
-.Pa %s
-is unreadable.
-.It "%s: Corrupt index"
-The index database file indicated by
-.Pa %s
-is unreadable.
-.It "%s: Path too long"
-The file
-.Pa %s
-is too long.
-This usually indicates database corruption or invalid command-line
-arguments.
-.El
+.Ex -std
 .Sh SEE ALSO
 .Xr apropos 1 ,
 .Xr man 1 ,
 .Xr whatis 1 ,
-.Xr btree 3 ,
-.Xr recno 3 ,
 .Xr man.conf 5
 .Sh AUTHORS
 The
 .Nm
 utility was written by
 .An Kristaps Dzonsons ,
-.Mt kristaps@bsd.lv .
+.Mt kristaps@bsd.lv ,
+and
+.An Ingo Schwarze ,
+.Mt schwarze@openbsd.org .
Index: mandocdb.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandocdb.c,v
retrieving revision 1.49
diff -u -p -r1.49 mandocdb.c
--- mandocdb.c	27 May 2012 17:48:57 -0000	1.49
+++ mandocdb.c	7 Jun 2012 15:09:34 -0000
@@ -20,42 +20,30 @@
 #endif
 
 #include <sys/param.h>
-#include <sys/types.h>
+#include <sys/stat.h>
 
 #include <assert.h>
 #include <ctype.h>
-#include <dirent.h>
 #include <errno.h>
 #include <fcntl.h>
+#include <fts.h>
 #include <getopt.h>
-#include <stdio.h>
+#include <stddef.h>
 #include <stdint.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 
-#if defined(__linux__)
-# include <endian.h>
-# include <db_185.h>
-#elif defined(__APPLE__)
-# include <libkern/OSByteOrder.h>
-# include <db.h>
-#else
-# include <db.h>
-#endif
+#include <ohash.h>
+#include <sqlite3.h>
 
-#include "man.h"
 #include "mdoc.h"
+#include "man.h"
 #include "mandoc.h"
 #include "mandocdb.h"
 #include "manpath.h"
 
-#define	MANDOC_BUFSZ	  BUFSIZ
-#define	MANDOC_SLOP	  1024
-
-#define	MANDOC_SRC	  0x1
-#define	MANDOC_FORM	  0x2
-
+/* Post a warning to stderr. */
 #define WARNING(_f, _b, _fmt, _args...) \
 	do if (warnings) { \
 		fprintf(stderr, "%s: ", (_b)); \
@@ -64,114 +52,140 @@
 			fprintf(stderr, ": %s", (_f)); \
 		fprintf(stderr, "\n"); \
 	} while (/* CONSTCOND */ 0)
-		
-/* Access to the mandoc database on disk. */
+/* Post a "verbose" message to stderr. */
+#define	DEBUG(_f, _b, _fmt, _args...) \
+	do if (verb) { \
+		fprintf(stderr, "%s: ", (_b)); \
+		fprintf(stderr, (_fmt), ##_args); \
+		fprintf(stderr, ": %s\n", (_f)); \
+	} while (/* CONSTCOND */ 0)
 
-struct	mdb {
-	char		  idxn[MAXPATHLEN]; /* index db filename */
-	char		  dbn[MAXPATHLEN]; /* keyword db filename */
-	DB		 *idx; /* index recno database */
-	DB		 *db; /* keyword btree database */
+enum	op {
+	OP_DEFAULT = 0, /* new dbs from dir list or default config */
+	OP_CONFFILE, /* new databases from custom config file */
+	OP_UPDATE, /* delete/add entries in existing database */
+	OP_DELETE, /* delete entries from existing database */
+	OP_TEST /* change no databases, report potential problems */
 };
 
-/* Stack of temporarily unused index records. */
-
-struct	recs {
-	recno_t		 *stack; /* pointer to a malloc'ed array */
-	size_t		  size; /* number of allocated slots */
-	size_t		  cur; /* current number of empty records */
-	recno_t		  last; /* last record number in the index */
+enum	form {
+	FORM_SRC, /* format is -man or -mdoc */
+	FORM_CAT, /* format is cat */
+	FORM_NONE /* format is unknown */
 };
 
-/* Tiny list for files.  No need to bring in QUEUE. */
-
-struct	of {
-	char		 *fname; /* heap-allocated */
-	char		 *sec;
-	char		 *arch;
-	char		 *title;
-	int		  src_form;
-	struct of	 *next; /* NULL for last one */
-	struct of	 *first; /* first in list */
+struct	str {
+	char		*utf8; /* key in UTF-8 form */
+	const struct of *of; /* if set, the owning parse */
+	struct str	*next; /* next in owning parse sequence */
+	uint64_t	 mask; /* bitmask in sequence */
+	char		 key[1]; /* the string itself */
 };
 
-/* Buffer for storing growable data. */
-
-struct	buf {
-	char		 *cp;
-	size_t		  len; /* current length */
-	size_t		  size; /* total buffer size */
+struct	id {
+	ino_t		 ino;
+	dev_t		 dev;
 };
 
-/* Operation we're going to perform. */
-
-enum	op {
-	OP_DEFAULT = 0, /* new dbs from dir list or default config */
-	OP_CONFFILE, /* new databases from custom config file */
-	OP_UPDATE, /* delete/add entries in existing database */
-	OP_DELETE, /* delete entries from existing database */
-	OP_TEST /* change no databases, report potential problems */
+struct	of {
+	struct id	 id; /* used for hashing routine */
+	struct of	*next; /* next in ofs */
+	enum form	 dform; /* path-cued form */
+	enum form	 sform; /* suffix-cued form */
+	char		 file[MAXPATHLEN]; /* filename rel. to manpath */
+	const char	*desc; /* parsed description */
+	const char	*sec; /* suffix-cued section (or empty) */
+	const char	*dsec; /* path-cued section (or empty) */
+	const char	*arch; /* path-cued arch. (or empty) */
+	const char	*name; /* name (from filename) (not empty) */
 };
 
-#define	MAN_ARGS	  DB *hash, \
-			  struct buf *buf, \
-			  struct buf *dbuf, \
-			  const struct man_node *n
-#define	MDOC_ARGS	  DB *hash, \
-			  struct buf *buf, \
-			  struct buf *dbuf, \
-			  const struct mdoc_node *n, \
-			  const struct mdoc_meta *m
-
-static	void		  buf_appendmdoc(struct buf *, 
-				const struct mdoc_node *, int);
-static	void		  buf_append(struct buf *, const char *);
-static	void		  buf_appendb(struct buf *, 
-				const void *, size_t);
-static	void		  dbt_put(DB *, const char *, DBT *, DBT *);
-static	void		  hash_put(DB *, const struct buf *, uint64_t);
-static	void		  hash_reset(DB **);
-static	void		  index_merge(const struct of *, struct mparse *,
-				struct buf *, struct buf *, DB *,
-				struct mdb *, struct recs *,
-				const char *);
-static	void		  index_prune(const struct of *, struct mdb *,
-				struct recs *, const char *);
-static	void		  ofile_argbuild(int, char *[],
-				struct of **, const char *);
-static	void		  ofile_dirbuild(const char *, const char *,
-				const char *, int, struct of **, char *);
-static	void		  ofile_free(struct of *);
-static	void		  pformatted(DB *, struct buf *, struct buf *,
-				const struct of *, const char *);
-static	int		  pman_node(MAN_ARGS);
-static	void		  pmdoc_node(MDOC_ARGS);
-static	int		  pmdoc_head(MDOC_ARGS);
-static	int		  pmdoc_body(MDOC_ARGS);
-static	int		  pmdoc_Fd(MDOC_ARGS);
-static	int		  pmdoc_In(MDOC_ARGS);
-static	int		  pmdoc_Fn(MDOC_ARGS);
-static	int		  pmdoc_Nd(MDOC_ARGS);
-static	int		  pmdoc_Nm(MDOC_ARGS);
-static	int		  pmdoc_Sh(MDOC_ARGS);
-static	int		  pmdoc_St(MDOC_ARGS);
-static	int		  pmdoc_Xr(MDOC_ARGS);
+enum	stmt {
+	STMT_DELETE = 0, /* delete manpage */
+	STMT_INSERT_DOC, /* insert manpage */
+	STMT_INSERT_KEY, /* insert parsed key */
+	STMT__MAX
+};
 
-#define	MDOCF_CHILD	  0x01  /* Automatically index child nodes. */
+typedef	int (*mdoc_fp)(struct of *, const struct mdoc_node *);
 
 struct	mdoc_handler {
-	int		(*fp)(MDOC_ARGS);  /* Optional handler. */
-	uint64_t	  mask;  /* Set unless handler returns 0. */
-	int		  flags;  /* For use by pmdoc_node. */
+	mdoc_fp		 fp; /* optional handler */
+	uint64_t	 mask;  /* set unless handler returns 0 */
+	int		 flags;  /* for use by pmdoc_node */
+#define	MDOCF_CHILD	 0x01  /* automatically index child nodes */
 };
 
+static	void	 dbclose(const char *, int);
+static	void	 dbindex(struct mchars *, int,
+			const struct of *, const char *);
+static	int	 dbopen(const char *, int);
+static	void	 dbprune(const char *);
+static	int	 dirscan(size_t, char *[], const char *);
+static	int	 dirtreescan(const char *);
+static	void	 fileadd(struct of *);
+static	int	 filecheck(const char *);
+static	void	 filescan(const char *, const char *);
+static	struct str *hashget(const char *, size_t);
+static	void	*hash_alloc(size_t, void *);
+static	void	 hash_free(void *, size_t, void *);
+static	void	*hash_halloc(size_t, void *);
+static	void	 inoadd(const struct stat *, struct of *);
+static	int	 inocheck(const struct stat *);
+static	void	 ofadd(const char *, int, const char *, 
+			const char *, const char *, const char *, 
+			const char *, const struct stat *);
+static	void	 offree(void);
+static	int	 ofmerge(struct mchars *, 
+			struct mparse *, const char *);
+static	void	 parse_catpage(struct of *, const char *);
+static	int	 parse_man(struct of *, 
+			const struct man_node *);
+static	void	 parse_mdoc(struct of *, const struct mdoc_node *);
+static	int	 parse_mdoc_body(struct of *, const struct mdoc_node *);
+static	int	 parse_mdoc_head(struct of *, const struct mdoc_node *);
+static	int	 parse_mdoc_Fd(struct of *, const struct mdoc_node *);
+static	int	 parse_mdoc_Fn(struct of *, const struct mdoc_node *);
+static	int	 parse_mdoc_In(struct of *, const struct mdoc_node *);
+static	int	 parse_mdoc_Nd(struct of *, const struct mdoc_node *);
+static	int	 parse_mdoc_Nm(struct of *, const struct mdoc_node *);
+static	int	 parse_mdoc_Sh(struct of *, const struct mdoc_node *);
+static	int	 parse_mdoc_St(struct of *, const struct mdoc_node *);
+static	int	 parse_mdoc_Xr(struct of *, const struct mdoc_node *);
+static	void	 putkey(const struct of *, 
+			const char *, uint64_t);
+static	void	 putkeys(const struct of *, 
+			const char *, int, uint64_t);
+static	void	 putmdockey(const struct of *,
+			const struct mdoc_node *, uint64_t);
+static	char 	*stradd(const char *);
+static	char 	*straddbuf(const char *, size_t);
+static	size_t	 utf8(unsigned int, char [7]);
+static	void	 utf8key(struct mchars *, struct str *);
+static	void 	 wordaddbuf(const struct of *, 
+			const char *, size_t, uint64_t);
+
+static	char		*progname;
+static	int	 	 use_all; /* use all found files */
+static	int		 nodb; /* no database changes */
+static	int	  	 verb; /* print what we're doing */
+static	int	  	 warnings; /* warn about crap */
+static	enum op	  	 op; /* operational mode */
+static	struct ohash	 inos; /* table of inodes/devices */
+static	struct ohash	 filenames; /* table of filenames */
+static	struct ohash	 strings; /* table of all strings */
+static	struct of	*ofs = NULL; /* vector of files to parse */
+static	struct str	*words = NULL; /* word list in current parse */
+static	sqlite3		*db = NULL; /* current database */
+static	sqlite3_stmt	*stmts[STMT__MAX]; /* current statements */
+
 static	const struct mdoc_handler mdocs[MDOC_MAX] = {
 	{ NULL, 0, 0 },  /* Ap */
 	{ NULL, 0, 0 },  /* Dd */
 	{ NULL, 0, 0 },  /* Dt */
 	{ NULL, 0, 0 },  /* Os */
-	{ pmdoc_Sh, TYPE_Sh, MDOCF_CHILD }, /* Sh */
-	{ pmdoc_head, TYPE_Ss, MDOCF_CHILD }, /* Ss */
+	{ parse_mdoc_Sh, TYPE_Sh, MDOCF_CHILD }, /* Sh */
+	{ parse_mdoc_head, TYPE_Ss, MDOCF_CHILD }, /* Ss */
 	{ NULL, 0, 0 },  /* Pp */
 	{ NULL, 0, 0 },  /* D1 */
 	{ NULL, 0, 0 },  /* Dl */
@@ -190,23 +204,23 @@ static	const struct mdoc_handler mdocs[M
 	{ NULL, TYPE_Ev, MDOCF_CHILD },  /* Ev */
 	{ NULL, 0, 0 },  /* Ex */
 	{ NULL, TYPE_Fa, MDOCF_CHILD },  /* Fa */
-	{ pmdoc_Fd, TYPE_In, 0 },  /* Fd */
+	{ parse_mdoc_Fd, TYPE_In, 0 },  /* Fd */
 	{ NULL, TYPE_Fl, MDOCF_CHILD },  /* Fl */
-	{ pmdoc_Fn, 0, 0 },  /* Fn */
+	{ parse_mdoc_Fn, 0, 0 },  /* Fn */
 	{ NULL, TYPE_Ft, MDOCF_CHILD },  /* Ft */
 	{ NULL, TYPE_Ic, MDOCF_CHILD },  /* Ic */
-	{ pmdoc_In, TYPE_In, 0 },  /* In */
+	{ parse_mdoc_In, TYPE_In, MDOCF_CHILD },  /* In */
 	{ NULL, TYPE_Li, MDOCF_CHILD },  /* Li */
-	{ pmdoc_Nd, TYPE_Nd, MDOCF_CHILD },  /* Nd */
-	{ pmdoc_Nm, TYPE_Nm, MDOCF_CHILD },  /* Nm */
+	{ parse_mdoc_Nd, TYPE_Nd, MDOCF_CHILD },  /* Nd */
+	{ parse_mdoc_Nm, TYPE_Nm, MDOCF_CHILD },  /* Nm */
 	{ NULL, 0, 0 },  /* Op */
 	{ NULL, 0, 0 },  /* Ot */
 	{ NULL, TYPE_Pa, MDOCF_CHILD },  /* Pa */
 	{ NULL, 0, 0 },  /* Rv */
-	{ pmdoc_St, TYPE_St, 0 },  /* St */
+	{ parse_mdoc_St, TYPE_St, 0 },  /* St */
 	{ NULL, TYPE_Va, MDOCF_CHILD },  /* Va */
-	{ pmdoc_body, TYPE_Va, MDOCF_CHILD },  /* Vt */
-	{ pmdoc_Xr, TYPE_Xr, 0 },  /* Xr */
+	{ parse_mdoc_body, TYPE_Va, MDOCF_CHILD },  /* Vt */
+	{ parse_mdoc_Xr, TYPE_Xr, 0 },  /* Xr */
 	{ NULL, 0, 0 },  /* %A */
 	{ NULL, 0, 0 },  /* %B */
 	{ NULL, 0, 0 },  /* %D */
@@ -262,7 +276,7 @@ static	const struct mdoc_handler mdocs[M
 	{ NULL, 0, 0 },  /* Ux */
 	{ NULL, 0, 0 },  /* Xc */
 	{ NULL, 0, 0 },  /* Xo */
-	{ pmdoc_head, TYPE_Fn, 0 },  /* Fo */
+	{ parse_mdoc_head, TYPE_Fn, 0 },  /* Fo */
 	{ NULL, 0, 0 },  /* Fc */
 	{ NULL, 0, 0 },  /* Oo */
 	{ NULL, 0, 0 },  /* Oc */
@@ -290,30 +304,31 @@ static	const struct mdoc_handler mdocs[M
 	{ NULL, 0, 0 },  /* Ta */
 };
 
-static	const char	 *progname;
-static	int		  use_all;  /* Use all directories and files. */
-static	int		  verb;  /* Output verbosity level. */
-static	int		  warnings;  /* Potential problems in manuals. */
-
 int
 main(int argc, char *argv[])
 {
-	struct mparse	*mp; /* parse sequence */
-	struct manpaths	 dirs;
-	struct mdb	 mdb;
-	struct recs	 recs;
-	enum op		 op; /* current operation */
-	const char	*dir;
-	int		 ch, i, flags;
-	char		 dirbuf[MAXPATHLEN];
-	DB		*hash; /* temporary keyword hashtable */
-	BTREEINFO	 info; /* btree configuration */
-	size_t		 sz1, sz2;
-	struct buf	 buf, /* keyword buffer */
-			 dbuf; /* description buffer */
-	struct of	*of; /* list of files for processing */
-	extern int	 optind;
-	extern char	*optarg;
+	int		  ch, rc;
+	size_t		  i, sz;
+	const char	 *dir;
+	struct str	 *s;
+	struct mchars	 *mc;
+	struct manpaths	  dirs;
+	struct mparse	 *mp;
+	struct ohash_info ino_info, filename_info, str_info;
+
+	memset(stmts, 0, STMT__MAX * sizeof(sqlite3_stmt *));
+	memset(&dirs, 0, sizeof(struct manpaths));
+
+	ino_info.halloc = filename_info.halloc = 
+		str_info.halloc = hash_halloc;
+	ino_info.hfree = filename_info.hfree = 
+		str_info.hfree = hash_free;
+	ino_info.alloc = filename_info.alloc = 
+		str_info.alloc = hash_alloc;
+
+	ino_info.key_offset = offsetof(struct of, id);
+	filename_info.key_offset = offsetof(struct of, file);
+	str_info.key_offset = offsetof(struct str, key);
 
 	progname = strrchr(argv[0], '/');
 	if (progname == NULL)
@@ -321,56 +336,48 @@ main(int argc, char *argv[])
 	else
 		++progname;
 
-	memset(&dirs, 0, sizeof(struct manpaths));
-	memset(&mdb, 0, sizeof(struct mdb));
-	memset(&recs, 0, sizeof(struct recs));
+	/*
+	 * We accept a few different invocations.  
+	 * The CHECKOP macro makes sure that invocation styles don't
+	 * clobber each other.
+	 */
+
+#define	CHECKOP(_op, _ch) do \
+	if (OP_DEFAULT != (_op)) { \
+		fprintf(stderr, "-%c: Conflicting option\n", (_ch)); \
+		goto usage; \
+	} while (/*CONSTCOND*/0)
 
-	of = NULL;
-	mp = NULL;
-	hash = NULL;
-	op = OP_DEFAULT;
 	dir = NULL;
+	op = OP_DEFAULT;
 
-	while (-1 != (ch = getopt(argc, argv, "aC:d:tu:vW")))
+	while (-1 != (ch = getopt(argc, argv, "aC:d:ntu:vW")))
 		switch (ch) {
 		case ('a'):
 			use_all = 1;
 			break;
 		case ('C'):
-			if (op) {
-				fprintf(stderr,
-				    "-C: conflicting options\n");
-				goto usage;
-			}
+			CHECKOP(op, ch);
 			dir = optarg;
 			op = OP_CONFFILE;
 			break;
 		case ('d'):
-			if (op) {
-				fprintf(stderr,
-				    "-d: conflicting options\n");
-				goto usage;
-			}
+			CHECKOP(op, ch);
 			dir = optarg;
 			op = OP_UPDATE;
 			break;
+		case ('n'):
+			nodb = 1;
+			break;
 		case ('t'):
+			CHECKOP(op, ch);
 			dup2(STDOUT_FILENO, STDERR_FILENO);
-			if (op) {
-				fprintf(stderr,
-				    "-t: conflicting options\n");
-				goto usage;
-			}
 			op = OP_TEST;
-			use_all = 1;
-			warnings = 1;
+			nodb = use_all = warnings = 1;
+			dir = ".";
 			break;
 		case ('u'):
-			if (op) {
-				fprintf(stderr,
-				    "-u: conflicting options\n");
-				goto usage;
-			}
+			CHECKOP(op, ch);
 			dir = optarg;
 			op = OP_DELETE;
 			break;
@@ -388,233 +395,617 @@ main(int argc, char *argv[])
 	argv += optind;
 
 	if (OP_CONFFILE == op && argc > 0) {
-		fprintf(stderr, "-C: too many arguments\n");
+		fprintf(stderr, "-C: Too many arguments\n");
 		goto usage;
 	}
 
-	memset(&info, 0, sizeof(BTREEINFO));
-	info.lorder = 4321;
-	info.flags = R_DUP;
+	rc = 1;
+	mp = mparse_alloc(MPARSE_AUTO, 
+		MANDOCLEVEL_FATAL, NULL, NULL, NULL);
+	mc = mchars_alloc();
 
-	mp = mparse_alloc(MPARSE_AUTO, MANDOCLEVEL_FATAL, NULL, NULL, NULL);
+	ohash_init(&strings, 6, &str_info);
 
-	memset(&buf, 0, sizeof(struct buf));
-	memset(&dbuf, 0, sizeof(struct buf));
+	if (OP_UPDATE == op || OP_DELETE == op || OP_TEST == op) {
+		/*
+		 * All of these deal with a specific directory.
+		 * Jump into that directory then collect files specified
+		 * on the command-line.
+		 */
+		if (0 == (rc = dirscan(argc, argv, dir)))
+			goto out;
+		if (0 == (rc = dbopen(dir, 1)))
+			goto out;
+		if (OP_TEST != op)
+			dbprune(dir);
+		if (OP_DELETE != op)
+			rc = ofmerge(mc, mp, dir);
+		else
+			dbclose(dir, 1);
+	} else {
+		/*
+		 * If we have arguments, use them as our manpaths.
+		 * If we don't, grok from manpath(1) or however else
+		 * manpath_parse() wants to do it.
+		 */
+		if (argc > 0) {
+			dirs.paths = mandoc_calloc
+				(argc, sizeof(char *));
+			dirs.sz = argc;
+			for (i = 0; i < (size_t)argc; i++)
+				dirs.paths[i] = mandoc_strdup(argv[i]);
+		} else
+			manpath_parse(&dirs, dir, NULL, NULL);
 
-	buf.size = dbuf.size = MANDOC_BUFSZ;
+		/*
+		 * First scan the tree rooted at a base directory.
+		 * Then whack its database (if one exists), parse, and
+		 * build up the database.
+		 * Ignore zero-length directories and strip trailing
+		 * slashes.
+		 */
+		for (i = 0; i < dirs.sz; i++) {
+			sz = strlen(dirs.paths[i]);
+			if (sz && '/' == dirs.paths[i][sz - 1])
+				dirs.paths[i][--sz] = '\0';
+			if (0 == sz)
+				continue;
+			ohash_init(&inos, 6, &ino_info);
+			ohash_init(&filenames, 6, &filename_info);
+			if (0 == (rc = dirtreescan(dirs.paths[i])))
+				goto out;
+			remove(MANDOC_DB);
+			if (0 == (rc = ofmerge(mc, mp, dirs.paths[i])))
+				goto out;
+			ohash_delete(&inos);
+			ohash_delete(&filenames);
+			offree();
+		}
+	}
+out:
+	manpath_free(&dirs);
+	mchars_free(mc);
+	mparse_free(mp);
+	for (s = ohash_first(&strings, &ch);
+			NULL != s; s = ohash_next(&strings, &ch)) {
+		if (s->utf8 != s->key)
+			free(s->utf8);
+		free(s);
+	}
+	ohash_delete(&strings);
+	ohash_delete(&inos);
+	ohash_delete(&filenames);
+	offree();
+	return(rc ? EXIT_SUCCESS : EXIT_FAILURE);
+usage:
+	fprintf(stderr, "usage: %s [-anvW] [-C file]\n"
+			"       %s [-anvW] dir ...\n"
+			"       %s [-nvW] -d dir [file ...]\n"
+			"       %s [-nvW] -u dir [file ...]\n"
+			"       %s -t file ...\n",
+		       progname, progname, progname, 
+		       progname, progname);
 
-	buf.cp = mandoc_malloc(buf.size);
-	dbuf.cp = mandoc_malloc(dbuf.size);
+	return(EXIT_FAILURE);
+}
 
-	if (OP_TEST == op) {
-		ofile_argbuild(argc, argv, &of, ".");
-		if (NULL == of)
-			goto out;
-		index_merge(of, mp, &dbuf, &buf,
-				hash, &mdb, &recs, ".");
-		goto out;
-	}
-
-	if (OP_UPDATE == op || OP_DELETE == op) {
-		strlcat(mdb.dbn, dir, MAXPATHLEN);
-		strlcat(mdb.dbn, "/", MAXPATHLEN);
-		sz1 = strlcat(mdb.dbn, MANDOC_DB, MAXPATHLEN);
-
-		strlcat(mdb.idxn, dir, MAXPATHLEN);
-		strlcat(mdb.idxn, "/", MAXPATHLEN);
-		sz2 = strlcat(mdb.idxn, MANDOC_IDX, MAXPATHLEN);
-
-		if (sz1 >= MAXPATHLEN || sz2 >= MAXPATHLEN) {
-			fprintf(stderr, "%s: path too long\n", dir);
-			exit((int)MANDOCLEVEL_BADARG);
-		}
+/*
+ * Scan a directory tree rooted at "base" for manpages.
+ * We use fts(), scanning directory parts along the way for clues to our
+ * section and architecture.
+ *
+ * If use_all has been specified, grok all files.
+ * If not, sanitise paths to the following:
+ *
+ *   [./]man*[/<arch>]/<name>.<section> 
+ *   or
+ *   [./]cat<section>[/<arch>]/<name>.0
+ *
+ * TODO: accommodate multi-language directories.
+ */
+static int
+dirtreescan(const char *base)
+{
+	FTS		*f;
+	FTSENT		*ff;
+	int		 fd, dform;
+	size_t		 sz;
+	char		*sec;
+	const char	*dsec, *arch, *cp, *name;
+	char		 cwd[MAXPATHLEN];
+	const char	*argv[2];
 
-		flags = O_CREAT | O_RDWR;
-		mdb.db = dbopen(mdb.dbn, flags, 0644, DB_BTREE, &info);
-		mdb.idx = dbopen(mdb.idxn, flags, 0644, DB_RECNO, NULL);
-
-		if (NULL == mdb.db) {
-			perror(mdb.dbn);
-			exit((int)MANDOCLEVEL_SYSERR);
-		} else if (NULL == mdb.idx) {
-			perror(mdb.idxn);
-			exit((int)MANDOCLEVEL_SYSERR);
-		}
+	/*
+	 * Remember where we started by keeping a fd open to the origin
+	 * path component.
+	 * This is because we chdir() to relative paths, so we can't
+	 * just re-chdir() into the cwd if it's also relative.
+	 */
+	if (NULL == getcwd(cwd, MAXPATHLEN)) {
+		perror(NULL);
+		return(0);
+	} else if (-1 == (fd = open(cwd, O_RDONLY, 0))) {
+		perror(cwd);
+		return(0);
+	}
 
-		ofile_argbuild(argc, argv, &of, dir);
+	/* Sanitise the base directory.  */
 
-		if (NULL == of)
-			goto out;
+	if (0 == strncmp(base, "./", 2))
+		base += 2;
+	sz = strlen(base) + 1;
+	if ('/' == base[sz - 1])
+		sz++;
+	argv[0] = base;
+	argv[1] = (char *)NULL;
+
+	/*
+	 * Walk through all components under the directory, using the
+	 * logical descent of files.
+	 */
+	f = fts_open((char * const *)argv, FTS_LOGICAL, NULL);
+	if (NULL == f) {
+		perror(base);
+		close(fd);
+		return(0);
+	}
 
-		index_prune(of, &mdb, &recs, dir);
+	dsec = arch = NULL;
+	dform = FORM_NONE;
 
+	while (NULL != (ff = fts_read(f))) {
 		/*
-		 * Go to the root of the respective manual tree.
-		 * This must work or no manuals may be found (they're
-		 * indexed relative to the root).
-		 */
+		 * If we're a regular file, add an "of" by using the
+		 * stored directory data and handling the filename.
+		 * Disallow duplicate (hard-linked) files.
+		 */
+		if (FTS_F == ff->fts_info) {
+			if ( ! use_all && ff->fts_level < 2) {
+				WARNING(ff->fts_path + sz, base,
+					"Extraneous file");
+				continue;
+			} else if (inocheck(ff->fts_statp)) {
+				WARNING(ff->fts_path + sz, base,
+					"Duplicate file");
+				continue;
+			} 
+
+			cp = ff->fts_name;
+
+			if (0 == strcmp(cp, "mandocdb.db")) {
+				WARNING(ff->fts_path + sz, base,
+					"Skipping database");
+				continue;
+			} else if (NULL != (cp = strrchr(cp, '.'))) {
+				if (0 == strcmp(cp + 1, "html")) {
+					WARNING(ff->fts_path + sz, 
+						base, "Skipping html");
+					continue;
+				} else if (0 == strcmp(cp + 1, "gz")) {
+					WARNING(ff->fts_path + sz, 
+						base, "Skipping gz");
+					continue;
+				} else if (0 == strcmp(cp + 1, "ps")) {
+					WARNING(ff->fts_path + sz, 
+						base, "Skipping ps");
+					continue;
+				} else if (0 == strcmp(cp + 1, "pdf")) {
+					WARNING(ff->fts_path + sz, 
+						base, "Skipping pdf");
+					continue;
+				}
+			}
+
+			if (NULL != (sec = strrchr(ff->fts_name, '.'))) {
+				*sec = '\0';
+				sec = stradd(sec + 1);
+			}
+			name = stradd(ff->fts_name);
+			ofadd(base, dform, ff->fts_path + sz, 
+				name, dsec, sec, arch, ff->fts_statp);
+			continue;
+		} else if (FTS_D != ff->fts_info && 
+				FTS_DP != ff->fts_info)
+			continue;
+
+		switch (ff->fts_level) {
+		case (0):
+			/* Ignore the root directory. */
+			break;
+		case (1):
+			/*
+			 * This might contain manX/ or catX/.
+			 * Try to infer this from the name.
+			 * If we're not in use_all, enforce it.
+			 */
+			dsec = NULL;
+			dform = FORM_NONE;
+			cp = ff->fts_name;
+			if (FTS_DP == ff->fts_info)
+				break;
 
-		if (OP_UPDATE == op) {
-			if (-1 == chdir(dir)) {
-				perror(dir);
-				exit((int)MANDOCLEVEL_SYSERR);
+			if (0 == strncmp(cp, "man", 3)) {
+				dform = FORM_SRC;
+				dsec = stradd(cp + 3);
+			} else if (0 == strncmp(cp, "cat", 3)) {
+				dform = FORM_CAT;
+				dsec = stradd(cp + 3);
 			}
-			index_merge(of, mp, &dbuf, &buf, hash,
-					&mdb, &recs, dir);
+
+			if (NULL != dsec || use_all) 
+				break;
+
+			WARNING(ff->fts_path + sz, base,
+				"Unknown directory part");
+			fts_set(f, ff, FTS_SKIP);
+			break;
+		case (2):
+			/*
+			 * Possibly our architecture.
+			 * If we're descending, keep tabs on it.
+			 */
+			arch = NULL;
+			if (FTS_DP != ff->fts_info && NULL != dsec)
+				arch = stradd(ff->fts_name);
+			break;
+		default:
+			if (FTS_DP == ff->fts_info || use_all)
+				break;
+			WARNING(ff->fts_path + sz, base,
+				"Extraneous directory part");
+			fts_set(f, ff, FTS_SKIP);
+			break;
 		}
+	}
 
-		goto out;
+	fts_close(f);
+	if (errno) { 
+		perror(base);
+		close(fd);
+		return(0);
 	}
 
 	/*
-	 * Configure the directories we're going to scan.
-	 * If we have command-line arguments, use them.
-	 * If not, we use man(1)'s method (see mandocdb.8).
+	 * We want to exit in our base directory.
+	 * To do so, first return to the original cwd.
+	 * Then use chdir() relative to that.
 	 */
+	if (-1 == fchdir(fd)) {
+		perror(cwd);
+		close(fd);
+		return(0);
+	}
+	close(fd);
+	if (-1 == chdir(base)) {
+		perror(base);
+		return(0);
+	}
+	return(1);
+}
 
-	if (argc > 0) {
-		dirs.paths = mandoc_calloc(argc, sizeof(char *));
-		dirs.sz = argc;
-		for (i = 0; i < argc; i++) 
-			dirs.paths[i] = mandoc_strdup(argv[i]);
-	} else
-		manpath_parse(&dirs, dir, NULL, NULL);
+static int
+dirscan(size_t argc, char *argv[], const char *base)
+{
+	size_t		 i;
 
-	for (i = 0; i < dirs.sz; i++) {
-		/*
-		 * Go to the root of the respective manual tree.
-		 * This must work or no manuals may be found:
-		 * They are indexed relative to the root.
-		 */
+	if (-1 == chdir(base)) {
+		perror(base);
+		return(0);
+	}
 
-		if (-1 == chdir(dirs.paths[i])) {
-			perror(dirs.paths[i]);
-			exit((int)MANDOCLEVEL_SYSERR);
-		}
+	for (i = 0; i < argc; i++)
+		filescan(argv[i], base);
+
+	return(1);
+}
 
-		strlcpy(mdb.dbn, MANDOC_DB, MAXPATHLEN);
-		strlcpy(mdb.idxn, MANDOC_IDX, MAXPATHLEN);
+/*
+ * Add a file to the file vector.
+ * Do not verify that it's a "valid" looking manpage (we'll do that
+ * later).
+ *
+ * Try to infer the manual section, architecture, and page name from the
+ * path, assuming it looks like
+ *
+ *   [./]man*[/<arch>]/<name>.<section> 
+ *   or
+ *   [./]cat<section>[/<arch>]/<name>.0
+ *
+ * Stuff this information directly into the "of" vector.
+ * See dirtreescan() for the fts(3) version of this.
+ */
+static void
+filescan(const char *file, const char *base)
+{
+	const char	*sec, *arch, *name, *dsec;
+	char		*p, *start, *buf;
+	int		 dform;
+	struct stat	 st;
 
-		flags = O_CREAT | O_TRUNC | O_RDWR;
-		mdb.db = dbopen(mdb.dbn, flags, 0644, DB_BTREE, &info);
-		mdb.idx = dbopen(mdb.idxn, flags, 0644, DB_RECNO, NULL);
-
-		if (NULL == mdb.db) {
-			perror(mdb.dbn);
-			exit((int)MANDOCLEVEL_SYSERR);
-		} else if (NULL == mdb.idx) {
-			perror(mdb.idxn);
-			exit((int)MANDOCLEVEL_SYSERR);
-		}
+	assert(use_all);
 
-		/*
-		 * Search for manuals and fill the new database.
-		 */
+	if (0 == strncmp(file, "./", 2))
+		file += 2;
 
-		strlcpy(dirbuf, dirs.paths[i], MAXPATHLEN);
-	       	ofile_dirbuild(".", "", "", 0, &of, dirbuf);
+	if (-1 == stat(file, &st)) {
+		WARNING(file, base, "%s", strerror(errno));
+		return;
+	} else if ( ! S_ISREG(st.st_mode)) {
+		WARNING(file, base, "Not a regular file");
+		return;
+	} else if (inocheck(&st)) {
+		WARNING(file, base, "Duplicate file");
+		return;
+	}
 
-		if (NULL != of) {
-			index_merge(of, mp, &dbuf, &buf, hash,
-			     &mdb, &recs, dirs.paths[i]);
-			ofile_free(of);
-			of = NULL;
+	buf = mandoc_strdup(file);
+	start = buf;
+	sec = arch = name = dsec = NULL;
+	dform = FORM_NONE;
+
+	/*
+	 * First try to guess our directory structure.
+	 * If we find a separator, try to look for man* or cat*.
+	 * If we find one of these and what's underneath is a directory,
+	 * assume it's an architecture.
+	 */
+	if (NULL != (p = strchr(start, '/'))) {
+		*p++ = '\0';
+		if (0 == strncmp(start, "man", 3)) {
+			dform = FORM_SRC;
+			dsec = start + 3;
+		} else if (0 == strncmp(start, "cat", 3)) {
+			dform = FORM_CAT;
+			dsec = start + 3;
 		}
 
-		(*mdb.db->close)(mdb.db);
-		(*mdb.idx->close)(mdb.idx);
-		mdb.db = NULL;
-		mdb.idx = NULL;
+		start = p;
+		if (NULL != dsec && NULL != (p = strchr(start, '/'))) {
+			*p++ = '\0';
+			arch = start;
+			start = p;
+		} 
+	}
+
+	/*
+	 * Now check the file suffix.
+	 * Suffix of `.0' indicates a catpage, `.1-9' is a manpage.
+	 */
+	p = strrchr(start, '\0');
+	while (p-- > start && '/' != *p && '.' != *p)
+		/* Loop. */ ;
+
+	if ('.' == *p) {
+		*p++ = '\0';
+		sec = p;
 	}
 
-out:
-	if (mdb.db)
-		(*mdb.db->close)(mdb.db);
-	if (mdb.idx)
-		(*mdb.idx->close)(mdb.idx);
-	if (hash)
-		(*hash->close)(hash);
-	if (mp)
-		mparse_free(mp);
+	/*
+	 * Now try to parse the name.
+	 * Use the filename portion of the path.
+	 */
+	name = start;
+	if (NULL != (p = strrchr(start, '/'))) {
+		name = p + 1;
+		*p = '\0';
+	} 
 
-	manpath_free(&dirs);
-	ofile_free(of);
-	free(buf.cp);
-	free(dbuf.cp);
-	free(recs.stack);
+	ofadd(base, dform, file, name, dsec, sec, arch, &st);
+	free(buf);
+}
 
-	return(MANDOCLEVEL_OK);
+/*
+ * See fileadd(). 
+ */
+static int
+filecheck(const char *name)
+{
+	unsigned int	 index;
 
-usage:
-	fprintf(stderr,
-		"usage: %s [-av] [-C file] | dir ... | -t file ...\n"
-		"                        -d dir [file ...] | "
-		"-u dir [file ...]\n",
-		progname);
-
-	return((int)MANDOCLEVEL_BADARG);
-}
-
-void
-index_merge(const struct of *of, struct mparse *mp,
-		struct buf *dbuf, struct buf *buf, DB *hash,
-		struct mdb *mdb, struct recs *recs,
-		const char *basedir)
-{
-	recno_t		 rec;
-	int		 ch, skip;
-	DBT		 key, val;
-	DB		*files;  /* temporary file name table */
-	char	 	 emptystring[1] = {'\0'};
+	index = ohash_qlookup(&filenames, name);
+	return(NULL != ohash_find(&filenames, index));
+}
+
+/*
+ * Use the standard hashing mechanism (K&R) to see if the given filename
+ * already exists.
+ */
+static void
+fileadd(struct of *of)
+{
+	unsigned int	 index;
+
+	index = ohash_qlookup(&filenames, of->file);
+	assert(NULL == ohash_find(&filenames, index));
+	ohash_insert(&filenames, index, of);
+}
+
+/*
+ * See inoadd().
+ */
+static int
+inocheck(const struct stat *st)
+{
+	struct id	 id;
+	uint32_t	 hash;
+	unsigned int	 index;
+
+	memset(&id, 0, sizeof(id));
+	id.ino = hash = st->st_ino;
+	id.dev = st->st_dev;
+	index = ohash_lookup_memory
+		(&inos, (char *)&id, sizeof(id), hash);
+
+	return(NULL != ohash_find(&inos, index));
+}
+
+/*
+ * The hashing function used here is quite simple: truncate the inode
+ * number to a uint32_t.
+ * Then when we do the lookup, use both the inode and device identifier.
+ */
+static void
+inoadd(const struct stat *st, struct of *of)
+{
+	uint32_t	 hash;
+	unsigned int	 index;
+
+	of->id.ino = hash = st->st_ino;
+	of->id.dev = st->st_dev;
+	index = ohash_lookup_memory
+		(&inos, (char *)&of->id, sizeof(of->id), hash);
+
+	assert(NULL == ohash_find(&inos, index));
+	ohash_insert(&inos, index, of);
+}
+
+static void
+ofadd(const char *base, int dform, const char *file, 
+		const char *name, const char *dsec, const char *sec, 
+		const char *arch, const struct stat *st)
+{
+	struct of	*of;
+	int		 sform;
+
+	assert(NULL != file);
+
+	if (NULL == name)
+		name = "";
+	if (NULL == sec)
+		sec = "";
+	if (NULL == dsec)
+		dsec = "";
+	if (NULL == arch)
+		arch = "";
+
+	sform = FORM_NONE;
+	if (NULL != sec && *sec <= '9' && *sec >= '1')
+		sform = FORM_SRC;
+	else if (NULL != sec && *sec == '0') {
+		sec = dsec;
+		sform = FORM_CAT;
+	}
+
+	of = mandoc_calloc(1, sizeof(struct of));
+	strlcpy(of->file, file, MAXPATHLEN);
+	of->name = name;
+	of->sec = sec;
+	of->dsec = dsec;
+	of->arch = arch;
+	of->sform = sform;
+	of->dform = dform;
+	of->next = ofs;
+	ofs = of;
+
+	/*
+	 * Add to unique identifier hash.
+	 * Then add it to the filename hash, which ofmerge() consults
+	 * to prefer source manuals over catpages.
+	 */
+	inoadd(st, of);
+	fileadd(of);
+}
+
+static void
+offree(void)
+{
+	struct of	*of;
+
+	while (NULL != (of = ofs)) {
+		ofs = of->next;
+		free(of);
+	}
+}
+
+/*
+ * Run through the files in the global vector "ofs" and add them to the
+ * database specified in "base".
+ *
+ * This handles the parsing scheme itself, using the cues of directory
+ * and filename to determine whether the file is parsable or not.
+ */
+static int
+ofmerge(struct mchars *mc, struct mparse *mp, const char *base)
+{
+	int		 form;
+	size_t		 sz;
 	struct mdoc	*mdoc;
 	struct man	*man;
-	char		*p;
-	const char	*fn, *msec, *march, *mtitle;
-	uint64_t	 mask;
-	size_t		 sv;
-	unsigned	 seq;
-	uint64_t	 vbuf[2];
-	char		 type;
-
-	if (warnings) {
-		files = NULL;
-		hash_reset(&files);
-	}
-
-	rec = 0;
-	for (of = of->first; of; of = of->next) {
-		fn = of->fname;
+	char		 buf[MAXPATHLEN];
+	char		*bufp;
+	const char	*msec, *march, *mtitle, *cp;
+	struct of	*of;
+	enum mandoclevel lvl;
+
+	if (0 == dbopen(base, 0))
+		return(0);
 
+	for (of = ofs; NULL != of; of = of->next) {
 		/*
-		 * Try interpreting the file as mdoc(7) or man(7)
-		 * source code, unless it is already known to be
-		 * formatted.  Fall back to formatted mode.
+		 * If we're a catpage (as defined by our path), then see
+		 * if a manpage exists by the same name (ignoring the
+		 * suffix).
+		 * If it does, then we want to use it instead of our
+		 * own.
 		 */
+		if ( ! use_all && FORM_CAT == of->dform) {
+			sz = strlcpy(buf, of->file, MAXPATHLEN);
+			if (sz >= MAXPATHLEN) {
+				WARNING(of->file, base, 
+					"Filename too long");
+				continue;
+			}
+			bufp = strstr(buf, "cat");
+			assert(NULL != bufp);
+			memcpy(bufp, "man", 3);
+			if (NULL != (bufp = strrchr(buf, '.')))
+				*++bufp = '\0';
+			strlcat(buf, of->dsec, MAXPATHLEN);
+			if (filecheck(buf)) {
+				WARNING(of->file, base, "Man "
+					"source exists: %s", buf);
+				continue;
+			}
+		}
 
+		words = NULL;
 		mparse_reset(mp);
 		mdoc = NULL;
 		man = NULL;
+		form = 0;
+		msec = of->dsec;
+		march = of->arch;
+		mtitle = of->name;
 
-		if ((MANDOC_SRC & of->src_form ||
-		    ! (MANDOC_FORM & of->src_form)) &&
-		    MANDOCLEVEL_FATAL > mparse_readfd(mp, -1, fn))
-			mparse_result(mp, &mdoc, &man);
+		/*
+		 * Try interpreting the file as mdoc(7) or man(7)
+		 * source code, unless it is already known to be
+		 * formatted.  Fall back to formatted mode.
+		 */
+		if (FORM_SRC == of->dform || FORM_SRC == of->sform) {
+			lvl = mparse_readfd(mp, -1, of->file);
+			if (lvl < MANDOCLEVEL_FATAL)
+				mparse_result(mp, &mdoc, &man);
+		} 
 
 		if (NULL != mdoc) {
+			form = 1;
 			msec = mdoc_meta(mdoc)->msec;
 			march = mdoc_meta(mdoc)->arch;
-			if (NULL == march)
-				march = "";
 			mtitle = mdoc_meta(mdoc)->title;
 		} else if (NULL != man) {
+			form = 1;
 			msec = man_meta(man)->msec;
 			march = "";
 			mtitle = man_meta(man)->title;
-		} else {
-			msec = of->sec;
-			march = of->arch;
-			mtitle = of->title;
-		}
+		} 
+
+		if (NULL == msec) 
+			msec = "";
+		if (NULL == march) 
+			march = "";
+		if (NULL == mtitle) 
+			mtitle = "";
 
 		/*
 		 * Check whether the manual section given in a file
@@ -625,13 +1016,11 @@ index_merge(const struct of *of, struct 
 		 * section, like encrypt(1) = makekey(8).  Do not skip
 		 * manuals for such reasons.
 		 */
+		if ( ! use_all && form && strcasecmp(msec, of->dsec))
+			WARNING(of->file, base, "Section \"%s\" "
+				"manual in %s directory", 
+				msec, of->dsec);
 
-		skip = 0;
-		assert(of->sec);
-		assert(msec);
-		if (strcasecmp(msec, of->sec))
-			WARNING(fn, basedir, "Section \"%s\" manual "
-				"in \"%s\" directory", msec, of->sec);
 		/*
 		 * Manual page directories exist for each kernel
 		 * architecture as returned by machine(1).
@@ -646,415 +1035,363 @@ index_merge(const struct of *of, struct 
 		 * Thus, warn about architecture mismatches,
 		 * but don't skip manuals for this reason.
 		 */
-
-		assert(of->arch);
-		assert(march);
-		if (strcasecmp(march, of->arch))
-			WARNING(fn, basedir, "Architecture \"%s\" "
+		if ( ! use_all && strcasecmp(march, of->arch))
+			WARNING(of->file, base, "Architecture \"%s\" "
 				"manual in \"%s\" directory",
 				march, of->arch);
 
-		/*
-		 * By default, skip a file if the title given
-		 * in the file disagrees with the file name.
-		 * Do not warn, this happens for all MLINKs.
-		 */
-
-		assert(of->title);
-		assert(mtitle);
-		if (strcasecmp(mtitle, of->title))
-			skip = 1;
-
-		/*
-		 * Build a title string for the file.  If it matches
-		 * the location of the file, remember the title as
-		 * found; else, remember it as missing.
-		 */
+		putkey(of, of->name, TYPE_Nm);
 
-		if (warnings) {
-			buf->len = 0;
-			buf_appendb(buf, mtitle, strlen(mtitle));
-			buf_appendb(buf, "(", 1);
-			buf_appendb(buf, msec, strlen(msec));
-			if ('\0' != *march) {
-				buf_appendb(buf, "/", 1);
-				buf_appendb(buf, march, strlen(march));
-			}
-			buf_appendb(buf, ")", 2);
-			for (p = buf->cp; '\0' != *p; p++)
-				*p = tolower(*p);
-			key.data = buf->cp;
-			key.size = buf->len;
-			val.data = NULL;
-			val.size = 0;
-			if (0 == skip)
-				val.data = emptystring;
-			else {
-				ch = (*files->get)(files, &key, &val, 0);
-				if (ch < 0) {
-					perror("hash");
-					exit((int)MANDOCLEVEL_SYSERR);
-				} else if (ch > 0) {
-					val.data = (void *)fn;
-					val.size = strlen(fn) + 1;
-				} else
-					val.data = NULL;
-			}
-			if (NULL != val.data &&
-			    (*files->put)(files, &key, &val, 0) < 0) {
-				perror("hash");
-				exit((int)MANDOCLEVEL_SYSERR);
-			}
-		}
+		if (NULL != mdoc) {
+			if (NULL != (cp = mdoc_meta(mdoc)->name))
+				putkey(of, cp, TYPE_Nm);
+			parse_mdoc(of, mdoc_node(mdoc));
+		} else if (NULL != man)
+			parse_man(of, man_node(man));
+		else
+			parse_catpage(of, base);
 
-		if (skip && !use_all)
-			continue;
+		dbindex(mc, form, of, base);
+	}
 
-		/*
-		 * The index record value consists of a nil-terminated
-		 * filename, a nil-terminated manual section, and a
-		 * nil-terminated description.  Use the actual
-		 * location of the file, such that the user can find
-		 * it with man(1).  Since the description may not be
-		 * set, we set a sentinel to see if we're going to
-		 * write a nil byte in its place.
-		 */
+	dbclose(base, 0);
+	return(1);
+}
 
-		dbuf->len = 0;
-		type = mdoc ? 'd' : (man ? 'a' : 'c');
-		buf_appendb(dbuf, &type, 1);
-		buf_appendb(dbuf, fn, strlen(fn) + 1);
-		buf_appendb(dbuf, of->sec, strlen(of->sec) + 1);
-		buf_appendb(dbuf, of->title, strlen(of->title) + 1);
-		buf_appendb(dbuf, of->arch, strlen(of->arch) + 1);
+static void
+parse_catpage(struct of *of, const char *base)
+{
+	FILE		*stream;
+	char		*line, *p, *title;
+	size_t		 len, plen, titlesz;
 
-		sv = dbuf->len;
+	if (NULL == (stream = fopen(of->file, "r"))) {
+		WARNING(of->file, base, "%s", strerror(errno));
+		return;
+	}
 
-		/*
-		 * Collect keyword/mask pairs.
-		 * Each pair will become a new btree node.
-		 */
+	/* Skip to first blank line. */
 
-		hash_reset(&hash);
-		if (mdoc)
-			pmdoc_node(hash, buf, dbuf,
-				mdoc_node(mdoc), mdoc_meta(mdoc));
-		else if (man)
-			pman_node(hash, buf, dbuf, man_node(man));
-		else
-			pformatted(hash, buf, dbuf, of, basedir);
+	while (NULL != (line = fgetln(stream, &len)))
+		if ('\n' == *line)
+			break;
 
-		/* Test mode, do not access any database. */
-
-		if (NULL == mdb->db || NULL == mdb->idx)
-			continue;
-
-		/*
-		 * Make sure the file name is always registered
-		 * as an .Nm search key.
-		 */
-		buf->len = 0;
-		buf_append(buf, of->title);
-		hash_put(hash, buf, TYPE_Nm);
-
-		/*
-		 * Reclaim an empty index record, if available.
-		 * Use its record number for all new btree nodes.
-		 */
+	/*
+	 * Assume the first line that is not indented
+	 * is the first section header.  Skip to it.
+	 */
 
-		if (recs->cur > 0) {
-			recs->cur--;
-			rec = recs->stack[(int)recs->cur];
-		} else if (recs->last > 0) {
-			rec = recs->last;
-			recs->last = 0;
-		} else
-			rec++;
-		vbuf[1] = htobe64(rec);
+	while (NULL != (line = fgetln(stream, &len)))
+		if ('\n' != *line && ' ' != *line)
+			break;
+	
+	/*
+	 * Read up until the next section into a buffer.
+	 * Strip the leading whitespace from each read line and replace
+	 * its trailing newline with a space.
+	 * Ignore empty (whitespace-only) lines.
+	 */
 
-		/*
-		 * Copy from the in-memory hashtable of pending
-		 * keyword/mask pairs into the database.
-		 */
+	titlesz = 0;
+	title = NULL;
 
-		seq = R_FIRST;
-		while (0 == (ch = (*hash->seq)(hash, &key, &val, seq))) {
-			seq = R_NEXT;
-			assert(sizeof(uint64_t) == val.size);
-			memcpy(&mask, val.data, val.size);
-			vbuf[0] = htobe64(mask);
-			val.size = sizeof(vbuf);
-			val.data = &vbuf;
-			dbt_put(mdb->db, mdb->dbn, &key, &val);
-		}
-		if (ch < 0) {
-			perror("hash");
-			exit((int)MANDOCLEVEL_SYSERR);
+	while (NULL != (line = fgetln(stream, &len))) {
+		if (' ' != *line || '\n' != line[len - 1])
+			break;
+		while (len > 0 && isspace((unsigned char)*line)) {
+			line++;
+			len--;
 		}
+		if (len <= 1)
+			continue;
+		title = mandoc_realloc(title, titlesz + len);
+		memcpy(title + titlesz, line, len);
+		titlesz += len;
+		title[titlesz - 1] = ' ';
+	}
 
-		/*
-		 * Apply to the index.  If we haven't had a description
-		 * set, put an empty one in now.
-		 */
-
-		if (dbuf->len == sv)
-			buf_appendb(dbuf, "", 1);
+	/*
+	 * If no page content can be found, or the input line
+	 * is already the next section header, or there is no
+	 * trailing newline, warn and leave the page without a
+	 * description.
+	 */
 
-		key.data = &rec;
-		key.size = sizeof(recno_t);
+	if (NULL == title || '\0' == *title) {
+		WARNING(of->file, base, "Cannot find NAME section");
+		fclose(stream);
+		free(title);
+		return;
+	}
 
-		val.data = dbuf->cp;
-		val.size = dbuf->len;
+	title = mandoc_realloc(title, titlesz + 1);
+	title[titlesz] = '\0';
 
-		if (verb)
-			printf("%s: Adding to index: %s\n", basedir, fn);
+	/*
+	 * Skip to the first dash.
+	 * Use the remaining line as the description (no more than 70
+	 * bytes).
+	 */
 
-		dbt_put(mdb->idx, mdb->idxn, &key, &val);
+	if (NULL != (p = strstr(title, "- "))) {
+		for (p += 2; ' ' == *p || '\b' == *p; p++)
+			/* Skip to next word. */ ;
+	} else {
+		WARNING(of->file, base, "No dash in title line");
+		p = title;
 	}
 
-	/*
-	 * Iterate the remembered file titles and check that
-	 * all files can be found by their main title.
-	 */
+	plen = strlen(p);
 
-	if (warnings) {
-		seq = R_FIRST;
-		while (0 == (*files->seq)(files, &key, &val, seq)) {
-			seq = R_NEXT;
-			if (val.size)
-				WARNING((char *)val.data, basedir,
-					"Probably unreachable, title "
-					"is %s", (char *)key.data);
-		}
-		(*files->close)(files);
+	/* Strip backspace-encoding from line. */
+
+	while (NULL != (line = memchr(p, '\b', plen))) {
+		len = line - p;
+		if (0 == len) {
+			memmove(line, line + 1, plen--);
+			continue;
+		} 
+		memmove(line - 1, line + 1, plen - len);
+		plen -= 2;
 	}
+
+	of->desc = stradd(p);
+	putkey(of, p, TYPE_Nd);
+	fclose(stream);
+	free(title);
 }
 
 /*
- * Scan through all entries in the index file `idx' and prune those
- * entries in `ofile'.
- * Pruning consists of removing from `db', then invalidating the entry
- * in `idx' (zeroing its value size).
+ * Put a type/word pair into the word database for this particular file.
  */
 static void
-index_prune(const struct of *ofile, struct mdb *mdb,
-		struct recs *recs, const char *basedir)
+putkey(const struct of *of, const char *value, uint64_t type)
 {
-	const struct of	*of;
-	const char	*fn;
-	uint64_t	 vbuf[2];
-	unsigned	 seq, sseq;
-	DBT		 key, val;
-	int		 ch;
-
-	recs->cur = 0;
-	seq = R_FIRST;
-	while (0 == (ch = (*mdb->idx->seq)(mdb->idx, &key, &val, seq))) {
-		seq = R_NEXT;
-		assert(sizeof(recno_t) == key.size);
-		memcpy(&recs->last, key.data, key.size);
-
-		/* Deleted records are zero-sized.  Skip them. */
 
-		if (0 == val.size)
-			goto cont;
+	assert(NULL != value);
+	wordaddbuf(of, value, strlen(value), type);
+}
 
-		/*
-		 * Make sure we're sane.
-		 * Read past our mdoc/man/cat type to the next string,
-		 * then make sure it's bounded by a NUL.
-		 * Failing any of these, we go into our error handler.
-		 */
+/*
+ * Like putkey() but for unterminated strings.
+ */
+static void
+putkeys(const struct of *of, const char *value, int sz, uint64_t type)
+{
 
-		fn = (char *)val.data + 1;
-		if (NULL == memchr(fn, '\0', val.size - 1))
-			break;
+	wordaddbuf(of, value, sz, type);
+}
 
-		/*
-		 * Search for the file in those we care about.
-		 * XXX: build this into a tree.  Too slow.
-		 */
+/*
+ * Grok all nodes at or below a certain mdoc node into putkey().
+ */
+static void
+putmdockey(const struct of *of, const struct mdoc_node *n, uint64_t m)
+{
 
-		for (of = ofile->first; of; of = of->next)
-			if (0 == strcmp(fn, of->fname))
-				break;
+	for ( ; NULL != n; n = n->next) {
+		if (NULL != n->child)
+			putmdockey(of, n->child, m);
+		if (MDOC_TEXT == n->type)
+			putkey(of, n->string, m);
+	}
+}
 
-		if (NULL == of)
-			continue;
+static int
+parse_man(struct of *of, const struct man_node *n)
+{
+	const struct man_node *head, *body;
+	char		*start, *sv, *title;
+	char		 byte;
+	size_t		 sz, titlesz;
 
-		/*
-		 * Search through the keyword database, throwing out all
-		 * references to our file.
-		 */
+	if (NULL == n)
+		return(0);
 
-		sseq = R_FIRST;
-		while (0 == (ch = (*mdb->db->seq)(mdb->db,
-					&key, &val, sseq))) {
-			sseq = R_NEXT;
-			if (sizeof(vbuf) != val.size)
-				break;
+	/*
+	 * We're only searching for one thing: the first text child in
+	 * the BODY of a NAME section.  Since we don't keep track of
+	 * sections in -man, run some hoops to find out whether we're in
+	 * the correct section or not.
+	 */
 
-			memcpy(vbuf, val.data, val.size);
-			if (recs->last != betoh64(vbuf[1]))
-				continue;
+	if (MAN_BODY == n->type && MAN_SH == n->tok) {
+		body = n;
+		assert(body->parent);
+		if (NULL != (head = body->parent->head) &&
+				1 == head->nchild &&
+				NULL != (head = (head->child)) &&
+				MAN_TEXT == head->type &&
+				0 == strcmp(head->string, "NAME") &&
+				NULL != (body = body->child) &&
+				MAN_TEXT == body->type) {
 
-			if ((ch = (*mdb->db->del)(mdb->db,
-					&key, R_CURSOR)) < 0)
-				break;
-		}
+			title = NULL;
+			titlesz = 0;
 
-		if (ch < 0) {
-			perror(mdb->dbn);
-			exit((int)MANDOCLEVEL_SYSERR);
-		} else if (1 != ch) {
-			fprintf(stderr, "%s: corrupt database\n",
-					mdb->dbn);
-			exit((int)MANDOCLEVEL_SYSERR);
-		}
+			/*
+			 * Suck the entire NAME section into memory.
+			 * Yes, we might run away.
+			 * But too many manuals have big, spread-out
+			 * NAME sections over many lines.
+			 */
 
-		if (verb)
-			printf("%s: Deleting from index: %s\n",
-					basedir, fn);
+			for ( ; NULL != body; body = body->next) {
+				if (MAN_TEXT != body->type)
+					break;
+				if (0 == (sz = strlen(body->string)))
+					continue;
+				title = mandoc_realloc
+					(title, titlesz + sz + 1);
+				memcpy(title + titlesz, body->string, sz);
+				titlesz += sz + 1;
+				title[titlesz - 1] = ' ';
+			}
+			if (NULL == title)
+				return(1);
 
-		val.size = 0;
-		ch = (*mdb->idx->put)(mdb->idx, &key, &val, R_CURSOR);
+			title = mandoc_realloc(title, titlesz + 1);
+			title[titlesz] = '\0';
 
-		if (ch < 0)
-			break;
-cont:
-		if (recs->cur >= recs->size) {
-			recs->size += MANDOC_SLOP;
-			recs->stack = mandoc_realloc(recs->stack,
-					recs->size * sizeof(recno_t));
-		}
+			/* Skip leading space.  */
 
-		recs->stack[(int)recs->cur] = recs->last;
-		recs->cur++;
-	}
+			sv = title;
+			while (isspace((unsigned char)*sv))
+				sv++;
 
-	if (ch < 0) {
-		perror(mdb->idxn);
-		exit((int)MANDOCLEVEL_SYSERR);
-	} else if (1 != ch) {
-		fprintf(stderr, "%s: corrupt index\n", mdb->idxn);
-		exit((int)MANDOCLEVEL_SYSERR);
-	}
+			if (0 == (sz = strlen(sv))) {
+				free(title);
+				return(1);
+			}
 
-	recs->last++;
-}
+			/* Erase trailing space. */
 
-/*
- * Grow the buffer (if necessary) and copy in a binary string.
- */
-static void
-buf_appendb(struct buf *buf, const void *cp, size_t sz)
-{
+			start = &sv[sz - 1];
+			while (start > sv && isspace((unsigned char)*start))
+				*start-- = '\0';
 
-	/* Overshoot by MANDOC_BUFSZ. */
+			if (start == sv) {
+				free(title);
+				return(1);
+			}
 
-	while (buf->len + sz >= buf->size) {
-		buf->size = buf->len + sz + MANDOC_BUFSZ;
-		buf->cp = mandoc_realloc(buf->cp, buf->size);
-	}
+			start = sv;
 
-	memcpy(buf->cp + (int)buf->len, cp, sz);
-	buf->len += sz;
-}
+			/* 
+			 * Go through a special heuristic dance here.
+			 * Conventionally, one or more manual names are
+			 * comma-specified prior to a whitespace, then a
+			 * dash, then a description.  Try to puzzle out
+			 * the name parts here.
+			 */
 
-/*
- * Append a nil-terminated string to the buffer.  
- * This can be invoked multiple times.  
- * The buffer string will be nil-terminated.
- * If invoked multiple times, a space is put between strings.
- */
-static void
-buf_append(struct buf *buf, const char *cp)
-{
-	size_t		 sz;
+			for ( ;; ) {
+				sz = strcspn(start, " ,");
+				if ('\0' == start[sz])
+					break;
 
-	if (0 == (sz = strlen(cp)))
-		return;
+				byte = start[sz];
+				start[sz] = '\0';
 
-	if (buf->len)
-		buf->cp[(int)buf->len - 1] = ' ';
+				putkey(of, start, TYPE_Nm);
 
-	buf_appendb(buf, cp, sz + 1);
-}
+				if (' ' == byte) {
+					start += sz + 1;
+					break;
+				}
 
-/*
- * Recursively add all text from a given node.  
- * This is optimised for general mdoc nodes in this context, which do
- * not consist of subexpressions and having a recursive call for n->next
- * would be wasteful.
- * The "f" variable should be 0 unless called from pmdoc_Nd for the
- * description buffer, which does not start at the beginning of the
- * buffer.
- */
-static void
-buf_appendmdoc(struct buf *buf, const struct mdoc_node *n, int f)
-{
+				assert(',' == byte);
+				start += sz + 1;
+				while (' ' == *start)
+					start++;
+			}
 
-	for ( ; n; n = n->next) {
-		if (n->child)
-			buf_appendmdoc(buf, n->child, f);
-
-		if (MDOC_TEXT == n->type && f) {
-			f = 0;
-			buf_appendb(buf, n->string, 
-					strlen(n->string) + 1);
-		} else if (MDOC_TEXT == n->type)
-			buf_append(buf, n->string);
+			if (sv == start) {
+				putkey(of, start, TYPE_Nm);
+				free(title);
+				return(1);
+			}
 
-	}
-}
+			while (isspace((unsigned char)*start))
+				start++;
 
-static void
-hash_reset(DB **db)
-{
-	DB		*hash;
+			if (0 == strncmp(start, "-", 1))
+				start += 1;
+			else if (0 == strncmp(start, "\\-\\-", 4))
+				start += 4;
+			else if (0 == strncmp(start, "\\-", 2))
+				start += 2;
+			else if (0 == strncmp(start, "\\(en", 4))
+				start += 4;
+			else if (0 == strncmp(start, "\\(em", 4))
+				start += 4;
 
-	if (NULL != (hash = *db))
-		(*hash->close)(hash);
+			while (' ' == *start)
+				start++;
 
-	*db = dbopen(NULL, O_CREAT|O_RDWR, 0644, DB_HASH, NULL);
-	if (NULL == *db) {
-		perror("hash");
-		exit((int)MANDOCLEVEL_SYSERR);
+			assert(NULL == of->desc);
+			of->desc = stradd(start);
+			putkey(of, start, TYPE_Nd);
+			free(title);
+			return(1);
+		}
 	}
-}
 
-/* ARGSUSED */
-static int
-pmdoc_head(MDOC_ARGS)
-{
+	for (n = n->child; n; n = n->next)
+		if (parse_man(of, n))
+			return(1);
 
-	return(MDOC_HEAD == n->type);
+	return(0);
 }
 
-/* ARGSUSED */
-static int
-pmdoc_body(MDOC_ARGS)
+static void
+parse_mdoc(struct of *of, const struct mdoc_node *n)
 {
 
-	return(MDOC_BODY == n->type);
+	assert(NULL != n);
+	for (n = n->child; NULL != n; n = n->next) {
+		switch (n->type) {
+		case (MDOC_ELEM):
+			/* FALLTHROUGH */
+		case (MDOC_BLOCK):
+			/* FALLTHROUGH */
+		case (MDOC_HEAD):
+			/* FALLTHROUGH */
+		case (MDOC_BODY):
+			/* FALLTHROUGH */
+		case (MDOC_TAIL):
+			if (NULL != mdocs[n->tok].fp)
+			       if (0 == (*mdocs[n->tok].fp)(of, n))
+				       break;
+
+			if (MDOCF_CHILD & mdocs[n->tok].flags)
+				putmdockey(of, n->child, mdocs[n->tok].mask);
+			break;
+		default:
+			assert(MDOC_ROOT != n->type);
+			continue;
+		}
+		if (NULL != n->child)
+			parse_mdoc(of, n);
+	}
 }
 
-/* ARGSUSED */
 static int
-pmdoc_Fd(MDOC_ARGS)
+parse_mdoc_Fd(struct of *of, const struct mdoc_node *n)
 {
 	const char	*start, *end;
 	size_t		 sz;
 
-	if (SEC_SYNOPSIS != n->sec)
-		return(0);
-	if (NULL == (n = n->child) || MDOC_TEXT != n->type)
+	if (SEC_SYNOPSIS != n->sec ||
+			NULL == (n = n->child) || 
+			MDOC_TEXT != n->type)
 		return(0);
 
 	/*
 	 * Only consider those `Fd' macro fields that begin with an
 	 * "inclusion" token (versus, e.g., #define).
 	 */
+
 	if (strcmp("#include", n->string))
 		return(0);
 
@@ -1078,120 +1415,114 @@ pmdoc_Fd(MDOC_ARGS)
 		end--;
 
 	assert(end >= start);
-
-	buf_appendb(buf, start, (size_t)(end - start + 1));
-	buf_appendb(buf, "", 1);
+	putkeys(of, start, end - start + 1, TYPE_In);
 	return(1);
 }
 
-/* ARGSUSED */
 static int
-pmdoc_In(MDOC_ARGS)
+parse_mdoc_In(struct of *of, const struct mdoc_node *n)
 {
 
-	if (NULL == n->child || MDOC_TEXT != n->child->type)
+	if (NULL == n->child || MDOC_TEXT != n->child->type)
 		return(0);
 
-	buf_append(buf, n->child->string);
+	putkey(of, n->child->string, TYPE_In);
 	return(1);
 }
 
-/* ARGSUSED */
 static int
-pmdoc_Fn(MDOC_ARGS)
+parse_mdoc_Fn(struct of *of, const struct mdoc_node *n)
 {
-	struct mdoc_node *nn;
 	const char	*cp;
 
-	nn = n->child;
-
-	if (NULL == nn || MDOC_TEXT != nn->type)
+	if (NULL == (n = n->child) || MDOC_TEXT != n->type)
 		return(0);
 
-	/* .Fn "struct type *name" "char *arg" */
-
-	cp = strrchr(nn->string, ' ');
-	if (NULL == cp)
-		cp = nn->string;
+	/* 
+	 * Parse: .Fn "struct type *name" "char *arg".
+	 * First strip away pointer symbol. 
+	 * Then store the function name, then type.
+	 * Finally, store the arguments. 
+	 */
 
-	/* Strip away pointer symbol. */
+	if (NULL == (cp = strrchr(n->string, ' ')))
+		cp = n->string;
 
 	while ('*' == *cp)
 		cp++;
 
-	/* Store the function name. */
+	putkey(of, cp, TYPE_Fn);
 
-	buf_append(buf, cp);
-	hash_put(hash, buf, TYPE_Fn);
+	if (n->string < cp)
+		putkeys(of, n->string, cp - n->string, TYPE_Ft);
 
-	/* Store the function type. */
-
-	if (nn->string < cp) {
-		buf->len = 0;
-		buf_appendb(buf, nn->string, cp - nn->string);
-		buf_appendb(buf, "", 1);
-		hash_put(hash, buf, TYPE_Ft);
-	}
-
-	/* Store the arguments. */
-
-	for (nn = nn->next; nn; nn = nn->next) {
-		if (MDOC_TEXT != nn->type)
-			continue;
-		buf->len = 0;
-		buf_append(buf, nn->string);
-		hash_put(hash, buf, TYPE_Fa);
-	}
+	for (n = n->next; NULL != n; n = n->next)
+		if (MDOC_TEXT == n->type)
+			putkey(of, n->string, TYPE_Fa);
 
 	return(0);
 }
 
-/* ARGSUSED */
 static int
-pmdoc_St(MDOC_ARGS)
+parse_mdoc_St(struct of *of, const struct mdoc_node *n)
 {
 
 	if (NULL == n->child || MDOC_TEXT != n->child->type)
 		return(0);
 
-	buf_append(buf, n->child->string);
+	putkey(of, n->child->string, TYPE_St);
 	return(1);
 }
 
-/* ARGSUSED */
 static int
-pmdoc_Xr(MDOC_ARGS)
+parse_mdoc_Xr(struct of *of, const struct mdoc_node *n)
 {
 
 	if (NULL == (n = n->child))
 		return(0);
 
-	buf_appendb(buf, n->string, strlen(n->string));
-
-	if (NULL != (n = n->next)) {
-		buf_appendb(buf, ".", 1);
-		buf_appendb(buf, n->string, strlen(n->string) + 1);
-	} else
-		buf_appendb(buf, ".", 2);
-
+	putkey(of, n->string, TYPE_Xr);
 	return(1);
 }
 
-/* ARGSUSED */
 static int
-pmdoc_Nd(MDOC_ARGS)
+parse_mdoc_Nd(struct of *of, const struct mdoc_node *n)
 {
+	size_t		 sz;
+	char		*sv, *desc;
 
 	if (MDOC_BODY != n->type)
 		return(0);
 
-	buf_appendmdoc(dbuf, n->child, 1);
+	/*
+	 * Special-case the `Nd' because we need to put the description
+	 * into the document table.
+	 */
+
+	desc = NULL;
+	for (n = n->child; NULL != n; n = n->next) {
+		if (MDOC_TEXT == n->type) {
+			sz = strlen(n->string) + 1;
+			if (NULL != (sv = desc))
+				sz += strlen(desc) + 1;
+			desc = mandoc_realloc(desc, sz);
+			if (NULL != sv)
+				strlcat(desc, " ", sz);
+			else
+				*desc = '\0';
+			strlcat(desc, n->string, sz);
+		}
+		if (NULL != n->child)
+			parse_mdoc_Nd(of, n);
+	}
+
+	of->desc = NULL != desc ? stradd(desc) : NULL;
+	free(desc);
 	return(1);
 }
 
-/* ARGSUSED */
 static int
-pmdoc_Nm(MDOC_ARGS)
+parse_mdoc_Nm(struct of *of, const struct mdoc_node *n)
 {
 
 	if (SEC_NAME == n->sec)
@@ -1199,711 +1530,491 @@ pmdoc_Nm(MDOC_ARGS)
 	else if (SEC_SYNOPSIS != n->sec || MDOC_HEAD != n->type)
 		return(0);
 
-	if (NULL == n->child)
-		buf_append(buf, m->name);
-
 	return(1);
 }
 
-/* ARGSUSED */
 static int
-pmdoc_Sh(MDOC_ARGS)
+parse_mdoc_Sh(struct of *of, const struct mdoc_node *n)
 {
 
 	return(SEC_CUSTOM == n->sec && MDOC_HEAD == n->type);
 }
 
-static void
-hash_put(DB *db, const struct buf *buf, uint64_t mask)
+static int
+parse_mdoc_head(struct of *of, const struct mdoc_node *n)
 {
-	uint64_t	 oldmask;
-	DBT		 key, val;
-	int		 rc;
 
-	if (buf->len < 2)
-		return;
+	return(MDOC_HEAD == n->type);
+}
 
-	key.data = buf->cp;
-	key.size = buf->len;
+static int
+parse_mdoc_body(struct of *of, const struct mdoc_node *n)
+{
 
-	if ((rc = (*db->get)(db, &key, &val, 0)) < 0) {
-		perror("hash");
-		exit((int)MANDOCLEVEL_SYSERR);
-	} else if (0 == rc) {
-		assert(sizeof(uint64_t) == val.size);
-		memcpy(&oldmask, val.data, val.size);
-		mask |= oldmask;
-	}
-
-	val.data = &mask;
-	val.size = sizeof(uint64_t); 
-
-	if ((rc = (*db->put)(db, &key, &val, 0)) < 0) {
-		perror("hash");
-		exit((int)MANDOCLEVEL_SYSERR);
-	} 
+	return(MDOC_BODY == n->type);
 }
 
-static void
-dbt_put(DB *db, const char *dbn, DBT *key, DBT *val)
+/*
+ * See straddbuf().
+ */
+static char *
+stradd(const char *cp)
 {
 
-	assert(key->size);
-	assert(val->size);
-
-	if (0 == (*db->put)(db, key, val, 0))
-		return;
-	
-	perror(dbn);
-	exit((int)MANDOCLEVEL_SYSERR);
-	/* NOTREACHED */
+	return(straddbuf(cp, strlen(cp)));
 }
 
 /*
- * Call out to per-macro handlers after clearing the persistent database
- * key.  If the macro sets the database key, flush it to the database.
+ * This looks up or adds a string to the string table.
+ * The string table is a table of all strings encountered during parse
+ * or file scan.
+ * In using it, we avoid having thousands of (e.g.) "cat1" string
+ * allocations for the "of" table.
+ * We also have a layer atop the string table for keeping track of words
+ * in a parse sequence (see wordaddbuf()).
  */
-static void
-pmdoc_node(MDOC_ARGS)
+static char *
+straddbuf(const char *cp, size_t sz)
 {
+	struct str	*s;
+	unsigned int	 index;
+	const char	*end;
 
-	if (NULL == n)
-		return;
+	if (NULL != (s = hashget(cp, sz)))
+		return(s->key);
 
-	switch (n->type) {
-	case (MDOC_HEAD):
-		/* FALLTHROUGH */
-	case (MDOC_BODY):
-		/* FALLTHROUGH */
-	case (MDOC_TAIL):
-		/* FALLTHROUGH */
-	case (MDOC_BLOCK):
-		/* FALLTHROUGH */
-	case (MDOC_ELEM):
-		buf->len = 0;
+	s = mandoc_calloc(sizeof(struct str) + sz, 1);
+	memcpy(s->key, cp, sz);
 
-		/*
-		 * Both NULL handlers and handlers returning true
-		 * request using the data.  Only skip the element
-		 * when the handler returns false.
-		 */
+	end = cp + sz;
+	index = ohash_qlookupi(&strings, cp, &end);
+	assert(NULL == ohash_find(&strings, index));
+	ohash_insert(&strings, index, s);
+	return(s->key);
+}
 
-		if (NULL != mdocs[n->tok].fp &&
-		    0 == (*mdocs[n->tok].fp)(hash, buf, dbuf, n, m))
-			break;
+static struct str *
+hashget(const char *cp, size_t sz)
+{
+	unsigned int	 index;
+	const char	*end;
 
-		/*
-		 * For many macros, use the text from all children.
-		 * Set zero flags for macros not needing this.
-		 * In that case, the handler must fill the buffer.
-		 */
+	end = cp + sz;
+	index = ohash_qlookupi(&strings, cp, &end);
+	return(ohash_find(&strings, index));
+}
 
-		if (MDOCF_CHILD & mdocs[n->tok].flags)
-			buf_appendmdoc(buf, n->child, 0);
+/*
+ * Add a word to the current parse sequence.
+ * Within the hashtable of strings, we maintain a list of strings that
+ * are currently indexed.
+ * Each of these ("words") has a bitmask modified within the parse.
+ * When we finish a parse, we'll dump the list, then remove the head
+ * entry -- since the next parse will have a new "of", it can keep track
+ * of its entries without conflict.
+ */
+static void
+wordaddbuf(const struct of *of, 
+		const char *cp, size_t sz, uint64_t v)
+{
+	struct str	*s;
+	unsigned int	 index;
+	const char	*end;
 
-		/*
-		 * Cover the most common case:
-		 * Automatically stage one string per element.
-		 * Set a zero mask for macros not needing this.
-		 * Additional staging can be done in the handler.
-		 */
+	if (0 == sz)
+		return;
+
+	s = hashget(cp, sz);
 
-		if (mdocs[n->tok].mask)
-			hash_put(hash, buf, mdocs[n->tok].mask);
-		break;
-	default:
-		break;
+	if (NULL != s && of == s->of) {
+		s->mask |= v;
+		return;
+	} else if (NULL == s) {
+		s = mandoc_calloc(sizeof(struct str) + sz, 1);
+		memcpy(s->key, cp, sz);
+		end = cp + sz;
+		index = ohash_qlookupi(&strings, cp, &end);
+		assert(NULL == ohash_find(&strings, index));
+		ohash_insert(&strings, index, s);
 	}
 
-	pmdoc_node(hash, buf, dbuf, n->child, m);
-	pmdoc_node(hash, buf, dbuf, n->next, m);
+	s->next = words;
+	s->of = of;
+	s->mask = v;
+	words = s;
 }
 
-static int
-pman_node(MAN_ARGS)
+/*
+ * Take a Unicode codepoint and produce its UTF-8 encoding.
+ * This isn't the best way to do this, but it works.
+ * The magic numbers are from the UTF-8 packaging.
+ * They're not as scary as they seem: read the UTF-8 spec for details.
+ */
+static size_t
+utf8(unsigned int cp, char out[7])
 {
-	const struct man_node *head, *body;
-	char		*start, *sv, *title;
-	size_t		 sz, titlesz;
+	size_t		 rc;
 
-	if (NULL == n)
+	rc = 0;
+	if (cp <= 0x0000007F) {
+		rc = 1;
+		out[0] = (char)cp;
+	} else if (cp <= 0x000007FF) {
+		rc = 2;
+		out[0] = (cp >> 6  & 31) | 192;
+		out[1] = (cp       & 63) | 128;
+	} else if (cp <= 0x0000FFFF) {
+		rc = 3;
+		out[0] = (cp >> 12 & 15) | 224;
+		out[1] = (cp >> 6  & 63) | 128;
+		out[2] = (cp       & 63) | 128;
+	} else if (cp <= 0x001FFFFF) {
+		rc = 4;
+		out[0] = (cp >> 18 &  7) | 240;
+		out[1] = (cp >> 12 & 63) | 128;
+		out[2] = (cp >> 6  & 63) | 128;
+		out[3] = (cp       & 63) | 128;
+	} else if (cp <= 0x03FFFFFF) {
+		rc = 5;
+		out[0] = (cp >> 24 &  3) | 248;
+		out[1] = (cp >> 18 & 63) | 128;
+		out[2] = (cp >> 12 & 63) | 128;
+		out[3] = (cp >> 6  & 63) | 128;
+		out[4] = (cp       & 63) | 128;
+	} else if (cp <= 0x7FFFFFFF) {
+		rc = 6;
+		out[0] = (cp >> 30 &  1) | 252;
+		out[1] = (cp >> 24 & 63) | 128;
+		out[2] = (cp >> 18 & 63) | 128;
+		out[3] = (cp >> 12 & 63) | 128;
+		out[4] = (cp >> 6  & 63) | 128;
+		out[5] = (cp       & 63) | 128;
+	} else
 		return(0);
 
-	/*
-	 * We're only searching for one thing: the first text child in
-	 * the BODY of a NAME section.  Since we don't keep track of
-	 * sections in -man, run some hoops to find out whether we're in
-	 * the correct section or not.
-	 */
-
-	if (MAN_BODY == n->type && MAN_SH == n->tok) {
-		body = n;
-		assert(body->parent);
-		if (NULL != (head = body->parent->head) &&
-				1 == head->nchild &&
-				NULL != (head = (head->child)) &&
-				MAN_TEXT == head->type &&
-				0 == strcmp(head->string, "NAME") &&
-				NULL != (body = body->child) &&
-				MAN_TEXT == body->type) {
-
-			title = NULL;
-			titlesz = 0;
-			/*
-			 * Suck the entire NAME section into memory.
-			 * Yes, we might run away.
-			 * But too many manuals have big, spread-out
-			 * NAME sections over many lines.
-			 */
-			for ( ; NULL != body; body = body->next) {
-				if (MAN_TEXT != body->type)
-					break;
-				if (0 == (sz = strlen(body->string)))
-					continue;
-				title = mandoc_realloc
-					(title, titlesz + sz + 1);
-				memcpy(title + titlesz, body->string, sz);
-				titlesz += sz + 1;
-				title[(int)titlesz - 1] = ' ';
-			}
-			if (NULL == title)
-				return(0);
-
-			title = mandoc_realloc(title, titlesz + 1);
-			title[(int)titlesz] = '\0';
-
-			/* Skip leading space.  */
-
-			sv = title;
-			while (isspace((unsigned char)*sv))
-				sv++;
-
-			if (0 == (sz = strlen(sv))) {
-				free(title);
-				return(0);
-			}
-
-			/* Erase trailing space. */
-
-			start = &sv[sz - 1];
-			while (start > sv && isspace((unsigned char)*start))
-				*start-- = '\0';
-
-			if (start == sv) {
-				free(title);
-				return(0);
-			}
-
-			start = sv;
-
-			/* 
-			 * Go through a special heuristic dance here.
-			 * This is why -man manuals are great!
-			 * (I'm being sarcastic: my eyes are bleeding.)
-			 * Conventionally, one or more manual names are
-			 * comma-specified prior to a whitespace, then a
-			 * dash, then a description.  Try to puzzle out
-			 * the name parts here.
-			 */
-
-			for ( ;; ) {
-				sz = strcspn(start, " ,");
-				if ('\0' == start[(int)sz])
-					break;
-
-				buf->len = 0;
-				buf_appendb(buf, start, sz);
-				buf_appendb(buf, "", 1);
-
-				hash_put(hash, buf, TYPE_Nm);
-
-				if (' ' == start[(int)sz]) {
-					start += (int)sz + 1;
-					break;
-				}
-
-				assert(',' == start[(int)sz]);
-				start += (int)sz + 1;
-				while (' ' == *start)
-					start++;
-			}
-
-			buf->len = 0;
-
-			if (sv == start) {
-				buf_append(buf, start);
-				free(title);
-				return(1);
-			}
-
-			while (isspace((unsigned char)*start))
-				start++;
-
-			if (0 == strncmp(start, "-", 1))
-				start += 1;
-			else if (0 == strncmp(start, "\\-\\-", 4))
-				start += 4;
-			else if (0 == strncmp(start, "\\-", 2))
-				start += 2;
-			else if (0 == strncmp(start, "\\(en", 4))
-				start += 4;
-			else if (0 == strncmp(start, "\\(em", 4))
-				start += 4;
-
-			while (' ' == *start)
-				start++;
-
-			sz = strlen(start) + 1;
-			buf_appendb(dbuf, start, sz);
-			buf_appendb(buf, start, sz);
-
-			hash_put(hash, buf, TYPE_Nd);
-			free(title);
-		}
-	}
-
-	for (n = n->child; n; n = n->next)
-		if (pman_node(hash, buf, dbuf, n))
-			return(1);
-
-	return(0);
+	out[rc] = '\0';
+	return(rc);
 }
 
 /*
- * Parse a formatted manual page.
- * By necessity, this involves rather crude guesswork.
+ * Store the UTF-8 version of a key, or alias the pointer if the key has
+ * no UTF-8 transcription marks in it.
  */
 static void
-pformatted(DB *hash, struct buf *buf, struct buf *dbuf, 
-		const struct of *of, const char *basedir)
+utf8key(struct mchars *mc, struct str *key)
 {
-	FILE		*stream;
-	char		*line, *p, *title;
-	size_t		 len, plen, titlesz;
+	size_t		 sz, bsz, pos;
+	char		 utfbuf[7], res[5];
+	char		*buf;
+	const char	*seq, *cpp, *val;
+	int		 len, u;
+	enum mandoc_esc	 esc;
+
+	assert(NULL == key->utf8);
+
+	res[0] = '\\';
+	res[1] = '\t';
+	res[2] = ASCII_NBRSP;
+	res[3] = ASCII_HYPH;
+	res[4] = '\0';
 
-	if (NULL == (stream = fopen(of->fname, "r"))) {
-		WARNING(of->fname, basedir, "%s", strerror(errno));
-		return;
-	}
+	val = key->key;
+	bsz = strlen(val);
 
 	/*
-	 * Always use the title derived from the filename up front,
-	 * do not even try to find it in the file.  This also makes
-	 * sure we don't end up with an orphan index record, even if
-	 * the file content turns out to be completely unintelligible.
+	 * Pre-check: if we have no stop-characters, then set the
+	 * pointer as ourselves and get out of here.
 	 */
+	if (strcspn(val, res) == bsz) {
+		key->utf8 = key->key;
+		return;
+	} 
 
-	buf->len = 0;
-	buf_append(buf, of->title);
-	hash_put(hash, buf, TYPE_Nm);
-
-	/* Skip to first blank line. */
+	/* Pre-allocate by the length of the input */
 
-	while (NULL != (line = fgetln(stream, &len)))
-		if ('\n' == *line)
-			break;
+	buf = mandoc_malloc(++bsz);
+	pos = 0;
 
-	/*
-	 * Assume the first line that is not indented
-	 * is the first section header.  Skip to it.
-	 */
+	while ('\0' != *val) {
+		/*
+		 * Halt on the first escape sequence.
+		 * This also halts on the end of string, in which case
+		 * we just copy, fallthrough, and exit the loop.
+		 */
+		if ((sz = strcspn(val, res)) > 0) {
+			memcpy(&buf[pos], val, sz);
+			pos += sz;
+			val += sz;
+		}
 
-	while (NULL != (line = fgetln(stream, &len)))
-		if ('\n' != *line && ' ' != *line)
+		if (ASCII_HYPH == *val) {
+			buf[pos++] = '-';
+			val++;
+			continue;
+		} else if ('\t' == *val || ASCII_NBRSP == *val) {
+			buf[pos++] = ' ';
+			val++;
+			continue;
+		} else if ('\\' != *val)
 			break;
-	
-	/*
-	 * Read up until the next section into a buffer.
-	 * Strip the leading and trailing newline from each read line,
-	 * appending a trailing space.
-	 * Ignore empty (whitespace-only) lines.
-	 */
 
-	titlesz = 0;
-	title = NULL;
+		/* Read past the backslash. */
 
-	while (NULL != (line = fgetln(stream, &len))) {
-		if (' ' != *line || '\n' != line[(int)len - 1])
-			break;
-		while (len > 0 && isspace((unsigned char)*line)) {
-			line++;
-			len--;
-		}
-		if (1 == len)
-			continue;
-		title = mandoc_realloc(title, titlesz + len);
-		memcpy(title + titlesz, line, len);
-		titlesz += len;
-		title[(int)titlesz - 1] = ' ';
-	}
+		val++;
+		u = 0;
 
-	/*
-	 * If no page content can be found, or the input line
-	 * is already the next section header, or there is no
-	 * trailing newline, reuse the page title as the page
-	 * description.
-	 */
-
-	if (NULL == title || '\0' == *title) {
-		WARNING(of->fname, basedir, 
-			"Cannot find NAME section");
-		buf_appendb(dbuf, buf->cp, buf->size);
-		hash_put(hash, buf, TYPE_Nd);
-		fclose(stream);
-		free(title);
-		return;
-	}
+		/*
+		 * Parse the escape sequence and see if it's a
+		 * predefined character or special character.
+		 */
+		esc = mandoc_escape
+			((const char **)&val, &seq, &len);
+		if (ESCAPE_ERROR == esc)
+			break;
 
-	title = mandoc_realloc(title, titlesz + 1);
-	title[(int)titlesz] = '\0';
+		if (ESCAPE_SPECIAL != esc)
+			continue;
+		if (0 == (u = mchars_spec2cp(mc, seq, len)))
+			continue;
 
-	/*
-	 * Skip to the first dash.
-	 * Use the remaining line as the description (no more than 70
-	 * bytes).
-	 */
+		/*
+		 * If we have a Unicode codepoint, try to convert that
+		 * to a UTF-8 byte string.
+		 */
+		cpp = utfbuf;
+		if (0 == (sz = utf8(u, utfbuf)))
+			continue;
 
-	if (NULL != (p = strstr(title, "- "))) {
-		for (p += 2; ' ' == *p || '\b' == *p; p++)
-			/* Skip to next word. */ ;
-	} else {
-		WARNING(of->fname, basedir, 
-			"No dash in title line");
-		p = title;
-	}
+		/* Copy the rendered glyph into the stream. */
 
-	plen = strlen(p);
+		sz = strlen(cpp);
+		bsz += sz;
 
-	/* Strip backspace-encoding from line. */
+		buf = mandoc_realloc(buf, bsz);
 
-	while (NULL != (line = memchr(p, '\b', plen))) {
-		len = line - p;
-		if (0 == len) {
-			memmove(line, line + 1, plen--);
-			continue;
-		} 
-		memmove(line - 1, line + 1, plen - len);
-		plen -= 2;
+		memcpy(&buf[pos], cpp, sz);
+		pos += sz;
 	}
 
-	buf_appendb(dbuf, p, plen + 1);
-	buf->len = 0;
-	buf_appendb(buf, p, plen + 1);
-	hash_put(hash, buf, TYPE_Nd);
-	fclose(stream);
-	free(title);
+	buf[pos] = '\0';
+	key->utf8 = buf;
 }
 
+/*
+ * Flush the current page's terms (and their bits) into the database.
+ * Wrap the entire set of additions in a transaction to make sqlite be a
+ * little faster.
+ * Also, UTF-8-encode the description at the last possible moment.
+ */
 static void
-ofile_argbuild(int argc, char *argv[], 
-		struct of **of, const char *basedir)
+dbindex(struct mchars *mc, int form, 
+		const struct of *of, const char *base)
 {
-	char		 buf[MAXPATHLEN];
-	const char	*sec, *arch, *title;
-	char		*p;
-	int		 i, src_form;
-	struct of	*nof;
+	struct str	*key;
+	const char	*desc;
+	int64_t		 recno;
 
-	for (i = 0; i < argc; i++) {
+	DEBUG(of->file, base, "Adding to index");
 
-		/*
-		 * Try to infer the manual section, architecture and
-		 * page title from the path, assuming it looks like
-		 *   man*[/<arch>]/<title>.<section>   or
-		 *   cat<section>[/<arch>]/<title>.0
-		 */
+	if (nodb)
+		return;
 
-		if (strlcpy(buf, argv[i], sizeof(buf)) >= sizeof(buf)) {
-			fprintf(stderr, "%s: Path too long\n", argv[i]);
-			continue;
-		}
-		sec = arch = title = "";
-		src_form = 0;
-		p = strrchr(buf, '\0');
-		while (p-- > buf) {
-			if ('\0' == *sec && '.' == *p) {
-				sec = p + 1;
-				*p = '\0';
-				if ('0' == *sec)
-					src_form |= MANDOC_FORM;
-				else if ('1' <= *sec && '9' >= *sec)
-					src_form |= MANDOC_SRC;
-				continue;
-			}
-			if ('/' != *p)
-				continue;
-			if ('\0' == *title) {
-				title = p + 1;
-				*p = '\0';
-				continue;
-			}
-			if (0 == strncmp("man", p + 1, 3))
-				src_form |= MANDOC_SRC;
-			else if (0 == strncmp("cat", p + 1, 3))
-				src_form |= MANDOC_FORM;
-			else
-				arch = p + 1;
-			break;
-		}
-		if ('\0' == *title) {
-			WARNING(argv[i], basedir, 
-				"Cannot deduce title from filename");
-			title = buf;
-		}
+	desc = "";
+	if (NULL != of->desc) {
+		key = hashget(of->desc, strlen(of->desc));
+		assert(NULL != key);
+		if (NULL == key->utf8)
+			utf8key(mc, key);
+		desc = key->utf8;
+	}
+
+	sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);
+
+	sqlite3_bind_text
+		(stmts[STMT_INSERT_DOC], 1, 
+		 of->file, -1, SQLITE_STATIC);
+	sqlite3_bind_text
+		(stmts[STMT_INSERT_DOC], 2, 
+		 of->sec, -1, SQLITE_STATIC);
+	sqlite3_bind_text
+		(stmts[STMT_INSERT_DOC], 3, 
+		 of->arch, -1, SQLITE_STATIC);
+	sqlite3_bind_text
+		(stmts[STMT_INSERT_DOC], 4, 
+		 desc, -1, SQLITE_STATIC);
+	sqlite3_bind_int
+		(stmts[STMT_INSERT_DOC], 5, form);
+	sqlite3_step(stmts[STMT_INSERT_DOC]);
+	recno = sqlite3_last_insert_rowid(db);
+	sqlite3_reset(stmts[STMT_INSERT_DOC]);
+
+	for (key = words; NULL != key; key = key->next) {
+		assert(key->of == of);
+		if (NULL == key->utf8)
+			utf8key(mc, key);
+		sqlite3_bind_int64
+			(stmts[STMT_INSERT_KEY], 1, key->mask);
+		sqlite3_bind_text
+			(stmts[STMT_INSERT_KEY], 2, 
+			 key->utf8, -1, SQLITE_STATIC);
+		sqlite3_bind_int64
+			(stmts[STMT_INSERT_KEY], 3, recno);
+		sqlite3_step(stmts[STMT_INSERT_KEY]);
+		sqlite3_reset(stmts[STMT_INSERT_KEY]);
+	}
 
-		/*
-		 * Build the file structure.
-		 */
+	sqlite3_exec(db, "COMMIT TRANSACTION", NULL, NULL, NULL);
 
-		nof = mandoc_calloc(1, sizeof(struct of));
-		nof->fname = mandoc_strdup(argv[i]);
-		nof->sec = mandoc_strdup(sec);
-		nof->arch = mandoc_strdup(arch);
-		nof->title = mandoc_strdup(title);
-		nof->src_form = src_form;
+}
 
-		/*
-		 * Add the structure to the list.
-		 */
+static void
+dbprune(const char *base)
+{
+	struct of	*of;
 
-		if (NULL == *of) {
-			*of = nof;
-			(*of)->first = nof;
-		} else {
-			nof->first = (*of)->first;
-			(*of)->next = nof;
-			*of = nof;
-		}
+	if (nodb)
+		return;
+
+	for (of = ofs; NULL != of; of = of->next) {
+		sqlite3_bind_text
+			(stmts[STMT_DELETE], 1, 
+			 of->file, -1, SQLITE_STATIC);
+		sqlite3_step(stmts[STMT_DELETE]);
+		sqlite3_reset(stmts[STMT_DELETE]);
+		DEBUG(of->file, base, "Deleted from index");
 	}
 }
 
 /*
- * Recursively build up a list of files to parse.
- * We use this instead of ftw() and so on because I don't want global
- * variables hanging around.
- * This ignores the mandocdb.db and mandocdb.index files, but assumes that
- * everything else is a manual.
- * Pass in a pointer to a NULL structure for the first invocation.
+ * Close an existing database and its prepared statements.
+ * If "real" is not set, rename the temporary file into the real one.
  */
 static void
-ofile_dirbuild(const char *dir, const char* psec, const char *parch,
-		int p_src_form, struct of **of, char *basedir)
+dbclose(const char *base, int real)
 {
-	char		 buf[MAXPATHLEN];
-	size_t		 sz;
-	DIR		*d;
-	const char	*fn, *sec, *arch;
-	char		*p, *q, *suffix;
-	struct of	*nof;
-	struct dirent	*dp;
-	int		 src_form;
+	size_t		 i;
+	char		 file[MAXPATHLEN];
 
-	if (NULL == (d = opendir(dir))) {
-		WARNING("", dir, "%s", strerror(errno));
+	if (nodb)
 		return;
-	}
-
-	while (NULL != (dp = readdir(d))) {
-		fn = dp->d_name;
 
-		if ('.' == *fn)
-			continue;
+	for (i = 0; i < STMT__MAX; i++) {
+		sqlite3_finalize(stmts[i]);
+		stmts[i] = NULL;
+	}
 
-		src_form = p_src_form;
+	sqlite3_close(db);
+	db = NULL;
 
-		if (DT_DIR == dp->d_type) {
-			sec = psec;
-			arch = parch;
+	if (real)
+		return;
 
-			/*
-			 * By default, only use directories called:
-			 *   man<section>/[<arch>/]   or
-			 *   cat<section>/[<arch>/]
-			 */
+	strlcpy(file, MANDOC_DB, MAXPATHLEN);
+	strlcat(file, "~", MAXPATHLEN);
+	if (-1 == rename(file, MANDOC_DB))
+		perror(MANDOC_DB);
+}
 
-			if ('\0' == *sec) {
-				if(0 == strncmp("man", fn, 3)) {
-					src_form |= MANDOC_SRC;
-					sec = fn + 3;
-				} else if (0 == strncmp("cat", fn, 3)) {
-					src_form |= MANDOC_FORM;
-					sec = fn + 3;
-				} else {
-					WARNING(fn, basedir, "Bad section");
-					if (use_all)
-						sec = fn;
-					else
-						continue;
-				}
-			} else if ('\0' == *arch) {
-				if (NULL != strchr(fn, '.')) {
-					WARNING(fn, basedir, "Bad architecture");
-					if (0 == use_all)
-						continue;
-				}
-				arch = fn;
-			} else {
-				WARNING(fn, basedir, "Excessive subdirectory");
-				if (0 == use_all)
-					continue;
-			}
+/*
+ * This is straightforward stuff.
+ * Open a database connection to a "temporary" database, then open a set
+ * of prepared statements we'll use over and over again.
+ * If "real" is set, we use the existing database; if not, we truncate a
+ * temporary one.
+ * Must be matched by dbclose().
+ */
+static int
+dbopen(const char *base, int real)
+{
+	char		 file[MAXPATHLEN];
+	const char	*sql;
+	int		 rc, ofl;
+	size_t		 sz;
 
-			buf[0] = '\0';
-			strlcat(buf, dir, MAXPATHLEN);
-			strlcat(buf, "/", MAXPATHLEN);
-			strlcat(basedir, "/", MAXPATHLEN);
-			strlcat(basedir, fn, MAXPATHLEN);
-			sz = strlcat(buf, fn, MAXPATHLEN);
+	if (nodb) 
+		return(1);
 
-			if (MAXPATHLEN <= sz) {
-				WARNING(fn, basedir, "Path too long");
-				continue;
-			}
+	sz = strlcpy(file, MANDOC_DB, MAXPATHLEN);
+	if ( ! real)
+		sz = strlcat(file, "~", MAXPATHLEN);
 
-			ofile_dirbuild(buf, sec, arch,
-					src_form, of, basedir);
+	if (sz >= MAXPATHLEN) {
+		fprintf(stderr, "%s: Path too long\n", file);
+		return(0);
+	}
 
-			p = strrchr(basedir, '/');
-			*p = '\0';
-			continue;
-		}
+	if ( ! real)
+		remove(file);
 
-		if (DT_REG != dp->d_type) {
-			WARNING(fn, basedir, "Not a regular file");
-			continue;
-		}
-		if (!strcmp(MANDOC_DB, fn) || !strcmp(MANDOC_IDX, fn))
-			continue;
-		if ('\0' == *psec) {
-			WARNING(fn, basedir, "File outside section");
-			if (0 == use_all)
-				continue;
-		}
+	ofl = SQLITE_OPEN_PRIVATECACHE | SQLITE_OPEN_READWRITE;
+	rc = sqlite3_open_v2(file, &db, ofl, NULL);
+	if (SQLITE_OK == rc) 
+		return(1);
+	if (SQLITE_CANTOPEN != rc) {
+		perror(file);
+		return(0);
+	}
 
-		/*
-		 * By default, skip files where the file name suffix
-		 * does not agree with the section directory
-		 * they are located in.
-		 */
+	sqlite3_close(db);
+	db = NULL;
 
-		suffix = strrchr(fn, '.');
-		if (NULL == suffix) {
-			WARNING(fn, basedir, "No filename suffix");
-			if (0 == use_all)
-				continue;
-		} else if ((MANDOC_SRC & src_form &&
-				strcmp(suffix + 1, psec)) ||
-			    (MANDOC_FORM & src_form &&
-				strcmp(suffix + 1, "0"))) {
-			WARNING(fn, basedir, "Wrong filename suffix");
-			if (0 == use_all)
-				continue;
-			if ('0' == suffix[1])
-				src_form |= MANDOC_FORM;
-			else if ('1' <= suffix[1] && '9' >= suffix[1])
-				src_form |= MANDOC_SRC;
-		}
-
-		/*
-		 * Skip formatted manuals if a source version is
-		 * available.  Ignore the age: it is very unlikely
-		 * that people install newer formatted base manuals
-		 * when they used to have source manuals before,
-		 * and in ports, old manuals get removed on update.
-		 */
-		if (0 == use_all && MANDOC_FORM & src_form &&
-				'\0' != *psec) {
-			buf[0] = '\0';
-			strlcat(buf, dir, MAXPATHLEN);
-			p = strrchr(buf, '/');
-			if ('\0' != *parch && NULL != p)
-				for (p--; p > buf; p--)
-					if ('/' == *p)
-						break;
-			if (NULL == p)
-				p = buf;
-			else
-				p++;
-			if (0 == strncmp("cat", p, 3))
-				memcpy(p, "man", 3);
-			strlcat(buf, "/", MAXPATHLEN);
-			sz = strlcat(buf, fn, MAXPATHLEN);
-			if (sz >= MAXPATHLEN) {
-				WARNING(fn, basedir, "Path too long");
-				continue;
-			}
-			q = strrchr(buf, '.');
-			if (NULL != q && p < q++) {
-				*q = '\0';
-				sz = strlcat(buf, psec, MAXPATHLEN);
-				if (sz >= MAXPATHLEN) {
-					WARNING(fn, basedir, "Path too long");
-					continue;
-				}
-				if (0 == access(buf, R_OK))
-					continue;
-			}
-		}
+	if (SQLITE_OK != (rc = sqlite3_open(file, &db))) {
+		perror(file);
+		return(0);
+	}
 
-		buf[0] = '\0';
-		assert('.' == dir[0]);
-		if ('/' == dir[1]) {
-			strlcat(buf, dir + 2, MAXPATHLEN);
-			strlcat(buf, "/", MAXPATHLEN);
-		}
-		sz = strlcat(buf, fn, MAXPATHLEN);
-		if (sz >= MAXPATHLEN) {
-			WARNING(fn, basedir, "Path too long");
-			continue;
-		}
+	sql = "CREATE TABLE \"docs\" (\n"
+	      " \"file\" TEXT NOT NULL,\n"
+	      " \"sec\" TEXT NOT NULL,\n"
+	      " \"arch\" TEXT NOT NULL,\n"
+	      " \"desc\" TEXT NOT NULL,\n"
+	      " \"form\" INTEGER NOT NULL,\n"
+	      " \"id\" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL\n"
+	      ");\n"
+	      "\n"
+	      "CREATE TABLE \"keys\" (\n"
+	      " \"bits\" INTEGER NOT NULL,\n"
+	      " \"key\" TEXT NOT NULL,\n"
+	      " \"docid\" INTEGER NOT NULL REFERENCES docs(id) "
+	      	"ON DELETE CASCADE,\n"
+	      " \"id\" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL\n"
+	      ");\n"
+	      "\n"
+	      "CREATE INDEX \"key_index\" ON keys (key);\n";
 
-		nof = mandoc_calloc(1, sizeof(struct of));
-		nof->fname = mandoc_strdup(buf);
-		nof->sec = mandoc_strdup(psec);
-		nof->arch = mandoc_strdup(parch);
-		nof->src_form = src_form;
+	if (SQLITE_OK != sqlite3_exec(db, sql, NULL, NULL, NULL)) {
+		perror(sqlite3_errmsg(db));
+		return(0);
+	}
 
-		/*
-		 * Remember the file name without the extension,
-		 * to be used as the page title in the database.
-		 */
+	sql = "DELETE FROM docs where file=?";
+	sqlite3_prepare_v2(db, sql, -1, &stmts[STMT_DELETE], NULL);
+	sql = "INSERT INTO docs "
+		"(file,sec,arch,desc,form) VALUES (?,?,?,?,?)";
+	sqlite3_prepare_v2(db, sql, -1, &stmts[STMT_INSERT_DOC], NULL);
+	sql = "INSERT INTO keys "
+		"(bits,key,docid) VALUES (?,?,?)";
+	sqlite3_prepare_v2(db, sql, -1, &stmts[STMT_INSERT_KEY], NULL);
+	return(1);
+}
 
-		if (NULL != suffix)
-			*suffix = '\0';
-		nof->title = mandoc_strdup(fn);
+static void *
+hash_halloc(size_t sz, void *arg)
+{
 
-		/*
-		 * Add the structure to the list.
-		 */
+	return(mandoc_calloc(sz, 1));
+}
 
-		if (NULL == *of) {
-			*of = nof;
-			(*of)->first = nof;
-		} else {
-			nof->first = (*of)->first;
-			(*of)->next = nof;
-			*of = nof;
-		}
-	}
+static void *
+hash_alloc(size_t sz, void *arg)
+{
 
-	closedir(d);
+	return(mandoc_malloc(sz));
 }
 
 static void
-ofile_free(struct of *of)
+hash_free(void *p, size_t sz, void *arg)
 {
-	struct of	*nof;
-
-	if (NULL != of)
-		of = of->first;
 
-	while (NULL != of) {
-		nof = of->next;
-		free(of->fname);
-		free(of->sec);
-		free(of->arch);
-		free(of->title);
-		free(of);
-		of = nof;
-	}
+	free(p);
 }
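
A remark on the temporary-file naming above, outside of the patch proper: dbopen() checks the result of strlcpy()/strlcat() for truncation, while dbclose() rebuilds the same "mandocdb.db~" name without looking at the result. One way to keep the two in step would be a shared helper. What follows is only a sketch under assumptions: the name db_tmpname() is hypothetical and not in the patch, and it uses snprintf() rather than strlcpy() so that it builds on systems lacking the latter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): write the database
 * file name into "buf", appending "~" unless "real" is set.
 * Returns 1 if the whole name fit, 0 if it was truncated.
 */
static int
db_tmpname(char *buf, size_t bufsz, const char *name, int real)
{
	int	 n;

	assert(NULL != buf && NULL != name);

	/* snprintf() returns the length it wanted to write. */
	n = snprintf(buf, bufsz, "%s%s", name, real ? "" : "~");
	return(n >= 0 && (size_t)n < bufsz);
}
```

Both dbopen() and dbclose() could then call it with MANDOC_DB and bail out on a zero return, so that rename() is never handed a truncated path.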
Index: mandocdb.h
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/mandocdb.h,v
retrieving revision 1.6
diff -u -p -r1.6 mandocdb.h
--- mandocdb.h	23 Mar 2012 02:52:33 -0000	1.6
+++ mandocdb.h	7 Jun 2012 15:09:34 -0000
@@ -18,7 +18,6 @@
 #define MANDOCDB_H
 
 #define	MANDOC_DB	"mandocdb.db"
-#define	MANDOC_IDX	"mandocdb.index"
 
 #define	TYPE_An		0x0000000000000001ULL
 #define	TYPE_Ar		0x0000000000000002ULL
Index: manpage.c
===================================================================
RCS file: manpage.c
diff -N manpage.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ manpage.c	7 Jun 2012 15:09:34 -0000
@@ -0,0 +1,178 @@
+/*	$Id: mandocdb.c,v 1.46 2012/03/23 06:52:17 kristaps Exp $ */
+/*
+ * Copyright (c) 2012 Kristaps Dzonsons <kristaps@bsd.lv>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+#include <sys/param.h>
+
+#include <assert.h>
+#include <getopt.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include "manpath.h"
+#include "mansearch.h"
+
+static	void	 show(const char *, const char *);
+
+int
+main(int argc, char *argv[])
+{
+	int		 ch, term;
+	size_t		 i, sz, len;
+	struct manpage	*res;
+	char		*conf_file, *defpaths, *auxpaths, *cp,
+			*arch, *sec;
+	char		 buf[MAXPATHLEN];
+	const char	*cmd;
+	struct manpaths	 paths;
+	char		*progname;
+	extern char	*optarg;
+	extern int	 optind;
+
+	term = isatty(STDIN_FILENO) && isatty(STDOUT_FILENO);
+
+	progname = strrchr(argv[0], '/');
+	if (progname == NULL)
+		progname = argv[0];
+	else
+		++progname;
+
+	auxpaths = defpaths = conf_file = arch = sec = NULL;
+	memset(&paths, 0, sizeof(struct manpaths));
+
+	while (-1 != (ch = getopt(argc, argv, "C:M:m:S:s:")))
+		switch (ch) {
+		case ('C'):
+			conf_file = optarg;
+			break;
+		case ('M'):
+			defpaths = optarg;
+			break;
+		case ('m'):
+			auxpaths = optarg;
+			break;
+		case ('S'):
+			arch = optarg;
+			break;
+		case ('s'):
+			sec = optarg;
+			break;
+		default:
+			goto usage;
+		}
+
+	argc -= optind;
+	argv += optind;
+
+	if (0 == argc)
+		goto usage;
+
+	manpath_parse(&paths, conf_file, defpaths, auxpaths);
+	ch = mansearch(&paths, arch, sec, argc, argv, &res, &sz);
+	manpath_free(&paths);
+
+	if (0 == ch)
+		goto usage;
+
+	if (0 == sz) {
+		free(res);
+		return(EXIT_FAILURE);
+	} else if (1 == sz && term) {
+		i = 1;
+		goto show;
+	} else if (NULL == res)
+		return(EXIT_FAILURE);
+
+	for (i = 0; i < sz; i++) {
+		printf("%6zu  %s: %s\n", 
+			i + 1, res[i].file, res[i].desc);
+		free(res[i].desc);
+	}
+
+	if (0 == term) {
+		free(res);
+		return(EXIT_SUCCESS);
+	}
+
+	i = 1;
+	printf("Enter a choice [1]: ");
+	fflush(stdout);
+
+	if (NULL != (cp = fgetln(stdin, &len)))
+		if ('\n' == cp[--len] && len > 0) {
+			cp[len] = '\0';
+			if ((i = atoi(cp)) < 1 || i > sz)
+				i = 0;
+		}
+
+	if (0 == i) {
+		free(res);
+		return(EXIT_SUCCESS);
+	}
+show:
+	cmd = res[i - 1].form ? "mandoc" : "cat";
+	strlcpy(buf, res[i - 1].file, MAXPATHLEN);
+	free(res);
+
+	show(cmd, buf);
+	/* NOTREACHED */
+usage:
+	fprintf(stderr, "usage: %s [-C conf] "
+			 	  "[-M paths] "
+				  "[-m paths] "
+				  "[-S arch] "
+				  "[-s section] "
+			          "expr ...\n", 
+				  progname);
+	return(EXIT_FAILURE);
+}
+
+static void
+show(const char *cmd, const char *file)
+{
+	int		 fds[2];
+	pid_t		 pid;
+
+	if (-1 == pipe(fds)) {
+		perror(NULL);
+		exit(EXIT_FAILURE);
+	}
+
+	if (-1 == (pid = fork())) {
+		perror(NULL);
+		exit(EXIT_FAILURE);
+	} else if (pid > 0) {
+		dup2(fds[0], STDIN_FILENO);
+		close(fds[1]);
+		cmd = NULL != getenv("MANPAGER") ? 
+			getenv("MANPAGER") :
+			(NULL != getenv("PAGER") ? 
+			 getenv("PAGER") : "more");
+		execlp(cmd, cmd, (char *)NULL);
+		perror(cmd);
+		exit(EXIT_FAILURE);
+	}
+
+	dup2(fds[1], STDOUT_FILENO);
+	close(fds[0]);
+	execlp(cmd, cmd, file, (char *)NULL);
+	perror(cmd);
+	exit(EXIT_FAILURE);
+}
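
A remark on the "Enter a choice" prompt in main() above, outside of the patch proper: the answer is parsed with atoi() and stored in a size_t, so trailing garbage such as "2x" is accepted as 2, and negative input is rejected only because it wraps to a huge value. A stricter parse might look like the sketch below. The name parse_choice() is hypothetical and not in the patch; like the original, it treats an empty answer as the default choice 1 and returns 0 for anything unusable.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse a menu answer.
 * Returns a choice in [1, nresults], or 0 if the input is unusable.
 * An empty answer selects the default, 1.
 */
static size_t
parse_choice(const char *line, size_t nresults)
{
	char		*end;
	unsigned long	 v;

	assert(NULL != line);

	/* Skip leading blanks; an empty answer means the default. */
	while (' ' == *line || '\t' == *line)
		line++;
	if ('\0' == *line || '\n' == *line)
		return(nresults > 0 ? 1 : 0);

	/* strtoul() accepts a sign, so insist on a leading digit. */
	if ( ! isdigit((unsigned char)*line))
		return(0);

	errno = 0;
	v = strtoul(line, &end, 10);
	if (0 != errno || ('\0' != *end && '\n' != *end))
		return(0);
	if (v < 1 || v > nresults)
		return(0);

	return((size_t)v);
}
```

Since the line returned by fgetln() is not NUL-terminated, the caller would still have to terminate it before parsing, as the code above does with cp[len] = '\0'.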
Index: manpath.c
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/manpath.c,v
retrieving revision 1.8
diff -u -p -r1.8 manpath.c
--- manpath.c	24 Dec 2011 22:37:16 -0000	1.8
+++ manpath.c	7 Jun 2012 15:09:34 -0000
@@ -74,7 +74,7 @@ manpath_parse(struct manpaths *dirs, con
 
 	do {
 		buf = mandoc_realloc(buf, bsz + 1024);
-		sz = fread(buf + (int)bsz, 1, 1024, stream);
+		sz = fread(buf + bsz, 1, 1024, stream);
 		bsz += sz;
 	} while (sz > 0);
 
@@ -117,7 +117,7 @@ manpath_parse(struct manpaths *dirs, con
 	}
 
 	/* Append man.conf(5) to MANPATH. */
-	if (':' == defp[(int)strlen(defp) - 1]) {
+	if (':' == defp[strlen(defp) - 1]) {
 		manpath_parseline(dirs, defp);
 		manpath_manconf(dirs, file);
 		return;
@@ -162,7 +162,7 @@ manpath_add(struct manpaths *dirs, const
 {
 	char		 buf[PATH_MAX];
 	char		*cp;
-	int		 i;
+	size_t		 i;
 
 	if (NULL == (cp = realpath(dir, buf)))
 		return;
@@ -173,7 +173,7 @@ manpath_add(struct manpaths *dirs, const
 
 	dirs->paths = mandoc_realloc
 		(dirs->paths,
-		 ((size_t)dirs->sz + 1) * sizeof(char *));
+		 (dirs->sz + 1) * sizeof(char *));
 
 	dirs->paths[dirs->sz++] = mandoc_strdup(cp);
 }
@@ -181,7 +181,7 @@ manpath_add(struct manpaths *dirs, const
 void
 manpath_free(struct manpaths *p)
 {
-	int		 i;
+	size_t		 i;
 
 	for (i = 0; i < p->sz; i++)
 		free(p->paths[i]);
Index: manpath.h
===================================================================
RCS file: /usr/vhosts/mdocml.bsd.lv/cvs/mdocml/manpath.h,v
retrieving revision 1.5
diff -u -p -r1.5 manpath.h
--- manpath.h	13 Dec 2011 20:56:46 -0000	1.5
+++ manpath.h	7 Jun 2012 15:09:34 -0000
@@ -23,7 +23,7 @@
  * databases.
  */
 struct	manpaths {
-	int	  sz;
+	size_t	  sz;
 	char	**paths;
 };
 
Index: mansearch.c
===================================================================
RCS file: mansearch.c
diff -N mansearch.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ mansearch.c	7 Jun 2012 15:09:34 -0000
@@ -0,0 +1,436 @@
+/*	$Id: mandocdb.c,v 1.46 2012/03/23 06:52:17 kristaps Exp $ */
+/*
+ * Copyright (c) 2012 Kristaps Dzonsons <kristaps@bsd.lv>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+
+#include <sys/param.h>
+
+#include <assert.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <ohash.h>
+#include <sqlite3.h>
+
+#include "mandoc.h"
+#include "manpath.h"
+#include "mandocdb.h"
+#include "mansearch.h"
+
+struct	expr {
+	int		 glob; /* is glob? */
+	uint64_t 	 bits; /* type-mask */
+	const char	*v; /* search value */
+	struct expr	*next; /* next in sequence */
+};
+
+struct	match {
+	uint64_t	 id; /* identifier in database */
+	char		*file; /* relative filepath of manpage */
+	char		*desc; /* description of manpage */
+	int		 form; /* 0 == catpage */
+};
+
+struct	type {
+	uint64_t	 bits;
+	const char	*name;
+};
+
+static	const struct type types[] = {
+	{ TYPE_An,  "An" },
+	{ TYPE_Ar,  "Ar" },
+	{ TYPE_At,  "At" },
+	{ TYPE_Bsx, "Bsx" },
+	{ TYPE_Bx,  "Bx" },
+	{ TYPE_Cd,  "Cd" },
+	{ TYPE_Cm,  "Cm" },
+	{ TYPE_Dv,  "Dv" },
+	{ TYPE_Dx,  "Dx" },
+	{ TYPE_Em,  "Em" },
+	{ TYPE_Er,  "Er" },
+	{ TYPE_Ev,  "Ev" },
+	{ TYPE_Fa,  "Fa" },
+	{ TYPE_Fl,  "Fl" },
+	{ TYPE_Fn,  "Fn" },
+	{ TYPE_Fn,  "Fo" },
+	{ TYPE_Ft,  "Ft" },
+	{ TYPE_Fx,  "Fx" },
+	{ TYPE_Ic,  "Ic" },
+	{ TYPE_In,  "In" },
+	{ TYPE_Lb,  "Lb" },
+	{ TYPE_Li,  "Li" },
+	{ TYPE_Lk,  "Lk" },
+	{ TYPE_Ms,  "Ms" },
+	{ TYPE_Mt,  "Mt" },
+	{ TYPE_Nd,  "Nd" },
+	{ TYPE_Nm,  "Nm" },
+	{ TYPE_Nx,  "Nx" },
+	{ TYPE_Ox,  "Ox" },
+	{ TYPE_Pa,  "Pa" },
+	{ TYPE_Rs,  "Rs" },
+	{ TYPE_Sh,  "Sh" },
+	{ TYPE_Ss,  "Ss" },
+	{ TYPE_St,  "St" },
+	{ TYPE_Sy,  "Sy" },
+	{ TYPE_Tn,  "Tn" },
+	{ TYPE_Va,  "Va" },
+	{ TYPE_Va,  "Vt" },
+	{ TYPE_Xr,  "Xr" },
+	{ ~0ULL,    "any" },
+	{ 0ULL, NULL }
+};
+
+static	void		*hash_alloc(size_t, void *);
+static	void		 hash_free(void *, size_t, void *);
+static	void		*hash_halloc(size_t, void *);
+static	struct expr	*exprcomp(int, char *[]);
+static	void		 exprfree(struct expr *);
+static	struct expr	*exprterm(char *);
+static	char		*sql_statement(const struct expr *,
+				const char *, const char *);
+
+int
+mansearch(const struct manpaths *paths, 
+		const char *arch, const char *sec,
+		int argc, char *argv[], 
+		struct manpage **res, size_t *sz)
+{
+	int		 fd, rc;
+	int64_t		 id;
+	char		 buf[MAXPATHLEN];
+	char		*sql;
+	struct expr	*e, *ep;
+	sqlite3		*db;
+	sqlite3_stmt	*s;
+	struct match	*mp;
+	struct ohash_info info;
+	struct ohash	 htab;
+	unsigned int	 idx;
+	size_t		 i, j, cur, maxres;
+
+	memset(&info, 0, sizeof(struct ohash_info));
+
+	info.halloc = hash_halloc;
+	info.alloc = hash_alloc;
+	info.hfree = hash_free;
+	info.key_offset = offsetof(struct match, id);
+
+	*sz = 0;
+	sql = NULL;
+	*res = NULL;
+	fd = -1;
+	e = NULL;
+	cur = maxres = 0;
+
+	if (0 == argc)
+		goto out;
+	if (NULL == (e = exprcomp(argc, argv)))
+		goto out;
+
+	/*
+	 * Save a descriptor to the current working directory.
+	 * Since pathnames in the "paths" variable might be relative,
+	 * and we'll be chdir()ing into them, we need to keep a handle
+	 * on our current directory from which to start the chdir().
+	 */
+
+	if (NULL == getcwd(buf, MAXPATHLEN)) {
+		perror(NULL);
+		goto out;
+	} else if (-1 == (fd = open(buf, O_RDONLY, 0))) {
+		perror(buf);
+		goto out;
+	}
+
+	sql = sql_statement(e, arch, sec);
+
+	/*
+	 * Loop over the directories (containing databases) for us to
+	 * search.
+	 * Don't let missing/bad databases/directories faze us.
+	 * In each, try to open the resident database and, if it opens,
+	 * scan it for our match expression.
+	 */
+
+	for (i = 0; i < paths->sz; i++) {
+		if (-1 == fchdir(fd)) {
+			/* FIXME: will return success */
+			perror(buf);
+			free(*res);
+			break;
+		} else if (-1 == chdir(paths->paths[i])) {
+			perror(paths->paths[i]);
+			continue;
+		} 
+
+		rc =  sqlite3_open_v2
+			(MANDOC_DB, &db, SQLITE_OPEN_READONLY, NULL);
+
+		if (SQLITE_OK != rc) {
+			perror(MANDOC_DB);
+			sqlite3_close(db);
+			continue;
+		}
+
+		j = 1;
+		sqlite3_prepare_v2(db, sql, -1, &s, NULL);
+
+		if (NULL != arch)
+			sqlite3_bind_text
+				(s, j++, arch, -1, SQLITE_STATIC);
+		if (NULL != sec)
+			sqlite3_bind_text
+				(s, j++, sec, -1, SQLITE_STATIC);
+
+		for (ep = e; NULL != ep; ep = ep->next) {
+			sqlite3_bind_text
+				(s, j++, ep->v, -1, SQLITE_STATIC);
+			sqlite3_bind_int64
+				(s, j++, ep->bits);
+		}
+
+		memset(&htab, 0, sizeof(struct ohash));
+		ohash_init(&htab, 4, &info);
+
+		/*
+		 * Hash each entry on its [unique] document identifier.
+		 * This is a uint64_t.
+		 * Instead of using a hash function, simply convert the
+		 * uint64_t to a uint32_t, the hash value's type.
+		 * This gives good performance and preserves the
+		 * distribution of buckets in the table.
+		 */
+		while (SQLITE_ROW == sqlite3_step(s)) {
+			id = sqlite3_column_int64(s, 0);
+			idx = ohash_lookup_memory
+				(&htab, (char *)&id, 
+				 sizeof(uint64_t), (uint32_t)id);
+
+			if (NULL != ohash_find(&htab, idx))
+				continue;
+
+			mp = mandoc_calloc(1, sizeof(struct match));
+			mp->id = id;
+			mp->file = mandoc_strdup
+				((char *)sqlite3_column_text(s, 3));
+			mp->desc = mandoc_strdup
+				((char *)sqlite3_column_text(s, 4));
+			mp->form = sqlite3_column_int(s, 5);
+			ohash_insert(&htab, idx, mp);
+		}
+
+		sqlite3_finalize(s);
+		sqlite3_close(db);
+
+		for (mp = ohash_first(&htab, &idx);
+				NULL != mp;
+				mp = ohash_next(&htab, &idx)) {
+			if (cur + 1 > maxres) {
+				maxres += 1024;
+				*res = mandoc_realloc
+					(*res, maxres * sizeof(struct manpage));
+			}
+			strlcpy((*res)[cur].file, 
+				paths->paths[i], MAXPATHLEN);
+			strlcat((*res)[cur].file, "/", MAXPATHLEN);
+			strlcat((*res)[cur].file, mp->file, MAXPATHLEN);
+			(*res)[cur].desc = mp->desc;
+			(*res)[cur].form = mp->form;
+			free(mp->file);
+			free(mp);
+			cur++;
+		}
+		ohash_delete(&htab);
+	}
+out:
+	exprfree(e);
+	if (-1 != fd)
+		close(fd);
+	free(sql);
+	*sz = cur;
+	return(1);
+}
+
+/*
+ * Prepare the search SQL statement.
+ * We search for any of the words specified in our match expression.
+ * We filter the per-doc AND expressions when collecting results.
+ */
+static char *
+sql_statement(const struct expr *e, const char *arch, const char *sec)
+{
+	char		*sql;
+	const char	*glob = "(key GLOB ? AND bits & ?)";
+	const char	*eq = "(key = ? AND bits & ?)";
+	const char	*andarch = "arch = ? AND ";
+	const char	*andsec = "sec = ? AND ";
+	const size_t	 globsz = 27;
+	const size_t	 eqsz = 22;
+	size_t		 sz;
+
+	sql = mandoc_strdup
+		("SELECT docid,bits,key,file,desc,form,sec,arch "
+		 "FROM keys "
+		 "INNER JOIN docs ON docs.id=keys.docid "
+		 "WHERE ");
+	sz = strlen(sql);
+
+	if (NULL != arch) {
+		sz += strlen(andarch) + 1;
+		sql = mandoc_realloc(sql, sz);
+		strlcat(sql, andarch, sz);
+	}
+	if (NULL != sec) {
+		sz += strlen(andsec) + 1;
+		sql = mandoc_realloc(sql, sz);
+		strlcat(sql, andsec, sz);
+	}
+
+	sz += 2;
+	sql = mandoc_realloc(sql, sz);
+	strlcat(sql, "(", sz);
+
+	for ( ; NULL != e; e = e->next) {
+		sz += (e->glob ? globsz : eqsz) + 
+			(NULL == e->next ? 3 : 5);
+		sql = mandoc_realloc(sql, sz);
+		strlcat(sql, e->glob ? glob : eq, sz);
+		strlcat(sql, NULL == e->next ? ");" : " OR ", sz);
+	}
+
+	return(sql);
+}
+
+/*
+ * Compile a set of string tokens into an expression.
+ * Tokens in "argv" are assumed to be individual expression atoms (e.g.,
+ * "(", "foo=bar", etc.).
+ */
+static struct expr *
+exprcomp(int argc, char *argv[])
+{
+	int		 i;
+	struct expr	*first, *next, *cur;
+
+	first = cur = NULL;
+
+	for (i = 0; i < argc; i++) {
+		next = exprterm(argv[i]);
+		if (NULL == next) {
+			exprfree(first);
+			return(NULL);
+		}
+		if (NULL != first) {
+			cur->next = next;
+			cur = next;
+		} else
+			cur = first = next;
+	}
+
+	return(first);
+}
+
+static struct expr *
+exprterm(char *buf)
+{
+	struct expr	*e;
+	char		*key, *v;
+	size_t		 i;
+
+	if ('\0' == *buf)
+		return(NULL);
+
+	e = mandoc_calloc(1, sizeof(struct expr));
+
+	/*
+	 * If no =~ is specified, search with equality over names and
+	 * descriptions.
+	 * If =~ begins the phrase, use name and description fields.
+	 */
+
+	if (NULL == (v = strpbrk(buf, "=~"))) {
+		e->v = buf;
+		e->bits = TYPE_Nm | TYPE_Nd;
+		return(e);
+	} else if (v == buf)
+		e->bits = TYPE_Nm | TYPE_Nd;
+
+	e->glob = '~' == *v;
+	*v++ = '\0';
+	e->v = v;
+
+	/*
+	 * Parse out all possible fields.
+	 * If the field doesn't resolve, bail.
+	 */
+
+	while (NULL != (key = strsep(&buf, ","))) {
+		if ('\0' == *key)
+			continue;
+		i = 0;
+		while (types[i].bits && 
+			strcasecmp(types[i].name, key))
+			i++;
+		if (0 == types[i].bits) {
+			free(e);
+			return(NULL);
+		}
+		e->bits |= types[i].bits;
+	}
+
+	return(e);
+}
+
+static void
+exprfree(struct expr *p)
+{
+	struct expr	*pp;
+
+	while (NULL != p) {
+		pp = p->next;
+		free(p);
+		p = pp;
+	}
+}
+
+static void *
+hash_halloc(size_t sz, void *arg)
+{
+
+	return(mandoc_calloc(sz, 1));
+}
+
+static void *
+hash_alloc(size_t sz, void *arg)
+{
+
+	return(mandoc_malloc(sz));
+}
+
+static void
+hash_free(void *p, size_t sz, void *arg)
+{
+
+	free(p);
+}
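
A remark on sql_statement() above, outside of the patch proper: it sizes the buffer from the hand-counted constants globsz and eqsz, which have to be kept in step with the strings by hand (globsz is 27 for a 25-character clause, so it over-allocates slightly, which is harmless). A variant that measures each piece with strlen() avoids the bookkeeping. This is a sketch under assumptions: sql_append() and build_query() are hypothetical names, plain realloc() stands in for mandoc_realloc(), and the terms are passed as an array of glob flags instead of a struct expr list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Append "src" to the heap string "*dst" of length "*len".
 * Returns 1 on success, 0 if realloc() fails ("*dst" is left intact).
 */
static int
sql_append(char **dst, size_t *len, const char *src)
{
	size_t	 n;
	char	*p;

	n = strlen(src);
	if (NULL == (p = realloc(*dst, *len + n + 1)))
		return(0);
	memcpy(p + *len, src, n + 1); /* copies the NUL as well */
	*dst = p;
	*len += n;
	return(1);
}

/*
 * Hypothetical stand-in for sql_statement(): "globs[i]" is non-zero
 * if term i is a glob.  Returns a malloc()ed string, or NULL on
 * allocation failure or if there are no terms.
 */
static char *
build_query(const int *globs, size_t nterms, int has_arch, int has_sec)
{
	char	*sql = NULL;
	size_t	 len = 0, i;
	int	 ok;

	assert(NULL != globs);
	if (0 == nterms)
		return(NULL);

	ok = sql_append(&sql, &len,
		"SELECT docid,bits,key,file,desc,form,sec,arch "
		"FROM keys "
		"INNER JOIN docs ON docs.id=keys.docid "
		"WHERE ");
	if (ok && has_arch)
		ok = sql_append(&sql, &len, "arch = ? AND ");
	if (ok && has_sec)
		ok = sql_append(&sql, &len, "sec = ? AND ");
	if (ok)
		ok = sql_append(&sql, &len, "(");

	for (i = 0; ok && i < nterms; i++) {
		ok = sql_append(&sql, &len, globs[i] ?
			"(key GLOB ? AND bits & ?)" :
			"(key = ? AND bits & ?)");
		if (ok)
			ok = sql_append(&sql, &len,
				i + 1 == nterms ? ");" : " OR ");
	}

	if ( ! ok) {
		free(sql);
		return(NULL);
	}
	return(sql);
}
```

The placeholder order is unchanged (arch, sec, then a key and a bit mask per term), so the sqlite3_bind_*() calls in mansearch() would not need to change.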
Index: mansearch.h
===================================================================
RCS file: mansearch.h
diff -N mansearch.h
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ mansearch.h	7 Jun 2012 15:09:34 -0000
@@ -0,0 +1,38 @@
+/*	$Id: manpath.h,v 1.5 2011/12/13 20:56:46 kristaps Exp $ */
+/*
+ * Copyright (c) 2012 Kristaps Dzonsons <kristaps@bsd.lv>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+#ifndef MANSEARCH_H
+#define MANSEARCH_H
+
+struct	manpage {
+	char		 file[MAXPATHLEN]; /* prefixed by manpath */
+	char		*desc; /* description of manpage */
+	int		 form; /* 0 == catpage */
+};
+
+__BEGIN_DECLS
+
+int	mansearch(const struct manpaths *paths, /* manpaths */
+		const char *arch, /* architecture */
+		const char *sec,  /* manual section */
+		int argc, /* size of argv */
+		char *argv[],  /* search terms */
+		struct manpage **res, /* results */
+		size_t *ressz); /* results returned */
+
+__END_DECLS
+
+#endif /*!MANSEARCH_H*/


end of thread, other threads:[~2012-06-08 14:14 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
2012-06-07 15:15 mandocdb tools, sqlite3, and ohash Kristaps Dzonsons
2012-06-07 16:29 ` Joerg Sonnenberger
2012-06-07 18:06 ` Ingo Schwarze
2012-06-08 10:29   ` Kristaps Dzonsons
2012-06-08 12:25     ` Joerg Sonnenberger
2012-06-08 13:38       ` Kristaps Dzonsons
2012-06-08 14:13         ` Joerg Sonnenberger
     [not found] ` <20120607165106.GA8819@lain.home>
2012-06-08 12:05   ` Kristaps Dzonsons
