From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp-1.sys.kth.se (smtp-1.sys.kth.se [130.237.32.175])
	by krisdoz.my.domain (8.14.5/8.14.5) with ESMTP id q34IR4Kj004242
	for ; Wed, 4 Apr 2012 14:27:05 -0400 (EDT)
Received: from mailscan-1.sys.kth.se (mailscan-1.sys.kth.se [130.237.32.91])
	by smtp-1.sys.kth.se (Postfix) with ESMTP id E00F0157F3F
	for ; Wed, 4 Apr 2012 20:26:58 +0200 (CEST)
X-Virus-Scanned: by amavisd-new at kth.se
Received: from smtp-1.sys.kth.se ([130.237.32.175])
	by mailscan-1.sys.kth.se (mailscan-1.sys.kth.se [130.237.32.91]) (amavisd-new, port 10024)
	with LMTP id 5GR17TFgRgOy
	for ; Wed, 4 Apr 2012 20:26:56 +0200 (CEST)
X-KTH-Auth: kristaps [85.224.187.222]
X-KTH-mail-from: kristaps@bsd.lv
X-KTH-rcpt-to: discuss@mdocml.bsd.lv
Received: from macky.lan (c-debbe055.641-1-64736c20.cust.bredbandsbolaget.se [85.224.187.222])
	by smtp-1.sys.kth.se (Postfix) with ESMTP id BDB04157F3A
	for ; Wed, 4 Apr 2012 20:26:54 +0200 (CEST)
Message-ID: <4F7C926E.9050302@bsd.lv>
Date: Wed, 04 Apr 2012 20:26:54 +0200
From: Kristaps Dzonsons
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0) Gecko/20111222 Thunderbird/9.0.1
X-Mailinglist: mdocml-discuss
Reply-To: discuss@mdocml.bsd.lv
MIME-Version: 1.0
To: discuss@mdocml.bsd.lv
Subject: mandocdb(8) full re-write
Content-Type: multipart/mixed; boundary="------------080602040207030500020401"

This is a multi-part message in MIME format.
--------------080602040207030500020401
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

During AsiaBSDCon, I had the opportunity to take a more serious look at
mandocdb(8).  As the code was rather complex, I opted to start over
rather than whittling down.  The results are in the enclosed file,
summarised as follows:

(0) Overall code cleanliness.  mandocdb.c gained a lot of features real
fast.  This re-write let me integrate those systematically.

(1) Aggressive hashing of strings.
All strings -- filename components, file suffixes, parsed words, and so
on -- are hashed (using uthash).  Parsed manpage terms overlay the
string hash, so after a few files, there are very few allocations at
all.  This brings us a huge performance improvement: much of the last
version's run time, when profiled with valgrind, was spent allocating
and twiddling strings.

(2) Use of fts(3) instead of ad hoc file walking.  This makes the code
much cleaner and neater.  This also improved performance because
examining the file path is much easier by looking at the hierarchy
level.  Again, less string twiddling.

(3) De-duping/winnowing at the file-scan phase.  I de-duplicate files by
hashing inode/device and tossing dupes.  I also throw out non-conforming
suffixes (if !use_all) early on, making the end list of files to parse
much smaller.  I'm much more picky about what's considered "mandoc
source" in this version because mandoc(1) lets pretty much anything be
parsed, defaulting to -man, which led to lots of noise.  Now I require
the right suffix or directory parts before using mandoc(3).

(4) Using SQLite instead of Berkeley DB.  Ok, this is the most
controversial.  After talking with some OpenBSD and NetBSD folks, nobody
could find anything against using SQLite.  NetBSD already has it in
base, and apparently OpenBSD is moving in the same direction.  Not to
worry: it's really easy to plug in another database, as the database
functions (open/close/index/prune) completely contain the database
routines.  Open/close are run for each manpath, index is run for each
page, and prune for each page's removal.  Check out that DELETE CASCADE.
So easy!

(5) Input encoding cleanup.  The last mandocdb was a little fuzzy on
encodings.  This time around, I store UTF-8 encoded strings directly.
Due to the hashing method, I only compute the UTF-8 string (which isn't
all that expensive) once during the full parse lifetime!  This also
makes apropos_db's job MUCH easier.
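
The de-duplication in point (3) can be sketched as follows.  This is a
minimal illustration under stated assumptions, not the attached code:
mandocdb.c keeps the inode/device pairs in a uthash table, whereas this
sketch swaps in a small linear array to stay free of dependencies.  The
names struct file_id, struct seen and seen_add() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical identifier: a file is unique per (device, inode) pair. */
struct file_id {
	dev_t	 dev;
	ino_t	 ino;
};

/* Hypothetical growable set of the identifiers seen so far. */
struct seen {
	struct file_id	*ids;
	size_t		 len;
	size_t		 cap;
};

/*
 * Record the file described by "st".
 * Returns 1 if the file is new, 0 if it was already seen (for
 * example a hard link to an earlier file), -1 if allocation fails.
 */
static int
seen_add(struct seen *s, const struct stat *st)
{
	struct file_id	*p;
	size_t		 i, ncap;

	assert(NULL != s && NULL != st);

	/* Linear scan; a real implementation would hash instead. */
	for (i = 0; i < s->len; i++)
		if (s->ids[i].dev == st->st_dev &&
		    s->ids[i].ino == st->st_ino)
			return 0;

	/* Grow the array geometrically when it is full. */
	if (s->len == s->cap) {
		ncap = s->cap ? s->cap * 2 : 16;
		p = realloc(s->ids, ncap * sizeof(*p));
		if (NULL == p)
			return -1;
		s->ids = p;
		s->cap = ncap;
	}

	s->ids[s->len].dev = st->st_dev;
	s->ids[s->len].ino = st->st_ino;
	s->len++;
	return 1;
}
```

A caller would stat(2) each candidate file, pass the result to
seen_add(), and skip the file whenever the return value is 0.  Comparing
both the device and the inode matters because inode numbers are only
unique within one filesystem.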

I cherry-picked schwarze@'s fine work with the last mandocdb.c to retain
its behaviour regarding path sanitising.  There might be some omissions,
but I think I have them all.

Some behaviour changes and possibilities:

(1) I'll likely kick out searching by regexp in favour of globbing,
which is better handled natively in SQLite, but we'll see; it's just a
matter of search performance (SQLite supports regexp with matches, but
it's not optimal).

(2) Obviously, we now only have one database file with two tables.
mandocdb(8) writes into a temporary file then rename(2)s into the real
one (unless run with -u or -d).  This is much neater and more readable.

(3) Language and encoding.  I'd like to smartify the directory parse to
recognise a language (e.g., ru/man1/amd64) alongside the rest.  This
way, folks can use apropos to search for native-language manuals using
the UTF-8 methods.

(4) Full text search.  This will only be a few lines of code as the
heavy lifting of word hashing is all in place.  I spoke with Jorg and
Abhinav (NetBSD GSoC folks) about having a "natural-language" CGI in
mdocml.bsd.lv.  I think it'd be awesome and a good pre-filter for, say,
simple misc@ questions ("how do I configure my bridge?").

Before committing anything, I'll transcribe apropos_db.c as well, then
use it for a while "in production".  My plan is to make an OpenBSD
package out of mdocml's "apropos tools" that installs alternatives to
the regular apropos and friends.  This way I can have fun and find bugs
without displacing the prior tools.

Thoughts?
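
Behaviour change (2), writing into a temporary file and then
rename(2)-ing it over the real one, can be sketched as follows.  This is
only an illustration of the general pattern, assuming plain stdio output
rather than an SQLite database; replace_file() and its parameters are
hypothetical and do not appear in mandocdb.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write "data" into the file "tmp", then move it over "dst".
 * Returns 0 on success and -1 on failure, in which case the
 * temporary file is removed and "dst" is left untouched.
 * Both paths must live on the same filesystem, otherwise
 * rename(2) fails instead of replacing "dst" atomically.
 */
static int
replace_file(const char *tmp, const char *dst, const char *data)
{
	FILE	*f;
	size_t	 len;

	assert(NULL != tmp && NULL != dst && NULL != data);

	if (NULL == (f = fopen(tmp, "w")))
		return -1;

	len = strlen(data);
	if (len != fwrite(data, 1, len, f)) {
		fclose(f);
		unlink(tmp);
		return -1;
	}

	/* fclose() flushes buffered output, so check its result too. */
	if (0 != fclose(f)) {
		unlink(tmp);
		return -1;
	}

	/* Readers see either the old file or the complete new one. */
	if (-1 == rename(tmp, dst)) {
		unlink(tmp);
		return -1;
	}
	return 0;
}
```

The benefit of this pattern is that a reader never observes a
half-written file: until rename(2) succeeds, the previous file stays in
place.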
Kristaps --------------080602040207030500020401 Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0"; name="mandocdb.c" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="mandocdb.c" /* $Id: mandocdb.c,v 1.46 2012/03/23 06:52:17 kristaps Exp $ */ /* * Copyright (c) 2011, 2012 Kristaps Dzonsons * Copyright (c) 2011 Ingo Schwarze * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ #ifdef HAVE_CONFIG_H #include "config.h" #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "mdoc.h" #include "man.h" #include "mandoc.h" #include "mandocdb.h" #include "manpath.h" /* Post a warning to stderr. */ #define WARNING(_f, _b, _fmt, _args...) \ do if (warnings) { \ fprintf(stderr, "%s: ", (_b)); \ fprintf(stderr, (_fmt), ##_args); \ if ('\0' != *(_f)) \ fprintf(stderr, ": %s", (_f)); \ fprintf(stderr, "\n"); \ } while (/* CONSTCOND */ 0) /* Post a "verbose" message to stderr. */ #define DEBUG(_f, _b, _fmt, _args...) 
\ do if (verb) { \ fprintf(stderr, "%s: ", (_b)); \ fprintf(stderr, (_fmt), ##_args); \ fprintf(stderr, ": %s\n", (_f)); \ } while (/* CONSTCOND */ 0) enum op { OP_DEFAULT = 0, /* new dbs from dir list or default config */ OP_CONFFILE, /* new databases from custom config file */ OP_UPDATE, /* delete/add entries in existing database */ OP_DELETE, /* delete entries from existing database */ OP_TEST /* change no databases, report potential problems */ }; enum form { FORM_SRC, /* format is -man or -mdoc */ FORM_CAT, /* format is cat */ FORM_NONE /* format is unknown */ }; struct str { char *key; /* the string itself */ char *utf8; /* key in UTF-8 form */ const struct of *of; /* if set, the owning parse */ struct str *next; /* next in owning parse sequence */ uint64_t mask; /* bitmask in sequence */ UT_hash_handle hash_string; /* string hash */ }; struct id { ino_t ino; /* inode of file */ dev_t dev; /* device of file */ }; struct of { struct id id; /* unique identifier */ struct of *next; /* next in ofs */ enum form dform; /* path-cued form */ enum form sform; /* suffix-cued form */ const char *file; /* filename rel. to manpath */ const char *desc; /* parsed description */ const char *sec; /* suffix-cued section (or empty) */ const char *dsec; /* path-cued section (or empty) */ const char *arch; /* path-cued arch. 
(or empty) */ const char *name; /* name (from filename) (not empty) */ UT_hash_handle hash_ino; /* inode hash */ UT_hash_handle hash_filename; /* filename hash */ }; enum stmt { STMT_DELETE = 0, /* delete manpage */ STMT_INSERT_DOC, /* insert manpage */ STMT_INSERT_KEY, /* insert parsed key */ STMT__MAX }; typedef int (*mdoc_fp)(struct of *, const struct mdoc_node *); struct mdoc_handler { mdoc_fp fp; /* optional handler */ uint64_t mask; /* set unless handler returns 0 */ int flags; /* for use by pmdoc_node */ #define MDOCF_CHILD 0x01 /* automatically index child nodes */ }; static void dbclose(const char *, int); static void dbindex(struct mchars *, const struct of *, const char *); static int dbopen(const char *, int); static void dbprune(const char *); static int dirscan(size_t, char *[], const char *); static int dirtreescan(const char *); static const char *filecheck(const char *); static void filescan(const char *, const char *); static int inocheck(const struct stat *); static void ofadd(const char *, int, const char *, const char *, const char *, const char *, const char *, const struct stat *st); static void offree(void); static int ofmerge(struct mchars *, struct mparse *, const char *); static void parse_catpage(struct of *, const char *); static int parse_man(struct of *, const struct man_node *); static void parse_mdoc(struct of *, const struct mdoc_node *); static int parse_mdoc_body(struct of *, const struct mdoc_node *); static int parse_mdoc_head(struct of *, const struct mdoc_node *); static int parse_mdoc_Fd(struct of *, const struct mdoc_node *); static int parse_mdoc_Fn(struct of *, const struct mdoc_node *); static int parse_mdoc_In(struct of *, const struct mdoc_node *); static int parse_mdoc_Nd(struct of *, const struct mdoc_node *); static int parse_mdoc_Nm(struct of *, const struct mdoc_node *); static int parse_mdoc_Sh(struct of *, const struct mdoc_node *); static int parse_mdoc_St(struct of *, const struct mdoc_node *); static int 
parse_mdoc_Xr(struct of *, const struct mdoc_node *); static void putkey(const struct of *, const char *, uint64_t); static void putkeys(const struct of *, const char *, int, uint64_t); static void putmdockey(const struct of *, const struct mdoc_node *, uint64_t); static char *stradd(const char *); static char *straddbuf(const char *, size_t); static size_t utf8(unsigned int, char [7]); static void utf8key(struct mchars *, struct str *); static void wordadd(const struct of *, const char *, uint64_t); static void wordaddbuf(const struct of *, const char *, size_t, uint64_t); static char *progname; static int use_all; /* use all found files */ static int nodb; /* no database changes */ static int verb; /* print what we're doing */ static int warnings; /* warn about crap */ static enum op op; /* operational mode */ static struct of *ofs = NULL; /* vector of files to parse */ static struct of *inos = NULL; /* table of inodes in path */ static struct of *filenames = NULL; /* table of filenames */ static struct str *strings = NULL; /* table of all strings */ static struct str *words = NULL; /* list of words in parse */ static sqlite3 *db = NULL; /* the current database */ static sqlite3_stmt *stmts[STMT__MAX]; /* current statements */ static const struct mdoc_handler mdocs[MDOC_MAX] = { { NULL, 0, 0 }, /* Ap */ { NULL, 0, 0 }, /* Dd */ { NULL, 0, 0 }, /* Dt */ { NULL, 0, 0 }, /* Os */ { parse_mdoc_Sh, TYPE_Sh, MDOCF_CHILD }, /* Sh */ { parse_mdoc_head, TYPE_Ss, MDOCF_CHILD }, /* Ss */ { NULL, 0, 0 }, /* Pp */ { NULL, 0, 0 }, /* D1 */ { NULL, 0, 0 }, /* Dl */ { NULL, 0, 0 }, /* Bd */ { NULL, 0, 0 }, /* Ed */ { NULL, 0, 0 }, /* Bl */ { NULL, 0, 0 }, /* El */ { NULL, 0, 0 }, /* It */ { NULL, 0, 0 }, /* Ad */ { NULL, TYPE_An, MDOCF_CHILD }, /* An */ { NULL, TYPE_Ar, MDOCF_CHILD }, /* Ar */ { NULL, TYPE_Cd, MDOCF_CHILD }, /* Cd */ { NULL, TYPE_Cm, MDOCF_CHILD }, /* Cm */ { NULL, TYPE_Dv, MDOCF_CHILD }, /* Dv */ { NULL, TYPE_Er, MDOCF_CHILD }, /* Er */ { NULL, TYPE_Ev, 
MDOCF_CHILD }, /* Ev */ { NULL, 0, 0 }, /* Ex */ { NULL, TYPE_Fa, MDOCF_CHILD }, /* Fa */ { parse_mdoc_Fd, TYPE_In, 0 }, /* Fd */ { NULL, TYPE_Fl, MDOCF_CHILD }, /* Fl */ { parse_mdoc_Fn, 0, 0 }, /* Fn */ { NULL, TYPE_Ft, MDOCF_CHILD }, /* Ft */ { NULL, TYPE_Ic, MDOCF_CHILD }, /* Ic */ { parse_mdoc_In, TYPE_In, MDOCF_CHILD }, /* In */ { NULL, TYPE_Li, MDOCF_CHILD }, /* Li */ { parse_mdoc_Nd, TYPE_Nd, MDOCF_CHILD }, /* Nd */ { parse_mdoc_Nm, TYPE_Nm, MDOCF_CHILD }, /* Nm */ { NULL, 0, 0 }, /* Op */ { NULL, 0, 0 }, /* Ot */ { NULL, TYPE_Pa, MDOCF_CHILD }, /* Pa */ { NULL, 0, 0 }, /* Rv */ { parse_mdoc_St, TYPE_St, 0 }, /* St */ { NULL, TYPE_Va, MDOCF_CHILD }, /* Va */ { parse_mdoc_body, TYPE_Va, MDOCF_CHILD }, /* Vt */ { parse_mdoc_Xr, TYPE_Xr, 0 }, /* Xr */ { NULL, 0, 0 }, /* %A */ { NULL, 0, 0 }, /* %B */ { NULL, 0, 0 }, /* %D */ { NULL, 0, 0 }, /* %I */ { NULL, 0, 0 }, /* %J */ { NULL, 0, 0 }, /* %N */ { NULL, 0, 0 }, /* %O */ { NULL, 0, 0 }, /* %P */ { NULL, 0, 0 }, /* %R */ { NULL, 0, 0 }, /* %T */ { NULL, 0, 0 }, /* %V */ { NULL, 0, 0 }, /* Ac */ { NULL, 0, 0 }, /* Ao */ { NULL, 0, 0 }, /* Aq */ { NULL, TYPE_At, MDOCF_CHILD }, /* At */ { NULL, 0, 0 }, /* Bc */ { NULL, 0, 0 }, /* Bf */ { NULL, 0, 0 }, /* Bo */ { NULL, 0, 0 }, /* Bq */ { NULL, TYPE_Bsx, MDOCF_CHILD }, /* Bsx */ { NULL, TYPE_Bx, MDOCF_CHILD }, /* Bx */ { NULL, 0, 0 }, /* Db */ { NULL, 0, 0 }, /* Dc */ { NULL, 0, 0 }, /* Do */ { NULL, 0, 0 }, /* Dq */ { NULL, 0, 0 }, /* Ec */ { NULL, 0, 0 }, /* Ef */ { NULL, TYPE_Em, MDOCF_CHILD }, /* Em */ { NULL, 0, 0 }, /* Eo */ { NULL, TYPE_Fx, MDOCF_CHILD }, /* Fx */ { NULL, TYPE_Ms, MDOCF_CHILD }, /* Ms */ { NULL, 0, 0 }, /* No */ { NULL, 0, 0 }, /* Ns */ { NULL, TYPE_Nx, MDOCF_CHILD }, /* Nx */ { NULL, TYPE_Ox, MDOCF_CHILD }, /* Ox */ { NULL, 0, 0 }, /* Pc */ { NULL, 0, 0 }, /* Pf */ { NULL, 0, 0 }, /* Po */ { NULL, 0, 0 }, /* Pq */ { NULL, 0, 0 }, /* Qc */ { NULL, 0, 0 }, /* Ql */ { NULL, 0, 0 }, /* Qo */ { NULL, 0, 0 }, /* Qq */ { NULL, 0, 0 }, /* Re */ { 
NULL, 0, 0 }, /* Rs */ { NULL, 0, 0 }, /* Sc */ { NULL, 0, 0 }, /* So */ { NULL, 0, 0 }, /* Sq */ { NULL, 0, 0 }, /* Sm */ { NULL, 0, 0 }, /* Sx */ { NULL, TYPE_Sy, MDOCF_CHILD }, /* Sy */ { NULL, TYPE_Tn, MDOCF_CHILD }, /* Tn */ { NULL, 0, 0 }, /* Ux */ { NULL, 0, 0 }, /* Xc */ { NULL, 0, 0 }, /* Xo */ { parse_mdoc_head, TYPE_Fn, 0 }, /* Fo */ { NULL, 0, 0 }, /* Fc */ { NULL, 0, 0 }, /* Oo */ { NULL, 0, 0 }, /* Oc */ { NULL, 0, 0 }, /* Bk */ { NULL, 0, 0 }, /* Ek */ { NULL, 0, 0 }, /* Bt */ { NULL, 0, 0 }, /* Hf */ { NULL, 0, 0 }, /* Fr */ { NULL, 0, 0 }, /* Ud */ { NULL, TYPE_Lb, MDOCF_CHILD }, /* Lb */ { NULL, 0, 0 }, /* Lp */ { NULL, TYPE_Lk, MDOCF_CHILD }, /* Lk */ { NULL, TYPE_Mt, MDOCF_CHILD }, /* Mt */ { NULL, 0, 0 }, /* Brq */ { NULL, 0, 0 }, /* Bro */ { NULL, 0, 0 }, /* Brc */ { NULL, 0, 0 }, /* %C */ { NULL, 0, 0 }, /* Es */ { NULL, 0, 0 }, /* En */ { NULL, TYPE_Dx, MDOCF_CHILD }, /* Dx */ { NULL, 0, 0 }, /* %Q */ { NULL, 0, 0 }, /* br */ { NULL, 0, 0 }, /* sp */ { NULL, 0, 0 }, /* %U */ { NULL, 0, 0 }, /* Ta */ }; int main(int argc, char *argv[]) { int ch, rc, i; const char *dir; struct str *keyp, *keypp; struct mchars *mc; struct manpaths dirs; struct mparse *mp; memset(stmts, 0, STMT__MAX * sizeof(sqlite3_stmt *)); memset(&dirs, 0, sizeof(struct manpaths)); progname = strrchr(argv[0], '/'); if (progname == NULL) progname = argv[0]; else ++progname; #define CHECKOP(_op, _ch) do \ if (OP_DEFAULT != (_op)) { \ fprintf(stderr, "-%c: Conflicting option\n", (_ch)); \ goto usage; \ } while (/*CONSTCOND*/0) dir = NULL; op = OP_DEFAULT; while (-1 != (ch = getopt(argc, argv, "aC:d:ntu:vW"))) switch (ch) { case ('a'): use_all = 1; break; case ('C'): CHECKOP(op, ch); dir = optarg; op = OP_CONFFILE; break; case ('d'): CHECKOP(op, ch); dir = optarg; op = OP_UPDATE; break; case ('n'): nodb = 1; break; case ('t'): CHECKOP(op, ch); dup2(STDOUT_FILENO, STDERR_FILENO); op = OP_TEST; nodb = use_all = warnings = 1; dir = "."; break; case ('u'): CHECKOP(op, ch); dir = 
			    optarg;
			op = OP_DELETE;
			break;
		case ('v'):
			verb++;
			break;
		case ('W'):
			warnings = 1;
			break;
		default:
			goto usage;
		}

	argc -= optind;
	argv += optind;

	if (OP_CONFFILE == op && argc > 0) {
		fprintf(stderr, "-C: Too many arguments\n");
		goto usage;
	}

	rc = 1;
	mp = mparse_alloc(MPARSE_AUTO, MANDOCLEVEL_FATAL, NULL, NULL);
	mc = mchars_alloc();

	if (OP_UPDATE == op || OP_DELETE == op || OP_TEST == op) {
		/*
		 * All of these deal with a specific directory.
		 * Jump into that directory then collect files specified
		 * on the command-line.
		 */
		if (0 == (rc = dirscan(argc, argv, dir)))
			goto out;
		if (0 == (rc = dbopen(dir, 1)))
			goto out;
		if (OP_TEST != op)
			dbprune(dir);
		if (OP_DELETE != op)
			rc = ofmerge(mc, mp, dir);
		else
			dbclose(dir, 1);
	} else {
		/*
		 * If we have arguments, use them as our manpaths.
		 * If we don't, grok from manpath(1) or however else
		 * manpath_parse() wants to do it.
		 */
		if (argc > 0) {
			dirs.paths = mandoc_calloc
				(argc, sizeof(char *));
			dirs.sz = argc;
			for (i = 0; i < argc; i++)
				dirs.paths[i] = mandoc_strdup(argv[i]);
		} else
			manpath_parse(&dirs, dir, NULL, NULL);

		/*
		 * First scan the tree rooted at a base directory.
		 * Then whack its database (if one exists), parse, and
		 * build up the database.
		 */
		for (i = 0; i < dirs.sz; i++) {
			if (0 == (rc = dirtreescan(dirs.paths[i])))
				goto out;
			remove(MANDOC_DB);
			if (0 == (rc = ofmerge(mc, mp, dirs.paths[i])))
				goto out;
			HASH_CLEAR(hash_ino, inos);
			HASH_CLEAR(hash_filename, filenames);
			offree();
		}
	}
out:
	manpath_free(&dirs);
	mchars_free(mc);
	mparse_free(mp);
	HASH_ITER(hash_string, strings, keyp, keypp) {
		HASH_DELETE(hash_string, strings, keyp);
		if (keyp->key != keyp->utf8)
			free(keyp->utf8);
		free(keyp->key);
		free(keyp);
	}
	HASH_CLEAR(hash_string, strings);
	HASH_CLEAR(hash_ino, inos);
	HASH_CLEAR(hash_filename, filenames);
	offree();
	return(rc ?
	    EXIT_SUCCESS : EXIT_FAILURE);
usage:
	fprintf(stderr, "usage: %s [-anvW] [-C file]\n"
			" %s [-anvW] dir ...\n"
			" %s [-nvW] -d dir [file ...]\n"
			" %s [-nvW] -u dir [file ...]\n"
			" %s -t file ...\n",
			progname, progname, progname,
			progname, progname);

	return(EXIT_FAILURE);
}

/*
 * Scan a directory tree rooted at "base" for manpages.
 * We use fts(), scanning directory parts along the way for clues to our
 * section and architecture.
 *
 * If use_all has been specified, grok all files.
 * If not, sanitise paths to the following:
 *
 *   [./]man*[/<arch>]/<name>.<section>
 * or
 *   [./]cat<section>[/<arch>]/<name>.0
 */
static int
dirtreescan(const char *base)
{
	FTS		*f;
	FTSENT		*ff;
	int		 fd, dform;
	size_t		 sz;
	char		*sec;
	const char	*file, *dsec, *arch, *cp, *name;
	char		 cwd[MAXPATHLEN];
	const char	*argv[2];

	/*
	 * Remember where we started by keeping a fd open to the origin
	 * path component.
	 * This is because we chdir() to relative paths, so we can't
	 * just re-chdir() into the cwd if it's also relative.
	 */
	if (NULL == getcwd(cwd, MAXPATHLEN)) {
		perror(NULL);
		return(0);
	} else if (-1 == (fd = open(cwd, O_RDONLY, 0))) {
		perror(cwd);
		return(0);
	}

	/* Sanitise the base directory. */

	if (0 == strncmp(base, "./", 2))
		base += 2;

	sz = strlen(base) + 1;
	if ('/' == base[sz - 1])
		sz++;

	argv[0] = base;
	argv[1] = (char *)NULL;

	/*
	 * Walk through all components under the directory, using the
	 * logical descent of files.
	 */
	f = fts_open((char * const *)argv, FTS_LOGICAL, NULL);
	if (NULL == f) {
		perror(base);
		close(fd);
		return(0);
	}

	dsec = arch = NULL;
	dform = FORM_NONE;

	while (NULL != (ff = fts_read(f))) {
		/*
		 * If we're a regular file, add an "of" by using the
		 * stored directory data and handling the filename.
		 * Disallow duplicate (hard-linked) files.
		 */
		if (FTS_F == ff->fts_info) {
			if ( !
use_all && ff->fts_level < 2) { WARNING(ff->fts_path + sz, base, "Extraneous file"); continue; } else if (inocheck(ff->fts_statp)) { WARNING(ff->fts_path + sz, base, "Duplicate file"); continue; } cp = ff->fts_name; if (NULL != (cp = strrchr(cp, '.'))) { if (0 == strcmp(cp + 1, "html")) { WARNING(ff->fts_path + sz, base, "Skipping html"); continue; } else if (0 == strcmp(cp + 1, "gz")) { WARNING(ff->fts_path + sz, base, "Skipping gz"); continue; } else if (0 == strcmp(cp + 1, "ps")) { WARNING(ff->fts_path + sz, base, "Skipping ps"); continue; } else if (0 == strcmp(cp + 1, "pdf")) { WARNING(ff->fts_path + sz, base, "Skipping pdf"); continue; } } file = stradd(ff->fts_path + sz); name = stradd(ff->fts_name); if (NULL != (sec = strrchr(name, '.'))) *sec++ = '\0'; ofadd(base, dform, file, name, dsec, sec, arch, ff->fts_statp); continue; } else if (FTS_D != ff->fts_info && FTS_DP != ff->fts_info) continue; switch (ff->fts_level) { case (0): /* Ignore the root directory. */ break; case (1): /* * This might contain manX/ or catX/. * Try to infer this from the name. * If we're not in use_all, enforce it. */ dsec = NULL; dform = FORM_NONE; cp = ff->fts_name; if (FTS_DP == ff->fts_info) break; if (0 == strncmp(cp, "man", 3)) { dform = FORM_SRC; dsec = stradd(cp + 3); } else if (0 == strncmp(cp, "cat", 3)) { dform = FORM_CAT; dsec = stradd(cp + 3); } if (NULL != dsec || use_all) break; WARNING(ff->fts_path + sz, base, "Unknown directory part"); fts_set(f, ff, FTS_SKIP); break; case (2): /* * Possibly our architecture. * If we're descending, keep tabs on it. */ arch = NULL; if (FTS_DP != ff->fts_info && NULL != dsec) arch = stradd(ff->fts_name); break; default: if (FTS_DP == ff->fts_info || use_all) break; WARNING(ff->fts_path + sz, base, "Extraneous directory part"); fts_set(f, ff, FTS_SKIP); break; } } fts_close(f); if (errno) { perror(base); close(fd); return(0); } /* * We want to exit in our base directory. * To do so, first return to the original cwd. 
	 * Then use chdir() relative to that.
	 */
	if (-1 == fchdir(fd)) {
		perror(cwd);
		close(fd);
		return(0);
	}

	close(fd);

	if (-1 == chdir(base)) {
		perror(base);
		return(0);
	}

	return(1);
}

static int
dirscan(size_t argc, char *argv[], const char *base)
{
	size_t		 i;

	if (-1 == chdir(base)) {
		perror(base);
		return(0);
	}

	for (i = 0; i < argc; i++)
		filescan(argv[i], base);

	return(1);
}

/*
 * Add a file to the file vector.
 * Do not verify that it's a "valid" looking manpage.
 *
 * Then try to infer the manual section, architecture and page name from
 * the path, assuming it looks like
 *
 *   [./]man*[/<arch>]/<name>.<section>
 * or
 *   [./]cat<section>[/<arch>]/<name>.0
 *
 * Stuff this information directly into the "of" vector.
 */
static void
filescan(const char *file, const char *base)
{
	const char	*sec, *arch, *name, *dsec, *filep;
	char		*p, *start, *buf;
	int		 dform;
	struct stat	 st;

	assert(use_all);

	if (0 == strncmp(file, "./", 2))
		file += 2;

	/* S_ISREG() tests the file type; a bare bit test also matches sockets. */
	if (-1 == stat(file, &st)) {
		WARNING(file, base, "%s", strerror(errno));
		return;
	} else if ( ! S_ISREG(st.st_mode)) {
		WARNING(file, base, "Not a regular file");
		return;
	} else if (inocheck(&st)) {
		WARNING(file, base, "Duplicate file");
		return;
	}

	filep = stradd(file);
	buf = mandoc_strdup(file);
	start = buf;
	sec = arch = name = dsec = NULL;
	dform = FORM_NONE;

	/*
	 * First try to guess our directory structure.
	 * If we find a separator, try to look for man* or cat*.
	 * If we find one of these and what's underneath is a directory,
	 * assume it's an architecture.
	 */
	if (NULL != (p = strchr(start, '/'))) {
		*p++ = '\0';
		if (0 == strncmp(start, "man", 3)) {
			dform = FORM_SRC;
			dsec = start + 3;
		} else if (0 == strncmp(start, "cat", 3)) {
			dform = FORM_CAT;
			dsec = start + 3;
		}

		start = p;
		if (NULL != dsec && NULL != (p = strchr(start, '/'))) {
			*p++ = '\0';
			arch = start;
			start = p;
		}
	}

	/*
	 * Now check the file suffix.
	 * Suffix of `.0' indicates a catpage, `.1-9' is a manpage.
	 */
	p = strrchr(start, '\0');
	while (p-- > start && '/' != *p && '.' != *p)
		/* Loop. */ ;

	if ('.' == *p) {
		*p++ = '\0';
		sec = p;
	}

	/*
	 * Now try to parse the name.
	 * Use the filename portion of the path.
	 */
	name = start;
	if (NULL != (p = strrchr(start, '/'))) {
		name = p + 1;
		*p = '\0';
	}

	ofadd(base, dform, filep, name, dsec, sec, arch, &st);
	free(buf);
}

static const char *
filecheck(const char *name)
{
	struct of	*p;

	HASH_FIND(hash_filename, filenames, name, strlen(name), p);
	return(NULL != p ?
p->file : NULL); } static int inocheck(const struct stat *st) { struct id id; struct of *p; memset(&id, 0, sizeof(struct id)); id.ino = st->st_ino; id.dev = st->st_dev; HASH_FIND(hash_ino, inos, &id, sizeof(struct id), p); return(NULL != p); } static void ofadd(const char *base, int dform, const char *file, const char *name, const char *dsec, const char *sec, const char *arch, const struct stat *st) { struct of *of; int sform; size_t sz; assert(NULL != file); if (NULL == name) name = ""; if (NULL == sec) sec = ""; if (NULL == dsec) dsec = ""; if (NULL == arch) arch = ""; sform = FORM_NONE; if (NULL != sec && *sec <= '9' && *sec >= '1') sform = FORM_SRC; else if (NULL != sec && *sec == '0') sform = FORM_CAT; /* XXX: structure warnings go here */ of = mandoc_calloc(1, sizeof(struct of)); of->file = file; of->name = name; of->sec = sec; of->dsec = dsec; of->arch = arch; of->sform = sform; of->dform = dform; of->id.ino = st->st_ino; of->id.dev = st->st_dev; of->next = ofs; sz = strlen(of->sec) + 1; ofs = of; /* * Add to unique identifier hash. * Then if it's a source manual and we're going to use source in * favour of catpages, add it to that hash. */ HASH_ADD(hash_ino, inos, id, sizeof(struct id), of); HASH_ADD_KEYPTR(hash_filename, filenames, file, strlen(file) - sz, of); } static void offree(void) { struct of *of; while (NULL != (of = ofs)) { ofs = of->next; free(of); } } static int ofmerge(struct mchars *mc, struct mparse *mp, const char *base) { int form; size_t sz; struct mdoc *mdoc; struct man *man; char buf[MAXPATHLEN]; char *bufp; const char *msec, *march, *mtitle, *cp; struct of *of; enum mandoclevel lvl; if (0 == dbopen(base, 0)) return(0); for (of = ofs; NULL != of; of = of->next) { /* * If we're a catpage (as defined by our path), then see * if a manpage exists by the same name (ignoring the * suffix). * If it does, then we want to use it instead of our * own. */ if ( ! 
use_all && FORM_CAT == of->dform) { sz = strlcpy(buf, of->file, MAXPATHLEN); if (sz >= MAXPATHLEN) { WARNING(of->file, base, "Filename too long"); continue; } bufp = strstr(buf, "cat"); assert(NULL != bufp); memcpy(bufp, "man", 3); if (NULL != (bufp = strrchr(buf, '.'))) *bufp = '\0'; if (NULL != (cp = filecheck(buf))) { WARNING(of->file, base, "Man " "source exists: %s", cp); continue; } } words = NULL; mparse_reset(mp); mdoc = NULL; man = NULL; form = 0; msec = of->dsec; march = of->arch; mtitle = of->name; /* * Try interpreting the file as mdoc(7) or man(7) * source code, unless it is already known to be * formatted. Fall back to formatted mode. */ if (FORM_SRC == of->dform || FORM_SRC == of->sform) { lvl = mparse_readfd(mp, -1, of->file); if (lvl < MANDOCLEVEL_FATAL) mparse_result(mp, &mdoc, &man); } if (NULL != mdoc) { form = 1; msec = mdoc_meta(mdoc)->msec; march = mdoc_meta(mdoc)->arch; mtitle = mdoc_meta(mdoc)->title; } else if (NULL != man) { form = 1; msec = man_meta(man)->msec; march = ""; mtitle = man_meta(man)->title; } if (NULL == msec) msec = ""; if (NULL == march) march = ""; if (NULL == mtitle) mtitle = ""; /* * Check whether the manual section given in a file * agrees with the directory where the file is located. * Some manuals have suffixes like (3p) on their * section number either inside the file or in the * directory name, some are linked into more than one * section, like encrypt(1) = makekey(8). Do not skip * manuals for such reasons. */ if (form && strcasecmp(msec, of->dsec)) WARNING(of->file, base, "Section %s " "manual in %s directory", msec, of->dsec); /* * Manual page directories exist for each kernel * architecture as returned by machine(1). * However, many manuals only depend on the * application architecture as returned by arch(1). * For example, some (2/ARM) manuals are shared * across the "armish" and "zaurus" kernel * architectures. 
* A few manuals are even shared across completely * different architectures, for example fdformat(1) * on amd64, i386, sparc, and sparc64. * Thus, warn about architecture mismatches, * but don't skip manuals for this reason. */ if (strcasecmp(march, of->arch)) WARNING(of->file, base, "Architecture %s " "manual in %s directory", march, of->arch); putkey(of, of->name, TYPE_Nm); if (NULL != mdoc) { if (NULL != (cp = mdoc_meta(mdoc)->name)) putkey(of, cp, TYPE_Nm); parse_mdoc(of, mdoc_node(mdoc)); } else if (NULL != man) parse_man(of, man_node(man)); else parse_catpage(of, base); dbindex(mc, of, base); } dbclose(base, 0); return(1); } static void parse_catpage(struct of *of, const char *base) { FILE *stream; char *line, *p, *title; size_t len, plen, titlesz; if (NULL == (stream = fopen(of->file, "r"))) { WARNING(of->file, base, "%s", strerror(errno)); return; } /* Skip to first blank line. */ while (NULL != (line = fgetln(stream, &len))) if ('\n' == *line) break; /* * Assume the first line that is not indented * is the first section header. Skip to it. */ while (NULL != (line = fgetln(stream, &len))) if ('\n' != *line && ' ' != *line) break; /* * Read up until the next section into a buffer. * Strip the leading and trailing newline from each read line, * appending a trailing space. * Ignore empty (whitespace-only) lines. */ titlesz = 0; title = NULL; while (NULL != (line = fgetln(stream, &len))) { if (' ' != *line || '\n' != line[len - 1]) break; while (len > 0 && isspace((unsigned char)*line)) { line++; len--; } if (1 == len) continue; title = mandoc_realloc(title, titlesz + len); memcpy(title + titlesz, line, len); titlesz += len; title[titlesz - 1] = ' '; } /* * If no page content can be found, or the input line * is already the next section header, or there is no * trailing newline, reuse the page title as the page * description. 
*/ if (NULL == title || '\0' == *title) { WARNING(of->file, base, "Cannot find NAME section"); fclose(stream); free(title); return; } title = mandoc_realloc(title, titlesz + 1); title[titlesz] = '\0'; /* * Skip to the first dash. * Use the remaining line as the description (no more than 70 * bytes). */ if (NULL != (p = strstr(title, "- "))) { for (p += 2; ' ' == *p || '\b' == *p; p++) /* Skip to next word. */ ; } else { WARNING(of->file, base, "No dash in title line"); p = title; } plen = strlen(p); /* Strip backspace-encoding from line. */ while (NULL != (line = memchr(p, '\b', plen))) { len = line - p; if (0 == len) { memmove(line, line + 1, plen--); continue; } memmove(line - 1, line + 1, plen - len); plen -= 2; } of->desc = stradd(p); putkey(of, p, TYPE_Nd); fclose(stream); free(title); } static void putkey(const struct of *of, const char *value, uint64_t type) { wordadd(of, value, type); } static void putkeys(const struct of *of, const char *value, int sz, uint64_t type) { wordaddbuf(of, value, sz, type); } static void putmdockey(const struct of *of, const struct mdoc_node *n, uint64_t m) { for ( ; NULL != n; n = n->next) { if (n->child) putmdockey(of, n->child, m); if (MDOC_TEXT == n->type) putkey(of, n->string, m); } } static int parse_man(struct of *of, const struct man_node *n) { const struct man_node *head, *body; char *start, *sv, *title; char byte; size_t sz, titlesz; if (NULL == n) return(0); /* * We're only searching for one thing: the first text child in * the BODY of a NAME section. Since we don't keep track of * sections in -man, run some hoops to find out whether we're in * the correct section or not. 
*/ if (MAN_BODY == n->type && MAN_SH == n->tok) { body = n; assert(body->parent); if (NULL != (head = body->parent->head) && 1 == head->nchild && NULL != (head = (head->child)) && MAN_TEXT == head->type && 0 == strcmp(head->string, "NAME") && NULL != (body = body->child) && MAN_TEXT == body->type) { title = NULL; titlesz = 0; /* * Suck the entire NAME section into memory. * Yes, we might run away. * But too many manuals have big, spread-out * NAME sections over many lines. */ for ( ; NULL != body; body = body->next) { if (MAN_TEXT != body->type) break; if (0 == (sz = strlen(body->string))) continue; title = mandoc_realloc (title, titlesz + sz + 1); memcpy(title + titlesz, body->string, sz); titlesz += sz + 1; title[titlesz - 1] = ' '; } if (NULL == title) return(1); title = mandoc_realloc(title, titlesz + 1); title[titlesz] = '\0'; /* Skip leading space. */ sv = title; while (isspace((unsigned char)*sv)) sv++; if (0 == (sz = strlen(sv))) { free(title); return(1); } /* Erase trailing space. */ start = &sv[sz - 1]; while (start > sv && isspace((unsigned char)*start)) *start-- = '\0'; if (start == sv) { free(title); return(1); } start = sv; /* * Go through a special heuristic dance here. * Conventionally, one or more manual names are * comma-specified prior to a whitespace, then a * dash, then a description. Try to puzzle out * the name parts here. 
*/ for ( ;; ) { sz = strcspn(start, " ,"); if ('\0' == start[sz]) break; byte = start[sz]; start[sz] = '\0'; putkey(of, start, TYPE_Nm); if (' ' == byte) { start += sz + 1; break; } assert(',' == byte); start += sz + 1; while (' ' == *start) start++; } if (sv == start) { putkey(of, start, TYPE_Nm); free(title); return(1); } while (isspace((unsigned char)*start)) start++; if (0 == strncmp(start, "-", 1)) start += 1; else if (0 == strncmp(start, "\\-\\-", 4)) start += 4; else if (0 == strncmp(start, "\\-", 2)) start += 2; else if (0 == strncmp(start, "\\(en", 4)) start += 4; else if (0 == strncmp(start, "\\(em", 4)) start += 4; while (' ' == *start) start++; assert(NULL == of->desc); of->desc = stradd(start); putkey(of, start, TYPE_Nd); free(title); return(1); } } for (n = n->child; n; n = n->next) if (parse_man(of, n)) return(1); return(0); } static void parse_mdoc(struct of *of, const struct mdoc_node *n) { for (n = n->child; NULL != n; n = n->next) { switch (n->type) { case (MDOC_ELEM): /* FALLTHROUGH */ case (MDOC_BLOCK): /* FALLTHROUGH */ case (MDOC_HEAD): /* FALLTHROUGH */ case (MDOC_BODY): /* FALLTHROUGH */ case (MDOC_TAIL): if (NULL != mdocs[n->tok].fp) if (0 == (*mdocs[n->tok].fp)(of, n)) break; if (MDOCF_CHILD & mdocs[n->tok].flags) putmdockey(of, n->child, mdocs[n->tok].mask); break; default: assert(MDOC_ROOT != n->type); continue; } if (NULL != n->child) parse_mdoc(of, n); } } static int parse_mdoc_Fd(struct of *of, const struct mdoc_node *n) { const char *start, *end; size_t sz; if (SEC_SYNOPSIS != n->sec || NULL == (n = n->child) || MDOC_TEXT != n->type) return(0); /* * Only consider those `Fd' macro fields that begin with an * "inclusion" token (versus, e.g., #define). */ if (strcmp("#include", n->string)) return(0); if (NULL == (n = n->next) || MDOC_TEXT != n->type) return(0); /* * Strip away the enclosing angle brackets and make sure we're * not zero-length. 
*/ start = n->string; if ('<' == *start || '"' == *start) start++; if (0 == (sz = strlen(start))) return(0); end = &start[(int)sz - 1]; if ('>' == *end || '"' == *end) end--; /* Nothing is left for input such as "<>". */ if (end < start) return(0); putkeys(of, start, end - start + 1, TYPE_In); return(1); } static int parse_mdoc_In(struct of *of, const struct mdoc_node *n) { if (NULL == n->child || MDOC_TEXT != n->child->type) return(0); putkey(of, n->child->string, TYPE_In); return(1); } static int parse_mdoc_Fn(struct of *of, const struct mdoc_node *n) { const char *cp, *name; if (NULL == (n = n->child) || MDOC_TEXT != n->type) return(0); /* * Parse: .Fn "struct type *name" "char *arg". * First strip away the separating space and pointer symbol. * Then store the function name, then type. * Finally, store the arguments. */ if (NULL == (cp = strrchr(n->string, ' '))) cp = n->string; name = cp; while (' ' == *name || '*' == *name) name++; putkey(of, name, TYPE_Fn); if (n->string < cp) putkeys(of, n->string, cp - n->string, TYPE_Ft); for (n = n->next; NULL != n; n = n->next) if (MDOC_TEXT == n->type) putkey(of, n->string, TYPE_Fa); return(0); } static int parse_mdoc_St(struct of *of, const struct mdoc_node *n) { if (NULL == n->child || MDOC_TEXT != n->child->type) return(0); putkey(of, n->child->string, TYPE_St); return(1); } static int parse_mdoc_Xr(struct of *of, const struct mdoc_node *n) { if (NULL == (n = n->child)) return(0); putkey(of, n->string, TYPE_Xr); return(1); } static int parse_mdoc_Nd(struct of *of, const struct mdoc_node *n) { size_t sz; char *sv, *desc; if (MDOC_BODY != n->type) return(0); /* * Special-case the `Nd' because we need to put the description * into the document table. */ desc = NULL; for (n = n->child; NULL != n; n = n->next) { if (MDOC_TEXT == n->type) { sz = strlen(n->string) + 1; if (NULL != (sv = desc)) sz += strlen(desc) + 1; desc = mandoc_realloc(desc, sz); if (NULL != sv) strlcat(desc, " ", sz); else *desc = '\0'; strlcat(desc, n->string, sz); } if (NULL != n->child) parse_mdoc_Nd(of, n); } of->desc = NULL != desc ?
stradd(desc) : NULL; free(desc); return(1); } static int parse_mdoc_Nm(struct of *of, const struct mdoc_node *n) { if (SEC_NAME == n->sec) return(1); else if (SEC_SYNOPSIS != n->sec || MDOC_HEAD != n->type) return(0); return(1); } static int parse_mdoc_Sh(struct of *of, const struct mdoc_node *n) { return(SEC_CUSTOM == n->sec && MDOC_HEAD == n->type); } static int parse_mdoc_head(struct of *of, const struct mdoc_node *n) { return(MDOC_HEAD == n->type); } static int parse_mdoc_body(struct of *of, const struct mdoc_node *n) { return(MDOC_BODY == n->type); } /* * See straddbuf(). */ static char * stradd(const char *cp) { return(straddbuf(cp, strlen(cp))); } /* * See wordaddbuf(). */ static void wordadd(const struct of *of, const char *cp, uint64_t mask) { if (0 == cp[0]) return; wordaddbuf(of, cp, strlen(cp), mask); } /* * This looks up or adds a string to the string table. * The string table is a table of all strings encountered during parse * or file scan. * In using it, we avoid having thousands of (e.g.) "cat1" string * allocations for the "of" table. * We also have a layer atop the string table for keeping track of words * in a parse sequence (see wordaddbuf()). */ static char * straddbuf(const char *cp, size_t sz) { struct str *s; HASH_FIND(hash_string, strings, cp, sz, s); if (NULL != s) return(s->key); s = mandoc_calloc(1, sizeof(struct str)); s->key = mandoc_malloc(sz + 1); memcpy(s->key, cp, sz); s->key[sz] = '\0'; HASH_ADD_KEYPTR(hash_string, strings, s->key, sz, s); return(s->key); } /* * Add a word to the current parse sequence. * Within the hashtable of strings, we maintain a list of strings that * are currently indexed. * Each of these ("words") has a bitmask modified within the parse. * When we finish a parse, we'll dump the list, then remove the head * entry -- since the next parse will have a new "of", it can keep track * of its entries without conflict. 
*/ static void wordaddbuf(const struct of *of, const char *cp, size_t sz, uint64_t v) { struct str *s; if (0 == sz) return; HASH_FIND(hash_string, strings, cp, sz, s); if (NULL != s && of == s->of) { s->mask |= v; return; } else if (NULL == s) { s = mandoc_calloc(1, sizeof(struct str)); s->key = mandoc_malloc(sz + 1); memcpy(s->key, cp, sz); s->key[sz] = '\0'; HASH_ADD_KEYPTR(hash_string, strings, s->key, sz, s); } s->next = words; s->of = of; s->mask = v; words = s; } /* * Take a Unicode codepoint and produce its UTF-8 encoding. * This isn't the best way to do this, but it works. * The magic numbers are from the UTF-8 packaging. * They're not as scary as they seem: read the UTF-8 spec for details. */ static size_t utf8(unsigned int cp, char out[7]) { size_t rc; rc = 0; if (cp <= 0x0000007F) { rc = 1; out[0] = (char)cp; } else if (cp <= 0x000007FF) { rc = 2; out[0] = (cp >> 6 & 31) | 192; out[1] = (cp & 63) | 128; } else if (cp <= 0x0000FFFF) { rc = 3; out[0] = (cp >> 12 & 15) | 224; out[1] = (cp >> 6 & 63) | 128; out[2] = (cp & 63) | 128; } else if (cp <= 0x001FFFFF) { rc = 4; out[0] = (cp >> 18 & 7) | 240; out[1] = (cp >> 12 & 63) | 128; out[2] = (cp >> 6 & 63) | 128; out[3] = (cp & 63) | 128; } else if (cp <= 0x03FFFFFF) { rc = 5; out[0] = (cp >> 24 & 3) | 248; out[1] = (cp >> 18 & 63) | 128; out[2] = (cp >> 12 & 63) | 128; out[3] = (cp >> 6 & 63) | 128; out[4] = (cp & 63) | 128; } else if (cp <= 0x7FFFFFFF) { rc = 6; out[0] = (cp >> 30 & 1) | 252; out[1] = (cp >> 24 & 63) | 128; out[2] = (cp >> 18 & 63) | 128; out[3] = (cp >> 12 & 63) | 128; out[4] = (cp >> 6 & 63) | 128; out[5] = (cp & 63) | 128; } else return(0); out[rc] = '\0'; return(rc); } /* * Store the UTF-8 version of a key, or alias the pointer if the key has * no UTF-8 transcription marks in it. 
*/ static void utf8key(struct mchars *mc, struct str *key) { size_t sz, bsz, pos; char utfbuf[7]; char *buf; const char *seq, *cpp, *val; int len, u; enum mandoc_esc esc; char res[5]; res[0] = '\\'; res[1] = '\t'; res[2] = ASCII_NBRSP; res[3] = ASCII_HYPH; res[4] = '\0'; val = key->key; bsz = strlen(val); /* * Pre-check: if we have no stop-characters, then point the * UTF-8 string at the key itself and get out of here. */ if (strcspn(val, res) == bsz) { key->utf8 = key->key; return; } /* Pre-allocate by the length of the input */ buf = mandoc_malloc(++bsz); pos = 0; while ('\0' != *val) { /* * Halt on the first escape sequence. * This also halts on the end of string, in which case * we just copy, fallthrough, and exit the loop. */ if ((sz = strcspn(val, res)) > 0) { memcpy(&buf[pos], val, sz); pos += sz; val += sz; } if (ASCII_HYPH == *val) { buf[pos++] = '-'; val++; continue; } else if ('\t' == *val || ASCII_NBRSP == *val) { buf[pos++] = ' '; val++; continue; } else if ('\\' != *val) break; /* Read past the slash. */ val++; u = 0; /* * Parse the escape sequence and see if it's a * predefined character or special character. */ esc = mandoc_escape ((const char **)&val, &seq, &len); if (ESCAPE_ERROR == esc) break; if (ESCAPE_SPECIAL != esc) continue; if (0 == (u = mchars_spec2cp(mc, seq, len))) continue; /* * If we have a Unicode codepoint, try to convert that * to a UTF-8 byte string. */ cpp = utfbuf; if (0 == (sz = utf8(u, utfbuf))) continue; /* Copy the rendered glyph into the stream.
*/ sz = strlen(cpp); bsz += sz; buf = mandoc_realloc(buf, bsz); memcpy(&buf[pos], cpp, sz); pos += sz; } buf[pos] = '\0'; key->utf8 = buf; } static void dbindex(struct mchars *mc, const struct of *of, const char *base) { struct str *key; int64_t recno; if (nodb) return; sqlite3_bind_text (stmts[STMT_INSERT_DOC], 1, of->file, -1, SQLITE_STATIC); sqlite3_bind_text (stmts[STMT_INSERT_DOC], 2, of->sec, -1, SQLITE_STATIC); sqlite3_bind_text (stmts[STMT_INSERT_DOC], 3, of->arch, -1, SQLITE_STATIC); /* The description is the fourth parameter of the INSERT. */ sqlite3_bind_text (stmts[STMT_INSERT_DOC], 4, NULL != of->desc ? of->desc : "", -1, SQLITE_STATIC); sqlite3_step(stmts[STMT_INSERT_DOC]); DEBUG(of->file, base, "Added to index"); recno = sqlite3_last_insert_rowid(db); sqlite3_reset(stmts[STMT_INSERT_DOC]); for (key = words; NULL != key; key = key->next) { assert(key->of == of); if (NULL == key->utf8) utf8key(mc, key); sqlite3_bind_int64 (stmts[STMT_INSERT_KEY], 1, key->mask); sqlite3_bind_text (stmts[STMT_INSERT_KEY], 2, key->utf8, -1, SQLITE_STATIC); sqlite3_bind_int64 (stmts[STMT_INSERT_KEY], 3, recno); /* Execute the insert, then rearm the statement for the next key. */ sqlite3_step(stmts[STMT_INSERT_KEY]); sqlite3_reset(stmts[STMT_INSERT_KEY]); } } static void dbprune(const char *base) { struct of *of; if (nodb) return; for (of = ofs; NULL != of; of = of->next) { sqlite3_bind_text (stmts[STMT_DELETE], 1, of->file, -1, SQLITE_STATIC); sqlite3_step(stmts[STMT_DELETE]); sqlite3_reset(stmts[STMT_DELETE]); DEBUG(of->file, base, "Deleted from index"); } } /* * Close an existing database and its prepared statements. * If "real" is not set, rename the temporary file into the real one. */ static void dbclose(const char *base, int real) { size_t i; char file[MAXPATHLEN]; if (nodb) return; for (i = 0; i < STMT__MAX; i++) { sqlite3_finalize(stmts[i]); stmts[i] = NULL; } sqlite3_close(db); db = NULL; if (real) return; strlcpy(file, MANDOC_DB, MAXPATHLEN); strlcat(file, "~", MAXPATHLEN); if (-1 == rename(file, MANDOC_DB)) perror(MANDOC_DB); } /* * This is straightforward stuff.
* Open a database connection to a "temporary" database, then open a set * of prepared statements we'll use over and over again. * If "real" is set, we use the existing database; if not, we truncate a * temporary one. * Must be matched by dbclose(). */ static int dbopen(const char *base, int real) { char file[MAXPATHLEN]; const char *sql; int rc, ofl; size_t sz; if (nodb) return(1); sz = strlcpy(file, MANDOC_DB, MAXPATHLEN); if ( ! real) sz = strlcat(file, "~", MAXPATHLEN); if (sz >= MAXPATHLEN) { fprintf(stderr, "%s: Path too long\n", file); return(0); } if ( ! real) remove(file); ofl = SQLITE_OPEN_PRIVATECACHE | SQLITE_OPEN_READWRITE; rc = sqlite3_open_v2(file, &db, ofl, NULL); /* An existing database still needs its prepared statements. */ if (SQLITE_OK == rc) goto prepare_statements; if (SQLITE_CANTOPEN != rc) { perror(file); return(0); } sqlite3_close(db); db = NULL; if (SQLITE_OK != (rc = sqlite3_open(file, &db))) { perror(file); return(0); } sql = "PRAGMA journal_mode=off;\n" "PRAGMA encoding=\"UTF-8\";\n" "\n" "CREATE TABLE \"docs\" (\n" " \"file\" TEXT NOT NULL,\n" " \"sec\" TEXT NOT NULL,\n" " \"arch\" TEXT NOT NULL,\n" " \"desc\" TEXT NOT NULL,\n" " \"id\" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL\n" ");\n" "\n" "CREATE TABLE \"keys\" (\n" " \"bits\" INTEGER NOT NULL,\n" " \"key\" TEXT NOT NULL,\n" " \"docid\" INTEGER NOT NULL REFERENCES docs(id) ON DELETE CASCADE,\n" " \"id\" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL\n" ");\n"; if (SQLITE_OK != sqlite3_exec(db, sql, NULL, NULL, NULL)) { fprintf(stderr, "%s: %s\n", file, sqlite3_errmsg(db)); return(0); } prepare_statements: /* * SQLite leaves foreign keys off by default; enable them on every * connection so that ON DELETE CASCADE takes effect in dbprune(). */ if (SQLITE_OK != sqlite3_exec(db, "PRAGMA foreign_keys=ON", NULL, NULL, NULL)) { fprintf(stderr, "%s: %s\n", file, sqlite3_errmsg(db)); return(0); } sql = "DELETE FROM docs where file=?"; sqlite3_prepare_v2(db, sql, -1, &stmts[STMT_DELETE], NULL); sql = "INSERT INTO docs (file,sec,arch,desc) VALUES (?,?,?,?)"; sqlite3_prepare_v2(db, sql, -1, &stmts[STMT_INSERT_DOC], NULL); sql = "INSERT INTO keys (bits,key,docid) VALUES (?,?,?)"; sqlite3_prepare_v2(db, sql, -1, &stmts[STMT_INSERT_KEY], NULL); return(1); } --------------080602040207030500020401-- -- To unsubscribe send an email to discuss+unsubscribe@mdocml.bsd.lv