* mandocdb: do not use bogus files @ 2011-11-13 18:42 Ingo Schwarze 2011-11-13 20:05 ` Kristaps Dzonsons 0 siblings, 1 reply; 7+ messages in thread From: Ingo Schwarze @ 2011-11-13 18:42 UTC (permalink / raw) To: tech Hi, here is the first step towards usable titles in apropos(1) output. Currently, if you have a page /usr/share/man/man3p/FooBar::cAmEl.3p, you are quite lost, because apropos(1) will tell you FOOBAR::CAMEL(3p) - my fancy Perl module but of course, this will not find the page: man FOOBAR::CAMEL Also, imagine /usr/share/man/foobar/bogus.mdoc contained .Dt fancy 6, then apropos fancy would happily return FANCY - my well hidden manual but of course man(1) will search for it in vain. Thus, only look for files in reasonable places, and save the filename stem (title), suffix (section), and architecture in the struct of, for later use by the indexer. OK? Ingo --- mandocdb.c.orig +++ mandocdb.c @@ -38,6 +38,9 @@ struct of { char *fname; /* heap-allocated */ + char *sec; + char *arch; + char *title; struct of *next; /* NULL for last one */ struct of *first; /* first in list */ }; @@ -85,7 +88,8 @@ static void index_prune(const struct of *, DB *, const char *, DB *, const char *, int, recno_t *, recno_t **, size_t *); static void ofile_argbuild(char *[], int, int, struct of **); -static int ofile_dirbuild(const char *, int, struct of **); +static int ofile_dirbuild(const char *, const char *, + const char *, int, struct of **); static void ofile_free(struct of *); static int pman_node(MAN_ARGS); static void pmdoc_node(MDOC_ARGS); @@ -396,7 +400,7 @@ mandocdb(int argc, char *argv[]) ofile_free(of); of = NULL; - if ( ! ofile_dirbuild(argv[i], verb, &of)) + if ( ! ofile_dirbuild(argv[i], NULL, NULL, verb, &of)) exit((int)MANDOCLEVEL_SYSERR); if (NULL == of) @@ -1180,12 +1184,14 @@ ofile_argbuild(char *argv[], int argc, int verb, struct of **of) * Pass in a pointer to a NULL structure for the first invocation. 
*/ static int -ofile_dirbuild(const char *dir, int verb, struct of **of) +ofile_dirbuild(const char *dir, const char* psec, const char *parch, + int verb, struct of **of) { char buf[MAXPATHLEN]; size_t sz; DIR *d; - const char *fn; + const char *fn, *sec, *arch; + char *suffix; struct of *nof; struct dirent *dp; @@ -1194,12 +1200,27 @@ ofile_dirbuild(const char *dir, int verb, struct of **of) return(0); } + sec = psec; + arch = parch; + while (NULL != (dp = readdir(d))) { fn = dp->d_name; if (DT_DIR == dp->d_type) { - if (0 == strcmp(".", fn)) - continue; - if (0 == strcmp("..", fn)) + + /* + * Don't bother parsing directories + * that man(1) won't find. + */ + + if (NULL == psec) { + if(strncmp("man", fn, 3)) + continue; + sec = fn + 3; + arch = NULL; + } else if (NULL == parch && + NULL == strchr(fn, '.')) + arch = fn; + else continue; buf[0] = '\0'; @@ -1207,21 +1228,27 @@ ofile_dirbuild(const char *dir, int verb, struct of **of) strlcat(buf, "/", MAXPATHLEN); sz = strlcat(buf, fn, MAXPATHLEN); - if (sz < MAXPATHLEN) { - if ( ! ofile_dirbuild(buf, verb, of)) - return(0); - continue; - } else if (sz < MAXPATHLEN) - continue; + if (MAXPATHLEN <= sz) { + fprintf(stderr, "%s: Path too long\n", dir); + return(0); + } + + if (verb > 2) + printf("%s: Scanning\n", buf); - fprintf(stderr, "%s: Path too long\n", dir); - return(0); + if ( ! ofile_dirbuild(buf, sec, arch, verb, of)) + return(0); } - if (DT_REG != dp->d_type) + if (NULL == sec || DT_REG != dp->d_type) continue; - if (0 == strcmp(MANDOC_DB, fn) || - 0 == strcmp(MANDOC_IDX, fn)) + /* + * Don't bother parsing files that man(1) won't find. 
+ */ + + if (NULL == (suffix = strrchr(fn, '.'))) + continue; + if (strcmp(suffix + 1, sec)) continue; buf[0] = '\0'; @@ -1235,6 +1262,11 @@ ofile_dirbuild(const char *dir, int verb, struct of **of) nof = mandoc_calloc(1, sizeof(struct of)); nof->fname = mandoc_strdup(buf); + nof->sec = mandoc_strdup(sec); + if (NULL != arch) + nof->arch = mandoc_strdup(arch); + *suffix = '\0'; + nof->title = mandoc_strdup(fn); if (verb > 2) printf("%s: Scheduling\n", buf); -- To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv ^ permalink raw reply [flat|nested] 7+ messages in thread
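The naming rule this patch relies on (filename stem as title, filename suffix as section) can be sketched in isolation. This is a minimal illustration under stated assumptions, not code from the patch: `struct parts` and `split_manpath()` are hypothetical names, and the fixed buffer sizes are arbitrary.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container; the patch keeps these fields in struct of. */
struct parts {
	char	 title[256];	/* filename stem, case preserved */
	char	 sec[32];	/* filename suffix, e.g. "3p" */
};

/*
 * Split the last path component at its last dot.
 * Returns 1 on success, 0 if there is no usable suffix
 * or a field would not fit into the (arbitrary) buffers.
 */
static int
split_manpath(const char *path, struct parts *out)
{
	const char	*base, *dot;
	size_t		 tlen;

	assert(path != NULL && out != NULL);

	/* Find the start of the last path component. */
	base = strrchr(path, '/');
	base = (base == NULL) ? path : base + 1;

	/* Reject names without a dot, dot-files, and empty suffixes. */
	dot = strrchr(base, '.');
	if (dot == NULL || dot == base || dot[1] == '\0')
		return 0;

	tlen = (size_t)(dot - base);
	if (tlen >= sizeof(out->title) || strlen(dot + 1) >= sizeof(out->sec))
		return 0;

	memcpy(out->title, base, tlen);
	out->title[tlen] = '\0';
	strcpy(out->sec, dot + 1);
	return 1;
}
```

For /usr/share/man/man3p/FooBar::cAmEl.3p this yields the title "FooBar::cAmEl" and the section "3p", so the case man(1) needs is preserved.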
* Re: mandocdb: do not use bogus files
  2011-11-13 18:42 mandocdb: do not use bogus files Ingo Schwarze
@ 2011-11-13 20:05 ` Kristaps Dzonsons
  2011-11-14  0:06   ` Ingo Schwarze
  0 siblings, 1 reply; 7+ messages in thread
From: Kristaps Dzonsons @ 2011-11-13 20:05 UTC (permalink / raw)
  To: tech; +Cc: Ingo Schwarze

On 13/11/2011 19:42, Ingo Schwarze wrote:
> Hi,
>
> here is the first step towards usable titles in apropos(1) output.
>
> Currently, if you have a page /usr/share/man/man3p/FooBar::cAmEl.3p,
> you are quite lost, because apropos(1) will tell you
>
>   FOOBAR::CAMEL(3p) - my fancy Perl module
>
> but of course, this will not find the page:
>
>   man FOOBAR::CAMEL
>
> Also, imagine /usr/share/man/foobar/bogus.mdoc contained .Dt fancy 6,
> then apropos fancy would happily return
>
>   FANCY - my well hidden manual
>
> but of course man(1) will search for it in vain.
>
> Thus, only look for files in reasonable places, and save the filename
> stem (title), suffix (section), and architecture in the struct of,
> for later use by the indexer.
>
> OK?
>   Ingo

Ingo,

Yes, I like this as the default behaviour.

However, there must be a command-line flag not to do so,
as I use mandocdb in my own directory structures.

You also need to free up the memory in ofile_free().

Furthermore, please amend mandocdb.8 to note which directories are
visited and which are not (in the default case).

I also think it's a good idea to note skipped directories with
a warning that's described in DIAGNOSTICS.

Since this is a .8, we should (in my humble opinion) be quite
rigorous in what's done and what's not done.

Thoughts?

Kristaps
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: mandocdb: do not use bogus files
  2011-11-13 20:05 ` Kristaps Dzonsons
@ 2011-11-14  0:06   ` Ingo Schwarze
  2011-11-14 19:13     ` Ingo Schwarze
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Schwarze @ 2011-11-14  0:06 UTC (permalink / raw)
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Sun, Nov 13, 2011 at 09:05:08PM +0100:

> Yes, I like this as the default behaviour.

Good, so i'll prod on down that road.  :-)

> However, there must be a command-line flag not to do so,
> as I use mandocdb in my own directory structures.

You have a point, and i have added -a.

> You also need to free up the memory in ofile_free().

Sure, i planned to, but forgot to write it down
before sending the patch.  Done now.

> Furthermore, please amend mandocdb.8 to note which directories are
> visited and which are not (in the default case).

I have put some minimal text for now.

> I also think it's a good idea to note skipped directories with
> a warning that's described in DIAGNOSTICS.

Not sure i want that in weekly(8) output; in any case, deciding
that is for later, when the dust has settled.  Right now, i want
to get the stuff to do some basic work, not polish it.

> Since this is a .8, we should (in my humble opinion) be quite
> rigorous in what's done and what's not done.

In general, i agree, but i don't think now is the right time.
This is not yet stable at all, so i'd have to rewrite that
documentation multiple times.  I'm not yet sure how much
man.conf(5) will be used here.  There are several other open
questions, and what i have so far is rudimentary at best.
So rigorous documentation is for later, too.

Here is an update:

 * -a, use_all
 * calculate the same info in ofile_argbuild as well;
   obviously, the algorithm must be rather different
 * skip all dot-files; i think that's reasonable
 * free the memory

Not asking for OKs right now, i'm planning to first use the info
in index_merge, then commit.

Of course, comments are welcome!

Yours, Ingo --- mandocdb.8.orig +++ mandocdb.8 @@ -22,7 +22,7 @@ .Nd index UNIX manuals .Sh SYNOPSIS .Nm -.Op Fl v +.Op Fl av .Op Ar dir... .Nm .Op Fl v @@ -42,8 +42,15 @@ manuals and indexes them in a and .Sx Index Database for fast retrieval. +.Pp The arguments are as follows: .Bl -tag -width Ds +.It Fl a +Use all directories and files found below +.Ar dir ... . +By default, directories and files +.Xr man 1 +cannot find will be silently skipped. .It Fl d Ar dir Merge (remove and re-add) .Ar --- mandocdb.c.orig +++ mandocdb.c @@ -38,6 +38,9 @@ struct of { char *fname; /* heap-allocated */ + char *sec; + char *arch; + char *title; struct of *next; /* NULL for last one */ struct of *first; /* first in list */ }; @@ -79,13 +82,15 @@ static void hash_reset(DB **); static void index_merge(const struct of *, struct mparse *, struct buf *, struct buf *, DB *, DB *, const char *, - DB *, const char *, int, + DB *, const char *, int, int, recno_t, const recno_t *, size_t); static void index_prune(const struct of *, DB *, const char *, DB *, const char *, int, recno_t *, recno_t **, size_t *); -static void ofile_argbuild(char *[], int, int, struct of **); -static int ofile_dirbuild(const char *, int, struct of **); +static void ofile_argbuild(char *[], int, int, int, + struct of **); +static int ofile_dirbuild(const char *, const char *, + const char *, int, int, struct of **); static void ofile_free(struct of *); static int pman_node(MAN_ARGS); static void pmdoc_node(MDOC_ARGS); @@ -243,6 +248,7 @@ mandocdb(int argc, char *argv[]) char ibuf[MAXPATHLEN], /* index fname */ fbuf[MAXPATHLEN]; /* btree fname */ int verb, /* output verbosity */ + use_all, /* use all directories and files */ ch, i, flags; DB *idx, /* index database */ *db, /* keyword database */ @@ -266,6 +272,7 @@ mandocdb(int argc, char *argv[]) ++progname; verb = 0; + use_all = 0; of = NULL; db = idx = NULL; mp = NULL; @@ -276,8 +283,11 @@ mandocdb(int argc, char *argv[]) op = OP_NEW; dir = NULL; - while (-1 
!= (ch = getopt(argc, argv, "d:u:v"))) + while (-1 != (ch = getopt(argc, argv, "ad:u:v"))) switch (ch) { + case ('a'): + use_all = 1; + break; case ('d'): dir = optarg; op = OP_UPDATE; @@ -344,7 +354,7 @@ mandocdb(int argc, char *argv[]) printf("%s: Opened\n", ibuf); } - ofile_argbuild(argv, argc, verb, &of); + ofile_argbuild(argv, argc, use_all, verb, &of); if (NULL == of) goto out; @@ -355,8 +365,8 @@ mandocdb(int argc, char *argv[]) if (OP_UPDATE == op) index_merge(of, mp, &dbuf, &buf, hash, - db, fbuf, idx, ibuf, verb, - maxrec, recs, reccur); + db, fbuf, idx, ibuf, use_all, + verb, maxrec, recs, reccur); goto out; } @@ -396,7 +406,8 @@ mandocdb(int argc, char *argv[]) ofile_free(of); of = NULL; - if ( ! ofile_dirbuild(argv[i], verb, &of)) + if ( ! ofile_dirbuild(argv[i], NULL, NULL, + use_all, verb, &of)) exit((int)MANDOCLEVEL_SYSERR); if (NULL == of) @@ -405,7 +416,8 @@ mandocdb(int argc, char *argv[]) of = of->first; index_merge(of, mp, &dbuf, &buf, hash, db, fbuf, - idx, ibuf, verb, maxrec, recs, reccur); + idx, ibuf, use_all, verb, + maxrec, recs, reccur); } out: @@ -430,7 +442,7 @@ void index_merge(const struct of *of, struct mparse *mp, struct buf *dbuf, struct buf *buf, DB *hash, DB *db, const char *dbf, - DB *idx, const char *idxf, int verb, + DB *idx, const char *idxf, int use_all, int verb, recno_t maxrec, const recno_t *recs, size_t reccur) { recno_t rec; @@ -1150,14 +1162,62 @@ pman_node(MAN_ARGS) } static void -ofile_argbuild(char *argv[], int argc, int verb, struct of **of) +ofile_argbuild(char *argv[], int argc, int use_all, int verb, + struct of **of) { + char buf[MAXPATHLEN]; + char *sec, *arch, *title, *p; int i; struct of *nof; for (i = 0; i < argc; i++) { + + /* + * Analyze the path. + */ + + if (strlcpy(buf, argv[i], sizeof(buf)) >= sizeof(buf)) { + fprintf(stderr, "%s: Path too long\n", argv[i]); + continue; + } + sec = arch = title = NULL; + p = strrchr(buf, '\0'); + while (p-- > buf) { + if (NULL == sec && '.' 
== *p) { + sec = p + 1; + *p = '\0'; + continue; + } + if ('/' != *p) + continue; + if (NULL == title) { + title = p + 1; + *p = '\0'; + continue; + } + if (strncmp("man", p + 1, 3)) + arch = p + 1; + break; + } + if (NULL == title) + title = buf; + + /* + * Build the file structure. + */ + nof = mandoc_calloc(1, sizeof(struct of)); - nof->fname = strdup(argv[i]); + nof->fname = mandoc_strdup(argv[i]); + if (NULL != sec) + nof->sec = mandoc_strdup(sec); + if (NULL != arch) + nof->arch = mandoc_strdup(arch); + nof->title = mandoc_strdup(title); + + /* + * Add the structure to the list. + */ + if (verb > 2) printf("%s: Scheduling\n", argv[i]); if (NULL == *of) { @@ -1180,12 +1240,14 @@ ofile_argbuild(char *argv[], int argc, int verb, struct of **of) * Pass in a pointer to a NULL structure for the first invocation. */ static int -ofile_dirbuild(const char *dir, int verb, struct of **of) +ofile_dirbuild(const char *dir, const char* psec, const char *parch, + int use_all, int verb, struct of **of) { char buf[MAXPATHLEN]; size_t sz; DIR *d; - const char *fn; + const char *fn, *sec, *arch; + char *suffix; struct of *nof; struct dirent *dp; @@ -1194,12 +1256,34 @@ ofile_dirbuild(const char *dir, int verb, struct of **of) return(0); } + sec = psec; + arch = parch; + while (NULL != (dp = readdir(d))) { fn = dp->d_name; + + if ('.' == *fn) + continue; + if (DT_DIR == dp->d_type) { - if (0 == strcmp(".", fn)) - continue; - if (0 == strcmp("..", fn)) + + /* + * Don't bother parsing directories + * that man(1) won't find. + */ + + if (NULL == psec) { + if(0 == strncmp("man", fn, 3)) + sec = fn + 3; + else if (use_all) + sec = fn; + else + continue; + arch = NULL; + } else if (NULL == parch && (use_all || + NULL == strchr(fn, '.'))) + arch = fn; + else if (0 == use_all) continue; buf[0] = '\0'; @@ -1207,22 +1291,35 @@ ofile_dirbuild(const char *dir, int verb, struct of **of) strlcat(buf, "/", MAXPATHLEN); sz = strlcat(buf, fn, MAXPATHLEN); - if (sz < MAXPATHLEN) { - if ( ! 
ofile_dirbuild(buf, verb, of)) - return(0); - continue; - } else if (sz < MAXPATHLEN) - continue; + if (MAXPATHLEN <= sz) { + fprintf(stderr, "%s: Path too long\n", dir); + return(0); + } + + if (verb > 2) + printf("%s: Scanning\n", buf); - fprintf(stderr, "%s: Path too long\n", dir); - return(0); + if ( ! ofile_dirbuild(buf, sec, arch, + use_all, verb, of)) + return(0); } - if (DT_REG != dp->d_type) + if (DT_REG != dp->d_type || + (NULL == sec && !use_all) || + !strcmp(MANDOC_DB, fn) || + !strcmp(MANDOC_IDX, fn)) continue; - if (0 == strcmp(MANDOC_DB, fn) || - 0 == strcmp(MANDOC_IDX, fn)) - continue; + /* + * Don't bother parsing files that man(1) won't find. + */ + + suffix = strrchr(fn, '.'); + if (0 == use_all) { + if (NULL == suffix) + continue; + if (strcmp(suffix + 1, sec)) + continue; + } buf[0] = '\0'; strlcat(buf, dir, MAXPATHLEN); @@ -1235,6 +1332,13 @@ ofile_dirbuild(const char *dir, int verb, struct of **of) nof = mandoc_calloc(1, sizeof(struct of)); nof->fname = mandoc_strdup(buf); + if (NULL != sec) + nof->sec = mandoc_strdup(sec); + if (NULL != arch) + nof->arch = mandoc_strdup(arch); + if (NULL != suffix) + *suffix = '\0'; + nof->title = mandoc_strdup(fn); if (verb > 2) printf("%s: Scheduling\n", buf); @@ -1261,6 +1365,9 @@ ofile_free(struct of *of) while (of) { nof = of->next; free(of->fname); + free(of->sec); + free(of->arch); + free(of->title); free(of); of = nof; } -- To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mandocdb: do not use bogus files
  2011-11-14  0:06   ` Ingo Schwarze
@ 2011-11-14 19:13     ` Ingo Schwarze
  2011-11-14 20:07       ` Kristaps Dzonsons
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Schwarze @ 2011-11-14 19:13 UTC (permalink / raw)
  To: tech

Hi,

because this patch started getting large and unwieldy, and i'm trying
to move fast, i have just put it into OpenBSD:

 ----- 8< ----- schnipp ----- >8 ----- 8< ----- schnapp ----- >8 -----

CVSROOT:        /cvs
Module name:    src
Changes by:     schwarze@cvs.openbsd.org        2011/11/14 11:52:05

Modified files:
        usr.bin/mandoc : mandocdb.8 mandocdb.c

Log message:
Store page titles in the correct case, and by default, only
put stuff into the database that man(1) will be able to retrieve.
However, support an option to use all directories and files.

Kristaps@ agreed with the general direction and provided some feedback.

 ----- 8< ----- schnipp ----- >8 ----- 8< ----- schnapp ----- >8 -----

Of course, more tweaking will be required, and i'm open for
suggestions!

Tell me if you think i should merge to bsd.lv;
for that purpose, i'm appending the committed patch below.

Yours,
  Ingo


Index: mandocdb.8
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mandocdb.8,v
retrieving revision 1.4
diff -u -p -r1.4 mandocdb.8
--- mandocdb.8	9 Oct 2011 17:59:56 -0000	1.4
+++ mandocdb.8	14 Nov 2011 18:36:22 -0000
@@ -22,7 +22,7 @@
 .Nd index UNIX manuals
 .Sh SYNOPSIS
 .Nm
-.Op Fl v
+.Op Fl av
 .Op Ar dir...
 .Nm
 .Op Fl v
@@ -42,8 +42,15 @@ manuals and indexes them in a
 and
 .Sx Index Database
 for fast retrieval.
+.Pp
 The arguments are as follows:
 .Bl -tag -width Ds
+.It Fl a
+Use all directories and files found below
+.Ar dir ... .
+By default, directories and files
+.Xr man 1
+cannot find will be silently skipped.
.It Fl d Ar dir Merge (remove and re-add) .Ar Index: mandocdb.c =================================================================== RCS file: /cvs/src/usr.bin/mandoc/mandocdb.c,v retrieving revision 1.5 diff -u -p -r1.5 mandocdb.c --- mandocdb.c 13 Nov 2011 10:40:52 -0000 1.5 +++ mandocdb.c 14 Nov 2011 18:36:22 -0000 @@ -38,6 +38,9 @@ struct of { char *fname; /* heap-allocated */ + char *sec; + char *arch; + char *title; struct of *next; /* NULL for last one */ struct of *first; /* first in list */ }; @@ -79,13 +82,15 @@ static void hash_reset(DB **); static void index_merge(const struct of *, struct mparse *, struct buf *, struct buf *, DB *, DB *, const char *, - DB *, const char *, int, + DB *, const char *, int, int, recno_t, const recno_t *, size_t); static void index_prune(const struct of *, DB *, const char *, DB *, const char *, int, recno_t *, recno_t **, size_t *); -static void ofile_argbuild(char *[], int, int, struct of **); -static int ofile_dirbuild(const char *, int, struct of **); +static void ofile_argbuild(char *[], int, int, int, + struct of **); +static int ofile_dirbuild(const char *, const char *, + const char *, int, int, struct of **); static void ofile_free(struct of *); static int pman_node(MAN_ARGS); static void pmdoc_node(MDOC_ARGS); @@ -243,6 +248,7 @@ mandocdb(int argc, char *argv[]) char ibuf[MAXPATHLEN], /* index fname */ fbuf[MAXPATHLEN]; /* btree fname */ int verb, /* output verbosity */ + use_all, /* use all directories and files */ ch, i, flags; DB *idx, /* index database */ *db, /* keyword database */ @@ -266,6 +272,7 @@ mandocdb(int argc, char *argv[]) ++progname; verb = 0; + use_all = 0; of = NULL; db = idx = NULL; mp = NULL; @@ -276,8 +283,11 @@ mandocdb(int argc, char *argv[]) op = OP_NEW; dir = NULL; - while (-1 != (ch = getopt(argc, argv, "d:u:v"))) + while (-1 != (ch = getopt(argc, argv, "ad:u:v"))) switch (ch) { + case ('a'): + use_all = 1; + break; case ('d'): dir = optarg; op = OP_UPDATE; @@ -344,7 +354,7 @@ 
mandocdb(int argc, char *argv[]) printf("%s: Opened\n", ibuf); } - ofile_argbuild(argv, argc, verb, &of); + ofile_argbuild(argv, argc, use_all, verb, &of); if (NULL == of) goto out; @@ -355,8 +365,8 @@ mandocdb(int argc, char *argv[]) if (OP_UPDATE == op) index_merge(of, mp, &dbuf, &buf, hash, - db, fbuf, idx, ibuf, verb, - maxrec, recs, reccur); + db, fbuf, idx, ibuf, use_all, + verb, maxrec, recs, reccur); goto out; } @@ -396,7 +406,8 @@ mandocdb(int argc, char *argv[]) ofile_free(of); of = NULL; - if ( ! ofile_dirbuild(argv[i], verb, &of)) + if ( ! ofile_dirbuild(argv[i], NULL, NULL, + use_all, verb, &of)) exit((int)MANDOCLEVEL_SYSERR); if (NULL == of) @@ -405,7 +416,8 @@ mandocdb(int argc, char *argv[]) of = of->first; index_merge(of, mp, &dbuf, &buf, hash, db, fbuf, - idx, ibuf, verb, maxrec, recs, reccur); + idx, ibuf, use_all, verb, + maxrec, recs, reccur); } out: @@ -430,7 +442,7 @@ void index_merge(const struct of *of, struct mparse *mp, struct buf *dbuf, struct buf *buf, DB *hash, DB *db, const char *dbf, - DB *idx, const char *idxf, int verb, + DB *idx, const char *idxf, int use_all, int verb, recno_t maxrec, const recno_t *recs, size_t reccur) { recno_t rec; @@ -466,17 +478,52 @@ index_merge(const struct of *of, struct if (NULL == mdoc && NULL == man) continue; + /* + * Make sure the manual section and architecture + * agree with the directory where the file is located + * or man(1) will not be able to find it. + */ + msec = NULL != mdoc ? mdoc_meta(mdoc)->msec : man_meta(man)->msec; - mtitle = NULL != mdoc ? - mdoc_meta(mdoc)->title : man_meta(man)->title; arch = NULL != mdoc ? 
mdoc_meta(mdoc)->arch : NULL; + if (0 == use_all) { + assert(of->sec); + assert(msec); + if (strcmp(msec, of->sec)) + continue; + + if (NULL == arch) { + if (NULL != of->arch) + continue; + } else if (NULL == of->arch || + strcmp(arch, of->arch)) + continue; + } + if (NULL == arch) arch = ""; /* + * Case is relevant for man(1), so use the file name + * instead of the (usually) all caps page title, + * if the two agree. + */ + + mtitle = NULL != mdoc ? + mdoc_meta(mdoc)->title : man_meta(man)->title; + + assert(of->title); + assert(mtitle); + + if (0 == strcasecmp(mtitle, of->title)) + mtitle = of->title; + else if (0 == use_all) + continue; + + /* * The index record value consists of a nil-terminated * filename, a nil-terminated manual section, and a * nil-terminated description. Since the description @@ -1150,14 +1197,62 @@ pman_node(MAN_ARGS) } static void -ofile_argbuild(char *argv[], int argc, int verb, struct of **of) +ofile_argbuild(char *argv[], int argc, int use_all, int verb, + struct of **of) { + char buf[MAXPATHLEN]; + char *sec, *arch, *title, *p; int i; struct of *nof; for (i = 0; i < argc; i++) { + + /* + * Analyze the path. + */ + + if (strlcpy(buf, argv[i], sizeof(buf)) >= sizeof(buf)) { + fprintf(stderr, "%s: Path too long\n", argv[i]); + continue; + } + sec = arch = title = NULL; + p = strrchr(buf, '\0'); + while (p-- > buf) { + if (NULL == sec && '.' == *p) { + sec = p + 1; + *p = '\0'; + continue; + } + if ('/' != *p) + continue; + if (NULL == title) { + title = p + 1; + *p = '\0'; + continue; + } + if (strncmp("man", p + 1, 3)) + arch = p + 1; + break; + } + if (NULL == title) + title = buf; + + /* + * Build the file structure. + */ + nof = mandoc_calloc(1, sizeof(struct of)); - nof->fname = strdup(argv[i]); + nof->fname = mandoc_strdup(argv[i]); + if (NULL != sec) + nof->sec = mandoc_strdup(sec); + if (NULL != arch) + nof->arch = mandoc_strdup(arch); + nof->title = mandoc_strdup(title); + + /* + * Add the structure to the list. 
+ */ + if (verb > 2) printf("%s: Scheduling\n", argv[i]); if (NULL == *of) { @@ -1180,12 +1275,14 @@ ofile_argbuild(char *argv[], int argc, i * Pass in a pointer to a NULL structure for the first invocation. */ static int -ofile_dirbuild(const char *dir, int verb, struct of **of) +ofile_dirbuild(const char *dir, const char* psec, const char *parch, + int use_all, int verb, struct of **of) { char buf[MAXPATHLEN]; size_t sz; DIR *d; - const char *fn; + const char *fn, *sec, *arch; + char *suffix; struct of *nof; struct dirent *dp; @@ -1196,10 +1293,30 @@ ofile_dirbuild(const char *dir, int verb while (NULL != (dp = readdir(d))) { fn = dp->d_name; + + if ('.' == *fn) + continue; + if (DT_DIR == dp->d_type) { - if (0 == strcmp(".", fn)) - continue; - if (0 == strcmp("..", fn)) + sec = psec; + arch = parch; + + /* + * Don't bother parsing directories + * that man(1) won't find. + */ + + if (NULL == sec) { + if(0 == strncmp("man", fn, 3)) + sec = fn + 3; + else if (use_all) + sec = fn; + else + continue; + } else if (NULL == arch && (use_all || + NULL == strchr(fn, '.'))) + arch = fn; + else if (0 == use_all) continue; buf[0] = '\0'; @@ -1207,22 +1324,35 @@ ofile_dirbuild(const char *dir, int verb strlcat(buf, "/", MAXPATHLEN); sz = strlcat(buf, fn, MAXPATHLEN); - if (sz < MAXPATHLEN) { - if ( ! ofile_dirbuild(buf, verb, of)) - return(0); - continue; - } else if (sz < MAXPATHLEN) - continue; - - fprintf(stderr, "%s: Path too long\n", dir); - return(0); + if (MAXPATHLEN <= sz) { + fprintf(stderr, "%s: Path too long\n", dir); + return(0); + } + + if (verb > 2) + printf("%s: Scanning\n", buf); + + if ( ! ofile_dirbuild(buf, sec, arch, + use_all, verb, of)) + return(0); } - if (DT_REG != dp->d_type) + if (DT_REG != dp->d_type || + (NULL == psec && !use_all) || + !strcmp(MANDOC_DB, fn) || + !strcmp(MANDOC_IDX, fn)) continue; - if (0 == strcmp(MANDOC_DB, fn) || - 0 == strcmp(MANDOC_IDX, fn)) - continue; + /* + * Don't bother parsing files that man(1) won't find. 
+ */ + + suffix = strrchr(fn, '.'); + if (0 == use_all) { + if (NULL == suffix) + continue; + if (strcmp(suffix + 1, psec)) + continue; + } buf[0] = '\0'; strlcat(buf, dir, MAXPATHLEN); @@ -1235,6 +1365,13 @@ ofile_dirbuild(const char *dir, int verb nof = mandoc_calloc(1, sizeof(struct of)); nof->fname = mandoc_strdup(buf); + if (NULL != psec) + nof->sec = mandoc_strdup(psec); + if (NULL != parch) + nof->arch = mandoc_strdup(parch); + if (NULL != suffix) + *suffix = '\0'; + nof->title = mandoc_strdup(fn); if (verb > 2) printf("%s: Scheduling\n", buf); @@ -1261,6 +1398,9 @@ ofile_free(struct of *of) while (of) { nof = of->next; free(of->fname); + free(of->sec); + free(of->arch); + free(of->title); free(of); of = nof; } -- To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv ^ permalink raw reply [flat|nested] 7+ messages in thread
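The title handling added to index_merge() in the committed patch can be summarised as a small decision function. This is only a sketch of that rule under stated assumptions: `choose_title()` is a hypothetical helper that does not exist in mandocdb.c, and returning NULL stands in for the `continue` that skips the file.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/*
 * Decide which title to store for a page.
 *   mtitle:  title from the page itself (.Dt or .TH), usually all caps
 *   ftitle:  filename stem, in the case man(1) needs to find the file
 *   use_all: non-zero when every file is to be indexed (the -a case)
 * Returns the title to store, or NULL if the page is to be skipped.
 */
static const char *
choose_title(const char *mtitle, const char *ftitle, int use_all)
{
	assert(mtitle != NULL && ftitle != NULL);

	/* The two agree up to case: prefer the case of the filename. */
	if (strcasecmp(mtitle, ftitle) == 0)
		return ftitle;

	/* They disagree: keep the page title only when using all files. */
	return use_all ? mtitle : NULL;
}
```

With the example from the start of the thread, a page titled FOOBAR::CAMEL stored in FooBar::cAmEl.3p is recorded as FooBar::cAmEl, while a page titled FANCY in bogus.mdoc is dropped unless all files are used.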
* Re: mandocdb: do not use bogus files
  2011-11-14 19:13     ` Ingo Schwarze
@ 2011-11-14 20:07       ` Kristaps Dzonsons
  2011-11-14 23:34         ` Ingo Schwarze
  0 siblings, 1 reply; 7+ messages in thread
From: Kristaps Dzonsons @ 2011-11-14 20:07 UTC (permalink / raw)
  To: tech; +Cc: Ingo Schwarze

On 14/11/2011 20:13, Ingo Schwarze wrote:
> Hi,
>
> because this patch started getting large and unwieldy, and i'm trying
> to move fast, i have just put it into OpenBSD:
>
>   ----- 8< ----- schnipp ----- >8 ----- 8< ----- schnapp ----- >8 -----
>
> CVSROOT:        /cvs
> Module name:    src
> Changes by:     schwarze@cvs.openbsd.org        2011/11/14 11:52:05
>
> Modified files:
>          usr.bin/mandoc : mandocdb.8 mandocdb.c
>
> Log message:
> Store page titles in the correct case, and by default, only
> put stuff into the database that man(1) will be able to retrieve.
> However, support an option to use all directories and files.
>
> Kristaps@ agreed with the general direction and provided some feedback.
>
>   ----- 8< ----- schnipp ----- >8 ----- 8< ----- schnapp ----- >8 -----
>
> Of course, more tweaking will be required, and i'm open for
> suggestions!
>
> Tell me if you think i should merge to bsd.lv;
> for that purpose, i'm appending the committed patch below.

Hi Ingo,

This is a good start, but it needs more documentation as to what's
happening.  You mention man(1) a great deal, but I can't easily find
an authoritative list of man's behaviour in this regard (not on a
4.9 box, anyway).

> -.Op Fl v
> +.Op Fl av
>   .Op Ar dir...
>   .Nm
>   .Op Fl v
> @@ -42,8 +42,15 @@ manuals and indexes them in a
>   and
>   .Sx Index Database
>   for fast retrieval.
> +.Pp
>   The arguments are as follows:
>   .Bl -tag -width Ds
> +.It Fl a
> +Use all directories and files found below
> +.Ar dir ... .
> +By default, directories and files
> +.Xr man 1
> +cannot find will be silently skipped.

Can you instead list the measures taken and remove the Xr man
reference?  This reference is not portable.

> +		/*
> +		 * Make sure the manual section and architecture
> +		 * agree with the directory where the file is located
> +		 * or man(1) will not be able to find it.
> +		 */

Same here.  Yes, this seems like quibbling, but without making this
change all of these comments are useless beyond OpenBSD.  Plus, I'd
rather have the correct measures in-line than having to infer
behaviour.

>   	for (i = 0; i < argc; i++) {
> +
> +		/*
> +		 * Analyze the path.
> +		 */

Hm.  Can you be more specific?  Yes, I can figure it out by looking
at the code (the strrchr() call threw me: I've never seen it used
like that).  But I'd rather you describe it beforehand.

> +
> +		if (strlcpy(buf, argv[i], sizeof(buf)) >= sizeof(buf)) {
> +			fprintf(stderr, "%s: Path too long\n", argv[i]);
> +			continue;
> +		}
> +		sec = arch = title = NULL;
> +		p = strrchr(buf, '\0');
> +		while (p-- > buf) {
> +			if (NULL == sec && '.' == *p) {
> +				sec = p + 1;
> +				*p = '\0';
> +				continue;
> +			}
> +			if ('/' != *p)
> +				continue;
> +			if (NULL == title) {
> +				title = p + 1;
> +				*p = '\0';
> +				continue;
> +			}
> +			if (strncmp("man", p + 1, 3))
> +				arch = p + 1;
> +			break;
> +		}
> +		if (NULL == title)
> +			title = buf;
> +
> +		/*
> +		 * Build the file structure.
> +		 */
> +
>   		nof = mandoc_calloc(1, sizeof(struct of));
> -		nof->fname = strdup(argv[i]);
> +		nof->fname = mandoc_strdup(argv[i]);
> +		if (NULL != sec)
> +			nof->sec = mandoc_strdup(sec);
> +		if (NULL != arch)
> +			nof->arch = mandoc_strdup(arch);
> +		nof->title = mandoc_strdup(title);
> +
> +		/*
> +		 * Add the structure to the list.
> +		 */
> +
>   		if (verb > 2)
>   			printf("%s: Scheduling\n", argv[i]);
>   		if (NULL == *of) {
> @@ -1180,12 +1275,14 @@ ofile_argbuild(char *argv[], int argc, i
>    * Pass in a pointer to a NULL structure for the first invocation.
> */ > static int > -ofile_dirbuild(const char *dir, int verb, struct of **of) > +ofile_dirbuild(const char *dir, const char* psec, const char *parch, > + int use_all, int verb, struct of **of) > { > char buf[MAXPATHLEN]; > size_t sz; > DIR *d; > - const char *fn; > + const char *fn, *sec, *arch; > + char *suffix; > struct of *nof; > struct dirent *dp; > > @@ -1196,10 +1293,30 @@ ofile_dirbuild(const char *dir, int verb > > while (NULL != (dp = readdir(d))) { > fn = dp->d_name; > + > + if ('.' == *fn) > + continue; > + > if (DT_DIR == dp->d_type) { > - if (0 == strcmp(".", fn)) > - continue; > - if (0 == strcmp("..", fn)) > + sec = psec; > + arch = parch; > + > + /* > + * Don't bother parsing directories > + * that man(1) won't find. > + */ Same argument as above.... > + > + if (NULL == sec) { > + if(0 == strncmp("man", fn, 3)) > + sec = fn + 3; > + else if (use_all) > + sec = fn; > + else > + continue; > + } else if (NULL == arch&& (use_all || > + NULL == strchr(fn, '.'))) > + arch = fn; > + else if (0 == use_all) > continue; > > buf[0] = '\0'; > @@ -1207,22 +1324,35 @@ ofile_dirbuild(const char *dir, int verb > strlcat(buf, "/", MAXPATHLEN); > sz = strlcat(buf, fn, MAXPATHLEN); > > - if (sz< MAXPATHLEN) { > - if ( ! ofile_dirbuild(buf, verb, of)) > - return(0); > - continue; > - } else if (sz< MAXPATHLEN) > - continue; > - > - fprintf(stderr, "%s: Path too long\n", dir); > - return(0); > + if (MAXPATHLEN<= sz) { > + fprintf(stderr, "%s: Path too long\n", dir); > + return(0); > + } > + > + if (verb> 2) > + printf("%s: Scanning\n", buf); > + > + if ( ! ofile_dirbuild(buf, sec, arch, > + use_all, verb, of)) > + return(0); > } > - if (DT_REG != dp->d_type) > + if (DT_REG != dp->d_type || > + (NULL == psec&& !use_all) || > + !strcmp(MANDOC_DB, fn) || > + !strcmp(MANDOC_IDX, fn)) > continue; > > - if (0 == strcmp(MANDOC_DB, fn) || > - 0 == strcmp(MANDOC_IDX, fn)) > - continue; > + /* > + * Don't bother parsing files that man(1) won't find. > + */ Again... 
> + > + suffix = strrchr(fn, '.'); > + if (0 == use_all) { > + if (NULL == suffix) > + continue; > + if (strcmp(suffix + 1, psec)) > + continue; > + } > > buf[0] = '\0'; > strlcat(buf, dir, MAXPATHLEN); > @@ -1235,6 +1365,13 @@ ofile_dirbuild(const char *dir, int verb > > nof = mandoc_calloc(1, sizeof(struct of)); > nof->fname = mandoc_strdup(buf); > + if (NULL != psec) > + nof->sec = mandoc_strdup(psec); > + if (NULL != parch) > + nof->arch = mandoc_strdup(parch); > + if (NULL != suffix) > + *suffix = '\0'; > + nof->title = mandoc_strdup(fn); > > if (verb> 2) > printf("%s: Scheduling\n", buf); > @@ -1261,6 +1398,9 @@ ofile_free(struct of *of) > while (of) { > nof = of->next; > free(of->fname); > + free(of->sec); > + free(of->arch); > + free(of->title); > free(of); > of = nof; > } > -- > To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv > -- To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv ^ permalink raw reply [flat|nested] 7+ messages in thread
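On the strrchr(buf, '\0') idiom queried in this review: the terminating NUL counts as part of the string for strrchr(), so the call returns a pointer to the terminator, the same address as buf + strlen(buf). A backwards scan can then start from there. The following is a minimal sketch, not code from the patch: `count_slashes_backwards()` is a hypothetical helper, and it decrements inside the loop body so that the pointer never moves before the start of the buffer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: count '/' characters, scanning right to left. */
static size_t
count_slashes_backwards(const char *buf)
{
	const char	*p;
	size_t		 n = 0;

	assert(buf != NULL);

	/* Points at the terminating NUL, i.e. buf + strlen(buf). */
	p = strrchr(buf, '\0');

	/* Step back one character at a time until the start is reached. */
	while (p > buf) {
		p--;
		if (*p == '/')
			n++;
	}
	return n;
}
```

The patch uses the same starting point and direction to pick off first the section suffix, then the title, then an optional architecture directory.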
* Re: mandocdb: do not use bogus files
  2011-11-14 20:07 ` Kristaps Dzonsons
@ 2011-11-14 23:34 ` Ingo Schwarze
  2011-11-24 10:10 ` Kristaps Dzonsons
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Schwarze @ 2011-11-14 23:34 UTC
  To: tech

Hi Kristaps,

Kristaps Dzonsons wrote on Mon, Nov 14, 2011 at 09:07:13PM +0100:

> This is a good start, but it needs more documentation as to what's
> happening.  You mention man(1) a great deal, but I can't easily find
> an authoritative list of man's behaviour in this regard (not on a
> 4.9 box, anyway).

I fear that is only documented in /etc/man.conf.
So i kind of see your point that this is all rather fuzzy.

Below is a patch to avoid references to man(1) and be somewhat
more explicit about what i'm doing (right now, that is...)

Again, i expect many more changes in this area before the dust
settles, so i'd like to keep the documentation brief for now
lest i have to rewrite it over and over.

While here, i'd like to reword a few comments that left me
wondering when i tried to understand the code, and i'd like
to fix a minor pasto in an error path.

Is this OK for you?

Thanks,
  Ingo


Index: mandocdb.8
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mandocdb.8,v
retrieving revision 1.5
diff -u -p -r1.5 mandocdb.8
--- mandocdb.8	14 Nov 2011 18:52:05 -0000	1.5
+++ mandocdb.8	14 Nov 2011 23:23:14 -0000
@@ -48,9 +48,13 @@ The arguments are as follows:
 .It Fl a
 Use all directories and files found below
 .Ar dir ... .
-By default, directories and files
-.Xr man 1
-cannot find will be silently skipped.
+By default, only files matching
+.Sm off
+.Sy man Ar section Li /
+.Op Ar arch Li /
+.Ar title . section
+.Sm on
+will be used.
 .It Fl d Ar dir
 Merge (remove and re-add)
 .Ar
Index: mandocdb.c
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mandocdb.c,v
retrieving revision 1.6
diff -u -p -r1.6 mandocdb.c
--- mandocdb.c	14 Nov 2011 18:52:05 -0000	1.6
+++ mandocdb.c	14 Nov 2011 23:23:15 -0000
@@ -254,11 +254,11 @@ mandocdb(int argc, char *argv[])
 			*db, /* keyword database */
 			*hash; /* temporary keyword hashtable */
 	BTREEINFO	 info; /* btree configuration */
-	recno_t		 maxrec; /* supremum of all records */
-	recno_t		*recs; /* buffer of empty records */
+	recno_t		 maxrec; /* last record number in the index */
+	recno_t		*recs; /* the numbers of all empty records */
 	size_t		 sz1, sz2,
-			 recsz, /* buffer size of recs */
-			 reccur; /* valid number of recs */
+			 recsz, /* number of allocated slots in recs */
+			 reccur; /* current number of empty records */
 	struct buf	 buf, /* keyword buffer */
 			 dbuf; /* description buffer */
 	struct of	*of; /* list of files for processing */
@@ -344,7 +344,7 @@ mandocdb(int argc, char *argv[])
 		if (NULL == db) {
 			perror(fbuf);
 			exit((int)MANDOCLEVEL_SYSERR);
-		} else if (NULL == db) {
+		} else if (NULL == idx) {
 			perror(ibuf);
 			exit((int)MANDOCLEVEL_SYSERR);
 		}
@@ -393,7 +393,7 @@ mandocdb(int argc, char *argv[])
 		if (NULL == db) {
 			perror(fbuf);
 			exit((int)MANDOCLEVEL_SYSERR);
-		} else if (NULL == db) {
+		} else if (NULL == idx) {
 			perror(ibuf);
 			exit((int)MANDOCLEVEL_SYSERR);
 		}
@@ -479,9 +479,9 @@ index_merge(const struct of *of, struct
 			continue;
 
 		/*
-		 * Make sure the manual section and architecture
-		 * agree with the directory where the file is located
-		 * or man(1) will not be able to find it.
+		 * By default, skip a file if the manual section
+		 * and architecture given in the file disagree
+		 * with the directory where the file is located.
 		 */
 
 		msec = NULL != mdoc ?
@@ -507,9 +507,10 @@ index_merge(const struct of *of, struct
 			arch = "";
 
 		/*
-		 * Case is relevant for man(1), so use the file name
-		 * instead of the (usually) all caps page title,
-		 * if the two agree.
+		 * By default, skip a file if the title given
+		 * in the file disagrees with the file name.
+		 * If both agree, use the file name as the title,
+		 * because the one in the file usually is all caps.
 		 */
 
 		mtitle = NULL != mdoc ?
@@ -1208,7 +1209,9 @@ ofile_argbuild(char *argv[], int argc, i
 	for (i = 0; i < argc; i++) {
 
 		/*
-		 * Analyze the path.
+		 * Try to infer the manual section, architecture and
+		 * page title from the path, assuming it looks like
+		 * man*/[<arch>/]<title>.<section>
 		 */
 
 		if (strlcpy(buf, argv[i], sizeof(buf)) >= sizeof(buf)) {
@@ -1302,8 +1305,8 @@ ofile_dirbuild(const char *dir, const ch
 			arch = parch;
 
 			/*
-			 * Don't bother parsing directories
-			 * that man(1) won't find.
+			 * By default, only use directories called:
+			 * man<section>/[<arch>/]
 			 */
 
 			if (NULL == sec) {
@@ -1343,7 +1346,9 @@ ofile_dirbuild(const char *dir, const ch
 			continue;
 
 		/*
-		 * Don't bother parsing files that man(1) won't find.
+		 * By default, skip files where the file name suffix
+		 * does not agree with the section directory
+		 * they are located in.
 		 */
 
 		suffix = strrchr(fn, '.');
@@ -1369,6 +1374,12 @@ ofile_dirbuild(const char *dir, const ch
 			nof->sec = mandoc_strdup(psec);
 		if (NULL != parch)
 			nof->arch = mandoc_strdup(parch);
+
+		/*
+		 * Remember the file name without the extension,
+		 * to be used as the page title in the database.
+		 */
+
 		if (NULL != suffix)
 			*suffix = '\0';
 		nof->title = mandoc_strdup(fn);

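For readers following the patch, the default layout it describes, man<section>/[<arch>/]<title>.<section>, can be sketched as a standalone function. This is a hedged illustration only and not the mandocdb code: `struct manpath` and `manpath_split` are hypothetical names, the fixed buffer sizes are arbitrary, and the sketch works on a relative path string rather than walking directories with readdir(3) as the patch does.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container; not the struct of from mandocdb.c. */
struct manpath {
	char	 sec[32];	/* section, from the directory name */
	char	 arch[32];	/* architecture, empty if absent */
	char	 title[256];	/* file name without the suffix */
};

/*
 * Split "man<section>/[<arch>/]<title>.<section>" into its parts.
 * Returns 1 on success and 0 if the path does not follow the layout;
 * on failure the contents of *mp are unspecified.
 */
static int
manpath_split(const char *path, struct manpath *mp)
{
	const char	*p, *slash, *dot;
	size_t		 len;

	assert(path != NULL && mp != NULL);
	memset(mp, 0, sizeof(*mp));	/* also NUL-terminates all fields */

	/* The top-level directory must be called man<section>. */
	if (strncmp(path, "man", 3) != 0)
		return 0;
	p = path + 3;
	if ((slash = strchr(p, '/')) == NULL)
		return 0;
	len = (size_t)(slash - p);
	if (len == 0 || len >= sizeof(mp->sec))
		return 0;
	memcpy(mp->sec, p, len);
	p = slash + 1;

	/* Optional architecture directory; nothing deeper is accepted. */
	if ((slash = strchr(p, '/')) != NULL) {
		len = (size_t)(slash - p);
		if (len == 0 || len >= sizeof(mp->arch))
			return 0;
		memcpy(mp->arch, p, len);
		p = slash + 1;
		if (strchr(p, '/') != NULL)
			return 0;
	}

	/* The file name suffix must agree with the section directory. */
	dot = strrchr(p, '.');
	if (dot == NULL || dot == p || strcmp(dot + 1, mp->sec) != 0)
		return 0;
	len = (size_t)(dot - p);
	if (len >= sizeof(mp->title))
		return 0;
	memcpy(mp->title, p, len);
	return 1;
}
```

With this sketch, "man3p/FooBar::cAmEl.3p" yields section "3p" and the case-preserved title "FooBar::cAmEl", while "foobar/bogus.mdoc" from the original report is rejected because its directory does not start with "man".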
* Re: mandocdb: do not use bogus files
  2011-11-14 23:34 ` Ingo Schwarze
@ 2011-11-24 10:10 ` Kristaps Dzonsons
  0 siblings, 0 replies; 7+ messages in thread
From: Kristaps Dzonsons @ 2011-11-24 10:10 UTC
  To: tech

On 11/15/11 00:34, Ingo Schwarze wrote:
> Hi Kristaps,
>
> Kristaps Dzonsons wrote on Mon, Nov 14, 2011 at 09:07:13PM +0100:
>
>> This is a good start, but it needs more documentation as to what's
>> happening.  You mention man(1) a great deal, but I can't easily find
>> an authoritative list of man's behaviour in this regard (not on a
>> 4.9 box, anyway).
>
> I fear that is only documented in /etc/man.conf.
> So i kind of see your point that this is all rather fuzzy.
>
> Below is a patch to avoid references to man(1) and be somewhat
> more explicit about what i'm doing (right now, that is...)
>
> Again, i expect many more changes in this area before the dust
> settles, so i'd like to keep the documentation brief for now
> lest i have to rewrite it over and over.
>
> While here, i'd like to reword a few comments that left me
> wondering when i tried to understand the code, and i'd like
> to fix a minor pasto in an error path.
>
> Is this OK for you?

Ingo,

Please check these in when you get the chance!

Thanks,

Kristaps

Thread overview: 7+ messages
2011-11-13 18:42 mandocdb: do not use bogus files Ingo Schwarze
2011-11-13 20:05 ` Kristaps Dzonsons
2011-11-14  0:06 ` Ingo Schwarze
2011-11-14 19:13 ` Ingo Schwarze
2011-11-14 20:07 ` Kristaps Dzonsons
2011-11-14 23:34 ` Ingo Schwarze
2011-11-24 10:10 ` Kristaps Dzonsons