From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailout.scc.kit.edu (mailout.scc.kit.edu [129.13.185.202]); by fantadrom.bsd.lv (OpenSMTPD) with ESMTP id 647c93c8; for ; Wed, 26 Nov 2014 19:12:45 -0500 (EST) Received: from hekate.usta.de (asta-nat.asta.uni-karlsruhe.de [172.22.63.82]) by scc-mailout-02.scc.kit.edu with esmtp (Exim 4.72 #1) id 1XtmhP-0000K9-Iy; Thu, 27 Nov 2014 01:12:43 +0100 Received: from donnerwolke.usta.de ([172.24.96.3]) by hekate.usta.de with esmtp (Exim 4.77) (envelope-from ) id 1XtmhP-0006ed-Cd; Thu, 27 Nov 2014 01:12:43 +0100 Received: from iris.usta.de ([172.24.96.5] helo=usta.de) by donnerwolke.usta.de with esmtp (Exim 4.72) (envelope-from ) id 1XtmhP-0006ay-AC; Thu, 27 Nov 2014 01:12:43 +0100 Received: from schwarze by usta.de with local (Exim 4.77) (envelope-from ) id 1XtmhP-0004UY-0T; Thu, 27 Nov 2014 01:12:43 +0100 Date: Thu, 27 Nov 2014 01:12:42 +0100 From: Ingo Schwarze To: Baptiste Daroussin Cc: tech@mdocml.bsd.lv Subject: Re: Allow gzipped .so and search .so in manpath Message-ID: <20141127001242.GH26411@iris.usta.de> References: <20141124143629.GB11567@ivaldir.etoilebsd.net> <20141126201033.GG26411@iris.usta.de> <20141126202413.GF82931@ivaldir.etoilebsd.net> X-Mailinglist: mdocml-tech Reply-To: tech@mdocml.bsd.lv MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141126202413.GF82931@ivaldir.etoilebsd.net> User-Agent: Mutt/1.5.21 (2010-09-15) Hi Baptiste, Baptiste Daroussin wrote on Wed, Nov 26, 2014 at 09:24:13PM +0100: > I was expecting the above :) I just committed the two following patches, and i believe they resolve the issue. There is still a minor problem in makewhatis(8) with database generation when .so to .gz is involved, but i'll get round to that. Note that -current is at http://fantadrom.bsd.lv/cgi-bin/cvsweb/ CVSROOT=anoncvs@fantadrom.bsd.lv:/cvs today, mdocl.bsd.lv is a few days old. I'm right in the middle of switching the server to new hardware. It will be back at mdocml tomorrow, or in a few days at the latest. > What about using zlib directly, that could simplify a bunch of things. That idea makes sense to me. However, i want to do the 1.13.2 release without adding any new dependencies to help the downstream projects to stabilize and/or migrate to the 1.13 series. Granted, libz is all but heavyweight, and it's hard to imagine there might be anybody out there who doesn't have it, but anyway... I'd rather return to that idea after the 1.13.2 release. > Right now in the ports tree I'm ripping them at package creation > using soelim from groff and I wrote minimalistic version in base > to be able to continue using that when groff will be removed. Sounds reasonable. When dealing with pure .so files, i.e. those containing a single .so request and nothing else, it would even be better to replace them with hard links instead of having two files with the same content - not so much because that saves space, but rather because it allows combining information in mandoc.db(5)/apropos(1). For the user, the end result looks like this: With copies of files: $ apropos getuid geteuid(2) - get user identification getuid(2) - get user identification With hard (or soft or .so) links: $ apropos getuid geteuid, getuid(2) - get user identification The latter not only saves screen real estate, but also helps to realize both are documented in the same manual page. Yours, Ingo Log Message: ----------- Simplify the mparse_open()/mparse_wait() interface. Don't bother the user with the PID of the child process, store it inside the opaque mparse handle. Modified Files: -------------- mdocml: TODO main.c mandoc.3 mandoc.h mandocdb.c read.c Revision Data ------------- Index: mandocdb.c =================================================================== RCS file: /home/cvs/mdocml/mdocml/mandocdb.c,v retrieving revision 1.169 retrieving revision 1.170 diff -Lmandocdb.c -Lmandocdb.c -u -p -r1.169 -r1.170 --- mandocdb.c +++ mandocdb.c @@ -1084,7 +1084,6 @@ mpages_merge(struct mchars *mc, struct m struct man *man; char *sodest; char *cp; - pid_t child_pid; int fd; unsigned int pslot; enum mandoclevel lvl; @@ -1112,9 +1111,8 @@ mpages_merge(struct mchars *mc, struct m mdoc = NULL; man = NULL; sodest = NULL; - child_pid = 0; - mparse_open(mp, &fd, mpage->mlinks->file, &child_pid); + mparse_open(mp, &fd, mpage->mlinks->file); if (fd == -1) { say(mpage->mlinks->file, "&open"); goto nextpage; @@ -1231,8 +1229,7 @@ mpages_merge(struct mchars *mc, struct m dbadd(mpage, mc); nextpage: - if (child_pid && - mparse_wait(mp, child_pid) != MANDOCLEVEL_OK) { + if (mparse_wait(mp) != MANDOCLEVEL_OK) { exitcode = (int)MANDOCLEVEL_SYSERR; say(mpage->mlinks->file, "&wait gunzip"); } Index: mandoc.h =================================================================== RCS file: /home/cvs/mdocml/mdocml/mandoc.h,v retrieving revision 1.168 retrieving revision 1.169 diff -Lmandoc.h -Lmandoc.h -u -p -r1.168 -r1.169 --- mandoc.h +++ mandoc.h @@ -436,8 +436,7 @@ struct mparse *mparse_alloc(int, enum m const struct mchars *, const char *); void mparse_free(struct mparse *); void mparse_keep(struct mparse *); -enum mandoclevel mparse_open(struct mparse *, int *, const char *, - pid_t *); +enum mandoclevel mparse_open(struct mparse *, int *, const char *); enum mandoclevel mparse_readfd(struct mparse *, int, const char *); enum mandoclevel mparse_readmem(struct mparse *, const void *, size_t, const char *); @@ -447,7 +446,7 @@ void mparse_result(struct mparse *, const char *mparse_getkeep(const struct mparse *); const char *mparse_strerror(enum mandocerr); const char *mparse_strlevel(enum mandoclevel); -enum mandoclevel mparse_wait(struct mparse *, pid_t); +enum mandoclevel mparse_wait(struct mparse *); __END_DECLS Index: TODO =================================================================== RCS file: /home/cvs/mdocml/mdocml/TODO,v retrieving revision 1.188 retrieving revision 1.189 diff -LTODO -LTODO -u -p -r1.188 -r1.189 --- TODO +++ TODO @@ -564,6 +564,9 @@ Several areas can be cleaned up to make * structural issues ************************************************************************ +- Use libz directly instead of forking gunzip(1). + Suggested by bapt at FreeBSD among others. + - We use the input line number at several places to distinguish same-line from different-line input. That plainly doesn't work with user-defined macros, leading to random breakage. Index: read.c =================================================================== RCS file: /home/cvs/mdocml/mdocml/read.c,v retrieving revision 1.96 retrieving revision 1.97 diff -Lread.c -Lread.c -u -p -r1.96 -r1.97 --- read.c +++ read.c @@ -64,6 +64,7 @@ struct mparse { int filenc; /* encoding of the current file */ int reparse_count; /* finite interp. stack */ int line; /* line number in the file */ + pid_t child; /* the gunzip(1) process */ }; static void choose_parser(struct mparse *); @@ -823,8 +824,7 @@ mparse_readfd(struct mparse *curp, int f } enum mandoclevel -mparse_open(struct mparse *curp, int *fd, const char *file, - pid_t *child_pid) +mparse_open(struct mparse *curp, int *fd, const char *file) { int pfd[2]; char *cp; @@ -834,7 +834,7 @@ mparse_open(struct mparse *curp, int *fd curp->file = file; if ((cp = strrchr(file, '.')) == NULL || strcmp(cp + 1, "gz")) { - *child_pid = 0; + curp->child = 0; if ((*fd = open(file, O_RDONLY)) == -1) { err = MANDOCERR_SYSOPEN; goto out; @@ -847,7 +847,7 @@ mparse_open(struct mparse *curp, int *fd goto out; } - switch (*child_pid = fork()) { + switch (curp->child = fork()) { case -1: err = MANDOCERR_SYSFORK; close(pfd[0]); @@ -871,7 +871,7 @@ mparse_open(struct mparse *curp, int *fd out: *fd = -1; - *child_pid = 0; + curp->child = 0; curp->file_status = MANDOCLEVEL_SYSERR; if (curp->mmsg) (*curp->mmsg)(err, curp->file_status, file, @@ -882,11 +882,14 @@ out: } enum mandoclevel -mparse_wait(struct mparse *curp, pid_t child_pid) +mparse_wait(struct mparse *curp) { int status; - if (waitpid(child_pid, &status, 0) == -1) { + if (curp->child == 0) + return(MANDOCLEVEL_OK); + + if (waitpid(curp->child, &status, 0) == -1) { mandoc_msg(MANDOCERR_SYSWAIT, curp, 0, 0, strerror(errno)); curp->file_status = MANDOCLEVEL_SYSERR; Index: mandoc.3 =================================================================== RCS file: /home/cvs/mdocml/mdocml/mandoc.3,v retrieving revision 1.27 retrieving revision 1.28 diff -Lmandoc.3 -Lmandoc.3 -u -p -r1.27 -r1.28 --- mandoc.3 +++ mandoc.3 @@ -81,7 +81,6 @@ .Fa "struct mparse *parse" .Fa "int *fd" .Fa "const char *fname" -.Fa "pid_t *child_pid" .Fc .Ft "enum mandoclevel" .Fo mparse_readfd @@ -111,7 +110,6 @@ .Ft "enum mandoclevel" .Fo mparse_wait .Fa "struct mparse *parse" -.Fa "pid_t child_pid" .Fc .In sys/types.h .In mandoc.h @@ -404,14 +402,6 @@ or -1 on failure. It can be passed to .Fn mparse_readfd or used directly. -If applicable, return the -.Xr gunzip 1 -child process ID in -.Fa child_pid , -or otherwise 0. -If non-zero, it should be passed to -.Fn mparse_wait -after completing the parse sequence. Declared in .In mandoc.h , implemented in Index: main.c =================================================================== RCS file: /home/cvs/mdocml/mdocml/main.c,v retrieving revision 1.199 retrieving revision 1.200 diff -Lmain.c -Lmain.c -u -p -r1.199 -r1.200 --- main.c +++ main.c @@ -115,7 +115,6 @@ main(int argc, char *argv[]) #endif enum mandoclevel rc; enum outmode outmode; - pid_t child_pid; int fd; int show_usage; int use_pager; @@ -388,8 +387,7 @@ main(int argc, char *argv[]) while (argc) { #if HAVE_SQLITE3 if (resp != NULL) { - rc = mparse_open(curp.mp, &fd, resp->file, - &child_pid); + rc = mparse_open(curp.mp, &fd, resp->file); if (fd == -1) /* nothing */; else if (resp->form & FORM_SRC) { @@ -403,14 +401,12 @@ main(int argc, char *argv[]) } else #endif { - rc = mparse_open(curp.mp, &fd, *argv++, - &child_pid); + rc = mparse_open(curp.mp, &fd, *argv++); if (fd != -1) parse(&curp, fd, argv[-1], &rc); } - if (child_pid && - mparse_wait(curp.mp, child_pid) != MANDOCLEVEL_OK) + if (mparse_wait(curp.mp) != MANDOCLEVEL_OK) rc = MANDOCLEVEL_SYSERR; if (MANDOCLEVEL_OK != rc && curp.wstop) Log Message: ----------- Let mparse_readfd() use mparse_open() and mparse_wait() and let mparse_open() fall back to .gz files such that .so works even when the target is zipped, requested by and in part using ideas from . While here, make sure files are readable before forking, both for efficiency and for better error reporting. Modified Files: -------------- mdocml: mandoc.3 read.c Revision Data ------------- Index: read.c =================================================================== RCS file: /home/cvs/mdocml/mdocml/read.c,v retrieving revision 1.97 retrieving revision 1.98 diff -Lread.c -Lread.c -u -p -r1.97 -r1.98 --- read.c +++ read.c @@ -780,28 +780,25 @@ mparse_readmem(struct mparse *curp, cons return(curp->file_status); } +/* + * If a file descriptor is given, use it and assume it points + * to the named file. Otherwise, open the named file. + * Read the whole file into memory and call the parsers. + * Called recursively when an .so request is encountered. + */ enum mandoclevel mparse_readfd(struct mparse *curp, int fd, const char *file) { struct buf blk; int with_mmap; int save_filenc; + pid_t save_child; - if (-1 == fd && -1 == (fd = open(file, O_RDONLY, 0))) { - curp->file_status = MANDOCLEVEL_SYSERR; - if (curp->mmsg) - (*curp->mmsg)(MANDOCERR_SYSOPEN, - curp->file_status, - file, 0, 0, strerror(errno)); - return(curp->file_status); - } - - /* - * Run for each opened file; may be called more than once for - * each full parse sequence if the opened file is nested (i.e., - * from `so'). Simply sucks in the whole file and moves into - * the parse phase for the file. - */ + save_child = curp->child; + if (fd != -1) + curp->child = 0; + else if (mparse_open(curp, &fd, file) >= MANDOCLEVEL_SYSERR) + goto out; if (read_whole_file(curp, file, fd, &blk, &with_mmap)) { save_filenc = curp->filenc; @@ -817,9 +814,12 @@ mparse_readfd(struct mparse *curp, int f free(blk.buf); } - if (STDIN_FILENO != fd && -1 == close(fd)) + if (fd != STDIN_FILENO && close(fd) == -1) perror(file); + mparse_wait(curp); +out: + curp->child = save_child; return(curp->file_status); } @@ -827,21 +827,40 @@ enum mandoclevel mparse_open(struct mparse *curp, int *fd, const char *file) { int pfd[2]; + int save_errno; char *cp; enum mandocerr err; pfd[1] = -1; curp->file = file; + + /* Unless zipped, try to just open the file. */ + if ((cp = strrchr(file, '.')) == NULL || strcmp(cp + 1, "gz")) { curp->child = 0; - if ((*fd = open(file, O_RDONLY)) == -1) { - err = MANDOCERR_SYSOPEN; - goto out; - } - return(MANDOCLEVEL_OK); + if ((*fd = open(file, O_RDONLY)) != -1) + return(MANDOCLEVEL_OK); + + /* Open failed; try to append ".gz". */ + + mandoc_asprintf(&cp, "%s.gz", file); + file = cp; + } else + cp = NULL; + + /* Before forking, make sure the file can be read. */ + + save_errno = errno; + if (access(file, R_OK) == -1) { + if (cp != NULL) + errno = save_errno; + err = MANDOCERR_SYSOPEN; + goto out; } + /* Run gunzip(1). */ + if (pipe(pfd) == -1) { err = MANDOCERR_SYSPIPE; goto out; @@ -870,11 +889,12 @@ mparse_open(struct mparse *curp, int *fd } out: + free(cp); *fd = -1; curp->child = 0; curp->file_status = MANDOCLEVEL_SYSERR; if (curp->mmsg) - (*curp->mmsg)(err, curp->file_status, file, + (*curp->mmsg)(err, curp->file_status, curp->file, 0, 0, strerror(errno)); if (pfd[1] != -1) exit(1); Index: mandoc.3 =================================================================== RCS file: /home/cvs/mdocml/mdocml/mandoc.3,v retrieving revision 1.28 retrieving revision 1.29 diff -Lmandoc.3 -Lmandoc.3 -u -p -r1.28 -r1.29 --- mandoc.3 +++ mandoc.3 @@ -396,6 +396,12 @@ open with .Xr gunzip 1 ; otherwise, with .Xr open 2 . +If +.Xr open 2 +fails, append +.Pa .gz +and try with +.Xr gunzip 1 . Return a file descriptor open for reading in .Fa fd , or -1 on failure. @@ -410,14 +416,18 @@ implemented in Parse a file or file descriptor. If .Va fd -is -1, +is -1, open .Va fname -is opened for reading. +with +.Fn mparse_open . Otherwise, .Va fname is assumed to be the name associated with .Va fd . -This may be called multiple times with different parameters; however, +Calls +.Fn mparse_wait +before returning. +This function may be called multiple times with different parameters; however, .Fn mparse_reset should be invoked between parses. Declared in @@ -461,11 +471,12 @@ implemented in .It Fn mparse_wait Bury a .Xr gunzip 1 -child process -.Fa child_pid -that was spawned with +child process that was spawned with .Fn mparse_open . To be called after the parse sequence is complete. +Not needed after +.Fn mparse_readfd , +but does no harm in that case, either. Returns .Dv MANDOCLEVEL_OK on success and -- To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv