From: Ingo Schwarze <schwarze@usta.de>
To: Baptiste Daroussin <bapt@FreeBSD.org>
Cc: tech@mdocml.bsd.lv
Subject: Re: Allow gzipped .so and search .so in manpath
Date: Thu, 27 Nov 2014 01:12:42 +0100 [thread overview]
Message-ID: <20141127001242.GH26411@iris.usta.de> (raw)
In-Reply-To: <20141126202413.GF82931@ivaldir.etoilebsd.net>
Hi Baptiste,
Baptiste Daroussin wrote on Wed, Nov 26, 2014 at 09:24:13PM +0100:
> I was expecting the above :)
I just committed the two following patches, and i believe they
resolve the issue.
There is still a minor problem in makewhatis(8) with database
generation when .so to .gz is involved, but i'll get round to that.
Note that -current is at
http://fantadrom.bsd.lv/cgi-bin/cvsweb/
CVSROOT=anoncvs@fantadrom.bsd.lv:/cvs
today, mdocl.bsd.lv is a few days old. I'm right in the middle of
switching the server to new hardware. It will be back at mdocml
tomorrow, or in a few days at the latest.
> What about using zlib directly, that could simplify a bunch of things.
That idea makes sense to me. However, i want to do the 1.13.2
release without adding any new dependencies to help the downstream
projects to stabilize and/or migrate to the 1.13 series. Granted,
libz is all but heavyweight, and it's hard to imagine there might
be anybody out there who doesn't have it, but anyway...
I'd rather return to that idea after the 1.13.2 release.
> Right now in the ports tree I'm ripping them at package creation
> using soelim from groff and I wrote minimalistic version in base
> to be able to continue using that when groff will be removed.
Sounds reasonable.
When dealing with pure .so files, i.e. those containing a single .so
request and nothing else, it would even be better to replace them with
hard links instead of having two files with the same content - not so
much because that saves space, but rather because it allows combining
information in mandoc.db(5)/apropos(1).
For the user, the end result looks like this:
With copies of files:
$ apropos getuid
geteuid(2) - get user identification
getuid(2) - get user identification
With hard (or soft or .so) links:
$ apropos getuid
geteuid, getuid(2) - get user identification
The latter not only saves screen real estate, but also helps to
realize both are documented in the same manual page.
Yours,
Ingo
Log Message:
-----------
Simplify the mparse_open()/mparse_wait() interface.
Don't bother the user with the PID of the child process,
store it inside the opaque mparse handle.
Modified Files:
--------------
mdocml:
TODO
main.c
mandoc.3
mandoc.h
mandocdb.c
read.c
Revision Data
-------------
Index: mandocdb.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/mandocdb.c,v
retrieving revision 1.169
retrieving revision 1.170
diff -Lmandocdb.c -Lmandocdb.c -u -p -r1.169 -r1.170
--- mandocdb.c
+++ mandocdb.c
@@ -1084,7 +1084,6 @@ mpages_merge(struct mchars *mc, struct m
struct man *man;
char *sodest;
char *cp;
- pid_t child_pid;
int fd;
unsigned int pslot;
enum mandoclevel lvl;
@@ -1112,9 +1111,8 @@ mpages_merge(struct mchars *mc, struct m
mdoc = NULL;
man = NULL;
sodest = NULL;
- child_pid = 0;
- mparse_open(mp, &fd, mpage->mlinks->file, &child_pid);
+ mparse_open(mp, &fd, mpage->mlinks->file);
if (fd == -1) {
say(mpage->mlinks->file, "&open");
goto nextpage;
@@ -1231,8 +1229,7 @@ mpages_merge(struct mchars *mc, struct m
dbadd(mpage, mc);
nextpage:
- if (child_pid &&
- mparse_wait(mp, child_pid) != MANDOCLEVEL_OK) {
+ if (mparse_wait(mp) != MANDOCLEVEL_OK) {
exitcode = (int)MANDOCLEVEL_SYSERR;
say(mpage->mlinks->file, "&wait gunzip");
}
Index: mandoc.h
===================================================================
RCS file: /home/cvs/mdocml/mdocml/mandoc.h,v
retrieving revision 1.168
retrieving revision 1.169
diff -Lmandoc.h -Lmandoc.h -u -p -r1.168 -r1.169
--- mandoc.h
+++ mandoc.h
@@ -436,8 +436,7 @@ struct mparse *mparse_alloc(int, enum m
const struct mchars *, const char *);
void mparse_free(struct mparse *);
void mparse_keep(struct mparse *);
-enum mandoclevel mparse_open(struct mparse *, int *, const char *,
- pid_t *);
+enum mandoclevel mparse_open(struct mparse *, int *, const char *);
enum mandoclevel mparse_readfd(struct mparse *, int, const char *);
enum mandoclevel mparse_readmem(struct mparse *, const void *, size_t,
const char *);
@@ -447,7 +446,7 @@ void mparse_result(struct mparse *,
const char *mparse_getkeep(const struct mparse *);
const char *mparse_strerror(enum mandocerr);
const char *mparse_strlevel(enum mandoclevel);
-enum mandoclevel mparse_wait(struct mparse *, pid_t);
+enum mandoclevel mparse_wait(struct mparse *);
__END_DECLS
Index: TODO
===================================================================
RCS file: /home/cvs/mdocml/mdocml/TODO,v
retrieving revision 1.188
retrieving revision 1.189
diff -LTODO -LTODO -u -p -r1.188 -r1.189
--- TODO
+++ TODO
@@ -564,6 +564,9 @@ Several areas can be cleaned up to make
* structural issues
************************************************************************
+- Use libz directly instead of forking gunzip(1).
+ Suggested by bapt at FreeBSD among others.
+
- We use the input line number at several places to distinguish
same-line from different-line input. That plainly doesn't work
with user-defined macros, leading to random breakage.
Index: read.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/read.c,v
retrieving revision 1.96
retrieving revision 1.97
diff -Lread.c -Lread.c -u -p -r1.96 -r1.97
--- read.c
+++ read.c
@@ -64,6 +64,7 @@ struct mparse {
int filenc; /* encoding of the current file */
int reparse_count; /* finite interp. stack */
int line; /* line number in the file */
+ pid_t child; /* the gunzip(1) process */
};
static void choose_parser(struct mparse *);
@@ -823,8 +824,7 @@ mparse_readfd(struct mparse *curp, int f
}
enum mandoclevel
-mparse_open(struct mparse *curp, int *fd, const char *file,
- pid_t *child_pid)
+mparse_open(struct mparse *curp, int *fd, const char *file)
{
int pfd[2];
char *cp;
@@ -834,7 +834,7 @@ mparse_open(struct mparse *curp, int *fd
curp->file = file;
if ((cp = strrchr(file, '.')) == NULL ||
strcmp(cp + 1, "gz")) {
- *child_pid = 0;
+ curp->child = 0;
if ((*fd = open(file, O_RDONLY)) == -1) {
err = MANDOCERR_SYSOPEN;
goto out;
@@ -847,7 +847,7 @@ mparse_open(struct mparse *curp, int *fd
goto out;
}
- switch (*child_pid = fork()) {
+ switch (curp->child = fork()) {
case -1:
err = MANDOCERR_SYSFORK;
close(pfd[0]);
@@ -871,7 +871,7 @@ mparse_open(struct mparse *curp, int *fd
out:
*fd = -1;
- *child_pid = 0;
+ curp->child = 0;
curp->file_status = MANDOCLEVEL_SYSERR;
if (curp->mmsg)
(*curp->mmsg)(err, curp->file_status, file,
@@ -882,11 +882,14 @@ out:
}
enum mandoclevel
-mparse_wait(struct mparse *curp, pid_t child_pid)
+mparse_wait(struct mparse *curp)
{
int status;
- if (waitpid(child_pid, &status, 0) == -1) {
+ if (curp->child == 0)
+ return(MANDOCLEVEL_OK);
+
+ if (waitpid(curp->child, &status, 0) == -1) {
mandoc_msg(MANDOCERR_SYSWAIT, curp, 0, 0,
strerror(errno));
curp->file_status = MANDOCLEVEL_SYSERR;
Index: mandoc.3
===================================================================
RCS file: /home/cvs/mdocml/mdocml/mandoc.3,v
retrieving revision 1.27
retrieving revision 1.28
diff -Lmandoc.3 -Lmandoc.3 -u -p -r1.27 -r1.28
--- mandoc.3
+++ mandoc.3
@@ -81,7 +81,6 @@
.Fa "struct mparse *parse"
.Fa "int *fd"
.Fa "const char *fname"
-.Fa "pid_t *child_pid"
.Fc
.Ft "enum mandoclevel"
.Fo mparse_readfd
@@ -111,7 +110,6 @@
.Ft "enum mandoclevel"
.Fo mparse_wait
.Fa "struct mparse *parse"
-.Fa "pid_t child_pid"
.Fc
.In sys/types.h
.In mandoc.h
@@ -404,14 +402,6 @@ or -1 on failure.
It can be passed to
.Fn mparse_readfd
or used directly.
-If applicable, return the
-.Xr gunzip 1
-child process ID in
-.Fa child_pid ,
-or otherwise 0.
-If non-zero, it should be passed to
-.Fn mparse_wait
-after completing the parse sequence.
Declared in
.In mandoc.h ,
implemented in
Index: main.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/main.c,v
retrieving revision 1.199
retrieving revision 1.200
diff -Lmain.c -Lmain.c -u -p -r1.199 -r1.200
--- main.c
+++ main.c
@@ -115,7 +115,6 @@ main(int argc, char *argv[])
#endif
enum mandoclevel rc;
enum outmode outmode;
- pid_t child_pid;
int fd;
int show_usage;
int use_pager;
@@ -388,8 +387,7 @@ main(int argc, char *argv[])
while (argc) {
#if HAVE_SQLITE3
if (resp != NULL) {
- rc = mparse_open(curp.mp, &fd, resp->file,
- &child_pid);
+ rc = mparse_open(curp.mp, &fd, resp->file);
if (fd == -1)
/* nothing */;
else if (resp->form & FORM_SRC) {
@@ -403,14 +401,12 @@ main(int argc, char *argv[])
} else
#endif
{
- rc = mparse_open(curp.mp, &fd, *argv++,
- &child_pid);
+ rc = mparse_open(curp.mp, &fd, *argv++);
if (fd != -1)
parse(&curp, fd, argv[-1], &rc);
}
- if (child_pid &&
- mparse_wait(curp.mp, child_pid) != MANDOCLEVEL_OK)
+ if (mparse_wait(curp.mp) != MANDOCLEVEL_OK)
rc = MANDOCLEVEL_SYSERR;
if (MANDOCLEVEL_OK != rc && curp.wstop)
Log Message:
-----------
Let mparse_readfd() use mparse_open() and mparse_wait()
and let mparse_open() fall back to .gz files
such that .so works even when the target is zipped,
requested by and in part using ideas from <bapt at FreeBSD>.
While here, make sure files are readable before forking,
both for efficiency and for better error reporting.
Modified Files:
--------------
mdocml:
mandoc.3
read.c
Revision Data
-------------
Index: read.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/read.c,v
retrieving revision 1.97
retrieving revision 1.98
diff -Lread.c -Lread.c -u -p -r1.97 -r1.98
--- read.c
+++ read.c
@@ -780,28 +780,25 @@ mparse_readmem(struct mparse *curp, cons
return(curp->file_status);
}
+/*
+ * If a file descriptor is given, use it and assume it points
+ * to the named file. Otherwise, open the named file.
+ * Read the whole file into memory and call the parsers.
+ * Called recursively when an .so request is encountered.
+ */
enum mandoclevel
mparse_readfd(struct mparse *curp, int fd, const char *file)
{
struct buf blk;
int with_mmap;
int save_filenc;
+ pid_t save_child;
- if (-1 == fd && -1 == (fd = open(file, O_RDONLY, 0))) {
- curp->file_status = MANDOCLEVEL_SYSERR;
- if (curp->mmsg)
- (*curp->mmsg)(MANDOCERR_SYSOPEN,
- curp->file_status,
- file, 0, 0, strerror(errno));
- return(curp->file_status);
- }
-
- /*
- * Run for each opened file; may be called more than once for
- * each full parse sequence if the opened file is nested (i.e.,
- * from `so'). Simply sucks in the whole file and moves into
- * the parse phase for the file.
- */
+ save_child = curp->child;
+ if (fd != -1)
+ curp->child = 0;
+ else if (mparse_open(curp, &fd, file) >= MANDOCLEVEL_SYSERR)
+ goto out;
if (read_whole_file(curp, file, fd, &blk, &with_mmap)) {
save_filenc = curp->filenc;
@@ -817,9 +814,12 @@ mparse_readfd(struct mparse *curp, int f
free(blk.buf);
}
- if (STDIN_FILENO != fd && -1 == close(fd))
+ if (fd != STDIN_FILENO && close(fd) == -1)
perror(file);
+ mparse_wait(curp);
+out:
+ curp->child = save_child;
return(curp->file_status);
}
@@ -827,21 +827,40 @@ enum mandoclevel
mparse_open(struct mparse *curp, int *fd, const char *file)
{
int pfd[2];
+ int save_errno;
char *cp;
enum mandocerr err;
pfd[1] = -1;
curp->file = file;
+
+ /* Unless zipped, try to just open the file. */
+
if ((cp = strrchr(file, '.')) == NULL ||
strcmp(cp + 1, "gz")) {
curp->child = 0;
- if ((*fd = open(file, O_RDONLY)) == -1) {
- err = MANDOCERR_SYSOPEN;
- goto out;
- }
- return(MANDOCLEVEL_OK);
+ if ((*fd = open(file, O_RDONLY)) != -1)
+ return(MANDOCLEVEL_OK);
+
+ /* Open failed; try to append ".gz". */
+
+ mandoc_asprintf(&cp, "%s.gz", file);
+ file = cp;
+ } else
+ cp = NULL;
+
+ /* Before forking, make sure the file can be read. */
+
+ save_errno = errno;
+ if (access(file, R_OK) == -1) {
+ if (cp != NULL)
+ errno = save_errno;
+ err = MANDOCERR_SYSOPEN;
+ goto out;
}
+ /* Run gunzip(1). */
+
if (pipe(pfd) == -1) {
err = MANDOCERR_SYSPIPE;
goto out;
@@ -870,11 +889,12 @@ mparse_open(struct mparse *curp, int *fd
}
out:
+ free(cp);
*fd = -1;
curp->child = 0;
curp->file_status = MANDOCLEVEL_SYSERR;
if (curp->mmsg)
- (*curp->mmsg)(err, curp->file_status, file,
+ (*curp->mmsg)(err, curp->file_status, curp->file,
0, 0, strerror(errno));
if (pfd[1] != -1)
exit(1);
Index: mandoc.3
===================================================================
RCS file: /home/cvs/mdocml/mdocml/mandoc.3,v
retrieving revision 1.28
retrieving revision 1.29
diff -Lmandoc.3 -Lmandoc.3 -u -p -r1.28 -r1.29
--- mandoc.3
+++ mandoc.3
@@ -396,6 +396,12 @@ open with
.Xr gunzip 1 ;
otherwise, with
.Xr open 2 .
+If
+.Xr open 2
+fails, append
+.Pa .gz
+and try with
+.Xr gunzip 1 .
Return a file descriptor open for reading in
.Fa fd ,
or -1 on failure.
@@ -410,14 +416,18 @@ implemented in
Parse a file or file descriptor.
If
.Va fd
-is -1,
+is -1, open
.Va fname
-is opened for reading.
+with
+.Fn mparse_open .
Otherwise,
.Va fname
is assumed to be the name associated with
.Va fd .
-This may be called multiple times with different parameters; however,
+Calls
+.Fn mparse_wait
+before returning.
+This function may be called multiple times with different parameters; however,
.Fn mparse_reset
should be invoked between parses.
Declared in
@@ -461,11 +471,12 @@ implemented in
.It Fn mparse_wait
Bury a
.Xr gunzip 1
-child process
-.Fa child_pid
-that was spawned with
+child process that was spawned with
.Fn mparse_open .
To be called after the parse sequence is complete.
+Not needed after
+.Fn mparse_readfd ,
+but does no harm in that case, either.
Returns
.Dv MANDOCLEVEL_OK
on success and
--
To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv
next prev parent reply other threads:[~2014-11-27 0:12 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-24 14:36 Baptiste Daroussin
2014-11-26 20:10 ` Ingo Schwarze
2014-11-26 20:24 ` Baptiste Daroussin
2014-11-27 0:12 ` Ingo Schwarze [this message]
2014-11-27 2:03 ` Ingo Schwarze
2014-11-29 17:54 ` Baptiste Daroussin
2014-11-29 18:14 ` Ingo Schwarze
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141127001242.GH26411@iris.usta.de \
--to=schwarze@usta.de \
--cc=bapt@FreeBSD.org \
--cc=tech@mdocml.bsd.lv \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).