tech@mandoc.bsd.lv
 help / color / mirror / Atom feed
From: Ingo Schwarze <schwarze@usta.de>
To: Baptiste Daroussin <bapt@FreeBSD.org>
Cc: tech@mdocml.bsd.lv
Subject: Re: Allow gzipped .so and search .so in manpath
Date: Thu, 27 Nov 2014 01:12:42 +0100	[thread overview]
Message-ID: <20141127001242.GH26411@iris.usta.de> (raw)
In-Reply-To: <20141126202413.GF82931@ivaldir.etoilebsd.net>

Hi Baptiste,

Baptiste Daroussin wrote on Wed, Nov 26, 2014 at 09:24:13PM +0100:

> I was expecting the above :)

I just committed the two following patches, and i believe they
resolve the issue.

There is still a minor problem in makewhatis(8) with database
generation when .so to .gz is involved, but i'll get round to that.

Note that -current is at

  http://fantadrom.bsd.lv/cgi-bin/cvsweb/
  CVSROOT=anoncvs@fantadrom.bsd.lv:/cvs

today, mdocl.bsd.lv is a few days old.  I'm right in the middle of
switching the server to new hardware.  It will be back at mdocml
tomorrow, or in a few days at the latest.

> What about using zlib directly, that could simplify a bunch of things.

That idea makes sense to me.  However, i want to do the 1.13.2
release without adding any new dependencies to help the downstream
projects to stabilize and/or migrate to the 1.13 series.  Granted,
libz is all but heavyweight, and it's hard to imagine there might
be anybody out there who doesn't have it, but anyway...
I'd rather return to that idea after the 1.13.2 release.

> Right now in the ports tree I'm ripping them at package creation
> using soelim from groff and I wrote minimalistic version in base
> to be able to continue using that when groff will be removed.

Sounds reasonable.

When dealing with pure .so files, i.e. those containing a single .so
request and nothing else, it would even be better to replace them with
hard links instead of having two files with the same content - not so
much because that saves space, but rather because it allows combining
information in mandoc.db(5)/apropos(1).

For the user, the end result looks like this:

With copies of files:

   $ apropos getuid
  geteuid(2) - get user identification
  getuid(2) - get user identification

With hard (or soft or .so) links:

   $ apropos getuid
  geteuid, getuid(2) - get user identification

The latter not only saves screen real estate, but also helps to
realize both are documented in the same manual page.

Yours,
  Ingo


Log Message:
-----------
Simplify the mparse_open()/mparse_wait() interface.
Don't bother the user with the PID of the child process,
store it inside the opaque mparse handle.

Modified Files:
--------------
    mdocml:
        TODO
        main.c
        mandoc.3
        mandoc.h
        mandocdb.c
        read.c

Revision Data
-------------
Index: mandocdb.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/mandocdb.c,v
retrieving revision 1.169
retrieving revision 1.170
diff -Lmandocdb.c -Lmandocdb.c -u -p -r1.169 -r1.170
--- mandocdb.c
+++ mandocdb.c
@@ -1084,7 +1084,6 @@ mpages_merge(struct mchars *mc, struct m
 	struct man		*man;
 	char			*sodest;
 	char			*cp;
-	pid_t			 child_pid;
 	int			 fd;
 	unsigned int		 pslot;
 	enum mandoclevel	 lvl;
@@ -1112,9 +1111,8 @@ mpages_merge(struct mchars *mc, struct m
 		mdoc = NULL;
 		man = NULL;
 		sodest = NULL;
-		child_pid = 0;
 
-		mparse_open(mp, &fd, mpage->mlinks->file, &child_pid);
+		mparse_open(mp, &fd, mpage->mlinks->file);
 		if (fd == -1) {
 			say(mpage->mlinks->file, "&open");
 			goto nextpage;
@@ -1231,8 +1229,7 @@ mpages_merge(struct mchars *mc, struct m
 		dbadd(mpage, mc);
 
 nextpage:
-		if (child_pid &&
-		    mparse_wait(mp, child_pid) != MANDOCLEVEL_OK) {
+		if (mparse_wait(mp) != MANDOCLEVEL_OK) {
 			exitcode = (int)MANDOCLEVEL_SYSERR;
 			say(mpage->mlinks->file, "&wait gunzip");
 		}
Index: mandoc.h
===================================================================
RCS file: /home/cvs/mdocml/mdocml/mandoc.h,v
retrieving revision 1.168
retrieving revision 1.169
diff -Lmandoc.h -Lmandoc.h -u -p -r1.168 -r1.169
--- mandoc.h
+++ mandoc.h
@@ -436,8 +436,7 @@ struct mparse	 *mparse_alloc(int, enum m
 			const struct mchars *, const char *);
 void		  mparse_free(struct mparse *);
 void		  mparse_keep(struct mparse *);
-enum mandoclevel  mparse_open(struct mparse *, int *, const char *,
-			pid_t *);
+enum mandoclevel  mparse_open(struct mparse *, int *, const char *);
 enum mandoclevel  mparse_readfd(struct mparse *, int, const char *);
 enum mandoclevel  mparse_readmem(struct mparse *, const void *, size_t,
 			const char *);
@@ -447,7 +446,7 @@ void		  mparse_result(struct mparse *,
 const char	 *mparse_getkeep(const struct mparse *);
 const char	 *mparse_strerror(enum mandocerr);
 const char	 *mparse_strlevel(enum mandoclevel);
-enum mandoclevel  mparse_wait(struct mparse *, pid_t);
+enum mandoclevel  mparse_wait(struct mparse *);
 
 __END_DECLS
 
Index: TODO
===================================================================
RCS file: /home/cvs/mdocml/mdocml/TODO,v
retrieving revision 1.188
retrieving revision 1.189
diff -LTODO -LTODO -u -p -r1.188 -r1.189
--- TODO
+++ TODO
@@ -564,6 +564,9 @@ Several areas can be cleaned up to make 
 * structural issues
 ************************************************************************
 
+- Use libz directly instead of forking gunzip(1).
+  Suggested by bapt at FreeBSD among others.
+
 - We use the input line number at several places to distinguish
   same-line from different-line input.  That plainly doesn't work
   with user-defined macros, leading to random breakage.
Index: read.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/read.c,v
retrieving revision 1.96
retrieving revision 1.97
diff -Lread.c -Lread.c -u -p -r1.96 -r1.97
--- read.c
+++ read.c
@@ -64,6 +64,7 @@ struct	mparse {
 	int		  filenc; /* encoding of the current file */
 	int		  reparse_count; /* finite interp. stack */
 	int		  line; /* line number in the file */
+	pid_t		  child; /* the gunzip(1) process */
 };
 
 static	void	  choose_parser(struct mparse *);
@@ -823,8 +824,7 @@ mparse_readfd(struct mparse *curp, int f
 }
 
 enum mandoclevel
-mparse_open(struct mparse *curp, int *fd, const char *file,
-	pid_t *child_pid)
+mparse_open(struct mparse *curp, int *fd, const char *file)
 {
 	int		  pfd[2];
 	char		 *cp;
@@ -834,7 +834,7 @@ mparse_open(struct mparse *curp, int *fd
 	curp->file = file;
 	if ((cp = strrchr(file, '.')) == NULL ||
 	    strcmp(cp + 1, "gz")) {
-		*child_pid = 0;
+		curp->child = 0;
 		if ((*fd = open(file, O_RDONLY)) == -1) {
 			err = MANDOCERR_SYSOPEN;
 			goto out;
@@ -847,7 +847,7 @@ mparse_open(struct mparse *curp, int *fd
 		goto out;
 	}
 
-	switch (*child_pid = fork()) {
+	switch (curp->child = fork()) {
 	case -1:
 		err = MANDOCERR_SYSFORK;
 		close(pfd[0]);
@@ -871,7 +871,7 @@ mparse_open(struct mparse *curp, int *fd
 
 out:
 	*fd = -1;
-	*child_pid = 0;
+	curp->child = 0;
 	curp->file_status = MANDOCLEVEL_SYSERR;
 	if (curp->mmsg)
 		(*curp->mmsg)(err, curp->file_status, file,
@@ -882,11 +882,14 @@ out:
 }
 
 enum mandoclevel
-mparse_wait(struct mparse *curp, pid_t child_pid)
+mparse_wait(struct mparse *curp)
 {
 	int	  status;
 
-	if (waitpid(child_pid, &status, 0) == -1) {
+	if (curp->child == 0)
+		return(MANDOCLEVEL_OK);
+
+	if (waitpid(curp->child, &status, 0) == -1) {
 		mandoc_msg(MANDOCERR_SYSWAIT, curp, 0, 0,
 		    strerror(errno));
 		curp->file_status = MANDOCLEVEL_SYSERR;
Index: mandoc.3
===================================================================
RCS file: /home/cvs/mdocml/mdocml/mandoc.3,v
retrieving revision 1.27
retrieving revision 1.28
diff -Lmandoc.3 -Lmandoc.3 -u -p -r1.27 -r1.28
--- mandoc.3
+++ mandoc.3
@@ -81,7 +81,6 @@
 .Fa "struct mparse *parse"
 .Fa "int *fd"
 .Fa "const char *fname"
-.Fa "pid_t *child_pid"
 .Fc
 .Ft "enum mandoclevel"
 .Fo mparse_readfd
@@ -111,7 +110,6 @@
 .Ft "enum mandoclevel"
 .Fo mparse_wait
 .Fa "struct mparse *parse"
-.Fa "pid_t child_pid"
 .Fc
 .In sys/types.h
 .In mandoc.h
@@ -404,14 +402,6 @@ or -1 on failure.
 It can be passed to
 .Fn mparse_readfd
 or used directly.
-If applicable, return the
-.Xr gunzip 1
-child process ID in
-.Fa child_pid ,
-or otherwise 0.
-If non-zero, it should be passed to
-.Fn mparse_wait
-after completing the parse sequence.
 Declared in
 .In mandoc.h ,
 implemented in
Index: main.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/main.c,v
retrieving revision 1.199
retrieving revision 1.200
diff -Lmain.c -Lmain.c -u -p -r1.199 -r1.200
--- main.c
+++ main.c
@@ -115,7 +115,6 @@ main(int argc, char *argv[])
 #endif
 	enum mandoclevel rc;
 	enum outmode	 outmode;
-	pid_t		 child_pid;
 	int		 fd;
 	int		 show_usage;
 	int		 use_pager;
@@ -388,8 +387,7 @@ main(int argc, char *argv[])
 	while (argc) {
 #if HAVE_SQLITE3
 		if (resp != NULL) {
-			rc = mparse_open(curp.mp, &fd, resp->file,
-			    &child_pid);
+			rc = mparse_open(curp.mp, &fd, resp->file);
 			if (fd == -1)
 				/* nothing */;
 			else if (resp->form & FORM_SRC) {
@@ -403,14 +401,12 @@ main(int argc, char *argv[])
 		} else
 #endif
 		{
-			rc = mparse_open(curp.mp, &fd, *argv++,
-			    &child_pid);
+			rc = mparse_open(curp.mp, &fd, *argv++);
 			if (fd != -1)
 				parse(&curp, fd, argv[-1], &rc);
 		}
 
-		if (child_pid &&
-		    mparse_wait(curp.mp, child_pid) != MANDOCLEVEL_OK)
+		if (mparse_wait(curp.mp) != MANDOCLEVEL_OK)
 			rc = MANDOCLEVEL_SYSERR;
 
 		if (MANDOCLEVEL_OK != rc && curp.wstop)


Log Message:
-----------
Let mparse_readfd() use mparse_open() and mparse_wait()
and let mparse_open() fall back to .gz files
such that .so works even when the target is zipped,
requested by and in part using ideas from <bapt at FreeBSD>.
While here, make sure files are readable before forking,
both for efficiency and for better error reporting.

Modified Files:
--------------
    mdocml:
        mandoc.3
        read.c

Revision Data
-------------
Index: read.c
===================================================================
RCS file: /home/cvs/mdocml/mdocml/read.c,v
retrieving revision 1.97
retrieving revision 1.98
diff -Lread.c -Lread.c -u -p -r1.97 -r1.98
--- read.c
+++ read.c
@@ -780,28 +780,25 @@ mparse_readmem(struct mparse *curp, cons
 	return(curp->file_status);
 }
 
+/*
+ * If a file descriptor is given, use it and assume it points
+ * to the named file.  Otherwise, open the named file.
+ * Read the whole file into memory and call the parsers.
+ * Called recursively when an .so request is encountered.
+ */
 enum mandoclevel
 mparse_readfd(struct mparse *curp, int fd, const char *file)
 {
 	struct buf	 blk;
 	int		 with_mmap;
 	int		 save_filenc;
+	pid_t		 save_child;
 
-	if (-1 == fd && -1 == (fd = open(file, O_RDONLY, 0))) {
-		curp->file_status = MANDOCLEVEL_SYSERR;
-		if (curp->mmsg)
-			(*curp->mmsg)(MANDOCERR_SYSOPEN,
-			    curp->file_status,
-			    file, 0, 0, strerror(errno));
-		return(curp->file_status);
-	}
-
-	/*
-	 * Run for each opened file; may be called more than once for
-	 * each full parse sequence if the opened file is nested (i.e.,
-	 * from `so').  Simply sucks in the whole file and moves into
-	 * the parse phase for the file.
-	 */
+	save_child = curp->child;
+	if (fd != -1)
+		curp->child = 0;
+	else if (mparse_open(curp, &fd, file) >= MANDOCLEVEL_SYSERR)
+		goto out;
 
 	if (read_whole_file(curp, file, fd, &blk, &with_mmap)) {
 		save_filenc = curp->filenc;
@@ -817,9 +814,12 @@ mparse_readfd(struct mparse *curp, int f
 			free(blk.buf);
 	}
 
-	if (STDIN_FILENO != fd && -1 == close(fd))
+	if (fd != STDIN_FILENO && close(fd) == -1)
 		perror(file);
 
+	mparse_wait(curp);
+out:
+	curp->child = save_child;
 	return(curp->file_status);
 }
 
@@ -827,21 +827,40 @@ enum mandoclevel
 mparse_open(struct mparse *curp, int *fd, const char *file)
 {
 	int		  pfd[2];
+	int		  save_errno;
 	char		 *cp;
 	enum mandocerr	  err;
 
 	pfd[1] = -1;
 	curp->file = file;
+
+	/* Unless zipped, try to just open the file. */
+
 	if ((cp = strrchr(file, '.')) == NULL ||
 	    strcmp(cp + 1, "gz")) {
 		curp->child = 0;
-		if ((*fd = open(file, O_RDONLY)) == -1) {
-			err = MANDOCERR_SYSOPEN;
-			goto out;
-		}
-		return(MANDOCLEVEL_OK);
+		if ((*fd = open(file, O_RDONLY)) != -1)
+			return(MANDOCLEVEL_OK);
+
+		/* Open failed; try to append ".gz". */
+
+		mandoc_asprintf(&cp, "%s.gz", file);
+		file = cp;
+	} else
+		cp = NULL;
+
+	/* Before forking, make sure the file can be read. */
+
+	save_errno = errno;
+	if (access(file, R_OK) == -1) {
+		if (cp != NULL)
+			errno = save_errno;
+		err = MANDOCERR_SYSOPEN;
+		goto out;
 	}
 
+	/* Run gunzip(1). */
+
 	if (pipe(pfd) == -1) {
 		err = MANDOCERR_SYSPIPE;
 		goto out;
@@ -870,11 +889,12 @@ mparse_open(struct mparse *curp, int *fd
 	}
 
 out:
+	free(cp);
 	*fd = -1;
 	curp->child = 0;
 	curp->file_status = MANDOCLEVEL_SYSERR;
 	if (curp->mmsg)
-		(*curp->mmsg)(err, curp->file_status, file,
+		(*curp->mmsg)(err, curp->file_status, curp->file,
 		    0, 0, strerror(errno));
 	if (pfd[1] != -1)
 		exit(1);
Index: mandoc.3
===================================================================
RCS file: /home/cvs/mdocml/mdocml/mandoc.3,v
retrieving revision 1.28
retrieving revision 1.29
diff -Lmandoc.3 -Lmandoc.3 -u -p -r1.28 -r1.29
--- mandoc.3
+++ mandoc.3
@@ -396,6 +396,12 @@ open with
 .Xr gunzip 1 ;
 otherwise, with
 .Xr open 2 .
+If
+.Xr open 2
+fails, append
+.Pa .gz
+and try with
+.Xr gunzip 1 .
 Return a file descriptor open for reading in
 .Fa fd ,
 or -1 on failure.
@@ -410,14 +416,18 @@ implemented in
 Parse a file or file descriptor.
 If
 .Va fd
-is -1,
+is -1, open
 .Va fname
-is opened for reading.
+with
+.Fn mparse_open .
 Otherwise,
 .Va fname
 is assumed to be the name associated with
 .Va fd .
-This may be called multiple times with different parameters; however,
+Calls
+.Fn mparse_wait
+before returning.
+This function may be called multiple times with different parameters; however,
 .Fn mparse_reset
 should be invoked between parses.
 Declared in
@@ -461,11 +471,12 @@ implemented in
 .It Fn mparse_wait
 Bury a
 .Xr gunzip 1
-child process
-.Fa child_pid
-that was spawned with
+child process that was spawned with
 .Fn mparse_open .
 To be called after the parse sequence is complete.
+Not needed after
+.Fn mparse_readfd ,
+but does no harm in that case, either.
 Returns
 .Dv MANDOCLEVEL_OK
 on success and
--
 To unsubscribe send an email to tech+unsubscribe@mdocml.bsd.lv

  reply	other threads:[~2014-11-27  0:12 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-24 14:36 Baptiste Daroussin
2014-11-26 20:10 ` Ingo Schwarze
2014-11-26 20:24   ` Baptiste Daroussin
2014-11-27  0:12     ` Ingo Schwarze [this message]
2014-11-27  2:03       ` Ingo Schwarze
2014-11-29 17:54       ` Baptiste Daroussin
2014-11-29 18:14         ` Ingo Schwarze

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141127001242.GH26411@iris.usta.de \
    --to=schwarze@usta.de \
    --cc=bapt@FreeBSD.org \
    --cc=tech@mdocml.bsd.lv \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).