tech@mandoc.bsd.lv
 help / color / mirror / Atom feed
* read.c does not close gzipped files; leaks memory
@ 2018-02-23 19:28 Wolfgang Müller
  2018-02-23 22:13 ` Ingo Schwarze
  0 siblings, 1 reply; 2+ messages in thread
From: Wolfgang Müller @ 2018-02-23 19:28 UTC (permalink / raw)
  To: tech


[-- Attachment #1.1: Type: text/plain, Size: 361 bytes --]

Hi,

I'm using mandoc on a system with a significant quantity of gzipped
manpages, and noticed its memory usage being unusually high. After
investigating, I found that in read_whole_file, mandoc opens the gzipped
file, but then never closes it before returning from the function.

Find attached a patch which solves this issue.

-- 
Wolfgang Müller

[-- Attachment #1.2: mandoc-gzleak.patch --]
[-- Type: text/x-diff, Size: 777 bytes --]

--- read.c.orig	2018-02-18 07:27:07.124135724 +0000
+++ read.c	2018-02-18 07:34:02.832965838 +0000
@@ -556,6 +556,7 @@
 	gzFile		 gz;
 	size_t		 off;
 	ssize_t		 ssz;
+	int		 end;
 
 	if (fstat(fd, &st) == -1) {
 		mandoc_vmsg(MANDOCERR_FILE, curp, 0, 0,
@@ -598,6 +599,7 @@
 
 	*with_mmap = 0;
 	off = 0;
+	end = 0;
 	fb->sz = 0;
 	fb->buf = NULL;
 	for (;;) {
@@ -614,7 +616,8 @@
 		    read(fd, fb->buf + (int)off, fb->sz - off);
 		if (ssz == 0) {
 			fb->sz = off;
-			return 1;
+			end = 1;
+			break;
 		}
 		if (ssz == -1) {
 			mandoc_vmsg(MANDOCERR_FILE, curp, 0, 0,
@@ -624,6 +627,12 @@
 		off += (size_t)ssz;
 	}
 
+	if (curp->gzip)
+		gzclose(gz);
+
+	if (end)
+		return 1;
+
 	free(fb->buf);
 	fb->buf = NULL;
 	return 0;

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: read.c does not close gzipped files; leaks memory
  2018-02-23 19:28 read.c does not close gzipped files; leaks memory Wolfgang Müller
@ 2018-02-23 22:13 ` Ingo Schwarze
  0 siblings, 0 replies; 2+ messages in thread
From: Ingo Schwarze @ 2018-02-23 22:13 UTC (permalink / raw)
  To: Wolfgang Mueller; +Cc: tech

Hello Wolfgang,

Wolfgang Mueller wrote on Fri, Feb 23, 2018 at 07:28:10PM +0000:

> I'm using mandoc on a system with a significant quantity of gzipped
> manpages, and noticed its memory usage being unusually high. After
> investigating, I found that in read_whole_file, mandoc opens the gzipped
> file, but then never closes it before returning from the function.
> 
> Find attached a patch which solves this issue.

Thanks a lot for noticing the bug, for tracking down the root cause,
for providing a concise explanation, and for even providing a patch.
That is very helpful!

I committed your patch with a number of tweaks, see below.
The main problem with your version was that it also closed the
original file descriptor, which this function is not supposed to do.
Besides, error reporting was incomplete.

Yours,
  Ingo

P.S.
I'm surprised to see that the zlib API is such a horrible mess -
and that the manual page is even worse:
 - Documentation of gzread(3) doesn't mention how errors are reported.
   Finding out required experimentation (or reading the code).
 - The mixture of reporting some errors via gzerror(3) and others
   via errno does not look like reasonable design.  It results
   in complicated and ugly application code.
 - The mixture of some functions returning gzerror values, some
   passing them out via pointer arguments, and the competing
   concept of retrieving them after the fact via gzerror(3) feels
   like redundant and badly inconsistent API design and causes
   application code to look inconsistent, too.
 - What gzerror(3) reports is not particularly helpful.  Invalidly
   changed bytes in the middle of a file merely result in "data error" -
   arguably accurate, but mildly helpful at best, whereas an empty
   file results in "buffer error", which is totally misleading.
 - The manual page explicitly recommends gzerror(3) after gzclose(3)
   failure, but that actually crashes (with segfault, bus error or
   similar) because gzclose(3) trashes the gzFile object even on
   failure.
 - The function you actually need after gzclose(3) failure,
   zError(3), is weirdly named and almost undocumented.
 - There doesn't appear to be a way to disable the behaviour of
   opening the file uncompressed when it is not in zlib format.
But what can we do, we have to use the API as it is.  Probably it was
a mistake to rely on zlib, given how ill-designed it is, the resulting
ugliness taints the core mandoc code.  But now it seems too late to
throw out zlib support again, i guess too many people already got
used to having support for gzipped manual pages...


Log Message:
-----------
After opening a file with gzdopen(3), we have to call gzclose(3) or 
we leak memory internally used by zlib to keep compression state.
Bug reported by Wolfgang Mueller <vehk at vehk dot de> who also
provided an incomplete patch, part of which i'm using in this commit.

Modified Files:
--------------
    mandoc:
        read.c

Revision Data
-------------
Index: read.c
===================================================================
RCS file: /home/cvs/mandoc/mandoc/read.c,v
retrieving revision 1.193
retrieving revision 1.194
diff -Lread.c -Lread.c -u -p -r1.193 -r1.194
--- read.c
+++ read.c
@@ -556,6 +556,7 @@ read_whole_file(struct mparse *curp, con
 	gzFile		 gz;
 	size_t		 off;
 	ssize_t		 ssz;
+	int		 gzerrnum, retval;
 
 	if (fstat(fd, &st) == -1) {
 		mandoc_vmsg(MANDOCERR_FILE, curp, 0, 0,
@@ -583,9 +584,22 @@ read_whole_file(struct mparse *curp, con
 	}
 
 	if (curp->gzip) {
+		/*
+		 * Duplicating the file descriptor is required
+		 * because we will have to call gzclose(3)
+		 * to free memory used internally by zlib,
+		 * but that will also close the file descriptor,
+		 * which this function must not do.
+		 */
+		if ((fd = dup(fd)) == -1) {
+			mandoc_vmsg(MANDOCERR_FILE, curp, 0, 0,
+			    "dup: %s", strerror(errno));
+			return 0;
+		}
 		if ((gz = gzdopen(fd, "rb")) == NULL) {
 			mandoc_vmsg(MANDOCERR_FILE, curp, 0, 0,
 			    "gzdopen: %s", strerror(errno));
+			close(fd);
 			return 0;
 		}
 	} else
@@ -598,6 +612,7 @@ read_whole_file(struct mparse *curp, con
 
 	*with_mmap = 0;
 	off = 0;
+	retval = 0;
 	fb->sz = 0;
 	fb->buf = NULL;
 	for (;;) {
@@ -614,19 +629,29 @@ read_whole_file(struct mparse *curp, con
 		    read(fd, fb->buf + (int)off, fb->sz - off);
 		if (ssz == 0) {
 			fb->sz = off;
-			return 1;
+			retval = 1;
+			break;
 		}
 		if (ssz == -1) {
-			mandoc_vmsg(MANDOCERR_FILE, curp, 0, 0,
-			    "read: %s", strerror(errno));
+			if (curp->gzip)
+				(void)gzerror(gz, &gzerrnum);
+			mandoc_vmsg(MANDOCERR_FILE, curp, 0, 0, "read: %s",
+			    curp->gzip && gzerrnum != Z_ERRNO ?
+			    zError(gzerrnum) : strerror(errno));
 			break;
 		}
 		off += (size_t)ssz;
 	}
 
-	free(fb->buf);
-	fb->buf = NULL;
-	return 0;
+	if (curp->gzip && (gzerrnum = gzclose(gz)) != Z_OK)
+		mandoc_vmsg(MANDOCERR_FILE, curp, 0, 0, "gzclose: %s",
+		    gzerrnum == Z_ERRNO ? strerror(errno) :
+		    zError(gzerrnum));
+	if (retval == 0) {
+		free(fb->buf);
+		fb->buf = NULL;
+	}
+	return retval;
 }
 
 static void
--
 To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-02-23 22:13 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-23 19:28 read.c does not close gzipped files; leaks memory Wolfgang Müller
2018-02-23 22:13 ` Ingo Schwarze

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).