From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from scc-mailout-kit-01.scc.kit.edu (scc-mailout-kit-01.scc.kit.edu [129.13.231.81]) by fantadrom.bsd.lv (OpenSMTPD) with ESMTP id 83f420eb for ; Fri, 23 Feb 2018 17:13:36 -0500 (EST) Received: from asta-nat.asta.uni-karlsruhe.de ([172.22.63.82] helo=hekate.usta.de) by scc-mailout-kit-01.scc.kit.edu with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (envelope-from ) id 1epLaw-0008ON-9o; Fri, 23 Feb 2018 23:13:35 +0100 Received: from donnerwolke.usta.de ([172.24.96.3]) by hekate.usta.de with esmtp (Exim 4.77) (envelope-from ) id 1epLav-00073P-UN; Fri, 23 Feb 2018 23:13:33 +0100 Received: from athene.usta.de ([172.24.96.10]) by donnerwolke.usta.de with esmtp (Exim 4.84_2) (envelope-from ) id 1epLav-0004Ft-Tv; Fri, 23 Feb 2018 23:13:33 +0100 Received: from localhost (athene.usta.de [local]) by athene.usta.de (OpenSMTPD) with ESMTPA id 6dbf2a87; Fri, 23 Feb 2018 23:13:33 +0100 (CET) Date: Fri, 23 Feb 2018 23:13:33 +0100 From: Ingo Schwarze To: Wolfgang Mueller Cc: tech@mandoc.bsd.lv Subject: Re: read.c does not close gzipped files; leaks memory Message-ID: <20180223221333.GC94422@athene.usta.de> References: <20180223192810.GA11077@tithonus.olympus> X-Mailinglist: mandoc-tech Reply-To: tech@mandoc.bsd.lv MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180223192810.GA11077@tithonus.olympus> User-Agent: Mutt/1.8.0 (2017-02-23) Hello Wolfgang, Wolfgang Mueller wrote on Fri, Feb 23, 2018 at 07:28:10PM +0000: > I'm using mandoc on a system with a significant quantity of gzipped > manpages, and noticed its memory usage being unusually high. After > investigating, I found that in read_whole_file, mandoc opens the gzipped > file, but then never closes it before returning from the function. > > Find attached a patch which solves this issue. Thanks a lot for noticing the bug, for tracking down the root cause, for providing a concise explanation, and for even providing a patch. That is very helpful! I committed your patch with a number of tweaks, see below. The main problem with your version was that it also closed the original file descriptor, which this function is not supposed to do. Besides, error reporting was incomplete. Yours, Ingo P.S. I'm surprised to see that the zlib API is such a horrible mess - and that the manual page is even worse: - Documentation of gzread(3) doesn't mention how errors are reported. Finding out required experimentation (or reading the code). - The mixture of reporting some errors via gzerror(3) and others via errno does not look like reasonable design. It results in complicated and ugly application code. - The mixture of some functions returning gzerror values, some passing them out via pointer arguments, and the competing concept of retrieving them after the fact via gzerror(3) feels like redundant and badly inconsistent API design and causes application code to look inconsistent, too. - What gzerror(3) reports is not particularly helpful. Invalidly changed bytes in the middle of a file merely result in "data error" - arguably accurate, but mildly helpful at best, whereas an empty file results in "buffer error", which is totally misleading. - The manual page explicitly recommends gzerror(3) after gzclose(3) failure, but that actually crashes (with segfault, bus error or similar) because gzclose(3) trashes the gzFile object even on failure. - The function you actually need after gzclose(3) failure, zError(3), is weirdly named and almost undocumented. - There doesn't appear to be a way to disable the behaviour of opening the file uncompressed when it is not in zlib format. But what can we do, we have to use the API as it is. Probably it was a mistake to rely on zlib, given how ill-designed it is, the resulting ugliness taints the core mandoc code. But now it seems too late to throw out zlib support again, i guess too many people already got used to having support for gzipped manual pages... Log Message: ----------- After opening a file with gzdopen(3), we have to call gzclose(3) or we leak memory internally used by zlib to keep compression state. Bug reported by Wolfgang Mueller who also provided an incomplete patch, part of which i'm using in this commit. Modified Files: -------------- mandoc: read.c Revision Data ------------- Index: read.c =================================================================== RCS file: /home/cvs/mandoc/mandoc/read.c,v retrieving revision 1.193 retrieving revision 1.194 diff -Lread.c -Lread.c -u -p -r1.193 -r1.194 --- read.c +++ read.c @@ -556,6 +556,7 @@ read_whole_file(struct mparse *curp, con gzFile gz; size_t off; ssize_t ssz; + int gzerrnum, retval; if (fstat(fd, &st) == -1) { mandoc_vmsg(MANDOCERR_FILE, curp, 0, 0, @@ -583,9 +584,22 @@ read_whole_file(struct mparse *curp, con } if (curp->gzip) { + /* + * Duplicating the file descriptor is required + * because we will have to call gzclose(3) + * to free memory used internally by zlib, + * but that will also close the file descriptor, + * which this function must not do. + */ + if ((fd = dup(fd)) == -1) { + mandoc_vmsg(MANDOCERR_FILE, curp, 0, 0, + "dup: %s", strerror(errno)); + return 0; + } if ((gz = gzdopen(fd, "rb")) == NULL) { mandoc_vmsg(MANDOCERR_FILE, curp, 0, 0, "gzdopen: %s", strerror(errno)); + close(fd); return 0; } } else @@ -598,6 +612,7 @@ read_whole_file(struct mparse *curp, con *with_mmap = 0; off = 0; + retval = 0; fb->sz = 0; fb->buf = NULL; for (;;) { @@ -614,19 +629,29 @@ read_whole_file(struct mparse *curp, con read(fd, fb->buf + (int)off, fb->sz - off); if (ssz == 0) { fb->sz = off; - return 1; + retval = 1; + break; } if (ssz == -1) { - mandoc_vmsg(MANDOCERR_FILE, curp, 0, 0, - "read: %s", strerror(errno)); + if (curp->gzip) + (void)gzerror(gz, &gzerrnum); + mandoc_vmsg(MANDOCERR_FILE, curp, 0, 0, "read: %s", + curp->gzip && gzerrnum != Z_ERRNO ? + zError(gzerrnum) : strerror(errno)); break; } off += (size_t)ssz; } - free(fb->buf); - fb->buf = NULL; - return 0; + if (curp->gzip && (gzerrnum = gzclose(gz)) != Z_OK) + mandoc_vmsg(MANDOCERR_FILE, curp, 0, 0, "gzclose: %s", + gzerrnum == Z_ERRNO ? strerror(errno) : + zError(gzerrnum)); + if (retval == 0) { + free(fb->buf); + fb->buf = NULL; + } + return retval; } static void -- To unsubscribe send an email to tech+unsubscribe@mandoc.bsd.lv