From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Making stdio writes robust/recoverable under errors
Date: Fri, 29 May 2015 09:53:00 -0400
Message-ID: <20150529135300.GD17573@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

Currently, musl's stdio write operations take the liberty of leaving
the current position and actual contents written rather unpredictable
if a write error occurs. Originally the motivation was probably a mix
of uncertainty about what to do and lack of code (or a desire to avoid
complexity) for tracking what part of the buffer was unwritten if a
write error occurred before the buffer was written out.

But commit 58165923890865a6ac042fafce13f440ee986fd9, as part of adding
cancellation support, added the code to track what part of the buffer
remains unwritten, and therefore, from what I can see, avoiding data
loss when stdio writes fail transiently (like EINTR, for instance) is
simply a matter of removing line 31 in the error path of
__stdio_write:

	f->wpos = f->wbase = f->wend = 0;

so that the buffer is not thrown away.

An attached test program demonstrates the issue. It assumes Linux 64k
pipe buffers so it's not appropriate for inclusion in tests in its
current form, but it gets the job done where it works. It attempts to
send 128k over a pipe via write followed by a series of fwrite retries
that are interrupted by signals that cause EINTR; thanks to the
kernel's pipe buffer semantics, there's basically no timing/race
aspect involved. With current musl, I get only 130560 across the pipe;
with the above line removed, I get all 131072.

I think this is a good change to make from the standpoint of
robustness. Even if most errors are not recoverable, some may be; in
fact ENOSPC could be handled meaningfully if the calling program knows
it has temp files it could delete.

What I am concerned about, though, are residual assumptions that,
after f->write, the write buffer will be clear. With this change,
fflush and similar operations would no longer discard the buffer on
error, meaning we would need to check that callers don't need that
property.

As another example, fseek fails when the buffer can't be written out,
but since the current buffer write throws away the buffer on failure,
a second fseek call will succeed; this change would leave it failing.
(As an aside, rewind already fails in this case and throws away the
error; this is almost certainly wrong. It should probably attempt to
flush and throw away the buffer on failure before seeking so that the
seek always succeeds.)
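
To make the difference between the two error-path policies concrete,
here is a minimal sketch using a toy buffered stream. It is not musl's
FILE or __stdio_write: the struct, the in-memory "sink" that stands in
for write(2), and every toy_* name are hypothetical, and only the
wbase/wpos field names are borrowed from the line quoted above. The
sink accepts a few bytes and then fails, which models a transient
error such as EINTR after a partial write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy stream, for illustration only (not musl's FILE). */
struct toy_stream {
    unsigned char buf[64];   /* write buffer */
    size_t wbase;            /* start of data not yet written out */
    size_t wpos;             /* end of buffered data */
    int err;                 /* sticky error flag */
    unsigned char sink[256]; /* bytes that "reached the pipe" */
    size_t sink_len;
    size_t room;             /* bytes the sink accepts before failing */
};

/* Stand-in for write(2): takes at most s->room bytes, and fails with
 * -1 once no room is left (modeling a transient error). */
static long toy_sink_write(struct toy_stream *s,
                           const unsigned char *p, size_t n)
{
    if (s->room == 0) return -1;
    if (n > s->room) n = s->room;
    assert(s->sink_len + n <= sizeof s->sink);
    memcpy(s->sink + s->sink_len, p, n);
    s->sink_len += n;
    s->room -= n;
    return (long)n;
}

/* Append to the buffer without flushing; returns bytes accepted. */
static size_t toy_put(struct toy_stream *s, const void *p, size_t n)
{
    size_t space = sizeof s->buf - s->wpos;
    if (n > space) n = space;
    memcpy(s->buf + s->wpos, p, n);
    s->wpos += n;
    return n;
}

/* Flush the buffer. keep_on_error selects the error-path policy:
 * 0 discards the pending bytes, 1 keeps them for a later retry. */
static int toy_flush(struct toy_stream *s, int keep_on_error)
{
    while (s->wpos > s->wbase) {
        long k = toy_sink_write(s, s->buf + s->wbase,
                                s->wpos - s->wbase);
        if (k < 0) {
            s->err = 1;
            if (!keep_on_error)
                s->wbase = s->wpos = 0; /* pending bytes are lost */
            return -1;
        }
        s->wbase += (size_t)k;          /* track what was written */
    }
    s->wbase = s->wpos = 0;             /* buffer fully drained */
    return 0;
}

/* Buffer 10 bytes, let the sink take 4 and then fail, clear the
 * failure condition, retry the flush. Returns bytes delivered. */
static size_t toy_scenario(int keep_on_error)
{
    struct toy_stream s = {0};
    s.room = 4;
    toy_put(&s, "0123456789", 10);
    int first = toy_flush(&s, keep_on_error);
    assert(first == -1);
    s.room = 100;
    toy_flush(&s, keep_on_error);
    return s.sink_len;
}
```

Calling toy_scenario(0) models the discard policy and delivers only
the bytes written before the failure; toy_scenario(1) models keeping
the buffer, so the retry delivers the rest. The same sketch also shows
why callers matter: with the keep policy, the buffer is still
non-empty after a failed flush, so any caller that assumed an empty
buffer at that point would need to be revisited.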

I don't want to make these changes right now at the end of a release
cycle but I might promote them to important targets for the next
cycle.

Rich