From: Vasiliy Kulikov
Newsgroups: gmane.linux.lib.musl.general
Subject: holywar: malloc() vs. OOM
Date: Sun, 24 Jul 2011 14:33:25 +0400
Message-ID: <20110724103325.GA24069@albatros>
Reply-To: musl@lists.openwall.com
To: musl

Rich,

This is more a question about your malloc() failure policy for musl than
an actual proposal.

When brk() or mmap() fails, libc usually returns NULL to the program.  If
the program wants to handle OOM gracefully, it may do so.  If not, it will
likely generate SIGSEGV and be killed without any problem.

However, there are potential issues with this behaviour:

1) If the program doesn't handle OOM at all, it can lead to security
   problems.

   a) If the NULL page is not mmap'ed (the case for all non-root apps and
      most root apps), the page starting from vm.mmap_min_addr may still
      be present in the process' vm.  For some distros only one page was
      guarded this way in the past.  So, if the allocation is bigger than
      ~4-64 KB and the write lands past the guarded page, then some bad
      things may happen before SIGSEGV (the worst case is privilege
      escalation).  This is a pathological case; I haven't seen such
      cases myself, but it's possible in theory.

   b) If the NULL page is mmap'ed, the application might not notice OOM
      at all, since the page is mapped and SIGSEGV is not sent.  (Yes,
      apps mmap'ing the NULL page must handle OOM, but see (2).)

2) If the program handles OOM, it might do so very badly.  The OOM
   handling code path is almost never tested and contains bugs.  Even the
   kernel, which obviously must handle OOM, doesn't handle it properly (I
   found bugs in OOM handling code much more often than in other error
   handling code) because this code is not tested.  The D-Bus daemon,
   which is closely connected with init in modern distros, must not fail
   on OOM by design (otherwise init would fail and the whole system would
   hang/reboot), and it took a long time to remove silent bugs in this
   code:

   http://blog.ometer.com/2008/02/04/out-of-memory-handling-d-bus-experience/

In theory, these are bugs in applications, not in libc, and they should
be fully handled in programs, not in libc.  Period.  But looking at the
problem from a pragmatic point of view, we see that libc is actually the
easiest place where the problem can be worked around (not fixed, surely).
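
To make case (1a) concrete, here is a minimal C sketch.  It is not taken
from musl or from any real program; the helper names
null_write_escapes_guard() and set_last_byte() are hypothetical, and the
guard size is passed in as a parameter rather than read from the system.
The first helper only does the arithmetic: a write at a given offset from
a NULL base misses the guarded low addresses once the offset reaches the
size of the guarded region.  The second shows the check that a program
would need before touching the far end of a buffer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: would a write at byte offset `off` from a NULL
 * base land at or above the end of a guard region covering addresses
 * [0, guard_bytes)?  If so, the write is not guaranteed to fault. */
static int null_write_escapes_guard(size_t off, size_t guard_bytes)
{
    return off >= guard_bytes;
}

/* Hypothetical helper: allocate `n` bytes and set the last one, checking
 * the malloc() result first.  Returns 0 on success, -1 if n == 0 or the
 * allocation failed.  Without the NULL check, p[n - 1] would address
 * (char *)0 + n - 1, which is the situation described in (1a). */
static int set_last_byte(size_t n, unsigned char value)
{
    unsigned char *p;

    if (n == 0)
        return -1;

    p = malloc(n);
    if (p == NULL)
        return -1;      /* report OOM instead of writing to NULL + offset */

    p[n - 1] = value;
    free(p);
    return 0;
}
```

With a single guarded 4096-byte page, an offset of 100 stays inside the
guard, while an offset of 65536 is past it.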

The workaround would be simply raising SIGKILL if malloc() fails (either
because of brk() or mmap()).  For the rare programs craving to handle OOM,
such code should be used:

    #define _OOM_MAY_FAIL_
    #include

Then the workaround is disabled.

Probably I overestimate the importance of OOM errors, and (1) in
particular.  However, I think it is worth discussing.

Thanks,

--
Vasiliy
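
A rough sketch of what the proposed policy could look like when
approximated at the application level rather than inside libc.  This is
an illustration under stated assumptions, not how musl behaves: the names
malloc_or_kill() and app_malloc() are hypothetical, and the opt-out is
modelled as a plain preprocessor switch on the _OOM_MAY_FAIL_ macro from
the message above.

```c
#define _POSIX_C_SOURCE 200809L  /* for SIGKILL under strict ISO C modes */
#include <signal.h>
#include <stdlib.h>

/* Hypothetical wrapper: terminate the process with SIGKILL when a
 * non-zero-size allocation fails.  malloc(0) may legitimately return
 * NULL, so that case is not treated as OOM. */
void *malloc_or_kill(size_t n)
{
    void *p = malloc(n);

    if (p == NULL && n != 0) {
        raise(SIGKILL);  /* SIGKILL cannot be caught, blocked or ignored */
        abort();         /* fallback in case raise() itself failed */
    }
    return p;
}

/* Hypothetical opt-out mirroring the proposal: a program that defines
 * _OOM_MAY_FAIL_ gets plain malloc() and must check for NULL itself. */
#ifdef _OOM_MAY_FAIL_
#define app_malloc(n) malloc(n)
#else
#define app_malloc(n) malloc_or_kill(n)
#endif
```

A caller would then write `buf = app_malloc(len);` and, unless
_OOM_MAY_FAIL_ is defined, could assume that a non-zero-size request
never returns NULL.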