From mboxrd@z Thu Jan 1 00:00:00 1970
From: tytso@mit.edu (Theodore Ts'o)
Date: Sat, 2 Dec 2017 09:59:52 -0500
Subject: [TUHS] signals and blocked in I/O
In-Reply-To: <20171201230934.GA24335@mcvoy.com>
References: <20171201154448.GL3924@mcvoy.com>
 <20171201161810.GM3924@mcvoy.com>
 <20171201172603.GO3924@mcvoy.com>
 <20171201223859.GX3924@mcvoy.com>
 <20171201230302.0DC351FA41@orac.inputplus.co.uk>
 <20171201230934.GA24335@mcvoy.com>
Message-ID: <20171202145952.m4v76unuz5vbw4e5@thunk.org>

On Fri, Dec 01, 2017 at 03:09:34PM -0800, Larry McVoy wrote:
>
> It's the old "10 pounds of shit in a 5 pound bag" problem, same old stuff,
> just a bigger bag.
>
> The problem is that OOM can't kill the processes that are the problem,
> they are stuck in disk wait. That's why I started asking why can't you
> kill a process that's in the middle of I/O.

You may need to solve the problem much earlier, by write throttling
the process which is generating so many dirty pages in the first
place. At one point Linux would press-gang the badly behaved process
which was generating lots of dirty pages into helping to deactivate
and clean pages; it doesn't do this any more, but stopping processes
which are being badly behaved until the writeback daemons can catch up
is certainly kinder than OOM-killing the bad process.

Are you using ZFS? It does have a write throttling knob, apparently.

- Ted