[TUHS] signals and blocked in I/O

The Unix Heritage Society mailing list

From: bakul@bitblocks.com (Bakul Shah)
Subject: [TUHS] signals and blocked in I/O
Date: Fri, 01 Dec 2017 15:42:15 -0800
Message-ID: <20171201234230.F33D4156E523@mail.bitblocks.com>
In-Reply-To: Your message of "Fri, 01 Dec 2017 15:09:34 -0800." <20171201230934.GA24335@mcvoy.com>

On Fri, 01 Dec 2017 15:09:34 -0800 Larry McVoy <lm at mcvoy.com> wrote:
Larry McVoy writes:
> On Fri, Dec 01, 2017 at 11:03:02PM +0000, Ralph Corderoy wrote:
> > Hi Larry,
> > 
> > > > So OOM code kills a (random) process in hopes of freeing up some
> > > > pages but if this process is stuck in diskIO, nothing can be freed
> > > > and everything grinds to a halt.
> > >
> > > Yep, exactly.
> > 
> > Is that because the pages have been dirty for so long they've reached
> > the VM-writeback timeout even though there's no pressure to use them for
> > something else?  Or has that been lengthened because you don't fear
> > power loss wiping volatile RAM?
> 
> I'm tinkering with the pageout daemon so I'm trying to apply memory
> pressure.  I have 10 25GB processes (25GB malloced) and the processes just
> walk the memory over and over.  This is on a 256GB main memory machine
> (2 socket haswell, 28 cpus, 28 1TB SSDs, on loan from Netflix).

How many times do processes walk their memory before this condition
occurs? 

So what may be happening is that a process references a page,
it page faults, the kernel finds its phys page has been paged
out, so it looks for a free page and once a free page is
found, the process will block on page in. Or if there is no
free page, it has to wait until some other dirty page is paged
out (but this would be a different wait queue).  As more and
more processes do this, the system runs out of all free pages.

Can you find out how many processes are waiting under what
conditions, how long they wait, and how these queue lengths are
changing over time?  You can use a ring buffer to capture the last
2^N measurements and dump them in the debugger when everything
grinds to a halt.
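
A minimal sketch of such a ring buffer, in user-space C rather than
kernel code.  The struct fields (a timestamp and two queue lengths),
the names, and the choice of N = 4 are all assumptions made for
illustration; a real kernel version would also need whatever locking
or per-CPU arrangement the sampling site requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizing: keep the last 2^RING_BITS samples. */
#define RING_BITS 4
#define RING_SIZE (1u << RING_BITS)
#define RING_MASK (RING_SIZE - 1u)

/* Hypothetical measurement record; the fields are illustrative only. */
struct sample {
    uint64_t when;              /* timestamp or tick count */
    uint32_t pagein_waiters;    /* processes blocked on page-in */
    uint32_t freepage_waiters;  /* processes waiting for a free page */
};

struct ring {
    struct sample slot[RING_SIZE];
    uint64_t next;              /* total number of samples ever recorded */
};

/* Record one sample, overwriting the oldest once the ring is full. */
static void ring_record(struct ring *r, struct sample s)
{
    r->slot[r->next & RING_MASK] = s;
    r->next++;
}

/* Number of valid samples currently held (at most RING_SIZE). */
static uint64_t ring_count(const struct ring *r)
{
    return r->next < RING_SIZE ? r->next : RING_SIZE;
}

/* Copy the held samples into out[], oldest first; returns the count.
 * out must have room for RING_SIZE entries. */
static uint64_t ring_snapshot(const struct ring *r, struct sample *out)
{
    assert(out != NULL);
    uint64_t n = ring_count(r);
    uint64_t first = r->next - n;   /* sequence number of oldest sample */
    for (uint64_t i = 0; i < n; i++)
        out[i] = r->slot[(first + i) & RING_MASK];
    return n;
}
```

Because the size is a power of two, the index wraps with a mask
instead of a modulo, and the monotonically increasing `next` counter
tells you both where to write and how many samples have been lost.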

> It's the old "10 pounds of shit in a 5 pound bag" problem, same old stuff,
> just a bigger bag.
> 
> The problem is that OOM can't kill the processes that are the problem,
> they are stuck in disk wait.  That's why I started asking why can't you
> kill a process that's in the middle of I/O.

The OS equivalent of RED (random early drop) would be if a
process kills itself, e.g. when some critical metric crosses a
high-water mark.
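
One way to picture that decision, as a rough sketch only.  The linear
ramp between a low and a high mark is borrowed from network RED and is
an assumption here, as are the function names, the metric, and the
thresholds; the random draw u is passed in by the caller so the logic
stays deterministic and testable.

```c
#include <assert.h>
#include <stdbool.h>

/* Probability in [0,1] that a process should give up, given a
 * hypothetical pressure metric and two hypothetical thresholds.
 * Below low_mark: never.  At or above high_mark: always.
 * In between: rises linearly, as in network RED. */
static double red_exit_probability(double metric, double low_mark,
                                   double high_mark)
{
    assert(low_mark < high_mark);
    if (metric <= low_mark)
        return 0.0;
    if (metric >= high_mark)
        return 1.0;
    return (metric - low_mark) / (high_mark - low_mark);
}

/* u is a uniform random draw in [0,1) supplied by the caller. */
static bool red_should_exit(double metric, double low_mark,
                            double high_mark, double u)
{
    return u < red_exit_probability(metric, low_mark, high_mark);
}
```

A process would sample the metric periodically, draw u (for example
from rand() / (RAND_MAX + 1.0)), and call _exit() when
red_should_exit() returns true.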

Another option would be to return with an EFAULT, and the
process can either kill itself or free up the page or
something. [I have used EFAULT to dynamically allocate *more*
pages, but there is no reason why the same can't be used to free up
memory!]
