From: bakul@bitblocks.com (Bakul Shah)
Subject: [TUHS] signals and blocked in I/O
Date: Fri, 01 Dec 2017 17:40:32 -0800
Message-ID: <20171202014047.E5E0C156E523@mail.bitblocks.com>
In-Reply-To: Your message of "Fri, 01 Dec 2017 16:48:50 -0800." <20171202004850.GB24335@mcvoy.com>
On Fri, 01 Dec 2017 16:48:50 -0800 Larry McVoy <lm at mcvoy.com> wrote:
Larry McVoy writes:
> On Fri, Dec 01, 2017 at 03:42:15PM -0800, Bakul Shah wrote:
> > On Fri, 01 Dec 2017 15:09:34 -0800 Larry McVoy <lm at mcvoy.com> wrote:
> > Larry McVoy writes:
> > > On Fri, Dec 01, 2017 at 11:03:02PM +0000, Ralph Corderoy wrote:
> > > > Hi Larry,
> > > >
> > > > > > So OOM code kills a (random) process in hopes of freeing up some
> > > > > > pages but if this process is stuck in disk I/O, nothing can be freed
> > > > > > and everything grinds to a halt.
> > > > >
> > > > > Yep, exactly.
> > > >
> > > > Is that because the pages have been dirty for so long they've reached
> > > > the VM-writeback timeout even though there's no pressure to use them for
> > > > something else? Or has that been lengthened because you don't fear
> > > > power loss wiping volatile RAM?
> > >
> > > I'm tinkering with the pageout daemon so I'm trying to apply memory
> > > pressure. I have 10 25GB processes (25GB malloced) and the processes just
> > > walk the memory over and over. This is on a 256GB main memory machine
> > > (2 socket haswell, 28 cpus, 28 1TB SSDs, on loan from Netflix).
> >
> > How many times do processes walk their memory before this condition
> > occurs?
>
> Until free memory goes to ~0. That's the point, I'm trying to
> improve things when there is too much pressure on memory.
You said 10x25GB (i.e. 250GB) but you have 256GB. So there should
still be ~6GB left...
>
> > So what may be happening is that a process references a page,
> > it page faults, the kernel finds its phys page has been paged
> > out, so it looks for a free page and once a free page is
> > found, the process will block on page in. Or if there is no
> > free page, it has to wait until some other dirty page is paged
> > out (but this would be a different wait queue). As more and
> > more processes do this, the system runs out of all free pages.
>
> Yeah.
>
> > Can you find out how many processes are waiting under what
> > conditions, how long they wait and how these queue lengths are
> > changing over time?
>
> So I have 10 processes, they all run until the system starts to
> thrash, then they are all in wait mode for memory but there isn't
> any (and there is no swap configured).
>
> The fundamental problem is that they are sleeping waiting for memory to
> be freed. They are NOT in I/O mode, there is no DMA happening, this is
> main memory, it is not backed by swap, there is no swap. So they are
> sleeping waiting for the pageout daemon to free some memory. It's not
> going to free their memory because there is no place to stash (no swap).
> So it's trying to free other memory.
This confuses me. Before I make more false assumptions,
can you show the code?
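For what it's worth, here is roughly what I picture each of your 10
processes doing -- a guess on my part, not your actual code; the size
constant, page size and touch pattern are all assumptions:

/*
 * Guessed sketch of one test process: malloc 25GB and keep touching
 * every page so it all stays resident and dirty.  Assumes LP64.
 */
#include <stdio.h>
#include <stdlib.h>

#define SZ   (25UL * 1024 * 1024 * 1024)    /* 25 GB */
#define PAGE 4096UL

int
main(void)
{
    volatile char *p = malloc(SZ);
    size_t i;

    if (p == NULL) {
        perror("malloc");
        return 1;
    }
    for (;;)                                /* walk it over and over */
        for (i = 0; i < SZ; i += PAGE)
            p[i]++;                         /* touch and dirty every page */
}

If it looks anything like that, every page ends up anonymous and dirty,
so with no swap the pageout daemon has nowhere to put any of it.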
> The real question is where did they go to sleep and why did they sleep
> without PCATCH on? If I can find that place where they are trying to
> alloc a page and failed and they go to sleep there, I could either
Can you use kgdb to find out where they sleep?
> a) commit seppuku because we are out of memory and I'm part of the problem
> b) go into a sleep / wakeup / check signals loop
>
> I am reminded by you all that we ask the process to do it to itself but
> there does seem to be a way to sleep and respect signals, the tty stuff
> does that. So if I can find this place, determine that I'm just asking
> for memory, not I/O, and sleep with PCATCH on then I might be golden.
Won't kgdb tell you? Or you can insert printfs. You could also have
your program print something as it walks its pages, to see whether the
hang happens right when all the physical pages are used up.
The tty code probably assumes a memory shortfall is a short-term issue
(which it usually is). That's not the case with your test (unless I
misguessed).
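If you do find the spot, I'd expect your option (b) to look roughly
like the pattern below. This is only a sketch: enough_free_pages(),
vm_free_wanted and the "memwait" wmesg are made-up placeholders, not
real kernel symbols. The point is sleeping with PCATCH and checking
tsleep()'s return value so a signal can pull the process out of the
wait:

    int error;

    while (!enough_free_pages()) {
        error = tsleep(&vm_free_wanted, PVM | PCATCH, "memwait", hz);
        if (error == EINTR || error == ERESTART)
            return (error);   /* signal caught: unwind, let the kill land */
        /* 0 (woken) or EWOULDBLOCK (timeout): re-check and maybe sleep again */
    }

Then the OOM thread (or you, by hand) could just send the chosen
process a signal and it would come out of the sleep and die instead of
sitting there uninterruptibly.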
> Where "golden" means I can kill the process and the OOM thread could do
> it for me.
>
> Thoughts?
Now I am starting to think this happens as soon as all the
phys pages are used up. But looking at your program would
help.
[removed cc: TUHS]