From: bakul@bitblocks.com (Bakul Shah)
Subject: [TUHS] signals and blocked in I/O
Date: Fri, 01 Dec 2017 17:40:32 -0800
Message-ID: <20171202014047.E5E0C156E523@mail.bitblocks.com>
In-Reply-To: Your message of "Fri, 01 Dec 2017 16:48:50 -0800." <20171202004850.GB24335@mcvoy.com>
On Fri, 01 Dec 2017 16:48:50 -0800 Larry McVoy <lm at mcvoy.com> wrote:
Larry McVoy writes:
> On Fri, Dec 01, 2017 at 03:42:15PM -0800, Bakul Shah wrote:
> > On Fri, 01 Dec 2017 15:09:34 -0800 Larry McVoy <lm at mcvoy.com> wrote:
> > Larry McVoy writes:
> > > On Fri, Dec 01, 2017 at 11:03:02PM +0000, Ralph Corderoy wrote:
> > > > Hi Larry,
> > > >
> > > > > > So OOM code kills a (random) process in hopes of freeing up some
> > > > > > pages but if this process is stuck in disk I/O, nothing can be freed
> > > > > > and everything grinds to a halt.
> > > > >
> > > > > Yep, exactly.
> > > >
> > > > Is that because the pages have been dirty for so long they've reached
> > > > the VM-writeback timeout even though there's no pressure to use them for
> > > > something else? Or has that been lengthened because you don't fear
> > > > power loss wiping volatile RAM?
> > >
> > > I'm tinkering with the pageout daemon so I'm trying to apply memory
> > > pressure. I have 10 25GB processes (25GB malloced) and the processes just
> > > walk the memory over and over. This is on a 256GB main memory machine
> > > (2 socket haswell, 28 cpus, 28 1TB SSDs, on loan from Netflix).
> >
> > How many times do processes walk their memory before this condition
> > occurs?
>
> Until free memory goes to ~0. That's the point, I'm trying to
> improve things when there is too much pressure on memory.
You said 10x25GB (i.e. 250GB) but you have 256GB. So there should
still be ~6GB left...
>
> > So what may be happening is that a process references a page,
> > it page faults, the kernel finds its phys page has been paged
> > out, so it looks for a free page and once a free page is
> > found, the process will block on page in. Or if there is no
> > free page, it has to wait until some other dirty page is paged
> > out (but this would be a different wait queue). As more and
> > more processes do this, the system runs out of all free pages.
>
> Yeah.
>
> > Can you find out how many processes are waiting under what
> > conditions, how long they wait and how these queue lengths are
> > changing over time?
>
> So I have 10 processes, they all run until the system starts to
> thrash, then they are all in wait mode for memory but there isn't
> any (and there is no swap configured).
>
> The fundamental problem is that they are sleeping waiting for memory to
> be freed. They are NOT in I/O mode, there is no DMA happening, this is
> main memory, it is not backed by swap, there is no swap. So they are
> sleeping waiting for the pageout daemon to free some memory. It's not
> going to free their memory because there is no place to stash (no swap).
> So it's trying to free other memory.
This confuses me. Before I make more false assumptions,
can you show the code?
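For what it's worth, here is roughly what I picture each of your 10
processes doing -- a guess on my part, not your actual code; the size
constant, page size and touch pattern are all assumptions:

/*
 * Guessed sketch of one test process: malloc 25GB and keep touching
 * every page so it all stays resident and dirty.  Assumes LP64.
 */
#include <stdio.h>
#include <stdlib.h>

#define SZ   (25UL * 1024 * 1024 * 1024)    /* 25 GB */
#define PAGE 4096UL

int
main(void)
{
    volatile char *p = malloc(SZ);
    size_t i;

    if (p == NULL) {
        perror("malloc");
        return 1;
    }
    for (;;)                                /* walk it over and over */
        for (i = 0; i < SZ; i += PAGE)
            p[i]++;                         /* touch and dirty every page */
}

If it looks anything like that, every page ends up anonymous and dirty,
so with no swap the pageout daemon has nowhere to put any of it.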
> The real question is where did they go to sleep and why did they sleep
> without PCATCH on? If I can find that place where they are trying to
> alloc a page and failed and they go to sleep there, I could either
Can you use kgdb to find out where they sleep?
> a) commit seppuku because we are out of memory and I'm part of the problem
> b) go into a sleep / wakeup / check signals loop
>
> I am reminded by you all that we ask the process to do it to itself but
> there does seem to be a way to sleep and respect signals, the tty stuff
> does that. So if I can find this place, determine that I'm just asking
> for memory, not I/O, and sleep with PCATCH on then I might be golden.
Won't kgdb tell you? Or you can insert printfs. You could also have
your program print something as it walks its pages, to see whether the
hang happens right when all the physical pages are used up.
The tty code probably assumes a memory shortfall is a short-term issue
(which it usually is). That's not the case with your test (unless I
misguessed).
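If you do find the spot, I'd expect your option (b) to look roughly
like the pattern below. This is only a sketch: enough_free_pages(),
vm_free_wanted and the "memwait" wmesg are made-up placeholders, not
real kernel symbols. The point is sleeping with PCATCH and checking
tsleep()'s return value so a signal can pull the process out of the
wait:

    int error;

    while (!enough_free_pages()) {
        error = tsleep(&vm_free_wanted, PVM | PCATCH, "memwait", hz);
        if (error == EINTR || error == ERESTART)
            return (error);   /* signal caught: unwind, let the kill land */
        /* 0 (woken) or EWOULDBLOCK (timeout): re-check and maybe sleep again */
    }

Then the OOM thread (or you, by hand) could just send the chosen
process a signal and it would come out of the sleep and die instead of
sitting there uninterruptibly.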
> Where "golden" means I can kill the process and the OOM thread could do
> it for me.
>
> Thoughts?
Now I am starting to think this happens as soon as all the
phys pages are used up. But looking at your program would
help.
[removed cc: TUHS]