From mboxrd@z Thu Jan  1 00:00:00 1970
From: bakul@bitblocks.com (Bakul Shah)
Date: Mon, 04 Dec 2017 18:12:12 -0800
Subject: [TUHS] signals and blocked in I/O
In-Reply-To: Your message of "Mon, 04 Dec 2017 10:19:19 -0700."
 <CANCZdfoRk+R7xpYnFGk3O8sy-DUnykOZjxnTL1vVguaBpAVDBg@mail.gmail.com>
References: <CAEoi9W5mFaQjuGAqSQZSusuyj2uyUX4PvsKVq8bOuM5219E1tQ@mail.gmail.com>
 <CAC20D2NcnW1YHdd5+3Y8yQqghGeLMvy8urDV7R5GevcLpizLew@mail.gmail.com>
 <20171201161810.GM3924@mcvoy.com>
 <CANCZdfpYWbLdkx4fNURFnLM5VGt+qDcnnMyHQ+M4iwAziS2a9A@mail.gmail.com>
 <20171201172603.GO3924@mcvoy.com>
 <BADECBD9-D2B4-4788-8308-87EBF93D555A@bitblocks.com>
 <20171201223859.GX3924@mcvoy.com>
 <20171201230302.0DC351FA41@orac.inputplus.co.uk>
 <20171201230934.GA24335@mcvoy.com>
 <20171201234230.F33D4156E523@mail.bitblocks.com>
 <20171202004850.GB24335@mcvoy.com>
 <201712041636.vB4GaFEp017803@freefriends.org>
 <CANCZdfoRk+R7xpYnFGk3O8sy-DUnykOZjxnTL1vVguaBpAVDBg@mail.gmail.com>
Message-ID: <20171205021228.189EA156E523@mail.bitblocks.com>

On Mon, 04 Dec 2017 10:19:19 -0700 Warner Losh <imp at bsdimp.com> wrote:
Warner Losh writes:
> 
> On Mon, Dec 4, 2017 at 9:36 AM, <arnold at skeeve.com> wrote:
> 
> > Larry McVoy <lm at mcvoy.com> wrote:
> >
> > > So I have 10 processes, they all run until the system starts to
> > > thrash, then they are all in wait mode for memory but there isn't
> > > any (and there is no swap configured).
> >
> > Um, pardon me for asking the obvious question, but why not just configure
> > a few gigs of swap to give the OS some breathing room?
> >
> > Most modern systems let you use a regular old file in the filesystem
> > for swap space, instead of having to repartition your disk and using a
> > dedicated partition. I'd be suprised if your *BSD box didn't let you do
> > that too.  It's a little slower, but a gazillion times more convenient.
> 
> The deployed systems in the field have swap space configured. But if we're
> not careful we can still run out. Larry's tests are at the extreme limits,
> to be sure, and not having swap exacerbates the bad behavior at the limit.

Conceptually they are somewhat different cases.

1) In the no-swap case *no* pages will be available once all
   the phys pages are used up. To recover, the kernel must
   free up space by killing some process.

2) In the swap case pages will eventually be available. The
   paging rate becomes the bottleneck but no need to kill
   any process.

In case 1) there may be other disk traffic to confuse things.
But eventually that should die down. However I am not sure
FreeBSD has a clean way to indicate there are no free pages.
It calls OOM logic as a last resort. In case 2) OOM should
never be called.

I think the issue is that the kernel is unable to kill a
process that is waiting for a free page. Since all of them are
doing the same thing, the system hangs. I think this wait has
to be made interruptable to be able to kill a process. Of
course that may break something else....

May be better to avoid the whole issue in the first place for
a service latency sensitive production machine.  By providing
a way for a process to ask that there is enough backing store
for any mmaped pages (and thus avoid swapping or running out
of memory). Not sure if originally BSD did provide such a
guarantee...

> If he's trying to ameliorate bad effects, doing it on worst case scenario
> certainly helps. The BSD box allows for it, and it's not that much slower,
> but it kinda misses the point of the exercise.

> Also, the systems in questions often operate at the limits of the disk
> subsystem, so additional write traffic is undesirable. They are also SSDs,
> so doubly undesirable where avoidable.

This can be tested by writing some custom logic: instead of
writing out a page, compress it and stash in memory (or
ramdisk). The test can write highly compressable patterns.