9front - general discussion about 9front
 help / color / mirror / Atom feed
* Kernel memory leak
@ 2020-08-21 15:40 Alex Musolino
  2020-08-23  1:20 ` [9front] " cinap_lenrek
  2020-08-23  3:07 ` cinap_lenrek
  0 siblings, 2 replies; 5+ messages in thread
From: Alex Musolino @ 2020-08-21 15:40 UTC (permalink / raw)
  To: 9front

Hi all,

As has been mentioned on #cat-v, some of us have known for a while now
that there is a memory leak in the 9front kernel.  Each month or so my
vultr VPS needs a reboot because it has run out of kernel memory.
Usually I reboot it just before this happens to minimise downtime.
Anyway, this week I finally tracked it down and now have a fix.

I noticed some time ago that there were small step changes in the
amount of kernel memory being used every minute, pretty well exactly.
Eventually I realised that there was a cron job running every minute;
disabling the cron job stopped the leak.

After some investigation I found that the following sequence is enough
to trigger the bug:

	@{rfork n; mount -a '#s/boot' /mnt/root; bind /mnt/root /}

This is more or less what cron(8) does when it runs a job in a new
namespace (see the first 2 line of /lib/namespace).  Each invocation
will leak one Mhead object, one Chan object, and some Path objects.
You can verify this yourself with kmem(1).

Here's what happens.  The initial mount triggers a special case in
namec which causes an Mhead to be allocated for the old Chan object
along with a Mount referencing the same Chan:

	if(m == nil){
		/*
		 *  nothing mounted here yet.  create a mount
		 *  head and add to the hash table.
		 */
		m = newmhead(old);
		*l = m;

		/*
		 *  if this is a union mount, add the old
		 *  node to the mount chain.
		 */
		if(order != MREPL)
			m->mount = newmount(old, 0, nil);
	}

Both newmhead and newmount bump the refcount of the old Chan.  This is
fine until the subsequent bind(2) where the original Chan is updated to
point back to the Mhead (see the Abind case in namec) and the refcount
of the Mhead object is incremented.  Now we have cycle and neither the
Mhead nor the Chan can or will be freed.

My fix (below) changes findmount to allocate a new Chan object if it
would otherwise create a cycle.  I've done only light testing but the
leak is gone and there don't seem to be any other ill effects.

Thoughts?

diff -r ec3da4e6c943 sys/src/9/port/chan.c
--- a/sys/src/9/port/chan.c	Fri Aug 21 10:06:22 2020 +0930
+++ b/sys/src/9/port/chan.c	Sat Aug 22 00:46:08 2020 +0930
@@ -864,6 +864,8 @@
 			}
 			if(*cp != nil)
 				cclose(*cp);
+			if(to == m->from)
+				to = cunique(to);
 			*cp = to;
 			return 1;
 		}

--
Cheers,
Alex Musolino


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-08-23  3:16 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-21 15:40 Kernel memory leak Alex Musolino
2020-08-23  1:20 ` [9front] " cinap_lenrek
2020-08-23  2:36   ` Alex Musolino
2020-08-23  3:07 ` cinap_lenrek
2020-08-23  3:16   ` Alex Musolino

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).