9fans - fans of the OS Plan 9 from Bell Labs
From: "Russ Cox" <rsc@swtch.com>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] kernel memory allocator got confused?
Date: Thu, 22 Nov 2007 11:14:58 -0500
Message-ID: <20071122161448.F3C991E8C22@holo.morphisms.net>
In-Reply-To: <d81908eb510fb7c7281f68d2486a7e9b@quintile.net>

steve simon:
> aux/cifs: 146 long share names too long for RAP (>13 chars)
> 23795 cda: checked 8223 page table entries
> 23849 mk: checked 53 page table entries
> 23851 mk: checked 50 page table entries
> 23853 mk: checked 53 page table entries
> 6308 rc: checked 48 page table entries
> mem user overflow
> pool sbrkmem block 435a70
> be 8d d8 ef
>
> hdr 0a110c09 0014b220 0000c847 0000c847 002e002e 002e002e
> tail 00000000 00000000 00000000 00000000 00000000 00000000 | efd88dbe 0014b220
> user data 00 00 00 00  00 00 00 00 | fe d1 f0 fa  00 00 00 00
> panic: pool panic
> 47 rio: checked 1190 page table entries
> rio 47: suicide: sys: trap: fault read addr=0x0 pc=0x00028fe0
> init: rc exit status: rio 47: sys: trap: fault read addr=0x0 pc=0x00028fe0
> 23857 maild: checked 45 page table entries
> rc: note: sys: trap: fault write addr=0xfffffff9 pc=0x0000d6a8
> maild 23857: suicide: sys: trap: fault write addr=0xfffffff9 pc=0x0000d6a8
> 
> init: starting /bin/rc
> larch% larch% sat: '/bin/sat' file does not exist
> larch% i: '/bin/i' file does not exist
> larch% i: '/bin/i' file does not exist
> assert failed: (*t)->magic == FREE_MAGIC
> 23868 rc: checked 48 page table entries
> rc: note: sys: trap: fault read addr=0x0 pc=0x0000fbcc
> rc 23868: suicide: sys: trap: fault read addr=0x0 pc=0x0000fbcc
> larch% i: '/bin/i' file does not exist
> assert failed: (*t)->magic == FREE_MAGIC
> 23870 rc: checked 48 page table entries
> rc: note: sys: trap: fault read addr=0x0 pc=0x0000fbcc
> rc 23870: suicide: sys: trap: fault read addr=0x0 pc=0x0000fbcc
> 23863 rc: checked 48 page table entries
> rc: note: sys: trap: fault read addr=0x657669 pc=0x00011301
> rc 23863: suicide: sys: trap: fault read addr=0x657669 pc=0x00011301
> init: rc exit status: rc 23863: sys: trap: fault read addr=0x657669 pc=0x00011301

there aren't many hard numbers above.

the original malloc panic in rio is interesting, though.  the block in question
is 1.3MB, with 36 kB of extra unused space beyond what was asked for.
so you'd have to write far beyond the end to really cause significant
corruption.  also, it's a rune buffer (002e 002e 002e 002e is ....)
allocated at /sys/src/cmd/rio/wind.c:1639.  the header and tail are
intact, and the end of user data, marked with the |, has not been
reached.  the text hasn't gotten that far (it's all zeros to the left of the |).
the bytes to the right of the | are supposed to be fe f1 f0 fa (fe fi fo fum)
but the f1 has turned into a d1 - it lost its 0x20 bit.
since rio isn't the kind of program that goes around flipping bits in memory, 
i wonder if your memory is on the fritz.  
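
as a rough check on the sizes quoted above, here is a small sketch that
decodes them from the dump. it assumes (this is an assumption about the
layout, not something stated in the dump itself) that the second "hdr" word
is the block size in bytes and that the two middle bytes of the tail word
"be 8d d8 ef" hold the unused slack, high byte first. the names HdrSize,
tailbytes and slack are made up for the illustration.

```c
#include <stdint.h>

/* second word of the "hdr" line, repeated as the last word of "tail" */
enum {
	HdrSize = 0x0014b220,
};

/* the tail word efd88dbe as it sits in memory: be 8d d8 ef */
static const unsigned char tailbytes[4] = { 0xbe, 0x8d, 0xd8, 0xef };

/*
 * ASSUMPTION: bytes 1 and 2 of the tail record the unused space
 * in the block, most significant byte first.
 */
static uint32_t
slack(const unsigned char *t)
{
	return ((uint32_t)t[1] << 8) | t[2];
}
```

under those assumptions HdrSize is 1356320 bytes (about 1.3MB) and
slack(tailbytes) is 36312 bytes (about 36 kB), which agrees with the
figures in the paragraph above.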

it would be more useful if the assert said what (*t)->magic was (my fault).
i wonder if it too was something with just a bit wrong, indicating that
the cached rc text image in the kernel had lost a bit too.
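
the "just a bit wrong" test is easy to mechanize. the sketch below counts
the bits that differ between an expected and an observed value; the helper
names bitsdiffer and onebitoff are hypothetical. the same comparison could
be made between FREE_MAGIC and the observed (*t)->magic, if the assert
printed it.

```c
#include <stdint.h>

/* number of bit positions in which want and got differ */
static int
bitsdiffer(uint32_t want, uint32_t got)
{
	uint32_t x = want ^ got;
	int n = 0;

	while(x != 0){
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* nonzero when got is want with exactly one bit flipped */
static int
onebitoff(uint32_t want, uint32_t got)
{
	return bitsdiffer(want, got) == 1;
}
```

for the guard byte in the rio dump, 0xf1 ^ 0xd1 is 0x20, so
onebitoff(0xf1, 0xd1) is true: a single flipped bit.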

obviously it's possible that the kernel screwed up in some mysterious way.
but in this instance, i'm more inclined to suspect memory problems.

erik quanstrom:
> the problem appears to be detected here:
> 
> /sys/src/libc/port/pool.c:833: 					printblock(p, b, "mem user overflow");
> 
> so either your memory has gone wonky or the kernel is writing past the
> end of the buffer.  
> 
> this must be in the kernel where the overflow is occurring.

no.  those are all user programs dying, including the very first dump.
malloc prints "panic: pool panic" even when running in user programs.
if the kernel had panicked, it would have stopped running (this isn't linux!).

the message says it was the block at 435a70, which is a user address.

POOL_TOLERANCE would not have printed "panic: pool panic".
it would have just done the "mem user overflow" block dump and
continued.

POOL_TOLERANCE only tolerates overflowing the block by a
single extra zero byte, to diagnose the common mistake of

	x = malloc(strlen(s));
	strcpy(x, s);

but be able to continue executing.  it would have tolerated:

	# user data 00 00 00 00  00 00 00 00 | 00 f1 f0 fa  00 00 00 00

but that's not what happened.
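
to make the distinction concrete, here is a minimal sketch of the
classification described above. it is not the pool.c code: checkguard,
the enum names and the two sample arrays are invented for the example,
and the only facts taken from the message are the guard bytes fe f1 f0 fa
and the rule that a single extra zero byte is tolerated.

```c
#include <string.h>

/* bytes expected immediately after the user data */
static const unsigned char guard[4] = { 0xfe, 0xf1, 0xf0, 0xfa };

/* what the rio dump actually showed after the | */
static const unsigned char seen[4] = { 0xfe, 0xd1, 0xf0, 0xfa };

/* what a one-byte strcpy overrun would have left */
static const unsigned char onezero[4] = { 0x00, 0xf1, 0xf0, 0xfa };

enum { Intact, Tolerated, Corrupt };

/* hypothetical helper: classify the four bytes after the user data */
static int
checkguard(const unsigned char *after)
{
	if(memcmp(after, guard, sizeof guard) == 0)
		return Intact;
	/* a lone NUL over the first guard byte, the rest untouched */
	if(after[0] == 0x00 && memcmp(after+1, guard+1, sizeof guard - 1) == 0)
		return Tolerated;
	return Corrupt;
}
```

checkguard(onezero) returns Tolerated, while checkguard(seen) returns
Corrupt, matching the panic in the dump. the fix for the mistake itself
is to allocate strlen(s)+1 bytes so the terminating zero fits.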

russ



Thread overview: 6+ messages
2007-11-21 13:02 Steve Simon
2007-11-21 13:22 ` erik quanstrom
2007-11-22 16:14 ` Russ Cox [this message]
2007-11-22 17:20   ` erik quanstrom
2007-11-22 23:04   ` Steve Simon
2007-11-22 23:09     ` Uriel
