9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] Lock loop in malloc()
@ 2011-07-25 13:58 Lucio De Re
  2011-07-25 14:40 ` erik quanstrom
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Lucio De Re @ 2011-07-25 13:58 UTC
  To: 9fans

While attempting to compile Bison (yeah, still gnawing at that
bone!) I have managed to jam cpp more or less solid.  That's compiling
scan-code-c.c which reduces to compiling scan-code.c.

However, it does not seem to be Bison that's at fault: it seems that an
invocation of malloc() tries to set a lock and never succeeds or gives up.

This is a summary, with some help from acid, subject to some very limited
knowledge on my part:

	term% acid 3208
	/proc/3208/text:386 plan 9 executable
	/sys/lib/acid/port
	/sys/lib/acid/386
	acid: lstk()
	sleep()+0x7 /sys/src/libc/9syscall/sleep.s:5
	lock(lk=0x1f2f8)+0xb7 /sys/src/libc/port/lock.c:25
	plock(p=0x18310)+0x16 /sys/src/libc/port/malloc.c:81
	poolalloc(p=0x18310,n=0x20)+0xf /sys/src/libc/port/pool.c:1223
	malloc(size=0x18)+0x1c /sys/src/libc/port/malloc.c:207
	domalloc(size=0x18)+0xf /sys/src/cmd/cpp/cpp.c:271
	lookup(tp=0xab4a8,install=0x1)+0x74 /sys/src/cmd/cpp/nlist.c:213
	dodefine(trp=0xdfffeeac)+0x40 /sys/src/cmd/cpp/macro.c:23
	control(trp=0xdfffeeac)+0x4b2 /sys/src/cmd/cpp/cpp.c:133
	process(trp=0xdfffeeac)+0xec /sys/src/cmd/cpp/cpp.c:70
	main(argc=0xb,argv=0xdfffef0c)+0x8a /sys/src/cmd/cpp/cpp.c:35
	_main+0x31 /sys/src/libc/386/main9.s:16
	acid: lstk()
	sleep()+0x7 /sys/src/libc/9syscall/sleep.s:5
	lock(lk=0x1f2f8)+0xb7 /sys/src/libc/port/lock.c:25
		i=0x3e8
	plock(p=0x18310)+0x16 /sys/src/libc/port/malloc.c:81
		pv=0x1f2f8
	poolalloc(p=0x18310,n=0x20)+0xf /sys/src/libc/port/pool.c:1223
		v=0x1928
	malloc(size=0x18)+0x1c /sys/src/libc/port/malloc.c:207
		v=0x18310
	domalloc(size=0x18)+0xf /sys/src/cmd/cpp/cpp.c:271
		p=0x0
	lookup(tp=0xab4a8,install=0x1)+0x74 /sys/src/cmd/cpp/nlist.c:213
		h=0x6f
		np=0x1
	dodefine(trp=0xdfffeeac)+0x40 /sys/src/cmd/cpp/macro.c:23
		dots=0x0
		tp=0xab4a8
		np=0x27100
		args=0x3e
		narg=0x204b6
		err=0x6
		atp=0x27140
		def=0x3876
		tap=0x204b6
	control(trp=0xdfffeeac)+0x4b2 /sys/src/cmd/cpp/cpp.c:133
		tp=0xab498
		np=0x27100
	process(trp=0xdfffeeac)+0xec /sys/src/cmd/cpp/cpp.c:70
		anymacros=0x80000020
	main(argc=0xb,argv=0xdfffef0c)+0x8a /sys/src/cmd/cpp/cpp.c:35
		ebuf=0x3a707063
		tr=0xab498
	_main+0x31 /sys/src/libc/386/main9.s:16
	acid:

I've no idea how to track this problem down, let alone fix it.  But this
problem is reproducible, albeit not using a small code base.  It is
mildly possible that my Plan 9 installation is not altogether pristine
and is causing this situation, but I can't think how.

++L




* Re: [9fans] Lock loop in malloc()
  2011-07-25 13:58 [9fans] Lock loop in malloc() Lucio De Re
@ 2011-07-25 14:40 ` erik quanstrom
  2011-07-25 14:42 ` Russ Cox
  2011-07-26  1:38 ` erik quanstrom
  2 siblings, 0 replies; 11+ messages in thread
From: erik quanstrom @ 2011-07-25 14:40 UTC
  To: lucio, 9fans

could you snap(4) this process and mail me/put on sources the
compressed snap?  it's not really possible for this lock to be
held unless cpp has stepped on its lock and the resulting garbage
makes it look like the lock is set.

if you want to try some things yourself, i'm going to run
	; 8c -a /sys/src/cmd/cpp/macro.c > cpp.acid
	; acid -lcpp.acid $pid
	; (Lock)0x1f2f8
	; dump(0x1f2f8, 16, "\X")

to start off with and consider what to do next based on
the results.

- erik




* Re: [9fans] Lock loop in malloc()
  2011-07-25 13:58 [9fans] Lock loop in malloc() Lucio De Re
  2011-07-25 14:40 ` erik quanstrom
@ 2011-07-25 14:42 ` Russ Cox
  2011-07-25 15:17   ` Lucio De Re
  2011-07-26  1:38 ` erik quanstrom
  2 siblings, 1 reply; 11+ messages in thread
From: Russ Cox @ 2011-07-25 14:42 UTC
  To: lucio, Fans of the OS Plan 9 from Bell Labs

> However, it does not seem to be Bison that's at fault: it seems that an
> invocation of malloc() tries to set a lock and never succeeds or gives up.

It's possible that you've found a latent bug in malloc.
However, that malloc has been running along pretty
steadily for a decade at this point, so it wouldn't be
my first guess.  My first guess would be that something
in Bison or in the code you added has corrupted memory,
so that the lock has been overwritten with garbage and
therefore cannot be acquired.

The address passed to lock - 0x1f2f8 in the trace -
should be the address of the symbol sbrkmempriv.
I assume it will be, but check (if not, there's other
memory corruption).  Assuming it is, that's in the bss
so the most likely culprits for corruption are the
symbols near it: run nm | sort and look around.

Another thing to do would be to take the bison code
you are compiling to a Linux box and run it under
valgrind.

Russ



* Re: [9fans] Lock loop in malloc()
  2011-07-25 14:42 ` Russ Cox
@ 2011-07-25 15:17   ` Lucio De Re
  2011-07-25 16:12     ` Russ Cox
  2011-07-25 17:51     ` erik quanstrom
  0 siblings, 2 replies; 11+ messages in thread
From: Lucio De Re @ 2011-07-25 15:17 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Jul 25, 2011 at 10:42:11AM -0400, Russ Cox wrote:
>
> > However, it does not seem to be Bison that's at fault: it seems that an
> > invocation of malloc() tries to set a lock and never succeeds or gives up.
>
> It's possible that you've found a latent bug in malloc.
> However, that malloc has been running along pretty
> steadily for a decade at this point, so it wouldn't be
> my first guess.  My first guess would be that something
> in Bison or in the code you added has corrupted memory,
> so that the lock has been overwritten with garbage and
> therefore cannot be acquired.
>
Well, there has to be a problem.  I agree that malloc() is used too
extensively in Plan 9 to only reveal a fault at this time.  The same may
be said of cpp, but it's more likely that something evil has been lurking
in there.  I really hope that it is not something I have done that causes
the problem, but I really can't see how that would be possible without
cpp's cooperation.

> The address passed to lock - 0x1f2f8 in the trace -
> should be the address of the symbol sbrkmempriv.
> I assume it will be, but check (if not, there's other
> memory corruption).  Assuming it is, that's in the bss
> so the most likely culprits for corruption are the
> symbols near it: run nm | sort and look around.
>
Following Erik's direction, it seems that the lock value is 0x0deadead,
so I will start with the premise that a problem has been detected, but
not fatally.  I'll need to dig into cpp, then.  Are there known limits
in cpp's input sizes?

> Another thing to do would be to take the bison code
> you are compiling to a Linux box and run it under
> valgrind.
>
I have heard good reports regarding valgrind, but it is totally foreign
to me.  I'll resort to that when I have no alternative left.  Thanks for
the advice; please forgive me for not following it immediately.

++L




* Re: [9fans] Lock loop in malloc()
  2011-07-25 15:17   ` Lucio De Re
@ 2011-07-25 16:12     ` Russ Cox
  2011-07-25 17:51     ` erik quanstrom
  1 sibling, 0 replies; 11+ messages in thread
From: Russ Cox @ 2011-07-25 16:12 UTC
  To: lucio, Fans of the OS Plan 9 from Bell Labs

>> The address passed to lock - 0x1f2f8 in the trace -
>> should be the address of the symbol sbrkmempriv.
>> I assume it will be, but check (if not, there's other
>> memory corruption).  Assuming it is, that's in the bss
>> so the most likely culprits for corruption are the
>> symbols near it: run nm | sort and look around.
>>
> Following Erik's direction, it seems that the lock value is 0x0deadead,
> so I will start with the premise that a problem has been detected, but
> not fatally.  I'll need to dig into cpp, then.  Are there known limits
> in cpp's input sizes?

The lock value being 0x0deadead was a near certainty,
since that's what lock - the function - writes when trying
to acquire it.  It probably had the wrong value to begin
with, but that value has been lost.
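To illustrate the point, here is a minimal sketch of a test-and-set lock.  It is not the Plan 9 tas.s or lock.c code: the names SketchLock, sketch_tas, sketch_canlock and sketch_unlock are hypothetical, and C11 atomics stand in for the 386 exchange instruction.  Every attempt stores the sentinel unconditionally, so once a lock word has been overwritten with nonzero garbage, the first failed attempt replaces the garbage with the sentinel and every later attempt fails as well.

```c
#include <assert.h>
#include <stdatomic.h>

/* Sentinel stored by every attempt; the thread reports 0x0deadead on 386. */
enum { LOCKED = 0x0deadead };

/* Hypothetical stand-in for a lock: a single word, zero when free. */
typedef struct {
	atomic_uint val;
} SketchLock;

/* Atomically store the sentinel and return the word's previous value. */
static unsigned sketch_tas(SketchLock *l)
{
	return atomic_exchange(&l->val, LOCKED);
}

/* One acquisition attempt: it succeeds only if the word was zero before. */
static int sketch_canlock(SketchLock *l)
{
	return sketch_tas(l) == 0;
}

/* Release the lock; releasing a free lock is treated as a bug here. */
static void sketch_unlock(SketchLock *l)
{
	assert(atomic_load(&l->val) != 0);
	atomic_store(&l->val, 0);
}
```

With a word that starts out as, say, 5 instead of 0, sketch_canlock fails and leaves the sentinel behind, which is why inspecting the lock afterwards cannot reveal what the corrupting value was.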

I missed that you were hitting a bug in cpp, not bison.
My suggestion about running nm still applies,
and I would take Erik up on his snap offer.

Russ



* Re: [9fans] Lock loop in malloc()
  2011-07-25 15:17   ` Lucio De Re
  2011-07-25 16:12     ` Russ Cox
@ 2011-07-25 17:51     ` erik quanstrom
  2011-07-25 19:31       ` Russ Cox
  1 sibling, 1 reply; 11+ messages in thread
From: erik quanstrom @ 2011-07-25 17:51 UTC
  To: lucio, 9fans

> >
> Following Erik's direction, it seems that the lock value is 0x0deadead,

btw, there appears to be a typo in /sys/src/libc/386/tas.s.  it should be spelled
to match other arches.  for some reason all other arches use 0xdeaddead.
unfortunately, pool also uses 0xdeaddead so this typo is fortunate.

- erik




* Re: [9fans] Lock loop in malloc()
  2011-07-25 17:51     ` erik quanstrom
@ 2011-07-25 19:31       ` Russ Cox
  0 siblings, 0 replies; 11+ messages in thread
From: Russ Cox @ 2011-07-25 19:31 UTC
  To: Fans of the OS Plan 9 from Bell Labs; +Cc: lucio

>> Following Erik's direction, it seems that the lock value is 0x0deadead,
>
> btw, there appears to be a typo in /sys/src/libc/386/tas.s.  it should be spelled
> to match other arches.  for some reason all other arches use 0xdeaddead.
> unfortunately, pool also uses 0xdeaddead so this typo is fortunate.

it still didn't give you any information.
you could have deduced that value from the program counter.



* Re: [9fans] Lock loop in malloc()
  2011-07-25 13:58 [9fans] Lock loop in malloc() Lucio De Re
  2011-07-25 14:40 ` erik quanstrom
  2011-07-25 14:42 ` Russ Cox
@ 2011-07-26  1:38 ` erik quanstrom
  2011-07-26  1:41   ` erik quanstrom
                     ` (2 more replies)
  2 siblings, 3 replies; 11+ messages in thread
From: erik quanstrom @ 2011-07-26  1:38 UTC
  To: lucio, 9fans

well, this was a fun little bug.  i downloaded bison and within a few
minutes i'd narrowed the problem down to lib/c-ctype.h.  and
it only took another minute to isolate this as the problem statement.

#if (' ' == 32) && ('!' == 33) && ('"' == 34) && ('#' == 35) \
    && ('%' == 37) && ('&' == 38) && ('\'' == 39) && ('(' == 40) \
    && (')' == 41) && ('*' == 42) && ('+' == 43) && (',' == 44) \
    && ('-' == 45) && ('.' == 46) && ('/' == 47) && ('0' == 48) \
    && ('1' == 49) && ('2' == 50) && ('3' == 51) && ('4' == 52) \
    && ('5' == 53) && ('6' == 54) && ('7' == 55) && ('8' == 56) \
    && ('9' == 57) && (':' == 58) && (';' == 59) && ('<' == 60) \
    && ('=' == 61) && ('>' == 62) && ('?' == 63) && ('A' == 65) \
    && ('B' == 66) && ('C' == 67) && ('D' == 68) && ('E' == 69) \
    && ('F' == 70) && ('G' == 71) && ('H' == 72) && ('I' == 73) \
    && ('J' == 74) && ('K' == 75) && ('L' == 76) && ('M' == 77) \
    && ('N' == 78) && ('O' == 79) && ('P' == 80) && ('Q' == 81) \
    && ('R' == 82) && ('S' == 83) && ('T' == 84) && ('U' == 85) \
    && ('V' == 86) && ('W' == 87) && ('X' == 88) && ('Y' == 89) \
    && ('Z' == 90) && ('[' == 91) && ('\\' == 92) && (']' == 93) \
    && ('^' == 94) && ('_' == 95) && ('a' == 97) && ('b' == 98) \
    && ('c' == 99) && ('d' == 100) && ('e' == 101) && ('f' == 102) \
    && ('g' == 103) && ('h' == 104) && ('i' == 105) && ('j' == 106) \
    && ('k' == 107) && ('l' == 108) && ('m' == 109) && ('n' == 110) \
    && ('o' == 111) && ('p' == 112) && ('q' == 113) && ('r' == 114) \
    && ('s' == 115) && ('t' == 116) && ('u' == 117) && ('v' == 118) \
    && ('w' == 119) && ('x' == 120) && ('y' == 121) && ('z' == 122) \
    && ('{' == 123) && ('|' == 124) && ('}' == 125) && ('~' == 126)
/* The character set is ASCII or one of its variants or extensions, not EBCDIC.
   Testing the value of '\n' and '\r' is not relevant.  */
#define C_CTYPE_ASCII 1
#endif

from there, the problem was pretty easy to spot: NSTAK was too small,
and unguarded.  the funny "+ 1" is to allow for a few operators that
can add 2 to the stack in one trip through the loop.

; diffy -c eval.c
/n/dump/2011/0725/sys/src/cmd/cpp/eval.c:2,8 - eval.c:2,8
  #include <libc.h>
  #include "cpp.h"

- #define	NSTAK	32
+ #define	NSTAK	1024
  #define	SGN	0
  #define	UNS	1
  #define	UND	2
/n/dump/2011/0725/sys/src/cmd/cpp/eval.c:92,99 - eval.c:92,99

  int	evalop(struct pri);
  struct	value tokval(Token *);
- struct value vals[NSTAK], *vp;
- enum toktype ops[NSTAK], *op;
+ struct value vals[NSTAK + 1], *vp;
+ enum toktype ops[NSTAK + 1], *op;

  /*
   * Evaluate an #if #elif #ifdef #ifndef line.  trp->tp points to the keyword.
/n/dump/2011/0725/sys/src/cmd/cpp/eval.c:122,127 - eval.c:122,129
  	op = ops;
  	*op++ = END;
  	for (rand=0, tp = trp->bp+ntok; tp < trp->lp; tp++) {
+ 		if(op >= ops + NSTAK)
+ 			sysfatal("cpp: can't evaluate #if: increase NSTAK");
  		switch(tp->type) {
  		case WS:
  		case NL:
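
A minimal sketch of why the guard tests against NSTAK while the arrays get NSTAK + 1 slots.  This is a simplification, not cpp's eval.c: NSTAK_SKETCH, ops_sketch and run_trips are hypothetical names, the limit is tiny on purpose, and every trip is assumed to push twice (the worst case described above).

```c
#include <assert.h>

enum { NSTAK_SKETCH = 4 };	/* deliberately tiny limit for the sketch */

/* One spare slot beyond the limit, mirroring the "+ 1" in the patch. */
static int ops_sketch[NSTAK_SKETCH + 1];

/*
 * Simulate ntrips loop iterations that each push two entries.
 * Returns the number of trips completed before the guard fired,
 * or ntrips if every trip fit.
 */
static int run_trips(int ntrips)
{
	int *op = ops_sketch;
	int i;

	*op++ = 0;	/* initial marker, like END in the real code */
	for (i = 0; i < ntrips; i++) {
		/* Guard at the top of the trip; the patch calls sysfatal here. */
		if (op >= ops_sketch + NSTAK_SKETCH)
			return i;
		*op++ = 1;	/* first push of this trip */
		*op++ = 2;	/* second push in the same trip */
		/* Never past the spare slot: at most one beyond the limit. */
		assert(op <= ops_sketch + NSTAK_SKETCH + 1);
	}
	return ntrips;
}
```

With the guard passing at index NSTAK_SKETCH - 1, the two pushes land in slots NSTAK_SKETCH - 1 and NSTAK_SKETCH, so the spare slot is exactly what keeps the second push in bounds.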

- erik




* Re: [9fans] Lock loop in malloc()
  2011-07-26  1:38 ` erik quanstrom
@ 2011-07-26  1:41   ` erik quanstrom
  2011-07-26  1:56   ` Russ Cox
       [not found]   ` <CADSkJJUmVYNdy_sUqqM34xdXD9CiWyUUEr89uxouAJ0ydVLpHQ@mail.gmail.c>
  2 siblings, 0 replies; 11+ messages in thread
From: erik quanstrom @ 2011-07-26  1:41 UTC
  To: 9fans

patch submitted.  /n/sources/patch/cppbigif

- erik




* Re: [9fans] Lock loop in malloc()
  2011-07-26  1:38 ` erik quanstrom
  2011-07-26  1:41   ` erik quanstrom
@ 2011-07-26  1:56   ` Russ Cox
       [not found]   ` <CADSkJJUmVYNdy_sUqqM34xdXD9CiWyUUEr89uxouAJ0ydVLpHQ@mail.gmail.c>
  2 siblings, 0 replies; 11+ messages in thread
From: Russ Cox @ 2011-07-26  1:56 UTC
  To: Fans of the OS Plan 9 from Bell Labs; +Cc: lucio

no one appreciates nm.

   1b1b8 B outbuf
   1f1b8 b nbuf$1
   1f1e0 b _dtoalk
   1f1e8 B ops
   1f268 B ifsatisfied
   1f2e8 B _stdiolk
   1f2f8 B sbrkmempriv
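
The arithmetic behind that listing can be sketched as follows.  The addresses are the ones shown by nm; the 4-byte entry size for ops is an assumption (an enum on 386), and ops_index_of is a hypothetical helper.

```c
#include <assert.h>

enum {
	OPS_ADDR = 0x1f1e8,		/* B ops */
	IFSATISFIED_ADDR = 0x1f268,	/* B ifsatisfied */
	SBRKMEMPRIV_ADDR = 0x1f2f8,	/* B sbrkmempriv, the lock in the trace */
	ENTRY_SIZE = 4,			/* assumed size of one ops entry */
	OLD_NSTAK = 32			/* NSTAK before the patch */
};

/* Index into ops[] at which a write would land on the given address. */
static int ops_index_of(int addr)
{
	assert(addr >= OPS_ADDR);
	return (addr - OPS_ADDR) / ENTRY_SIZE;
}
```

Under that assumption, ops_index_of(IFSATISFIED_ADDR) is 32, so the old 32-entry array ends exactly where ifsatisfied begins, and ops_index_of(SBRKMEMPRIV_ADDR) is 68, so an unguarded push at index 68 would overwrite the lock word.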



* Re: [9fans] Lock loop in malloc()
       [not found]   ` <CADSkJJUmVYNdy_sUqqM34xdXD9CiWyUUEr89uxouAJ0ydVLpHQ@mail.gmail.c>
@ 2011-07-26  4:01     ` erik quanstrom
  0 siblings, 0 replies; 11+ messages in thread
From: erik quanstrom @ 2011-07-26  4:01 UTC
  To: 9fans

On Mon Jul 25 21:57:56 EDT 2011, rsc@swtch.com wrote:
> no one appreciates nm.
>
>    1b1b8 B outbuf
>    1f1b8 b nbuf$1
>    1f1e0 b _dtoalk
>    1f1e8 B ops
>    1f268 B ifsatisfied
>    1f2e8 B _stdiolk
>    1f2f8 B sbrkmempriv

nice.

- erik


