9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] Race condition in /sys/src/9/pc/trap.c?
@ 2009-07-30  4:03 Matthew J Jones
  2009-07-30 11:27 ` erik quanstrom
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew J Jones @ 2009-07-30  4:03 UTC (permalink / raw)
  To: 9fans

My familiarity with the kernel source code is superficial to say the
least, but it seems to me that this code (from /sys/src/9/pc/trap.c)
contains a race condition:

  702         if(sp<(USTKTOP-BY2PG) || sp>(USTKTOP-sizeof(Sargs)-BY2WD))
  703             validaddr(sp, sizeof(Sargs)+BY2WD, 0);
  704
  705         up->s = *((Sargs*)(sp+BY2WD));

We verify that the address is correct; is there any reason another thread
in the same address space cannot start running after line 703 completes
and unmap that memory, causing us to access unmapped memory at line 705?
The system call entry is itself an interrupt gate, but line 689 is
spllo(), and we appear to hold no locks at this point.

(Please forgive me if my terminology is not in line: as I said, my
experience is very limited and I am just beginning to explore the source).

-- Elly



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-30  4:03 [9fans] Race condition in /sys/src/9/pc/trap.c? Matthew J Jones
@ 2009-07-30 11:27 ` erik quanstrom
  2009-07-30 14:25   ` Elizabeth Jones
  0 siblings, 1 reply; 21+ messages in thread
From: erik quanstrom @ 2009-07-30 11:27 UTC (permalink / raw)
  To: 9fans

On Thu Jul 30 00:05:45 EDT 2009, elly1@andrew.cmu.edu wrote:
> My familiarity with the kernel source code is superficial to say the
> least, but it seems to me that this code (from /sys/src/9/pc/trap.c)
> contains a race condition:
>
>   702         if(sp<(USTKTOP-BY2PG) || sp>(USTKTOP-sizeof(Sargs)-BY2WD))
>   703             validaddr(sp, sizeof(Sargs)+BY2WD, 0);
>   704
>   705         up->s = *((Sargs*)(sp+BY2WD));
>
> We verify that the address is correct; is there any reason another thread
> in the same address space cannot start running after line 703 completes
> and unmap that memory, causing us to access unmapped memory at line 705?
> The system call entry is itself an interrupt gate, but line 689 is
> spllo(), and we appear to hold no locks at this point.

plan 9 threads are cooperatively scheduled.  so
the correct term is proc.  but you are correct,
another proc sharing memory with this one
could be running.  however, that proc would
not have access to this proc's stack.  (rfork
doesn't allow shared stack.)  and even if it
did, plan 9 stacks don't shrink.

let's suppose that the address is invalid later.
the kernel always moves data to/from user
buffers outside of any locks because even
valid targets may be paged out.  if the address
is truely invalid, waserror() will be true and
the else branch starting at 714 will be executed

- erik



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-30 11:27 ` erik quanstrom
@ 2009-07-30 14:25   ` Elizabeth Jones
  2009-07-30 14:37     ` Sape Mullender
  2009-07-30 14:39     ` erik quanstrom
  0 siblings, 2 replies; 21+ messages in thread
From: Elizabeth Jones @ 2009-07-30 14:25 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, 30 Jul 2009, erik quanstrom wrote:

> On Thu Jul 30 00:05:45 EDT 2009, elly1@andrew.cmu.edu wrote:
>> My familiarity with the kernel source code is superficial to say the
>> least, but it seems to me that this code (from /sys/src/9/pc/trap.c)
>> contains a race condition:
>>
>>   702         if(sp<(USTKTOP-BY2PG) || sp>(USTKTOP-sizeof(Sargs)-BY2WD))
>>   703             validaddr(sp, sizeof(Sargs)+BY2WD, 0);
>>   704
>>   705         up->s = *((Sargs*)(sp+BY2WD));
>>
>> We verify that the address is correct; is there any reason another thread
>> in the same address space cannot start running after line 703 completes
>> and unmap that memory, causing us to access unmapped memory at line 705?
>> The system call entry is itself an interrupt gate, but line 689 is
>> spllo(), and we appear to hold no locks at this point.
>
> plan 9 threads are cooperatively scheduled.  so
> the correct term is proc.  but you are correct,
> another proc sharing memory with this one
> could be running.  however, that proc would
> not have access to this proc's stack.  (rfork
> doesn't allow shared stack.)  and even if it
> did, plan 9 stacks don't shrink.

What if sp points inside a segment which is not the actual stack segment?
Then could someone else come along and segdetach() it in between the two
mentioned lines?

> let's suppose that the address is invalid later.
> the kernel always moves data to/from user
> buffers outside of any locks because even
> valid targets may be paged out.  if the address
> is truely invalid, waserror() will be true and
> the else branch starting at 714 will be executed
>
> - erik

-- Elly



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-30 14:25   ` Elizabeth Jones
@ 2009-07-30 14:37     ` Sape Mullender
  2009-07-30 15:01       ` cinap_lenrek
                         ` (2 more replies)
  2009-07-30 14:39     ` erik quanstrom
  1 sibling, 3 replies; 21+ messages in thread
From: Sape Mullender @ 2009-07-30 14:37 UTC (permalink / raw)
  To: 9fans; +Cc: geoff, jmk

> On Thu, 30 Jul 2009, erik quanstrom wrote:
>
>> On Thu Jul 30 00:05:45 EDT 2009, elly1@andrew.cmu.edu wrote:
>>> My familiarity with the kernel source code is superficial to say the
>>> least, but it seems to me that this code (from /sys/src/9/pc/trap.c)
>>> contains a race condition:
>>>
>>>   702         if(sp<(USTKTOP-BY2PG) || sp>(USTKTOP-sizeof(Sargs)-BY2WD))
>>>   703             validaddr(sp, sizeof(Sargs)+BY2WD, 0);
>>>   704
>>>   705         up->s = *((Sargs*)(sp+BY2WD));
>>>
>>> We verify that the address is correct; is there any reason another thread
>>> in the same address space cannot start running after line 703 completes
>>> and unmap that memory, causing us to access unmapped memory at line 705?
>>> The system call entry is itself an interrupt gate, but line 689 is
>>> spllo(), and we appear to hold no locks at this point.
>>
>> plan 9 threads are cooperatively scheduled.  so
>> the correct term is proc.  but you are correct,
>> another proc sharing memory with this one
>> could be running.  however, that proc would
>> not have access to this proc's stack.  (rfork
>> doesn't allow shared stack.)  and even if it
>> did, plan 9 stacks don't shrink.
>
> What if sp points inside a segment which is not the actual stack segment?
> Then could someone else come along and segdetach() it in between the two
> mentioned lines?
>
>> let's suppose that the address is invalid later.
>> the kernel always moves data to/from user
>> buffers outside of any locks because even
>> valid targets may be paged out.  if the address
>> is truely invalid, waserror() will be true and
>> the else branch starting at 714 will be executed
>>
>> - erik
>
> -- Elly

I think you may be right, Elly.  Multithreaded programs indeed have their
stack running outside the stack segment, so this could happen there.
splhi won't even do on a multiprocessor.  One should probably lock down
the segment.
We've never seen this happen, of course — or rather, we haven't noticed
this as the cause of a crash.

	Sape




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-30 14:25   ` Elizabeth Jones
  2009-07-30 14:37     ` Sape Mullender
@ 2009-07-30 14:39     ` erik quanstrom
  1 sibling, 0 replies; 21+ messages in thread
From: erik quanstrom @ 2009-07-30 14:39 UTC (permalink / raw)
  To: 9fans

> > plan 9 threads are cooperatively scheduled.  so
> > the correct term is proc.  but you are correct,
> > another proc sharing memory with this one
> > could be running.  however, that proc would
> > not have access to this proc's stack.  (rfork
> > doesn't allow shared stack.)  and even if it
> > did, plan 9 stacks don't shrink.
>
> What if sp points inside a segment which is not the actual stack segment?
> Then could someone else come along and segdetach() it in between the two
> mentioned lines?

see below:

> > let's suppose that the address is invalid later.
> > the kernel always moves data to/from user
> > buffers outside of any locks because even
> > valid targets may be paged out.  if the address
> > is truely invalid, waserror() will be true and
> > the else branch starting at 714 will be executed

if you think this is possible, why don't you build a
test case and prove that it can happen.  the easiest
way would be to disable the check completely.

- erik



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-30 14:37     ` Sape Mullender
@ 2009-07-30 15:01       ` cinap_lenrek
  2009-07-30 15:24         ` cinap_lenrek
  2009-07-30 15:17       ` cinap_lenrek
  2009-07-30 15:20       ` erik quanstrom
  2 siblings, 1 reply; 21+ messages in thread
From: cinap_lenrek @ 2009-07-30 15:01 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 199 bytes --]

hm... what about the other stuff port/sysfile.c?

they also just checks pointers with validaddr and then call
into the device handler without locking anything / holding
anything.

--
cinap


[-- Attachment #2: Type: message/rfc822, Size: 4214 bytes --]

From: Sape Mullender <sape@plan9.bell-labs.com>
To: 9fans@9fans.net
Cc: geoff@plan9.bell-labs.com, jmk@plan9.bell-labs.com
Subject: Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
Date: Thu, 30 Jul 2009 16:37:50 +0200
Message-ID: <a77443fd0762504eb1179278b34d0351@plan9.cs.bell-labs.com>

> On Thu, 30 Jul 2009, erik quanstrom wrote:
>
>> On Thu Jul 30 00:05:45 EDT 2009, elly1@andrew.cmu.edu wrote:
>>> My familiarity with the kernel source code is superficial to say the
>>> least, but it seems to me that this code (from /sys/src/9/pc/trap.c)
>>> contains a race condition:
>>>
>>>   702         if(sp<(USTKTOP-BY2PG) || sp>(USTKTOP-sizeof(Sargs)-BY2WD))
>>>   703             validaddr(sp, sizeof(Sargs)+BY2WD, 0);
>>>   704
>>>   705         up->s = *((Sargs*)(sp+BY2WD));
>>>
>>> We verify that the address is correct; is there any reason another thread
>>> in the same address space cannot start running after line 703 completes
>>> and unmap that memory, causing us to access unmapped memory at line 705?
>>> The system call entry is itself an interrupt gate, but line 689 is
>>> spllo(), and we appear to hold no locks at this point.
>>
>> plan 9 threads are cooperatively scheduled.  so
>> the correct term is proc.  but you are correct,
>> another proc sharing memory with this one
>> could be running.  however, that proc would
>> not have access to this proc's stack.  (rfork
>> doesn't allow shared stack.)  and even if it
>> did, plan 9 stacks don't shrink.
>
> What if sp points inside a segment which is not the actual stack segment?
> Then could someone else come along and segdetach() it in between the two
> mentioned lines?
>
>> let's suppose that the address is invalid later.
>> the kernel always moves data to/from user
>> buffers outside of any locks because even
>> valid targets may be paged out.  if the address
>> is truely invalid, waserror() will be true and
>> the else branch starting at 714 will be executed
>>
>> - erik
>
> -- Elly

I think you may be right, Elly.  Multithreaded programs indeed have their
stack running outside the stack segment, so this could happen there.
splhi won't even do on a multiprocessor.  One should probably lock down
the segment.
We've never seen this happen, of course — or rather, we haven't noticed
this as the cause of a crash.

	Sape

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-30 14:37     ` Sape Mullender
  2009-07-30 15:01       ` cinap_lenrek
@ 2009-07-30 15:17       ` cinap_lenrek
  2009-07-30 15:54         ` Charles Forsyth
  2009-07-30 15:20       ` erik quanstrom
  2 siblings, 1 reply; 21+ messages in thread
From: cinap_lenrek @ 2009-07-30 15:17 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 147 bytes --]

ok, forget it... the segments are actualy reference counted.
so a process would not be able to unmap a segment in
another process.

--
cinap

[-- Attachment #2: Type: message/rfc822, Size: 4214 bytes --]

From: Sape Mullender <sape@plan9.bell-labs.com>
To: 9fans@9fans.net
Cc: geoff@plan9.bell-labs.com, jmk@plan9.bell-labs.com
Subject: Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
Date: Thu, 30 Jul 2009 16:37:50 +0200
Message-ID: <a77443fd0762504eb1179278b34d0351@plan9.cs.bell-labs.com>

> On Thu, 30 Jul 2009, erik quanstrom wrote:
>
>> On Thu Jul 30 00:05:45 EDT 2009, elly1@andrew.cmu.edu wrote:
>>> My familiarity with the kernel source code is superficial to say the
>>> least, but it seems to me that this code (from /sys/src/9/pc/trap.c)
>>> contains a race condition:
>>>
>>>   702         if(sp<(USTKTOP-BY2PG) || sp>(USTKTOP-sizeof(Sargs)-BY2WD))
>>>   703             validaddr(sp, sizeof(Sargs)+BY2WD, 0);
>>>   704
>>>   705         up->s = *((Sargs*)(sp+BY2WD));
>>>
>>> We verify that the address is correct; is there any reason another thread
>>> in the same address space cannot start running after line 703 completes
>>> and unmap that memory, causing us to access unmapped memory at line 705?
>>> The system call entry is itself an interrupt gate, but line 689 is
>>> spllo(), and we appear to hold no locks at this point.
>>
>> plan 9 threads are cooperatively scheduled.  so
>> the correct term is proc.  but you are correct,
>> another proc sharing memory with this one
>> could be running.  however, that proc would
>> not have access to this proc's stack.  (rfork
>> doesn't allow shared stack.)  and even if it
>> did, plan 9 stacks don't shrink.
>
> What if sp points inside a segment which is not the actual stack segment?
> Then could someone else come along and segdetach() it in between the two
> mentioned lines?
>
>> let's suppose that the address is invalid later.
>> the kernel always moves data to/from user
>> buffers outside of any locks because even
>> valid targets may be paged out.  if the address
>> is truely invalid, waserror() will be true and
>> the else branch starting at 714 will be executed
>>
>> - erik
>
> -- Elly

I think you may be right, Elly.  Multithreaded programs indeed have their
stack running outside the stack segment, so this could happen there.
splhi won't even do on a multiprocessor.  One should probably lock down
the segment.
We've never seen this happen, of course — or rather, we haven't noticed
this as the cause of a crash.

	Sape

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-30 14:37     ` Sape Mullender
  2009-07-30 15:01       ` cinap_lenrek
  2009-07-30 15:17       ` cinap_lenrek
@ 2009-07-30 15:20       ` erik quanstrom
  2 siblings, 0 replies; 21+ messages in thread
From: erik quanstrom @ 2009-07-30 15:20 UTC (permalink / raw)
  To: 9fans

> I think you may be right, Elly.  Multithreaded programs indeed have their
> stack running outside the stack segment, so this could happen there.
> splhi won't even do on a multiprocessor.  One should probably lock down
> the segment.
> We've never seen this happen, of course — or rather, we haven't noticed
> this as the cause of a crash.

just to beat a dead horse, i disabled the check in question
and ran the following program with an invalid address.
the program faulted and the kernel did not care.

; cat evil.c
#include <u.h>
#include <libc.h>

extern void evil(void);

void
main(void)
{
	evil();
	exits("");
}
; cat evil.s
TEXT evil(SB), $0
	PUSHL	SP
	MOVL	$0xa0000000, SP
	MOVL	$1, AX
	INT	$64
	POPL	SP
	RET
% 8.out
8.out 78: suicide: invalid address 0xa0000000/24 in sys call pc=0x1046

- erik



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-30 15:01       ` cinap_lenrek
@ 2009-07-30 15:24         ` cinap_lenrek
  0 siblings, 0 replies; 21+ messages in thread
From: cinap_lenrek @ 2009-07-30 15:24 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 105 bytes --]

... but you can still shrink segments for other processes :-)

(this crashes the kernel)

--
cinap

[-- Attachment #2: Type: message/rfc822, Size: 7027 bytes --]

[-- Attachment #2.1.1: Type: text/plain, Size: 199 bytes --]

hm... what about the other stuff port/sysfile.c?

they also just checks pointers with validaddr and then call
into the device handler without locking anything / holding
anything.

--
cinap


[-- Attachment #2.1.2: Type: message/rfc822, Size: 4214 bytes --]

From: Sape Mullender <sape@plan9.bell-labs.com>
To: 9fans@9fans.net
Cc: geoff@plan9.bell-labs.com, jmk@plan9.bell-labs.com
Subject: Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
Date: Thu, 30 Jul 2009 16:37:50 +0200
Message-ID: <a77443fd0762504eb1179278b34d0351@plan9.cs.bell-labs.com>

> On Thu, 30 Jul 2009, erik quanstrom wrote:
>
>> On Thu Jul 30 00:05:45 EDT 2009, elly1@andrew.cmu.edu wrote:
>>> My familiarity with the kernel source code is superficial to say the
>>> least, but it seems to me that this code (from /sys/src/9/pc/trap.c)
>>> contains a race condition:
>>>
>>>   702         if(sp<(USTKTOP-BY2PG) || sp>(USTKTOP-sizeof(Sargs)-BY2WD))
>>>   703             validaddr(sp, sizeof(Sargs)+BY2WD, 0);
>>>   704
>>>   705         up->s = *((Sargs*)(sp+BY2WD));
>>>
>>> We verify that the address is correct; is there any reason another thread
>>> in the same address space cannot start running after line 703 completes
>>> and unmap that memory, causing us to access unmapped memory at line 705?
>>> The system call entry is itself an interrupt gate, but line 689 is
>>> spllo(), and we appear to hold no locks at this point.
>>
>> plan 9 threads are cooperatively scheduled.  so
>> the correct term is proc.  but you are correct,
>> another proc sharing memory with this one
>> could be running.  however, that proc would
>> not have access to this proc's stack.  (rfork
>> doesn't allow shared stack.)  and even if it
>> did, plan 9 stacks don't shrink.
>
> What if sp points inside a segment which is not the actual stack segment?
> Then could someone else come along and segdetach() it in between the two
> mentioned lines?
>
>> let's suppose that the address is invalid later.
>> the kernel always moves data to/from user
>> buffers outside of any locks because even
>> valid targets may be paged out.  if the address
>> is truely invalid, waserror() will be true and
>> the else branch starting at 714 will be executed
>>
>> - erik
>
> -- Elly

I think you may be right, Elly.  Multithreaded programs indeed have their
stack running outside the stack segment, so this could happen there.
splhi won't even do on a multiprocessor.  One should probably lock down
the segment.
We've never seen this happen, of course — or rather, we haven't noticed
this as the cause of a crash.

	Sape

[-- Attachment #3: crash.c --]
[-- Type: text/plain, Size: 408 bytes --]

#include <u.h>
#include <libc.h>

void
main(int argc, char **argv)
{
	char *buf;
	int fd;

	if((fd = open("/dev/zero", OREAD)) < 0)
		sysfatal("open");

	buf = (char*)0x600000;
	segattach(0, "memory", buf, 2*4096);

	switch(rfork(RFPROC|RFMEM)){
	case -1:
		sysfatal("fork");
		break;
	case 0:
		for(;;){
			read(fd, buf+4096, 4096);
		}
	}
	sleep(1000);
	segbrk(buf, buf+4096);
}

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-30 15:17       ` cinap_lenrek
@ 2009-07-30 15:54         ` Charles Forsyth
  2009-07-31 13:07           ` Charles Forsyth
  0 siblings, 1 reply; 21+ messages in thread
From: Charles Forsyth @ 2009-07-30 15:54 UTC (permalink / raw)
  To: 9fans

just stop processes with s->ref > 1 from freeing parts of s with ibrk.
it's not as if anything ever does in practice.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-30 15:54         ` Charles Forsyth
@ 2009-07-31 13:07           ` Charles Forsyth
  2009-07-31 13:28             ` Richard Miller
  2009-07-31 17:13             ` cinap_lenrek
  0 siblings, 2 replies; 21+ messages in thread
From: Charles Forsyth @ 2009-07-31 13:07 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 2077 bytes --]

this resulted in a little side discussion.
to save someone else from having to break a strong oath about 9fans,
i'll sum it up. the existing code handles this situation.

when several processes share a segment, and any one of them
decides to shrink the segment, all the processes must see
the change before the pages are made available for re-use.
any one of them could have any of those pages currently in their mmu state.
and any  must be removed from the mmu state local to the process,
to ensure that no further access to the pages is possible before they are freed.
the critical point is that it's irrelevant whether traps or syscalls are
involved: ordinary store instructions are clearly bad enough.

thus /sys/src/9/port/segment.c:/^mfreeseg contains (with s locked)
	/* flush this seg in all other processes */
	if(s->ref > 1)
		procflushseg(s);
and procflushseg finds all processes that share s, sets them all up
to flush their mmu states, and also sets any processor running such a process
to flush its state (that's picked up by a clock interrupt).

procflushseg will not proceed until all processes and processors that
might need to flush state have done so. (s remains locked throughout.)

after the flush, no process or processor can have a reference
to any of the pages in its mmu state.  it is safe to free them,
which mfreeseg does.

now, a process might still be executing a system call or some other trap
that might refer to that segment, to an address that's now been removed
by another process. to access the memory, it must ultimately issue
a load or a store instruction (even for syscalls, such as read or write).
that instruction will trap, because as described above there is no longer
a valid mmu mapping for that address within the process. normally, the
trap will find the right page in the software mmu structures and install the map.
in this case, however, it can't find the address, so it will raise an exception
in the process (ie, send a note). (all such searches are done with the
segment properly locked.)

[-- Attachment #2: Type: message/rfc822, Size: 1746 bytes --]

From: Charles Forsyth <forsyth@terzarima.net>
To: 9fans@9fans.net
Subject: Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
Date: Thu, 30 Jul 2009 16:54:16 +0100
Message-ID: <1a242b28f6d149aec599a1175df52718@terzarima.net>

just stop processes with s->ref > 1 from freeing parts of s with ibrk.
it's not as if anything ever does in practice.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-31 13:07           ` Charles Forsyth
@ 2009-07-31 13:28             ` Richard Miller
  2009-07-31 17:13             ` cinap_lenrek
  1 sibling, 0 replies; 21+ messages in thread
From: Richard Miller @ 2009-07-31 13:28 UTC (permalink / raw)
  To: 9fans

> ... procflushseg finds all processes that share s, sets them all up
> to flush their mmu states, and also sets any processor running such a process
> to flush its state (that's picked up by a clock interrupt).
>
> procflushseg will not proceed until all processes and processors that
> might need to flush state have done so. (s remains locked throughout.)

Coincidentally, I spent last Sunday debugging a deadlock in precisely
this spot.  I had absentmindedly tried to use VESA vga on a multiprocessor.
The aux/vga -l apparently succeeded and the screen looked great, but
as a stealthy side-effect the CPU which had done the VESA call stopped
responding to interrupts -- including the local APIC clock interrupt
required for the mmu flush as described above.

So, some time later when another process (upas/fs as it happens) on
the other CPU wanted to adjust a segment size, procflushseg was called
and never returned.

Debugging can be challenging when cause and effect are minutes or
hours apart ...




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-31 13:07           ` Charles Forsyth
  2009-07-31 13:28             ` Richard Miller
@ 2009-07-31 17:13             ` cinap_lenrek
  2009-07-31 18:00               ` erik quanstrom
  2009-07-31 19:31               ` Charles Forsyth
  1 sibling, 2 replies; 21+ messages in thread
From: cinap_lenrek @ 2009-07-31 17:13 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 931 bytes --]

very interesting. thanks for the sum up.

but my testcase crashes a uniprocessor system, so here is no
waiting for mmuflushes on other processors going on.

any other process that shares the segment and was suspended
in the kernel may potentialy hold a pointer to that freed memory
area of the segment and may cause a fault in the kernel on resume.

process1:
	in kernel:
		read(buf)
			validaddr(buf)
			...
			*context switch*

process2:
	in kernel:
		...
			ibrk()
				mfreeseg()
					procflushseg()
				flushmmu()
		...
		*context switch*

process1:
	still in kernel:
		memmove(buf, ...)
		*fault*
		trap()
			fault386()
				fault() => -1
				if(!user){
					panic()
					*panic*
				}
				...
				postnote()


we cant really wait for all processes sharing the segment to be
in userspace as they may already be waiting in a long blocking
syscall.

how do we fix this?

--
cinap

[-- Attachment #2: Type: message/rfc822, Size: 6164 bytes --]

[-- Attachment #2.1.1: Type: text/plain, Size: 2077 bytes --]

this resulted in a little side discussion.
to save someone else from having to break a strong oath about 9fans,
i'll sum it up. the existing code handles this situation.

when several processes share a segment, and any one of them
decides to shrink the segment, all the processes must see
the change before the pages are made available for re-use.
any one of them could have any of those pages currently in their mmu state.
and any  must be removed from the mmu state local to the process,
to ensure that no further access to the pages is possible before they are freed.
the critical point is that it's irrelevant whether traps or syscalls are
involved: ordinary store instructions are clearly bad enough.

thus /sys/src/9/port/segment.c:/^mfreeseg contains (with s locked)
	/* flush this seg in all other processes */
	if(s->ref > 1)
		procflushseg(s);
and procflushseg finds all processes that share s, sets them all up
to flush their mmu states, and also sets any processor running such a process
to flush its state (that's picked up by a clock interrupt).

procflushseg will not proceed until all processes and processors that
might need to flush state have done so. (s remains locked throughout.)

after the flush, no process or processor can have a reference
to any of the pages in its mmu state.  it is safe to free them,
which mfreeseg does.

now, a process might still be executing a system call or some other trap
that might refer to that segment, to an address that's now been removed
by another process. to access the memory, it must ultimately issue
a load or a store instruction (even for syscalls, such as read or write).
that instruction will trap, because as described above there is no longer
a valid mmu mapping for that address within the process. normally, the
trap will find the right page in the software mmu structures and install the map.
in this case, however, it can't find the address, so it will raise an exception
in the process (ie, send a note). (all such searches are done with the
segment properly locked.)

[-- Attachment #2.1.2: Type: message/rfc822, Size: 1746 bytes --]

From: Charles Forsyth <forsyth@terzarima.net>
To: 9fans@9fans.net
Subject: Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
Date: Thu, 30 Jul 2009 16:54:16 +0100
Message-ID: <1a242b28f6d149aec599a1175df52718@terzarima.net>

just stop processes with s->ref > 1 from freeing parts of s with ibrk.
it's not as if anything ever does in practice.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-31 17:13             ` cinap_lenrek
@ 2009-07-31 18:00               ` erik quanstrom
  2009-07-31 18:08                 ` Devon H. O'Dell
  2009-07-31 21:52                 ` cinap_lenrek
  2009-07-31 19:31               ` Charles Forsyth
  1 sibling, 2 replies; 21+ messages in thread
From: erik quanstrom @ 2009-07-31 18:00 UTC (permalink / raw)
  To: 9fans

> process1:
> 	still in kernel:
> 		memmove(buf, ...)
> 		*fault*
> 		trap()
> 			fault386()
> 				fault() => -1
> 				if(!user){
> 					panic()
> 					*panic*
> 				}
> 				...
> 				postnote()

could you be more specific.  what is your test program,
where is it crashing (if you know), and what is the panic
message, if any?  i must be dense, but i'm confused by
your process diagram.

- erik



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-31 18:00               ` erik quanstrom
@ 2009-07-31 18:08                 ` Devon H. O'Dell
  2009-07-31 18:18                   ` erik quanstrom
  2009-07-31 21:52                 ` cinap_lenrek
  1 sibling, 1 reply; 21+ messages in thread
From: Devon H. O'Dell @ 2009-07-31 18:08 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2009/7/31 erik quanstrom <quanstro@coraid.com>:
>> process1:
>>       still in kernel:
>>               memmove(buf, ...)
>>               *fault*
>>               trap()
>>                       fault386()
>>                               fault() => -1
>>                               if(!user){
>>                                       panic()
>>                                       *panic*
>>                               }
>>                               ...
>>                               postnote()
>
> could you be more specific.  what is your test program,
> where is it crashing (if you know), and what is the panic
> message, if any?  i must be dense, but i'm confused by
> your process diagram.

He posted it earlier in this thread

--dho

> - erik
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-31 18:08                 ` Devon H. O'Dell
@ 2009-07-31 18:18                   ` erik quanstrom
  2009-07-31 18:35                     ` Devon H. O'Dell
  0 siblings, 1 reply; 21+ messages in thread
From: erik quanstrom @ 2009-07-31 18:18 UTC (permalink / raw)
  To: 9fans

> > could you be more specific.  what is your test program,
> > where is it crashing (if you know), and what is the panic
> > message, if any?  i must be dense, but i'm confused by
> > your process diagram.
>
> He posted it earlier in this thread
>
> --dho

please post a reference.  i do not see either the crash
code or the panic message on 9fans.net/archive/2009/07.
i don't recall it in the originals, either.

- erik



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-31 18:18                   ` erik quanstrom
@ 2009-07-31 18:35                     ` Devon H. O'Dell
  0 siblings, 0 replies; 21+ messages in thread
From: Devon H. O'Dell @ 2009-07-31 18:35 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2009/7/31 erik quanstrom <quanstro@coraid.com>:
>> > could you be more specific.  what is your test program,
>> > where is it crashing (if you know), and what is the panic
>> > message, if any?  i must be dense, but i'm confused by
>> > your process diagram.
>>
>> He posted it earlier in this thread
>>
>> --dho
>
> please post a reference.  i do not see either the crash
> code or the panic message on 9fans.net/archive/2009/07.
> i don't recall it in the originals, either.
>
> - erik

Was received as an attachment. Inline:

#include <u.h>
#include <libc.h>

void
main(int argc, char **argv)
{
	char *buf;
	int fd;

	if((fd = open("/dev/zero", OREAD)) < 0)
		sysfatal("open");

	buf = (char*)0x600000;
	segattach(0, "memory", buf, 2*4096);

	switch(rfork(RFPROC|RFMEM)){
	case -1:
		sysfatal("fork");
		break;
	case 0:
		for(;;){
			read(fd, buf+4096, 4096);
		}
	}
	sleep(1000);
	segbrk(buf, buf+4096);
}

--dho



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-31 17:13             ` cinap_lenrek
  2009-07-31 18:00               ` erik quanstrom
@ 2009-07-31 19:31               ` Charles Forsyth
  2009-07-31 19:40                 ` erik quanstrom
  1 sibling, 1 reply; 21+ messages in thread
From: Charles Forsyth @ 2009-07-31 19:31 UTC (permalink / raw)
  To: 9fans

>but my testcase crashes a uniprocessor system, so here is no
>waiting for mmuflushes on other processors going on.

it ensures mmuflushes in all other processes (sharing that segment) as well.
in fact, the crash you describe just emphasises that point:
the page reference no longer exists, hence the fault.

the problem (which frankly doesn't bother me) is that fault386
is being overly cautious in assuming that a page fault that occurs
in system mode but can't map a page successfully is necessarily a kernel bug:
that's not true. it could just note the process instead.
(it doesn't bother me because since unix days i've seen less than a handful
of programs that SHRINK their existing data segments, and i think that's the
only case that can cause the panic you're seeing.)



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-31 19:31               ` Charles Forsyth
@ 2009-07-31 19:40                 ` erik quanstrom
  2009-07-31 20:41                   ` Charles Forsyth
  0 siblings, 1 reply; 21+ messages in thread
From: erik quanstrom @ 2009-07-31 19:40 UTC (permalink / raw)
  To: 9fans

> it ensures mmuflushes in all other processes (sharing that segment) as well.
> in fact, the crash you describe just emphasises that point:
> the page reference no longer exists, hence the fault.
>
> the problem (which frankly doesn't bother me) is that fault386
> is being overly cautious in assuming that a page fault that occurs
> in system mode but can't map a page successfully is necessarily a kernel bug:
> that's not true. it could just note the process instead.
> (it doesn't bother me because since unix days i've seen less than a handful
> of programs that SHRINK their existing data segments, and i think that's the
> only case that can cause the panic you're seeing.)

if this case is really not important, would it make sense
to disallow shrinking segments? it might be worth it just
to be able to define Eshrinkage.

- erik



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-31 19:40                 ` erik quanstrom
@ 2009-07-31 20:41                   ` Charles Forsyth
  0 siblings, 0 replies; 21+ messages in thread
From: Charles Forsyth @ 2009-07-31 20:41 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 690 bytes --]

you can get similar effects by remapping things.
i meant that it isn't likely to happen by accident, so am i bovvered?
fault386 needs to be fixed mainly by or for people running a shared cpu
server with hostile users (ie, students).
for the rest of us it might be more useful to have the panic
to prevent real kernel bugs (ie, just bad pointers in device driver implementations)
from postnoting a process instead of stopping the system.
having said that, it could be argued that even in that case
a postnote to the invoking process would allow the rest of the system
to run and `might not' mean that the broken driver has wrecked other
data structures outside it in kernel memory.

[-- Attachment #2: Type: message/rfc822, Size: 2512 bytes --]

From: erik quanstrom <quanstro@coraid.com>
To: 9fans@9fans.net
Subject: Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
Date: Fri, 31 Jul 2009 15:40:41 -0400
Message-ID: <3aa6d689be5c81fc7ac72211cfee8b13@coraid.com>

> it ensures mmuflushes in all other processes (sharing that segment) as well.
> in fact, the crash you describe just emphasises that point:
> the page reference no longer exists, hence the fault.
>
> the problem (which frankly doesn't bother me) is that fault386
> is being overly cautious in assuming that a page fault that occurs
> in system mode but can't map a page successfully is necessarily a kernel bug:
> that's not true. it could just note the process instead.
> (it doesn't bother me because since unix days i've seen less than a handful
> of programs that SHRINK their existing data segments, and i think that's the
> only case that can cause the panic you're seeing.)

if this case is really not important, would it make sense
to disallow shrinking segments? it might be worth it just
to be able to define Eshrinkage.

- erik

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
  2009-07-31 18:00               ` erik quanstrom
  2009-07-31 18:08                 ` Devon H. O'Dell
@ 2009-07-31 21:52                 ` cinap_lenrek
  1 sibling, 0 replies; 21+ messages in thread
From: cinap_lenrek @ 2009-07-31 21:52 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 63 bytes --]

i attached it in a previous mail... try again...

--
cinap

[-- Attachment #2: Type: message/rfc822, Size: 2413 bytes --]

From: erik quanstrom <quanstro@coraid.com>
To: 9fans@9fans.net
Subject: Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
Date: Fri, 31 Jul 2009 14:00:25 -0400
Message-ID: <64332bc783f3dc7dffc16c1266fefea3@coraid.com>

> process1:
> 	still in kernel:
> 		memmove(buf, ...)
> 		*fault*
> 		trap()
> 			fault386()
> 				fault() => -1
> 				if(!user){
> 					panic()
> 					*panic*
> 				}
> 				...
> 				postnote()

could you be more specific.  what is your test program,
where is it crashing (if you know), and what is the panic
message, if any?  i must be dense, but i'm confused by
your process diagram.

- erik

[-- Attachment #3: Type: text/plain, Size: 408 bytes --]

#include <u.h>
#include <libc.h>

void
main(int argc, char **argv)
{
	char *buf;
	int fd;

	if((fd = open("/dev/zero", OREAD)) < 0)
		sysfatal("open");

	buf = (char*)0x600000;
	segattach(0, "memory", buf, 2*4096);

	switch(rfork(RFPROC|RFMEM)){
	case -1:
		sysfatal("fork");
		break;
	case 0:
		for(;;){
			read(fd, buf+4096, 4096);
		}
	}
	sleep(1000);
	segbrk(buf, buf+4096);
}

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2009-07-31 21:52 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-07-30  4:03 [9fans] Race condition in /sys/src/9/pc/trap.c? Matthew J Jones
2009-07-30 11:27 ` erik quanstrom
2009-07-30 14:25   ` Elizabeth Jones
2009-07-30 14:37     ` Sape Mullender
2009-07-30 15:01       ` cinap_lenrek
2009-07-30 15:24         ` cinap_lenrek
2009-07-30 15:17       ` cinap_lenrek
2009-07-30 15:54         ` Charles Forsyth
2009-07-31 13:07           ` Charles Forsyth
2009-07-31 13:28             ` Richard Miller
2009-07-31 17:13             ` cinap_lenrek
2009-07-31 18:00               ` erik quanstrom
2009-07-31 18:08                 ` Devon H. O'Dell
2009-07-31 18:18                   ` erik quanstrom
2009-07-31 18:35                     ` Devon H. O'Dell
2009-07-31 21:52                 ` cinap_lenrek
2009-07-31 19:31               ` Charles Forsyth
2009-07-31 19:40                 ` erik quanstrom
2009-07-31 20:41                   ` Charles Forsyth
2009-07-30 15:20       ` erik quanstrom
2009-07-30 14:39     ` erik quanstrom

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).