9front - general discussion about 9front
* stats(1) suicide
@ 2014-08-07  2:51 sl
  2014-08-07 11:56 ` [9front] " cinap_lenrek
  0 siblings, 1 reply; 11+ messages in thread
From: sl @ 2014-08-07  2:51 UTC
  To: 9front

I run stats(1) on a diskless pc64 terminal with the following command:

	window -r 836 1 1339 85 stats -Ll fs cpu.stanleylieber.com ttr gl mars2 tn $sysname

Lately, stats(1) suicides every day or so. I don't know when this started, but
I've noticed it a few times in the last couple of weeks. Today, I noticed it happened
again and I managed to capture some information:

stats 593: suicide: sys: trap: fault write addr=0xffffffff8258d1b0 pc=0x204cc7

; acid 593
/proc/593/text:amd64 plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/amd64
acid: lstk()
notejmp(ret=0x1,j=0x40ac90)+0x13 /sys/src/libc/amd64/notejmp.c:10
alarmed(a=0xffffffff8258d1b0,s=0x7ffffeffea58)+0x3f /sys/src/cmd/stats.c:718
notifier+0x3e /sys/src/libc/port/atnotify.c:15
acid: 

The system is connected over wifi and sometimes the network drops out long enough
that various programs (webfs, etc.) will time out. The dropouts are usually short
and most programs usually recover without needing to be restarted. Might this
stats(1) behavior be related to network problems?

sl



* Re: [9front] stats(1) suicide
  2014-08-07  2:51 stats(1) suicide sl
@ 2014-08-07 11:56 ` cinap_lenrek
  2014-08-08  7:45   ` arisawa
  0 siblings, 1 reply; 11+ messages in thread
From: cinap_lenrek @ 2014-08-07 11:56 UTC
  To: 9front

thanks! i know what's wrong.

network timeout (alarm note) is the trigger. the bug was introduced in:

http://code.google.com/p/plan9front/source/detail?r=a2985da84dc3e147251c75c5839d1d074b1e7506&path=/sys/src/9/pc64/l.s

the problem is that forkret() in l.s doesn't restore the BP register from
the ureg (anymore!). the first argument to a function is passed in BP
(also known as RARG). as it's not loaded from the ureg, the first argument
to the note handler is garbage, which causes the crash. most note handlers
ignore the ureg argument (so it all works fine with other programs), but
not this alarm note handler, which tries to do a stack unwind with the
note jump.

i'm at work and have no access to an amd64 machine right now so i can't
test anything, but you can probably fix it with a single line in
pc64/trap.c, function syscall():


        if(scallnr!=RFORK && (up->procctl || up->nnote)){
                splhi();
                notify(ureg);
+               ((void**)&ureg)[-1] = (void*)noteret;	/* restores BP */
        }

--
cinap



* Re: [9front] stats(1) suicide
  2014-08-07 11:56 ` [9front] " cinap_lenrek
@ 2014-08-08  7:45   ` arisawa
  2014-08-08 16:01     ` sl
  0 siblings, 1 reply; 11+ messages in thread
From: arisawa @ 2014-08-08  7:45 UTC
  To: 9front

Hello,
I have a similar problem with cwfs64x on pc64.

the system is based on the latest release:
9front-3730.5d864bfef728.iso.bz2

I did
rm -rf /sys/src/*
and copied the iso to the /sys/src
and
cd /sys/src && mk install
cd /sys/src/9/pc64
mk
9fat:
cp 9pc64 /n/9fat
fshalt -r

bootargs is .....
user[glenda]: arisawa
cwfs64x 319: suicide: invalid address 0x1056efee8/16384 in syscall pc=0x22ac56
cwfs64x 319: suicide: sys: bad address in syscall pc=0x22ac56
...


The 9pc64 kernel below works.
--rwxrwxr-x M 20 arisawa sys 3502641 May 28 22:11 /amd64/9pc64

The official cwfs sources have not been modified since that day.
Note that cwfs64x for 386 works fine.

Your patch
> ((void**)&ureg)[-1] = (void*)noteret;	/* restores BP */
did not help me.





* Re: [9front] stats(1) suicide
  2014-08-08  7:45   ` arisawa
@ 2014-08-08 16:01     ` sl
  2014-08-08 21:30       ` arisawa
  0 siblings, 1 reply; 11+ messages in thread
From: sl @ 2014-08-08 16:01 UTC
  To: 9front

> the system is based on the latest release:
> 9front-3730.5d864bfef728.iso.bz2
> 
> I did
> rm -rf /sys/src/*
> and copied the iso to the /sys/src
> and
> cd /sys/src && mk install
> cd /sys/src/9/pc64
> mk
> 9fat:
> cp 9pc64 /n/9fat
> fshalt -r

Kernel configuration and maintenance is explained here:

	http://code.google.com/p/plan9front/wiki/fqa7#7.2_-_Kernel_configuration_and_maintenance

9front does not have a 9fat: command. Did you see any errors during the process
outlined above? 

From the rest of your message I would infer that what actually happened is you are
attempting to update an older system using the sources from the latest ISO. Since
you did not succeed in copying the newly built kernel to the FAT partition, you
are now booting an old kernel (which does not have syscall 53) with the new system.

sl



* Re: [9front] stats(1) suicide
  2014-08-08 16:01     ` sl
@ 2014-08-08 21:30       ` arisawa
  2014-08-08 21:44         ` sl
  0 siblings, 1 reply; 11+ messages in thread
From: arisawa @ 2014-08-08 21:30 UTC
  To: 9front

Hello,

On 2014/08/09, at 1:01, sl@9front.org wrote:

> 9front does not have a 9fat: command.
I know that. I am using the old 9fat: command because it is familiar to me.

> Did you see any errors during the process
> outlined above? 
No.

> you are now booting an old kernel (which does not have syscall 53) with the new system.
As I said, the old kernel just works after updating to the new system.
(I keep a kernel backup in 9fat whenever I attempt a kernel update.)
I have confirmed that the copied /n/9fat/9pc64 is identical to the one produced by the build.
Is that a syscall 53 problem?

Is the process below enough to get a new kernel?
	rm -rf /sys/src/*
	and copied the iso to the /sys/src
	and
	cd /sys/src && mk install
	cd /sys/src/9/pc64
	mk
If not, what should I add?

Kenji Arisawa




* Re: [9front] stats(1) suicide
  2014-08-08 21:30       ` arisawa
@ 2014-08-08 21:44         ` sl
  2014-08-08 22:35           ` arisawa
  0 siblings, 1 reply; 11+ messages in thread
From: sl @ 2014-08-08 21:44 UTC
  To: 9front

> 	rm -rf /sys/src/*
> 	and copied the iso to the /sys/src

This part puzzled me. Why did you remove the existing files and copy all
the sources from the ISO?


> > you are now booting an old kernel (which does not have syscall 53) with the new system.
> As I said old kernel just works after updating to the new system.
> (I keep kernel backup in 9fat when I attempt kernel update)
> I have confirmed the copied /n/9fat/9pc64 is same as that of created by compile.
> Is that syscall 53 problem?

I'm sorry, my guess was wrong. I don't know what the problem is.

sl



* Re: [9front] stats(1) suicide
  2014-08-08 21:44         ` sl
@ 2014-08-08 22:35           ` arisawa
  2014-08-08 22:57             ` cinap_lenrek
  0 siblings, 1 reply; 11+ messages in thread
From: arisawa @ 2014-08-08 22:35 UTC
  To: 9front

Hello,

>> 	rm -rf /sys/src/*
>> 	and copied the iso to the /sys/src
> 
> This part puzzled me. Why did you remove the existing files and copy all
> the sources from the ISO?

Usually I apply my own update strategy, but I guess such a strategy would not convince you.
So this time I used the most reliable way.
(And I found that this simple strategy is not bad.)

Have you tested my report?

Kenji Arisawa




* Re: [9front] stats(1) suicide
  2014-08-08 22:35           ` arisawa
@ 2014-08-08 22:57             ` cinap_lenrek
  2014-08-09  6:05               ` arisawa
  0 siblings, 1 reply; 11+ messages in thread
From: cinap_lenrek @ 2014-08-08 22:57 UTC
  To: 9front

kenji,

the note bug should not apply to release 9front-3730.5d864bfef728.iso.bz2,
as it was introduced in a later commit. also, cwfs doesn't install any note
handlers so it can't be affected.

> cwfs64x 319: suicide: invalid address 0x1056efee8/16384 in syscall pc=0x22ac56
> cwfs64x 319: suicide: sys: bad address in syscall pc=0x22ac56

i looked up the pc address and it tells me this is the sleep syscall. this
is very odd. sl and others run cwfs64x just fine on the latest pc64 and i've
not seen such an issue yet.

can you verify by running:

acid /amd64/bin/cwfs64x
src(0x22ac56)
asm(sleep)

the last line will disassemble the sleep syscall stub, and should
result in something like:

acid: asm(sleep)
sleep 0x000000000022ac53	MOVQ	BP,a0+0x0(FP)
sleep+0x5 0x000000000022ac58	MOVQ	$0x11,BP
sleep+0xc 0x000000000022ac5f	SYSCALL
sleep+0xe 0x000000000022ac61	RET

--
cinap



* Re: [9front] stats(1) suicide
  2014-08-08 22:57             ` cinap_lenrek
@ 2014-08-09  6:05               ` arisawa
  2014-08-09 15:42                 ` cinap_lenrek
  0 siblings, 1 reply; 11+ messages in thread
From: arisawa @ 2014-08-09  6:05 UTC
  To: 9front

Hello cinap,

the experiment below was run on my 9front terminal, which is started
from the 9front boot loader with
	boot from tcp
which means the kernel is not pxe booted from the file server but loaded
locally from the 9fat partition.
this kernel makes cwfs64x suicide.
the commands, on the other hand, come from the file server.

I recompiled cwfs64x
term% ls -l /amd64/bin/cwfs64x
--rwxrwxr-x M 20 arisawa sys 415466 Aug  9 13:27 /amd64/bin/cwfs64x
term% 

term% cwfs64x -f /dev/sdE0/fscache
cwfs64x 730: suicide: invalid address 0x199ab0e40/16384 in sys call pc=0x22ac64
cwfs64x 730: suicide: sys: bad address in syscall pc=0x22ac64
term% 

as you can see from the acid output, pread() is where the trouble is:
it complains that the read buffer 0x199ab0e40 is invalid.
that address is about 6.8G. (I have 16GB ram on the terminal)

Please examine the pread() syscall.

Kenji Arisawa

term% broke
echo kill>/proc/730/ctl # cwfs64x
term% acid 730
/proc/730/text:amd64 plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/amd64
acid: lstk()
pread(a0=0x4)+0xe /sys/src/libc/9syscall/pread.s:6
wrenread(b=0x0,d=0xb2b29100,c=0x199ab0e40)+0xaa /sys/src/cmd/cwfs/wren.c:95
	r=0x22261d00000000
	dr=0xb2b29170
devread(c=0x199ab0e40,b=0x0)+0x84 /sys/src/cmd/cwfs/sub.c:989
	e=0x219bbd00000000
getbuf(d=0xb2b29100,addr=0x0,flag=0x5)+0x251 /sys/src/cmd/cwfs/iobuf.c:106
	hp=0x12f4038
	p=0x364e3f0
sysinit()+0x642 /sys/src/cmd/cwfs/config.c:571
	cp=0x1b2b28e40
	d=0xb2b29100
	p=0x21eb5d
	fsp=0x300000000
	ep=0x22ad70
	fs=0x138885477e301960
	error=0x21dd510006bcad
main(argc=0x0,argv=0x7ffffeffef88)+0x247 /sys/src/cmd/cwfs/main.c:342
	nets=0x9c3f00000000
	_args=0x40df7c
	_argc=0x66
	ann=0x0
	i=0x225add00009c3f
_main+0x40 /sys/src/libc/amd64/main9.s:15
acid: 

term% dd -if /dev/sdE0/fscache -count 1 >[2]/dev/null | tr -d '\000'
service cwfs
filsys main c(/dev/sdE0/fscache)(/dev/sdE0/fsworm)
filsys other (/dev/sdE0/other)
filsys dump o
noauth
blocksize 16384
daddrbits 64
indirblks 4
dirblks 6
namelen 144
term% 






* Re: [9front] stats(1) suicide
  2014-08-09  6:05               ` arisawa
@ 2014-08-09 15:42                 ` cinap_lenrek
  2014-08-09 22:46                   ` arisawa
  0 siblings, 1 reply; 11+ messages in thread
From: cinap_lenrek @ 2014-08-09 15:42 UTC
  To: 9front

many thanks kenji. i see what's wrong now. the bug is in iobufinit().

	xiop = ialloc(niob * RBUFSIZE, 0);

niob is uint so we get a 32bit result.

iobufinit+0x114 0x000000000021ecd2	MOVL	niob(SB),BP
iobufinit+0x11b 0x000000000021ecd9	SHLL	$0xe,BP				/* multiplication by shift */
iobufinit+0x11e 0x000000000021ecdc	MOVL	BP,BP
iobufinit+0x120 0x000000000021ecde	MOVL	$0x0,0x8(SP)
iobufinit+0x128 0x000000000021ece6	CALL	ialloc(SB)

you must have raised fsmempercent so that it used more than 4g
for buffers, causing the overflow.

i pushed a fix now that does 64bit multiplications on amd64 so it
should work now after pulling and recompiling cwfs and kernel.

--
cinap



* Re: [9front] stats(1) suicide
  2014-08-09 15:42                 ` cinap_lenrek
@ 2014-08-09 22:46                   ` arisawa
  0 siblings, 0 replies; 11+ messages in thread
From: arisawa @ 2014-08-09 22:46 UTC
  To: 9front

thanks cinap, that works.

Kenji Arisawa



