9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] a question on APE
@ 2007-12-18  1:36 ron minnich
  2007-12-18  1:44 ` andrey mirtchovski
  2007-12-18  7:32 ` Kernel Panic
  0 siblings, 2 replies; 5+ messages in thread
From: ron minnich @ 2007-12-18  1:36 UTC
  To: Fans of the OS Plan 9 from Bell Labs

We're doing some work here with Andrey's port of ssh2. It *almost*
works. But I'm seeing a stack trace I don't understand.

I can't give you all the details -- it's ssh, therefore it is pretty
awful -- but here is the short form: There is a proc called fromnet()
which has this inner loop:
	for(;;){
		if((n = libssh2_channel_read(c, buf, Bufsize)) > 0)
			write(1, buf, n);
		else
			goto Donenet;
	}

When this proc is entered, APE has forked off two procs to handle the
fd 'c'. From the fromnet function, we see that libssh2_channel_read
does a select. Here is where I get confused. The stk() for the two
procs looks like this:
pread()+0x7 /sys/src/libc/9syscall/pread.s:5
read(fd=0x5,buf=0x110414,n=0x1000)+0x2f /sys/src/libc/9sys/read.c:7
recv(flags=0x0,fd=0x5,a=0x110414,n=0x1000)+0x3e /sys/src/ape/lib/bsd/send.c:30
libssh2_packet_read(session=0x1102f8)+0x176
/usr/bootes/libssh2/libssh2-0.18/src/transport.c:326
libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x2a7
/usr/bootes/libssh2/libssh2-0.18/src/channel.c:1442
fromnet(c=0x114460,s=0x1102f8)+0x2e
/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
main(argc=0x2,argv=0xdfffef94)+0x47c
/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
_main+0x31 /sys/src/libc/386/main9.s:16

The read is on fd 5. That's the socket. Here is the other proc.

_PREAD()+0x7 /sys/src/ape/lib/ap/syscall/_PREAD.s:5
_READ(fd=0x5,buf=0x600003c,n=0x2000)+0x2f /sys/src/ape/lib/ap/plan9/9read.c:10
_copyproc(b=0x6000028,fd=0x5)+0x86 /sys/src/ape/lib/ap/plan9/_buf.c:166
_startbuf(fd=0x5)+0x1dd /sys/src/ape/lib/ap/plan9/_buf.c:107
select(timeout=0xdfffde90,rfds=0xdfffde80,wfds=0x0,efds=0x0,nfds=0x6)+0xe9
/sys/src/ape/lib/ap/plan9/_buf.c:292
libssh2_waitsocket(session=0x1102f8,seconds=0x0)+0x7b
/usr/bootes/libssh2/libssh2-0.18/src/packet.c:1054
libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x69
/usr/bootes/libssh2/libssh2-0.18/src/channel.c:1408
fromnet(c=0x114460,s=0x1102f8)+0x2e
/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
main(argc=0x2,argv=0xdfffef94)+0x47c
/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
_main+0x31 /sys/src/libc/386/main9.s:16

ok, I think this stack is a bit messed up, since I don't see how we
can have the copyproc in the call chain from select(), but ... is it?

I realize there is very little information here, sorry ... here's what
is bothering me. It seems we have two procs hanging on a read on fd 5.
I think the copyproc and some other proc are in conflict but ... I am
unsure. The problems we are seeing might be explained by the wrong
proc grabbing output at the wrong time -- it feels like a race
condition. Any acid trips we can take to hammer this one down?

Anyone ever done a select on a socket in APE?



* Re: [9fans] a question on APE
From: andrey mirtchovski @ 2007-12-18  1:44 UTC
  To: Fans of the OS Plan 9 from Bell Labs

> Anyone ever done a select on a socket in ape?
>

the links port does that and it works fine, at least for a while.

the code snippet you gave is suspect, although i don't know how that
relates to the stack trace. libssh2 lacks documentation, but from the
little that i read libssh2_channel_read() can return zero without
receiving EOF from the remote site. one needs to go through
libssh2_channel_eof() or something to that effect to check whether the
other side closed, and the code above doesn't do it (it's my fault, i
hadn't gotten to debugging that part).

then the code needs to do it for stderr too :)
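
A minimal sketch of the loop shape described above, assuming that a zero return only means "nothing to hand over right now" and that end of stream has to be asked for separately. chan_read() and chan_eof() are hypothetical stand-ins for libssh2_channel_read() and libssh2_channel_eof(), stubbed with a scripted channel so the sketch runs without the library; the real calls take a LIBSSH2_CHANNEL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a libssh2 channel: a scripted list of read results. */
typedef struct Chan {
	const char **chunks;	/* "" simulates a zero-length read */
	int nchunks;
	int next;
} Chan;

/* Stand-in for libssh2_channel_read(); chunks are assumed to fit in buf. */
static long
chan_read(Chan *c, char *buf, size_t len)
{
	size_t n;

	if(c->next >= c->nchunks)
		return 0;
	n = strlen(c->chunks[c->next++]);
	if(n > len)
		n = len;
	memcpy(buf, c->chunks[c->next-1], n);
	return (long)n;
}

/* Stand-in for libssh2_channel_eof(): true once every chunk was consumed. */
static int
chan_eof(Chan *c)
{
	return c->next >= c->nchunks;
}

/* Copy until real EOF or error; a zero read alone does not end the loop. */
static long
drain(Chan *c, char *out, size_t cap)
{
	char buf[64];
	long n, total = 0;

	assert(out != NULL);
	for(;;){
		n = chan_read(c, buf, sizeof buf);
		if(n < 0)
			return -1;
		if(n == 0){
			if(chan_eof(c))
				break;
			continue;	/* real code would wait here instead of spinning */
		}
		if((size_t)(total + n) > cap)
			return -1;
		memcpy(out + total, buf, (size_t)n);
		total += n;
	}
	return total;
}
```

The same shape would be repeated for the stderr stream. The original loop's "else goto Donenet" corresponds to treating every zero read as EOF, which is the case this sketch avoids.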



* Re: [9fans] a question on APE
From: Kernel Panic @ 2007-12-18  7:32 UTC
  To: Fans of the OS Plan 9 from Bell Labs

ron minnich wrote:

>we're doing some work here with Andrey's port of ssh2. It *almost*
>works. But I'm seeing a stack trace I don't understand.
>
>I can't give you all the details -- it's ssh, therefore it is pretty
>awful -- but here is the short form: There is a proc called fromnet()
>which has this inner loop:
>	for(;;){
>		if((n = libssh2_channel_read(c, buf, Bufsize)) > 0)
>			write(1, buf, n);
>		else
>			goto Donenet;
>	}
>
>When this proc is entered, ape has forked off two procs to handle the
>fd 'c'. From the fromnet function, we see the libssh2_channel_read
>does a select. here is where I get confused. The stk() for the two
>procs looks like this:
>pread()+0x7 /sys/src/libc/9syscall/pread.s:5
>read(fd=0x5,buf=0x110414,n=0x1000)+0x2f /sys/src/libc/9sys/read.c:7
>recv(flags=0x0,fd=0x5,a=0x110414,n=0x1000)+0x3e /sys/src/ape/lib/bsd/send.c:30
>libssh2_packet_read(session=0x1102f8)+0x176
>/usr/bootes/libssh2/libssh2-0.18/src/transport.c:326
>libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x2a7
>/usr/bootes/libssh2/libssh2-0.18/src/channel.c:1442
>fromnet(c=0x114460,s=0x1102f8)+0x2e
>/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
>main(argc=0x2,argv=0xdfffef94)+0x47c
>/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
>_main+0x31 /sys/src/libc/386/main9.s:16
>
>The read is on fd 5. That's the socket. Here is the other proc.
>
>_PREAD()+0x7 /sys/src/ape/lib/ap/syscall/_PREAD.s:5
>_READ(fd=0x5,buf=0x600003c,n=0x2000)+0x2f /sys/src/ape/lib/ap/plan9/9read.c:10
>_copyproc(b=0x6000028,fd=0x5)+0x86 /sys/src/ape/lib/ap/plan9/_buf.c:166
>_startbuf(fd=0x5)+0x1dd /sys/src/ape/lib/ap/plan9/_buf.c:107
>select(timeout=0xdfffde90,rfds=0xdfffde80,wfds=0x0,efds=0x0,nfds=0x6)+0xe9
>/sys/src/ape/lib/ap/plan9/_buf.c:292
>libssh2_waitsocket(session=0x1102f8,seconds=0x0)+0x7b
>/usr/bootes/libssh2/libssh2-0.18/src/packet.c:1054
>libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x69
>/usr/bootes/libssh2/libssh2-0.18/src/channel.c:1408
>fromnet(c=0x114460,s=0x1102f8)+0x2e
>/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
>main(argc=0x2,argv=0xdfffef94)+0x47c
>/usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
>_main+0x31 /sys/src/libc/386/main9.s:16
>
>ok, I think this stack is a bit messed up, since I don't see how we
>can have the copyproc in the call chain from select(), but ... is it?
>
Plan 9 has no select functionality. Select is emulated in APE by forking
a child proc that reads an fd and fills a buffer (in a shared memory
area). Read() should then pick up the data from the buffer and wake up
the reader proc if it sleeps (because the buffer got filled up).
Select() will start up such a reader proc (_startbuf()) if the fd is not
already "buffered" and then check if the buffer has data available, so
the stack trace looks valid to me.

Maybe the buffered file descriptors don't work with the recv() call and
are only implemented for read()? I think you should find some kind of
switch in read() that checks if the fd is buffered and then calls some
_buf.c function that copies the data from the buffer. Maybe this is
missing for recv()?
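
The mechanism described above can be modelled roughly as follows. This is a single-process sketch on POSIX, not the APE code: Fdbuf, fdtab, copystep(), bufread() and readable() are hypothetical names, the copy process is reduced to one function call, and an empty buffer simply returns 0 where real code would sleep until data arrives.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

enum { Nfd = 16, Bufsize = 256, Buffered = 1 };	/* hypothetical names and sizes */

/* Per-fd state: a flag plus the data a copy process has already pulled in. */
typedef struct Fdbuf {
	int flags;
	char data[Bufsize];
	size_t n;	/* bytes waiting in data */
} Fdbuf;

static Fdbuf fdtab[Nfd];

/* One step of the copy process: move whatever is readable into the buffer. */
static long
copystep(int fd)
{
	Fdbuf *b;
	ssize_t n;

	assert(fd >= 0 && fd < Nfd);
	b = &fdtab[fd];
	n = read(fd, b->data + b->n, Bufsize - b->n);
	if(n > 0){
		b->n += (size_t)n;
		b->flags |= Buffered;
	}
	return n;
}

/* The switch: a buffered fd is served from memory, others go to the kernel. */
static long
bufread(int fd, void *dst, size_t len)
{
	Fdbuf *b;

	assert(fd >= 0 && fd < Nfd);
	b = &fdtab[fd];
	if(!(b->flags & Buffered))
		return read(fd, dst, len);
	if(len > b->n)
		len = b->n;
	memcpy(dst, b->data, len);
	memmove(b->data, b->data + len, b->n - len);
	b->n -= len;
	return (long)len;
}

/* What a select() emulation needs to know: is data waiting in the buffer? */
static int
readable(int fd)
{
	assert(fd >= 0 && fd < Nfd);
	return fdtab[fd].n > 0;
}
```

The point of the model is that once an fd is marked buffered, every reader has to go through bufread(); anything that reads the fd directly competes with the copy process for the same bytes.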




* Re: [9fans] a question on APE
From: Kernel Panic @ 2007-12-18  7:54 UTC
  To: Fans of the OS Plan 9 from Bell Labs

Kernel Panic wrote:

> ron minnich wrote:
>
>> we're doing some work here with Andrey's port of ssh2. It *almost*
>> works. But I'm seeing a stack trace I don't understand.
>>
>> I can't give you all the details -- it's ssh, therefore it is pretty
>> awful -- but here is the short form: There is a proc called fromnet()
>> which has this inner loop:
>>     for(;;){
>>         if((n = libssh2_channel_read(c, buf, Bufsize)) > 0)
>>             write(1, buf, n);
>>         else
>>             goto Donenet;
>>     }
>>
>> When this proc is entered, ape has forked off two procs to handle the
>> fd 'c'. From the fromnet function, we see the libssh2_channel_read
>> does a select. here is where I get confused. The stk() for the two
>> procs looks like this:
>> pread()+0x7 /sys/src/libc/9syscall/pread.s:5
>> read(fd=0x5,buf=0x110414,n=0x1000)+0x2f /sys/src/libc/9sys/read.c:7
>> recv(flags=0x0,fd=0x5,a=0x110414,n=0x1000)+0x3e /sys/src/ape/lib/bsd/send.c:30
>> libssh2_packet_read(session=0x1102f8)+0x176
>> /usr/bootes/libssh2/libssh2-0.18/src/transport.c:326
>> libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x2a7
>> /usr/bootes/libssh2/libssh2-0.18/src/channel.c:1442
>> fromnet(c=0x114460,s=0x1102f8)+0x2e
>> /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
>> main(argc=0x2,argv=0xdfffef94)+0x47c
>> /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
>> _main+0x31 /sys/src/libc/386/main9.s:16
>>
>> The read is on fd 5. That's the socket. Here is the other proc.
>>
>> _PREAD()+0x7 /sys/src/ape/lib/ap/syscall/_PREAD.s:5
>> _READ(fd=0x5,buf=0x600003c,n=0x2000)+0x2f /sys/src/ape/lib/ap/plan9/9read.c:10
>> _copyproc(b=0x6000028,fd=0x5)+0x86 /sys/src/ape/lib/ap/plan9/_buf.c:166
>> _startbuf(fd=0x5)+0x1dd /sys/src/ape/lib/ap/plan9/_buf.c:107
>> select(timeout=0xdfffde90,rfds=0xdfffde80,wfds=0x0,efds=0x0,nfds=0x6)+0xe9
>> /sys/src/ape/lib/ap/plan9/_buf.c:292
>> libssh2_waitsocket(session=0x1102f8,seconds=0x0)+0x7b
>> /usr/bootes/libssh2/libssh2-0.18/src/packet.c:1054
>> libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x69
>> /usr/bootes/libssh2/libssh2-0.18/src/channel.c:1408
>> fromnet(c=0x114460,s=0x1102f8)+0x2e
>> /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
>> main(argc=0x2,argv=0xdfffef94)+0x47c
>> /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
>> _main+0x31 /sys/src/libc/386/main9.s:16
>>
>> ok, I think this stack is a bit messed up, since I don't see how we
>> can have the copyproc in the call chain from select(), but ... is it?
>

Ahh... just looked at the code...
Ok, as I expected... recv() calls a different read(), the one from
/sys/src/libc/9sys/read.c. It would all work if recv() called the one
from /sys/src/ape/lib/ap/plan9/read.c instead.

I guess you could work around it by using read() instead of recv() in
the ssh code, but the right thing is to fix APE and have recv() call
the read() from ap/plan9/read.c.
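
To illustrate what would go wrong if one path really did bypass the buffer, here is a small POSIX sketch, assuming a copy process that has already taken bytes off the fd. The names stash, copier(), aware_read() and raw_read() are hypothetical; raw_read() plays the role of a read that goes straight to the kernel, aware_read() the role of a read that knows about the buffer.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Data the hypothetical copy process has already taken off the fd. */
static char stash[64];
static size_t nstash;

/* Copy process step: drain the fd into the stash. */
static long
copier(int fd)
{
	ssize_t n;

	assert(fd >= 0);
	n = read(fd, stash + nstash, sizeof stash - nstash);
	if(n > 0)
		nstash += (size_t)n;
	return n;
}

/* Buffer-aware path: hand out stashed bytes first, in arrival order. */
static long
aware_read(int fd, char *dst, size_t len)
{
	if(nstash == 0)
		return read(fd, dst, len);
	if(len > nstash)
		len = nstash;
	memcpy(dst, stash, len);
	memmove(stash, stash + len, nstash - len);
	nstash -= len;
	return (long)len;
}

/* Bypassing path: goes straight to the kernel and never sees the stash. */
static long
raw_read(int fd, char *dst, size_t len)
{
	return read(fd, dst, len);
}
```

If "AAAA" arrives and the copier takes it, and then "BBBB" arrives, raw_read() returns "BBBB" while "AAAA" is still sitting in the stash. The caller sees the stream out of order, which would look like a race from the outside.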




* Re: [9fans] a question on APE
From: ron minnich @ 2007-12-18 17:38 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Dec 17, 2007 11:54 PM, Kernel Panic <cinap_lenrek@gmx.de> wrote:

> Ahh... just looked at the code...
> Ok, as i expected... recv() calls a different read() from
> /sys/src/libc/9sys/read.c. It will all work if
> recv() would call the thing from this one:
> /sys/src/ape/lib/ap/plan9/read.c.

I'm not seeing that. I would be happy if you are right but I can't confirm it.

I run acid on the binary:
recv 0x000cd7b6	SUBL	$0x10,SP
recv+0x3 0x000cd7b9	MOVL	flags+0xc(FP),AX
recv+0x7 0x000cd7bd	ANDL	$0x1,AX
recv+0xa 0x000cd7c0	CMPL	AX,$0x0
recv+0xd 0x000cd7c3	JEQ	recv+0x22(SB)
recv+0xf 0x000cd7c5	MOVL	$0x29,errno(SB)
recv+0x19 0x000cd7cf	MOVL	$0xffffffff,AX
recv+0x1e 0x000cd7d4	ADDL	$0x10,SP
recv+0x21 0x000cd7d7	RET
recv+0x22 0x000cd7d8	MOVL	fd+0x0(FP),CX
recv+0x26 0x000cd7dc	MOVL	CX,0x0(SP)
recv+0x29 0x000cd7df	MOVL	a+0x4(FP),CX
recv+0x2d 0x000cd7e3	MOVL	CX,0x4(SP)
recv+0x31 0x000cd7e7	MOVL	n+0x8(FP),CX
recv+0x35 0x000cd7eb	MOVL	CX,0x8(SP)
recv+0x39 0x000cd7ef	CALL	read(SB)
recv+0x3e 0x000cd7f4	ADDL	$0x10,SP
recv+0x41 0x000cd7f7	RET

so it calls read.

Read is this:
read 0x000c3834	SUBL	$0x28,SP
read+0x3 0x000c3837	MOVL	nbytes+0x8(FP),DI
read+0x7 0x000c383b	MOVL	buf+0x4(FP),SI
read+0xb 0x000c383f	MOVL	d+0x0(FP),BX
read+0xf 0x000c3843	CMPL	BX,$0x0
read+0x12 0x000c3846	JLT	read+0x19(SB)
read+0x14 0x000c3848	CMPL	BX,$0x60
read+0x17 0x000c384b	JLT	read+0x2c(SB)
read+0x19 0x000c384d	MOVL	$0x4,errno(SB)
read+0x23 0x000c3857	MOVL	$0xffffffff,AX
read+0x28 0x000c385c	ADDL	$0x28,SP
read+0x2b 0x000c385f	RET
read+0x2c 0x000c3860	LEAL	0x0(BX)(BX*4),CX
read+0x2f 0x000c3863	SHLL	$0x2,CX
read+0x32 0x000c3866	LEAL	_fdinfo(SB)(CX*1),AX
read+0x39 0x000c386d	MOVL	0x0(AX),AX
read+0x3b 0x000c386f	ANDL	$0x2,AX
read+0x3e 0x000c3872	CMPL	AX,$0x0
read+0x41 0x000c3875	JEQ	read+0x19(SB)
read+0x43 0x000c3877	CMPL	DI,$0x0
read+0x46 0x000c387a	JHI	read+0x4e(SB)
read+0x48 0x000c387c	XORL	AX,AX
read+0x4a 0x000c387e	ADDL	$0x28,SP
read+0x4d 0x000c3881	RET
read+0x4e 0x000c3882	CMPL	SI,$0x0
read+0x51 0x000c3885	JNE	read+0x66(SB)
read+0x53 0x000c3887	MOVL	$0x9,errno(SB)
read+0x5d 0x000c3891	MOVL	$0xffffffff,AX
read+0x62 0x000c3896	ADDL	$0x28,SP
read+0x65 0x000c3899	RET

which is the APE version. There is only one read symbol in the binary,
and it's a T (text) symbol.

So I am not convinced the recv is calling the wrong thing. That said,
I'm still going to change it in source to see what happens :-)
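
Read back into C, the two listings above have roughly the following shape. This is a reconstruction from the disassembly only, not the APE source: Fdent, fdtab, Isopen, Oob, model_recv() and model_read() are hypothetical names, the constants (0x1, 0x2, 0x60 and the errno values 0x29, 4 and 9) are copied from the instructions, the 20-byte entry size is inferred from the LEAL/SHLL pair assuming 4-byte ints, and the read listing stops before the part that actually fetches data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum {
	Nfd = 0x60,	/* CMPL BX,$0x60 */
	Isopen = 0x2,	/* ANDL $0x2,AX on the first word of the entry */
	Oob = 0x1,	/* ANDL $0x1,AX on recv's flags; assumed to be MSG_OOB */
};

/* 20 bytes per entry, matching index*5 then <<2; only flags is used here. */
typedef struct Fdent {
	int flags;
	int pad[4];
} Fdent;

static Fdent fdtab[Nfd];

/* Argument checks performed by read up to read+0x65. */
static long
model_read(int d, void *buf, size_t nbytes)
{
	if(d < 0 || d >= Nfd || !(fdtab[d].flags & Isopen)){
		errno = 4;	/* value from MOVL $0x4,errno(SB) */
		return -1;
	}
	if(nbytes == 0)	/* JHI is an unsigned test, so only 0 returns here */
		return 0;
	if(buf == NULL){
		errno = 9;	/* value from MOVL $0x9,errno(SB) */
		return -1;
	}
	return (long)nbytes;	/* placeholder: the listing ends before the real transfer */
}

/* recv rejects flag bit 0x1 and otherwise forwards its arguments to read. */
static long
model_recv(int fd, void *a, size_t n, int flags)
{
	if(flags & Oob){
		errno = 0x29;	/* value from MOVL $0x29,errno(SB) */
		return -1;
	}
	return model_read(fd, a, n);
}
```

The per-fd table lookup and flag test are what suggest this read is the APE one; whether it then takes the buffered path for fd 5 is decided in the part of the function that the listing does not show.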

ron

