* [9fans] a question on APE
@ 2007-12-18 1:36 ron minnich
2007-12-18 1:44 ` andrey mirtchovski
2007-12-18 7:32 ` Kernel Panic
0 siblings, 2 replies; 5+ messages in thread
From: ron minnich @ 2007-12-18 1:36 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
we're doing some work here with Andrey's port of ssh2. It *almost*
works. But I'm seeing a stack trace I don't understand.
I can't give you all the details -- it's ssh, therefore it is pretty
awful -- but here is the short form: There is a proc called fromnet()
which has this inner loop:
	for(;;){
		if((n = libssh2_channel_read(c, buf, Bufsize)) > 0)
			write(1, buf, n);
		else
			goto Donenet;
	}
When this proc is entered, APE has forked off two procs to handle the
fd 'c'. From the fromnet function, we see that libssh2_channel_read
does a select. Here is where I get confused. The stk() output for the
two procs looks like this:
pread()+0x7 /sys/src/libc/9syscall/pread.s:5
read(fd=0x5,buf=0x110414,n=0x1000)+0x2f /sys/src/libc/9sys/read.c:7
recv(flags=0x0,fd=0x5,a=0x110414,n=0x1000)+0x3e /sys/src/ape/lib/bsd/send.c:30
libssh2_packet_read(session=0x1102f8)+0x176
    /usr/bootes/libssh2/libssh2-0.18/src/transport.c:326
libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x2a7
    /usr/bootes/libssh2/libssh2-0.18/src/channel.c:1442
fromnet(c=0x114460,s=0x1102f8)+0x2e
    /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
main(argc=0x2,argv=0xdfffef94)+0x47c
    /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
_main+0x31 /sys/src/libc/386/main9.s:16
That's the read on fd 5, which is the socket. Here is the other proc.
_PREAD()+0x7 /sys/src/ape/lib/ap/syscall/_PREAD.s:5
_READ(fd=0x5,buf=0x600003c,n=0x2000)+0x2f /sys/src/ape/lib/ap/plan9/9read.c:10
_copyproc(b=0x6000028,fd=0x5)+0x86 /sys/src/ape/lib/ap/plan9/_buf.c:166
_startbuf(fd=0x5)+0x1dd /sys/src/ape/lib/ap/plan9/_buf.c:107
select(timeout=0xdfffde90,rfds=0xdfffde80,wfds=0x0,efds=0x0,nfds=0x6)+0xe9
    /sys/src/ape/lib/ap/plan9/_buf.c:292
libssh2_waitsocket(session=0x1102f8,seconds=0x0)+0x7b
    /usr/bootes/libssh2/libssh2-0.18/src/packet.c:1054
libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x69
    /usr/bootes/libssh2/libssh2-0.18/src/channel.c:1408
fromnet(c=0x114460,s=0x1102f8)+0x2e
    /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
main(argc=0x2,argv=0xdfffef94)+0x47c
    /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
_main+0x31 /sys/src/libc/386/main9.s:16
OK, I think this stack is a bit messed up, since I don't see how we
can have the copyproc in the call chain from select(), but ... is it?
I realize there is very little information here, sorry ... here's what
is bothering me. It seems we have two procs hanging on a read on fd 5.
I think the copyproc and some other proc are in conflict but ... I am
unsure. The problems we are seeing might be explained by the wrong
proc grabbing output at the wrong time -- it feels like a race
condition. Any acid trips we can take to hammer this one down?
Anyone ever done a select on a socket in ape?
* Re: [9fans] a question on APE
2007-12-18 1:36 [9fans] a question on APE ron minnich
@ 2007-12-18 1:44 ` andrey mirtchovski
2007-12-18 7:32 ` Kernel Panic
1 sibling, 0 replies; 5+ messages in thread
From: andrey mirtchovski @ 2007-12-18 1:44 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
> Anyone ever done a select on a socket in ape?
>
the links port does that and it works fine, at least for a while.
the code snippet you gave is suspect, although i don't know how that
relates to the stack trace. libssh2 lacks documentation, but from the
little that i read libssh2_channel_read() can return zero without
receiving EOF from the remote site. one needs to go through
libssh2_channel_eof() or something to that effect to check whether the
other side closed, and the code above doesn't do it (it's my fault, i
hadn't gotten to debugging that part).
then the code needs to do it for stderr too :)
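
A minimal sketch of the loop with the EOF check described above. To keep it self-contained, chan_read() and chan_eof() are hypothetical stand-ins for libssh2_channel_read() and libssh2_channel_eof(), and Chan is a made-up in-memory channel. The retry-on-zero policy is an assumption for illustration, not something taken from the libssh2 sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory channel standing in for a libssh2 channel. */
typedef struct Chan {
	const char *data;   /* bytes the "remote side" will send */
	size_t len;         /* total number of bytes */
	size_t off;         /* how many have been delivered so far */
	int stall;          /* number of upcoming reads that return 0 */
} Chan;

/* Stand-in for libssh2_channel_read(): may return 0 before EOF. */
static int
chan_read(Chan *c, char *buf, size_t n)
{
	size_t left = c->len - c->off;

	if(c->stall > 0){
		c->stall--;
		return 0;
	}
	if(n > left)
		n = left;
	memcpy(buf, c->data + c->off, n);
	c->off += n;
	return (int)n;
}

/* Stand-in for libssh2_channel_eof(): 1 once the remote side is done. */
static int
chan_eof(Chan *c)
{
	return c->stall == 0 && c->off == c->len;
}

/*
 * Copy everything from the channel into out (capacity cap).
 * A zero-length read ends the loop only if the channel reports EOF.
 * Returns the number of bytes copied, or -1 on error.
 */
static int
drain(Chan *c, char *out, size_t cap)
{
	char buf[4];
	size_t total = 0;
	int n;

	assert(c != NULL && out != NULL);
	for(;;){
		n = chan_read(c, buf, sizeof buf);
		if(n < 0)
			return -1;
		if(n == 0){
			if(chan_eof(c))
				break;
			continue;	/* nothing yet, but not EOF: try again */
		}
		if(total + (size_t)n > cap)
			return -1;
		memcpy(out + total, buf, (size_t)n);
		total += (size_t)n;
	}
	return (int)total;
}
```

A real client would wait for the socket to become readable rather than spin on the zero return; the sketch only shows where the EOF test belongs. The same test would apply to the stderr stream.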
* Re: [9fans] a question on APE
2007-12-18 1:36 [9fans] a question on APE ron minnich
2007-12-18 1:44 ` andrey mirtchovski
@ 2007-12-18 7:32 ` Kernel Panic
2007-12-18 7:54 ` Kernel Panic
1 sibling, 1 reply; 5+ messages in thread
From: Kernel Panic @ 2007-12-18 7:32 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
ron minnich wrote:
>ok, I think this stack is a bit messed up, since I don't see how we
>can have the coyproc in the call chain from select(), but ... is it?
Plan 9 has no select functionality. APE emulates select by forking a
child proc that reads an fd and fills a buffer in a shared memory area.
read() should then pick up the data from that buffer and wake the reader
proc if it is sleeping because the buffer had filled up. select() starts
such a reader proc (_startbuf()) if the fd is not already buffered, and
then checks whether the buffer has data available, so the stack trace
looks valid to me.
Maybe buffered file descriptors don't work with the recv() call and are
only implemented for read()? I think you should find some kind of switch
in read() that checks whether the fd is buffered and then calls some
_buf.c function that copies the data from the buffer.
Maybe this is missing for recv()?
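
The scheme described above can be modelled in a few lines. This is a single-process sketch under stated assumptions, not the APE code: Fdbuf, copy_in(), sel_readable() and buf_read() are hypothetical names, and the copy proc is replaced by a plain function call that deposits data into the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { Nbuf = 64 };

/* Hypothetical per-fd state: one buffer filled by a "copy proc". */
typedef struct Fdbuf {
	int buffered;        /* has a copy proc been started for this fd? */
	char data[Nbuf];     /* bytes the copy proc has read so far */
	size_t n;            /* number of valid bytes in data */
} Fdbuf;

/* What the copy proc does: append bytes it read from the real fd. */
static size_t
copy_in(Fdbuf *f, const char *p, size_t len)
{
	if(len > Nbuf - f->n)
		len = Nbuf - f->n;	/* buffer full: the real proc would sleep */
	memcpy(f->data + f->n, p, len);
	f->n += len;
	return len;
}

/* select() side: make sure the fd is buffered, then report readiness. */
static int
sel_readable(Fdbuf *f)
{
	if(!f->buffered)
		f->buffered = 1;	/* stands in for starting the copy proc */
	return f->n > 0;
}

/* read() side: a buffered fd must be served from the buffer. */
static long
buf_read(Fdbuf *f, char *dst, size_t len)
{
	assert(f != NULL && dst != NULL);
	if(!f->buffered)
		return -1;	/* a real read() would go to the system call here */
	if(len > f->n)
		len = f->n;
	memcpy(dst, f->data, len);
	memmove(f->data, f->data + len, f->n - len);
	f->n -= len;
	return (long)len;
}
```

The point of the model is the check in buf_read(): any entry point that reads the descriptor without consulting the buffered flag competes with the copy proc for the same bytes.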
* Re: [9fans] a question on APE
2007-12-18 7:32 ` Kernel Panic
@ 2007-12-18 7:54 ` Kernel Panic
2007-12-18 17:38 ` ron minnich
0 siblings, 1 reply; 5+ messages in thread
From: Kernel Panic @ 2007-12-18 7:54 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
Ahh... just looked at the code...
OK, as I expected: recv() calls a different read(), the one from
/sys/src/libc/9sys/read.c. It would all work if recv() called the one
from /sys/src/ape/lib/ap/plan9/read.c instead.
I guess you could work around it by using read() instead of recv() in
the ssh code, but the right thing is to fix APE and have recv() call
the read() from ap/plan9/read.c.
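
The suspected failure mode is that once a copy proc has drained the descriptor into its buffer, a read that goes straight to the descriptor no longer sees that data. A POSIX sketch of that situation follows, using a nonblocking pipe in place of the socket. drain_to_buffer() and demo() are hypothetical, and the copy proc is just a function call. It illustrates the hazard only; it does not show that recv() actually behaves this way.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Stand-in for the copy proc: move whatever is pending on fd into buf. */
static ssize_t
drain_to_buffer(int fd, char *buf, size_t cap)
{
	assert(cap > 0);
	return read(fd, buf, cap);
}

/*
 * Returns 0 if the demo behaves as expected:
 * data taken by the "copy proc" is invisible to a direct read.
 */
static int
demo(void)
{
	int p[2];
	char shared[32], direct[32];
	ssize_t nbuf, ndirect;
	int saved;

	if(pipe(p) < 0)
		return -1;
	/* nonblocking, so the direct read fails instead of hanging */
	if(fcntl(p[0], F_SETFL, O_NONBLOCK) < 0)
		return -1;
	if(write(p[1], "packet", 6) != 6)
		return -1;

	nbuf = drain_to_buffer(p[0], shared, sizeof shared);	/* copy proc wins */
	ndirect = read(p[0], direct, sizeof direct);		/* bypasses the buffer */
	saved = errno;

	close(p[0]);
	close(p[1]);

	if(nbuf != 6 || memcmp(shared, "packet", 6) != 0)
		return -1;
	if(ndirect != -1 || (saved != EAGAIN && saved != EWOULDBLOCK))
		return -1;
	return 0;
}
```

With a blocking descriptor the direct read would sleep rather than fail, which would look like two procs sitting in a read on the same fd.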
* Re: [9fans] a question on APE
2007-12-18 7:54 ` Kernel Panic
@ 2007-12-18 17:38 ` ron minnich
0 siblings, 0 replies; 5+ messages in thread
From: ron minnich @ 2007-12-18 17:38 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Dec 17, 2007 11:54 PM, Kernel Panic <cinap_lenrek@gmx.de> wrote:
> Ahh... just looked at the code...
> Ok, as i expected... recv() calls a different read() from
> /sys/src/libc/9sys/read.c. It will all work if
> recv() would call the thing from this one:
> /sys/src/ape/lib/ap/plan9/read.c.
I'm not seeing that. I would be happy if you are right but I can't confirm it.
I run acid on the binary:
recv 0x000cd7b6 SUBL $0x10,SP
recv+0x3 0x000cd7b9 MOVL flags+0xc(FP),AX
recv+0x7 0x000cd7bd ANDL $0x1,AX
recv+0xa 0x000cd7c0 CMPL AX,$0x0
recv+0xd 0x000cd7c3 JEQ recv+0x22(SB)
recv+0xf 0x000cd7c5 MOVL $0x29,errno(SB)
recv+0x19 0x000cd7cf MOVL $0xffffffff,AX
recv+0x1e 0x000cd7d4 ADDL $0x10,SP
recv+0x21 0x000cd7d7 RET
recv+0x22 0x000cd7d8 MOVL fd+0x0(FP),CX
recv+0x26 0x000cd7dc MOVL CX,0x0(SP)
recv+0x29 0x000cd7df MOVL a+0x4(FP),CX
recv+0x2d 0x000cd7e3 MOVL CX,0x4(SP)
recv+0x31 0x000cd7e7 MOVL n+0x8(FP),CX
recv+0x35 0x000cd7eb MOVL CX,0x8(SP)
recv+0x39 0x000cd7ef CALL read(SB)
recv+0x3e 0x000cd7f4 ADDL $0x10,SP
recv+0x41 0x000cd7f7 RET
so it calls read.
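
Read back into C, the recv listing amounts to roughly the following. This is a reconstruction from the disassembly, not the contents of send.c: the function is renamed recv_sketch() to avoid clashing with the real recv(), and the constants are the raw values from the listing (flag bit 0x1, errno 0x29) rather than symbolic names.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Reconstructed from the recv listing:
 *   recv+0x7   ANDL $0x1,AX           test bit 0 of flags
 *   recv+0xf   MOVL $0x29,errno(SB)   reject the call if it is set
 *   recv+0x39  CALL read(SB)          otherwise hand off to read()
 */
static int
recv_sketch(int fd, void *a, int n, int flags)
{
	assert(n >= 0);
	if(flags & 0x1){
		errno = 0x29;
		return -1;
	}
	return (int)read(fd, a, (size_t)n);
}
```

Nothing else happens between the flag test and the call, so whatever buffering applies to recv() is whatever the linked read() provides.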
Read is this:
read 0x000c3834 SUBL $0x28,SP
read+0x3 0x000c3837 MOVL nbytes+0x8(FP),DI
read+0x7 0x000c383b MOVL buf+0x4(FP),SI
read+0xb 0x000c383f MOVL d+0x0(FP),BX
read+0xf 0x000c3843 CMPL BX,$0x0
read+0x12 0x000c3846 JLT read+0x19(SB)
read+0x14 0x000c3848 CMPL BX,$0x60
read+0x17 0x000c384b JLT read+0x2c(SB)
read+0x19 0x000c384d MOVL $0x4,errno(SB)
read+0x23 0x000c3857 MOVL $0xffffffff,AX
read+0x28 0x000c385c ADDL $0x28,SP
read+0x2b 0x000c385f RET
read+0x2c 0x000c3860 LEAL 0x0(BX)(BX*4),CX
read+0x2f 0x000c3863 SHLL $0x2,CX
read+0x32 0x000c3866 LEAL _fdinfo(SB)(CX*1),AX
read+0x39 0x000c386d MOVL 0x0(AX),AX
read+0x3b 0x000c386f ANDL $0x2,AX
read+0x3e 0x000c3872 CMPL AX,$0x0
read+0x41 0x000c3875 JEQ read+0x19(SB)
read+0x43 0x000c3877 CMPL DI,$0x0
read+0x46 0x000c387a JHI read+0x4e(SB)
read+0x48 0x000c387c XORL AX,AX
read+0x4a 0x000c387e ADDL $0x28,SP
read+0x4d 0x000c3881 RET
read+0x4e 0x000c3882 CMPL SI,$0x0
read+0x51 0x000c3885 JNE read+0x66(SB)
read+0x53 0x000c3887 MOVL $0x9,errno(SB)
read+0x5d 0x000c3891 MOVL $0xffffffff,AX
read+0x62 0x000c3896 ADDL $0x28,SP
read+0x65 0x000c3899 RET
which is the ape version. There is only one read symbol in the binary,
and it's a T.
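
For reference, the part of the read listing shown above corresponds to roughly this C. It is a sketch inferred from the instructions alone: fdtab stands in for _fdinfo, the 20-byte entry size comes from the index scaling (d*5, then shifted left by 2), and the names Fdent, Fopen, Ebadfd, Ebadbuf and read_checks are made up for illustration. Only offset 0 of the entry is visible in the listing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum {
	Nfd = 0x60,      /* CMPL BX,$0x60: descriptors must be below 96 */
	Fopen = 0x2,     /* ANDL $0x2,AX: flag bit tested before use (name assumed) */
	Ebadfd = 4,      /* MOVL $0x4,errno(SB) */
	Ebadbuf = 9      /* MOVL $0x9,errno(SB) */
};

/* Table entry as inferred from the listing; layout beyond flags is unknown. */
typedef struct Fdent {
	unsigned int flags;   /* loaded from offset 0 */
	unsigned int pad[4];  /* placeholder for fields not seen in the listing */
} Fdent;

static Fdent fdtab[Nfd];	/* stands in for _fdinfo */

/*
 * Argument checks at the top of read(), up to read+0x65.
 * Returns -1 with errno set, 0 for a zero-length read,
 * or 1 to mean "checks passed, continue at read+0x66".
 */
static int
read_checks(int d, void *buf, size_t nbytes)
{
	/* matches the 20-byte stride, assuming 4-byte unsigned int */
	assert(sizeof(Fdent) == 20);
	if(d < 0 || d >= Nfd || (fdtab[d].flags & Fopen) == 0){
		errno = Ebadfd;
		return -1;
	}
	if(nbytes == 0)
		return 0;
	if(buf == NULL){
		errno = Ebadbuf;
		return -1;
	}
	return 1;
}
```

The per-descriptor table lookup is what distinguishes this read from a bare system call stub; the listing stops before the point where any buffered path would be chosen.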
So I am not convinced the recv is calling the wrong thing. That said,
I'm still going to change it in source to see what happens :-)
ron