From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <13426df10712171736i60f181aax4fdf2cf57594fa87@mail.gmail.com>
Date: Mon, 17 Dec 2007 17:36:43 -0800
From: "ron minnich"
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Subject: [9fans] a question on APE
Topicbox-Message-UUID: 1c015a2a-ead3-11e9-9d60-3106f5b1d025

We're doing some work here with Andrey's port of ssh2. It *almost*
works. But I'm seeing a stack trace I don't understand.

I can't give you all the details -- it's ssh, therefore it is pretty
awful -- but here is the short form:

There is a proc called fromnet() which has this inner loop:

	for(;;){
		if((n = libssh2_channel_read(c, buf, Bufsize)) > 0)
			write(1, buf, n);
		else
			goto Donenet;
	}

When this proc is entered, ape has forked off two procs to handle the
fd 'c'. From the fromnet function, we see that libssh2_channel_read
does a select. Here is where I get confused. The stk() for the two
procs looks like this:

pread()+0x7 /sys/src/libc/9syscall/pread.s:5
read(fd=0x5,buf=0x110414,n=0x1000)+0x2f /sys/src/libc/9sys/read.c:7
recv(flags=0x0,fd=0x5,a=0x110414,n=0x1000)+0x3e /sys/src/ape/lib/bsd/send.c:30
libssh2_packet_read(session=0x1102f8)+0x176 /usr/bootes/libssh2/libssh2-0.18/src/transport.c:326
libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x2a7 /usr/bootes/libssh2/libssh2-0.18/src/channel.c:1442
fromnet(c=0x114460,s=0x1102f8)+0x2e /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
main(argc=0x2,argv=0xdfffef94)+0x47c /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
_main+0x31 /sys/src/libc/386/main9.s:16

Note the read on fd 5. That's the socket.

Here is the other proc.

_PREAD()+0x7 /sys/src/ape/lib/ap/syscall/_PREAD.s:5
_READ(fd=0x5,buf=0x600003c,n=0x2000)+0x2f /sys/src/ape/lib/ap/plan9/9read.c:10
_copyproc(b=0x6000028,fd=0x5)+0x86 /sys/src/ape/lib/ap/plan9/_buf.c:166
_startbuf(fd=0x5)+0x1dd /sys/src/ape/lib/ap/plan9/_buf.c:107
select(timeout=0xdfffde90,rfds=0xdfffde80,wfds=0x0,efds=0x0,nfds=0x6)+0xe9 /sys/src/ape/lib/ap/plan9/_buf.c:292
libssh2_waitsocket(session=0x1102f8,seconds=0x0)+0x7b /usr/bootes/libssh2/libssh2-0.18/src/packet.c:1054
libssh2_channel_read_ex(channel=0x114460,buflen=0x1000,stream_id=0x0,buf=0xdfffdee8)+0x69 /usr/bootes/libssh2/libssh2-0.18/src/channel.c:1408
fromnet(c=0x114460,s=0x1102f8)+0x2e /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:75
main(argc=0x2,argv=0xdfffef94)+0x47c /usr/bootes/libssh2/libssh2-0.18/clients/ssh2.c:253
_main+0x31 /sys/src/libc/386/main9.s:16

OK, I think this stack is a bit messed up, since I don't see how we
can have the copyproc in the call chain from select(), but ... is it?

I realize there is very little information here, sorry ... here's
what is bothering me. It seems we have two procs hanging on a read on
fd 5. I think the copyproc and some other proc are in conflict but ...
I am unsure. The problems we are seeing might be explained by the
wrong proc grabbing output at the wrong time -- it feels like a race
condition.

Any acid trips we can take to hammer this one down? Anyone ever done a
select on a socket in ape?
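
For reference, the call sequence the two traces show on fd 5 (a select
on the descriptor, then a recv with flags 0) can be sketched in plain
POSIX C. This is only a minimal sketch under stated assumptions: it is
not the libssh2 or APE source, and wait_then_read and WAIT_TIMEOUT are
hypothetical names made up for illustration.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical return code for "nothing readable before the timeout". */
enum { WAIT_TIMEOUT = -2 };

/*
 * wait_then_read is a hypothetical helper, not a libssh2 or APE function.
 * It waits up to `seconds` for fd to become readable, then does one recv.
 * Returns the recv result (bytes read, 0 at EOF, -1 on error),
 * -1 if select fails, or WAIT_TIMEOUT if the timeout expires first.
 */
ssize_t
wait_then_read(int fd, void *buf, size_t len, int seconds)
{
	fd_set rfds;
	struct timeval tv;
	int r;

	/* FD_SET is only defined for descriptors below FD_SETSIZE. */
	assert(fd >= 0 && fd < FD_SETSIZE);

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	tv.tv_sec = seconds;
	tv.tv_usec = 0;

	/* Same shape as the traced call: nfds = fd+1, no write or except sets. */
	r = select(fd + 1, &rfds, NULL, NULL, &tv);
	if(r < 0)
		return -1;
	if(r == 0)
		return WAIT_TIMEOUT;

	/* Same shape as the traced recv: flags = 0. */
	return recv(fd, buf, len, 0);
}
```

On a conventional POSIX system the select here only reports readiness
and the recv is the sole reader. Under APE, the second trace shows
select going through _startbuf and _copyproc, which itself reads fd 5,
so the descriptor ends up with two readers.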