* Re: [9fans] 9vx (is this the right list)? import issue [not found] <<dd6fe68a0909222011u4243953dged01d77ecdc93e46@mail.gmail.com> @ 2009-09-23 3:17 ` erik quanstrom 2009-09-23 4:11 ` Russ Cox 0 siblings, 1 reply; 29+ messages in thread From: erik quanstrom @ 2009-09-23 3:17 UTC (permalink / raw) To: 9fans On Tue Sep 22 23:12:27 EDT 2009, rsc@swtch.com wrote: > The extra tracking that has been proposed is unnecessary, > and waiting for the Rflush doesn't make sense. The assumption > is that the Rflush isn't ever going to arrive, because the connection > is dead. what do you mean by "dead"? i/o to the same channel works fine. - erik ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-23 3:17 ` [9fans] 9vx (is this the right list)? import issue erik quanstrom @ 2009-09-23 4:11 ` Russ Cox 0 siblings, 0 replies; 29+ messages in thread From: Russ Cox @ 2009-09-23 4:11 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Tuesday, September 22, 2009, erik quanstrom <quanstro@quanstro.net> wrote: > On Tue Sep 22 23:12:27 EDT 2009, rsc@swtch.com wrote: >> The extra tracking that has been proposed is unnecessary, >> and waiting for the Rflush doesn't make sense. The assumption >> is that the Rflush isn't ever going to arrive, because the connection >> is dead. > > what do you mean by "dead"? i/o to the same channel works > fine. I mean that the code as written is assuming that if a read or write errors out, it can only happen for one of two reasons: 1) there was an interrupt note, in which case strcmp(error, Eintr) == 0 2) there has been an error on the 9P connection, in which case strcmp(error, Eintr) != 0 and the connection will never work again. My suggestion is to enforce #2: if a non-interrupt error happens, mark the connection so that the kernel won't even try to use it again. Separately, you might investigate what error is happening that violates the assumption above. In 9vx, it is easy: case #1 happened but the error was spelled wrong. Russ ^ permalink raw reply [flat|nested] 29+ messages in thread
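The two-case assumption Russ describes can be sketched in ordinary C. This is only an illustration, not the devmnt code: `Conn`, `classify`, `Retry` and `Fail` are hypothetical names, and only the string "interrupted" for Eintr comes from the thread.

```c
#include <string.h>

/* The thread gives "interrupted" as the spelling of Eintr. */
static const char Eintr[] = "interrupted";

/* Hypothetical stand-in for the kernel's Mnt; only the field needed here. */
typedef struct Conn {
	int dead;	/* set once a non-interrupt error has been seen */
} Conn;

enum { Retry, Fail };

/*
 * Decide what to do after a read or write on the connection failed.
 * Case 1: an interrupt note; the connection is still good, so flush and retry.
 * Case 2: anything else; mark the connection dead so nobody uses it again.
 */
int
classify(Conn *c, const char *errstr)
{
	if(c->dead)
		return Fail;
	if(strcmp(errstr, Eintr) == 0)
		return Retry;
	c->dead = 1;
	return Fail;
}
```

Under this sketch the 9vx spelling "Interrupted system call" falls into case 2 and kills the connection, which is why the exact string matters.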
[parent not found: <<dd6fe68a0909222111y1af0f4a2qd30a3b4eded30b2b@mail.gmail.com>]
* Re: [9fans] 9vx (is this the right list)? import issue [not found] <<dd6fe68a0909222111y1af0f4a2qd30a3b4eded30b2b@mail.gmail.com> @ 2009-09-23 4:56 ` erik quanstrom 2009-09-23 18:52 ` Russ Cox 0 siblings, 1 reply; 29+ messages in thread From: erik quanstrom @ 2009-09-23 4:56 UTC (permalink / raw) To: 9fans > I mean that the code as written is assuming that if a read or write > errors out, it can only happen for one of two reasons: > 1) there was an interrupt note, in which case strcmp(error, Eintr) == 0 > 2) there has been an error on the 9P connection, in which case > strcmp(error, Eintr) != 0 and the connection will never work again. > > My suggestion is to enforce #2: if a non-interrupt error happens, > mark the connection so that the kernel won't even try to use it > again. > > Separately, you might investigate what error is happening that > violates the assumption above. In 9vx, it is easy: case #1 happened > but the error was spelled wrong. how sure are we that 1 holds? couldn't there be other, legitimate and transient errors? could a user-delivered note sneak in and confuse the issue? the problem with my solution is that it could leak tags. i don't see this as a significant problem, but i could be wrong. i think the connection would need to be pretty broken for tags to be leaked. marking connections dead also adds tracking, but in a new place. it could have trouble if ever a transient error happens when strcmp(error, Eintr) == 0, which can happen in 9vx or dt. - erik ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-23 4:56 ` erik quanstrom @ 2009-09-23 18:52 ` Russ Cox 2009-09-23 19:12 ` ron minnich 2009-09-23 21:25 ` erik quanstrom 0 siblings, 2 replies; 29+ messages in thread From: Russ Cox @ 2009-09-23 18:52 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs > how sure are we that 1 holds? couldn't there be other, > legitimate and transient errors? could a user-delivered > note sneak in and confuse the issue? no. at least not if the kernel is working properly. that's why i said devmnt should enforce the assumption. it's at most a couple lines of extra code, whereas the diff you posted was quite a bit longer. this is a simplifying assumption in the code, so called because it simplifies the code. if you throw away the assumption, you throw away the simplicity, and not just here. rather than throw away the simplicity, work to understand why the assumption is being violated (in 9vx it is the bogus spelling of "interrupted") and fix the violation instead. russ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-23 18:52 ` Russ Cox @ 2009-09-23 19:12 ` ron minnich 2009-09-23 19:25 ` erik quanstrom 2009-09-23 19:26 ` Russ Cox 2009-09-23 21:25 ` erik quanstrom 1 sibling, 2 replies; 29+ messages in thread From: ron minnich @ 2009-09-23 19:12 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs

OK, so what happens in 9vx.

	mount sources
	cd /n/blah/blah
	grep full *.c
	hit DEL

devip.c read on Qdata fails, and we do this:

	if(r < 0){
		oserrstr();
		nexterror();
	}

So just need to fix oserrstr() or fix this in devip itself? I vote oserrstr, lucho votes fix this little bit of code.

Anyway, there it is.

We're watching this talk on nested VMs on the x86 machines. Oops. hardware botch. You have to do strange things to make it all work. I can't believe nobody read the IBM papers before they designed this stuff in.

ron

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-23 19:12 ` ron minnich @ 2009-09-23 19:25 ` erik quanstrom 2009-09-23 19:26 ` Russ Cox 1 sibling, 0 replies; 29+ messages in thread From: erik quanstrom @ 2009-09-23 19:25 UTC (permalink / raw) To: 9fans > So just need to fix oserrstr() or fix this in devip itself? I vote > oserrstr, lucho votes fix this little bit of > code. how many other errors are lurking in oserrstr()? there are lots of assumptions about the exact errstrs. - erik ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-23 19:12 ` ron minnich 2009-09-23 19:25 ` erik quanstrom @ 2009-09-23 19:26 ` Russ Cox 2009-09-23 20:33 ` ron minnich 1 sibling, 1 reply; 29+ messages in thread From: Russ Cox @ 2009-09-23 19:26 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs I think that oserrstr should be fixed. You want "interrupted" to be the error string for many places not just right here. Other programs look for strstr(error, "interrupt") for example Russ On Wednesday, September 23, 2009, ron minnich <rminnich@gmail.com> wrote: > OK, so what happens in 9vx. > > mount sources > cd /n/blah/blah > grep full *.c > hit DEL > devip.c read on Qdata fails, and we do this: > if(r < 0){ > oserrstr(); > nexterror(); > } > > So just need to fix oserrstr() or fix this in devip itself? I vote > oserrstr, lucho votes fix this little bit of > code. > > Anyway, there it is. > > We're watching this talk on nested VMs on the x86 machines. Oops. > hardware botch. You have to do strange things to make it all work. I > can't believe nobody read the IBM papers before they designed this > stuff in. > > ron > > ^ permalink raw reply [flat|nested] 29+ messages in thread
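A rough sketch of the kind of translation being discussed: map the host's EINTR to the Plan 9 spelling and pass every other errno through as the host's own text. `hosterrstr` is a hypothetical helper written for this illustration; it is not the real 9vx oserrstr(), whose signature and body are not shown in the thread.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not the real 9vx oserrstr(): fill buf with a
 * Plan 9 style error string for a host errno value.
 */
char*
hosterrstr(int err, char *buf, size_t n)
{
	if(err == EINTR)
		snprintf(buf, n, "interrupted");	/* the spelling Eintr uses */
	else
		snprintf(buf, n, "%s", strerror(err));	/* host text for everything else */
	return buf;
}
```

With this mapping, both an exact strcmp against "interrupted" and a looser strstr(error, "interrupt") check succeed for an interrupted system call.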
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-23 19:26 ` Russ Cox @ 2009-09-23 20:33 ` ron minnich 2009-09-23 20:35 ` ron minnich 0 siblings, 1 reply; 29+ messages in thread From: ron minnich @ 2009-09-23 20:33 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs sent in via codereview but I missed a debug print I left in. Sorry. I assume you can reject it so I'm sending the correct patch. ron ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-23 20:33 ` ron minnich @ 2009-09-23 20:35 ` ron minnich 2009-09-23 22:13 ` Russ Cox 0 siblings, 1 reply; 29+ messages in thread From: ron minnich @ 2009-09-23 20:35 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs not having done this before, the reference is URL: http://codereview.appspot.com/122046 ron ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-23 20:35 ` ron minnich @ 2009-09-23 22:13 ` Russ Cox 0 siblings, 0 replies; 29+ messages in thread From: Russ Cox @ 2009-09-23 22:13 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Wednesday, September 23, 2009, ron minnich <rminnich@gmail.com> wrote: > not having done this before, the reference is URL: > http://codereview.appspot.com/122046 Thanks, Ron. Fix applied. Russ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-23 18:52 ` Russ Cox 2009-09-23 19:12 ` ron minnich @ 2009-09-23 21:25 ` erik quanstrom 1 sibling, 0 replies; 29+ messages in thread From: erik quanstrom @ 2009-09-23 21:25 UTC (permalink / raw) To: 9fans On Wed Sep 23 14:54:47 EDT 2009, rsc@swtch.com wrote: > > how sure are we that 1 holds? couldn't there be other, > > legitimate and transient errors? could a user-delivered > > note sneak in and confuse the issue? > > no. at least not if the kernel is working properly. > that's why i said devmnt should enforce the assumption. > it's at most a couple lines of extra code, > whereas the diff you posted was quite a bit longer. it seems to be a big assumption that the whole ip stack and the ethernet driver know the difference between being interrupted before sending or queuing the packet or after. the comment in qbwrite seems to say that you can get interrupted after the pkt has been queued. my approach has the advantage of sidestepping this problem. i don't think 13 changed lines is unreasonable. (without verbose debugging.) - erik ^ permalink raw reply [flat|nested] 29+ messages in thread
[parent not found: <<fadaba2046122acf656140c0618e1d1e@ladd.quanstro.net>]
* Re: [9fans] 9vx (is this the right list)? import issue [not found] <<fadaba2046122acf656140c0618e1d1e@ladd.quanstro.net> @ 2009-09-23 2:41 ` erik quanstrom 2009-09-23 3:11 ` Russ Cox 0 siblings, 1 reply; 29+ messages in thread From: erik quanstrom @ 2009-09-23 2:41 UTC (permalink / raw) To: 9fans full versions in /n/sources/contrib/quanstro/devmnt.c /n/sources/contrib/quanstro/vx32devmnt.c - erik ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-23 2:41 ` erik quanstrom @ 2009-09-23 3:11 ` Russ Cox 0 siblings, 0 replies; 29+ messages in thread From: Russ Cox @ 2009-09-23 3:11 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs

The extra tracking that has been proposed is unnecessary, and waiting for the Rflush doesn't make sense. The assumption is that the Rflush isn't ever going to arrive, because the connection is dead.

The problem is here:

	void
	mountio(Mnt *m, Mntrpc *r)
	{
		int n;

		while(waserror()) {
			if(m->rip == up)
				mntgate(m);
			if(strcmp(up->errstr, Eintr) != 0){
				mntflushfree(m, r);	<<<<
				nexterror();
			}
			r = mntflushalloc(r, m->msize);
		}

The implicit assumption is that if reading from the mounted connection gets any error other than Eintr, the connection is dead and will never receive another message. The call to mntflushfree cleanly tears down the messages this proc is waiting for by behaving as if the flush responses had come back in.

In 9vx, the problem is that the errstr is "Interrupted system call" (the Unix string for errno EINTR) instead of Eintr == "interrupted". The fix is to correct whatever has translated EINTR to "Interrupted system call" to use the correct string.

Drawterm probably has the same issue and the same fix. There are fewer interrupts flying around in drawterm.

The kernel may even have the same issue, if a mounted connection can get an error (other than "interrupted") out of read or write but then work at the next call. A way to avoid this problem in the future is to mark the mnt (m) as dead, so that no other procs will try to read from the connection and get confused.

Russ

^ permalink raw reply [flat|nested] 29+ messages in thread
[parent not found: <<13426df10909221532t5de9f010pfeb2ca2c3b44db89@mail.gmail.com>]
* Re: [9fans] 9vx (is this the right list)? import issue [not found] <<13426df10909221532t5de9f010pfeb2ca2c3b44db89@mail.gmail.com> @ 2009-09-23 2:36 ` erik quanstrom 0 siblings, 0 replies; 29+ messages in thread From: erik quanstrom @ 2009-09-23 2:36 UTC (permalink / raw) To: 9fans ron, this works for me but my symptoms were a little different than yours. before: mnt: proc cat 290: mismatch from #D/ssl/1/data /n/coraid/lib/unicode rep 0x7fcd8c04e190 tag 4 fid 1603 T120 R117 rp 4 after: WOOT! caught stale reply 6; type 117 note: the poor organization of this patch is geared toward keeping the diff small. tagallocd and freetag should be moved above mountmux. i didn't use russ' ed scripts because i was too lazy. - erik 9vx version ; ; diff -c devmnt.c devmnt.c~ devmnt.c:945,954 - devmnt.c~:945,951 void mountmux(Mnt *m, Mntrpc *r) { - int bad; Mntrpc **l, *q; - int tagallocd(int); - void freetag(int); lock(&m->lk); l = &m->queue; devmnt.c:977,992 - devmnt.c~:974,981 } l = &q->list; } - bad = 1; - if(tagallocd(r->reply.tag)){ - freetag(r->reply.tag); - bad = 0; - } unlock(&m->lk); - if(bad) - print("unexpected reply tag %ud; type %d\n", r->reply.tag, r->reply.type); - else - print("WOOT! 
caught stale reply %ud; type %d\n", r->reply.tag, r->reply.type); + print("unexpected reply tag %ud; type %d\n", r->reply.tag, r->reply.type); } /* devmnt.c:1054,1065 - devmnt.c~:1043,1048 return NOTAG; } - int - tagallocd(int t) - { - return mntalloc.tagmask[t>>TAGSHIFT] & 1<<(t&TAGMASK); - } - void freetag(int t) { devmnt.c:1125,1136 - devmnt.c~:1108,1116 if(mntalloc.nrpcfree >= 10){ free(r->rpc); free(r); - if(r->done != 2) - freetag(r->request.tag); + freetag(r->request.tag); } else{ - if(r->done == 2) - r->request.tag = alloctag(); r->list = mntalloc.rpcfree; mntalloc.rpcfree = r; mntalloc.nrpcfree++; devmnt.c:1145,1151 - devmnt.c~:1125,1131 Mntrpc **l, *f; lock(&m->lk); - r->done = 2; + r->done = 1; l = &m->queue; for(f = *l; f; f = f->list) { plan 9 version ; diffy -c devmnt.c /n/dump/2009/0922/sys/src/9/port/devmnt.c:932,938 - devmnt.c:932,941 void mountmux(Mnt *m, Mntrpc *r) { + int bad; Mntrpc **l, *q; + int tagallocd(int); + void freetag(int); lock(m); l = &m->queue; /n/dump/2009/0922/sys/src/9/port/devmnt.c:961,968 - devmnt.c:964,977 } l = &q->list; } + bad = 1; + if(tagallocd(r->reply.tag)){ + freetag(r->reply.tag); + bad = 0; + } unlock(m); - print("unexpected reply tag %ud; type %d\n", r->reply.tag, r->reply.type); + if(bad) + print("unexpected reply tag %ud; type %d\n", r->reply.tag, r->reply.type); } /* /n/dump/2009/0922/sys/src/9/port/devmnt.c:1030,1035 - devmnt.c:1039,1050 return NOTAG; } + int + tagallocd(int t) + { + return mntalloc.tagmask[t>>TAGSHIFT] & 1<<(t&TAGMASK); + } + void freetag(int t) { /n/dump/2009/0922/sys/src/9/port/devmnt.c:1095,1103 - devmnt.c:1110,1121 if(mntalloc.nrpcfree >= 10){ free(r->rpc); free(r); - freetag(r->request.tag); + if(r->done != 2) + freetag(r->request.tag); } else{ + if(r->done == 2) + r->request.tag = alloctag(); r->list = mntalloc.rpcfree; mntalloc.rpcfree = r; mntalloc.nrpcfree++; /n/dump/2009/0922/sys/src/9/port/devmnt.c:1112,1118 - devmnt.c:1130,1136 Mntrpc **l, *f; lock(m); - r->done = 1; + r->done = 2; 
l = &m->queue; for(f = *l; f; f = f->list) { ------------------------------------------------------------------ plan 9 version ; - diffy diffy -c devmnt.c /n/dump/2009/0922/sys/src/9/port/devmnt.c:932,938 - devmnt.c:932,941 void mountmux(Mnt *m, Mntrpc *r) { + int bad; Mntrpc **l, *q; + int tagallocd(int); + void freetag(int); lock(m); l = &m->queue; /n/dump/2009/0922/sys/src/9/port/devmnt.c:961,968 - devmnt.c:964,979 } l = &q->list; } + bad = 1; + if(tagallocd(r->reply.tag)){ + freetag(r->reply.tag); + bad = 0; + } unlock(m); - print("unexpected reply tag %ud; type %d\n", r->reply.tag, r->reply.type); + if(bad) + print("unexpected reply tag %ud; type %d\n", r->reply.tag, r->reply.type); + else + print("WOOT! caught stale reply %ud; type %d\n", r->reply.tag, r->reply.type); } /* /n/dump/2009/0922/sys/src/9/port/devmnt.c:1030,1035 - devmnt.c:1041,1052 return NOTAG; } + int + tagallocd(int t) + { + return mntalloc.tagmask[t>>TAGSHIFT] & 1<<(t&TAGMASK); + } + void freetag(int t) { /n/dump/2009/0922/sys/src/9/port/devmnt.c:1095,1103 - devmnt.c:1112,1123 if(mntalloc.nrpcfree >= 10){ free(r->rpc); free(r); - freetag(r->request.tag); + if(r->done != 2) + freetag(r->request.tag); } else{ + if(r->done == 2) + r->request.tag = alloctag(); r->list = mntalloc.rpcfree; mntalloc.rpcfree = r; mntalloc.nrpcfree++; /n/dump/2009/0922/sys/src/9/port/devmnt.c:1112,1118 - devmnt.c:1132,1138 Mntrpc **l, *f; lock(m); - r->done = 1; + r->done = 2; l = &m->queue; for(f = *l; f; f = f->list) { ^ permalink raw reply [flat|nested] 29+ messages in thread
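The tagallocd test in the diff above reads one bit out of a tag bitmap. A self-contained sketch of that bookkeeping follows. The bit expression is the one from the diff; the constants (TAGSHIFT of 5, a 256-tag table, NOTAG as -1) and the lowest-free-bit search in alloctag are assumptions made for the example and are not copied from devmnt.c.

```c
enum {
	TAGSHIFT = 5,			/* assumes 32-bit mask words */
	TAGMASK = (1<<TAGSHIFT) - 1,
	NTAG = 256,			/* kept small for the example */
	NMASK = NTAG>>TAGSHIFT,
	NOTAG = -1			/* example value only; 9P defines NOTAG as (ushort)~0 */
};

static unsigned int tagmask[NMASK];	/* one bit per tag; 1 means allocated */

/* Return the lowest free tag and mark it allocated, or NOTAG if none is left. */
int
alloctag(void)
{
	int i, j;

	for(i = 0; i < NMASK; i++){
		if(tagmask[i] == ~0u)
			continue;
		for(j = 0; j <= TAGMASK; j++)
			if((tagmask[i] & (1u<<j)) == 0){
				tagmask[i] |= 1u<<j;
				return (i<<TAGSHIFT) + j;
			}
	}
	return NOTAG;
}

/* Non-zero if tag t is currently allocated (the test added in the patch). */
int
tagallocd(int t)
{
	return (tagmask[t>>TAGSHIFT] & (1u<<(t&TAGMASK))) != 0;
}

/* Clear the bit for tag t so it can be handed out again. */
void
freetag(int t)
{
	tagmask[t>>TAGSHIFT] &= ~(1u<<(t&TAGMASK));
}
```

Note that a lowest-free search like this hands a freed tag straight back on the next allocation, which is the quick-reuse behaviour discussed elsewhere in the thread.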
[parent not found: <<13426df10909221420x1298139fhdeb4f0803924e5a3@mail.gmail.com>]
* Re: [9fans] 9vx (is this the right list)? import issue [not found] <<13426df10909221420x1298139fhdeb4f0803924e5a3@mail.gmail.com> @ 2009-09-22 21:23 ` erik quanstrom 2009-09-22 22:32 ` ron minnich 0 siblings, 1 reply; 29+ messages in thread From: erik quanstrom @ 2009-09-22 21:23 UTC (permalink / raw) To: 9fans On Tue Sep 22 17:22:08 EDT 2009, rminnich@gmail.com wrote: > here's one last "caught in the act" scenario. I have a print in > mntralloc when I reuse something. > > The fid is being read and clunked. But the Tclunk goes out before the > Rread comes in. Oops. > > > Reuse 1 > Tread tag 1 fid 454 offset 0 count 8192 > Reuse 1 > Tclunk tag 1 fid 454 > reply Rread tag 1 count 8192 '#include "u.h" > #include "../port/lib.h" > #include "mem.h" > #includ' > mnt: proc grep 78: mismatch from /net/tcp/0/data > /n/o/sys/src/9/pc/apic.c rep 0x168ed20 tag 1 fid 454 T120 R117 rp 1 you're still triggering this with a note? - erik ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-22 21:23 ` erik quanstrom @ 2009-09-22 22:32 ` ron minnich 0 siblings, 0 replies; 29+ messages in thread From: ron minnich @ 2009-09-22 22:32 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Tue, Sep 22, 2009 at 2:23 PM, erik quanstrom <quanstro@quanstro.net> wrote: > On Tue Sep 22 17:22:08 EDT 2009, rminnich@gmail.com wrote: >> here's one last "caught in the act" scenario. I have a print in >> mntralloc when I reuse something. >> >> The fid is being read and clunked. But the Tclunk goes out before the >> Rread comes in. Oops. >> >> >> Reuse 1 >> Tread tag 1 fid 454 offset 0 count 8192 >> Reuse 1 >> Tclunk tag 1 fid 454 >> reply Rread tag 1 count 8192 '#include "u.h" >> #include "../port/lib.h" >> #include "mem.h" >> #includ' >> mnt: proc grep 78: mismatch from /net/tcp/0/data >> /n/o/sys/src/9/pc/apic.c rep 0x168ed20 tag 1 fid 454 T120 R117 rp 1 > > you're still triggering this with a note? DEL in rio. ron ^ permalink raw reply [flat|nested] 29+ messages in thread
[parent not found: <<df49a7370909221158u3f071cc3j125c85241c5088e6@mail.gmail.com>]
* Re: [9fans] 9vx (is this the right list)? import issue [not found] <<df49a7370909221158u3f071cc3j125c85241c5088e6@mail.gmail.com> @ 2009-09-22 19:01 ` erik quanstrom 2009-09-22 19:34 ` roger peppe 0 siblings, 1 reply; 29+ messages in thread From: erik quanstrom @ 2009-09-22 19:01 UTC (permalink / raw) To: 9fans > if it's ambiguous, then the tag should indeed be put on hold, > because there's no way to get it right. how do we prevent all tags from being on hold? there's no way to get that right, either. - erik ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-22 19:01 ` erik quanstrom @ 2009-09-22 19:34 ` roger peppe 2009-09-22 21:12 ` ron minnich 0 siblings, 1 reply; 29+ messages in thread From: roger peppe @ 2009-09-22 19:34 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs 2009/9/22 erik quanstrom <quanstro@quanstro.net>: >> if it's ambiguous, then the tag should indeed be put on hold, >> because there's no way to get it right. > > how do we prevent all tags from being on hold? > there's no way to get that right, either. well, it's legal to send several flushes for the same tag, and it's also legal to send a flush of a non-existent tag, so if there's a case of ambiguity, we could resend the flush and drop the original rpc struct when the reply to that comes back (or the original reply comes back). ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-22 19:34 ` roger peppe @ 2009-09-22 21:12 ` ron minnich 2009-09-22 21:20 ` ron minnich 0 siblings, 1 reply; 29+ messages in thread From: ron minnich @ 2009-09-22 21:12 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 156 bytes --] here is a 9p trace of the problem. See line 43. Topen and Tclunk go out with same tag. This is with a print in the rpc code as suggested by Russ. ron [-- Attachment #2: y --] [-- Type: application/octet-stream, Size: 3026 bytes --] Twrite tag 1 fid 365 offset 54 count 6 'term% ' reply Rwrite tag 1 count 6 Tread tag 1 fid 364 offset 68 count 512 reply Rread tag 1 count 14 'grep full *.c ' Twalk tag 1 fid 421 newfid 426 nwname 0 reply Rwalk tag 1 nwqid 0 Topen tag 1 fid 426 mode 0 reply Ropen tag 1 qid (0000000000142ef2 318 d) iounit 8192 Tstat tag 1 fid 426 reply Rstat tag 1 stat 'pc' 'geoff' 'sys' 'geoff' q (0000000000142ef2 318 d) m 01777777777760000000775 at 1253652282 mt 1207255391 l 0 t 77 d 123312 Tread tag 1 fid 426 offset 0 count 8192 reply Rread tag 1 count 8127 '49004d00 b0e10100 00010000 00f32e14 00000000 00b40100 00b1434a 4aad5c62 43dd0b00 00000000 000d0061 70626f6f 74737472 61702e73 05006765 6f666603' Tread tag 1 fid 426 offset 8127 count 8192 reply Rread tag 1 count 2754 '45004d00 b0e10100 00050000 00842f14 00000000 00b40100 00d41cb9 4a90dd8b 47609600 00000000 00090073 64696168 63692e63 05006765 6f666603 00737973' Tread tag 1 fid 426 offset 10881 count 8192 reply Rread tag 1 count 0 '' Tclunk tag 1 fid 426 reply Rclunk tag 1 Twalk tag 1 fid 421 newfid 425 nwname 1 0:grep reply Rerror tag 1 ename './sys/src/9/pc/grep' does not exist Twalk tag 1 fid 421 newfid 428 nwname 1 0:apic.c reply Rwalk tag 1 nwqid 1 0:(0000000000142ef4 2 ) Topen tag 1 fid 428 mode 0 reply Ropen tag 1 qid (0000000000142ef4 2 ) iounit 8192 Tread tag 1 fid 428 offset 0 count 8192 reply Rread tag 1 count 8192 '#include "u.h" #include 
"../port/lib.h" #include "mem.h" #includ' Tread tag 1 fid 428 offset 8192 count 8192 reply Rread tag 1 count 799 'apic); hi = 0; lo = ApicIMASK; for(v = 0; v <= apic->mre; v+' Tread tag 1 fid 428 offset 8991 count 8192 reply Rread tag 1 count 0 '' Tclunk tag 1 fid 428 reply Rclunk tag 1 Twalk tag 1 fid 421 newfid 428 nwname 1 0:apm.c reply Rwalk tag 1 nwqid 1 0:(0000000000142ef5 1 ) Topen tag 1 fid 428 mode 0 Tclunk tag 1 fid 428 reply Ropen tag 1 qid (0000000000142ef5 1 ) iounit 8192 mnt: proc grep 76: mismatch from /net/tcp/0/data /n/o/sys/src/9/pc/apm.c rep 0x1734f70 tag 1 fid 428 T120 R113 rp 1 Twrite tag 1 fid 365 offset 60 count 6 'term% ' reply Rwrite tag 1 count 6 Tread tag 1 fid 364 offset 82 count 512 reply Rread tag 1 count 1 ' ' Twrite tag 1 fid 365 offset 66 count 6 'term% ' reply Rwrite tag 1 count 6 Tread tag 1 fid 364 offset 83 count 512 reply Rread tag 1 count 3 'ls ' Twalk tag 1 fid 421 newfid 428 nwname 1 0:ls reply Rclunk tag 1 mnt: proc rc 78: mismatch from /net/tcp/0/data /n/o/sys/src/9/pc rep 0x1734f70 tag 1 fid 421 T110 R121 rp 1 Twalk tag 1 fid 421 newfid 427 nwname 0 reply Rerror tag 1 ename './sys/src/9/pc/ls' does not exist Twrite tag 1 fid 365 offset 72 count 20 'ls: .: clone failed ' reply Rwrite tag 1 count 20 Twrite tag 1 fid 365 offset 92 count 6 'term% ' reply Rwrite tag 1 count 6 Tread tag 1 fid 364 offset 86 count 512 reply Rread tag 1 count 24 'cat /dev/kmesg > /tmp/x ' Twalk tag 1 fid 421 newfid 433 nwname 1 0:cat reply Rwalk tag 1 nwqid 0 Twalk tag 1 fid 361 newfid 433 nwname 1 0:kmesg reply Rerror tag 1 ename file does not exist ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-22 21:12 ` ron minnich @ 2009-09-22 21:20 ` ron minnich 0 siblings, 0 replies; 29+ messages in thread From: ron minnich @ 2009-09-22 21:20 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs here's one last "caught in the act" scenario. I have a print in mntralloc when I reuse something. The fid is being read and clunked. But the Tclunk goes out before the Rread comes in. Oops. Reuse 1 Tread tag 1 fid 454 offset 0 count 8192 Reuse 1 Tclunk tag 1 fid 454 reply Rread tag 1 count 8192 '#include "u.h" #include "../port/lib.h" #include "mem.h" #includ' mnt: proc grep 78: mismatch from /net/tcp/0/data /n/o/sys/src/9/pc/apic.c rep 0x168ed20 tag 1 fid 454 T120 R117 rp 1 ^ permalink raw reply [flat|nested] 29+ messages in thread
[parent not found: <<13426df10909221147w665e30adt93b6121281294647@mail.gmail.com>]
* Re: [9fans] 9vx (is this the right list)? import issue [not found] <<13426df10909221147w665e30adt93b6121281294647@mail.gmail.com> @ 2009-09-22 18:51 ` erik quanstrom 0 siblings, 0 replies; 29+ messages in thread From: erik quanstrom @ 2009-09-22 18:51 UTC (permalink / raw) To: 9fans > I don't know. I am not sure the code does either. Since this is only > seen so far in 9vx I am guessing it is a 9vx thing. i see this now and then on regular plan 9: Tue Sep 1 18:51:15: unexpected reply tag 51; type 109 Tue Sep 1 18:51:15: unexpected reply tag 16; type 109 Tue Sep 1 18:51:15: unexpected reply tag 39; type 109 Tue Sep 1 18:51:15: unexpected reply tag 51; type 109 Tue Sep 1 18:51:16: unexpected reply tag 51; type 109 Tue Sep 1 18:51:17: unexpected reply tag 39; type 109 - erik ^ permalink raw reply [flat|nested] 29+ messages in thread
* [9fans] 9vx (is this the right list)? import issue @ 2009-09-21 18:29 ron minnich 2009-09-22 5:51 ` Russ Cox 0 siblings, 1 reply; 29+ messages in thread From: ron minnich @ 2009-09-21 18:29 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs If I am in 9vx and have imported a file system from somewhere, and do an ls, and get impatient and hit del, the import dies. term% grep full *.c grep: can't open *.c: '*.c' mount rpc error term% ls ls: .: clone failed term% mnt: proc grep 91: mismatch from /net/tcp/0/data /n/o/usr/rminnich/9k/bgp rep 0x93daf0 tag 1 fid 432 T110 R117 rp 1 mnt: proc rc 93: mismatch from /net/tcp/0/data /n/o/usr/rminnich/9k/bgp rep 0x93daf0 tag 1 fid 432 T110 R121 rp 1 I'm wondering if this is just a simple matter of the note going too far. I am going to look but figure somebody might immediately say "ah ha!" and have a fix. This is a vx built from hg. ron ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-21 18:29 ron minnich @ 2009-09-22 5:51 ` Russ Cox 2009-09-22 17:27 ` ron minnich 0 siblings, 1 reply; 29+ messages in thread From: Russ Cox @ 2009-09-22 5:51 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Monday, September 21, 2009, ron minnich <rminnich@gmail.com> wrote: > If I am in 9vx and have imported a file system from somewhere, and do > an ls, and get impatient and hit del, the import dies. > > term% grep full *.c > grep: can't open *.c: '*.c' mount rpc error > term% ls > ls: .: clone failed > term% > > mnt: proc grep 91: mismatch from /net/tcp/0/data > /n/o/usr/rminnich/9k/bgp rep 0x93daf0 tag 1 fid 432 T110 R117 rp 1 > mnt: proc rc 93: mismatch from /net/tcp/0/data > /n/o/usr/rminnich/9k/bgp rep 0x93daf0 tag 1 fid 432 T110 R121 rp 1 > > I'm wondering if this is just a simple matter of the note going too > far. I am going to look but figure somebody might immediately say "ah > ha!" and have a fix. I think it is. In fact I think drawterm has the same bug except in drawterm it is for some reason harder to trigger. I have no aha! fix for you. Russ ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-22 5:51 ` Russ Cox @ 2009-09-22 17:27 ` ron minnich 2009-09-22 18:21 ` ron minnich 0 siblings, 1 reply; 29+ messages in thread From: ron minnich @ 2009-09-22 17:27 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs OK, a little more fooling around. term% grep full *.c (wait about a second, hit DEL) grep: can't open apic.c: 'apic.c' './sys/src/9/pc/grep' does not exist grep: can't open apm.c: 'apm.c' mount rpc error grep: can't open archmp.c: 'archmp.c' mount rpc error grep: can't open bios32.c: 'bios32.c' mount rpc error grep: can't open cga.c: 'cga.c' mount rpc error etc. in fact I get an error for each file in the directory. I also get one of these for each file. mnt: proc grep 90: mismatch from /net/tcp/0/data /n/o/sys/src/9/pc/devlml.c rep 0x16d7010 tag 3 fid 533 T112 R111 rp 3 mnt: proc grep 90: mismatch from /net/tcp/0/data /n/o/sys/src/9/pc/devlml.c rep 0x16d7010 tag 3 fid 533 T120 R113 rp 3 mnt: proc grep 90: mismatch from /net/tcp/0/data /n/o/sys/src/9/pc rep 0x16d7010 tag 3 fid 524 T110 R121 rp 3 mnt: proc grep 90: mismatch from /net/tcp/0/data /n/o/sys/src/9/pc/devmoipv6.c rep 0x16d7010 tag 3 fid 534 T112 R111 rp 3 mnt: proc grep 90: mismatch from /net/tcp/0/data /n/o/sys/src/9/pc/devmoipv6.c rep 0x16d7010 tag 3 fid 534 T120 R113 rp 3 mnt: proc grep 90: mismatch from /net/tcp/0/data /n/o/sys/src/9/pc rep 0x16d7010 tag 3 fid 524 T110 R121 rp 3 mnt: proc grep 90: mismatch from /net/tcp/0/data /n/o/sys/src/9/pc/devrtc.c rep 0x16d7010 tag 3 fid 535 T112 R111 rp 3 so what is interesting is that the grep did not die. It keeps trying to open files and keeps failing. And then, finally: term% ls ls: .: clone failed term% So, we see lots of guys with the same tag, with a T and R mismatch, with T and R like open and clunk, with the apparent problem that they are all using tag 3. I'm wondering if there is not a problem with the way tags are flushed when there is an interrupt. 
As an experiment I commented out the free in the freetag code. Better: now I just get this: mnt: proc grep 78: mismatch from /net/tcp/0/data /n/o/sys/src/9/pc/apic.c rep 0x1491c60 tag 2 fid 454 T120 R117 rp 2 after the DEL, TCLUNK gets an RREAD (which makes a sort of sense) and then when I ls . mnt: proc rc 80: mismatch from /net/tcp/0/data /n/o/sys/src/9/pc rep 0x1491c60 tag 2 fid 421 T110 R121 rp 2 Twalk for the ls gets the RCLUNK And from that point on it's all over. Interesting. ron ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-22 17:27 ` ron minnich @ 2009-09-22 18:21 ` ron minnich 2009-09-22 18:35 ` roger peppe 0 siblings, 1 reply; 29+ messages in thread From: ron minnich @ 2009-09-22 18:21 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs OK, I did this in mntralloc, in the code path in which we reuse the rpc from mntalloc.rpcfree: I simply always allocated a new tag, even if we found an old rpc which was nominally free, instead of reusing the old tag. That fixed the problem. So, basically, the way I see it is, grep proc gets an interrupt, kernel will try to flush RPCs which we initiated, we drop the (we think) flushed rpc struct onto the rpcfree list, but the reply from the server is still in flight. We reuse the rpc from rpcfree list, we send out a new T, with the same tag as the previous one which we think we flushed, we get the reply from the earlier RPC, tags match, R does not match T, bad day. This is the same code as is in plan 9. There are harder and harder ways to deal with this, I have some ideas but I expect some of the folks on this list to have better ones. The simplest one that would probably work is to avoid reusing tags quite so quickly. Free tags, yes, but use a counter to indicate "next tag to use", so that it's a relatively long time before a tag for a mount point is reused again. So, e.g., we free tag 3, but the next tag we allocate is tag 4, and so on. Sooner or later we'll use tag 3 again, likely long after any messages in flight have been retired. (weirdly enough, I saw a trick like this used in hardware many years ago ...) That may not be good enough, not sure. It's definitely pretty easy to implement. What I observed is that when this is all working, tag 3 is the only tag ever used! ron ^ permalink raw reply [flat|nested] 29+ messages in thread
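The "next tag to use" counter suggested above might look roughly like this. It is a sketch under assumptions, not a patch to devmnt.c: the table size and the names `rralloctag`, `rrfreetag`, `inuse` and `nexttag` are all made up for the illustration.

```c
enum { NTAGS = 64, NONE = -1 };

static unsigned char inuse[NTAGS];	/* 1 if the tag is outstanding */
static int nexttag;			/* where the next search starts */

/*
 * Allocate the first free tag at or after nexttag, wrapping around.
 * A tag that was just freed is not handed out again until the
 * counter has gone all the way round the table.
 */
int
rralloctag(void)
{
	int i, t;

	for(i = 0; i < NTAGS; i++){
		t = (nexttag + i) % NTAGS;
		if(!inuse[t]){
			inuse[t] = 1;
			nexttag = (t + 1) % NTAGS;
			return t;
		}
	}
	return NONE;
}

/* Release tag t; the counter is deliberately left alone. */
void
rrfreetag(int t)
{
	inuse[t] = 0;
}
```

This only widens the window before a tag comes back; it makes a stale reply matching a new request unlikely, not impossible.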
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-22 18:21 ` ron minnich @ 2009-09-22 18:35 ` roger peppe 2009-09-22 18:47 ` ron minnich 0 siblings, 1 reply; 29+ messages in thread From: roger peppe @ 2009-09-22 18:35 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs

2009/9/22 ron minnich <rminnich@gmail.com>:
> So, basically, the way I see it is, grep proc gets an interrupt,
> kernel will try to flush RPCs which we initiated, we drop the (we
> think) flushed rpc struct onto the rpcfree list, but the reply from
> the server is still in flight. We reuse the rpc from rpcfree list, we
> send out a new T, with the same tag as the previous one which we think
> we flushed, we get the reply from the earlier RPC, tags match, R does
> not match T, bad day.

surely the correct way to go about this (caveat: i haven't looked at the code) is to drop the rpc struct onto the rpcfree list only when the Rflush is received?

from experience with writing heavily used 9p services, getting flush properly right is a bitch.

^ permalink raw reply [flat|nested] 29+ messages in thread
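One way to picture the suggestion in this message (release the rpc only once the Rflush is in) is a small per-RPC state machine. The sketch below is an assumption-laden illustration, not how devmnt.c is written: the Rpc struct, the three states and all four function names are invented, and it assumes the Tflush was actually sent and that an Rflush will eventually arrive, which is exactly the point debated elsewhere in the thread.

```c
#include <assert.h>

/* States of one outstanding RPC slot (names are illustrative only). */
enum { Free, Active, Flushing };

typedef struct Rpc Rpc;
struct Rpc {
	int tag;
	int state;
};

/* The caller was interrupted: a Tflush goes out, the tag stays reserved. */
void
rpcinterrupt(Rpc *r)
{
	assert(r->state == Active);
	r->state = Flushing;
}

/*
 * A reply carrying r->tag arrived. Returns 1 if it should be handed to
 * the caller, 0 if it is a late reply to a flushed request and must be
 * dropped. In the second case the slot stays reserved.
 */
int
rpcreply(Rpc *r)
{
	if(r->state == Flushing)
		return 0;
	assert(r->state == Active);
	r->state = Free;
	return 1;
}

/* The Rflush arrived: no further reply to the old tag is expected. */
void
rpcflushed(Rpc *r)
{
	assert(r->state == Flushing);
	r->state = Free;
}

/* Only a Free slot may go back on the free list and have its tag reused. */
int
rpcreusable(Rpc *r)
{
	return r->state == Free;
}
```

The design point is that a slot in the Flushing state never reaches the free list, so its tag cannot be attached to a new T-message while the old reply may still be in flight.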
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-22 18:35 ` roger peppe @ 2009-09-22 18:47 ` ron minnich 2009-09-22 18:58 ` roger peppe 2009-09-22 19:08 ` Eric Van Hensbergen 0 siblings, 2 replies; 29+ messages in thread From: ron minnich @ 2009-09-22 18:47 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs

On Tue, Sep 22, 2009 at 11:35 AM, roger peppe <rogpeppe@gmail.com> wrote:
> surely the correct way to go about this (caveat: i haven't looked at the code)
> is to drop the rpc struct onto the rpcfree list only when the Rflush is
> received?

you just got an Eintr. Did the request get sent?

I don't know. I am not sure the code does either. Since this is only seen so far in 9vx I am guessing it is a 9vx thing.

ron

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-22 18:47 ` ron minnich @ 2009-09-22 18:58 ` roger peppe 2009-09-22 19:08 ` Eric Van Hensbergen 1 sibling, 0 replies; 29+ messages in thread From: roger peppe @ 2009-09-22 18:58 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs

2009/9/22 ron minnich <rminnich@gmail.com>:
> On Tue, Sep 22, 2009 at 11:35 AM, roger peppe <rogpeppe@gmail.com> wrote:
>
>> surely the correct way to go about this (caveat: i haven't looked at the code)
>> is to drop the rpc struct onto the rpcfree list only when the Rflush is
>> received?
>
> you just got an Eintr. Did the request get sent?

doesn't Eintr mean that the write did not complete?

if it's ambiguous, then the tag should indeed be put on hold, because there's no way to get it right.

it would be useful to see a log of the actual 9p messages.

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [9fans] 9vx (is this the right list)? import issue 2009-09-22 18:47 ` ron minnich 2009-09-22 18:58 ` roger peppe @ 2009-09-22 19:08 ` Eric Van Hensbergen 1 sibling, 0 replies; 29+ messages in thread From: Eric Van Hensbergen @ 2009-09-22 19:08 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs

On Sep 22, 2009, at 1:47 PM, ron minnich wrote:
> On Tue, Sep 22, 2009 at 11:35 AM, roger peppe <rogpeppe@gmail.com> wrote:
>
>> surely the correct way to go about this (caveat: i haven't looked at the code)
>> is to drop the rpc struct onto the rpcfree list only when the Rflush is
>> received?
>
> you just got an Eintr. Did the request get sent?
>
> I don't know. I am not sure the code does either. Since this is only
> seen so far in 9vx I am guess it is a 9vx thing.
>

I believe it should - at least that's the behavior we went for in v9fs.

-eric

^ permalink raw reply [flat|nested] 29+ messages in thread