[9fans] 9vx (is this the right list)? import issue

9fans - fans of the OS Plan 9 from Bell Labs

* [9fans] 9vx (is this the right list)? import issue
@ 2009-09-21 18:29 ron minnich
  2009-09-22  5:51 ` Russ Cox
  0 siblings, 1 reply; 29+ messages in thread
From: ron minnich @ 2009-09-21 18:29 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

If I am in 9vx and have imported a file system from somewhere, and do
an ls, and get impatient and hit del, the import dies.

term% grep full *.c
grep: can't open *.c: '*.c' mount rpc error
term% ls
ls: .: clone failed
term%

mnt: proc grep 91: mismatch from /net/tcp/0/data
/n/o/usr/rminnich/9k/bgp rep 0x93daf0 tag 1 fid 432 T110 R117 rp 1
mnt: proc rc 93: mismatch from /net/tcp/0/data
/n/o/usr/rminnich/9k/bgp rep 0x93daf0 tag 1 fid 432 T110 R121 rp 1

I'm wondering if this is just a simple matter of the note going too
far. I am going to look but figure somebody might immediately say "ah
ha!" and have a fix.

This is a vx built from hg.

ron

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-21 18:29 [9fans] 9vx (is this the right list)? import issue ron minnich
@ 2009-09-22  5:51 ` Russ Cox
  2009-09-22 17:27   ` ron minnich
  0 siblings, 1 reply; 29+ messages in thread
From: Russ Cox @ 2009-09-22  5:51 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Monday, September 21, 2009, ron minnich <rminnich@gmail.com> wrote:
> If I am in 9vx and have imported a file system from somewhere, and do
> an ls, and get impatient and hit del, the import dies.
>
> term% grep full *.c
>  grep: can't open *.c: '*.c' mount rpc error
> term% ls
> ls: .: clone failed
> term%
>
> mnt: proc grep 91: mismatch from /net/tcp/0/data
> /n/o/usr/rminnich/9k/bgp rep 0x93daf0 tag 1 fid 432 T110 R117 rp 1
> mnt: proc rc 93: mismatch from /net/tcp/0/data
> /n/o/usr/rminnich/9k/bgp rep 0x93daf0 tag 1 fid 432 T110 R121 rp 1
>
> I'm wondering if this is just a simple matter of the note going too
> far. I am going to look but figure somebody might immediately say "ah
> ha!" and have a fix.

I think it is.  In fact I think drawterm has the same bug
except in drawterm it is for some reason harder to trigger.
I have no aha! fix for you.

Russ

* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-22  5:51 ` Russ Cox
@ 2009-09-22 17:27   ` ron minnich
  2009-09-22 18:21     ` ron minnich
  0 siblings, 1 reply; 29+ messages in thread
From: ron minnich @ 2009-09-22 17:27 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

OK, a little more fooling around.

term% grep full *.c

(wait about a second, hit DEL)
grep: can't open apic.c: 'apic.c' './sys/src/9/pc/grep' does not exist
grep: can't open apm.c: 'apm.c' mount rpc error
grep: can't open archmp.c: 'archmp.c' mount rpc error
grep: can't open bios32.c: 'bios32.c' mount rpc error
grep: can't open cga.c: 'cga.c' mount rpc error

Etc. In fact, I get an error for each file in the directory.

I also get one of these for each file.
mnt: proc grep 90: mismatch from /net/tcp/0/data
/n/o/sys/src/9/pc/devlml.c rep 0x16d7010 tag 3 fid 533 T112 R111 rp 3
mnt: proc grep 90: mismatch from /net/tcp/0/data
/n/o/sys/src/9/pc/devlml.c rep 0x16d7010 tag 3 fid 533 T120 R113 rp 3
mnt: proc grep 90: mismatch from /net/tcp/0/data
/n/o/sys/src/9/pc rep 0x16d7010 tag 3 fid 524 T110 R121 rp 3
mnt: proc grep 90: mismatch from /net/tcp/0/data
/n/o/sys/src/9/pc/devmoipv6.c rep 0x16d7010 tag 3 fid 534 T112 R111 rp 3
mnt: proc grep 90: mismatch from /net/tcp/0/data
/n/o/sys/src/9/pc/devmoipv6.c rep 0x16d7010 tag 3 fid 534 T120 R113 rp 3
mnt: proc grep 90: mismatch from /net/tcp/0/data
/n/o/sys/src/9/pc rep 0x16d7010 tag 3 fid 524 T110 R121 rp 3
mnt: proc grep 90: mismatch from /net/tcp/0/data
/n/o/sys/src/9/pc/devrtc.c rep 0x16d7010 tag 3 fid 535 T112 R111 rp 3

So what is interesting is that the grep did not die. It keeps trying
to open files and keeps failing. And then, finally:

term% ls
ls: .: clone failed
term%

So we see lots of messages with the same tag but a T/R type mismatch
(open, walk and clunk requests getting replies meant for some other
request), and the apparent problem is that they are all using tag 3.
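The mismatch lines print raw 9P2000 message type numbers: T is the type that was sent on the tag, R is the type that came back. A small decoder makes them readable. This is a sketch in standard C, not Plan 9 kernel code; the helper names are hypothetical, and the numbering is the 9P2000 protocol's. Decoded, T112 R111 is a Topen answered by an Rwalk, T120 R113 a Tclunk answered by an Ropen, and T110 R121 a Twalk answered by an Rclunk.

```c
#include <stddef.h>

/*
 * 9P2000 message names, indexed by (type - 100).  Even numbers are
 * T-messages, odd numbers the matching R-messages.
 */
static const char *const ninep_names[] = {
	"Tversion", "Rversion",	/* 100, 101 */
	"Tauth",    "Rauth",	/* 102, 103 */
	"Tattach",  "Rattach",	/* 104, 105 */
	"Terror",   "Rerror",	/* 106 (illegal), 107 */
	"Tflush",   "Rflush",	/* 108, 109 */
	"Twalk",    "Rwalk",	/* 110, 111 */
	"Topen",    "Ropen",	/* 112, 113 */
	"Tcreate",  "Rcreate",	/* 114, 115 */
	"Tread",    "Rread",	/* 116, 117 */
	"Twrite",   "Rwrite",	/* 118, 119 */
	"Tclunk",   "Rclunk",	/* 120, 121 */
	"Tremove",  "Rremove",	/* 122, 123 */
	"Tstat",    "Rstat",	/* 124, 125 */
	"Twstat",   "Rwstat",	/* 126, 127 */
};

/* Name of a numeric 9P2000 type, e.g. 120 -> "Tclunk". */
const char *
ninep_name(int type)
{
	int i = type - 100;

	if(i < 0 || (size_t)i >= sizeof ninep_names / sizeof ninep_names[0])
		return "unknown";
	return ninep_names[i];
}

/* A reply answers a request only if its type is the request's type + 1. */
int
ninep_matches(int ttype, int rtype)
{
	return rtype == ttype + 1;
}
```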

I'm wondering if there is not a problem with the way tags are flushed
when there is an interrupt.

As an experiment I commented out the free in the freetag code.

Better: now I just get this:
mnt: proc grep 78: mismatch from /net/tcp/0/data
/n/o/sys/src/9/pc/apic.c rep 0x1491c60 tag 2 fid 454 T120 R117 rp 2
After the DEL, the Tclunk gets an Rread (which makes a sort of sense).

And then, when I run ls .:

mnt: proc rc 80: mismatch from /net/tcp/0/data
/n/o/sys/src/9/pc rep 0x1491c60 tag 2 fid 421 T110 R121 rp 2

The Twalk for the ls gets the Rclunk.

And from that point on it's all over.


Interesting.

ron



* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-22 17:27   ` ron minnich
@ 2009-09-22 18:21     ` ron minnich
  2009-09-22 18:35       ` roger peppe
  0 siblings, 1 reply; 29+ messages in thread
From: ron minnich @ 2009-09-22 18:21 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

OK, I changed mntralloc: in the code path where we reuse an rpc from
mntalloc.rpcfree, I always allocated a new tag instead of reusing the
old one, even when the rpc we found was nominally free.

That fixed the problem.

So, basically, the way I see it is, grep proc gets an interrupt,
kernel will try to flush RPCs which we initiated, we drop the (we
think) flushed rpc struct onto the rpcfree list, but the reply from
the server is still in flight. We reuse the rpc from rpcfree list, we
send out a new T, with the same tag as the previous one which we think
we flushed, we get the reply from the earlier RPC, tags match, R does
not match T, bad day.
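The race described above can be reproduced in a toy model. This is a minimal sketch in standard C under simplifying assumptions: one client, a lowest-free-tag allocator (so a freed tag is handed out again immediately), and hypothetical names throughout. It is not devmnt's code.

```c
#include <assert.h>

enum { NTAG = 8 };

/* Hypothetical, much-simplified client-side 9P mux. */
typedef struct {
	int busy[NTAG];		/* tag has an outstanding request */
	int ttype[NTAG];	/* T-message type sent on that tag */
} Client;

/* Lowest free tag wins, so a tag freed a moment ago is reused at once. */
int
alloctag(Client *c)
{
	for(int t = 0; t < NTAG; t++)
		if(!c->busy[t]){
			c->busy[t] = 1;
			return t;
		}
	return -1;
}

void
freetag(Client *c, int t)
{
	assert(t >= 0 && t < NTAG && c->busy[t]);
	c->busy[t] = 0;
}

/* Send a T-message; returns the tag it went out on. */
int
sendreq(Client *c, int ttype)
{
	int t = alloctag(c);

	if(t >= 0)
		c->ttype[t] = ttype;
	return t;
}

/*
 * An R-message arrives.  Returns 1 if it answers the request on that
 * tag, 0 if the tag matches but the type does not (devmnt's "mismatch"),
 * and -1 if nothing is outstanding on the tag.
 */
int
deliver(Client *c, int tag, int rtype)
{
	if(tag < 0 || tag >= NTAG || !c->busy[tag])
		return -1;
	if(rtype != c->ttype[tag] + 1)
		return 0;
	freetag(c, tag);
	return 1;
}
```

Replaying the trace: a Tread (116) goes out on tag 0, the DEL frees the tag without waiting for the server, the next Tclunk (120) is given tag 0 again, and the late Rread (117) then arrives with a matching tag but the wrong type. That is the T120 R117 mismatch in the logs.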

This is the same code as in Plan 9.

There are harder and harder ways to deal with this. I have some ideas,
but I expect some of the folks on this list to have better ones. The
simplest one that would probably work is to avoid reusing tags quite
so quickly. Free tags, yes, but use a counter to indicate "next tag to
use", so that it's a relatively long time before a tag for a mount
point is reused again. So, e.g., we free tag 3, but the next tag we
allocate is tag 4, and so on. Sooner or later we'll use tag 3 again,
likely long after any messages in flight have been retired. (weirdly
enough, I saw a trick like this used in hardware many years ago ...)

That may not be good enough, not sure. It's definitely pretty easy to
implement.
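Ron's "next tag to use" counter can be sketched as a round-robin allocator: search for a free tag starting just past the last one handed out, wrapping around at the end. Standard C, hypothetical names, and a deliberately tiny pool; real 9P tags are 16-bit, with NOTAG (0xFFFF) reserved.

```c
#include <assert.h>

enum { NTAG = 16 };	/* illustrative pool size only */

typedef struct {
	unsigned char busy[NTAG];
	int next;	/* where the next search starts */
} Tagpool;

/*
 * Hand out the first free tag at or after p->next, wrapping around.
 * A just-freed tag is not reused until the counter comes back round,
 * which gives late replies for it time to drain.
 */
int
tagalloc(Tagpool *p)
{
	for(int i = 0; i < NTAG; i++){
		int t = (p->next + i) % NTAG;

		if(!p->busy[t]){
			p->busy[t] = 1;
			p->next = (t + 1) % NTAG;
			return t;
		}
	}
	return -1;	/* every tag is outstanding */
}

void
tagfree(Tagpool *p, int t)
{
	assert(t >= 0 && t < NTAG && p->busy[t]);
	p->busy[t] = 0;
}
```

As Ron suspects, this narrows the window rather than closing it: a reply delayed until the counter wraps still lands on a reused tag.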

What I observed is that when this is all working, tag 3 is the only
tag ever used!

ron

* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-22 18:21     ` ron minnich
@ 2009-09-22 18:35       ` roger peppe
  2009-09-22 18:47         ` ron minnich
  0 siblings, 1 reply; 29+ messages in thread
From: roger peppe @ 2009-09-22 18:35 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2009/9/22 ron minnich <rminnich@gmail.com>:
> So, basically, the way I see it is, grep proc gets an interrupt,
> kernel will try to flush RPCs which we initiated, we drop the (we
> think) flushed rpc struct onto the rpcfree list, but the reply from
> the server is still in flight. We reuse the rpc from rpcfree list, we
> send out a new T, with the same tag as the previous one which we think
> we flushed, we get the reply from the earlier RPC, tags match, R does
> not match T, bad day.

surely the correct way to go about this (caveat: i haven't looked at the code)
is to drop the rpc struct onto the rpcfree list only when the Rflush is
received?

from experience with writing heavily used 9p services, getting flush properly
right is a bitch.
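Roger's suggestion matches the rule described in the 9P flush(5) manual page: after sending a Tflush, the client must wait for the Rflush before reusing the old tag. A minimal sketch of that bookkeeping in standard C, with hypothetical names; it does not handle a connection that dies before the Rflush arrives.

```c
#include <assert.h>

enum { NTAG = 8 };
enum { Free, Busy, Flushing };

typedef struct {
	int state[NTAG];
	int flushes[NTAG];	/* for a Tflush's tag: the oldtag it flushes, else -1 */
} Mux;

void
muxinit(Mux *m)
{
	for(int t = 0; t < NTAG; t++){
		m->state[t] = Free;
		m->flushes[t] = -1;
	}
}

int
tagget(Mux *m)
{
	for(int t = 0; t < NTAG; t++)
		if(m->state[t] == Free){
			m->state[t] = Busy;
			m->flushes[t] = -1;
			return t;
		}
	return -1;
}

/* Interrupted: send Tflush(oldtag) on a fresh tag; oldtag stays reserved. */
int
interrupt(Mux *m, int oldtag)
{
	assert(m->state[oldtag] == Busy);
	int ft = tagget(m);

	if(ft < 0)
		return -1;
	m->state[oldtag] = Flushing;
	m->flushes[ft] = oldtag;
	return ft;
}

/* An R-message arrives on tag t. */
void
reply(Mux *m, int t)
{
	if(m->state[t] != Busy)
		return;	/* unexpected, or a late reply to a flushed request: tag stays reserved */
	if(m->flushes[t] >= 0){
		/* this is the Rflush: nothing more will arrive for oldtag */
		m->state[m->flushes[t]] = Free;
		m->flushes[t] = -1;
	}
	m->state[t] = Free;
}
```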



* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-22 18:35       ` roger peppe
@ 2009-09-22 18:47         ` ron minnich
  2009-09-22 18:58           ` roger peppe
  2009-09-22 19:08           ` Eric Van Hensbergen
  0 siblings, 2 replies; 29+ messages in thread
From: ron minnich @ 2009-09-22 18:47 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Tue, Sep 22, 2009 at 11:35 AM, roger peppe <rogpeppe@gmail.com> wrote:

> surely the correct way to go about this (caveat: i haven't looked at the code)
> is to drop the rpc struct onto the rpcfree list only when the Rflush is
> received?

you just got an Eintr. Did the request get sent?

I don't know. I am not sure the code does either. Since this is only
seen so far in 9vx, I am guessing it is a 9vx thing.

ron



* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-22 18:47         ` ron minnich
@ 2009-09-22 18:58           ` roger peppe
  2009-09-22 19:08           ` Eric Van Hensbergen
  1 sibling, 0 replies; 29+ messages in thread
From: roger peppe @ 2009-09-22 18:58 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2009/9/22 ron minnich <rminnich@gmail.com>:
> On Tue, Sep 22, 2009 at 11:35 AM, roger peppe <rogpeppe@gmail.com> wrote:
>
>> surely the correct way to go about this (caveat: i haven't looked at the code)
>> is to drop the rpc struct onto the rpcfree list only when the Rflush is
>> received?
>
> you just got an Eintr. Did the request get sent?

doesn't Eintr mean that the write did not complete?

if it's ambiguous, then the tag should indeed be put on hold,
because there's no way to get it right.

it would be useful to see a log of the actual 9p messages.



* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-22 18:47         ` ron minnich
  2009-09-22 18:58           ` roger peppe
@ 2009-09-22 19:08           ` Eric Van Hensbergen
  1 sibling, 0 replies; 29+ messages in thread
From: Eric Van Hensbergen @ 2009-09-22 19:08 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sep 22, 2009, at 1:47 PM, ron minnich wrote:

> On Tue, Sep 22, 2009 at 11:35 AM, roger peppe <rogpeppe@gmail.com>
> wrote:
>
>> surely the correct way to go about this (caveat: i haven't looked
>> at the code)
>> is to drop the rpc struct onto the rpcfree list only when the
>> Rflush is
>> received?
>
> you just got an Eintr. Did the request get sent?
>
> I don't know. I am not sure the code does either. Since this is only
> seen so far in 9vx I am guess it is a 9vx thing.
>

I believe it should - at least that's the behavior we went for in v9fs.

        -eric




* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-23 20:35           ` ron minnich
@ 2009-09-23 22:13             ` Russ Cox
  0 siblings, 0 replies; 29+ messages in thread
From: Russ Cox @ 2009-09-23 22:13 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wednesday, September 23, 2009, ron minnich <rminnich@gmail.com> wrote:
> not having done this before, the reference is URL:
> http://codereview.appspot.com/122046

Thanks, Ron.  Fix applied.

Russ


* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-23 18:52   ` Russ Cox
  2009-09-23 19:12     ` ron minnich
@ 2009-09-23 21:25     ` erik quanstrom
  1 sibling, 0 replies; 29+ messages in thread
From: erik quanstrom @ 2009-09-23 21:25 UTC (permalink / raw)
  To: 9fans

On Wed Sep 23 14:54:47 EDT 2009, rsc@swtch.com wrote:
> > how sure are we that 1 holds?  couldn't there be other,
> > legitimate and transient errors?  could a user-delivered
> > note sneak in and confuse the issue?
>
> no.  at least not if the kernel is working properly.
> that's why i said devmnt should enforce the assumption.
> it's at most a couple lines of extra code,
> whereas the diff you posted was quite a bit longer.

it seems to be a big assumption that the whole ip stack
and the ethernet driver know the difference between
being interrupted before sending or queuing the packet
or after.

the comment in qbwrite seems to say that you can
get interrupted after the pkt has been queued.

my approach has the advantage of sidestepping this
problem.

i don't think 13 changed lines is unreasonable.
(without verbose debugging.)

- erik

* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-23 20:33         ` ron minnich
@ 2009-09-23 20:35           ` ron minnich
  2009-09-23 22:13             ` Russ Cox
  0 siblings, 1 reply; 29+ messages in thread
From: ron minnich @ 2009-09-23 20:35 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

not having done this before, the reference is URL:
http://codereview.appspot.com/122046

ron



* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-23 19:26       ` Russ Cox
@ 2009-09-23 20:33         ` ron minnich
  2009-09-23 20:35           ` ron minnich
  0 siblings, 1 reply; 29+ messages in thread
From: ron minnich @ 2009-09-23 20:33 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

sent in via codereview but I missed a debug print I left in. Sorry. I
assume you can reject it so I'm sending the correct patch.

ron



* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-23 19:12     ` ron minnich
  2009-09-23 19:25       ` erik quanstrom
@ 2009-09-23 19:26       ` Russ Cox
  2009-09-23 20:33         ` ron minnich
  1 sibling, 1 reply; 29+ messages in thread
From: Russ Cox @ 2009-09-23 19:26 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I think that oserrstr should be fixed.
You want "interrupted" to be the error string
for many places not just right here.
Other programs look for strstr(error, "interrupt"), for example.

Russ

On Wednesday, September 23, 2009, ron minnich <rminnich@gmail.com> wrote:
> OK, so what happens in 9vx.
>
> mount sources
> cd /n/blah/blah
> grep full *.c
> hit DEL
> devip.c read on Qdata fails, and we do this:
>                 if(r < 0){
>                         oserrstr();
>                         nexterror();
>                 }
>
> So just need to fix oserrstr() or fix this in devip itself? I vote
> oserrstr, lucho votes fix this little bit of
> code.
>
> Anyway, there it is.
>
> We're watching this talk on nested VMs on the x86 machines. Oops.
> hardware botch. You have to do strange things to make it all work. I
> can't believe nobody read the IBM papers before they designed this
> stuff in.
>
> ron
>
>


* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-23 19:12     ` ron minnich
@ 2009-09-23 19:25       ` erik quanstrom
  2009-09-23 19:26       ` Russ Cox
  1 sibling, 0 replies; 29+ messages in thread
From: erik quanstrom @ 2009-09-23 19:25 UTC (permalink / raw)
  To: 9fans

> So just need to fix oserrstr() or fix this in devip itself? I vote
> oserrstr, lucho votes fix this little bit of
> code.

how many other errors are lurking in osstrerror()?
there are lots of assumptions about the exact
errstrs.

- erik



* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-23 18:52   ` Russ Cox
@ 2009-09-23 19:12     ` ron minnich
  2009-09-23 19:25       ` erik quanstrom
  2009-09-23 19:26       ` Russ Cox
  2009-09-23 21:25     ` erik quanstrom
  1 sibling, 2 replies; 29+ messages in thread
From: ron minnich @ 2009-09-23 19:12 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

OK, so what happens in 9vx.

mount sources
cd /n/blah/blah
grep full *.c
hit DEL
devip.c read on Qdata fails, and we do this:
                if(r < 0){
                        oserrstr();
                        nexterror();
                }

So do we just need to fix oserrstr(), or fix this in devip itself? I vote
oserrstr; lucho votes to fix this little bit of code.

Anyway, there it is.

We're watching this talk on nested VMs on the x86 machines. Oops:
hardware botch. You have to do strange things to make it all work. I
can't believe nobody read the IBM papers before they designed this
stuff in.

ron

* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-23  4:56 ` erik quanstrom
@ 2009-09-23 18:52   ` Russ Cox
  2009-09-23 19:12     ` ron minnich
  2009-09-23 21:25     ` erik quanstrom
  0 siblings, 2 replies; 29+ messages in thread
From: Russ Cox @ 2009-09-23 18:52 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> how sure are we that 1 holds?  couldn't there be other,
> legitimate and transient errors?  could a user-delivered
> note sneak in and confuse the issue?

no.  at least not if the kernel is working properly.
that's why i said devmnt should enforce the assumption.
it's at most a couple lines of extra code,
whereas the diff you posted was quite a bit longer.

this is a simplifying assumption in the code,
so called because it simplifies the code.  if you
throw away the assumption, you throw away the
simplicity, and not just here.  rather than throw
away the simplicity, work to understand why the
assumption is being violated (in 9vx it is the bogus
spelling of "interrupted") and fix the violation instead.

russ

* Re: [9fans] 9vx (is this the right list)? import issue
       [not found] <<dd6fe68a0909222111y1af0f4a2qd30a3b4eded30b2b@mail.gmail.com>
@ 2009-09-23  4:56 ` erik quanstrom
  2009-09-23 18:52   ` Russ Cox
  0 siblings, 1 reply; 29+ messages in thread
From: erik quanstrom @ 2009-09-23  4:56 UTC (permalink / raw)
  To: 9fans

> I mean that the code as written is assuming that if a read or write
> errors out, it can only happen for one of two reasons:
> 1) there was an interrupt note, in which case strcmp(error, Eintr) == 0
> 2) there has been an error on the 9P connection, in which case
>    strcmp(error, Eintr) != 0 and the connection will never work again.
>
> My suggestion is to enforce #2: if a non-interrupt error happens,
> mark the connection so that the kernel won't even try to use it
> again.
>
> Separately, you might investigate what error is happening that
> violates the assumption above.  In 9vx, it is easy: case #1 happened
> but the error was spelled wrong.

how sure are we that 1 holds?  couldn't there be other,
legitimate and transient errors?  could a user-delivered
note sneak in and confuse the issue?

the problem with my solution is that it could leak tags.
i don't see this as a significant problem, but i could be
wrong.  i think the connection would need to be pretty
broken for tags to be leaked.

marking connections dead also adds tracking, but in
a new place.  it could have trouble if ever a transient error
happens when strcmp(error, Eintr) == 0, which can
happen in 9vx or dt.

- erik

* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-23  3:17 ` erik quanstrom
@ 2009-09-23  4:11   ` Russ Cox
  0 siblings, 0 replies; 29+ messages in thread
From: Russ Cox @ 2009-09-23  4:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Tuesday, September 22, 2009, erik quanstrom <quanstro@quanstro.net> wrote:
> On Tue Sep 22 23:12:27 EDT 2009, rsc@swtch.com wrote:
>> The extra tracking that has been proposed is unnecessary,
>> and waiting for the Rflush doesn't make sense.  The assumption
>> is that the Rflush isn't ever going to arrive, because the connection
>> is dead.
>
> what do you mean by "dead"?  i/o to the same channel works
> fine.

I mean that the code as written is assuming that if a read or write
errors out, it can only happen for one of two reasons:
1) there was an interrupt note, in which case strcmp(error, Eintr) == 0
2) there has been an error on the 9P connection, in which case
   strcmp(error, Eintr) != 0 and the connection will never work again.

My suggestion is to enforce #2: if a non-interrupt error happens,
mark the connection so that the kernel won't even try to use it
again.

Separately, you might investigate what error is happening that
violates the assumption above.  In 9vx, it is easy: case #1 happened
but the error was spelled wrong.

Russ

* Re: [9fans] 9vx (is this the right list)? import issue
       [not found] <<dd6fe68a0909222011u4243953dged01d77ecdc93e46@mail.gmail.com>
@ 2009-09-23  3:17 ` erik quanstrom
  2009-09-23  4:11   ` Russ Cox
  0 siblings, 1 reply; 29+ messages in thread
From: erik quanstrom @ 2009-09-23  3:17 UTC (permalink / raw)
  To: 9fans

On Tue Sep 22 23:12:27 EDT 2009, rsc@swtch.com wrote:
> The extra tracking that has been proposed is unnecessary,
> and waiting for the Rflush doesn't make sense.  The assumption
> is that the Rflush isn't ever going to arrive, because the connection
> is dead.

what do you mean by "dead"?  i/o to the same channel works
fine.

- erik



* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-23  2:41 ` erik quanstrom
@ 2009-09-23  3:11   ` Russ Cox
  0 siblings, 0 replies; 29+ messages in thread
From: Russ Cox @ 2009-09-23  3:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

The extra tracking that has been proposed is unnecessary,
and waiting for the Rflush doesn't make sense.  The assumption
is that the Rflush isn't ever going to arrive, because the connection
is dead.

The problem is here:

void
mountio(Mnt *m, Mntrpc *r)
{
	int n;

	while(waserror()) {
		if(m->rip == up)
			mntgate(m);
		if(strcmp(up->errstr, Eintr) != 0){
			mntflushfree(m, r);    <<<<
			nexterror();
		}
		r = mntflushalloc(r, m->msize);
	}

The implicit assumption is that if reading from the mounted
connection gets any error other than Eintr, the connection
is dead and will never receive another message.  The call to
mntflushfree cleanly tears down the messages this proc
is waiting for by behaving as if the flush responses had come
back in.

In 9vx, the problem is that the errstr is "Interrupted system call"
(the Unix string for errno EINTR) instead of Eintr == "interrupted".
The fix is to correct whatever has translated EINTR to
"Interrupted system call" to use the correct string.
Drawterm probably has the same issue and the same fix.
There are fewer interrupts flying around in drawterm.
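A sketch of the kind of translation Russ describes, for a hosted kernel that maps host errno values to Plan 9 error strings. The function name is hypothetical and this is not 9vx's actual oserrstr; the point is only that EINTR must come out spelled exactly as Eintr ("interrupted"), because mountio compares the error string with strcmp.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

enum { ERRMAX = 128 };	/* size of a Plan 9 error string */

static const char Eintr[] = "interrupted";	/* Plan 9 kernel spelling */

/*
 * Hypothetical errno -> errstr translation for a hosted kernel.
 * strerror(EINTR) gives the Unix text "Interrupted system call",
 * which devmnt would not recognize as an interrupt.
 */
void
errno2errstr(int err, char *buf, size_t n)
{
	assert(buf != NULL && n > 0);
	if(err == EINTR)
		snprintf(buf, n, "%s", Eintr);
	else
		snprintf(buf, n, "%s", strerror(err));
}
```

Spelling it correctly also keeps user programs that test strstr(error, "interrupt") working; that check is case-sensitive, so "Interrupted system call" fails it.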

The kernel may even have the same issue, if a mounted
connection can get an error (other than "interrupted")
out of read or write but then work at the next call.
A way to avoid this problem in the future is to mark the
mnt (m) as dead, so that no other procs will try to read
from the connection and get confused.
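Marking the mount dead might look like the following. Hypothetical names and a much-reduced Mnt in standard C; in the real kernel the check would sit in devmnt's error path, such as the mountio loop quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char Eintr[] = "interrupted";

/* Hypothetical, much-reduced stand-in for devmnt's Mnt. */
typedef struct {
	int dead;	/* set once a non-interrupt error is seen */
	char err[128];	/* the error that killed it */
} Mnt;

/*
 * Called when a read or write on the 9P connection fails with errstr.
 * Returns 1 if this was an interrupt (send a Tflush and retry), 0 if the
 * request must fail.  Any other error marks the connection dead, which
 * enforces the assumption that it will never work again.
 */
int
mnterror(Mnt *m, const char *errstr)
{
	assert(errstr != NULL);
	if(strcmp(errstr, Eintr) == 0)
		return 1;
	if(!m->dead){
		m->dead = 1;
		snprintf(m->err, sizeof m->err, "%s", errstr);
	}
	return 0;
}

/* Checked before any new RPC is queued on the connection. */
int
mntusable(const Mnt *m)
{
	return !m->dead;
}
```

With the 9vx misspelling, this turns silent tag confusion into a mount that fails outright after the first DEL, which makes the underlying bug easy to see.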

Russ

* Re: [9fans] 9vx (is this the right list)? import issue
       [not found] <<fadaba2046122acf656140c0618e1d1e@ladd.quanstro.net>
@ 2009-09-23  2:41 ` erik quanstrom
  2009-09-23  3:11   ` Russ Cox
  0 siblings, 1 reply; 29+ messages in thread
From: erik quanstrom @ 2009-09-23  2:41 UTC (permalink / raw)
  To: 9fans

full versions in
/n/sources/contrib/quanstro/devmnt.c
/n/sources/contrib/quanstro/vx32devmnt.c

- erik



* Re: [9fans] 9vx (is this the right list)? import issue
       [not found] <<13426df10909221532t5de9f010pfeb2ca2c3b44db89@mail.gmail.com>
@ 2009-09-23  2:36 ` erik quanstrom
  0 siblings, 0 replies; 29+ messages in thread
From: erik quanstrom @ 2009-09-23  2:36 UTC (permalink / raw)
  To: 9fans

ron,

this works for me but my symptoms were a little different than yours.
before:
	mnt: proc cat 290: mismatch from #D/ssl/1/data /n/coraid/lib/unicode rep 0x7fcd8c04e190 tag 4 fid 1603 T120 R117 rp 4
after:
	WOOT! caught stale reply 6; type 117

note: the poor organization of this patch is geared toward keeping the
diff small. tagallocd and freetag should be moved above mountmux.
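The idea in Erik's patch, reduced to its tag bookkeeping: when a flushed rpc is recycled it gets a fresh tag and the old tag stays marked allocated; when a reply arrives for a tag that no queued request owns, an allocated tag means a stale reply, which is discarded and its tag finally freed. The sketch is standard C with a small illustrative pool. The bitmap mirrors the tagmask/TAGSHIFT expression in the diff, while Rpc, rpcrecycle and stalereply are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum {
	TAGSHIFT = 5,			/* 32 tags per mask word */
	TAGMASK = (1 << TAGSHIFT) - 1,
	NTAGS = 256,			/* illustrative pool size */
	NMASK = NTAGS >> TAGSHIFT,
};

static uint32_t tagmask[NMASK];

int
tagallocd(int t)
{
	return (tagmask[t >> TAGSHIFT] >> (t & TAGMASK)) & 1;
}

int
alloctag(void)
{
	for(int t = 0; t < NTAGS; t++)
		if(!tagallocd(t)){
			tagmask[t >> TAGSHIFT] |= UINT32_C(1) << (t & TAGMASK);
			return t;
		}
	return -1;
}

void
freetag(int t)
{
	assert(tagallocd(t));
	tagmask[t >> TAGSHIFT] &= ~(UINT32_C(1) << (t & TAGMASK));
}

/* Hypothetical stand-in for Mntrpc: just the fields this sketch needs. */
typedef struct {
	int tag;
	int flushed;	/* plays the role of r->done == 2 in the patch */
} Rpc;

/* Recycle an rpc; a flushed one gets a new tag and its old tag is NOT freed. */
void
rpcrecycle(Rpc *r)
{
	if(r->flushed){
		r->tag = alloctag();
		r->flushed = 0;
	}
}

/*
 * A reply whose tag matches no queued request.  Returns 1 if the tag was
 * still allocated (a caught stale reply; the tag is reclaimed now),
 * 0 for a genuinely unexpected reply.
 */
int
stalereply(int tag)
{
	if(tag < 0 || tag >= NTAGS || !tagallocd(tag))
		return 0;
	freetag(tag);
	return 1;
}
```

If the stale reply never arrives, the old tag is never freed; that is the tag leak Erik acknowledges elsewhere in the thread.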

i didn't use russ' ed scripts because i was too lazy.

- erik

9vx version

; ; diff -c devmnt.c devmnt.c~
devmnt.c:945,954 - devmnt.c~:945,951
  void
  mountmux(Mnt *m, Mntrpc *r)
  {
- 	int bad;
  	Mntrpc **l, *q;
- 	int tagallocd(int);
- 	void freetag(int);

  	lock(&m->lk);
  	l = &m->queue;
devmnt.c:977,992 - devmnt.c~:974,981
  		}
  		l = &q->list;
  	}
- 	bad = 1;
- 	if(tagallocd(r->reply.tag)){
- 		freetag(r->reply.tag);
- 		bad = 0;
- 	}
  	unlock(&m->lk);
- 	if(bad)
- 		print("unexpected reply tag %ud; type %d\n", r->reply.tag, r->reply.type);
- 	else
- 		print("WOOT! caught stale reply %ud; type %d\n", r->reply.tag, r->reply.type);
+ 	print("unexpected reply tag %ud; type %d\n", r->reply.tag, r->reply.type);
  }

  /*
devmnt.c:1054,1065 - devmnt.c~:1043,1048
  	return NOTAG;
  }

- int
- tagallocd(int t)
- {
- 	return mntalloc.tagmask[t>>TAGSHIFT] & 1<<(t&TAGMASK);
- }
-
  void
  freetag(int t)
  {
devmnt.c:1125,1136 - devmnt.c~:1108,1116
  	if(mntalloc.nrpcfree >= 10){
  		free(r->rpc);
  		free(r);
- 		if(r->done != 2)
- 			freetag(r->request.tag);
+ 		freetag(r->request.tag);
  	}
  	else{
- 		if(r->done == 2)
- 			r->request.tag = alloctag();
  		r->list = mntalloc.rpcfree;
  		mntalloc.rpcfree = r;
  		mntalloc.nrpcfree++;
devmnt.c:1145,1151 - devmnt.c~:1125,1131
  	Mntrpc **l, *f;

  	lock(&m->lk);
- 	r->done = 2;
+ 	r->done = 1;

  	l = &m->queue;
  	for(f = *l; f; f = f->list) {

plan 9 version

; diffy -c devmnt.c
/n/dump/2009/0922/sys/src/9/port/devmnt.c:932,938 - devmnt.c:932,941
  void
  mountmux(Mnt *m, Mntrpc *r)
  {
+ 	int bad;
  	Mntrpc **l, *q;
+ 	int tagallocd(int);
+ 	void freetag(int);

  	lock(m);
  	l = &m->queue;
/n/dump/2009/0922/sys/src/9/port/devmnt.c:961,968 - devmnt.c:964,977
  		}
  		l = &q->list;
  	}
+ 	bad = 1;
+ 	if(tagallocd(r->reply.tag)){
+ 		freetag(r->reply.tag);
+ 		bad = 0;
+ 	}
  	unlock(m);
- 	print("unexpected reply tag %ud; type %d\n", r->reply.tag, r->reply.type);
+ 	if(bad)
+ 		print("unexpected reply tag %ud; type %d\n", r->reply.tag, r->reply.type);
  }

  /*
/n/dump/2009/0922/sys/src/9/port/devmnt.c:1030,1035 - devmnt.c:1039,1050
  	return NOTAG;
  }

+ int
+ tagallocd(int t)
+ {
+ 	return mntalloc.tagmask[t>>TAGSHIFT] & 1<<(t&TAGMASK);
+ }
+
  void
  freetag(int t)
  {
/n/dump/2009/0922/sys/src/9/port/devmnt.c:1095,1103 - devmnt.c:1110,1121
  	if(mntalloc.nrpcfree >= 10){
  		free(r->rpc);
  		free(r);
- 		freetag(r->request.tag);
+ 		if(r->done != 2)
+ 			freetag(r->request.tag);
  	}
  	else{
+ 		if(r->done == 2)
+ 			r->request.tag = alloctag();
  		r->list = mntalloc.rpcfree;
  		mntalloc.rpcfree = r;
  		mntalloc.nrpcfree++;
/n/dump/2009/0922/sys/src/9/port/devmnt.c:1112,1118 - devmnt.c:1130,1136
  	Mntrpc **l, *f;

  	lock(m);
- 	r->done = 1;
+ 	r->done = 2;

  	l = &m->queue;
  	for(f = *l; f; f = f->list) {


------------------------------------------------------------------
plan 9 version

; - diffy
diffy -c devmnt.c
/n/dump/2009/0922/sys/src/9/port/devmnt.c:932,938 - devmnt.c:932,941
  void
  mountmux(Mnt *m, Mntrpc *r)
  {
+ 	int bad;
  	Mntrpc **l, *q;
+ 	int tagallocd(int);
+ 	void freetag(int);

  	lock(m);
  	l = &m->queue;
/n/dump/2009/0922/sys/src/9/port/devmnt.c:961,968 - devmnt.c:964,979
  		}
  		l = &q->list;
  	}
+ 	bad = 1;
+ 	if(tagallocd(r->reply.tag)){
+ 		freetag(r->reply.tag);
+ 		bad = 0;
+ 	}
  	unlock(m);
- 	print("unexpected reply tag %ud; type %d\n", r->reply.tag, r->reply.type);
+ 	if(bad)
+ 		print("unexpected reply tag %ud; type %d\n", r->reply.tag, r->reply.type);
+ 	else
+ 		print("WOOT! caught stale reply %ud; type %d\n", r->reply.tag, r->reply.type);
  }

  /*
/n/dump/2009/0922/sys/src/9/port/devmnt.c:1030,1035 - devmnt.c:1041,1052
  	return NOTAG;
  }

+ int
+ tagallocd(int t)
+ {
+ 	return mntalloc.tagmask[t>>TAGSHIFT] & 1<<(t&TAGMASK);
+ }
+
  void
  freetag(int t)
  {
/n/dump/2009/0922/sys/src/9/port/devmnt.c:1095,1103 - devmnt.c:1112,1123
  	if(mntalloc.nrpcfree >= 10){
  		free(r->rpc);
  		free(r);
- 		freetag(r->request.tag);
+ 		if(r->done != 2)
+ 			freetag(r->request.tag);
  	}
  	else{
+ 		if(r->done == 2)
+ 			r->request.tag = alloctag();
  		r->list = mntalloc.rpcfree;
  		mntalloc.rpcfree = r;
  		mntalloc.nrpcfree++;
/n/dump/2009/0922/sys/src/9/port/devmnt.c:1112,1118 - devmnt.c:1132,1138
  	Mntrpc **l, *f;

  	lock(m);
- 	r->done = 1;
+ 	r->done = 2;

  	l = &m->queue;
  	for(f = *l; f; f = f->list) {



* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-22 21:23 ` erik quanstrom
@ 2009-09-22 22:32   ` ron minnich
  0 siblings, 0 replies; 29+ messages in thread
From: ron minnich @ 2009-09-22 22:32 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Tue, Sep 22, 2009 at 2:23 PM, erik quanstrom <quanstro@quanstro.net> wrote:
> On Tue Sep 22 17:22:08 EDT 2009, rminnich@gmail.com wrote:
>> here's one last "caught in the act" scenario. I have a print in
>> mntralloc when I reuse something.
>>
>> The fid is being read and clunked. But the Tclunk goes out before the
>> Rread comes in. Oops.
>>
>>
>> Reuse 1
>> Tread tag 1 fid 454 offset 0 count 8192
>> Reuse 1
>> Tclunk tag 1 fid 454
>> reply Rread tag 1 count 8192 '#include "u.h"
>> #include "../port/lib.h"
>> #include "mem.h"
>> #includ'
>> mnt: proc grep 78: mismatch from /net/tcp/0/data
>> /n/o/sys/src/9/pc/apic.c rep 0x168ed20 tag 1 fid 454 T120 R117 rp 1
>
> you're still triggering this with a note?

DEL in rio.

ron



* Re: [9fans] 9vx (is this the right list)? import issue
       [not found] <<13426df10909221420x1298139fhdeb4f0803924e5a3@mail.gmail.com>
@ 2009-09-22 21:23 ` erik quanstrom
  2009-09-22 22:32   ` ron minnich
  0 siblings, 1 reply; 29+ messages in thread
From: erik quanstrom @ 2009-09-22 21:23 UTC (permalink / raw)
  To: 9fans

On Tue Sep 22 17:22:08 EDT 2009, rminnich@gmail.com wrote:
> here's one last "caught in the act" scenario. I have a print in
> mntralloc when I reuse something.
>
> The fid is being read and clunked. But the Tclunk goes out before the
> Rread comes in. Oops.
>
>
> Reuse 1
> Tread tag 1 fid 454 offset 0 count 8192
> Reuse 1
> Tclunk tag 1 fid 454
> reply Rread tag 1 count 8192 '#include "u.h"
> #include "../port/lib.h"
> #include "mem.h"
> #includ'
> mnt: proc grep 78: mismatch from /net/tcp/0/data
> /n/o/sys/src/9/pc/apic.c rep 0x168ed20 tag 1 fid 454 T120 R117 rp 1

you're still triggering this with a note?

- erik



* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-22 21:12     ` ron minnich
@ 2009-09-22 21:20       ` ron minnich
  0 siblings, 0 replies; 29+ messages in thread
From: ron minnich @ 2009-09-22 21:20 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

here's one last "caught in the act" scenario. I have a print in
mntralloc when I reuse something.

The fid is being read and clunked. But the Tclunk goes out before the
Rread comes in. Oops.


Reuse 1
Tread tag 1 fid 454 offset 0 count 8192
Reuse 1
Tclunk tag 1 fid 454
reply Rread tag 1 count 8192 '#include "u.h"
#include "../port/lib.h"
#include "mem.h"
#includ'
mnt: proc grep 78: mismatch from /net/tcp/0/data
/n/o/sys/src/9/pc/apic.c rep 0x168ed20 tag 1 fid 454 T120 R117 rp 1



* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-22 19:34   ` roger peppe
@ 2009-09-22 21:12     ` ron minnich
  2009-09-22 21:20       ` ron minnich
  0 siblings, 1 reply; 29+ messages in thread
From: ron minnich @ 2009-09-22 21:12 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 156 bytes --]

here is a 9P trace of the problem. See line 43: Topen and Tclunk go
out with the same tag. This is with a print in the rpc code, as suggested
by Russ.

ron

[-- Attachment #2: y --]
[-- Type: application/octet-stream, Size: 3026 bytes --]

Twrite tag 1 fid 365 offset 54 count 6 'term% '
reply Rwrite tag 1 count 6
Tread tag 1 fid 364 offset 68 count 512
reply Rread tag 1 count 14 'grep full *.c
'
Twalk tag 1 fid 421 newfid 426 nwname 0 
reply Rwalk tag 1 nwqid 0 
Topen tag 1 fid 426 mode 0
reply Ropen tag 1 qid (0000000000142ef2 318 d) iounit 8192 
Tstat tag 1 fid 426
reply Rstat tag 1  stat 'pc' 'geoff' 'sys' 'geoff' q (0000000000142ef2 318 d) m 01777777777760000000775 at 1253652282 mt 1207255391 l 0 t 77 d 123312
Tread tag 1 fid 426 offset 0 count 8192
reply Rread tag 1 count 8127 '49004d00 b0e10100 00010000 00f32e14 00000000 00b40100 00b1434a 4aad5c62 43dd0b00 00000000 000d0061 70626f6f 74737472 61702e73 05006765 6f666603'
Tread tag 1 fid 426 offset 8127 count 8192
reply Rread tag 1 count 2754 '45004d00 b0e10100 00050000 00842f14 00000000 00b40100 00d41cb9 4a90dd8b 47609600 00000000 00090073 64696168 63692e63 05006765 6f666603 00737973'
Tread tag 1 fid 426 offset 10881 count 8192
reply Rread tag 1 count 0 ''
Tclunk tag 1 fid 426
reply Rclunk tag 1
Twalk tag 1 fid 421 newfid 425 nwname 1 0:grep 
reply Rerror tag 1 ename './sys/src/9/pc/grep' does not exist
Twalk tag 1 fid 421 newfid 428 nwname 1 0:apic.c 
reply Rwalk tag 1 nwqid 1 0:(0000000000142ef4 2 ) 
Topen tag 1 fid 428 mode 0
reply Ropen tag 1 qid (0000000000142ef4 2 ) iounit 8192 
Tread tag 1 fid 428 offset 0 count 8192
reply Rread tag 1 count 8192 '#include "u.h"
#include "../port/lib.h"
#include "mem.h"
#includ'
Tread tag 1 fid 428 offset 8192 count 8192
reply Rread tag 1 count 799 'apic);

	hi = 0;
	lo = ApicIMASK;
	for(v = 0; v <= apic->mre; v+'
Tread tag 1 fid 428 offset 8991 count 8192
reply Rread tag 1 count 0 ''
Tclunk tag 1 fid 428
reply Rclunk tag 1
Twalk tag 1 fid 421 newfid 428 nwname 1 0:apm.c 
reply Rwalk tag 1 nwqid 1 0:(0000000000142ef5 1 ) 
Topen tag 1 fid 428 mode 0
Tclunk tag 1 fid 428
reply Ropen tag 1 qid (0000000000142ef5 1 ) iounit 8192 
mnt: proc grep 76: mismatch from /net/tcp/0/data /n/o/sys/src/9/pc/apm.c rep 0x1734f70 tag 1 fid 428 T120 R113 rp 1
Twrite tag 1 fid 365 offset 60 count 6 'term% '
reply Rwrite tag 1 count 6
Tread tag 1 fid 364 offset 82 count 512
reply Rread tag 1 count 1 '
'
Twrite tag 1 fid 365 offset 66 count 6 'term% '
reply Rwrite tag 1 count 6
Tread tag 1 fid 364 offset 83 count 512
reply Rread tag 1 count 3 'ls
'
Twalk tag 1 fid 421 newfid 428 nwname 1 0:ls 
reply Rclunk tag 1
mnt: proc rc 78: mismatch from /net/tcp/0/data /n/o/sys/src/9/pc rep 0x1734f70 tag 1 fid 421 T110 R121 rp 1
Twalk tag 1 fid 421 newfid 427 nwname 0 
reply Rerror tag 1 ename './sys/src/9/pc/ls' does not exist
Twrite tag 1 fid 365 offset 72 count 20 'ls: .: clone failed
'
reply Rwrite tag 1 count 20
Twrite tag 1 fid 365 offset 92 count 6 'term% '
reply Rwrite tag 1 count 6
Tread tag 1 fid 364 offset 86 count 512
reply Rread tag 1 count 24 'cat /dev/kmesg > /tmp/x
'
Twalk tag 1 fid 421 newfid 433 nwname 1 0:cat 
reply Rwalk tag 1 nwqid 0 
Twalk tag 1 fid 361 newfid 433 nwname 1 0:kmesg 
reply Rerror tag 1 ename file does not exist
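
A trace in this format can be checked mechanically. The sketch below is a hypothetical helper written for the format above, not an existing tool. It returns the index of the first request that reuses a tag whose reply has not yet arrived. On the Twalk/Topen/Tclunk/Ropen lines for apm.c, it flags the Tclunk. It does not special-case Tflush, and it skips payload continuation lines only because they happen not to match either pattern.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: lines look like "Txxx tag N ..." for requests and
 * "reply Rxxx tag N ..." for replies.  Returns the index of the first
 * request sent on a tag that is still awaiting its reply, or -1.
 */
static int
firsttagclash(const char *lines[], int n)
{
	unsigned char outstanding[1 << 16];	/* one flag per 16-bit tag */
	char name[32];
	unsigned tag;
	int i;

	assert(n == 0 || lines != NULL);
	memset(outstanding, 0, sizeof outstanding);
	for(i = 0; i < n; i++){
		if(sscanf(lines[i], "reply R%31s tag %u", name, &tag) == 2){
			if(tag <= 0xFFFF)
				outstanding[tag] = 0;	/* reply retires the tag */
		}else if(sscanf(lines[i], "T%31s tag %u", name, &tag) == 2){
			if(tag > 0xFFFF)
				continue;
			if(outstanding[tag])
				return i;	/* tag reused before its reply */
			outstanding[tag] = 1;
		}
	}
	return -1;
}
```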

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [9fans] 9vx (is this the right list)? import issue
  2009-09-22 19:01 ` erik quanstrom
@ 2009-09-22 19:34   ` roger peppe
  2009-09-22 21:12     ` ron minnich
  0 siblings, 1 reply; 29+ messages in thread
From: roger peppe @ 2009-09-22 19:34 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2009/9/22 erik quanstrom <quanstro@quanstro.net>:
>> if it's ambiguous, then the tag should indeed be put on hold,
>> because there's no way to get it right.
>
> how do we prevent all tags from being on hold?
> there's no way to get that right, either.

well, it's legal to send several flushes for the same tag,
and it's also legal to send a flush of a non-existent tag,
so if there's a case of ambiguity, we could resend the flush
and drop the original rpc struct when the reply to that comes
back (or the original reply comes back).
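
The bookkeeping for this scheme is small. Below is a minimal sketch built on a hypothetical per-request Rpc struct; the names are illustrative, not devmnt.c's. Each resend records the tag of the newest Tflush. The struct can be dropped once either the original reply or the Rflush for that newest Tflush arrives. Following flush(5), the original tag is not reused until that Rflush has been seen, because an Rflush answers all earlier Tflushes for the same oldtag but not later ones.

```c
#include <assert.h>
#include <stdint.h>

enum {
	Rflush = 109,		/* 9P2000 message type of Rflush */
	NOTAG  = 0xFFFF,
};

/* Hypothetical per-request bookkeeping for the resend-the-flush idea. */
typedef struct Rpc Rpc;
struct Rpc {
	uint16_t tag;		/* tag of the original request */
	uint16_t lastflush;	/* tag of the newest Tflush naming tag, or NOTAG */
	int replied;		/* original R-message arrived */
	int flushed;		/* Rflush for lastflush arrived */
};

static void
rpcinit(Rpc *r, uint16_t tag)
{
	r->tag = tag;
	r->lastflush = NOTAG;
	r->replied = r->flushed = 0;
}

/* (Re)send a Tflush with oldtag = r->tag; only the newest one matters. */
static void
sendflush(Rpc *r, uint16_t flushtag)
{
	assert(flushtag != r->tag && flushtag != NOTAG);
	r->lastflush = flushtag;
	r->flushed = 0;
	/* hypothetical: marshal and write the Tflush on the connection here */
}

/* Record an incoming reply; type is the 9P message type byte. */
static void
gotreply(Rpc *r, int type, uint16_t tag)
{
	if(type == Rflush){
		if(r->lastflush != NOTAG && tag == r->lastflush)
			r->flushed = 1;
	}else if(tag == r->tag)
		r->replied = 1;
}

/* The Rpc struct may be dropped once either reply has arrived. */
static int
rpcdone(const Rpc *r)
{
	return r->replied || r->flushed;
}

/* The original tag may be reused only after the Rflush for the last Tflush. */
static int
tagreusable(const Rpc *r)
{
	if(r->lastflush == NOTAG)
		return r->replied;
	return r->flushed;
}
```

Holding the old tag until the Rflush arrives presumably keeps a Tflush still queued at the server from aborting a new request that reused that tag.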

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [9fans] 9vx (is this the right list)? import issue
       [not found] <<df49a7370909221158u3f071cc3j125c85241c5088e6@mail.gmail.com>
@ 2009-09-22 19:01 ` erik quanstrom
  2009-09-22 19:34   ` roger peppe
  0 siblings, 1 reply; 29+ messages in thread
From: erik quanstrom @ 2009-09-22 19:01 UTC (permalink / raw)
  To: 9fans

> if it's ambiguous, then the tag should indeed be put on hold,
> because there's no way to get it right.

how do we prevent all tags from being on hold?
there's no way to get that right, either.

- erik



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [9fans] 9vx (is this the right list)? import issue
       [not found] <<13426df10909221147w665e30adt93b6121281294647@mail.gmail.com>
@ 2009-09-22 18:51 ` erik quanstrom
  0 siblings, 0 replies; 29+ messages in thread
From: erik quanstrom @ 2009-09-22 18:51 UTC (permalink / raw)
  To: 9fans

> I don't know. I am not sure the code does either. Since this has only
> been seen so far in 9vx, I am guessing it is a 9vx thing.

i see this now and then on regular plan 9:

Tue Sep  1 18:51:15: unexpected reply tag 51; type 109
Tue Sep  1 18:51:15: unexpected reply tag 16; type 109
Tue Sep  1 18:51:15: unexpected reply tag 39; type 109
Tue Sep  1 18:51:15: unexpected reply tag 51; type 109
Tue Sep  1 18:51:16: unexpected reply tag 51; type 109
Tue Sep  1 18:51:17: unexpected reply tag 39; type 109

- erik
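
The numbers in these logs are raw 9P2000 message types. Plan 9's <fcall.h> defines them from Tversion = 100 through Rwstat = 127, with each T-message even and its R-reply one higher. Decoded that way, type 109 is Rflush. The T120 R117 pair in the earlier mismatch lines reads as a Tclunk request matched with an Rread reply, assuming T and R there are the request and reply types. Below is a small lookup sketch; fcallname and fcalltype are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define nelem(x) (sizeof(x) / sizeof((x)[0]))

enum { Tversion = 100 };	/* first 9P2000 message type */

/* 9P2000 message names in type order, 100 (Tversion) to 127 (Rwstat). */
static const char *fcallnames[] = {
	"Tversion", "Rversion", "Tauth", "Rauth", "Tattach", "Rattach",
	"Terror", "Rerror", "Tflush", "Rflush", "Twalk", "Rwalk",
	"Topen", "Ropen", "Tcreate", "Rcreate", "Tread", "Rread",
	"Twrite", "Rwrite", "Tclunk", "Rclunk", "Tremove", "Rremove",
	"Tstat", "Rstat", "Twstat", "Rwstat",
};

/* 109 -> "Rflush"; NULL if type is not a 9P2000 message type. */
static const char *
fcallname(int type)
{
	if(type < Tversion || (size_t)(type - Tversion) >= nelem(fcallnames))
		return NULL;
	return fcallnames[type - Tversion];
}

/* "Tclunk" -> 120; -1 if name is unknown. */
static int
fcalltype(const char *name)
{
	size_t i;

	assert(name != NULL);
	for(i = 0; i < nelem(fcallnames); i++)
		if(strcmp(fcallnames[i], name) == 0)
			return Tversion + (int)i;
	return -1;
}
```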



^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread (newest message 2009-09-23 22:13 UTC)

Thread overview: 29+ messages
2009-09-21 18:29 [9fans] 9vx (is this the right list)? import issue ron minnich
2009-09-22  5:51 ` Russ Cox
2009-09-22 17:27   ` ron minnich
2009-09-22 18:21     ` ron minnich
2009-09-22 18:35       ` roger peppe
2009-09-22 18:47         ` ron minnich
2009-09-22 18:58           ` roger peppe
2009-09-22 19:08           ` Eric Van Hensbergen
     [not found] <<13426df10909221147w665e30adt93b6121281294647@mail.gmail.com>
2009-09-22 18:51 ` erik quanstrom
     [not found] <<df49a7370909221158u3f071cc3j125c85241c5088e6@mail.gmail.com>
2009-09-22 19:01 ` erik quanstrom
2009-09-22 19:34   ` roger peppe
2009-09-22 21:12     ` ron minnich
2009-09-22 21:20       ` ron minnich
     [not found] <<13426df10909221420x1298139fhdeb4f0803924e5a3@mail.gmail.com>
2009-09-22 21:23 ` erik quanstrom
2009-09-22 22:32   ` ron minnich
     [not found] <<13426df10909221532t5de9f010pfeb2ca2c3b44db89@mail.gmail.com>
2009-09-23  2:36 ` erik quanstrom
     [not found] <<fadaba2046122acf656140c0618e1d1e@ladd.quanstro.net>
2009-09-23  2:41 ` erik quanstrom
2009-09-23  3:11   ` Russ Cox
     [not found] <<dd6fe68a0909222011u4243953dged01d77ecdc93e46@mail.gmail.com>
2009-09-23  3:17 ` erik quanstrom
2009-09-23  4:11   ` Russ Cox
     [not found] <<dd6fe68a0909222111y1af0f4a2qd30a3b4eded30b2b@mail.gmail.com>
2009-09-23  4:56 ` erik quanstrom
2009-09-23 18:52   ` Russ Cox
2009-09-23 19:12     ` ron minnich
2009-09-23 19:25       ` erik quanstrom
2009-09-23 19:26       ` Russ Cox
2009-09-23 20:33         ` ron minnich
2009-09-23 20:35           ` ron minnich
2009-09-23 22:13             ` Russ Cox
2009-09-23 21:25     ` erik quanstrom
