From mboxrd@z Thu Jan  1 00:00:00 1970
From: erik quanstrom <quanstro@quanstro.net>
Date: Tue, 31 Dec 2013 09:38:52 -0500
To: 9fans@9fans.net
Message-ID: <5f239460c7fcd4457d2d5e35bb17d266@brasstown.quanstro.net>
In-Reply-To: <e0f12013b2f130aa426c990d23a17aeb@felloff.net>
References: <e0f12013b2f130aa426c990d23a17aeb@felloff.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] 9front sleep interrupted in kproc?
Topicbox-Message-UUID: aa14f542-ead8-11e9-9d60-3106f5b1d025

On Tue Dec 31 08:19:04 EST 2013, cinap_lenrek@felloff.net wrote:
> its just a case of defensive programming paranoia. i did the change
> after sl reported wifi kproc exiting (went into Broken state). there
> seemed no other explanation other than a note interrupting it
> so i made all the kprocs safe to be interrupted by notes. i found this
> pattern in many places. most of the kprocs are just loops waiting for
> something and in case of error, just restart the loop. even if it is
> obvious that nobody is calling error() in there. (see devfloppy
> kproc for example). you just setup the error label before your loop
> and it makes it automatically restart your loop on errors. this doesnt
> cost you any cycles in the loop. when i see this code, i can stop worrying
> and assume *less* about the surroundings.
>
> so simply put, yes, you are right. these kprocs shouldnt get interrupted.
> but i made sure to catch the case anyway.
>
> i wouldnt rule out notes out of the blue. the use of postnote() is racy in
> many places. a Proc* pointer is not a good identifier for a process
> *unless* you can guarantee that the proc will not exit while you call
> postnote(). alarms have this race. pexit() just does up->alarm = 0;
> to clear alarms. if the alarm kproc was already committed to
> posting you the note at this point (and before it got hold of
> the p->debug qlock), the new process reusing your Proc*
> structure can get the alarm out of the blue.

thanks for the discussion.

in my view, there is a point where defensive programming against
can't-happen events becomes counterproductive.  first, the code can't
be tested, since it can't happen.  (can the kproc be restarted?  quite a
few can't.)  second, it leads the poor reader (and there are always more
readers than writers) to believe the event could happen, when it can't.
third, were it to happen by some newly introduced bug, the obvious
manifestation might be hidden by error recovery.
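
to make the third point concrete, here is a minimal user-space sketch,
not kernel code.  setjmp/longjmp stand in for waserror()/error(), and
errlab, work() and kprocloop() are hypothetical names made up for the
illustration.  the "can't happen" failure in work() is absorbed by the
restart, and the only trace left is a counter nobody is looking at.

```c
#include <setjmp.h>

static jmp_buf errlab;	/* stand-in for the kernel error label */
static int restarts;	/* times the loop was silently restarted */

/* stand-in for error(): jump back to the label */
static void
error(void)
{
	longjmp(errlab, 1);
}

/* hypothetical unit of work; item 2 hits a "can't happen" failure */
static void
work(int i)
{
	if(i == 2)
		error();
}

/* process items 0..n-1, restarting after any error; returns restart count */
int
kprocloop(int n)
{
	volatile int i;	/* volatile: modified between setjmp and longjmp */

	restarts = 0;
	i = 0;
	if(setjmp(errlab)){
		/* we get here only after error(): skip the item and carry on */
		restarts++;
		i++;
	}
	for(; i < n; i++)
		work(i);
	return restarts;
}
```

a caller sees kprocloop() return normally either way; whether item 2
was actually handled is invisible unless someone inspects the count.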

in short, defending against things that can't happen is imho an anti-pattern.
i don't mean things like users or other programs providing bad input,
or things that are not provably correct, but the baseline invariants.
i think the ip code gets it right.  it panics if the ip version is ever
outside {V4, V6}.
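
a rough sketch of that style, as ordinary user-space c rather than the
actual ip code.  the enum values, the panic() stand-in and iphdrlen()
are assumptions for illustration only.  the point is that the default
case does not try to recover; it stops loudly at the broken invariant.

```c
#include <stdio.h>
#include <stdlib.h>

enum { V4 = 4, V6 = 6 };	/* assumed values, for illustration */

/* stand-in for the kernel's panic(): report and stop */
static void
panic(const char *msg, int v)
{
	fprintf(stderr, "panic: %s %d\n", msg, v);
	abort();
}

/* hypothetical helper: fixed header length for a given ip version */
int
iphdrlen(int version)
{
	switch(version){
	case V4:
		return 20;
	case V6:
		return 40;
	default:
		/* invariant broken: no recovery, fail where the bug is visible */
		panic("iphdrlen: bad ip version", version);
		return -1;	/* not reached */
	}
}
```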

out-of-the-blue notes can't happen without a serious bug.  were that
to happen (and we have no evidence that it has), shouldn't that be
fixed instead?

did you find the wifi issue?

- erik