From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 14 Jun 2006 15:09:20 -0700
From: Roman Shaposhnick
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] quantity vs. quality
Message-ID: <20060614220919.GG7331@submarine>
References: <8ccc8ba40606121415i63c1064fi5ae59cf04fca7aa2@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.4.2.1i
Topicbox-Message-UUID: 6ac576a2-ead1-11e9-9d60-3106f5b1d025

On Tue, Jun 13, 2006 at 01:08:22PM +0100, rog@vitanuova.com wrote:
> i think this has been mentioned on the list before (otherwise i wouldn't
> have known to look for it) but when considering error recovery tactics, it's
> worth looking at http://www.sics.se/~joe/thesis/armstrong_thesis_2003.pdf
> ("Making reliable software systems in the presence of errors")

Thanks for the pointer. I find some of the ideas in this paper quite
interesting, especially the ones on how true error recovery is supposed
to be structured around a hierarchy of supervisors and WBFs. Their job,
however, seems to be made easier by the sort of language they use.

For a caveman like me, who still thinks that C is all I need, here's a
mechanism I would very much like to use in order to make my code more
fault tolerant, yet easier to maintain: the "reverse" propagation of
corrections.

Here's what I mean by it. Suppose we're deep down in a call stack which
looks somewhat like this:

    main
      ...
        foo
          ...
            bar()

Now, a fixable exception occurs in bar(), let's say a call to malloc
that returns NULL. Also suppose that I do have a strategy for dealing
with OOM conditions, but I don't want it to clutter my bar() code. In
fact, I don't even want it to be local to the process, but rather
implemented as a policy on a standalone server.

All of that means that I can't simply write:

    try {
        malloc();
    } catch(...) {
    }

but I have to transfer control to the higher authority.
I expect the condition which led to the OOM to be fixed at that level,
and all I want at the level of bar() is to see my malloc() call
restarted. Automatically. Alternatively, the authority could decide that
malloc() has to be terminated, at which point my control flow will
resume past the 'malloc();'.

Now, we have a mechanism for the exception to be propagated upwards (I
can even do it in C with things like waserror()), but there's no
mechanism for the "fix" to be "propagated" downwards and have my call to
malloc automatically restarted. On one hand, it shouldn't be too hard to
make such a thing part of the language, but I haven't seen anything like
it yet.

So are there any better solutions to the problem I've just described, or
am I talking nonsense here? ;-)

Thanks,
Roman.
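
To make the "downward" half of the idea concrete, here is a minimal sketch in plain C of one way it could be approximated without language support. The trick is that the policy is called at the point of failure, before anything is unwound, so bar()'s frame is still there to be resumed. Every name in it (restartable_malloc, authority, patient_policy, flaky_malloc, run_with_failures) is hypothetical, the "higher authority" is an ordinary in-process function pointer rather than a standalone server, and the allocator is rigged to fail a set number of times purely so the OOM path can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Verdict handed back down to the call site that hit the problem. */
enum verdict { RESTART, ABANDON };

/* Hypothetical policy hook. It runs at the point of failure, before any
 * stack unwinding, so the failing frame is intact when it returns. */
typedef enum verdict (*oom_policy)(size_t wanted, int failures);

static oom_policy authority;   /* installed far above bar() */
int simulated_failures;        /* demo knob: how many allocations fail */
int consultations;             /* how often the policy was asked */

/* Stand-in allocator that fails a fixed number of times. This is an
 * assumption made only to reproduce the out-of-memory path on demand. */
static void *flaky_malloc(size_t n)
{
    if (simulated_failures > 0) {
        simulated_failures--;
        return NULL;
    }
    return malloc(n);
}

/* The restartable call: on failure, report upwards, then either retry
 * in place or resume past the allocation with NULL. */
void *restartable_malloc(size_t n)
{
    int failures = 0;

    assert(n > 0);
    for (;;) {
        void *p = flaky_malloc(n);
        if (p != NULL)
            return p;
        failures++;
        if (authority == NULL || authority(n, failures) == ABANDON)
            return NULL;
    }
}

/* Example policy: pretend to fix the condition, give up after 3 tries.
 * A real one might release caches or consult a separate server. */
enum verdict patient_policy(size_t wanted, int failures)
{
    (void)wanted;
    consultations++;
    return failures <= 3 ? RESTART : ABANDON;
}

/* bar() carries no recovery logic of its own. */
int bar(void)
{
    char *buf = restartable_malloc(64);

    if (buf == NULL)
        return -1;
    buf[0] = '\0';
    free(buf);
    return 0;
}

int foo(void)
{
    return bar();
}

/* Install the policy at the top, then descend: main ... foo ... bar(). */
int run_with_failures(int n)
{
    authority = patient_policy;
    simulated_failures = n;
    consultations = 0;
    return foo();
}
```

Calling run_with_failures(2) from main lets bar() succeed after the policy has been consulted twice, while a larger count makes the policy abandon the call and bar() sees NULL. This only covers calls that are wrapped by hand; it is not the general language-level restart asked about above, and whether the policy can really live in another process is left open.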