From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 14 Jun 2006 15:09:20 -0700
From: Roman Shaposhnick <rvs@sun.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] quantity vs. quality
Message-ID: <20060614220919.GG7331@submarine>
References: <8ccc8ba40606121415i63c1064fi5ae59cf04fca7aa2@mail.gmail.com>
	<a67ad68188590b288261dc131e7e978c@vitanuova.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
In-Reply-To: <a67ad68188590b288261dc131e7e978c@vitanuova.com>
User-Agent: Mutt/1.4.2.1i
Topicbox-Message-UUID: 6ac576a2-ead1-11e9-9d60-3106f5b1d025

On Tue, Jun 13, 2006 at 01:08:22PM +0100, rog@vitanuova.com wrote:
> i think this has been mentioned on the list before (otherwise i wouldn't
> have known to look for it) but when considering error recovery tactics, it's
> worth looking at http://www.sics.se/~joe/thesis/armstrong_thesis_2003.pdf
> ("Making reliable software systems in the presence of errors")

  Thanks for the pointer. I find some of the ideas mentioned in this paper
  quite interesting, especially the ones on how true error recovery is
  supposed to be structured around a hierarchy of supervisors and WBFs.
  
  Their job, however, seems to be made easier by the sort of language they
  use. For a caveman like me, who still thinks that C is all I need,
  here's a mechanism I would very much like to have in order to make my
  code more fault tolerant, yet easier to maintain: the "reverse" propagation
  of corrections. Here's what I mean by it.

  Suppose we're deep down in a call stack which looks somewhat like this:
     
     main
       ...
         foo
           ...
             bar()

  Now, there's a fixable exception that occurs in bar(), let's say a call
  to malloc that returns NULL. Also suppose that I do have a strategy
  for dealing with OOM conditions, but I don't want it to clutter my
  bar() code. In fact, I don't even want it to be local to the process,
  but rather implemented as a policy on a standalone server. All of that
  means that I can't just simply write:
      
         try {
             malloc();
         } catch(...) {
             <fix it>
         }
   
  but I have to transfer control to a higher authority. I expect
  the condition that led to the OOM to be fixed at that level, and all I
  want at the level of bar() is to see my malloc() call restarted.
  Automatically. Alternatively, the authority could decide that malloc()
  has to be terminated, at which point my control flow will resume past
  the 'malloc();'.

  Now, we have a mechanism for the exception to be propagated upwards
  (I can even do it in C with things like waserror()), but there's no
  mechanism for the "fix" to be "propagated" downwards and have
  my call to malloc be automatically restarted. 
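
  To make the idea concrete, here is a minimal sketch of how it could be
  approximated in plain C today, without language support. Everything in
  it is hypothetical and made up for illustration: rmalloc(),
  set_oom_policy(), the oom_verdict values, and the raw_alloc hook that
  lets the failure be simulated. The policy is called at the point of
  failure, while bar()'s frame is still intact, so nothing is unwound and
  the "fix" comes back down simply as the policy's return value. A real
  policy could forward the request to a standalone server; this one only
  counts attempts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* What the higher authority tells the failing call to do. */
enum oom_verdict { OOM_RETRY, OOM_GIVE_UP };

/* Hypothetical policy hook. It runs while the failing frame is still
 * on the stack, so no unwinding takes place. */
typedef enum oom_verdict (*oom_policy)(size_t size, unsigned attempt, void *ctx);

oom_policy cur_policy;               /* installed by main() or a supervisor */
void *cur_ctx;
void *(*raw_alloc)(size_t) = malloc; /* hypothetical hook, eases simulation */

void set_oom_policy(oom_policy p, void *ctx)
{
    cur_policy = p;
    cur_ctx = ctx;
}

/* Restartable malloc: on failure, consult the policy, then either
 * retry the allocation or return NULL to the caller. */
void *rmalloc(size_t size)
{
    unsigned attempt = 0;

    for (;;) {
        void *p = raw_alloc(size);
        if (p != NULL)
            return p;
        if (cur_policy == NULL ||
            cur_policy(size, ++attempt, cur_ctx) == OOM_GIVE_UP)
            return NULL;             /* control resumes past the call */
    }
}

/* Simulated allocator: fails while failures_left is positive. */
int failures_left;

void *flaky_alloc(size_t n)
{
    if (failures_left > 0) {
        failures_left--;
        return NULL;
    }
    return malloc(n);
}

/* Sample policy: a real one would free memory or ask a server;
 * this one just allows up to *ctx retries. */
enum oom_verdict bounded_retry(size_t size, unsigned attempt, void *ctx)
{
    unsigned limit = *(unsigned *)ctx;

    (void)size;
    return attempt <= limit ? OOM_RETRY : OOM_GIVE_UP;
}

/* bar() stays free of recovery code. */
int bar(void)
{
    char *buf = rmalloc(64);

    if (buf == NULL)
        return -1;
    buf[0] = 'x';
    free(buf);
    return 0;
}

/* Usage sketch: two simulated failures are repaired by the policy. */
int demo(void)
{
    unsigned limit = 3;
    int rc;

    raw_alloc = flaky_alloc;
    set_oom_policy(bounded_retry, &limit);
    failures_left = 2;
    rc = bar();
    assert(rc == 0);                 /* bar() never saw the failures */
    return rc;
}
```

  This only covers calls that are wrapped by hand, which is exactly the
  clutter I would like the language to take care of for me.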

  It shouldn't be too hard to make such a thing part of the language,
  but I haven't seen anything like it yet. So are there any better
  solutions to the problem I've just described, or am I talking
  nonsense here? ;-)
  	     
Thanks,
Roman.