RE: [9fans] quantity vs. quality

9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed

* RE: [9fans] quantity vs. quality
@ 2006-06-09  6:01 cej
  0 siblings, 0 replies; 78+ messages in thread
From: cej @ 2006-06-09  6:01 UTC (permalink / raw)
  To: 9fans

I usually don't care of gui apps, so there is (not yet working) linuxemu by ruuss (I think)... Would be great to have this-- it's on the wiki's TODO

++pac

-----Original Message-----
From: 9fans-bounces+cej=gli.cas.cz@cse.psu.edu [mailto:9fans-bounces+cej=gli.cas.cz@cse.psu.edu] On Behalf Of Lluís Batlle
Sent: 08 June 2006 11:32
To: Fans of the OS Plan 9 from Bell Labs
Subject: Re: [9fans] quantity vs. quality

Maybe something capable of running an isolated linux box in plan9's environment would do the trick, as some people do right now in Linux in order to run plan9. Let's say... a 'xen-like-thing' running on plan9, for running other kernels over the same hardware.

Of course, I don't plan coding that.

2006/6/8, cej@gli.cas.cz <cej@gli.cas.cz>:
> > No, we need fresh ideas.  An infinite number of monkeys turning Plan 
> > 9 into Linux is not progress.
>
> I agree 100%. Although I would LOVE to have some loonix prgs w/o 
> rebooting to L. Or c++, java, perl, etc... Attract more people to the 
> clean design (and they will, hopefully, rewrite everything that 
> is(isn't;-) worth it... IMHO.
>
> ++pac.
>
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-14 22:09                                                 ` Roman Shaposhnick
@ 2006-06-15 15:46                                                   ` Victor Nazarov
  0 siblings, 0 replies; 78+ messages in thread
From: Victor Nazarov @ 2006-06-15 15:46 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Roman Shaposhnick wrote:

>  Suppose we're deep down in a call stack which looks somewhat like this:
>     
>     main 
>       ...
>         foo
>	   ...
>	      bar()
>
>  now, there's a fixable exception that occurs in bar(), lets say a call
>  to malloc that return NULL. Also suppose that I do have a strategy
>  for dealing with OOM conditions, but I don't want it to clutter my
>  bar() code. In fact, I don't even want it to be local to the process
>  but rather implemented as a policy on a standalone server. All of that
>  means that I can't just simply write:
>      
>         try {
>	    malloc();
>	 } catch(...) {
>	    <fix it>
>	 }
>   
>  but I have to transfer the control to the higher authority. I expect
>  the codition which lead to OOM be fixed at that level, and all I want
>  to have at the level of bar() is to see my malloc() call be restared.
>  Automatically. Alternatively the authority could decide that malloc()
>  has to be terminated at which point my control flow will resume past
>  the 'malloc();'. 
>
>  Now, we have a mechanism for the exception to be propagated upwards
>  (I can even do it in C with things like waserror()), but there's no
>  mechanism for the "fix" to be "propagated" downwards and have
>  my call to malloc be automatically restarted. 
>
>  On one hand it shouldn't be too hard to make such a thing part of the
>  language, but I haven't seen anything like it yet. So are there any
>  better solutions to the problem I've just described or am I talking
>  nonsense here ? ;)
>  
>
What about creating a Plan9-thread for the error recovery mecanism. This 
thread
would wait for the message on the channel. When we encounter malloc error we
send message on the error chanel. Error recovery process error and send 
message
back to us. Getting message we restart malloc...
All this require carefull coding, so I didn't try to write an example by 
myself, but
I think it's possible.
--
Victor



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-13 12:08                                               ` rog
  2006-06-13 16:34                                                 ` Skip Tavakkolian
@ 2006-06-14 22:09                                                 ` Roman Shaposhnick
  2006-06-15 15:46                                                   ` Victor Nazarov
  1 sibling, 1 reply; 78+ messages in thread
From: Roman Shaposhnick @ 2006-06-14 22:09 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Tue, Jun 13, 2006 at 01:08:22PM +0100, rog@vitanuova.com wrote:
> i think this has been mentioned on the list before (otherwise i wouldn't
> have known to look for it) but when considering error recovery tactics, it's
> worth looking at http://www.sics.se/~joe/thesis/armstrong_thesis_2003.pdf
> ("Making reliable software systems in the presence of errors")

  Thanks for the pointer. I find some of the ideas mentioned in this paper 
  to be quite interesting. Especially the ones on how true error recovery is
  supposed to be structured based on the hierarchy of supervisors and WBFs.

  Their job, however, seems to be made easier by the sort of language they
  use. For the cave-man like me, who still thinks that C is all I need,
  here's a mechanism I would very much like to use in order to make my
  code more fault tolerant, but easier to maintain: the "reverse" propagation
  of corrections. Here's what I mean by it.

  Suppose we're deep down in a call stack which looks somewhat like this:

     main 
       ...
         foo
	   ...
	      bar()

  now, there's a fixable exception that occurs in bar(), lets say a call
  to malloc that return NULL. Also suppose that I do have a strategy
  for dealing with OOM conditions, but I don't want it to clutter my
  bar() code. In fact, I don't even want it to be local to the process
  but rather implemented as a policy on a standalone server. All of that
  means that I can't just simply write:

         try {
	    malloc();
	 } catch(...) {
	    <fix it>
	 }

  but I have to transfer the control to the higher authority. I expect
  the codition which lead to OOM be fixed at that level, and all I want
  to have at the level of bar() is to see my malloc() call be restared.
  Automatically. Alternatively the authority could decide that malloc()
  has to be terminated at which point my control flow will resume past
  the 'malloc();'. 

  Now, we have a mechanism for the exception to be propagated upwards
  (I can even do it in C with things like waserror()), but there's no
  mechanism for the "fix" to be "propagated" downwards and have
  my call to malloc be automatically restarted. 

  On one hand it shouldn't be too hard to make such a thing part of the
  language, but I haven't seen anything like it yet. So are there any
  better solutions to the problem I've just described or am I talking
  nonsense here ? ;-)

Thanks,
Roman.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-13 16:34                                                 ` Skip Tavakkolian
@ 2006-06-13 21:35                                                   ` "Nils O. Selåsdal"
  0 siblings, 0 replies; 78+ messages in thread
From: "Nils O. Selåsdal" @ 2006-06-13 21:35 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Skip Tavakkolian wrote:
> excellent points; i believe this.  there's no sense in masking errors
> with pseudo recovery.  good test coverage should expose programmer
> misunderstanding.
> 
> if the system can't afford memory allocation errors, then
> preallocating (static or dynamic) and capping a maximum that the
> system should ever need will help simulate exhaustion in testing and
> make the memory usage and response times bounded.  watchdog processes
> and memory checksums are possible additional measures.

Memory shortage can often be temporary. Sleeping malloc has saved me
a few times.



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-13 12:08                                               ` rog
@ 2006-06-13 16:34                                                 ` Skip Tavakkolian
  2006-06-13 21:35                                                   ` "Nils O. Selåsdal"
  2006-06-14 22:09                                                 ` Roman Shaposhnick
  1 sibling, 1 reply; 78+ messages in thread
From: Skip Tavakkolian @ 2006-06-13 16:34 UTC (permalink / raw)
  To: 9fans

excellent points; i believe this.  there's no sense in masking errors
with pseudo recovery.  good test coverage should expose programmer
misunderstanding.

if the system can't afford memory allocation errors, then
preallocating (static or dynamic) and capping a maximum that the
system should ever need will help simulate exhaustion in testing and
make the memory usage and response times bounded.  watchdog processes
and memory checksums are possible additional measures.

> i think this has been mentioned on the list before (otherwise i wouldn't
> have known to look for it) but when considering error recovery tactics, it's
> worth looking at http://www.sics.se/~joe/thesis/armstrong_thesis_2003.pdf
> ("Making reliable software systems in the presence of errors")
> 
> he summarises their approach to error recovery as follows:
> 
> - if you can't do what you want to do, die.
> - let it crash.
> - do not program defensively.
> 
> they built a telecoms switching system with a reported measured reliability of 99.9999999%
> following this philosophy.
> 
> see section 4.3 (page 101) for details.
> 
> the key is that another process gets notified of the error.
> 
> he makes this useful distinction between "error" and "exception":
> 
> - exceptions occur when the run-time system does not know what to do.
> - errors occur when the programmer does not know what to do.
> 
> i would suggest that most out-of-memory conditions are best
> classed as errors, not exceptions.



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-12 21:15                                             ` Francisco J Ballesteros
@ 2006-06-13 12:08                                               ` rog
  2006-06-13 16:34                                                 ` Skip Tavakkolian
  2006-06-14 22:09                                                 ` Roman Shaposhnick
  0 siblings, 2 replies; 78+ messages in thread
From: rog @ 2006-06-13 12:08 UTC (permalink / raw)
  To: 9fans

i think this has been mentioned on the list before (otherwise i wouldn't
have known to look for it) but when considering error recovery tactics, it's
worth looking at http://www.sics.se/~joe/thesis/armstrong_thesis_2003.pdf
("Making reliable software systems in the presence of errors")

he summarises their approach to error recovery as follows:

- if you can't do what you want to do, die.
- let it crash.
- do not program defensively.

they built a telecoms switching system with a reported measured reliability of 99.9999999%
following this philosophy.

see section 4.3 (page 101) for details.

the key is that another process gets notified of the error.

he makes this useful distinction between "error" and "exception":

- exceptions occur when the run-time system does not know what to do.
- errors occur when the programmer does not know what to do.

i would suggest that most out-of-memory conditions are best
classed as errors, not exceptions.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-12 20:56                                               ` Ronald G Minnich
  2006-06-12 21:09                                                 ` Victor Nazarov
@ 2006-06-13  0:05                                                 ` Roman Shaposhnik
  1 sibling, 0 replies; 78+ messages in thread
From: Roman Shaposhnik @ 2006-06-13  0:05 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, 2006-06-12 at 14:56 -0600, Ronald G Minnich wrote:
> Roman Shaposhnick wrote:
> > On Mon, Jun 12, 2006 at 02:16:10PM -0600, Ronald G Minnich wrote:
> > 
> >>actually, to repeat my serious question: waserror() and friends in user 
> >>mode. Crazy, doable, or just plain bad idea?
> > 
> > 
> >   Doable. But that'll give you a tool quite similar to the C++ exceptions
> >   (although without the bonus of auto-destruction). What's your plan
> >   for using this tool in libraries ?
> 
> I have no plans :-)
> 
> Just wondering.
> 
> auto-destruction: does it involve plastique or just gunpowder?
  
   I'm talking language of mass destruction here ;-)

Thanks,
Roman.



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-12 20:16                                           ` Ronald G Minnich
  2006-06-12 20:23                                             ` Roman Shaposhnick
@ 2006-06-12 21:15                                             ` Francisco J Ballesteros
  2006-06-13 12:08                                               ` rog
  1 sibling, 1 reply; 78+ messages in thread
From: Francisco J Ballesteros @ 2006-06-12 21:15 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

in /n/sources/nemo/sys/src/liberror

there is such beast, we made it for Plan B.
I found it useful when error handling required a lot
of nesting and checking, because simply calling error sufficed.

However, at the end, I still had the problem of what to do with the error.
In may times, btw, it was just clean up to avoid messing things up, and then
sysfatal().


On 6/12/06, Ronald G Minnich <rminnich@lanl.gov> wrote:
> actually, to repeat my serious question: waserror() and friends in user
> mode. Crazy, doable, or just plain bad idea?
>
> thanks
>
> ron
>
>


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-12 20:56                                               ` Ronald G Minnich
@ 2006-06-12 21:09                                                 ` Victor Nazarov
  2006-06-13  0:05                                                 ` Roman Shaposhnik
  1 sibling, 0 replies; 78+ messages in thread
From: Victor Nazarov @ 2006-06-12 21:09 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Ronald G Minnich wrote:

> Roman Shaposhnick wrote:
>
>> On Mon, Jun 12, 2006 at 02:16:10PM -0600, Ronald G Minnich wrote:
>>
>>> actually, to repeat my serious question: waserror() and friends in 
>>> user mode. Crazy, doable, or just plain bad idea?
>>
>>
>>
>>   Doable. But that'll give you a tool quite similar to the C++ 
>> exceptions
>>   (although without the bonus of auto-destruction). What's your plan
>>   for using this tool in libraries ?
>
>
> I have no plans :-)
>
> Just wondering.
>
> auto-destruction: does it involve plastique or just gunpowder?
>
> ron
>
auto-destruction seems to be deprecated everywhere. Garbage Collection 
needs to be
implemented in libraries. It's seems to be possible without KenC 
language modification, but
I think it's too expensive and needs massive rewrites and refactoring.
--
Victor



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-12 20:23                                             ` Roman Shaposhnick
@ 2006-06-12 20:56                                               ` Ronald G Minnich
  2006-06-12 21:09                                                 ` Victor Nazarov
  2006-06-13  0:05                                                 ` Roman Shaposhnik
  0 siblings, 2 replies; 78+ messages in thread
From: Ronald G Minnich @ 2006-06-12 20:56 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Roman Shaposhnick wrote:
> On Mon, Jun 12, 2006 at 02:16:10PM -0600, Ronald G Minnich wrote:
> 
>>actually, to repeat my serious question: waserror() and friends in user 
>>mode. Crazy, doable, or just plain bad idea?
> 
> 
>   Doable. But that'll give you a tool quite similar to the C++ exceptions
>   (although without the bonus of auto-destruction). What's your plan
>   for using this tool in libraries ?

I have no plans :-)

Just wondering.

auto-destruction: does it involve plastique or just gunpowder?

ron


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-12 20:16                                           ` Ronald G Minnich
@ 2006-06-12 20:23                                             ` Roman Shaposhnick
  2006-06-12 20:56                                               ` Ronald G Minnich
  2006-06-12 21:15                                             ` Francisco J Ballesteros
  1 sibling, 1 reply; 78+ messages in thread
From: Roman Shaposhnick @ 2006-06-12 20:23 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Jun 12, 2006 at 02:16:10PM -0600, Ronald G Minnich wrote:
> actually, to repeat my serious question: waserror() and friends in user 
> mode. Crazy, doable, or just plain bad idea?

  Doable. But that'll give you a tool quite similar to the C++ exceptions
  (although without the bonus of auto-destruction). What's your plan
  for using this tool in libraries ?

Thanks,
Roman.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-12  3:45                                         ` Paul Lalonde
@ 2006-06-12 20:16                                           ` Ronald G Minnich
  2006-06-12 20:23                                             ` Roman Shaposhnick
  2006-06-12 21:15                                             ` Francisco J Ballesteros
  0 siblings, 2 replies; 78+ messages in thread
From: Ronald G Minnich @ 2006-06-12 20:16 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

actually, to repeat my serious question: waserror() and friends in user 
mode. Crazy, doable, or just plain bad idea?

thanks

ron


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-11 23:26                                       ` geoff
@ 2006-06-12  3:45                                         ` Paul Lalonde
  2006-06-12 20:16                                           ` Ronald G Minnich
  0 siblings, 1 reply; 78+ messages in thread
From: Paul Lalonde @ 2006-06-12  3:45 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11-Jun-06, at 4:26 PM, geoff@collyer.net wrote:
>
> On the other hand, resource exhaustion nowadays can generally be
> prevented at little cost: add a few gigabytes of RAM, add a few 500GB
> disks for swap or file storage.

Generally, but not in many cases.  I do entertainment software.   
Sony, Microsoft, or Nintendo set the machine spec, and that's what  
you have to work with.  When working on the new breed of highly  
dynamic environments it's becoming difficult to fall back to the  
traditional solution of a fixed-size asset base.  Memory shortages in  
particular are acute and need to be dealt with on an excessively  
regular basis, and usually in ways that affect the fundamental  
architecture of the product.  Sadly, methodologies for exception  
condition handling are just poor and obscenely difficult to test.

Paul

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFEjONfpJeHo/Fbu1wRAnq2AKDNodI//nqxZ9xfCorLk2HAc/OnhgCgmKHP
D6XjGQnPzrjdbph+ki1LXu0=
=Xi4D
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10 23:13                             ` Ronald G Minnich
  2006-06-11  0:44                               ` quanstro
  2006-06-11  5:42                               ` Russ Cox
@ 2006-06-12  1:03                               ` Roman Shaposhnik
  2 siblings, 0 replies; 78+ messages in thread
From: Roman Shaposhnik @ 2006-06-12  1:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, 2006-06-10 at 17:13 -0600, Ronald G Minnich wrote:
> A fork fails.
> 
> Which would you rather have the kernel do: panic? Lock up (as in the old 
> days)? Or handle it gracefully.

I think that the original point (which was to demonstrate that
error handling in libraries is not just a gross oversight that
obviously conflicts with a claim of Plan9 code being clean and
otherwise superior to the average source code out there) has
been clearly demonstrated by the sheer length of the discussion
that followed. That was my goal in this discussion.

Now, I think that personally I somewhat overstated my case. Clearly,
the real question is not to chose between the death and recovery,
but rather how to develop a comprehensive strategy based on these
two. 

Being a compiler guy the most common case of error handling I see
is, of course, an invalid input given to the translator. Something
that clearly requires a recovery strategy. Yet, I'm always frustrated
at an endless stream of induced error messages when I see one. Something
that makes the code for dealing with correct recovery in C++ grow longer
every day. Longer and untested :-(

Exceptions/return values/waserror() these are all just mechanisms for
controlling the flow of your application -- they can get you to a
particular place in your code, but deciding what to do once you're
there is when it gets complicated.  

Thanks,
Roman.

P.S. It is also interesting to note that a chapter on error and
resource handling in "Practice of Programming" is the least prescriptive
of them all, basically stating that authors don't have a clear
answer.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-11 12:00                                     ` lucio
  2006-06-11 22:59                                       ` quanstro
@ 2006-06-11 23:26                                       ` geoff
  2006-06-12  3:45                                         ` Paul Lalonde
  1 sibling, 1 reply; 78+ messages in thread
From: geoff @ 2006-06-11 23:26 UTC (permalink / raw)
  To: 9fans

I think the argument behind erik's argument is that some error
scenarios are difficult to simulate, and fairly improbable.  Even
given conscientious programmers who code carefully and test reasonably
thoroughly, will testing simulate a failure of every possible attempt
to allocate memory to see what happens?  How about simulating I/O
errors on every possible read or write?  More generally, will it
simulate an error return from every possible library and system call?
all possible error returns from every possible call?

To pick a real example of recovery, in C News, we made sure that
incoming messages would not be dropped even if a filesystem filled at
an awkward time.  The code might exit prematurely (since there's no
point in banging one's [disk] head against a full filesystem usually),
but that would just result in the batch in-process being preserved and
input processing being stalled until the file system got some free
space.  Eventually we started checking free space before processing a
batch too.

On the other hand, resource exhaustion nowadays can generally be
prevented at little cost: add a few gigabytes of RAM, add a few 500GB
disks for swap or file storage.  woot.com was selling 250GB disks
(admittedly from Western Digital, which I'm not a fan of) for $50 each
the other night.  malloc failure was a genuine possibility on the
PDP-11, with a 16-bit address space, and full file systems happened,
when 300MB disks and drives cost thousands of dollars (I no longer
remember prices; they were costly enough that the 11/70 I ran had two
300MB disks and that seemed like an enormous amount of space [it still
seems like a lot to me]).  These days I would imagine that primarily
embedded systems (including games) and supercomputer-like applications
have problems in practice with resource exhaustion.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-11 12:00                                     ` lucio
@ 2006-06-11 22:59                                       ` quanstro
  2006-06-11 23:26                                       ` geoff
  1 sibling, 0 replies; 78+ messages in thread
From: quanstro @ 2006-06-11 22:59 UTC (permalink / raw)
  To: 9fans

On Sun Jun 11 07:24:25 CDT 2006, lucio@proxima.alt.za wrote:
> > never?  what if malloc's datastructures are corrupt?
> 
> As long as the stack isn't corrupt, it _can_ still return to the
> caller.  The argument is really whether the caller can be trusted to
> take the correct (non)recovery action.  

i don't think this is about "trusting" somebody do do the right thing
for recovery.

if malloc's datastructures are corrupt, then you can assume that memory
is corrupt.  somebody's fandangoed on core.  since you don't have any
valid data, what can you accomplish except call sysfatal. (which might not
work.)

the most incidious bit about trying to recover when you're really and 
truly hosed is that you just make debugging harder.

btw.  glibc will abort if you corrupt the heap or double-free.

> But you can't take away
> Lucho's options because another 99 callers are too lazy.  Your view,
> if I read you correctly, is that Lucho also can't be trusted, because
> he won't test his recovery code, but that is not an acceptable
> assumption.

i think you're reading me wrong.  it's not about trust.  it's about how
software really gets written.  i'm as guilty as the next guy in writing
fancy recovery code that i never try out.

i've been bitten in production at least twice by botched recovery.

> 
> Yes, we do need a middle ground and redefining _sysfatal() is one
> option, but encouraging good programming practice, by example as well
> as by instruction, would be preferable to unpredictable behaviour
> under error conditions.

yes.

> To me, the greatest loss in this age of complexity, is the determinism
> of early day computing.  Anything that increases determinism at the
> application level is to be encouraged, not discouraged.

this is exactly why i think that sysfatal can be good if you really can't continue
or continuing is very likely to mask an error.

if you fail to get 20 bytes from malloc, for instance, it's likely you have an
huge leak in your program that needs to be fixed.

- erik

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10 23:02                 ` Ronald G Minnich
  2006-06-11  0:12                   ` quanstro
@ 2006-06-11 22:31                   ` David Leimbach
  1 sibling, 0 replies; 78+ messages in thread
From: David Leimbach @ 2006-06-11 22:31 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 6/10/06, Ronald G Minnich <rminnich@lanl.gov> wrote:
> David Leimbach wrote:
> > On 6/9/06, Ronald G Minnich <rminnich@lanl.gov> wrote:
> >
> >> Latchesar Ionkov wrote:
> >> > Another example is using emalloc in libraries. I agree that it is  much
> >> > simpler to just give up when there is not enough memory (which  is also
> >> > not very likely case), but is that how the code is supposed  to be
> >> > written if you are not doing research?
> >>
> >> yes, that is a problem with a lot of code. "Just bail on first error" --
> >> we've had to stop using emalloc here because that is very unrealistic
> >> for production support.
> >>
> >> ron
> >>
> >
> > Well I wonder what people typically do when they can't malloc anymore
> > memory but need more... A reasonable thing to do is to die I'd think.
>
> example.
>
> xcpu server is running a couple hundred processes for testing. It is
> asked to do one more. It can't allocate something.
>
> Just dying at that point is really a bad idea, and we did find that some
> of the libraries we were using would in fact do that, without coming
> back to xcpu server main code with an error. That's not good behaviour
> for xcpu server. It should gracefully return 'no more room' and keep
> managing things; in some few cases, the library did not give us that
> option.

I agree with you wholeheartedly.  However there are times when you
*need* to malloc more memory but can't, based on how the program in
question is supposed to operate.

Having implemented parts of MPI's Dynamic Process Management
specification before, I totally understand where you're going.

I do find that, however, when I need to malloc more memory, not
getting what I ask for from malloc is not an acceptable run of the
program.  This happens to me more often than the recoverable cases in
fact, so I often call "abort" when malloc comes back NULL (in testing
anyway... core dumps are handy).

>
> There are lots of cases like this, many of them in the kernel even;
> would we want the kernel to just toss chunks at these times?

Well in the kernel I agree it's a totally different ball game.  There
are blocking and nonblocking memory allocators (in linux anyway).


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-11 10:09                                   ` quanstro
@ 2006-06-11 12:00                                     ` lucio
  2006-06-11 22:59                                       ` quanstro
  2006-06-11 23:26                                       ` geoff
  0 siblings, 2 replies; 78+ messages in thread
From: lucio @ 2006-06-11 12:00 UTC (permalink / raw)
  To: 9fans

> never?  what if malloc's datastructures are corrupt?

As long as the stack isn't corrupt, it _can_ still return to the
caller.  The argument is really whether the caller can be trusted to
take the correct (non)recovery action.  But you can't take away
Lucho's options because another 99 callers are too lazy.  Your view,
if I read you correctly, is that Lucho also can't be trusted, because
he won't test his recovery code, but that is not an acceptable
assumption.

Yes, we do need a middle ground and redefining _sysfatal() is one
option, but encouraging good programming practice, by example as well
as by instruction, would be preferable to unpredictable behaviour
under error conditions.

To me, the greatest loss in this age of complexity, is the determinism
of early day computing.  Anything that increases determinism at the
application level is to be encouraged, not discouraged.

++L

PS: What is true is that the Plan 9 developers did not have the
resources to do everything perfectly and picked sensible places where
this shortcoming would affect the result in the least destructive
fashion.  What is also true is that as a community we ought to be able
to improve on this, rather than concentrate on wild goose chases.
Which brings us back, full circle, to the desirability of GCC/G++ for
Plan 9.  Thanks, Latchesar :-)

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-11  5:08                                 ` lucio
@ 2006-06-11 10:09                                   ` quanstro
  2006-06-11 12:00                                     ` lucio
  0 siblings, 1 reply; 78+ messages in thread
From: quanstro @ 2006-06-11 10:09 UTC (permalink / raw)
  To: 9fans

never?  what if malloc's datastructures are corrupt?

- erik

On Sun Jun 11 00:24:44 CDT 2006, lucio@proxima.alt.za wrote:
> Lucho's right in that the library should _never_ terminate on error,
> he is wrong in that this can only apply in an ideal world where all
> returns from library functions are checked.  But we may be compounding
> his error by not auditing /sys/src and adding error checking where it
> ought to be present and removing Plan 9's contribution to this state
> of affairs.  Then Plan 9 gets to hold a bigger chunk of the high moral
> ground and also stops encouraging this approach.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-11  5:42                               ` Russ Cox
@ 2006-06-11 10:08                                 ` quanstro
  0 siblings, 0 replies; 78+ messages in thread
From: quanstro @ 2006-06-11 10:08 UTC (permalink / raw)
  To: 9fans

On Sat Jun 10 23:41:04 CDT 2006, rsc@swtch.com wrote:
> When I ported dns I was very impressed that
> it correctly handled rfork failing (because p9p
> doesn't do shared-memory rfork).  

you've got to admit this was a fortuitous accident.  and
dns doesn't need rfork do do one request -- it needs rfork
to handle a second request concurrently.  dns didn't really
work that well with the broken rfork.

> You can argue
> about cutting corners all you want, but it's a 
> slippery slope.

i don't think it's about cutting corners. 

what do you do when
- a library can't allocate 10 bytes? or 20?
- malloc detects a double-free or somebody has stepped
on it's datastructures.

isn't this much different than
- a library can allocate 64k or 1m

trying to recover from some errors is worse than dying.
dying gives you the oppertunity to fix the problem.
not dying can cover real bugs.

> 
> You end up with more robust software when you
> think about what to do in the error cases instead
> of just assuming they won't happen.

if you're going to put that code in, it needs to be tested.
in commercial software, my worst experiences have been
with recovery code that didn't work.

- erik

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10 23:13                             ` Ronald G Minnich
  2006-06-11  0:44                               ` quanstro
@ 2006-06-11  5:42                               ` Russ Cox
  2006-06-11 10:08                                 ` quanstro
  2006-06-12  1:03                               ` Roman Shaposhnik
  2 siblings, 1 reply; 78+ messages in thread
From: Russ Cox @ 2006-06-11  5:42 UTC (permalink / raw)
  To: 9fans

When I ported dns I was very impressed that
it correctly handled rfork failing (because p9p
doesn't do shared-memory rfork).  You can argue
about cutting corners all you want, but it's a 
slippery slope.  

You end up with more robust software when you
think about what to do in the error cases instead
of just assuming they won't happen.

Russ

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-11  0:44                               ` quanstro
@ 2006-06-11  5:08                                 ` lucio
  2006-06-11 10:09                                   ` quanstro
  0 siblings, 1 reply; 78+ messages in thread
From: lucio @ 2006-06-11  5:08 UTC (permalink / raw)
  To: 9fans

> mostly agreed.  i'd go for it's /often/ better to quit.  and the downside
> of the library not quitting is now you get to add a bunch of error
> code to /sys/src/cmd.  

Well, you _could_ modify /sys/src/cmd so that it checks all library
return codes and _sysfatal()s as and when required, as a preliminary.
It would be a small (relatively speaking) alteration that can be
improved later.

Lucho's right in that the library should _never_ terminate on error,
he is wrong in that this can only apply in an ideal world where all
returns from library functions are checked.  But we may be compounding
his error by not auditing /sys/src and adding error checking where it
ought to be present and removing Plan 9's contribution to this state
of affairs.  Then Plan 9 gets to hold a bigger chunk of the high moral
ground and also stops encouraging this approach.

Is it even conceivable (haven't had my coffee yet) that we could have
a library that reports all instances where user code _does_ not check
for error returns?

++L

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10 23:06                           ` Ronald G Minnich
  2006-06-10 23:15                             ` geoff
@ 2006-06-11  2:58                             ` jmk
  1 sibling, 0 replies; 78+ messages in thread
From: jmk @ 2006-06-11  2:58 UTC (permalink / raw)
  To: 9fans

there's nothing inviolate about fixing the library
to do what you want. for example, in this case, the default
sysfatal behaviour could stay the same, but it would
return the value returned by _sysfatal, say a string.
then we look at all the places it's used and see if
it needs any tweaking.

or something like that. if you can agree on some better
implementation that satisfies everyone then that's what
we do. routines like sysfatal end up in the library as
someone thought they were common idioms in a bunch of
existing programmes. sometimes the idiom needs refinement
after it's been seen in place.

--jim

On Sat Jun 10 19:07:50 EDT 2006, rminnich@lanl.gov wrote:
> quanstro@quanstro.net wrote:
> 
> > sure you can.  sysfatal calls _sysfatal to do the deed.  redefine that to call your
> > fancy cleanup routine and you're golden.
> 
> wrong approach. _sysfatal has no idea what's going on. The code that 
> called the library call has every idea what's going on.
> 
> 
> 
> ron

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-11  0:12                   ` quanstro
@ 2006-06-11  2:20                     ` Ronald G Minnich
  0 siblings, 0 replies; 78+ messages in thread
From: Ronald G Minnich @ 2006-06-11  2:20 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

quanstro@quanstro.net wrote:
> i think we mostly agree.  i'm a little sick of trying to read code with
> gobs of recovery goo that may or may not ever get executed.
> who knows?
> 


Maybe we just need waserror() in user mode.

Then, you want to recover, you at least get the chance.

Or is waserror() there in the library and I never noticed?

ron


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10 23:13                             ` Ronald G Minnich
@ 2006-06-11  0:44                               ` quanstro
  2006-06-11  5:08                                 ` lucio
  2006-06-11  5:42                               ` Russ Cox
  2006-06-12  1:03                               ` Roman Shaposhnik
  2 siblings, 1 reply; 78+ messages in thread
From: quanstro @ 2006-06-11  0:44 UTC (permalink / raw)
  To: 9fans

On Sat Jun 10 18:16:52 CDT 2006, rminnich@lanl.gov wrote:
> A fork fails.
> 
> Which would you rather have the kernel do: panic? Lock up (as in the old 
> days)? Or handle it gracefully.

i was talking about applications.  and usually applications can deal with fork
failure.  maybe counterproductively.

> Your fossil fills up completely. Which would you rather have it do: blow 
> up on boot, so you can no longer boot your system (current behavior); or 
> manage disk space so that, even in the worst of all cases, you can still 
> get booted enough to try to clean up (most unix file systems since 1980 
> or so).

i don't know.  it's hard to say without looking at it.  it's hard for me to just say
"ya, it should do that" because linux is an example of how that can go.  
you should be able to boot from cd, though?  

would this work:  lets say i have everything i need to boot on a fossilfs main
and stuff i can boot without (and could much more easily fill) on fossilfs 
otherstuff.  will fossil continue with the main fs and ignore the full one?

> 
> Nobody's arguing for "best enemy of good". All we're trying to say is, 
> that there are times a library should not make the decision to 
> sysfatal() on you.  And, there are many real world examples of resource 
> exhaustion where continuing to run is better than dying. 

agreed.

> It's not always 
> better to run, and it may not be better to run in most cases, but 
> sometimes it is really better not to have the library pre-emptively 
> decide to exit; in fact, you want a reasonable return value.

mostly agreed.  i'd go for it's /often/ better to quit.  and the downside
of the library not quitting is now you get to add a bunch of error
code to /sys/src/cmd.  

> 
> I do believe that the shell can handle the case of some types of 
> resource exhaustion:
> 
> 	switch(forkid = fork()){
> 	case -1:
> 		Xerror("try again");
> 		break;
> 
> 
> would you want rc to exit in this case?

would it really matter?  if everybody tried their hardest to continue
in the face of errors, you'd never be able to fork a process and fix anything.

damned if you do, damned if you don't.

- erik

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10 23:02                 ` Ronald G Minnich
@ 2006-06-11  0:12                   ` quanstro
  2006-06-11  2:20                     ` Ronald G Minnich
  2006-06-11 22:31                   ` David Leimbach
  1 sibling, 1 reply; 78+ messages in thread
From: quanstro @ 2006-06-11  0:12 UTC (permalink / raw)
  To: 9fans

i think we mostly agree.  i'm a little sick of trying to read code with
gobs of recovery goo that may or may not ever get executed.
who knows?

on the other hand, you're right.  if there's a way to kill the kernel by 
running it out of memory, then that should be fixed, if possible.
no questions. 

it would be better for xcpu to start lining up processes for the 
firing squad until sufficient memory is available.

- erik

On Sat Jun 10 18:04:07 CDT 2006, rminnich@lanl.gov wrote:
> 
> xcpu server is running a couple hundred processes for testing. It is 
> asked to do one more. It can't allocate something.
> 
> Just dying at that point is really a bad idea, and we did find that some 
> of the libraries we were using would in fact do that, without coming 
> back to xcpu server main code with an error. That's not good behaviour 
> for xcpu server. It should gracefully return 'no more room' and keep 
> managing things; in some few cases, the library did not give us that 
> option.
> 
> There are lots of cases like this, many of them in the kernel even; 
> would we want the kernel to just toss chunks at these times?
> 
> I agree -- sometimes, death is the only option. But not always.
> 
> ron


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10 23:04                       ` Ronald G Minnich
@ 2006-06-11  0:05                         ` quanstro
  0 siblings, 0 replies; 78+ messages in thread
From: quanstro @ 2006-06-11  0:05 UTC (permalink / raw)
  To: 9fans

you must have been running better software than i!  usually something
gets corrupted when a resource runs out.  and often, when you come
close to running out of memory on unix, performance sucks so bad you
wish the machine would just fall over.

- erik

On Sat Jun 10 18:06:01 CDT 2006, rminnich@lanl.gov wrote:
> quanstro@quanstro.net wrote:
> 
> > i'm skeptical that this is a real-world problem.  i've not run out of memory
> > without hosing the system to the point where it needed to be rebooted.
> 
> we've definitely had different experiences :-)
> 
> ron


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10 23:05                           ` Ronald G Minnich
@ 2006-06-11  0:00                             ` quanstro
  0 siblings, 0 replies; 78+ messages in thread
From: quanstro @ 2006-06-11  0:00 UTC (permalink / raw)
  To: 9fans

the converse is that every dumb little app now has to handle
a lot of error conditions.  this is also not robust.

- erik

On Sat Jun 10 18:06:50 CDT 2006, rminnich@lanl.gov wrote:
> Latchesar Ionkov wrote:
> > You are concentrating too much on the particular example I gave (emalloc)
> > and not on the issue of exiting from a library. I don't think it is a
> > library job to decide whether the application should die or not.
> 
> that is a very key point.
> 
> ron


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10 23:06                           ` Ronald G Minnich
@ 2006-06-10 23:15                             ` geoff
  2006-06-11  2:58                             ` jmk
  1 sibling, 0 replies; 78+ messages in thread
From: geoff @ 2006-06-10 23:15 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 156 bytes --]

I think the suggestion is to have the caller assign the address of an
alternate function to _sysfatal, and that function can know what the
caller knows.

[-- Attachment #2: Type: message/rfc822, Size: 3135 bytes --]

From: Ronald G Minnich <rminnich@lanl.gov>
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] quantity vs. quality
Date: Sat, 10 Jun 2006 17:06:52 -0600
Message-ID: <448B508C.1@lanl.gov>

quanstro@quanstro.net wrote:

> sure you can.  sysfatal calls _sysfatal to do the deed.  redefine that to call your
> fancy cleanup routine and you're golden.

wrong approach. _sysfatal has no idea what's going on. The code that 
called the library call has every idea what's going on.

ron

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  0:45                           ` Roman Shaposhnick
  2006-06-10  3:01                             ` Latchesar Ionkov
@ 2006-06-10 23:13                             ` Ronald G Minnich
  2006-06-11  0:44                               ` quanstro
                                                 ` (2 more replies)
  1 sibling, 3 replies; 78+ messages in thread
From: Ronald G Minnich @ 2006-06-10 23:13 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

A fork fails.

Which would you rather have the kernel do: panic? Lock up (as in the old 
days)? Or handle it gracefully.

Your fossil fills up completely. Which would you rather have it do: blow 
up on boot, so you can no longer boot your system (current behavior); or 
manage disk space so that, even in the worst of all cases, you can still 
get booted enough to try to clean up (most unix file systems since 1980 
or so).

Nobody's arguing for "best enemy of good". All we're trying to say is, 
that there are times a library should not make the decision to 
sysfatal() on you. And, there are many real world examples of resource 
exhaustion where continuing to run is better than dying. It's not always 
better to run, and it may not be better to run in most cases, but 
sometimes it is really better not to have the library pre-emptively 
decide to exit; in fact, you want a reasonable return value.

I do believe that the shell can handle the case of some types of 
resource exhaustion:

	switch(forkid = fork()){
	case -1:
		Xerror("try again");
		break;

would you want rc to exit in this case?

ron

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  0:23                         ` quanstro
  2006-06-10  0:41                           ` Paul Lalonde
  2006-06-10  2:51                           ` Latchesar Ionkov
@ 2006-06-10 23:06                           ` Ronald G Minnich
  2006-06-10 23:15                             ` geoff
  2006-06-11  2:58                             ` jmk
  2 siblings, 2 replies; 78+ messages in thread
From: Ronald G Minnich @ 2006-06-10 23:06 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

quanstro@quanstro.net wrote:

> sure you can.  sysfatal calls _sysfatal to do the deed.  redefine that to call your
> fancy cleanup routine and you're golden.

wrong approach. _sysfatal has no idea what's going on. The code that 
called the library call has every idea what's going on.



ron


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  2:31                         ` Latchesar Ionkov
  2006-06-10  0:45                           ` Roman Shaposhnick
@ 2006-06-10 23:05                           ` Ronald G Minnich
  2006-06-11  0:00                             ` quanstro
  1 sibling, 1 reply; 78+ messages in thread
From: Ronald G Minnich @ 2006-06-10 23:05 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Latchesar Ionkov wrote:
> You are concentrating too much on the particular example I gave (emalloc)
> and not on the issue of exiting from a library. I don't think it is a
> library job to decide whether the application should die or not.

that is a very key point.

ron


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 23:51                     ` quanstro
                                         ` (2 preceding siblings ...)
  2006-06-10  2:27                       ` Latchesar Ionkov
@ 2006-06-10 23:04                       ` Ronald G Minnich
  2006-06-11  0:05                         ` quanstro
  3 siblings, 1 reply; 78+ messages in thread
From: Ronald G Minnich @ 2006-06-10 23:04 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

quanstro@quanstro.net wrote:

> i'm skeptical that this is a real-world problem.  i've not run out of memory
> without hosing the system to the point where it needed to be rebooted.

we've definitely had different experiences :-)

ron


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 23:46                 ` Paul Lalonde
@ 2006-06-10 23:03                   ` Ronald G Minnich
  0 siblings, 0 replies; 78+ messages in thread
From: Ronald G Minnich @ 2006-06-10 23:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Paul Lalonde wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Isn't the traditional solution to pre-allocate an emergency pad, and  
> when malloc fails the emergency "I need more memory" handler gets to  
> use that pad, and then propagates the condition upwards until some  
> caller can handle the case?

it's very common, and you can probably patent it anyway.

ron


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 23:29                 ` quanstro
  2006-06-10  1:57                   ` Latchesar Ionkov
@ 2006-06-10 23:03                   ` Ronald G Minnich
  1 sibling, 0 replies; 78+ messages in thread
From: Ronald G Minnich @ 2006-06-10 23:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

quanstro@quanstro.net wrote:

> i have never seen malloc failure on a production system where
> recovery was possible; i have seen some memory corruption that
> led to malloc failure, but there's no recovering from that.

we've seen different things, then.

ron


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 23:38               ` David Leimbach
  2006-06-09 23:45                 ` andrey mirtchovski
  2006-06-09 23:46                 ` Paul Lalonde
@ 2006-06-10 23:02                 ` Ronald G Minnich
  2006-06-11  0:12                   ` quanstro
  2006-06-11 22:31                   ` David Leimbach
  2 siblings, 2 replies; 78+ messages in thread
From: Ronald G Minnich @ 2006-06-10 23:02 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

David Leimbach wrote:
> On 6/9/06, Ronald G Minnich <rminnich@lanl.gov> wrote:
> 
>> Latchesar Ionkov wrote:
>> > Another example is using emalloc in libraries. I agree that it is  much
>> > simpler to just give up when there is not enough memory (which  is also
>> > not very likely case), but is that how the code is supposed  to be
>> > written if you are not doing research?
>>
>> yes, that is a problem with a lot of code. "Just bail on first error" --
>> we've had to stop using emalloc here because that is very unrealistic
>> for production support.
>>
>> ron
>>
> 
> Well I wonder what people typically do when they can't malloc anymore
> memory but need more... A reasonable thing to do is to die I'd think.

example.

xcpu server is running a couple hundred processes for testing. It is 
asked to do one more. It can't allocate something.

Just dying at that point is really a bad idea, and we did find that some 
of the libraries we were using would in fact do that, without coming 
back to xcpu server main code with an error. That's not good behaviour 
for xcpu server. It should gracefully return 'no more room' and keep 
managing things; in some few cases, the library did not give us that 
option.

There are lots of cases like this, many of them in the kernel even; 
would we want the kernel to just toss chunks at these times?

I agree -- sometimes, death is the only option. But not always.

ron

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  1:15                               ` Paul Lalonde
@ 2006-06-10  5:19                                 ` Bruce Ellis
  0 siblings, 0 replies; 78+ messages in thread
From: Bruce Ellis @ 2006-06-10  5:19 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

i think there is at least 20 years of lit on "how to cope
with memory exhaustion".  in limbo almost any chunk
of code can exhaust the heap, throws an exception and
the errant dudes get cleaned up.  i've only seen it in
torture tests but no crashes ... just culprits.

brucee

On 6/10/06, Paul Lalonde <plalonde@telus.net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> On 9-Jun-06, at 5:59 PM, quanstro@quanstro.net wrote:
> >
> > however, i have yet to see a small allocation fail without the
> > system being pretty broken.  and my conclusion is that preemtive
> > strikes against failures that should not happen on a sane system
> > may cause more harm than good.
>
> I have seen allocations that are *supposed* to be small fail - some
> of the most precious results are in the middle of runs of code that's
> far from what you or I would call production-ready.  There's plenty
> of cases of relatively fragile software doing smart things with
> partial results when a programmer error occurs; when partial runs
> have value you save the result...
>
> Paul
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (Darwin)
>
> iD8DBQFEih08pJeHo/Fbu1wRAkBsAJ9Mj7i0aJnrnMDhJWI/mtII0ScPRgCgoDM+
> mLfgd72HPUb5NbMb6LH59m8=
> =6RkZ
> -----END PGP SIGNATURE-----
>


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  0:45                             ` quanstro
@ 2006-06-10  3:10                               ` Latchesar Ionkov
  2006-06-10  0:53                                 ` quanstro
  0 siblings, 1 reply; 78+ messages in thread
From: Latchesar Ionkov @ 2006-06-10  3:10 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jun 09, 2006 at 07:45:40PM -0500, quanstro@quanstro.net said:
> i don't know if it's good code or bad code in general. i was pointing out that it is possible to
> catch a call to sysfatal and do something else, since you seemed to indicate that was
> not possible.

How can you find from _sysfatal what exactly caused the sysfatal and if you
in recoverable state or not?

	Lucho


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  0:45                           ` Roman Shaposhnick
@ 2006-06-10  3:01                             ` Latchesar Ionkov
  2006-06-10  0:52                               ` quanstro
  2006-06-10  1:04                               ` Roman Shaposhnick
  2006-06-10 23:13                             ` Ronald G Minnich
  1 sibling, 2 replies; 78+ messages in thread
From: Latchesar Ionkov @ 2006-06-10  3:01 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jun 09, 2006 at 05:45:29PM -0700, Roman Shaposhnick said:
>   Now, don't get me wrong -- sometimes you have to make an extra effort
>   to at least pretend that it is solvable. Especially when you are in a
>   business of building commercial software. I can appreciate it. But lets
>   move our discussion to a practical level -- could you explain what 
>   sort of "alternative" control flow you're after when something bad
>   happens inside the library. What kind of an ideal world solution
>   would you like to see as an application developer ?

Umm, like returning an error? :)

Thanks,
	Lucho


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  0:23                         ` quanstro
  2006-06-10  0:41                           ` Paul Lalonde
@ 2006-06-10  2:51                           ` Latchesar Ionkov
  2006-06-10  0:45                             ` quanstro
  2006-06-10 23:06                           ` Ronald G Minnich
  2 siblings, 1 reply; 78+ messages in thread
From: Latchesar Ionkov @ 2006-06-10  2:51 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jun 09, 2006 at 07:23:11PM -0500, quanstro@quanstro.net said:
> On Fri Jun  9 19:18:09 CDT 2006, lucho@gmx.net wrote:
> > On Fri, Jun 09, 2006 at 06:51:00PM -0500, quanstro@quanstro.net said:
> > > On Fri Jun  9 18:48:44 CDT 2006, lucho@gmx.net wrote:
> > > > > 
> > > > > what is the senerio you're thinking of where malloc could fail
> > > > > and you can recover?
> > > > 
> > > > 
> > > > Let's say you have a fossil like file server and you cannot malloc memory to
> > > > process new requests. Do you want to flush the data buffers back to the disk
> > > > before you die, you you want to die in some library without any flushing.
> > > 
> > > have you had a problem with fossil failing in this way?
> > 
> > IIRC fossil is not using libraries that call sysfatal. I said fossil like
> > file server, not fossil. Let's say that I want to write a fossil-like file
> > server. I cannot use lib9p, because it will call sysfatal and not give me
> > chance to flush the buffers to the disk.
> > 
> 
> sure you can.  sysfatal calls _sysfatal to do the deed.  redefine that to call your
> fancy cleanup routine and you're golden.

And you think that's an example of writing good code? I thought we are
talking about a system with clean design that you don't need to use kludges in :)

	Lucho



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  0:10                       ` Roman Shaposhnick
@ 2006-06-10  2:31                         ` Latchesar Ionkov
  2006-06-10  0:45                           ` Roman Shaposhnick
  2006-06-10 23:05                           ` Ronald G Minnich
  0 siblings, 2 replies; 78+ messages in thread
From: Latchesar Ionkov @ 2006-06-10  2:31 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

You are concentrating too much on the particular example I gave (emalloc)
and not on the issue of exiting from a library. I don't think it is a
library job to decide whether the application should die or not.

On Fri, Jun 09, 2006 at 05:10:24PM -0700, Roman Shaposhnick said:
> On Fri, Jun 09, 2006 at 06:51:00PM -0500, quanstro@quanstro.net wrote:
> > > Or you have a file server that keeps some non-file server related state in
> > > memory. The unability to serve any more requests is fine as long as it can
> > > start serving them at some point later when there is more memory. The dying
> > > is not acceptible because the data kept in the memory is important.
> >  
> > i'm skeptical that this is a real-world problem.  i've not run out of memory
> > without hosing the system to the point where it needed to be rebooted.
> > 
> > worse, all these failure modes need to be tested if this is production code.
> 
>   I believe it is to be a crucial issue here. True, what Latchesar is after
>   is a fine goal, its just that I'm yet to see a production system where
>   it can save you from a bigger trouble. May be I've been exceptionally
>   unlucky, but that's a reality -- if you truly run out of memory you're
>   screwed. Your escape strategies don't work, worse yet they are *very*
>   likely to fail in the manner that you don't except in the layer that
>   you have no knowledge about. 
> 
>   Consider this -- what's worse: a fossil server that died and lost some
>   of the requests or a server that tried to recover and committed random
>   junk ?

How is definitely losing data better than the possibility of writing random
junk?

Thanks,
	Lucho


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 23:51                     ` quanstro
  2006-06-10  0:10                       ` Roman Shaposhnick
  2006-06-10  0:24                       ` andrey mirtchovski
@ 2006-06-10  2:27                       ` Latchesar Ionkov
  2006-06-10  0:23                         ` quanstro
  2006-06-10 23:04                       ` Ronald G Minnich
  3 siblings, 1 reply; 78+ messages in thread
From: Latchesar Ionkov @ 2006-06-10  2:27 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jun 09, 2006 at 06:51:00PM -0500, quanstro@quanstro.net said:
> On Fri Jun  9 18:48:44 CDT 2006, lucho@gmx.net wrote:
> > > 
> > > what is the senerio you're thinking of where malloc could fail
> > > and you can recover?
> > 
> > 
> > Let's say you have a fossil like file server and you cannot malloc memory to
> > process new requests. Do you want to flush the data buffers back to the disk
> > before you die, you you want to die in some library without any flushing.
> 
> have you had a problem with fossil failing in this way?

IIRC fossil is not using libraries that call sysfatal. I said fossil like
file server, not fossil. Let's say that I want to write a fossil-like file
server. I cannot use lib9p, because it will call sysfatal and not give me
chance to flush the buffers to the disk.

> i'm not saying you allocate memory like crazy and don't worry about running
> out of memory.  i'm sure that fossil is very careful about allocating memory
> for requests.
> > 
> > Or you have a file server that keeps some non-file server related state in
> > memory. The unability to serve any more requests is fine as long as it can
> > start serving them at some point later when there is more memory. The dying
> > is not acceptible because the data kept in the memory is important.
>  
> i'm skeptical that this is a real-world problem.  i've not run out of memory
> without hosing the system to the point where it needed to be rebooted.

This is a real-world problem that we are currently having. 

Thanks,
	Lucho


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 23:29                 ` quanstro
@ 2006-06-10  1:57                   ` Latchesar Ionkov
  2006-06-09 23:51                     ` quanstro
  2006-06-10 23:03                   ` Ronald G Minnich
  1 sibling, 1 reply; 78+ messages in thread
From: Latchesar Ionkov @ 2006-06-10  1:57 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jun 09, 2006 at 06:29:37PM -0500, quanstro@quanstro.net said:
> if you take that view, then you can't use paging because the
> kernel might kill your process off on overcommit.

Or you should disable the overcommit (if possible).

> i'm not sure i understand how malloc failure can be a common
> enough event that it needs to be handled.

It is not only malloc failure, grep the plan9 libraries for sysfatal.

> i have never seen malloc failure on a production system where
> recovery was possible; i have seen some memory corruption that
> led to malloc failure, but there's no recovering from that.
> 
> what is the senerio you're thinking of where malloc could fail
> and you can recover?


Let's say you have a fossil like file server and you cannot malloc memory to
process new requests. Do you want to flush the data buffers back to the disk
before you die, you you want to die in some library without any flushing.

Or you have a file server that keeps some non-file server related state in
memory. The unability to serve any more requests is fine as long as it can
start serving them at some point later when there is more memory. The dying
is not acceptible because the data kept in the memory is important.

Thanks,
	Lucho

> 
> - erik
> 
> On Fri Jun  9 18:23:09 CDT 2006, lionkov@lanl.gov wrote:
> > There are cases when you want to leave the output of the program in  
> > consistent state before you die (and you don't need extra memory to  
> > achieve that consistency). Or even if the program cannot continue its  
> > work, it would rather lurk around and wait for somebody to rescue it  
> > instead of just dying. And in cases like that you just cannot use  
> > libraries that call sysfatal.
> > 
> > Thanks,
> > 	Lucho


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  0:59                             ` quanstro
@ 2006-06-10  1:15                               ` Paul Lalonde
  2006-06-10  5:19                                 ` Bruce Ellis
  0 siblings, 1 reply; 78+ messages in thread
From: Paul Lalonde @ 2006-06-10  1:15 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On 9-Jun-06, at 5:59 PM, quanstro@quanstro.net wrote:
>
> however, i have yet to see a small allocation fail without the
> system being pretty broken.  and my conclusion is that preemtive
> strikes against failures that should not happen on a sane system
> may cause more harm than good.

I have seen allocations that are *supposed* to be small fail - some  
of the most precious results are in the middle of runs of code that's  
far from what you or I would call production-ready.  There's plenty  
of cases of relatively fragile software doing smart things with  
partial results when a programmer error occurs; when partial runs  
have value you save the result...

Paul

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFEih08pJeHo/Fbu1wRAkBsAJ9Mj7i0aJnrnMDhJWI/mtII0ScPRgCgoDM+
mLfgd72HPUb5NbMb6LH59m8=
=6RkZ
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  3:01                             ` Latchesar Ionkov
  2006-06-10  0:52                               ` quanstro
@ 2006-06-10  1:04                               ` Roman Shaposhnick
  1 sibling, 0 replies; 78+ messages in thread
From: Roman Shaposhnick @ 2006-06-10  1:04 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jun 09, 2006 at 09:01:23PM -0600, Latchesar Ionkov wrote:
> On Fri, Jun 09, 2006 at 05:45:29PM -0700, Roman Shaposhnick said:
> >   Now, don't get me wrong -- sometimes you have to make an extra effort
> >   to at least pretend that it is solvable. Especially when you are in a
> >   business of building commercial software. I can appreciate it. But lets
> >   move our discussion to a practical level -- could you explain what 
> >   sort of "alternative" control flow you're after when something bad
> >   happens inside the library. What kind of an ideal world solution
> >   would you like to see as an application developer ?
> 
> Umm, like returning an error? :)

  If its about 
    $ echo "I can't remember the syntax" > /dev/complicated/device

  sure thing -- I'm with you and I'd be the first one to argue for 
  fixing places where it reboots your system instead.

  What I'm after, however, is a: we are deep in a library, and 
  a really bad thing(tm) happens, how do we recover if we can't guarantee
  that we would be able to even get to the application layer from
  where we are. I'm seriously curious -- do you know of a good strategy ?

Thanks,
Roman.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  0:41                           ` Paul Lalonde
@ 2006-06-10  0:59                             ` quanstro
  2006-06-10  1:15                               ` Paul Lalonde
  0 siblings, 1 reply; 78+ messages in thread
From: quanstro @ 2006-06-10  0:59 UTC (permalink / raw)
  To: 9fans

sure.  if you have a known place where a known sane input will
cause allocation failure (like an image that's too big), then by all
means, handle that case.  everything you say is valid.

however, i have yet to see a small allocation fail without the
system being pretty broken.  and my conclusion is that preemtive
strikes against failures that should not happen on a sane system
may cause more harm than good.

your bit about exceptions is is right on.  sometimes just droping
core makes problem resolution much easier.

- erik

On Fri Jun  9 19:44:19 CDT 2006, plalonde@telus.net wrote:
> On 9-Jun-06, at 5:23 PM, quanstro@quanstro.net wrote:
> >
> > sure you can.  sysfatal calls _sysfatal to do the deed.  redefine  
> > that to call your
> > fancy cleanup routine and you're golden.
> 
> But it's one step worse than this.  Sometimes your fancy cleanup  
> routine can't dig itself out of your current callstack; it's better  
> to find a way to "succeed" and handle the failure higher up, thus  
> maintaining integrity.  When I have critical (well, as critical as it  
> gets when doing entertainment software) resources whose allocation  
> failure will cause grief, I try to pre-allocate before doing  
> something irreversible.  The the rest of the work is working out what  
> you're going to use to propagate that exception condition up the  
> stack, at the same time as your routine "succeeds".
>   A longump or function call doesn't let you clean up/repair your  
> state well enough precisely because calling it threw away an  
> important part of your state.  This is what all those people on about  
> C++ exceptions are mumbling about, although their implementation  
> means catching every such case in what seems like every codepath -  
> ugly fast.
> 
> Paul


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  3:10                               ` Latchesar Ionkov
@ 2006-06-10  0:53                                 ` quanstro
  0 siblings, 0 replies; 78+ messages in thread
From: quanstro @ 2006-06-10  0:53 UTC (permalink / raw)
  To: 9fans

by the error string.

On Fri Jun  9 20:03:37 CDT 2006, lucho@gmx.net wrote:
> On Fri, Jun 09, 2006 at 07:45:40PM -0500, quanstro@quanstro.net said:
> > i don't know if it's good code or bad code in general. i was pointing out that it is possible to
> > catch a call to sysfatal and do something else, since you seemed to indicate that was
> > not possible.
> 
> How can you find from _sysfatal what exactly caused the sysfatal and if you
> in recoverable state or not?
> 
> 	Lucho
> 


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  3:01                             ` Latchesar Ionkov
@ 2006-06-10  0:52                               ` quanstro
  2006-06-10  1:04                               ` Roman Shaposhnick
  1 sibling, 0 replies; 78+ messages in thread
From: quanstro @ 2006-06-10  0:52 UTC (permalink / raw)
  To: 9fans

i think you're optimising for the corner case.  do you expect every program
in /sys/src/cmd to want or need to recover from a malloc error or a sysfatal?

btw.  why would you return an error for these cases:

lib9p/srv.c:		sysfatal("no walk function, no file trees");
ibString/s_grow.c:		sysfatal("s_grow of constant string");
libString/s_putc.c:		sysfatal("can't s_putc a shared string");
libString/s_read.c:		sysfatal("can't s_read a shared string");
libString/s_read_line.c:		sysfatal("can't s_read_line a shared string");
libString/s_terminate.c:		sysfatal("can't s_terminate a shared string");
[etc.]

what could you do about this one:
libthread/note.c:		sysfatal("libthread: too many delayed notes");

- erik

On Fri Jun  9 19:51:55 CDT 2006, lucho@gmx.net wrote:
> On Fri, Jun 09, 2006 at 05:45:29PM -0700, Roman Shaposhnick said:
> >   Now, don't get me wrong -- sometimes you have to make an extra effort
> >   to at least pretend that it is solvable. Especially when you are in a
> >   business of building commercial software. I can appreciate it. But lets
> >   move our discussion to a practical level -- could you explain what 
> >   sort of "alternative" control flow you're after when something bad
> >   happens inside the library. What kind of an ideal world solution
> >   would you like to see as an application developer ?
> 
> Umm, like returning an error? :)


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  2:51                           ` Latchesar Ionkov
@ 2006-06-10  0:45                             ` quanstro
  2006-06-10  3:10                               ` Latchesar Ionkov
  0 siblings, 1 reply; 78+ messages in thread
From: quanstro @ 2006-06-10  0:45 UTC (permalink / raw)
  To: 9fans

i don't know if it's good code or bad code in general. i was pointing out that it is possible to
catch a call to sysfatal and do something else, since you seemed to indicate that was
not possible.

it may be a good strategy if you really feel that sysfatal needs to be caught for your 
particular application.  the advantage is that other, simplier applications need not 
be complicated by handling things like malloc failures.

- erik

On Fri Jun  9 19:46:28 CDT 2006, lucho@gmx.net wrote:
> > sure you can.  sysfatal calls _sysfatal to do the deed.  redefine that to call your
> > fancy cleanup routine and you're golden.
> 
> And you think that's an example of writing good code? I thought we are
> talking about a system with clean design that you don't need to use kludges in :)
> 
> 	Lucho

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  2:31                         ` Latchesar Ionkov
@ 2006-06-10  0:45                           ` Roman Shaposhnick
  2006-06-10  3:01                             ` Latchesar Ionkov
  2006-06-10 23:13                             ` Ronald G Minnich
  2006-06-10 23:05                           ` Ronald G Minnich
  1 sibling, 2 replies; 78+ messages in thread
From: Roman Shaposhnick @ 2006-06-10  0:45 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jun 09, 2006 at 08:31:16PM -0600, Latchesar Ionkov wrote:
> You are concentrating too much on the particular example I gave (emalloc)
> and not on the issue of exiting from a library. I don't think it is a
> library job to decide whether the application should die or not.

  You're right. But this particular example illustrates an important 
  point which, I believe, is crucial to the spirit of Plan9. I clumsily
  formulate it as: its important to know when to stop solving the
  general problem. I believe that's the principle that helps Plan9
  kernel stay away from becoming a true micro-kernel (Mach OS) or 
  drive the abstraction ad absurdum (Spring OS) etc.

  It is my true belief that sometimes its better to admit that we don't
  know how to solve the general problem yet, than to pretend that we do.
  C++ and Java were supposed to relieve me from ever worrying about error
  handling. But did they really ? Somehow I don't see a huge stream
  of exceptions I get from my JVM quite regularly as a superior strategy.
  We don't know how to solve the general problem.

  Now, don't get me wrong -- sometimes you have to make an extra effort
  to at least pretend that it is solvable. Especially when you are in a
  business of building commercial software. I can appreciate it. But lets
  move our discussion to a practical level -- could you explain what 
  sort of "alternative" control flow you're after when something bad
  happens inside the library. What kind of an ideal world solution
  would you like to see as an application developer ?

> >   Consider this -- what's worse: a fossil server that died and lost some
> >   of the requests or a server that tried to recover and committed random
> >   junk ?
> 
> How is definitely losing data better than the possibility of writing random
> junk?

  Because it is very easy to tell when you lost something on which you can
  build a comprehensive recovery strategy. Case in point: I remember a couple
  (5-10, actually) years ago there was an MS DOS virus detected at Stanford
  which, to my recollection, was the one and only one they truly feared. 
  What the virus did was it scanned your HDD for TeX documents and carefully
  permuted certain characters as to keep the document a valid TeX document
  but alter its meaning slightly. 

  The vote on this one was unanimous: "we'd be all better off if it just
  deleted them".

Thanks,
Roman.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  0:23                         ` quanstro
@ 2006-06-10  0:41                           ` Paul Lalonde
  2006-06-10  0:59                             ` quanstro
  2006-06-10  2:51                           ` Latchesar Ionkov
  2006-06-10 23:06                           ` Ronald G Minnich
  2 siblings, 1 reply; 78+ messages in thread
From: Paul Lalonde @ 2006-06-10  0:41 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 9-Jun-06, at 5:23 PM, quanstro@quanstro.net wrote:
>
> sure you can.  sysfatal calls _sysfatal to do the deed.  redefine  
> that to call your
> fancy cleanup routine and you're golden.

But it's one step worse than this.  Sometimes your fancy cleanup  
routine can't dig itself out of your current callstack; it's better  
to find a way to "succeed" and handle the failure higher up, thus  
maintaining integrity.  When I have critical (well, as critical as it  
gets when doing entertainment software) resources whose allocation  
failure will cause grief, I try to pre-allocate before doing  
something irreversible.  The the rest of the work is working out what  
you're going to use to propagate that exception condition up the  
stack, at the same time as your routine "succeeds".
  A longump or function call doesn't let you clean up/repair your  
state well enough precisely because calling it threw away an  
important part of your state.  This is what all those people on about  
C++ exceptions are mumbling about, although their implementation  
means catching every such case in what seems like every codepath -  
ugly fast.

Paul

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFEihUqpJeHo/Fbu1wRAh0KAJwPCzxloDdoK8R61nHASUBvYahb/wCgtetN
9AaBLhds/5qTrx4hAdoicc4=
=BXDh
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  0:24                       ` andrey mirtchovski
@ 2006-06-10  0:36                         ` quanstro
  0 siblings, 0 replies; 78+ messages in thread
From: quanstro @ 2006-06-10  0:36 UTC (permalink / raw)
  To: 9fans

no.  you explain the situtation well.

i realize that this is exactly the situation.  and the problem is that once you
run out of memory, all kinds of stuff will start failing.  fork will fail.  unless
there is a single identifiably offender that the kernel kills and isn't automatically
restarted, your chances of fixing the system are small.

running a production system on non-dedicated h/w is a bug in itself.  you
need to have some idea of what's going to be running on your box.

i've run into all those cases.  (except for legitamately running out of memory
with a completely sane system.)  

running out of fds on unix is a bug.  you can query the system for the number
of fds you are allowed to use, and you need to respect this number.

running out of processes could be lots of things.  i always boiled it down to
either a configuration error or a lack of resource control. e.g. something like

	if(nrq > cfg.rqmax){
		werrstr("too many requests");
		return -1;
	}
	nrq++;

look, i like to start out with each "difficult" error (like running out of memory)
and handling the real cases that kill my program.  generally this is a small
fraction of the total number of ways the application could fail.

naturally this is more work, but if you do it you have confidence that the situation
you specifically saw and handled will be handled correctly.

this is a bit of religion.  hopefully i'm not too dogmatic. ;-)

- erik

On Fri Jun  9 19:25:35 CDT 2006, mirtchovski@gmail.com wrote:
> > i'm skeptical that this is a real-world problem.  i've not run out of memory
> > without hosing the system to the point where it needed to be rebooted.
> 
> the problem we face is that we can't isolate our programs on dedicated
> hardware the way you isolate venti for example. if you ran a
> standalone venti server and ran out of memory you could argue that the
> crap has hit the fan irrevocably.
> 
> some of our code looks a lot like a meta-kernel: we provide the
> capabilities for running other programs on many machines concurrently.
> in more cases than anyone will admit, those programs misbehave badly
> but we can't afford to throw the towel every time.
> 
> to illustrate from experience, just in the space of one month this
> year, we ran out of memory, out of processes to run, out of time and
> out of file descriptors in trivial cases. we simply must keep going or
> at least sit quietly and wait for the storm to pass...
> 
> i'm sorry if i'm not explaining the situation too well.
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 22:44         ` David Leimbach
  2006-06-09 22:46           ` quanstro
  2006-06-09 22:51           ` Latchesar Ionkov
@ 2006-06-10  0:28           ` Roman Shaposhnick
  2 siblings, 0 replies; 78+ messages in thread
From: Roman Shaposhnick @ 2006-06-10  0:28 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jun 09, 2006 at 03:44:22PM -0700, David Leimbach wrote:
> On 6/9/06, Roman Shaposhnick <rvs@sun.com> wrote:
> >On Wed, Jun 07, 2006 at 11:05:12PM -0400, Dan Cross wrote:
> >> Too bad the example a beginning programmer
> >> sees now is the cess pool of open source cruft instead of well-written
> >> code.
> >
> >  And that would be the second most useful thing about Plan 9 -- its
> >  source code as a literature for educating oneself how the code is
> >  supposed to be written.
> >
> >Thanks,
> >Roman.
> 
> Except /sys/src/9/pc/pci.c that says it badly needs to be rewritten.
> Maybe a slightly less Kool-Aid drinking way to approach this would be
> to say "code that needs help is better marked, and there's less of
> that?"

  May be. I guess I feel passionate about it because Plan9 is the only
  source code that I can read and understand what's going on almost
  always without using a debugger. Maybe it is a cognitive limitation 
  on my part, and may be you guys are lucky enough to have more developed 
  perceptual capabilities but something like this: 
     http://svn.mplayerhq.hu/mplayer/trunk/mplayer.c?view=markup&rev=18407
  or this:
     http://cvs.openssl.org/dir?d=openssl/crypto

  leaves me no chance of *learning* from it. Its all write-only code.

> There's a lot of "belief" here that I think is "fundamentally"
> dangerous... as with anything.

  Its not so much a belief but rather my personal experience. I do lots 
  of self-educating these days by trying to understand the limitations
  of a particular chunk of technology. Just the other day I was
  exploring the "graphics" (i.e. /dev/draw) approach to building a 
  desktop OS (expect more questions on this one from me a bit later ;-))
  And of course, the natural places to start were: libdraw, NeWS,
  Java2D and a bit of Quartz. libdraw wasn't the ideal one. True. But it
  was the only one where I can more or less understand what's going on
  by just reading the source. This principle holds true 99% of the time
  when I compare anything with Plan9. Does it say something about the
  quality of the code ? I don't know. About me ? Not sure. But that's the
  way it feels to me -- as subjective as it may be...

Thanks,
Roman.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 23:51                     ` quanstro
  2006-06-10  0:10                       ` Roman Shaposhnick
@ 2006-06-10  0:24                       ` andrey mirtchovski
  2006-06-10  0:36                         ` quanstro
  2006-06-10  2:27                       ` Latchesar Ionkov
  2006-06-10 23:04                       ` Ronald G Minnich
  3 siblings, 1 reply; 78+ messages in thread
From: andrey mirtchovski @ 2006-06-10  0:24 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> i'm skeptical that this is a real-world problem.  i've not run out of memory
> without hosing the system to the point where it needed to be rebooted.

the problem we face is that we can't isolate our programs on dedicated
hardware the way you isolate venti for example. if you ran a
standalone venti server and ran out of memory you could argue that the
crap has hit the fan irrevocably.

some of our code looks a lot like a meta-kernel: we provide the
capabilities for running other programs on many machines concurrently.
in more cases than anyone will admit, those programs misbehave badly
but we can't afford to throw the towel every time.

to illustrate from experience, just in the space of one month this
year, we ran out of memory, out of processes to run, out of time and
out of file descriptors in trivial cases. we simply must keep going or
at least sit quietly and wait for the storm to pass...

i'm sorry if i'm not explaining the situation too well.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  2:27                       ` Latchesar Ionkov
@ 2006-06-10  0:23                         ` quanstro
  2006-06-10  0:41                           ` Paul Lalonde
                                             ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: quanstro @ 2006-06-10  0:23 UTC (permalink / raw)
  To: 9fans

On Fri Jun  9 19:18:09 CDT 2006, lucho@gmx.net wrote:
> On Fri, Jun 09, 2006 at 06:51:00PM -0500, quanstro@quanstro.net said:
> > On Fri Jun  9 18:48:44 CDT 2006, lucho@gmx.net wrote:
> > > > 
> > > > what is the senerio you're thinking of where malloc could fail
> > > > and you can recover?
> > > 
> > > 
> > > Let's say you have a fossil like file server and you cannot malloc memory to
> > > process new requests. Do you want to flush the data buffers back to the disk
> > > before you die, you you want to die in some library without any flushing.
> > 
> > have you had a problem with fossil failing in this way?
> 
> IIRC fossil is not using libraries that call sysfatal. I said fossil like
> file server, not fossil. Let's say that I want to write a fossil-like file
> server. I cannot use lib9p, because it will call sysfatal and not give me
> chance to flush the buffers to the disk.
> 

sure you can.  sysfatal calls _sysfatal to do the deed.  redefine that to call your
fancy cleanup routine and you're golden.

but i think you're letting better be the enemy of good.  start off just letting
sysfatal run its course.  if you find real examples of recoverable failure, then
handle and test each one.

when i've taken this approach, i have never needed to recover from the 
error, but i have found some corefiles with interesting requests.  like malloc(-1).
that is another overlooked benefit of not recovering from every error at
the outset.

- erik


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 23:51                     ` quanstro
@ 2006-06-10  0:10                       ` Roman Shaposhnick
  2006-06-10  2:31                         ` Latchesar Ionkov
  2006-06-10  0:24                       ` andrey mirtchovski
                                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 78+ messages in thread
From: Roman Shaposhnick @ 2006-06-10  0:10 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jun 09, 2006 at 06:51:00PM -0500, quanstro@quanstro.net wrote:
> > Or you have a file server that keeps some non-file server related state in
> > memory. The unability to serve any more requests is fine as long as it can
> > start serving them at some point later when there is more memory. The dying
> > is not acceptible because the data kept in the memory is important.
>  
> i'm skeptical that this is a real-world problem.  i've not run out of memory
> without hosing the system to the point where it needed to be rebooted.
> 
> worse, all these failure modes need to be tested if this is production code.

  I believe it is to be a crucial issue here. True, what Latchesar is after
  is a fine goal, its just that I'm yet to see a production system where
  it can save you from a bigger trouble. May be I've been exceptionally
  unlucky, but that's a reality -- if you truly run out of memory you're
  screwed. Your escape strategies don't work, worse yet they are *very*
  likely to fail in the manner that you don't except in the layer that
  you have no knowledge about. 

  Consider this -- what's worse: a fossil server that died and lost some
  of the requests or a server that tried to recover and committed random
  junk ?

Thanks,
Roman.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-10  1:57                   ` Latchesar Ionkov
@ 2006-06-09 23:51                     ` quanstro
  2006-06-10  0:10                       ` Roman Shaposhnick
                                         ` (3 more replies)
  0 siblings, 4 replies; 78+ messages in thread
From: quanstro @ 2006-06-09 23:51 UTC (permalink / raw)
  To: 9fans

On Fri Jun  9 18:48:44 CDT 2006, lucho@gmx.net wrote:
> > 
> > what is the senerio you're thinking of where malloc could fail
> > and you can recover?
> 
> 
> Let's say you have a fossil like file server and you cannot malloc memory to
> process new requests. Do you want to flush the data buffers back to the disk
> before you die, you you want to die in some library without any flushing.

have you had a problem with fossil failing in this way?

i'm not saying you allocate memory like crazy and don't worry about running
out of memory.  i'm sure that fossil is very careful about allocating memory
for requests.
> 
> Or you have a file server that keeps some non-file server related state in
> memory. The unability to serve any more requests is fine as long as it can
> start serving them at some point later when there is more memory. The dying
> is not acceptible because the data kept in the memory is important.
 
i'm skeptical that this is a real-world problem.  i've not run out of memory
without hosing the system to the point where it needed to be rebooted.

worse, all these failure modes need to be tested if this is production code.

how many times have you seen the comment
	// client isn't going to check the return code, anyway
	sysfatal("out of memory");
or similar?  i'm pretty sure these comments are written be folks who've been
bitten by untested or inconsistant error recovery code.

- erik


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 23:38               ` David Leimbach
  2006-06-09 23:45                 ` andrey mirtchovski
@ 2006-06-09 23:46                 ` Paul Lalonde
  2006-06-10 23:03                   ` Ronald G Minnich
  2006-06-10 23:02                 ` Ronald G Minnich
  2 siblings, 1 reply; 78+ messages in thread
From: Paul Lalonde @ 2006-06-09 23:46 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Isn't the traditional solution to pre-allocate an emergency pad, and  
when malloc fails the emergency "I need more memory" handler gets to  
use that pad, and then propagates the condition upwards until some  
caller can handle the case?

Or should I run out and file a patent?

Paul

On 9-Jun-06, at 4:38 PM, David Leimbach wrote:

> On 6/9/06, Ronald G Minnich <rminnich@lanl.gov> wrote:
>> Latchesar Ionkov wrote:
>> > Another example is using emalloc in libraries. I agree that it  
>> is  much
>> > simpler to just give up when there is not enough memory (which   
>> is also
>> > not very likely case), but is that how the code is supposed  to be
>> > written if you are not doing research?
>>
>> yes, that is a problem with a lot of code. "Just bail on first  
>> error" --
>> we've had to stop using emalloc here because that is very unrealistic
>> for production support.
>>
>> ron
>>
>
> Well I wonder what people typically do when they can't malloc anymore
> memory but need more... A reasonable thing to do is to die I'd think.
>
> In fact if you use Perl "die" is even a part of the lang.
>
> my $a = 5 or die;
> print "hello, world\n" or die;
>
> ^^ Valid, "draconian" perl?

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFEighzpJeHo/Fbu1wRAtb8AJ45qe4fxgEIiZrpugkheL+V14GHDgCfRd7b
GkIT818+hgPefaxcI25OvaA=
=KZnc
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 23:38               ` David Leimbach
@ 2006-06-09 23:45                 ` andrey mirtchovski
  2006-06-09 23:46                 ` Paul Lalonde
  2006-06-10 23:02                 ` Ronald G Minnich
  2 siblings, 0 replies; 78+ messages in thread
From: andrey mirtchovski @ 2006-06-09 23:45 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> Well I wonder what people typically do when they can't malloc anymore
> memory but need more... A reasonable thing to do is to die I'd think.

a reasonable thing to do is send Rerror and let the client sort it out
(usually that means retry at some other time). it has happened here
when we run linux systems out of the number of allowed processes :)


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 23:25             ` Ronald G Minnich
@ 2006-06-09 23:38               ` David Leimbach
  2006-06-09 23:45                 ` andrey mirtchovski
                                   ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: David Leimbach @ 2006-06-09 23:38 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 6/9/06, Ronald G Minnich <rminnich@lanl.gov> wrote:
> Latchesar Ionkov wrote:
> > Another example is using emalloc in libraries. I agree that it is  much
> > simpler to just give up when there is not enough memory (which  is also
> > not very likely case), but is that how the code is supposed  to be
> > written if you are not doing research?
>
> yes, that is a problem with a lot of code. "Just bail on first error" --
> we've had to stop using emalloc here because that is very unrealistic
> for production support.
>
> ron
>

Well I wonder what people typically do when they can't malloc anymore
memory but need more... A reasonable thing to do is to die I'd think.

In fact if you use Perl "die" is even a part of the lang.

my $a = 5 or die;
print "hello, world\n" or die;

^^ Valid, "draconian" perl?


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 23:19               ` Latchesar Ionkov
@ 2006-06-09 23:29                 ` quanstro
  2006-06-10  1:57                   ` Latchesar Ionkov
  2006-06-10 23:03                   ` Ronald G Minnich
  0 siblings, 2 replies; 78+ messages in thread
From: quanstro @ 2006-06-09 23:29 UTC (permalink / raw)
  To: 9fans

if you take that view, then you can't use paging because the
kernel might kill your process off on overcommit.

i'm not sure i understand how malloc failure can be a common
enough event that it needs to be handled.

i have never seen malloc failure on a production system where
recovery was possible; i have seen some memory corruption that
led to malloc failure, but there's no recovering from that.

what is the senerio you're thinking of where malloc could fail
and you can recover?

- erik

On Fri Jun  9 18:23:09 CDT 2006, lionkov@lanl.gov wrote:
> There are cases when you want to leave the output of the program in  
> consistent state before you die (and you don't need extra memory to  
> achieve that consistency). Or even if the program cannot continue its  
> work, it would rather lurk around and wait for somebody to rescue it  
> instead of just dying. And in cases like that you just cannot use  
> libraries that call sysfatal.
> 
> Thanks,
> 	Lucho

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 22:51           ` Latchesar Ionkov
  2006-06-09 22:55             ` quanstro
@ 2006-06-09 23:25             ` Ronald G Minnich
  2006-06-09 23:38               ` David Leimbach
  1 sibling, 1 reply; 78+ messages in thread
From: Ronald G Minnich @ 2006-06-09 23:25 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Latchesar Ionkov wrote:
> Another example is using emalloc in libraries. I agree that it is  much 
> simpler to just give up when there is not enough memory (which  is also 
> not very likely case), but is that how the code is supposed  to be 
> written if you are not doing research?

yes, that is a problem with a lot of code. "Just bail on first error" -- 
we've had to stop using emalloc here because that is very unrealistic 
for production support.

ron


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 22:55             ` quanstro
@ 2006-06-09 23:19               ` Latchesar Ionkov
  2006-06-09 23:29                 ` quanstro
  0 siblings, 1 reply; 78+ messages in thread
From: Latchesar Ionkov @ 2006-06-09 23:19 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

There are cases when you want to leave the output of the program in  
consistent state before you die (and you don't need extra memory to  
achieve that consistency). Or even if the program cannot continue its  
work, it would rather lurk around and wait for somebody to rescue it  
instead of just dying. And in cases like that you just cannot use  
libraries that call sysfatal.

Thanks,
	Lucho

On Jun 9, 2006, at 4:55 PM, quanstro@quanstro.net wrote:

> 've written many production systems this way.  i never once had a  
> malloc
> failure that was not a result of a catastrophic bug or h/w failure.
>
> except in the case where you know you might be requesting an  
> unreasonable
> amount of memory, i have always thought that it makes more sense to  
> limit
> the number of failure states by just quitting when malloc fails.  i  
> want my
> programs to be as deterministic as possible. i would much rather  
> have them
> die than get into some untested failure state. (does any project  
> that tries
> to recover from malloc failure actually test failure at each  
> recovery point?)
>
> the current fad is c++ exceptions.  the idea seems okay, but the  
> programs
> i know that make use of c++ this way (mozilla comes to mind) do not
> seem to be very stable or reliable.
>
> - erik
>
> On Fri Jun  9 17:53:03 CDT 2006, lionkov@lanl.gov wrote:
>> Another example is using emalloc in libraries. I agree that it is
>> much simpler to just give up when there is not enough memory (which
>> is also not very likely case), but is that how the code is supposed
>> to be written if you are not doing research?
>>
>> Thanks,
>> 	Lucho



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 22:51           ` Latchesar Ionkov
@ 2006-06-09 22:55             ` quanstro
  2006-06-09 23:19               ` Latchesar Ionkov
  2006-06-09 23:25             ` Ronald G Minnich
  1 sibling, 1 reply; 78+ messages in thread
From: quanstro @ 2006-06-09 22:55 UTC (permalink / raw)
  To: 9fans

've written many production systems this way.  i never once had a malloc
failure that was not a result of a catastrophic bug or h/w failure.

except in the case where you know you might be requesting an unreasonable
amount of memory, i have always thought that it makes more sense to limit
the number of failure states by just quitting when malloc fails.  i want my
programs to be as deterministic as possible. i would much rather have them
die than get into some untested failure state. (does any project that tries
to recover from malloc failure actually test failure at each recovery point?)

the current fad is c++ exceptions.  the idea seems okay, but the programs
i know that make use of c++ this way (mozilla comes to mind) do not
seem to be very stable or reliable.

- erik

On Fri Jun  9 17:53:03 CDT 2006, lionkov@lanl.gov wrote:
> Another example is using emalloc in libraries. I agree that it is  
> much simpler to just give up when there is not enough memory (which  
> is also not very likely case), but is that how the code is supposed  
> to be written if you are not doing research?
> 
> Thanks,
> 	Lucho

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 22:44         ` David Leimbach
  2006-06-09 22:46           ` quanstro
@ 2006-06-09 22:51           ` Latchesar Ionkov
  2006-06-09 22:55             ` quanstro
  2006-06-09 23:25             ` Ronald G Minnich
  2006-06-10  0:28           ` Roman Shaposhnick
  2 siblings, 2 replies; 78+ messages in thread
From: Latchesar Ionkov @ 2006-06-09 22:51 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Another example is using emalloc in libraries. I agree that it is  
much simpler to just give up when there is not enough memory (which  
is also not very likely case), but is that how the code is supposed  
to be written if you are not doing research?

Thanks,
	Lucho

On Jun 9, 2006, at 4:44 PM, David Leimbach wrote:

> On 6/9/06, Roman Shaposhnick <rvs@sun.com> wrote:
>> On Wed, Jun 07, 2006 at 11:05:12PM -0400, Dan Cross wrote:
>> > Too bad the example a beginning programmer
>> > sees now is the cess pool of open source cruft instead of well- 
>> written
>> > code.
>>
>>   And that would be the second most useful thing about Plan 9 -- its
>>   source code as a literature for educating oneself how the code is
>>   supposed to be written.
>>
>> Thanks,
>> Roman.
>
> Except /sys/src/9/pc/pci.c that says it badly needs to be rewritten.
> Maybe a slightly less Kool-Aid drinking way to approach this would be
> to say "code that needs help is better marked, and there's less of
> that?"
>
> Then again.  I've not personally audited the whole system, and it's
> not clear that I have the qualifications to say that Plan 9's source
> is better than other systems.
>
> There's a lot of "belief" here that I think is "fundamentally"
> dangerous... as with anything.
>
> I say this partially tongue-in-cheek.  I think sometimes people don't
> question a thing because they don't want to seem unpopular to the
> group they're speaking to :-).  I think that's wrong.
>
> Then again maybe I'm just paranoid.



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 22:44         ` David Leimbach
@ 2006-06-09 22:46           ` quanstro
  2006-06-09 22:51           ` Latchesar Ionkov
  2006-06-10  0:28           ` Roman Shaposhnick
  2 siblings, 0 replies; 78+ messages in thread
From: quanstro @ 2006-06-09 22:46 UTC (permalink / raw)
  To: 9fans

On Fri Jun  9 17:45:21 CDT 2006, leimy2k@gmail.com wrote:

> Except /sys/src/9/pc/pci.c that says it badly needs to be rewritten.
> Maybe a slightly less Kool-Aid drinking way to approach this would be
> to say "code that needs help is better marked, and there's less of
> that?"

i've never worked on or looked at the source code of any significant
program or kernel that didn't have some functions that either claim 
to need or actually need "massive rewrites".  

> Then again.  I've not personally audited the whole system, and it's
> not clear that I have the qualifications to say that Plan 9's source
> is better than other systems.

going only on the theory that deleted code is debugged code, plan 9
is likely has fewer problems and unintended consequences than unix
clones.

a fundamental mistake of linux-think is that anytime a function is
known to have limitations, fixing those limitations is a priori considered
better than not.  many times, i think this boils down to local optimizations
that result in global pessimization.

> 
> There's a lot of "belief" here that I think is "fundamentally"
> dangerous... as with anything.

belief is important.  where would the catholic church be without it? ☺
if we do not believe that plan 9 is good, we'll all use something else.

- erik

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 22:03       ` Roman Shaposhnick
@ 2006-06-09 22:44         ` David Leimbach
  2006-06-09 22:46           ` quanstro
                             ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: David Leimbach @ 2006-06-09 22:44 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 6/9/06, Roman Shaposhnick <rvs@sun.com> wrote:
> On Wed, Jun 07, 2006 at 11:05:12PM -0400, Dan Cross wrote:
> > Too bad the example a beginning programmer
> > sees now is the cess pool of open source cruft instead of well-written
> > code.
>
>   And that would be the second most useful thing about Plan 9 -- its
>   source code as a literature for educating oneself how the code is
>   supposed to be written.
>
> Thanks,
> Roman.

Except /sys/src/9/pc/pci.c that says it badly needs to be rewritten.
Maybe a slightly less Kool-Aid drinking way to approach this would be
to say "code that needs help is better marked, and there's less of
that?"

Then again.  I've not personally audited the whole system, and it's
not clear that I have the qualifications to say that Plan 9's source
is better than other systems.

There's a lot of "belief" here that I think is "fundamentally"
dangerous... as with anything.

I say this partially tongue-in-cheek.  I think sometimes people don't
question a thing because they don't want to seem unpopular to the
group they're speaking to :-).  I think that's wrong.

Then again maybe I'm just paranoid.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-08  3:05     ` Dan Cross
  2006-06-08  3:44       ` Joel Salomon
@ 2006-06-09 22:03       ` Roman Shaposhnick
  2006-06-09 22:44         ` David Leimbach
  1 sibling, 1 reply; 78+ messages in thread
From: Roman Shaposhnick @ 2006-06-09 22:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wed, Jun 07, 2006 at 11:05:12PM -0400, Dan Cross wrote:
> Too bad the example a beginning programmer
> sees now is the cess pool of open source cruft instead of well-written
> code.

  And that would be the second most useful thing about Plan 9 -- its 
  source code as a literature for educating oneself how the code is 
  supposed to be written.

Thanks,
Roman.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-09 21:29     ` Roman Shaposhnick
@ 2006-06-09 21:34       ` andrey mirtchovski
  0 siblings, 0 replies; 78+ messages in thread
From: andrey mirtchovski @ 2006-06-09 21:34 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

>   Personally it is so much easier for me to try out new
>   ideas on Plan9 than it is on Linux or anywhere else.

same here, with the small addition that p9p relieves a lot of the
prototyping burdain on Linux too. we got a small system written in p9p
(so it runs on Plan 9 too), got it to be reasonably stable, evaluated
(even by end users) it and now it's off to be rewritten in Linux'
native tongue.

way less painful than it could otherwise have been.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-08  1:39   ` [9fans] quantity vs. quality Lyndon Nerenberg
  2006-06-08  3:05     ` Dan Cross
@ 2006-06-09 21:29     ` Roman Shaposhnick
  2006-06-09 21:34       ` andrey mirtchovski
  1 sibling, 1 reply; 78+ messages in thread
From: Roman Shaposhnick @ 2006-06-09 21:29 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wed, Jun 07, 2006 at 06:39:42PM -0700, Lyndon Nerenberg wrote:
> Plan 9's "future" is in guiding the UNIX community forward, not in  
> regressing back to what spawned it in the first place.  And I  
> sincerely hope that doesn't happen by having Plan 9 be adopted  
> wholesale by the masses, for that would (sooner or later) see the end  
> of research and innovation for the sake of not breaking all the  
> currently running apps, which is what caused UNIX to start growing  
> mold.

  Bingo! Personally it is so much easier for me to try out new
  ideas on Plan9 than it is on Linux or anywhere else. The size
  of the system and the fact that I can reasonably comprehend
  its bits and pieces without getting mad makes it an ideal
  platform for all sorts of innovation. I hope it will remain
  that way.

Thanks,
Roman.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-08 15:29   ` David Leimbach
@ 2006-06-08 15:43     ` jmk
  0 siblings, 0 replies; 78+ messages in thread
From: jmk @ 2006-06-08 15:43 UTC (permalink / raw)
  To: 9fans

On Thu Jun  8 11:30:43 EDT 2006, leimy2k@gmail.com wrote:
> ...
> Why not just run Plan 9 as the guest for that matter.  I do that with
> parallels.  It's no worse than using VNC to alleviate the pain of not
> having a web browser is it?  Except I don't need VNC to do the work.
> 
> I think I'd be happy with this.  I don't see any compelling reasons
> these days to load Plan 9 anywhere but in a virtualized environment
> really (for my use).
> 
> Ron has a very specific need and I wish him luck.
> ...

"There is no problem in computer science that cannot be solved by
another level of indirection." - except performance. Last week I
listened to many talks which wanted to fix the problem of porting
code to supercomputers by adding virtualisation because, well, Ken,
device drivers are hard so let's just go to the mall.

Ron's needs are what are not so indirectly paying for keeping
plan9.bell-labs.com alive.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-08  9:32 ` Lluís Batlle
@ 2006-06-08 15:29   ` David Leimbach
  2006-06-08 15:43     ` jmk
  0 siblings, 1 reply; 78+ messages in thread
From: David Leimbach @ 2006-06-08 15:29 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 6/8/06, Lluís Batlle <viriketo@gmail.com> wrote:
> Maybe something capable of running an isolated linux box in plan9's
> environment would do the trick, as some people do right now in Linux
> in order to run plan9. Let's say... a 'xen-like-thing' running on
> plan9, for running other kernels over the same hardware.
>
> Of course, I don't plan coding that.

Hmmm, port Plan 9 to L4, run side by side with L4 Linux?

Why not just run Plan 9 as the guest for that matter.  I do that with
parallels.  It's no worse than using VNC to alleviate the pain of not
having a web browser is it?  Except I don't need VNC to do the work.

I think I'd be happy with this.  I don't see any compelling reasons
these days to load Plan 9 anywhere but in a virtualized environment
really (for my use).

Ron has a very specific need and I wish him luck.

>
> 2006/6/8, cej@gli.cas.cz <cej@gli.cas.cz>:
> > > No, we need fresh ideas.  An infinite number of monkeys turning Plan
> > > 9 into Linux is not progress.
> >
> > I agree 100%. Although I would LOVE to have some loonix prgs w/o
> > rebooting to L. Or c++, java, perl, etc...
> > Attract more people to the clean design (and they will, hopefully,
> > rewrite everything that is(isn't;-) worth it...
> > IMHO.
> >
> > ++pac.
> >
> >
>


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-08  7:30 cej
@ 2006-06-08  9:32 ` Lluís Batlle
  2006-06-08 15:29   ` David Leimbach
  0 siblings, 1 reply; 78+ messages in thread
From: Lluís Batlle @ 2006-06-08  9:32 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Maybe something capable of running an isolated linux box in plan9's
environment would do the trick, as some people do right now in Linux
in order to run plan9. Let's say... a 'xen-like-thing' running on
plan9, for running other kernels over the same hardware.

Of course, I don't plan coding that.

2006/6/8, cej@gli.cas.cz <cej@gli.cas.cz>:
> > No, we need fresh ideas.  An infinite number of monkeys turning Plan
> > 9 into Linux is not progress.
>
> I agree 100%. Although I would LOVE to have some loonix prgs w/o
> rebooting to L. Or c++, java, perl, etc...
> Attract more people to the clean design (and they will, hopefully,
> rewrite everything that is(isn't;-) worth it...
> IMHO.
>
> ++pac.
>
>


^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: [9fans] quantity vs. quality
@ 2006-06-08  7:30 cej
  2006-06-08  9:32 ` Lluís Batlle
  0 siblings, 1 reply; 78+ messages in thread
From: cej @ 2006-06-08  7:30 UTC (permalink / raw)
  To: 9fans

> No, we need fresh ideas.  An infinite number of monkeys turning Plan  
> 9 into Linux is not progress.

I agree 100%. Although I would LOVE to have some loonix prgs w/o
rebooting to L. Or c++, java, perl, etc...
Attract more people to the clean design (and they will, hopefully,
rewrite everything that is(isn't;-) worth it...
IMHO.

++pac.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-08  3:44       ` Joel Salomon
@ 2006-06-08  7:03         ` Roman Shaposhnik
  0 siblings, 0 replies; 78+ messages in thread
From: Roman Shaposhnik @ 2006-06-08  7:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Joel Salomon wrote:
> On 6/7/06, Dan Cross <cross@math.psu.edu> wrote:
>> if I want C++, Java, C#, or Ruby, I know where to get them.
>
> You know where to find a standards-compliant C++ compiler?  ☺
Which standard ? After all, 2008(9) one is downright scary :-(

Thanks,
Roman.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-08  3:05     ` Dan Cross
@ 2006-06-08  3:44       ` Joel Salomon
  2006-06-08  7:03         ` Roman Shaposhnik
  2006-06-09 22:03       ` Roman Shaposhnick
  1 sibling, 1 reply; 78+ messages in thread
From: Joel Salomon @ 2006-06-08  3:44 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On 6/7/06, Dan Cross <cross@math.psu.edu> wrote:
> if I want C++, Java, C#, or Ruby, I know where to get them.

You know where to find a standards-compliant C++ compiler?  ☺

--Joel

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [9fans] quantity vs. quality
  2006-06-08  1:39   ` [9fans] quantity vs. quality Lyndon Nerenberg
@ 2006-06-08  3:05     ` Dan Cross
  2006-06-08  3:44       ` Joel Salomon
  2006-06-09 22:03       ` Roman Shaposhnick
  2006-06-09 21:29     ` Roman Shaposhnick
  1 sibling, 2 replies; 78+ messages in thread
From: Dan Cross @ 2006-06-08  3:05 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wed, Jun 07, 2006 at 06:39:42PM -0700, Lyndon Nerenberg wrote:
> Plan 9's "future" is in guiding the UNIX community forward, not in  
> regressing back to what spawned it in the first place.  And I  
> sincerely hope that doesn't happen by having Plan 9 be adopted  
> wholesale by the masses, for that would (sooner or later) see the end  
> of research and innovation for the sake of not breaking all the  
> currently running apps, which is what caused UNIX to start growing  
> mold.  I would much rather see Plan 9 stay small and mostly ignored,  
> since that's how it will remain agile and pliable.  It's the *ideas*  
> from Plan 9 (e.g. the servers, namespaces) that will help the masses  
> morph their current environment into something suitable for the 21st  
> century.

I concur.  One has to ask the question, *why* does one want to attract
new users to Plan 9?  It would take many man-centuries of effort to get
an environment as rich (for the end-user) as that provided by the
mainstream Unices right now.  And what would be the point?  As many
have noted, it would lead to increased complexity, bloat, and decreased
quality on what is otherwise a pretty clean and high quality system.
Plan 9 would, at that point, cease to be Plan 9 and turn into something
else.  I don't particularly want that, just as I don't really want to
add a lot of new ``features'' to C: if I want C++, Java, C#, or Ruby, I
know where to get them.  Similarly, if I want Unix, I know where to get
it.

Instead, I'd like to go back to basics and use Plan 9 as a clean,
conceptually pure prototype for a new Unix-like system.  Let's take the
good ideas from Plan 9, and a current BSD kernel (probably the FreeBSD
one), a current Unix user-land, an axe, and go to town like Charles did
with SunOS 4 back in the day.  For that matter, throw in some of the
good ideas from VSTa (now FMI/OS?  They seem to be regressing: from
their Wiki's page on `Current Work': ``working on getting switching to
posix style error numbers instead of strings. done''  Great...), EMAS,
TOPS-20, VMS, Multics, QNX, etc.  (Yes! VMS had some good ideas!  Like
a standardized calling convention that made it possible to call modules
from any number of languages!  20 years ago!  Stick THAT in your ELF
and smoke it!)  But in general, you know, look back at history and add
in some of those things that were good ideas and got lost in the sands
of time but which will get reinvented two years from now, badly,
because no one thinks to do any research before starting on their work
anymore.  At least, not in computer science....

Of course, that will never happen, and Geoff is right: it's mostly an
issue with education and defeating incorrect or outdated perceptions (I
get physically ill when I hear about ``efficiency'' these days.  Yeah,
like your GCC extension to pack struct's is *so* much more efficient on
a 3 GHz machine than code to pack it or unpack it from a bytestream.
You spend more time porting it to a new platform than you ever did
writing and debugging once, and running an arbitrary number of times,
the byte packing code).  Too bad the example a beginning programmer
sees now is the cess pool of open source cruft instead of well-written
code.

	- Dan C.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [9fans] quantity vs. quality
  2006-06-08  1:07 ` Latchesar Ionkov
@ 2006-06-08  1:39   ` Lyndon Nerenberg
  2006-06-08  3:05     ` Dan Cross
  2006-06-09 21:29     ` Roman Shaposhnick
  0 siblings, 2 replies; 78+ messages in thread
From: Lyndon Nerenberg @ 2006-06-08  1:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Jun 7, 2006, at 6:07 PM, Latchesar Ionkov wrote:

> I don't think the Plan9 community has the resources (both in  
> numbers and quality) to continue the development. We need fresh blood.

No, we need fresh ideas.  An infinite number of monkeys turning Plan  
9 into Linux is not progress.

The latest copy of login (the Usenix newsletter) has an editorial  
lamenting how UNIX just doesn't fit the current one-user-one-machine  
paradigm we see today, and how we need to re-think how things are  
done in this regard.  Yet Plan 9 has already been doing this for over  
a decade.

Plan 9's "future" is in guiding the UNIX community forward, not in  
regressing back to what spawned it in the first place.  And I  
sincerely hope that doesn't happen by having Plan 9 be adopted  
wholesale by the masses, for that would (sooner or later) see the end  
of research and innovation for the sake of not breaking all the  
currently running apps, which is what caused UNIX to start growing  
mold.  I would much rather see Plan 9 stay small and mostly ignored,  
since that's how it will remain agile and pliable.  It's the *ideas*  
from Plan 9 (e.g. the servers, namespaces) that will help the masses  
morph their current environment into something suitable for the 21st  
century.

--lyndon

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2006-06-15 15:46 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-06-09  6:01 [9fans] quantity vs. quality cej
  -- strict thread matches above, loose matches on Subject: below --
2006-06-08  7:30 cej
2006-06-08  9:32 ` Lluís Batlle
2006-06-08 15:29   ` David Leimbach
2006-06-08 15:43     ` jmk
2006-06-08  0:53 [9fans] gcc on plan9 geoff
2006-06-08  1:07 ` Latchesar Ionkov
2006-06-08  1:39   ` [9fans] quantity vs. quality Lyndon Nerenberg
2006-06-08  3:05     ` Dan Cross
2006-06-08  3:44       ` Joel Salomon
2006-06-08  7:03         ` Roman Shaposhnik
2006-06-09 22:03       ` Roman Shaposhnick
2006-06-09 22:44         ` David Leimbach
2006-06-09 22:46           ` quanstro
2006-06-09 22:51           ` Latchesar Ionkov
2006-06-09 22:55             ` quanstro
2006-06-09 23:19               ` Latchesar Ionkov
2006-06-09 23:29                 ` quanstro
2006-06-10  1:57                   ` Latchesar Ionkov
2006-06-09 23:51                     ` quanstro
2006-06-10  0:10                       ` Roman Shaposhnick
2006-06-10  2:31                         ` Latchesar Ionkov
2006-06-10  0:45                           ` Roman Shaposhnick
2006-06-10  3:01                             ` Latchesar Ionkov
2006-06-10  0:52                               ` quanstro
2006-06-10  1:04                               ` Roman Shaposhnick
2006-06-10 23:13                             ` Ronald G Minnich
2006-06-11  0:44                               ` quanstro
2006-06-11  5:08                                 ` lucio
2006-06-11 10:09                                   ` quanstro
2006-06-11 12:00                                     ` lucio
2006-06-11 22:59                                       ` quanstro
2006-06-11 23:26                                       ` geoff
2006-06-12  3:45                                         ` Paul Lalonde
2006-06-12 20:16                                           ` Ronald G Minnich
2006-06-12 20:23                                             ` Roman Shaposhnick
2006-06-12 20:56                                               ` Ronald G Minnich
2006-06-12 21:09                                                 ` Victor Nazarov
2006-06-13  0:05                                                 ` Roman Shaposhnik
2006-06-12 21:15                                             ` Francisco J Ballesteros
2006-06-13 12:08                                               ` rog
2006-06-13 16:34                                                 ` Skip Tavakkolian
2006-06-13 21:35                                                   ` "Nils O. Selåsdal"
2006-06-14 22:09                                                 ` Roman Shaposhnick
2006-06-15 15:46                                                   ` Victor Nazarov
2006-06-11  5:42                               ` Russ Cox
2006-06-11 10:08                                 ` quanstro
2006-06-12  1:03                               ` Roman Shaposhnik
2006-06-10 23:05                           ` Ronald G Minnich
2006-06-11  0:00                             ` quanstro
2006-06-10  0:24                       ` andrey mirtchovski
2006-06-10  0:36                         ` quanstro
2006-06-10  2:27                       ` Latchesar Ionkov
2006-06-10  0:23                         ` quanstro
2006-06-10  0:41                           ` Paul Lalonde
2006-06-10  0:59                             ` quanstro
2006-06-10  1:15                               ` Paul Lalonde
2006-06-10  5:19                                 ` Bruce Ellis
2006-06-10  2:51                           ` Latchesar Ionkov
2006-06-10  0:45                             ` quanstro
2006-06-10  3:10                               ` Latchesar Ionkov
2006-06-10  0:53                                 ` quanstro
2006-06-10 23:06                           ` Ronald G Minnich
2006-06-10 23:15                             ` geoff
2006-06-11  2:58                             ` jmk
2006-06-10 23:04                       ` Ronald G Minnich
2006-06-11  0:05                         ` quanstro
2006-06-10 23:03                   ` Ronald G Minnich
2006-06-09 23:25             ` Ronald G Minnich
2006-06-09 23:38               ` David Leimbach
2006-06-09 23:45                 ` andrey mirtchovski
2006-06-09 23:46                 ` Paul Lalonde
2006-06-10 23:03                   ` Ronald G Minnich
2006-06-10 23:02                 ` Ronald G Minnich
2006-06-11  0:12                   ` quanstro
2006-06-11  2:20                     ` Ronald G Minnich
2006-06-11 22:31                   ` David Leimbach
2006-06-10  0:28           ` Roman Shaposhnick
2006-06-09 21:29     ` Roman Shaposhnick
2006-06-09 21:34       ` andrey mirtchovski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).