From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 19 Aug 2004 21:30:49 -0600
From: ron minnich
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] datakit
In-Reply-To: <20040820115259.12e1c14c@garlic.apnic.net>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Cc: geoff@collyer.net
Topicbox-Message-UUID: d74d3052-eacd-11e9-9e20-41e7f4b1d025

I don't know, Geoff, having seen all the failed attempts at putting
'reliable transport' into the network itself (including ATM, HIPPI,
HIPPI-800, GSN, Quadrics, Myrinet, SCI, Infiniband, ...) I've become a
big fan of dumb networks like Ethernet. All that fancy stuff works great
in the small, fails in the large, and boy oh boy ... do you really want
someone to come to you 3 months from now and say "what's this huge block
of zeros in my data file?". I don't.

We had a network here (HIPPI-800) that was super-reliable ... on 2
machines. With X thousand interfaces all going at once, you got a bad
packet once every 15 mins. Oops. Took three months to find out that was
happening. Software now covers for that problem.

Every new network does this:
- we're reliable! count on it! Just push the bits and we'll take care of
  it!
- what errors? We're not seeing them (oh, wait, we're not LOOKING for
  them, oops -- yes, this really happens!)
- well, ok, you're using the network wrong
- well, ok, it has bugs, but you're not seeing them -- it's your
  application
- oops, you're seeing bugs? we never simulated this scenario. Gosh, maybe
  there is a problem.
- there's a problem. Fixed in next hardware release
- there's a problem in the new hardware release
- (final phase) Our latest code release detects and corrects any errors
  in the network!

See: NFS, from '86 to '91 (everyone remember patching SunOS kernels to
turn on udpcksum?)
See: ATM, any time

If I assume the network is not 100% reliable, I will write software that
thinks that way, and I won't get bitten when my "reliable" network with a
1e-14 BER wrecks some data.

The number you need? The Sandia guys like their ASCI Red network with its
1e-21 BER. What did datakit do? I know nothing I've ever used can do that
1e-21. The Red Storm network might, however.

ron

p.s. the Quadrics and Infiniband guys, who are all Very Smart People,
will beg to disagree with me about listing their networks above, but I
will in turn continue to disagree with them. But maybe the Infiniband
guys are right -- I'll believe it when I see it. So it goes. The Myrinet
and HIPPI-800 and SCI and ATM (and, actually, Ethernet) guys used to
believe they could solve all the problems in the network, but last time I
looked they don't believe that any more. Software continues to guarantee
hardware reliability. TCP r00lz.
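
To put the two BER figures quoted above (1e-14 and 1e-21) in perspective,
here is a rough back-of-the-envelope sketch. The link rate and interface
count are illustrative assumptions, not numbers from this message, and
the function name is made up for the example. It also assumes bit errors
are independent, which real networks do not promise.

```python
def expected_bit_errors(ber, bits_per_second, interfaces, seconds):
    """Expected count of corrupted bits, assuming independent bit errors."""
    return ber * bits_per_second * interfaces * seconds


SECONDS_PER_DAY = 86_400

# Hypothetical load, chosen only for illustration:
LINK_RATE = 1e9      # 1 Gbit/s per interface (assumption)
INTERFACES = 1000    # stand-in for "X thousand interfaces" (assumption)

for ber in (1e-14, 1e-21):
    per_day = expected_bit_errors(ber, LINK_RATE, INTERFACES, SECONDS_PER_DAY)
    print(f"BER {ber:g}: about {per_day:.3g} bad bits per day")
```

Under those assumptions a 1e-14 BER works out to roughly 864 bad bits
per day across the machine, while 1e-21 works out to well under one bad
bit per decade. That gap is why software that assumes the network can
lie, and checks end to end, is the safer bet at scale.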