From: Kevin Bowling
Date: Tue, 4 Sep 2018 04:47:39 -0700
To: "Theodore Y. Ts'o"
Cc: TUHS main list
Subject: Re: [TUHS] SunOS code?
In-Reply-To: <20180902194301.GA22518@thunk.org>
List-Id: The Unix Heritage Society mailing list

On Sun, Sep 2, 2018 at 12:43 PM, Theodore Y. Ts'o wrote:
> On Sat, Sep 01, 2018 at 10:05:06PM -0700, Kevin Bowling wrote:
>>
>> Sorry this is just bogus about being weak compared to Solaris. Are
>> you looking back with rosy glasses or have you scanned the code in the
>> past couple years? I have and there is nothing particularly special
>> about Solaris internals here or elsewhere.
>
> I haven't looked at Solaris code; I had just *assumed* that if they
> were selling million dollar E10k's, they would have had NUMA support
> at *least* as good as SGI's Irix. And it would have been an excuse
> for their pathetic performance on UP and 2-4 SMP systems.

One would hope so, but that was the strategy that got them eaten by a
grue.

Another funny anecdote about this aloofness: Linux on sparc64 uses the
Relaxed Memory Order mode that the hardware offers. Solaris? Total
Store Order. There are tons of things like this in the code that blow
my mind. I would have been pissed if I were on the hardware side of
SPARC.

>
>> Keep in mind IBM wants to sell RockHoppers and E980s (4 drawers, 16
>> sockets, 768 threads) for dedicated Linux use which have similar
>> north/south and east/west off chip networks. They have a lot of very
>> talented people on the firmware, kernel, compilers to make these
>> things work fast, including Paul.
>> ...
>> Where you start going beyond Linux-like NUMA IMO is when you get
>> Irix-like features of page copying, migration, and multiple advanced
>> placement policies.
>
> One thing to consider is that IBM really only cared about optimizing
> hardware for DB2, Oracle, and Websphere. That's one of the reasons why
> you didn't see much in the way of innovative file system work, ala
> ZFS. There was no business justification for pouring 100+ engineer
> years to develop a next-generation file system --- and they had
> already done that once already for GPFS, a cluster file system. As
> far as local disk file system was concerned, the only real business
> value it had was to serve as a program loader for DB2 and Websphere. :-)
>
> (I'm exaggerating a little for effect, but *only* a little.)

Hmm, I think they've been pretty earnest about wanting to be 2+ years
ahead of the general market with POWER for as long as I can see; lots
of HPC money has been subsidizing that.
Depends on the workload, but bus and memory bandwidth right now with
PCIe Gen4 and NVLink can really cut down on server sprawl. I've met
with the GM/chief architect and they see OpenPOWER positioned as a
full frontal competitor to Intel Xeon. I'm fairly disappointed in my
contemporaries for not recognizing the value of a completely open
source firmware and on-chip controller stack, especially after the
recent snafu where Intel changed the microcode license to disallow
benchmarks and claimed it was an accident.

Your statements make sense to me with respect to AIX, as Linux has
been the main effort since the 2000s. GPFS looks neat; I wish it were
open, or at least that the internals were documented well enough to
study the implementation academically.

>
> So as far as NUMA was concerned, there would almost certainly not have
> been much perceived business value in having sophisticated
> auto-migration for arbitrary workloads in the kernel. Something basic
> which was good enough for Oracle, DB2, etc., was all that would be
> needed. (And if you needed to hire consultants from IBM Global
> Services to mind-meld with the configuration documentation in order to
> get the best out of your Rockhopper.... well, shucks, darn. :-)

That's probably the dirty little secret. It's long been profitable to
carefully plan software interrupt handlers, user threads, and memory
allocation even on pedestrian servers if they are running a fixed
function. I guess Google's Borg and the new workalikes could do
semi-automagic things with cgroups these days. There is evidence of
people getting pretty crazy with it when we see things like Intel
cache allocation features.

> At IBM the business people really did make the funding decisions of
> what to work on.
> ZFS could have never happened at IBM because no one
> would have thought that even a tiny number of IBM's current or
> potential customer base would abandon AIX or Linux and switch to
> Solaris, or buy Sun hardware instead of IBM hardware --- just for the
> sake of ZFS. And that's how decision-makers at IBM really thought.
> (And to be fair to those decision-makers, IBM is still in business as
> a free-standing business --- and Sun is not.)

Agreed, one of these companies is doing pretty well with a fat
dividend yield; the other has basically been dismantled for all but a
couple of remaining desirable platform control points like Java and
MySQL.

Many things in tech come down to happy accidents and a small number of
motivated people in the right place at the right time. A Sun engineer
admitted on some video I've seen that the green light was really given
for ZFS because they got stumped by some UFS bugs. Once enough of ZFS
was written to test the end-to-end checksumming features, they found
out some of these heisenbugs were LSI HBA and disk firmware issues :o)

Surveying some of these filesystems: JFS2 is decent, nowhere near the
capabilities of ZFS, but even today it's not in dire need of
replacement. I suspect another issue, complementary to your point, is
that the standalone storage business is many $B of revenue.
ESS/DS8000 and the like are preferred revenue. IBM and HP were more
in the SAN game than Sun and SGI, who let customers configure the
systems themselves to be used as storage (Sun was using VxFS for a
long time, SGI had some CXFS things IIRC).

Tru64 had a pretty interesting filesystem on paper; curious if you
ever looked at its design since they open sourced it.

Regards,
Kevin