From: Clem Cole
Date: Mon, 17 Jun 2024 11:56:55 -0400
To: Bakul Shah
CC: Noel Chiappa, The Unix Heritage Society mailing list
Subject: [TUHS] Re: Version 256 of systemd boasts '42% less Unix philosophy' The Register
On Mon, Jun 17, 2024 at 1:51 AM Bakul Shah via TUHS wrote:

> Forgot to mention LOCUS, which was the only distributed Unix compatible OS
> I am aware of. To anyone who has user/implementer experience, I would love
> to hear what worked well, what didn't, what was easy to implement, what was
> very hard and what you wished was added to it.

Jerry and Bruce's book is the complete reference:
https://www.amazon.com/Distributed-System-Architecture-Computer-Systems/dp/0262161028

There were basically three or four versions:
- The original version on the PDP-11, which is the one in the SOSP paper and which morphed to include a VAX at UCLA.
- IBM's AIX/370 and AIX/PS2, which included TCF (Transparent Computing Facility).
- LCC's TNC (Transparent Networking Computing) "product," which consisted of the 14 core technologies used to build it.

Parts of them landed in other systems: Tru64, HP-UX, the Paragon, and even a later Linux implementation (which sadly was done on the V2 kernel, so it was lost when Linus did not understand it).

What worked well was the different flavors of the DFS and the later core idea of the VPROCS layer, which I sorely miss. It allowed process migration, which worked well and which, boy, did I miss later in my career. Admin of a Locus-based system was a dream because it was just one system for up to 4096 nodes in a Paragon. It also meant you could migrate processes off a node, take the node down, reboot/change it, and bring it back. Very cool. After the first system was installed, adding a node was trivial, by the way. You booted the node, "joined" the cluster, and were up. AIX then used file replication to build the local disks as needed. BTW: "checkpointing" was a freebie -- you just migrated the file to a disk.

Mixing ISAs like the 370 and PS/2 was a mixed bag -- I'll let Charlie comment. With TNC we redid that model a bit; I'm not sure we ever got it 100% right. The HP-UX version was probably the best.
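The message credits the VPROCS layer with making process migration possible, and further on it likens VPROC to what VFS did for file systems. The following is a rough illustration only, not the real design. This minimal C sketch assumes VPROC behaved like a VFS-style operations vector: core-kernel code calls through a per-process table of function pointers, and migrating a process only rebinds that table. Every identifier here (`struct vproc`, `vproc_ops`, `vproc_init`, `vproc_kill`, `vproc_migrate`) is hypothetical and does not come from the Locus or TNC sources. The "remote" case just counts forwards instead of sending a message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of a VFS-style operations vector for processes.
 * None of these names come from the Locus or TNC sources. */

struct vproc;

struct vproc_ops {
    /* Deliver signal `sig` to the process behind `vp`. */
    int (*vpop_sendsig)(struct vproc *vp, int sig);
    /* Report which node currently hosts the process. */
    int (*vpop_node)(const struct vproc *vp);
};

struct vproc {
    int pid;                     /* cluster-wide process ID */
    int node;                    /* node currently running the process */
    int last_sig;                /* last signal delivered (for illustration) */
    int forwarded;               /* operations shipped to another node */
    const struct vproc_ops *ops; /* local or remote implementation */
};

/* Local case: the process runs on this node, so act on it directly. */
static int local_sendsig(struct vproc *vp, int sig)
{
    vp->last_sig = sig;
    return 0;
}

/* Remote case: a real system would message the hosting node;
 * this sketch only counts the forward and records the signal. */
static int remote_sendsig(struct vproc *vp, int sig)
{
    vp->forwarded++;
    vp->last_sig = sig;
    return 0;
}

static int vp_node(const struct vproc *vp)
{
    return vp->node;
}

static const struct vproc_ops local_ops  = { local_sendsig,  vp_node };
static const struct vproc_ops remote_ops = { remote_sendsig, vp_node };

/* Migration moves the process and swaps its ops vector; the PID is unchanged. */
void vproc_migrate(struct vproc *vp, int new_node, int this_node)
{
    vp->node = new_node;
    vp->ops = (new_node == this_node) ? &local_ops : &remote_ops;
}

/* Set up a vproc and bind the ops vector for where it currently runs. */
void vproc_init(struct vproc *vp, int pid, int node, int this_node)
{
    vp->pid = pid;
    vp->last_sig = 0;
    vp->forwarded = 0;
    vproc_migrate(vp, node, this_node);
}

/* Core-kernel entry point: the caller never asks where the process lives. */
int vproc_kill(struct vproc *vp, int sig)
{
    assert(vp != NULL && vp->ops != NULL);
    return vp->ops->vpop_sendsig(vp, sig);
}
```

Under these assumptions, `vproc_kill()` looks the same to its caller before and after `vproc_migrate()`. Only the ops vector behind it changes. That is the kind of location transparency the VFS analogy suggests.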
The biggest implementation issue is that UNIX has too many different namespaces, each with its own particular rules. For all of the concept of "everything is a file," when you start to try to bring it together, you discover new and weird^H^H^H^H^Hinteresting namespaces, from System V IPC to signals to FIFOs and named pipes (similar but different). It seemed like everywhere we looked, we would find another namespace we needed to handle, and when we started to look at non-UNIX process layers, it got even stranger. The original UNIX protection model is a tad weak, but most people had started to add ACLs, and POSIX was in the throes of standardizing them -- so we based ours on an early POSIX proposal (mostly based on HP-UX, since they had ACLs before the others did).

To be more specific, the virtual process layer (VPROC) attempted to do for process management in the core kernel what VFS had done for the FS layer. If you look at both of the original two Locus schemes, process control was ad hoc and thus very messy. LCC realized that if we were going to succeed, we needed to make that cleaner. But that still took major surgery -- although, like the CFS layer, things were a lot clearer once done. Bruce, Roman, and I came up with VPROCs. BTW: one of the cool parts of VPROC is that, like VFS, it conceptually made it possible to have other process models. We did a prototype of OS/2 running inside the OSF uK (microkernel) and were trying to get a contract from DEC to do it for Tru64 and add VMS before we got sold (we had already developed CFS for DEC as part of Tru64, which was TNC's Cluster File System). Truth is, cheap VMs killed the need for this idea, but it worked fairly well.

After the core VPROCs layer, the hardest things were distributed shared memory (DSM) and the distributed lock manager (DLM). DSM was an example that offered pure transparency in operation, *i.e.,* test-and-set worked (operationally) correctly across the DSM, but it was not "speed transparent."
But if you rewrote the code to use the DLM, you could get full transparency and speed. The DLM is one of the TNC technologies that lives on today. It ended up in a number of systems. Oracle wrote their own based on the specs for the DEC DLM we built for the CFS for Tru64 (which is from TNC). I believe a few other folks used it. It was in OSF's DCE, and ISTR Microsoft picked it up.

So a good question is: if TNC was so cool, why did Beowulf (a real hack in comparison) stick around while TNC died? Well, a few things. LCC/HP did not open-source the code until it was too late. So Beowulf, which was around, was what folks (like me) used to build big scientific clusters. And while Popek was "right" (it takes something like Locus/TNC to make a cluster fully transparent), Beowulf ignored the seams, and in the end, that was "good enough." But it makes setup and admin a PITA, and the programmer needs to be careful -- the dragons are all over the place. So, when I went to Intel, I was the architect of Cluster Ready, which defined away many of those seams and then provided tools to test for them and help you admin.

Tools like the Cluster Checker and the whole ClusterReady program would not be needed if TNC had "stuck." I also think clusters in general would be more widely available today. By that I mean clusters of small computers on a LAN, not just clusters on a high-speed/special interconnect like a supercomputer.

Clem

