From: Clem Cole <clemc@ccc.com>
Date: Mon, 17 Jun 2024 12:00:13 -0400
To: Bakul Shah
CC: Noel Chiappa, The Unix Heritage Society mailing list
Subject: [TUHS] Re: Version 256 of systemd boasts '42% less Unix philosophy' The Register

typo... like the VFS layer (not CFS layer)

On Mon, Jun 17, 2024 at 11:56 AM Clem Cole <clemc@ccc.com> wrote:


> On Mon, Jun 17, 2024 at 1:51 AM Bakul Shah via TUHS <tuhs@tuhs.org> wrote:
>> Forgot to mention LOCUS, which was the only distributed Unix compatible OS I am aware of. To anyone who has user/implementer experience, I would love to hear what worked well, what didn't, what was easy to implement, what was very hard and what you wished was added to it.
> Jerry and Bruce's book is the complete reference: https://www.amazon.com/Distributed-System-Architecture-Computer-Systems/dp/0262161028

> There were basically three or four versions: the original PDP-11 version, which is the one in the SOSP paper and which later grew to include a VAX at UCLA; IBM's AIX/370 and AIX/PS2, which included TCF (Transparent Computing Facility); and LCC's TNC (Transparent Networking Computing) "product," which was the 14 core technologies used to build it. Parts of them landed in other systems, from Tru64 and HP-UX to the Paragon, and even a later Linux implementation (which sadly was done on the V2 kernel, so it was lost when Linus did not understand it).

> What worked well were the different flavors of the DFS and the later core idea of the VPROCs layer, which I sorely miss. VPROCs allowed process migration, which worked well, and boy did I miss it later in my career. Administering a Locus-based system was a dream because it was just one system, even for up to 4096 nodes in a Paragon. It also meant you could migrate processes off a node, take the node down, reboot or change it, and bring it back. Very cool. By the way, after the first system was installed, adding a node was trivial: you booted the node, it "joined" the cluster, and you were up. AIX then used file replication to build the local disks as needed. BTW: "checkpointing" was a freebie -- you just migrated the file to a disk.

> Mixing ISAs like the 370 and PS/2 was a mixed bag -- I'll let Charlie comment. With TNC we redid that model a bit, but I'm not sure we ever got it 100% right. The HP-UX version was probably the best.

> The biggest implementation issue is that UNIX has too many different namespaces, each with its own particular rules. For all the talk of "everything is a file," when you start trying to bring it together, you discover new and weird^H^H^H^H^Hinteresting namespaces, from System V IPC to signals to FIFOs and named pipes (similar but different). It seemed like everywhere we looked, we would find another namespace we needed to handle, and when we started to look at non-UNIX process layers, it got even stranger. The original UNIX protection model is a tad weak, but most people had started to add ACLs, and POSIX was in the throes of standardizing them, so we based ours on an early POSIX proposal (mostly based on HP-UX, since they had them before the others did).

> To be more specific, the virtual process layer (VPROC) attempted to do for process management in the core kernel what VFS had done for the file system layer. If you look at both of the original two Locus schemes, process control was ad hoc and thus very messy. LCC realized that if we were going to succeed, we needed to make it cleaner. That still took major surgery, although, like the CFS layer, things were a lot clearer once it was done. Bruce, Roman, and I came up with VPROCs. BTW: one of the cool parts of VPROC is that, like VFS, it conceptually made it possible to have other process models. We did a prototype of OS/2 running inside the OSF microkernel and were trying to get a contract from DEC to do it for Tru64 and add VMS before we got sold (we had already developed CFS for DEC as part of Tru64, which was TNC's Cluster File System). Truth is, cheap VMs killed the need for this idea, but it worked fairly well.
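The VFS analogy can be made concrete with a small model. The sketch below is in Python rather than kernel C, and every name in it (`VprocOps`, `Vproc`, `ClusterProcOps`) is hypothetical. It shows only the general pattern, under that assumption: generic code holds a process handle and calls through an operations interface, and the backend decides whether the process is local, remote, or part of a different process model entirely. It is not the LCC design, which lived inside the kernel and had to cover far more than signals and migration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class VprocOps(ABC):
    """Operations a process model must supply (compare vnode ops for files)."""

    @abstractmethod
    def signal(self, vp: "Vproc", sig: int) -> str: ...

    @abstractmethod
    def migrate(self, vp: "Vproc", node: int) -> None: ...


@dataclass
class Vproc:
    """Cluster-wide process handle; generic code never touches the process directly."""

    pid: int       # stays the same no matter which node hosts the process
    node: int      # node currently hosting the process
    ops: VprocOps  # backend chosen per process model

    def signal(self, sig: int) -> str:
        return self.ops.signal(self, sig)

    def migrate(self, node: int) -> None:
        self.ops.migrate(self, node)


class ClusterProcOps(VprocOps):
    """Hypothetical UNIX-process backend as seen from one node of the cluster."""

    def __init__(self, local_node: int) -> None:
        self.local_node = local_node

    def signal(self, vp: Vproc, sig: int) -> str:
        if vp.node == self.local_node:
            return f"deliver {sig} to pid {vp.pid} locally"
        # A real kernel would send a message to the hosting node here.
        return f"forward {sig} for pid {vp.pid} to node {vp.node}"

    def migrate(self, vp: Vproc, node: int) -> None:
        # A real system would ship the address space and open files first.
        vp.node = node


ops = ClusterProcOps(local_node=0)
proc = Vproc(pid=42, node=0, ops=ops)
print(proc.signal(15))  # deliver 15 to pid 42 locally
proc.migrate(3)
print(proc.signal(15))  # forward 15 for pid 42 to node 3
```

Because callers hold only the handle, the pid survives migration unchanged; a second process model (say, an OS/2 personality) would be one more `VprocOps` subclass rather than changes scattered through generic code.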

> After the core VPROCs layer, the hardest things were distributed shared memory (DSM) and the distributed lock manager (DLM). DSM was an example of something that offered pure transparency in operation, i.e., test-and-set worked (operationally) correctly across the DSM, but it was not "speed transparent." If you rewrote your code to use the DLM, though, you could get both full transparency and speed. The DLM is one of the TNC technologies that lives on today. It ended up in a number of systems: Oracle wrote their own based on the specs for the DEC DLM we built for the CFS for Tru64 (which came from TNC). I believe a few other folks used it. It was in OSF's DCE, and ISTR Microsoft picked it up.
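One plausible reading of the DSM-versus-DLM point: a test-and-set spin on a DSM page is correct, but each attempt may drag the page between nodes, whereas a lock manager lets a client name a resource, ask for a mode, and be told when the lock is granted. Below is a minimal single-node sketch of that grant logic, assuming the six VMS-style lock modes (NL, CR, CW, PR, PW, EX) of the DEC lock manager; the email does not list them, and `Resource`, `request`, and `release` are hypothetical names. A real DLM also distributes the resource directory across nodes, supports lock conversion, and completes requests asynchronously.

```python
from enum import IntEnum


class Mode(IntEnum):
    """VMS-style lock modes (an assumption; the email does not name them)."""
    NL = 0  # null: a placeholder that blocks nothing
    CR = 1  # concurrent read
    CW = 2  # concurrent write
    PR = 3  # protected read (shared)
    PW = 4  # protected write (update)
    EX = 5  # exclusive


# COMPAT[a][b] is True when a lock in mode a can be held alongside one in mode b.
COMPAT = [
    #  NL    CR     CW     PR     PW     EX
    [True, True,  True,  True,  True,  True],   # NL
    [True, True,  True,  True,  True,  False],  # CR
    [True, True,  True,  False, False, False],  # CW
    [True, True,  False, True,  False, False],  # PR
    [True, True,  False, False, False, False],  # PW
    [True, False, False, False, False, False],  # EX
]


class Resource:
    """A named resource in a toy, single-node lock manager with a FIFO wait queue."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.granted: dict[str, Mode] = {}
        self.waiting: list[tuple[str, Mode]] = []

    def _fits(self, mode: Mode) -> bool:
        # A request fits only if it is compatible with every granted lock.
        return all(COMPAT[mode][held] for held in self.granted.values())

    def request(self, owner: str, mode: Mode) -> bool:
        """Grant at once if nobody is queued and the mode fits; otherwise wait."""
        if not self.waiting and self._fits(mode):
            self.granted[owner] = mode
            return True
        self.waiting.append((owner, mode))
        return False

    def release(self, owner: str) -> list[str]:
        """Drop owner's lock, then grant queued requests in order until one blocks."""
        self.granted.pop(owner, None)
        woken = []
        while self.waiting and self._fits(self.waiting[0][1]):
            who, mode = self.waiting.pop(0)
            self.granted[who] = mode
            woken.append(who)
        return woken


inode = Resource("inode 1234")
print(inode.request("a", Mode.PR))  # True
print(inode.request("b", Mode.PR))  # True: readers share
print(inode.request("c", Mode.EX))  # False: the writer queues
print(inode.release("a"))           # []
print(inode.release("b"))           # ['c']
```

The FIFO rule (no new grant while anyone is queued) keeps a stream of readers from starving a waiting writer; it is one simple policy choice, not necessarily the one the DEC or TNC lock managers used.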

> So a good question is: if TNC was so cool, why did Beowulf (a real hack in comparison) stick around while TNC died? Well, a few things. LCC/HP did not open-source the code until it was too late. So Beowulf, which was already around, was what folks (like me) used to build big scientific clusters. And Popek was "right": it takes something like Locus/TNC to make a cluster fully transparent. But Beowulf ignored the seams, and in the end that was "good enough." It makes setup and admin a PITA, though, and the programmer needs to be careful -- the dragons are all over the place. So, when I went to Intel, I was the architect of Cluster Ready, which defined away many of those seams and then provided tools to test for them and help you administer the cluster.

> Tools like the Cluster Checker and the whole Cluster Ready program would not have been needed if TNC had "stuck," and I think clusters in general (a cluster of small computers on a LAN, not just clusters on a high-speed or special interconnect like a supercomputer) would be more widely available today.


> Clem
