From: Marc Rochkind <mrochkind@gmail.com>
Cc: The UNIX Historical Society <tuhs@tuhs.org>
Subject: [TUHS] Re: SCCS roach motel
Date: Fri, 13 Dec 2024 11:39:03 -0700
Message-ID: <CAOkr1zVx03_1cqz0oTn46=LYswpFzZW_evoMRxevNqZ7UiabRg@mail.gmail.com>
In-Reply-To: <CAOkr1zUJ=W5TsfW9Yoykh_M_9gpHi7hUe+xJpqb79U4JZcfuzg@mail.gmail.com>
Larry, I found this:
https://www.bitkeeper.org/src-notes/SCCSWEAVE.html
A good reference?
Marc
On Fri, Dec 13, 2024 at 11:32 AM Marc Rochkind <mrochkind@gmail.com> wrote:
> Larry, thanks for this. I had read some things you've written about the
> weave before, but not with this level of detail. Sounds weird, but I didn't
> really appreciate the implications of the weave even though I'm the guy who
> thought it up. I did understand the importance of not copying data if you
> can reference it, which is a principle of database design (normal forms,
> etc.).
>
> In my paper, I can add a little more about the weave and its advantages.
> Aside from this TUHS post, is there something I can put in the References
> that people can find?
>
> Question: Is this right, that TeamWare was literally layered on top of
> AT&T SCCS, but BitKeeper was layered on your implementation of SCCS? Or,
> was it more complicated than that?
>
> Was your implementation of SCCS ever released by itself?
>
> Marc
>
> On Fri, Dec 13, 2024 at 11:06 AM Larry McVoy <lm@mcvoy.com> wrote:
>
>> On Fri, Dec 13, 2024 at 09:52:28AM -0700, Marc Rochkind wrote:
>> > IEEE Transactions on Software Engineering has asked me to write a
>> > retrospective on the influence of SCCS over the last 50 years, as my SCCS
>> > paper was published in 1975. They consider it one of the most influential
>> > papers from TSE's first decade.
>> >
>> > There's a funny quote from Ken Thompson that circulates from time to time:
>> >
>> > "SCCS, the source motel! Programs check in and never check out!"
>> >
>> > But nobody seems to know what it means exactly. As part of my research, I
>> > asked Ken what the quote meant, since I wanted to include it. He explained
>> > that it refers to SCCS storing binary data in its repository file,
>> > preventing UNIX text tools from operating on the file.
>> >
>> > Of course, this is only one of SCCS's many weaknesses. If you have anything
>> > funny about any of the others, post it here. I already have all the boring
>> > usual stuff (e.g., long-term locks, file-oriented, no merging).
>>
>> Warning: I know more about SCCS than the average person. I've
>> reimplemented it from scratch and then built BitKeeper on top of an
>> extended SCCS file format, so lots of info is coming. Rick Smith and
>> Wayne Scott know as much as I do, and Rick knows more: he joined me and
>> promptly started fixing my SCCS implementation. So far as I know,
>> there is no more knowledgeable person than Rick when it comes to
>> weave file formats.
>>
>> SCCS's strength is the weave format. It's largely not understood, even
>> by other people working in source management. Here's the benefit of
>> that weave (if people use it, which, other than BitKeeper, they don't;
>> looking at you, ClearCase, you had a weave and didn't use it): SCCS can
>> pass merge data by reference, while everyone else copies the data.
>>
>> SCCS is a set-based system. Each node has a revision number, like 1.5,
>> but because SCCS, unlike RCS, limited revisions to at most 4 fields,
>> like 1.5.1.1, it is _impossible_ to build the history graph from the
>> revision numbers alone. You can in simple graphs, but as soon as you
>> branch from a branch, all bets are off.
>>
>> The graph is built from what BitKeeper called serial numbers. Each node
>> in the graph has at least 2 serials, one that names that node in the
>> graph, and one that is the parent.
>>
>> So if I have a file with 5 revisions in straight line history, the
>> internal stuff will look something like
>>
>> rev  me  parent
>> 1.5  5   4
>> 1.4  4   3
>> 1.3  3   2
>> 1.2  2   1
>> 1.1  1   0
>>
>> So what's the set? It's pretty simple for straight-line history: you walk
>> the history from the rev that you want, adding the "me" serial and
>> going to the parent, and repeat until the parent is 0.
>>
>> Suppose you branch from rev 1.3.
>>
>> rev      me  parent
>> 1.3.1.1  6   3
>> 1.5      5   4
>> 1.4      4   3
>> 1.3      3   2
>> ...
>>
>> See that 1.3.1.1 is me: 6 and parent: 3. So if I were building the set
>> for 1.3.1.1, it starts with 6, then goes to parent 3, then 2, then 1,
>> skipping over 5 and 4. If you understand that, you are starting to
>> understand the set and how it is constructed.
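A minimal Python sketch of that walk, assuming the delta table is reduced to a dict mapping each "me" serial to its parent serial. The function and variable names are hypothetical, not taken from SCCS or BitKeeper.

```python
def serial_set(parents, me):
    """Collect the serials visible at delta `me` by following parent links.

    `parents` maps each serial to its parent serial; 0 means "no parent".
    """
    members = set()
    while me != 0:
        members.add(me)
        me = parents[me]
    return members


# Delta table from the example: rev -> (me, parent).
table = {
    "1.1": (1, 0),
    "1.2": (2, 1),
    "1.3": (3, 2),
    "1.4": (4, 3),
    "1.5": (5, 4),
    "1.3.1.1": (6, 3),
}
parents = {me: parent for me, parent in table.values()}

print(sorted(serial_set(parents, table["1.5"][0])))      # [1, 2, 3, 4, 5]
print(sorted(serial_set(parents, table["1.3.1.1"][0])))  # [1, 2, 3, 6]
```

The branch revision never visits serials 4 and 5 because nothing on its parent chain points at them.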
>>
>> Did you know you can have an argument in the revision history without
>> adding anything to the data part? SCCS has the ability to include
>> and/or exclude serials as part of a delta. Let's say Marc looked at
>> my 1.5 and thought it was garbage. He can exclude it from the
>> set like so:
>>
>> rev      me  parent  include  exclude
>> 1.6      7   5       0        5
>> 1.3.1.1  6   3
>> 1.5      5   4
>> 1.4      4   3
>> 1.3      3   2
>> ...
>>
>> That doesn't change the data part of the file AT ALL, it's just saying
>> Marc doesn't want anyone to see the 1.5 changes.
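The same walk extended with include and exclude lists, as a sketch only. Assumptions: each serial maps to a hypothetical (parent, includes, excludes) tuple; the 0 in the include column of 1.6 is read as "nothing included"; and when deltas disagree about a serial, the newest delta on the path decides. This is not the SCCS or BitKeeper code.

```python
def serial_set(deltas, me):
    """Build the visible set for delta `me`, honoring include/exclude lists.

    `deltas` maps serial -> (parent, includes, excludes). Walking from newest
    to oldest, the first decision about a serial wins (an assumption of this
    sketch), so a newer exclude hides that serial even though the walk still
    passes through it on the way to older ancestors.
    """
    decided = {}  # serial -> True (in the set) or False (excluded)
    while me != 0:
        parent, includes, excludes = deltas[me]
        decided.setdefault(me, True)
        for s in includes:
            decided.setdefault(s, True)
        for s in excludes:
            decided.setdefault(s, False)
        me = parent
    return {s for s, keep in decided.items() if keep}


# Marc's 1.6 (serial 7) excludes Larry's 1.5 (serial 5).
deltas = {
    1: (0, [], []),
    2: (1, [], []),
    3: (2, [], []),
    4: (3, [], []),
    5: (4, [], []),
    6: (3, [], []),   # 1.3.1.1
    7: (5, [], [5]),  # 1.6
}
print(sorted(serial_set(deltas, 7)))  # [1, 2, 3, 4, 7]
```

Only the delta table grew; the weave is untouched, yet 1.6 no longer sees anything that serial 5 inserted or deleted.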
>>
>> To understand that, you need to know how SCCS checks out a file, and
>> you need to know how the data is stored, which is in a weave. RCS,
>> and pretty much everything that followed it, doesn't use a weave at
>> all. RCS stores the most recent version of the file as a complete
>> copy of the checked-out file. Then each delta working backwards up
>> the trunk is a patch, i.e. what diff produces.
>>
>> Think about what that means for working on a branch. You have to start
>> with the most recent version of the file, apply backward patches to go
>> to earlier versions all the way back to the branch point, then apply
>> forward patches to work your way to the tip of the branch. Ask Dave Miller
>> how pleasant it is to work on gcc on a branch. It's crazy slow and
>> painful.
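To make that cost concrete, here is a hypothetical Python sketch that only lists which patches must be applied (it does not apply them). It assumes the RCS layout described above: the trunk head stored whole with reverse diffs behind it, and a branch stored as forward diffs from its branch point.

```python
def rcs_patch_path(trunk, branch_point, branch, target):
    """List the patches needed to rebuild `target` from RCS-style storage.

    `trunk` is ordered oldest to newest; only its last revision (the head)
    is stored whole. Trunk deltas are reverse diffs; branch deltas are
    forward diffs starting at `branch_point`. Assumes `target` is on `branch`.
    """
    steps = []
    # Walk backwards from the stored head to the branch point.
    i = len(trunk) - 1
    while trunk[i] != branch_point:
        steps.append(("reverse", trunk[i], trunk[i - 1]))
        i -= 1
    # Then walk forwards along the branch to the requested revision.
    prev = branch_point
    for rev in branch:
        steps.append(("forward", prev, rev))
        if rev == target:
            break
        prev = rev
    return steps


trunk = ["1.1", "1.2", "1.3", "1.4", "1.5"]
branch = ["1.3.1.1", "1.3.1.2"]
for step in rcs_patch_path(trunk, "1.3", branch, "1.3.1.2"):
    print(step)
```

For 1.3.1.2 with the head at 1.5 that is two reverse patches plus two forward patches, and the count grows with both the distance from the head back to the branch point and the length of the branch.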
>>
>> So how does SCCS do it? Let's say the first version of a file is
>>
>> 1
>> 2
>> 3
>> 4
>> 5
>>
>> The data portion of the history file will look like:
>>
>> ^AI 1
>> 1
>> 2
>> 3
>> 4
>> 5
>> ^AE 1
>>
>> SCCS used ^A (Control-A) at the beginning of a line to mean "this is
>> metadata for SCCS". ^AI is an insert, ^AD is a delete, and each
>> insert/delete is paired with a ^AE, which means end. The number after
>> each is the serial. So that weave says "If serial 1 is in your set,
>> everything after ^AI 1 is part of that set until you hit the matching
>> ^AE 1."
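A small Python sketch of the body format as described: the first version wrapped in a single insert block, plus a parser for the control lines. It assumes ^A is the single byte 0x01 and handles only the ^AI/^AD/^AE body lines, not the other ^A lines in an SCCS file header. The names are hypothetical.

```python
CTRL_A = "\x01"  # SCCS marks metadata lines with a leading Control-A (^A)


def initial_weave(lines, serial=1):
    """Wrap the first version of a file in an insert block for `serial`."""
    return [f"{CTRL_A}I {serial}", *lines, f"{CTRL_A}E {serial}"]


def parse_line(line):
    """Return (op, serial) for an ^AI/^AD/^AE line, or None for a data line."""
    if line.startswith(CTRL_A):
        op, serial = line[1:].split()
        return op, int(serial)
    return None


weave = initial_weave(["1", "2", "3", "4", "5"])
for line in weave:
    print(parse_line(line) or ("data", line))
```

Data lines pass through untouched; only lines starting with ^A carry an operation and a serial.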
>>
>> Let's say the 2nd version is
>>
>> 1
>> 2
>> serial 2 added this
>> 3
>> 4
>>
>> Notice that serial 2 deleted what was line 5. The weave now looks like:
>>
>> ^AI 1
>> 1
>> 2
>> ^AI 2
>> serial 2 added this
>> ^AE 2
>> 3
>> 4
>> ^AD 2
>> 5
>> ^AE 2
>> ^AE 1
>>
>> So now we can start to see how you walk the weave. If I'm trying to
>> check out 1.1 aka serial 1, I build a set that has only 1 in it.
>> I hit ^AI 1, see that 1 is in my set, so I'm now in print mode,
>> which means print each data line. I hit ^AI 2; that's not in my set,
>> so I'm now in skip mode, and I skip the stuff inserted by serial 2.
>> I see the ^AE 2 and revert back to print mode. I get to ^AD 2;
>> 2 is NOT in my set, so I stay in print mode. Etc.
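That walk can be sketched in Python as a checkout over a list of weave lines. This is a hypothetical illustration, assuming a simplified rule: a data line is printed when the innermost insert around it belongs to a serial in the set and no enclosing delete belongs to a serial in the set. It is not the SCCS or BitKeeper implementation.

```python
CTRL_A = "\x01"


def checkout(weave, members):
    """Return the lines of the version whose serial set is `members`."""
    stack = []  # open blocks as (op, serial), innermost last
    out = []
    for line in weave:
        if line.startswith(CTRL_A):
            op, serial = line[1:].split()
            serial = int(serial)
            if op == "E":
                # Close the innermost open block opened by this serial.
                for i in range(len(stack) - 1, -1, -1):
                    if stack[i][1] == serial:
                        del stack[i]
                        break
            else:
                stack.append((op, serial))
            continue
        # Data line: who inserted it, and is it deleted by anyone in the set?
        inserter = next((s for op, s in reversed(stack) if op == "I"), None)
        deleted = any(op == "D" and s in members for op, s in stack)
        if inserter in members and not deleted:
            out.append(line)
    return out


weave = [
    "\x01I 1", "1", "2",
    "\x01I 2", "serial 2 added this", "\x01E 2",
    "3", "4",
    "\x01D 2", "5", "\x01E 2",
    "\x01E 1",
]
print(checkout(weave, {1}))     # ['1', '2', '3', '4', '5']
print(checkout(weave, {1, 2}))  # ['1', '2', 'serial 2 added this', '3', '4']
```

Both versions come out of the same single pass; only the set changes.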
>>
>> Let's make a branch, 1.1.1.1, with lots of data.
>>
>> 1
>> 2
>> 3
>> branch line 1
>> branch line 2
>> ...
>> branch line 10000
>> 4
>> 5
>>
>> ^AI 1
>> 1
>> 2
>> ^AI 2
>> serial 2 added this
>> ^AE 2
>> 3
>> ^AI 3
>> branch line 1
>> branch line 2
>> ...
>> branch line 10000
>> ^AE 3
>> 4
>> ^AD 2
>> 5
>> ^AE 2
>> ^AE 1
>>
>> So if I check out 1.1.1.1, the set is {1, 3}. I walk the weave and
>> print anything inserted by either of those, delete anything deleted
>> by those, skip anything inserted by anything not in the set, and
>> ignore any deletes by anything not in the set.
>>
>> The delta table looks like this (notice I've added an author column):
>>
>> rev      me  parent  include  exclude  author
>> 1.1.1.1  3   1                         lm
>> 1.2      2   1                         lm
>> 1.1      1   0                         lm
>>
>> If you followed all that, you can see how SCCS can merge by reference.
>> Let's say Clem decides to merge my branch onto the trunk. The delta table
>> will get a new entry:
>>
>> rev      me  parent  include  exclude  author
>> 1.3      4   2       3                 clem
>> 1.1.1.1  3   1                         lm
>> 1.2      2   1                         lm
>> 1.1      1   0                         lm
>>
>> The weave DOES NOT CHANGE. That's the pass by reference. You do the 3-way
>> merge; it will find the lines "3" and "4" as anchor points in both
>> versions, so it is a simple insert with no new data added to the weave.
>>
>> Here's some magic that *everyone* else gets wrong when they pass by value:
>> In a system that passes by value (copies) the data, the merge done by clem
>> would have an annotated listing like so:
>>
>> lm 1
>> lm 2
>> lm serial 2 added this
>> lm 3
>> clem branch line 1
>> clem branch line 2
>> clem ...
>> clem branch line 10000
>> lm 4
>>
>> Since it copied the data, it looks like Clem wrote those lines, but he
>> didn't; he just automerged them. In SCCS/BitKeeper it would look like:
>>
>> lm 1
>> lm 2
>> lm serial 2 added this
>> lm 3
>> lm branch line 1
>> lm branch line 2
>> lm ...
>> lm branch line 10000
>> lm 4
>>
>> which is correct: all of those lines were authored by one person. The
>> only time the merger should show up as an author is if there was a
>> conflict; however the merger resolved that conflict is new work and
>> should be authored by the merger.
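Annotation falls out of the same weave walk: remember which serial inserted each printed line and look up that delta's author. A hypothetical Python sketch, assuming the simplified checkout rule (innermost insert in the set, no enclosing delete in the set), the branch weave with its 10000 lines cut down to two, and Clem's merge 1.3 as serial 4 whose set is its ancestry (4, 2, 1) plus the included serial 3.

```python
CTRL_A = "\x01"


def annotate(weave, members, author):
    """Pair each visible line with the author of the delta that inserted it."""
    stack, out = [], []  # stack holds open blocks as (op, serial)
    for line in weave:
        if line.startswith(CTRL_A):
            op, serial = line[1:].split()
            serial = int(serial)
            if op == "E":
                # Close the innermost open block opened by this serial.
                for i in range(len(stack) - 1, -1, -1):
                    if stack[i][1] == serial:
                        del stack[i]
                        break
            else:
                stack.append((op, serial))
            continue
        inserter = next((s for op, s in reversed(stack) if op == "I"), None)
        deleted = any(op == "D" and s in members for op, s in stack)
        if inserter in members and not deleted:
            out.append((author[inserter], line))
    return out


# The branch weave, with the 10000 branch lines cut down to two.
weave = [
    "\x01I 1", "1", "2",
    "\x01I 2", "serial 2 added this", "\x01E 2",
    "3",
    "\x01I 3", "branch line 1", "branch line 2", "\x01E 3",
    "4",
    "\x01D 2", "5", "\x01E 2",
    "\x01E 1",
]
author = {1: "lm", 2: "lm", 3: "lm", 4: "clem"}
# Clem's merge 1.3 is serial 4: parents 4 -> 2 -> 1, plus included serial 3.
merge_set = {4, 2, 1} | {3}
for who, text in annotate(weave, merge_set, author):
    print(who, text)
```

Every printed line comes out as lm, including "serial 2 added this", and "5" is dropped because serial 2 deleted it. Serial 4 inserted nothing into the weave, so clem never appears.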
>>
>> What BitKeeper did, and it was a profound step forward, was make the
>> idea of a repository a formal thing and introduce changesets, which
>> keep track of all this at the repository level. So it does all this
>> stuff at the file level, but you don't have to do that low-level work.
>> You could think of SCCS as assembly and BitKeeper as more like C: it
>> upleveled things to the point that humans can manage the repository
>> easily.
>>
>> Whew. That's a butt load of info. Perhaps better for COFF? Any
>> questions? It should be obvious that I *love* SCCS. It's a dramatically
>> better file format than a patch-based one: you can get *any* version of
>> the file at the same cost (one pass through the weave, whichever version
>> you ask for), and authorship can be preserved across versions. It's
>> pretty brilliant, and I consider myself blessed to be posting this in
>> response to SCCS's creator. Hats off to Marc. And big boo, hiss, to the
>> RCS guy, who got a PhD for RCS (give me a break) and did the world a
>> huge disservice by bad-mouthing SCCS so he could promote RCS.
>>
>> --lm
>>
>
>
> --
> *My new email address is mrochkind@gmail.com*
>