The Unix Heritage Society mailing list
From: Marc Rochkind <mrochkind@gmail.com>
Cc: The UNIX Historical Society <tuhs@tuhs.org>
Subject: [TUHS] Re: SCCS roach motel
Date: Fri, 13 Dec 2024 11:39:03 -0700
Message-ID: <CAOkr1zVx03_1cqz0oTn46=LYswpFzZW_evoMRxevNqZ7UiabRg@mail.gmail.com>
In-Reply-To: <CAOkr1zUJ=W5TsfW9Yoykh_M_9gpHi7hUe+xJpqb79U4JZcfuzg@mail.gmail.com>


Larry, I found this:

https://www.bitkeeper.org/src-notes/SCCSWEAVE.html

A good reference?

Marc

On Fri, Dec 13, 2024 at 11:32 AM Marc Rochkind <mrochkind@gmail.com> wrote:

> Larry, thanks for this. I had read some things you've written about the
> weave before, but not with this level of detail. Sounds weird, but I didn't
> really appreciate the implications of the weave even though I'm the guy who
> thought it up. I did understand the importance of not copying data if you
> can reference it, which is a principle of database design (normal forms,
> etc).
>
> In my paper, I can add a little more about the weave and its advantages.
> Aside from this TUHS post, is there something I can put in the References
> that people can find?
>
> Question: Is this right, that TeamWare was literally layered on top of
> AT&T SCCS, but BitKeeper was layered on your implementation of SCCS? Or,
> was it more complicated than that?
>
> Was your implementation of SCCS ever released by itself?
>
> Marc
>
> On Fri, Dec 13, 2024 at 11:06 AM Larry McVoy <lm@mcvoy.com> wrote:
>
>> On Fri, Dec 13, 2024 at 09:52:28AM -0700, Marc Rochkind wrote:
>> > IEEE Transactions on Software Engineering has asked me to write a
>> > retrospective on the influence of SCCS over the last 50 years, as my SCCS
>> > paper was published in 1975. They consider it one of the most influential
>> > papers from TSE's first decade.
>> >
>> > There's a funny quote from Ken Thompson that circulates from time to time:
>> >
>> > "SCCS, the source motel! Programs check in and never check out!"
>> >
>> > But nobody seems to know what it means exactly. As part of my research, I
>> > asked Ken what the quote meant, since I wanted to include it. He explained
>> > that it refers to SCCS storing binary data in its repository file,
>> > preventing UNIX text tools from operating on the file.
>> >
>> > Of course, this is only one of SCCS's many weaknesses. If you have anything
>> > funny about any of the others, post it here. I already have all the boring
>> > usual stuff (e.g., long-term locks, file-oriented, no merging).
>>
>> Warning: I know more about SCCS than the average person.  I've
>> reimplemented it from scratch and then built BitKeeper on top of an
>> extended SCCS file format, so lots of info is coming.  Rick Smith and
>> Wayne Scott know as much as I do, and Rick knows more: he joined me and
>> promptly started fixing my SCCS implementation.  So far as I know,
>> there is no more knowledgeable person than Rick when it comes to
>> weave file formats.
>>
>> SCCS's strength is the weave format.  It's largely not understood, even
>> by other people working in source management.  Here's the benefit of
>> that weave (if people use it, which, other than BitKeeper, they don't;
>> looking at you, ClearCase, you had a weave and didn't use it): SCCS can
>> pass merge data by reference, while everyone else copies the data.
>>
>> SCCS is a set-based system.  Each node has a revision number, like 1.5,
>> but because SCCS, unlike RCS, limited revisions to at most 4 fields,
>> like 1.5.1.1, it is _impossible_ to build the history graph from the
>> revisions alone.  You can in simple graphs, but as soon as you branch
>> from a branch, all bets are off.
>>
>> The graph is built from what BitKeeper called serial numbers.  Each node
>> in the graph has at least 2 serials, one that names that node in the
>> graph, and one that is the parent.
>>
>> So if I have a file with 5 revisions in straight line history, the
>> internal stuff will look something like
>>
>> rev     me      parent
>> 1.5     5       4
>> 1.4     4       3
>> 1.3     3       2
>> 1.2     2       1
>> 1.1     1       0
>>
>> So what's the set?  Pretty simple for straight line history, you walk
>> the history from the rev that you want, adding the "me" serial and
>> going to the parent, repeat until the parent is 0.
>>
>> Suppose you branch from rev 1.3.
>>
>> rev     me      parent
>> 1.3.1.1 6       3
>> 1.5     5       4
>> 1.4     4       3
>> 1.3     3       2
>> ...
>>
>> See that 1.3.1.1 is me: 6 and parent: 3.  So if I were building the set
>> for 1.3.1.1, it becomes 6, then go to parent 3, 2, 1, skipping over 5
>> and 4.  If you understand that, you are starting to understand the set
>> and how it is constructed.
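A minimal Python sketch of that set construction, assuming the delta table is reduced to a dict from each serial to its parent serial (0 meaning no parent). `PARENT` and `build_set` are illustrative names, not anything from SCCS or BitKeeper.

```python
# Delta table from the branch example, reduced to serial -> parent serial.
# 1.1=1, 1.2=2, 1.3=3, 1.4=4, 1.5=5, 1.3.1.1=6; parent 0 means "no parent".
PARENT = {1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 3}


def build_set(parent: dict[int, int], serial: int) -> set[int]:
    """Collect the serials visible in `serial` by following parent links."""
    active: set[int] = set()
    while serial != 0:
        active.add(serial)
        serial = parent[serial]
    return active


print(sorted(build_set(PARENT, 6)))  # 1.3.1.1 skips serials 4 and 5
print(sorted(build_set(PARENT, 5)))  # 1.5 on the trunk
```

The walk only ever looks at serials, never at the dotted revision names, which is why branches of branches pose no problem.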
>>
>> Did you know you can have an argument in the revision history without
>> adding anything to the data part?  SCCS has the ability to include
>> and/or exclude serials as part of a delta.  Let's say Marc looked at
>> my 1.5 and thought it was garbage.  He can exclude it from the
>> set like so:
>>
>> rev     me      parent  include exclude
>> 1.6     7       5       0       5
>> 1.3.1.1 6       3
>> 1.5     5       4
>> 1.4     4       3
>> 1.3     3       2
>> ...
>>
>> That doesn't change the data part of the file AT ALL, it's just saying
>> Marc doesn't want anyone to see the 1.5 changes.
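One simplified way to model include/exclude in Python, assuming that every delta on the parent chain contributes its include and exclude lists and that an exclusion removes only that one serial, not its ancestors. Real SCCS has its own rules for resolving conflicting lists that this sketch does not attempt to reproduce; `Delta`, `TABLE`, and `build_set` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    parent: int                                     # 0 means no parent
    include: set[int] = field(default_factory=set)  # serials pulled in
    exclude: set[int] = field(default_factory=set)  # serials thrown out


# 1.6 (serial 7) sits on top of 1.5 (serial 5) but excludes serial 5.
TABLE = {
    1: Delta(0), 2: Delta(1), 3: Delta(2), 4: Delta(3), 5: Delta(4),
    6: Delta(3),               # 1.3.1.1
    7: Delta(5, exclude={5}),  # 1.6
}


def build_set(table: dict[int, Delta], serial: int) -> set[int]:
    """Parent chain, plus every include list on it, minus every exclude."""
    chain: set[int] = set()
    included: set[int] = set()
    excluded: set[int] = set()
    while serial != 0:
        delta = table[serial]
        chain.add(serial)
        included |= delta.include
        excluded |= delta.exclude
        serial = delta.parent
    return (chain | included) - excluded


print(sorted(build_set(TABLE, 7)))  # 1.6: serial 5 is gone, 4, 3, 2, 1 remain
```

Only the delta table changes here; the data part of the file is untouched, which is the point being made above.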
>>
>> To understand that, you need to know how SCCS checks out a file.  And
>> you need to know how the data is stored, which is in a weave.  RCS,
>> and pretty much everything that followed it, doesn't use a weave at
>> all.  RCS stores the most recent version of the file as a complete
>> copy of the checked-out file.  Each earlier revision on the trunk is
>> then stored as a reverse patch, the kind of output diff produces.
>>
>> Think about what that means for working on a branch.  You have to start
>> with the most recent version of the file, apply backward patches to go
>> to earlier versions all the way back to the branch point, then apply
>> forward patches to work your way to the tip of the branch.  Ask Dave Miller
>> how pleasant it is to work on gcc on a branch.  It's crazy slow and
>> painful.
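A back-of-the-envelope Python model of that cost, assuming the RCS layout just described: the newest trunk revision is the only full copy, trunk revisions are reverse patches, and branch revisions are forward patches from the branch point. The function name is illustrative.

```python
def rcs_patches_to_branch(trunk_tip: int, branch_point: int, branch_depth: int) -> int:
    """Patches needed to rebuild 1.<branch_point>.1.<branch_depth> when
    1.<trunk_tip> is stored whole and everything else is a patch."""
    backward = trunk_tip - branch_point  # reverse patches: 1.tip down to 1.branch_point
    forward = branch_depth               # forward patches: 1.bp.1.1 up to 1.bp.1.depth
    return backward + forward


# Branch 1.3.1.1 in a file whose trunk is at 1.5: two reverse, one forward.
print(rcs_patches_to_branch(5, 3, 1))
# A long-lived branch off an old trunk revision gets expensive fast.
print(rcs_patches_to_branch(2000, 100, 50))
```

The count grows with the distance from the trunk tip; an SCCS checkout, as described next, is a single pass over the weave whichever revision you ask for.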
>>
>> So how does SCCS do it?  Let's say the first version of a file is
>>
>> 1
>> 2
>> 3
>> 4
>> 5
>>
>> The data portion of the history file will look like:
>>
>> ^AI 1
>> 1
>> 2
>> 3
>> 4
>> 5
>> ^AE 1
>>
>> SCCS used ^A at the beginning of a line to mean "this is metadata for
>> SCCS".  ^AI is an insert, ^AD is a delete, and each insert/delete is
>> paired with a ^AE, which means end.  The number after it is the serial.
>> So that weave says: "If serial 1 is in your set, everything after ^AI 1
>> is part of that set until you hit the matching ^AE 1."
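A small Python sketch of how that first delta lands in the data portion, assuming ^A stands for the ASCII SOH byte (0x01). `initial_weave` is a hypothetical helper; a real s-file also carries a header and the delta table, which are not shown.

```python
SOH = "\x01"  # Control-A, the SCCS metadata marker


def initial_weave(lines: list[str], serial: int = 1) -> list[str]:
    """Wrap the first version's lines in a single insert block for `serial`."""
    return [f"{SOH}I {serial}", *lines, f"{SOH}E {serial}"]


for row in initial_weave(["1", "2", "3", "4", "5"]):
    print(row.replace(SOH, "^A"))  # display the marker the way the text does
```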
>>
>> Let's say the 2nd version is
>>
>> 1
>> 2
>> serial 2 added this
>> 3
>> 4
>>
>> Notice that serial 2 deleted what was line 5.  The weave is now:
>>
>> ^AI 1
>> 1
>> 2
>> ^AI 2
>> serial 2 added this
>> ^AE 2
>> 3
>> 4
>> ^AD 2
>> 5
>> ^AE 2
>> ^AE 1
>>
>> So now we can start to see how you walk the weave.  If I'm trying to
>> check out 1.1 aka serial 1, I build a set that has only '1' in the set.
>> I hit the ^AI 1 see that I have 1 in my set, so I'm now in print mode,
>> which means print each data line.  I hit ^AI 2, that's not in my set,
>> so I'm now in skip mode.  And I skip the stuff inserted by serial 2.
>> I see the ^AE 2 and I revert back to print mode.  I get to ^AD 2,
>> 2 is NOT in my set, so I stay in print mode.  Etc.
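A minimal Python sketch of that walk, assuming ^A is the ASCII SOH byte. Instead of a single print/skip flag it keeps a stack of open ^AI/^AD blocks and prints a data line only when every enclosing insert is in the set and no enclosing delete is, which copes with nested blocks like the ones above. `checkout` is a hypothetical name; this models the idea and is not the SCCS code.

```python
SOH = "\x01"  # ^A

# Weave after serial 2, as in the example above.
WEAVE = [
    f"{SOH}I 1", "1", "2",
    f"{SOH}I 2", "serial 2 added this", f"{SOH}E 2",
    "3", "4",
    f"{SOH}D 2", "5", f"{SOH}E 2",
    f"{SOH}E 1",
]


def checkout(weave: list[str], active: set[int]) -> list[str]:
    out: list[str] = []
    open_blocks: list[tuple[str, int]] = []  # (kind, serial), innermost last
    for line in weave:
        if line.startswith(SOH):
            kind, serial = line[1], int(line[2:])
            if kind in ("I", "D"):
                open_blocks.append((kind, serial))
            elif kind == "E":
                # Close the most recently opened block for this serial.
                for i in range(len(open_blocks) - 1, -1, -1):
                    if open_blocks[i][1] == serial:
                        del open_blocks[i]
                        break
            continue
        # Print mode only if all open inserts are in the set
        # and no open delete is in the set.
        visible = all(
            (s in active) if k == "I" else (s not in active)
            for k, s in open_blocks
        )
        if visible:
            out.append(line)
    return out


print(checkout(WEAVE, {1}))     # 1.1
print(checkout(WEAVE, {1, 2}))  # 1.2
```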
>>
>> Let's make a branch, 1.1.1.1, with lots of data.  Here is the branch's
>> content, followed by the resulting weave:
>>
>> 1
>> 2
>> 3
>> branch line 1
>> branch line 2
>> ...
>> branch line 10000
>> 4
>> 5
>>
>> ^AI 1
>> 1
>> 2
>> ^AI 2
>> serial 2 added this
>> ^AE 2
>> 3
>> ^AI 3
>> branch line 1
>> branch line 2
>> ...
>> branch line 10000
>> ^AE 3
>> 4
>> ^AD 2
>> 5
>> ^AE 2
>> ^AE 1
>>
>> So if I checked out 1.1.1.1, the set is 1 and 3.  I walk the weave and
>> print anything inserted by either of those, delete anything deleted
>> by those, skip anything inserted by anything not in the set, and ignore
>> any deletes by anything not in the set.
>>
>> The delta table looks like this, notice I've added an author column:
>>
>> rev     me      parent  include exclude author
>> 1.1.1.1 3       1                       lm
>> 1.2     2       1                       lm
>> 1.1     1       0                       lm
>>
>> If you followed all that, you can see how SCCS can merge by reference.
>> Let's say Clem decides to merge my branch onto the trunk. The delta table
>> will get a new entry:
>>
>> rev     me      parent  include exclude author
>> 1.3     4       2       3               clem
>> 1.1.1.1 3       1                       lm
>> 1.2     2       1                       lm
>> 1.1     1       0                       lm
>>
>> The weave DOES NOT CHANGE.  That's the pass by reference.  You do the
>> 3-way merge; it will find the lines "3" and "4" as anchor points in both
>> versions, so it is a simple insert with no new data added to the weave.
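The merge-by-reference claim can be checked with a small self-contained Python model. It assumes a simplified set rule (follow the parent chain, add every include list found along the way, drop anything excluded) and a stack-based weave walk that prints a line only when all enclosing inserts are in the set and no enclosing delete is; neither is taken from the SCCS source. The branch is cut to three lines and all names are hypothetical.

```python
SOH = "\x01"  # ^A, the SCCS metadata marker

# Weave after the branch delta (serial 3), branch cut to three lines.
WEAVE = [
    f"{SOH}I 1", "1", "2",
    f"{SOH}I 2", "serial 2 added this", f"{SOH}E 2",
    "3",
    f"{SOH}I 3", "branch line 1", "branch line 2", "branch line 3", f"{SOH}E 3",
    "4",
    f"{SOH}D 2", "5", f"{SOH}E 2",
    f"{SOH}E 1",
]

# serial -> (parent, include list, exclude list)
TABLE = {
    1: (0, set(), set()),   # 1.1     lm
    2: (1, set(), set()),   # 1.2     lm
    3: (1, set(), set()),   # 1.1.1.1 lm
    4: (2, {3}, set()),     # 1.3     clem: the merge is just this row
}


def build_set(serial: int) -> set[int]:
    """Parent chain, plus every include list on it, minus every exclude."""
    chain, inc, exc = set(), set(), set()
    while serial:
        parent, include, exclude = TABLE[serial]
        chain.add(serial)
        inc |= include
        exc |= exclude
        serial = parent
    return (chain | inc) - exc


def checkout(active: set[int]) -> list[str]:
    out: list[str] = []
    stack: list[tuple[str, int]] = []  # open (kind, serial) blocks
    for line in WEAVE:
        if line.startswith(SOH):
            kind, serial = line[1], int(line[2:])
            if kind == "E":  # close the latest block for this serial
                idx = max(i for i, (_, s) in enumerate(stack) if s == serial)
                del stack[idx]
            else:            # "I" or "D" opens a block
                stack.append((kind, serial))
        # Visible iff every open insert is in the set and no open delete is.
        elif all((s in active) == (k == "I") for k, s in stack):
            out.append(line)
    return out


print(checkout(build_set(3)))  # 1.1.1.1
print(checkout(build_set(4)))  # 1.3
```

Both checkouts come from the same `WEAVE` list; the merge exists only as the `4: (2, {3}, set())` row in the delta table.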
>>
>> Here's some magic that *everyone* else gets wrong when they pass by value:
>> In a system that passes by value (copies) the data, the merge done by clem
>> would have an annotated listing like so:
>>
>> lm      1
>> lm      2
>> lm      3
>> clem    branch line 1
>> clem    branch line 2
>> clem    ...
>> clem    branch line 10000
>> lm      4
>> lm      5
>>
>> Since it copied the data, it looks like Clem wrote it but he didn't, he
>> just automerged it.  In SCCS/BitKeeper it would look like:
>>
>> lm      1
>> lm      2
>> lm      3
>> lm      branch line 1
>> lm      branch line 2
>> lm      ...
>> lm      branch line 10000
>> lm      4
>> lm      5
>>
>> which is correct: all of those lines were authored by one person.  The
>> only time the merger should show up as an author is if there was a
>> conflict; whatever resolution the merger chose is new work and should
>> be attributed to the merger.
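Annotation falls out of the same weave walk: remember which insert block each visible line sits in and look up that delta's author. A hypothetical Python sketch, assuming the weave from the branch example (branch cut to three lines), the author column from the delta table, a set for 1.3 made of the parent chain 4, 2, 1 plus the included serial 3, and that a line belongs to the innermost open ^AI block.

```python
SOH = "\x01"  # ^A

WEAVE = [
    f"{SOH}I 1", "1", "2",
    f"{SOH}I 2", "serial 2 added this", f"{SOH}E 2",
    "3",
    f"{SOH}I 3", "branch line 1", "branch line 2", "branch line 3", f"{SOH}E 3",
    "4",
    f"{SOH}D 2", "5", f"{SOH}E 2",
    f"{SOH}E 1",
]
AUTHOR = {1: "lm", 2: "lm", 3: "lm", 4: "clem"}  # serial -> author
ACTIVE_1_3 = {1, 2, 3, 4}  # 1.3: parent chain 4, 2, 1 plus include 3


def annotate(weave: list[str], active: set[int]) -> list[tuple[str, str]]:
    out: list[tuple[str, str]] = []
    stack: list[tuple[str, int]] = []  # open (kind, serial) blocks
    for line in weave:
        if line.startswith(SOH):
            kind, serial = line[1], int(line[2:])
            if kind == "E":
                idx = max(i for i, (_, s) in enumerate(stack) if s == serial)
                del stack[idx]
            else:
                stack.append((kind, serial))
        elif all((s in active) == (k == "I") for k, s in stack):
            # The innermost open insert block names the line's author.
            inserter = next(s for k, s in reversed(stack) if k == "I")
            out.append((AUTHOR[inserter], line))
    return out


for who, text in annotate(WEAVE, ACTIVE_1_3):
    print(f"{who:<8}{text}")
```

Clem's serial 4 never shows up because it inserted no lines. The output also reflects the trunk's own 1.2 change (the added line, with line 5 deleted), which the annotated listings above leave out.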
>>
>> What BitKeeper did, and it was a profound step forward, was make the
>> idea of a repository a formal thing and introduce the concept of
>> changesets that keep track of all this stuff at the repository level.
>> It still does all this stuff at the file level, but you don't have to
>> do that low-level work yourself.  You could think of SCCS as assembly
>> and BitKeeper as more like C: it upleveled things to the point that
>> humans can manage the repository easily.
>>
>> Whew.  That's a butt load of info.  Perhaps better for COFF?  Any
>> questions?  It should be obvious that I *love* SCCS.  It's a dramatically
>> better file format than a patch-based one: you can get *any* version of
>> the file in one pass over the weave, no matter how old, authorship can
>> be preserved across versions, and it's pretty brilliant.  I consider
>> myself blessed to be posting this in response to SCCS's creator.  Hats
>> off to Marc.  And big boo, hiss, to the RCS guy, who got a PhD for RCS
>> (give me a break) and did the world a huge disservice by bad-mouthing
>> SCCS so he could promote RCS.
>>
>> --lm
>>
>
>
> --
> *My new email address is mrochkind@gmail.com <mrochkind@gmail.com>*
>


-- 
*My new email address is mrochkind@gmail.com <mrochkind@gmail.com>*


