Larry,

I found this:

https://www.bitkeeper.org/src-notes/SCCSWEAVE.html

A good reference?

Marc

On Fri, Dec 13, 2024 at 11:32 AM Marc Rochkind wrote:

> Larry, thanks for this. I had read some things you've written about the
> weave before, but not with this level of detail. Sounds weird, but I didn't
> really appreciate the implications of the weave even though I'm the guy who
> thought it up. I did understand the importance of not copying data if you
> can reference it, which is a principle of database design (normal forms,
> etc).
>
> In my paper, I can add a little more about the weave and its advantages.
> Aside from this TUHS post, is there something I can put in the References
> that people can find?
>
> Question: Is this right, that TeamWare was literally layered on top of
> AT&T SCCS, but BitKeeper was layered on your implementation of SCCS? Or,
> was it more complicated than that?
>
> Was your implementation of SCCS ever released by itself?
>
> Marc
>
> On Fri, Dec 13, 2024 at 11:06 AM Larry McVoy wrote:
>
>> On Fri, Dec 13, 2024 at 09:52:28AM -0700, Marc Rochkind wrote:
>> > IEEE Transactions on Software Engineering has asked me to write a
>> > retrospective on the influence of SCCS over the last 50 years, as my SCCS
>> > paper was published in 1975. They consider it one of the most influential
>> > papers from TSE's first decade.
>> >
>> > There's a funny quote from Ken Thompson that circulates from time-to-time:
>> >
>> > "SCCS, the source motel! Programs check in and never check out!"
>> >
>> > But nobody seems to know what it means exactly. As part of my research, I
>> > asked Ken what the quote meant, since I wanted to include it. He explained
>> > that it refers to SCCS storing binary data in its repository file,
>> > preventing UNIX text tools from operating on the file.
>> >
>> > Of course, this is only one of SCCS's many weaknesses. If you have anything
>> > funny about any of the others, post it here.
>> > I already have all the boring
>> > usual stuff (e.g., long-term locks, file-oriented, no merging).
>>
>> Warning, I know more about SCCS than the average person, I've
>> reimplemented it from scratch and then built BitKeeper on top of an
>> extended SCCS file format. So lots of info coming. Rick Smith and
>> Wayne Scott know as much as I do, Rick knows more, he joined me and
>> promptly started fixing my SCCS implementation. So far as I know,
>> there is not a more knowledgeable person than Rick when it comes to
>> weave file formats.
>>
>> SCCS's strength is the weave format. It's largely not understood, even
>> by other people working in source management. Here's the benefit of
>> that weave (if people use it, which, other than BitKeeper, they don't,
>> looking at you, ClearCase, you had a weave and didn't use it): SCCS can
>> pass merge data by reference, everyone else copies the data.
>>
>> SCCS is a set based system. Each node has a revision number, like 1.5,
>> but because SCCS, unlike RCS, limited the revisions to at most 4 fields,
>> like 1.5.1.1, it is _impossible_ to build the history graph from the
>> revisions, you can in simple graphs but as soon as you branch from a
>> branch, all bets are off.
>>
>> The graph is built from what BitKeeper called serial numbers. Each node
>> in the graph has at least 2 serials, one that names that node in the
>> graph, and one that is the parent.
>>
>> So if I have a file with 5 revisions in straight line history, the
>> internal stuff will look something like
>>
>>     rev       me   parent
>>     1.5       5    4
>>     1.4       4    3
>>     1.3       3    2
>>     1.2       2    1
>>     1.1       1    0
>>
>> So what's the set? Pretty simple for straight line history, you walk
>> the history from the rev that you want, adding the "me" serial and
>> going to the parent, repeat until the parent is 0.
>>
>> Suppose you branch from rev 1.3.
>>
>>     rev       me   parent
>>     1.3.1.1   6    3
>>     1.5       5    4
>>     1.4       4    3
>>     1.3       3    2
>>     ...
>>
>> See that 1.3.1.1 is me: 6 and parent: 3.
>> So if I were building the set
>> for 1.3.1.1, it becomes 6, then go to parent 3, 2, 1, skipping over 5
>> and 4. If you understand that, you are starting to understand the set
>> and how it is constructed.
>>
>> Did you know you can have an argument in the revision history without
>> adding anything to the data part? SCCS has the ability to include
>> and/or exclude serials as part of a delta. Let's say Marc looked at
>> my 1.5 and thought it was garbage. He can exclude it from the
>> set like so:
>>
>>     rev       me   parent   include   exclude
>>     1.6       7    5        0         5
>>     1.3.1.1   6    3
>>     1.5       5    4
>>     1.4       4    3
>>     1.3       3    2
>>     ...
>>
>> That doesn't change the data part of the file AT ALL, it's just saying
>> Marc doesn't want anyone to see the 1.5 changes.
>>
>> To understand that, you need to know how SCCS checks out a file. And
>> you need to know how the data is stored. Which is in a weave. RCS,
>> and pretty much everything that followed it, doesn't use a weave at
>> all. RCS stores the most recent version of the file as a complete
>> copy of the checked out file. Then each delta working backwards up
>> the trunk is a patch, what diff produces.
>>
>> Think about what that means for working on a branch. You have to start
>> with the most recent version of the file, apply backward patches to go
>> to earlier versions all the way back to the branch point, then apply
>> forward patches to work your way to tip of the branch. Ask Dave Miller
>> how pleasant it is to work on gcc on a branch. It's crazy slow and
>> painful.
>>
>> So how does SCCS do it? Let's say the first version of a file is
>>
>>     1
>>     2
>>     3
>>     4
>>     5
>>
>> The data portion of the history file will look like:
>>
>>     ^AI 1
>>     1
>>     2
>>     3
>>     4
>>     5
>>     ^AE 1
>>
>> SCCS used ^A at the beginning of a line to mean "this is metadata for
>> SCCS". ^AI is an insert, ^AD is a delete, and insert/delete are paired
>> with a ^AE which means end. The number after is the serial.
>> So that
>> weave says "If serial 1 is in your set, everything after ^AI 1 is part
>> of that set until you hit the matching ^AE 1."
>>
>> Let's say the 2nd version is
>>
>>     1
>>     2
>>     serial 2 added this
>>     3
>>     4
>>
>> Notice that serial 2 deleted what was line 5.
>>
>>     ^AI 1
>>     1
>>     2
>>     ^AI 2
>>     serial 2 added this
>>     ^AE 2
>>     3
>>     4
>>     ^AD 2
>>     5
>>     ^AE 2
>>     ^AE 1
>>
>> So now we can start to see how you walk the weave. If I'm trying to
>> check out 1.1 aka serial 1, I build a set that has only '1' in the set.
>> I hit the ^AI 1 see that I have 1 in my set, so I'm now in print mode,
>> which means print each data line. I hit ^AI 2, that's not in my set,
>> so I'm now in skip mode. And I skip the stuff inserted by serial 2.
>> I see the ^AE 2 and I revert back to print mode. I get to ^AD 2,
>> 2 is NOT in my set, so I stay in print mode. Etc.
>>
>> Let's make a branch, 1.1.1.1, with lots of data.
>>
>>     1
>>     2
>>     3
>>     branch line 1
>>     branch line 2
>>     ...
>>     branch line 10000
>>     4
>>     5
>>
>>     ^AI 1
>>     1
>>     2
>>     ^AI 2
>>     serial 2 added this
>>     ^AE 2
>>     3
>>     ^AI 3
>>     branch line 1
>>     branch line 2
>>     ...
>>     branch line 10000
>>     ^AE 3
>>     4
>>     ^AD 2
>>     5
>>     ^AE 2
>>     ^AE 1
>>
>> So if I checked out 1.1.1.1, the set is 1, 3, I walk the weave and I'll
>> print anything inserted by either of those, delete anything deleted
>> by those, skip anything inserted by anything not in the set, skip any
>> deletes by anything not in the set.
>>
>> The delta table looks like this, notice I've added an author column:
>>
>>     rev       me   parent   include   exclude   author
>>     1.1.1.1   3    1                            lm
>>     1.2       2    1                            lm
>>     1.1       1    0                            lm
>>
>> If you followed all that, you can see how SCCS can merge by reference.
>> Let's say Clem decides to merge my branch onto the trunk. The delta table
>> will get a new entry:
>>
>>     rev       me   parent   include   exclude   author
>>     1.3       4    2        3                   clem
>>     1.1.1.1   3    1                            lm
>>     1.2       2    1                            lm
>>     1.1       1    0                            lm
>>
>> The weave DOES NOT CHANGE. That's the pass by reference.
>> You do the 3 way
>> merge, it will find the lines "3" and "4" as anchor points in both
>> versions, so it is a simple insert with no new data added to the weave.
>>
>> Here's some magic that *everyone* else gets wrong when they pass by value:
>> In a system that passes by value (copies) the data, the merge done by clem
>> would have an annotated listing like so:
>>
>>     lm      1
>>     lm      2
>>     lm      serial 2 added this
>>     lm      3
>>     clem    branch line 1
>>     clem    branch line 2
>>     clem    ...
>>     clem    branch line 10000
>>     lm      4
>>
>> Since it copied the data, it looks like Clem wrote it but he didn't, he
>> just automerged it. In SCCS/BitKeeper it would look like:
>>
>>     lm      1
>>     lm      2
>>     lm      serial 2 added this
>>     lm      3
>>     lm      branch line 1
>>     lm      branch line 2
>>     lm      ...
>>     lm      branch line 10000
>>     lm      4
>>
>> which is correct, all of those lines were authored by one person. The only
>> time the merger should show up as an author is if there was a conflict,
>> however the merger resolved that conflict is new work and should be
>> authored by the merger.
>>
>> What BitKeeper did, that was a profound step forward, was make the idea
>> of a repository a formal thing and introduced the concept of changesets
>> that keeps track of all this stuff at the repository level. So it does
>> all this stuff at the file level but you don't have to do that low level
>> work. You could think of SCCS as assembly and BitKeeper as more like C,
>> it upleveled things to the point that humans can manage the repository
>> easily.
>>
>> Whew. That's a butt load of info. Perhaps better for COFF? Any
>> questions? It should be obvious that I *love* SCCS, it's a dramatically
>> better file format than a patch based one, you can get *any* version of
>> the file in constant time, authorship can be preserved across versions,
>> it's pretty brilliant and I consider myself blessed to be posting this
>> in response to SCCS's creator. Hats off to Marc.
>> And big boo, hiss,
>> to the RCS guy, who got a PhD for RCS (give me a break) and did the
>> world a huge disservice by bad mouthing SCCS so he could promote RCS.
>>
>> --lm
>>
>
>
> --
> *My new email address is mrochkind@gmail.com *
>

--
*My new email address is mrochkind@gmail.com *
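
For anyone who wants to poke at the set construction and the weave walk
described in Larry's message, here is a minimal Python sketch. It is not
SCCS or BitKeeper code. The names Delta, TABLE, WEAVE, build_set, walk,
checkout and annotate are made up for illustration, "^A" is spelled as two
literal characters the way the message shows it, and the 10000 branch lines
are cut down to two. The include/exclude handling assumes that the decision
nearest the tip wins, and the visibility rule is a simplification that is
enough for the examples in this thread; the real tools may differ in detail.

```python
from collections import namedtuple

# One delta-table row; field names follow the tables in the message.
Delta = namedtuple("Delta", "rev parent include exclude author")

# Delta table after clem's merge (serial -> row).
TABLE = {
    1: Delta("1.1", 0, (), (), "lm"),
    2: Delta("1.2", 1, (), (), "lm"),
    3: Delta("1.1.1.1", 1, (), (), "lm"),
    4: Delta("1.3", 2, (3,), (), "clem"),  # merge: parent 2, include 3
}

# The weave from the message, with the branch shortened to two lines.
WEAVE = """\
^AI 1
1
2
^AI 2
serial 2 added this
^AE 2
3
^AI 3
branch line 1
branch line 2
^AE 3
4
^AD 2
5
^AE 2
^AE 1""".splitlines()


def build_set(table, tip):
    """Walk parent links from tip down to 0, honoring include/exclude.

    Assumption: the first decision made while walking from the tip wins.
    """
    decided = {}  # serial -> True (in the set) or False (excluded)
    serial = tip
    while serial != 0:
        delta = table[serial]
        if decided.setdefault(serial, True):
            for s in delta.include:
                decided.setdefault(s, True)
            for s in delta.exclude:
                decided.setdefault(s, False)
        serial = delta.parent
    return {s for s, keep in decided.items() if keep}


def walk(weave, serials):
    """Yield (inserting serial, text) for every line visible in the set."""
    open_blocks = []  # (kind, serial) for each ^AI / ^AD not yet closed
    for line in weave:
        if line.startswith("^A"):
            kind, serial = line[2], int(line[3:])
            if kind == "E":
                # ^AE closes the most recently opened block of that serial.
                for i in reversed(range(len(open_blocks))):
                    if open_blocks[i][1] == serial:
                        del open_blocks[i]
                        break
            else:
                open_blocks.append((kind, serial))
            continue
        # Simplified rule: print a data line when every enclosing insert is
        # in the set and no enclosing delete is in the set.
        inserted = all(s in serials for k, s in open_blocks if k == "I")
        deleted = any(s in serials for k, s in open_blocks if k == "D")
        if inserted and not deleted:
            # The innermost insert is the delta that authored the line
            # (a well-formed weave always has one around a data line).
            owner = [s for k, s in open_blocks if k == "I"][-1]
            yield owner, line


def checkout(weave, table, serial):
    """Return the text of the version named by serial."""
    return [text for _, text in walk(weave, build_set(table, serial))]


def annotate(weave, table, serial):
    """Return (author, text) pairs for the version named by serial."""
    return [(table[s].author, text)
            for s, text in walk(weave, build_set(table, serial))]
```

With these definitions, annotate(WEAVE, TABLE, 4) credits every line of the
merged 1.3 to lm, including the branch lines, because the merge delta only
added an include to the delta table and nothing to the weave.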