The Unix Heritage Society mailing list
From: Larry McVoy <lm@mcvoy.com>
To: Dave Horsfall <dave@horsfall.org>
Cc: The Eunuchs Hysterical Society <tuhs@tuhs.org>, rick@rbsmith.com
Subject: [TUHS] SCCS
Date: Wed, 11 Sep 2019 20:43:46 -0700
Message-ID: <20190912034346.GJ2046@mcvoy.com>
In-Reply-To: <alpine.BSF.2.21.9999.1909120846160.18105@aneurin.horsfall.org>

On Thu, Sep 12, 2019 at 08:49:35AM +1000, Dave Horsfall wrote:
> On Wed, 11 Sep 2019, Larry McVoy wrote:
> 
> >SCCS is awesome, I'll post why in a different thread.  It is light years
> >better than RCS, Tichy lied.
> 
> Agreed; I used it for years, then was forced to use RCS in my next job :-(

Marc Rochkind is on the list, I believe I invited him, but he can spell
check what I'm about to say.

SCCS is awesome.  People have no idea how good it is as a version control
system because Walter Tichy got his PhD for writing RCS and claiming it 
was better (don't get me started on how wrong that was).

Let me start with how RCS works.  You all know about diff and patch:
someone does a diff, which produces a patch, and Larry Wall wrote patch(1)
that takes the patch and the file and applies it.  In a straight line
history here is how RCS works.  The most recent version of the file
is stored whole, the delta behind that is stored as a reverse patch that
gets you from the most recent version back to it, same for the next delta
behind that and so on.
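
To make the straight line case concrete, here is a minimal sketch of a
reverse-delta store.  It is not RCS's file format or code; the names
(make_delta, apply_delta, ReverseDeltaStore) are made up for illustration,
and the patches are simple edit scripts built with Python's difflib rather
than real diff(1) output.  The point is only that getting to an old version
costs one patch application per step back from the head.

```python
import difflib

def make_delta(src, dst):
    """Edit script that turns src into dst: a list of (start, end, new_lines)."""
    matcher = difflib.SequenceMatcher(a=src, b=dst, autojunk=False)
    return [(i1, i2, dst[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def apply_delta(src, delta):
    """Apply an edit script, back to front so earlier indices stay valid."""
    out = list(src)
    for start, end, new_lines in reversed(delta):
        out[start:end] = new_lines
    return out

class ReverseDeltaStore:
    """Toy trunk-only store: newest version whole, older ones as reverse deltas."""

    def __init__(self, first):
        self.head = list(first)
        self.reverse = []   # reverse[0] turns the head into the version before it

    def checkin(self, new):
        # Keep the new text whole and remember how to get back to the old head.
        self.reverse.insert(0, make_delta(new, self.head))
        self.head = list(new)

    def checkout(self, steps_back=0):
        lines = self.head
        for delta in self.reverse[:steps_back]:   # one patch per step back
            lines = apply_delta(lines, delta)
        return list(lines)

store = ReverseDeltaStore(["I", "love", "TUHS"])
store.checkin(["I", "really", "love", "TUHS"])
store.checkin(["I", "*really*", "love", "TUHS"])
oldest = store.checkout(steps_back=2)   # applies two reverse patches
```

The head comes out for free and every older version costs more the further
back it is.  A branch would add forward patches on top of that walk back.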

Branches in RCS are forward patches.  Sort of.  You start with the
whole file that is the most recent version, reverse patch your way
back to the branch point, and then forward patch your way down to 
the most recent version on the branch.  

Yeah, branches in RCS suck, very expensive.  

So why is SCCS awesome?  It is an entirely different approach.
SCCS is a set based system: for any version, there is a set
of deltas that are in that version, and you apply that set to the
file part of the data.

Huh?  What does that mean?  OK, you've all seen SCCS revisions, 1.1,
1.2, 1.3, 1.3.1.1, etc.  Yeah, that's for humans (and truth be told those
revs are not that great).  For SCCS internally there are serial numbers.
All those are is a monotonically increasing number, doesn't matter
if you are on the trunk or on a branch, each time you add a delta the
internal number, or serial, is the last serial plus 1.

When you go to check out a version of the file, that's the set.
It's the set of serials that make up that file.  If you wanted
1.3 and there are no branches, the set is the serial of 1.3 (3)
and the parent of 1.3 which is 1.2 (2) and 1.1 (1).  So your set
is 1,2,3.
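
A minimal sketch of building that set, assuming each delta records the
serial of its parent.  The parent_of table and the serial_set name are
made up for illustration (this is not how an s.file lays out its delta
table), and serial 4 is a hypothetical branch delta off 1.2.

```python
def serial_set(parent_of, serial):
    """Collect a serial and all of its ancestors by following parent links."""
    serials = set()
    while serial is not None:
        serials.add(serial)
        serial = parent_of[serial]
    return serials

# serial -> parent serial.  Serials 1, 2, 3 are revs 1.1, 1.2, 1.3;
# serial 4 is a hypothetical branch delta (1.2.1.1) whose parent is 1.2.
parent_of = {1: None, 2: 1, 3: 2, 4: 2}

trunk_set = serial_set(parent_of, 3)    # the set for 1.3
branch_set = serial_set(parent_of, 4)   # the set for the branch delta
```

Note that the branch delta gets the next serial (4) even though its
revision number looks nothing like 1.4, and its set skips serial 3.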

Here is the awesome part.  The way the data is stored in SCCS
is nothing like patches, it's what we call a weave.  All versions
of the file are woven together in the following way.  There are
3 operators:

insert: ^AI <serial>
delete: ^AD <serial>
end: ^AE <serial>

So if you checked in a file that looked like

I
love
TUHS

The weave would be

^AI 1
I
love
TUHS
^AE 1

Let's say that Clem changed that to

I
really
love
TUHS

The new weave would look like:

^AI 1
I
^AI 2
really
^AE 2
love
TUHS
^AE 1

and if I changed it to

I
*really*
love
TUHS

the weave looks like

^AI 1
I
^AD 3
^AI 2
really
^AE 2
^AE 3
^AI 3
*really*
^AE 3
love
TUHS
^AE 1

So a checkout is 2 things: build up the set of serials that need to be
active for this checkout, and walk the weave.  For each serial you see,
either it is in your set and you do what it says, or it is not in your
set and you do the opposite of what it says.
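
The walk can be sketched in a few lines.  This is a simplified
illustration, not the SCCS source: the checkout name is made up, the
control character is written as the two characters "^A" the way it is
shown above (not a real control-A byte), and the visibility rule is
an assumption that is good enough for the example: a line is kept when
the innermost open insert is in the set and no open delete is in the set.

```python
WEAVE = [
    "^AI 1", "I",
    "^AD 3", "^AI 2", "really", "^AE 2", "^AE 3",
    "^AI 3", "*really*", "^AE 3",
    "love", "TUHS",
    "^AE 1",
]

def checkout(weave, serials):
    """Walk the weave once and return the lines visible for this serial set."""
    out = []
    open_blocks = []                      # (op, serial) for blocks still open
    for line in weave:
        if line.startswith("^A"):
            op, serial = line[2], int(line[3:])
            if op == "E":
                # Close the most recently opened block with this serial.
                for i in range(len(open_blocks) - 1, -1, -1):
                    if open_blocks[i][1] == serial:
                        del open_blocks[i]
                        break
            else:
                open_blocks.append((op, serial))
            continue
        inserts = [s for op, s in open_blocks if op == "I"]
        deleted = any(op == "D" and s in serials for op, s in open_blocks)
        # Insert in the set: keep the line.  Insert not in the set: skip it.
        # Delete in the set: skip the line.  Delete not in the set: ignore it.
        if inserts and inserts[-1] in serials and not deleted:
            out.append(line)
    return out

v1 = checkout(WEAVE, {1})          # I / love / TUHS
v2 = checkout(WEAVE, {1, 2})       # I / really / love / TUHS
v3 = checkout(WEAVE, {1, 2, 3})    # I / *really* / love / TUHS
```

Every version costs the same single pass over the file; the only thing
that changes between checkouts is the set.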

So that is really really fast compared to RCS.  RCS reads the whole
file and then has to do the compute of applying patches, SCCS reads the
whole file and does a very tiny fraction of that compute, so tiny that
you can't measure it compared to reading the file.

But wait, it gets better, much better.  Let's talk about branches
and merges.  RCS is copy by value across a merge, SCCS is copy by
reference.  Marc thought about the set stuff enough to realize:
wouldn't it be cool if you could manipulate the set?  So he added
include and exclude operators.

Imagine if you and I were having an argument about some video being
checked in.  You checked it in, I deleted it, you checked it in, I deleted
it.  Suppose that was a 1GB video.  In RCS, each time we argued that is
another GB, we did that 4 times, so there are 4GB of diffs in the history.

In SCCS, you could do that with includes and excludes, those 4 times
would be about 30 bytes because all they are doing is saying "In the
set", "Not in the set".

Cool I guess but what is the real life win?  Merges.  In a weave based
system like SCCS you can add 1GB on a branch and I can merge that onto
the trunk and all I did was say "include your serials".  I didn't copy
your work, I referenced it.  Tiny meta data to do so.
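
Here is a rough sketch of what manipulating the set looks like, covering
both the video argument and the merge.  The dictionary layout and the
serial_set name are assumptions made for this example, not the SCCS delta
table format, and the ordering rule (newer deltas override older ones) is
a simplification.

```python
def serial_set(deltas, tip):
    """Ancestors of tip, adjusted by each delta's include and exclude lists."""
    chain = []
    serial = tip
    while serial is not None:
        chain.append(serial)
        serial = deltas[serial]["parent"]
    active = set()
    for serial in reversed(chain):        # oldest first, so newer deltas win
        active.add(serial)
        active.update(deltas[serial].get("include", ()))
        active.difference_update(deltas[serial].get("exclude", ()))
    return active

# The video argument: serial 2 holds the 1GB of data exactly once.
# Every later round is a delta that carries only a serial number.
video = {
    1: {"parent": None},
    2: {"parent": 1},                     # you check the video in
    3: {"parent": 2, "exclude": [2]},     # I delete it
    4: {"parent": 3, "include": [2]},     # you put it back
    5: {"parent": 4, "exclude": [2]},     # I delete it again
}

# A merge: serial 3 is your work on a branch, serial 4 is my merge on the
# trunk.  The merge delta adds nothing to the weave, it only includes 3.
merge = {
    1: {"parent": None},
    2: {"parent": 1},                     # trunk
    3: {"parent": 1},                     # branch
    4: {"parent": 2, "include": [3]},     # merge of the branch into the trunk
}

after_round_3 = serial_set(video, 4)      # serial 2 is back in the set
after_round_4 = serial_set(video, 5)      # serial 2 is out again
merged = serial_set(merge, 4)             # trunk plus the branch serial
```

In both cases the big data is in the weave once, under its original
serial, and everything after that just flips whether that serial is in
the set.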

That has more implications than you might think.  Think annotations.
git blame, know that?  It shows who did what?  Yeah, well git is 
copy by value just like RCS.  It's not just about the space used,
it is also about who did what.  If bwk did one thing and dmr did 
another thing and little old lm merged dmr's stuff into the bwk 
trunk, in a copy by value system, all of dmr's work will look like
I wrote it in bwk's trunk.

SCCS avoids that.  If I merged dmr's work into bwk's tree, if it 
all automerged, annotations would show it all as dmr's work, yeah,
I did the merge but I didn't do anything so I don't show up in
annotations.  If there was a conflict and I had to resolve that
conflict, then that, and that alone, would show up as my work.
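
A small sketch of why the annotations come out that way.  Everything
here is hypothetical: the weave, the serial to author table, and the
annotate function are made up for illustration, with "^A" written as
two plain characters.  Serial 1 and 3 are bwk on the trunk, serial 2 is
dmr on a branch, and serial 4 is my merge, which automerged and so put
nothing in the weave.

```python
WEAVE = [
    "^AI 1", "a",
    "^AI 2", "x", "^AE 2",      # dmr's line, added on the branch
    "b",
    "^AI 3", "c", "^AE 3",      # bwk's later trunk line
    "^AE 1",
]
AUTHOR_OF = {1: "bwk", 2: "dmr", 3: "bwk", 4: "lm"}

def annotate(weave, serials, author_of):
    """Return (author, line) pairs for the lines visible in this serial set."""
    out = []
    open_blocks = []                      # (op, serial) for blocks still open
    for line in weave:
        if line.startswith("^A"):
            op, serial = line[2], int(line[3:])
            if op == "E":
                for i in range(len(open_blocks) - 1, -1, -1):
                    if open_blocks[i][1] == serial:
                        del open_blocks[i]
                        break
            else:
                open_blocks.append((op, serial))
            continue
        inserts = [s for op, s in open_blocks if op == "I"]
        deleted = any(op == "D" and s in serials for op, s in open_blocks)
        if inserts and inserts[-1] in serials and not deleted:
            # The line belongs to whoever made the insert that carries it.
            out.append((author_of[inserts[-1]], line))
    return out

# The merge delta (serial 4) is in the set, but it owns no lines.
blame = annotate(WEAVE, {1, 2, 3, 4}, AUTHOR_OF)
```

The line "x" is still credited to dmr after the merge, and lm does not
show up at all, because the merge referenced serial 2 instead of copying
its lines into a new delta.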

For Marc, I worked with Rick Smith, he found me after I had done a
reimplementation of SCCS.  He has forgotten more about weaves than I will
ever know.  But he was really impressed with my code (which you can
go see at bitkeeper.org, and it is not my code, it is my team's code,
Rick bugfixed all my mistakes) because I embraced the set thing and the
way I wrote the code you could have N of my data structures and pull
out N versions of the file in one pass.  He had not seen that before,
to me it just seemed the most natural way to do it.
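
The idea of N versions in one pass can be sketched like this.  This is
not the BitKeeper code and does not use its data structures; the
checkout_many name and the plain Python sets standing in for the per
version state are assumptions for illustration, and "^A" is written as
two plain characters.

```python
WEAVE = [
    "^AI 1", "I",
    "^AD 3", "^AI 2", "really", "^AE 2", "^AE 3",
    "^AI 3", "*really*", "^AE 3",
    "love", "TUHS",
    "^AE 1",
]

def checkout_many(weave, serial_sets):
    """Walk the weave once and build one output per serial set."""
    outputs = [[] for _ in serial_sets]
    open_blocks = []                      # shared: the weave is read only once
    for line in weave:
        if line.startswith("^A"):
            op, serial = line[2], int(line[3:])
            if op == "E":
                for i in range(len(open_blocks) - 1, -1, -1):
                    if open_blocks[i][1] == serial:
                        del open_blocks[i]
                        break
            else:
                open_blocks.append((op, serial))
            continue
        inserts = [s for op, s in open_blocks if op == "I"]
        innermost = inserts[-1] if inserts else None
        open_deletes = [s for op, s in open_blocks if op == "D"]
        # Same data line, N independent decisions, one per requested version.
        for out, serials in zip(outputs, serial_sets):
            if innermost in serials and not any(d in serials for d in open_deletes):
                out.append(line)
    return outputs

v1, v2, v3 = checkout_many(WEAVE, [{1}, {1, 2}, {1, 2, 3}])
```

The file is read once and each data line is tested against each set, so
the extra cost per additional version is a set lookup per line, not
another pass over the file.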

SCCS is awesome, Marc should be held up as a hero for that.  Most people
have no idea how much better it is as a format, to this day people still
do it wrong.  The hg people at facebook sort of got it, they did an
import of SCCS (it was BitKeeper which is SCCS on super steroids).
But it is rare that someone gets it.  I wanted Marc to know we got it.

--lm

