The Unix Heritage Society mailing list
* [TUHS] The evolution of Unix facilities and architecture
@ 2017-05-13  0:44 Noel Chiappa
  2017-05-13  0:51 ` Random832
  0 siblings, 1 reply; 77+ messages in thread
From: Noel Chiappa @ 2017-05-13  0:44 UTC (permalink / raw)


    > From: Dave Horsfall

    > Err, isn't that the sticky bit, not the setuid bit?

Oh, right you are. I just looked in the code for ptrace(), and assumed that
was it.

The fix is _actually_ in sys1$exec() (in V6) and sys1$getxfile() (in PWB1 and
the MIT system):

	/*
	 * set SUID/SGID protections, if no tracing
	 */

	if ((u.u_procp->p_flag&STRC)==0) {
		if(ip->i_mode&ISUID)
			if(u.u_uid != 0) {
				u.u_uid = ip->i_uid;
				u.u_procp->p_uid = ip->i_uid;
				}

The thing is, this code is identical in V6, PWB1, and the MIT system!?

So now I'm wondering - was this really the bug? Or was there some
bug in ptrace I don't see, which was the actual bug that's being
discussed here.

Because it sure looks like this would prevent the exploitation that I
described (start an SUID program under the debugger, then patch the code).
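
As a rough illustration only (this is not the kernel source), the check quoted
above can be modeled in user-space C. The struct names, the function name
model_exec_setuid(), and the flag values are assumptions made for this sketch:

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel structures; values are illustrative. */
#define STRC  020    /* process is being traced */
#define ISUID 04000  /* set-user-ID bit in the inode mode */

struct model_inode { int i_mode; int i_uid; };
struct model_proc  { int p_flag; int p_uid; };
struct model_user  { int u_uid; struct model_proc *u_procp; };

/* Mirrors the quoted check: honour ISUID only when the process is not traced. */
void model_exec_setuid(struct model_user *u, const struct model_inode *ip)
{
	assert(u != 0 && u->u_procp != 0 && ip != 0);
	if ((u->u_procp->p_flag & STRC) == 0) {
		if (ip->i_mode & ISUID)
			if (u->u_uid != 0) {	/* root stays root */
				u->u_uid = ip->i_uid;
				u->u_procp->p_uid = ip->i_uid;
			}
	}
}
```

In this model, an untraced process that execs an SUID file takes on the file
owner's UID, while a traced one keeps its own UID.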

Or perhaps somehow this fix was broken by some other feature, and that
introduced the exploit?

	  Noel


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-13  0:44 [TUHS] The evolution of Unix facilities and architecture Noel Chiappa
@ 2017-05-13  0:51 ` Random832
  2017-05-13  0:55   ` Dave Horsfall
                     ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Random832 @ 2017-05-13  0:51 UTC (permalink / raw)


On Fri, May 12, 2017, at 20:44, Noel Chiappa wrote:
> So now I'm wondering - was this really the bug? Or was there some
> bug in ptrace I don't see, which was the actual bug that's being
> discussed here.

Ah. There's the other piece. You start the SUID program under the
debugger, and rather than kicking off the debugger, it simply starts it
non-suid. *However*, in the presence of shared text (either of the two
cases being checked for in the other place), you can make changes to the
text image (e.g. put whatever code you want at the entry point), which
will be reused the *next* time it is started *without* the debugger.
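
A toy model of the scenario described above, assuming a kernel that lets a
debugger write into a shared text image. The names model_text,
model_exec_text() and model_ptrace_write_text() are hypothetical and exist only
for this sketch:

```c
#include <assert.h>
#include <string.h>

#define TEXT_SIZE 8

/* Hypothetical model: one in-core text image per program, shared by all execs. */
struct model_text {
	char image[TEXT_SIZE];	/* in-core copy shared by every process */
	int  loaded;		/* nonzero once the image has been read in */
};

/* The first exec reads the file; later execs reuse the in-core image as-is. */
const char *model_exec_text(struct model_text *xp, const char *file_contents)
{
	assert(xp != 0 && file_contents != 0);
	if (!xp->loaded) {
		strncpy(xp->image, file_contents, TEXT_SIZE - 1);
		xp->image[TEXT_SIZE - 1] = '\0';
		xp->loaded = 1;
	}
	return xp->image;
}

/* A debugger write into the text lands in the shared image. */
void model_ptrace_write_text(struct model_text *xp, int off, char byte)
{
	assert(xp != 0 && off >= 0 && off < TEXT_SIZE - 1);
	xp->image[off] = byte;
}
```

The point of the sketch is that the patch outlives the traced process: the next
model_exec_text() call returns the modified image, even though the file on disk
is unchanged.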


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-13  0:51 ` Random832
@ 2017-05-13  0:55   ` Dave Horsfall
  2017-05-13  1:17   ` Chris Torek
  2017-05-13 15:25   ` Steve Simon
  2 siblings, 0 replies; 77+ messages in thread
From: Dave Horsfall @ 2017-05-13  0:55 UTC (permalink / raw)


On Fri, 12 May 2017, Random832 wrote:

> Ah. There's the other piece. You start the SUID program under the 
> debugger, and rather than kicking off the debugger, it simply starts it 
> non-suid. *However*, in the presence of shared text (either of the two 
> cases being checked for in the other place), you can make changes to the 
> text image (e.g. put whatever code you want at the entry point), which 
> will be reused the *next* time it is started *without* the debugger.

Cripes!  I think you're right...  If so, well done!

-- 
Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will suffer."


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-13  0:51 ` Random832
  2017-05-13  0:55   ` Dave Horsfall
@ 2017-05-13  1:17   ` Chris Torek
  2017-05-13 15:25   ` Steve Simon
  2 siblings, 0 replies; 77+ messages in thread
From: Chris Torek @ 2017-05-13  1:17 UTC (permalink / raw)


>Ah.  There's the other piece.  You start the SUID program under
>the debugger, and rather than kicking off the debugger, it simply
>starts it non-suid.  *However*, in the presence of shared text
>(either of the two cases being checked for in the other place),
>you can make changes to the text image (e.g.  put whatever code
>you want at the entry point), which will be reused the *next*
>time it is started *without* the debugger.

Right.  Some of this was not a problem with demand-paged files
(4.xBSD) since you would not share swap images.  But there's more,
or rather, *was* more, once people added PT_ATTACH and new system
calls or behavior ... specifically, the setreuid / setregid calls
from 4.2BSD or the saved setuid behavior in System V.  This is one
I personally touched.

Suppose a process starts out setuid or setgid.  This means it has
alternative privileges (maybe super-user, maybe not).  With these
it can do things like open some files or transit some directories.
Afterward, using setreuid() and setregid() in 4.2BSD, the process
can swap its real and effective IDs, or give up its effective UID
or GID entirely, to give up its privileges.  So some program
could, for instance, chdir past a "lock" directory -- this was the
MDQS trick -- and now exist in a tree that it had no access to, or
open a file with secrets, that it no longer has permission to
open.

Once a process had given up any special privileges, though, it
could be ptraced again (via PT_ATTACH).  Now you can swap file
descriptor variables around in it, or otherwise make it do bad
things with the privileges it gained while it was setuid.  On
SysV, with "saved setuid", it is even worse: you can make the
process *regain* privileges and do whatever you want.

The fix we used was an extra bit in the process flags: "process
has had or used privileges". Once set (cleared only on exec()),
the process could no longer be ptraced except by root.  Processes
did -- and still do -- have to be careful about files they may
leave open, or other entities that survive exec().  (One should
also consider mmap() and shared regions.)
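
A minimal sketch of that kind of flag, under stated assumptions: the flag name
MODEL_SPRIV, the helper names, and the rule that exec() sets the flag again
for a set-ID image are all hypothetical, not taken from any real kernel:

```c
#include <assert.h>

#define MODEL_SPRIV 0x1	/* hypothetical: "process has had or used privileges" */

struct model_proc { int p_flag; int p_uid; };

/* exec(): clear the flag, then set it again if the new image is set-ID. */
void model_exec(struct model_proc *p, int image_is_setid)
{
	assert(p != 0);
	p->p_flag &= ~MODEL_SPRIV;
	if (image_is_setid)
		p->p_flag |= MODEL_SPRIV;
}

/* Dropping privileges changes the UID but deliberately leaves the flag set. */
void model_drop_privs(struct model_proc *p, int real_uid)
{
	assert(p != 0);
	p->p_uid = real_uid;
}

/* Attach check: returns 1 if tracing is allowed, 0 if refused. */
int model_may_attach(int tracer_uid, const struct model_proc *target)
{
	assert(target != 0);
	if (tracer_uid == 0)
		return 1;			/* root may always trace */
	if (target->p_flag & MODEL_SPRIV)
		return 0;			/* tainted until the next exec */
	return tracer_uid == target->p_uid;	/* ordinary same-user rule */
}
```

In this model a process that started set-ID and then gave up its privileges
still refuses an attach from its own user, and becomes traceable again only
after it execs an ordinary image.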

Chris


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-13  0:51 ` Random832
  2017-05-13  0:55   ` Dave Horsfall
  2017-05-13  1:17   ` Chris Torek
@ 2017-05-13 15:25   ` Steve Simon
  2017-05-13 16:55     ` Clem Cole
  2017-05-13 23:01     ` Dave Horsfall
  2 siblings, 2 replies; 77+ messages in thread
From: Steve Simon @ 2017-05-13 15:25 UTC (permalink / raw)



hi,

this is (IMHO) a rather subtle bug,
the ones i remember were rather simpler. is it ok to discuss ancient security holes or is that still bad manners?

-Steve
 


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-13 15:25   ` Steve Simon
@ 2017-05-13 16:55     ` Clem Cole
  2017-05-13 17:19       ` William Pechter
  2017-05-13 23:01     ` Dave Horsfall
  1 sibling, 1 reply; 77+ messages in thread
From: Clem Cole @ 2017-05-13 16:55 UTC (permalink / raw)



On Sat, May 13, 2017 at 11:25 AM, Steve Simon <steve at quintile.net> wrote:

> hi,
>
> this is (IMHO) a rather subtle bug,
> the ones i remember where rather simpler. is it ok to discuss ancient
> security holes or is that still bad manners?
>

Speaking for myself.....   I clearly don't think it is bad manners at
this stage - I brought it up!
It was a different time when that occurred.  Today, I think *the general
security community*** pretty much lives by the rules of: if you find
something, notify the folks that fix it as quickly as possible, try to get a
patch written, and figure out how to get that patch out.   Then make damned
sure the hole is well documented and published so: a) we can test for it in
the wild, b) we make sure it does not happen again.

It actually has always impressed me how good UNIX was (is) when you
really get down to it.  IMHO, it was less the 'thousand eyeballs' and more
the 'eyeballs that all cared, could do something about it and, most
importantly, actually understood' the 'calculus' of the different problems
that made UNIX secure, and as good as if not better than many contemporary
'commercial' systems.  *i.e.* the UNIX schemes used sensible human-based
security processes/mechanisms combined with basic math & physics
(technology if you will) - as the higher order bits, not being
secret or obscure to protect.

Were there mistakes, yup.   But frankly, VMS had as many if not more and
some of them were far, far worse.   IBM's OSes were considered good, but
there were documented exploits in the news there too.

Clem


** I note 'security community' because not all firms buy into this behavior.
  I speak for myself.   In the last few weeks my own employer (Intel)
has been mixed up in a somewhat overplayed issue with server chipsets,
AMT and Winders [it's not my area/group etc. but as I understand the issue,
the bug does not seem to affect UNIX flavors nor systems that do not use AMT -
which is a server thingy].   Some people outside of Intel have
complained that the folks who own the bug at my employer have been less than
forthcoming.   I'll not defend nor comment because it's not mine to
comment on, other than to state I personally take an attitude of trying to
say as much as I can, and when I am in a position to for my job I will and do.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-13 16:55     ` Clem Cole
@ 2017-05-13 17:19       ` William Pechter
  2017-05-14 12:55         ` Derek Fawcus
  0 siblings, 1 reply; 77+ messages in thread
From: William Pechter @ 2017-05-13 17:19 UTC (permalink / raw)



Clem Cole wrote:
>
> On Sat, May 13, 2017 at 11:25 AM, Steve Simon <steve at quintile.net 
> <mailto:steve at quintile.net>> wrote:
>
>     hi,
>
>     this is (IMHO) a rather subtle bug,
>     the ones i remember where rather simpler. is it ok to discuss
>     ancient security holes or is that still bad manners?
>
> ​Speaking for myself.....   I clearly don't think it is bad manners​ 
> as this stage - I brought it up!E
> It was a different time when that occurred.  Today, I think /the 
> general security community**/ pretty lives by the rules of if you find 
> something, notify the folks that fix it as quickly as possible and try 
> to get a patch out and figure out how to get that patch out.   Then 
> make damned sure the whole is well documented and published so: a) do 
> we can test for it in the wild, b) make sure it does not happen again.
>
> It actually has always impressed me at how good UNIX was (is) when you 
> really get down to it.  IMHO, was less the 'thousand eyeballs'' and 
> more the 'eye balls that all of cared, could do something about it and 
> most importantly actually understood' the 'calculus' of the different 
> problems were want made UNIX secure and as good if not better than 
> many 'commercial' systems than its contemporaries. /i.e./ the UNIX 
> schemes used sensible  human based security 
> processes/mechanisms combined with basic math & physics ( technology 
> if you will) - as the higher order bits, not being secret or obscure 
> to protect.
>
The problem is once you got past "One true Unix" you were left hoping
the vendor fixed their bugs. I saw some things on Solaris and HP-UX
which were pretty much as bad as Windows.

The thing about Research Unix was that the underlying security structure
was well designed. VMS wasn't too bad either. The problem was the stuff
layered on it.

When VMS went to 3.6 or so, a friend of mine was almost fired by DEC for
randomly testing boxes to check that DEC's internal boxes weren't
running the System/Manager, Field/Service, UETP/UETP and User/password
combinations. DEC had just implemented new security features and alerts,
and Mitnick had just recently penetrated them (IIRC). Next thing you
know, corporate security was all over my buddy, who was just killing time
on night shift, temporarily covering someone's vacation time.

It was interesting to see the SysV security-enhanced Unix from AT&T at
Pyramid -- who were migrating to SVR4 from their BSD/SysV hybrid. ACLs,
split root/system and security mgr stuff which had been added to get VMS
to C and B2... Some of these things had me wondering if any commercial
sites would implement two sign-ins to authorize special root-type
actions on an OS.

> Were there mistakes, yup.   But frankly, VMS had as many if not more
> and some of them were far, far worse.   IBM's OS were considered good,
> but their were documented exploits in the news there too.

The loginout.exe one was bad.  Were there any structural ones past v3.6?
>
> Clem
>
>
> ** I note 'security community' because not all firm buy into this 
> behavior.   I speak for myself.   In the last few weeks my own 
> employer (Intel) recent has been mixed up in a bit over played issue 
> with server chips sets, AMT and Winders [its not my area/group etc but 
> as I under the issue, the bug does not seem to effect UNIX flavors nor 
> systems that do not use AMT - which is a server thingy].   Some 
> outside of Intel people are have complained that folks that own the 
> bug @ my employer has been less that forth coming.   I'll not defend 
> nor comment because it's not mine to comment on, other than to state I 
> personally take an attitude of trying to say a much as I can and when 
> I am in a position for my job I will and do.
>
Actually... I'd think AMT is an automated remote IT management thing
rather than a server thing, since it exists on all the business
Thinkpads from my T61's Core 2's up to the T420 i5.  They couldn't be
considered servers except they do support Samba and NFS and ssh. They
also dual boot, which is a major part of the risk.

Sorry for the pedantic add... but I just remediated my 5 laptops for
crap that should've been fixed with new vendor software -- but they
can't be bothered.




-- 
Digital had it then.  Don't you wish you could buy it now!
pechter-at-gmail.com  http://xkcd.com/705/



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-13 15:25   ` Steve Simon
  2017-05-13 16:55     ` Clem Cole
@ 2017-05-13 23:01     ` Dave Horsfall
  1 sibling, 0 replies; 77+ messages in thread
From: Dave Horsfall @ 2017-05-13 23:01 UTC (permalink / raw)


On Sat, 13 May 2017, Steve Simon wrote:

> this is (IMHO) a rather subtle bug, the ones i remember where rather 
> simpler. is it ok to discuss ancient security holes or is that still bad 
> manners?

Bring it on :-)  Any systems still vulnerable (and exposed to the 
Internet) deserve all that they get.

For example, I think I was the first one to mention the SPL bug here (note 
that I didn't find it); I'm still trying to find that little program which 
I think was published in a Usenix newsletter and exploited by, err, yours 
truly...

-- 
Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will suffer."


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-13 17:19       ` William Pechter
@ 2017-05-14 12:55         ` Derek Fawcus
  2017-05-14 22:12           ` Dave Horsfall
  0 siblings, 1 reply; 77+ messages in thread
From: Derek Fawcus @ 2017-05-14 12:55 UTC (permalink / raw)


On Sat, May 13, 2017 at 01:19:42PM -0400, William Pechter wrote:
> When VMS went to 3.6 or so a friend of mine was almost fired by DEC for 
> randomly testing boxes looking
> to see DEC's internal boxes weren't running System/Manager, 
> Field/Service and UETP/UETP User/password
> combinations.

Those default account combinations were still being used to gain
access to VMS systems in the '87-'89 time frame;  although
user/password was less interesting by itself,  being an unprivileged
account.

DF


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-14 12:55         ` Derek Fawcus
@ 2017-05-14 22:12           ` Dave Horsfall
  2017-05-15  1:24             ` Nemo
  0 siblings, 1 reply; 77+ messages in thread
From: Dave Horsfall @ 2017-05-14 22:12 UTC (permalink / raw)


On Sun, 14 May 2017, Derek Fawcus wrote:

> > to see DEC's internal boxes weren't running System/Manager, 
> > Field/Service and UETP/UETP User/password combinations.
> 
> Those default account combinations were still being used to gain access 
> to VMS systems in the '87-'89 time frame;  although user/password was 
> less interesting by itself, being an unpriviledged account.

Wasn't there also Guest/Guest as well?  Admittedly it would also be pretty 
boring, but nonetheless still a toe-hold.

-- 
Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will suffer."


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-14 22:12           ` Dave Horsfall
@ 2017-05-15  1:24             ` Nemo
  2017-05-15 18:00               ` Steve Johnson
  0 siblings, 1 reply; 77+ messages in thread
From: Nemo @ 2017-05-15  1:24 UTC (permalink / raw)


On 14 May 2017 at 18:12, Dave Horsfall <dave at horsfall.org> wrote:
> On Sun, 14 May 2017, Derek Fawcus wrote:
>
>> > to see DEC's internal boxes weren't running System/Manager,
>> > Field/Service and UETP/UETP User/password combinations.
>>
>> Those default account combinations were still being used to gain access
>> to VMS systems in the '87-'89 time frame;  although user/password was
>> less interesting by itself, being an unpriviledged account.
>
> Wasn't there also Guest/Guest as well?  Admittedly it would also be pretty
> boring, but nonetheless still a toe-hold.

I worked in a VAX shop once where a DEC FSE came by (on the wrong day
with the sysadmin out) and was rather upset that the default account
passwords had been changed.

N.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-15  1:24             ` Nemo
@ 2017-05-15 18:00               ` Steve Johnson
  2017-05-16 22:33                 ` Ron Natalie
  0 siblings, 1 reply; 77+ messages in thread
From: Steve Johnson @ 2017-05-15 18:00 UTC (permalink / raw)



Early in the Unix days, a DEC repairperson showed up to do "preventive
maintenance" and managed to clobber the nascent file system.   Turns
out DEC didn't have any permanent file systems on machines that
small...

----- Original Message -----
From: "Nemo" <cym224@gmail.com>
To:"The Eunuchs Hysterical Society" <tuhs at tuhs.org>
Cc:
Sent:Sun, 14 May 2017 21:24:25 -0400
Subject:Re: [TUHS] The evolution of Unix facilities and architecture

 On 14 May 2017 at 18:12, Dave Horsfall <dave at horsfall.org> wrote:
 > On Sun, 14 May 2017, Derek Fawcus wrote:
 >
 >> > to see DEC's internal boxes weren't running System/Manager,
 >> > Field/Service and UETP/UETP User/password combinations.
 >>
 >> Those default account combinations were still being used to gain access
 >> to VMS systems in the '87-'89 time frame; although user/password was
 >> less interesting by itself, being an unpriviledged account.
 >
 > Wasn't there also Guest/Guest as well? Admittedly it would also be pretty
 > boring, but nonetheless still a toe-hold.

 I worked in a VAX shop once where a DEC FSE came by (on the wrong day
 with the sysadmin out) and was rather upset that the default account
 passwords had been changed.

 N.



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-15 18:00               ` Steve Johnson
@ 2017-05-16 22:33                 ` Ron Natalie
  2017-05-16 23:13                   ` Arthur Krewat
  0 siblings, 1 reply; 77+ messages in thread
From: Ron Natalie @ 2017-05-16 22:33 UTC (permalink / raw)



Our biggest UNIX vs. DEC OS problem was that UNIX set the system clock in GMT whereas the DEC OSes of the day used local time.

We got used to the time being 4/5 hours off after a DEC CE was there.

We actually contracted our maintenance out for the entire company. We ended up going with GE. We couldn’t convince the guys from DEC who bid that when we said certain critical components of the site needed 24-hour response, this meant 24 hours in a row, not three successive 8-hour days.

We were often stuck using a DEC CE for initial system setup or warranty work. We had a particularly bad one who would blow power supplies, screw up running systems, etc. She was just simply incompetent. The culmination of this was when she somehow managed to put herself across the AC line of a VAX over in one of our external buildings and ended up being taken away in an ambulance. Working off hours, we’d often set up the new machines and run diagnostic checks on them ahead of the CE showing up. I had a testy CE show up and tell me that he “didn’t need me checking up on his work.” (Hell, I was the COTR and customer.) I told him I only had one word to say about that: <insert the incompetent CE’s name>. He beat a quick retreat, saying that Nancy was a different story.

I was driving to work one morning at Christmas time, and one of our local radio stations was soliciting people to send in their sob stories about how bad a year they had, and they would be given a special gift. One story went on for a few minutes, and I hadn’t caught on until it got to the electrical shock at work part and I knew it was Nancy’s story.

Our standard joke was that the way you could tell a DEC CE with a flat tire was that he had to change all four before he found the problem.

Amusingly, working for the feds had some other interesting fiascos. I got an amusing message from the security and facilities people one day. I had to tell our CE.

ME:   Bill, I can’t let you in the machine room anymore.

BILL:   Why not?

ME:   You’re a fire hazard.

BILL:   How so?

ME:   You have soldering irons.

Of course, I was able to prevail on them that we’d keep an eye on the CE and stand by with fire suppression if we let him do his job. The machine room in my building had no automatic halon system, which was popular in those days. What we had was a lot of large halon hand extinguishers. The post fire department came out and set pan fires behind our building and let us practice putting them out with the halon. I can’t imagine what the costs to the federal budget and the ozone layer were on that little activity. Of course, I brought my turnout gear, as I was a firefighter and paramedic at the time. This led to another interesting call from the front office.

SEC:    You need to attend a CPR class.

ME:    I’ve already had one this year.   I’m a state certified paramedic.   I go through recurrent training every month.

SEC:   Well it is a requirement that you have a CPR card.

ME:   Why am I the only one in my office that this is a requirement for?

SEC:    It’s your job classification.

ME:   Because I’m an electrical engineer and the other guys are computer scientists?

SEC:   Yes.

ME:   Why?

SEC:  Because you work with electricity.

ME:   I work with digital logic.   Five volts.   Further, even if I was going to shock myself into cardiac arrest, I can’t do CPR on myself.   You should make everybody else take CPR.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-16 22:33                 ` Ron Natalie
@ 2017-05-16 23:13                   ` Arthur Krewat
  2017-05-16 23:18                     ` Ron Natalie
  0 siblings, 1 reply; 77+ messages in thread
From: Arthur Krewat @ 2017-05-16 23:13 UTC (permalink / raw)


On 5/16/2017 6:33 PM, Ron Natalie wrote:
> Our standard joke was that the way you could tell a DEC CE with a flat 
> tire was that he had to change all four before he found the problem.

Conversely, at BOCES LIRICS in Dix Hills, NY, we had a field-service 
tech who seemed to know every single wire wrap in the PDP-10's. First 
the KA10 they had before I got there (but worked on in high school), and 
then the KS10's.

Maybe the VAX/PDP-11 crew were different :)

On the other hand, he left in his desk a complete list of KLINIK line 
phone numbers (modems connected in parallel with the CTY), and 
PASSWORDS, to just about every PDP-10 in the tri-state area (NY/NJ/CT).




^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-16 23:13                   ` Arthur Krewat
@ 2017-05-16 23:18                     ` Ron Natalie
  0 siblings, 0 replies; 77+ messages in thread
From: Ron Natalie @ 2017-05-16 23:18 UTC (permalink / raw)


The wirewrap comment reminds me of another catastrophe. BRL had one of the few operational Denelcor HEP supercomputers. Amusingly, the thing was front-ended with an 11/34 and had an I/O processor with 32 Unibuses connected to it. While we were still shaking the thing out at the factory (Mike Muuss shot his mouth off that we could get UNIX to run on the thing, and it turned out nobody had a better idea), we'd come in at night (while the Denelcor employees had the machine in the day). I'd regularly read the log about what happened that day. Well, one day it read that someone had neatened up one of the system backplanes by taking the wires that were overly long and shortening them.

Now this thing was all built out of 10800 ECL and those wires were "overly long" and stuffed into the backplane because the length was needed to control the signal propagation times.    I told Mike I had a bad feeling about this and the machine might not be usable.




^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
       [not found] <mailman.1.1494986402.2329.tuhs@minnie.tuhs.org>
@ 2017-05-19 14:31 ` David
  0 siblings, 0 replies; 77+ messages in thread
From: David @ 2017-05-19 14:31 UTC (permalink / raw)




At Celerity we were porting Unix to a new NCR chipset for our
washing-machine-sized workstation. We had a VAX 750 as the development box
and we cross-compiled to the NCR box. We contracted out the 750 maintenance
to a 3rd party and had no problems for a couple of years. Then one day I
came in to work to find the VAX happily consuming power and doing nothing.
Unix wasn’t running and nothing I could do would bring it back. After about
2 hours I got my boss and we contacted the maintenance company. The guy they
sent did much what I’d done and then went around the back. He pushed on the
backplane of the machine and lo, it started working. He then removed the
pressure and it failed quite immediately. Turns out the backplane had a
broken trace in it. We had done no board swaps in many months and the room
had had no A/C faults of any kind.

The company got a new backplane and had it installed in 2 days. Being 3rd
party we couldn’t get it replaced any quicker. After that it worked like a
champ.

Celerity eventually became part of Sun as Sun Supercomputer.

	David



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-16 13:20 Noel Chiappa
@ 2017-05-16 13:46 ` Clem Cole
  0 siblings, 0 replies; 77+ messages in thread
From: Clem Cole @ 2017-05-16 13:46 UTC (permalink / raw)


I believe it is a true story...   The issue was that an early version of
the FE tools was set to use the RS04 as a temporary disk when running the
system exercisers if it found one.   The author of that error is a friend
of mine (and would later become a UNIX guy at Masscomp).  He told me
about it once: he had checked with the DEC OS teams and thought it was
'ok', because when they did it, none of the DEC OSes used the RS04 for
permanent storage - the device had been designed to be a swapping device.

From what I understand, the issue was actually short-lived, but widely
known in the UNIX community.   He told me the accident only occurred at
one site that he knew about (AT&T), and they made a quick change to have it
ask before it wiped the disk out, and that version of the tools was released
to the field quickly.  But he was personally wise to UNIX from then on (and
in later years would come to love being a UNIX user, although I don't think
he ever gave up the EDT macros we wrote for EMACS for him and the other
ex-VMS folks).

Clem

On Tue, May 16, 2017 at 9:20 AM, Noel Chiappa <jnc at mercury.lcs.mit.edu>
wrote:

>     > From: "Steve Johnson"
>
>     > a DEC repairperson showed up to do "preventive maintenance" and managed
>     > to clobber the nascent file system.
>     > Turns out DEC didn't have any permanent file systems on machines that
>     > small...
>
> A related story (possibly a different version of this one) which I read (can't
> remember where, now) was that he trashed the contents of the RS04 fixed-head
> hard disk, because on DEC OS's, those were only used for swapping.
>
>         Noel
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
@ 2017-05-16 13:20 Noel Chiappa
  2017-05-16 13:46 ` Clem Cole
  0 siblings, 1 reply; 77+ messages in thread
From: Noel Chiappa @ 2017-05-16 13:20 UTC (permalink / raw)


    > From: "Steve Johnson"

    > a DEC repairperson showed up to do "preventive maintenance" and managed
    > to clobber the nascent file system.
    > Turns out DEC didn't have any permanent file systems on machines that
    > small...

A related story (possibly a different version of this one) which I read (can't
remember where, now) was that he trashed the contents of the RS04 fixed-head
hard disk, because on DEC OS's, those were only used for swapping.

	Noel


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
@ 2017-05-14 21:44 Noel Chiappa
  0 siblings, 0 replies; 77+ messages in thread
From: Noel Chiappa @ 2017-05-14 21:44 UTC (permalink / raw)


    > From: Random832

    > Ah. There's the other piece. You start the SUID program under the
    > debugger, and ... it simply starts it non-suid. *However*, in the
    > presence of shared text ...  you can make changes to the text image
    > ... which will be reused the *next* time it is started *without* the
    > debugger.

So I actually tried to do this (on a V6 system running on an emulator), after
whipping up a tiny test program (which prints "1", and the real and current
UIDs): the plan was to patch it to print a different number.

However, after a variety of stubbed toes and hiccups (gory details below, if
anyone cares), including a semi-interesting issue with the debugger and pure
texts, I'm punting: when trying to set a breakpoint in a pure text, I get the
error message "Can't set breakpoint", which sort of correlates with the
comment in the V6 sig$ptrace(): "write user I (for now, always an error)".

So it's not at all clear that the technique we thought would work would, in
fact, work - unless people weren't using a stock V6 system, but rather one
that had been tweaked to e.g. allow use of debuggers on pure-text programs
(including split I+D).

It's interesting to speculate on what the 'right' fix would be, if somehow the
technique above did work. The 'simple' fix, on systems with a PWB1-line XWRIT
flag, would be to ignore SETUID bits when doing an exec() of a pure text that
had been modified. But probably 'the' right fix would be to give someone
debugging a pure-text program their own private copy of the text. (This would
also prevent people who try to run the program from hitting breakpoints while
it's being debugged. :-)
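
To make the 'simple' fix concrete, here is a minimal sketch of the decision it
implies. Everything here is hypothetical: the function name, the way the flags
are passed in, and the numeric value given to XWRIT are stand-ins for this
example only, not the PWB1 definitions.

```c
#define ISUID 04000	/* set-user-ID bit in the file mode */
#define XWRIT 02	/* assumed stand-in value: "text was modified" flag */

/*
 * Return the uid a freshly exec'd image would run with.
 * Sketch of the 'simple' fix: a pure text that has been written to
 * never gets its set-user-ID bit honoured.
 */
int
effective_uid_sketch(int cur_uid, int file_uid, int file_mode, int text_flags)
{
	if ((file_mode & ISUID) == 0)
		return cur_uid;		/* not an SUID file at all */
	if (text_flags & XWRIT)
		return cur_uid;		/* patched text: ignore SUID */
	return file_uid;
}
```

The private-copy approach would make this check unnecessary, since the shared
text would never be modified in the first place.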


But anyway, it's clear that back when I thought I'd found the bug, I
hadn't - which is why, when I looked into the source, it looked like it
had 'already' been fixed. (And why Jim G hemmed and hawed...)

But I'm kind of curious about that mod in PWB1 that writes a modified pure
text back to the swap area when the last process using it exits. What was the
thinking behind that? What's the value to allowing someone to patch the
in-core pure text, and then save those patches? And there's also the 'other
people who try and run a program being debugged are going to hit breakpoints'
issue, if you do allow writing into pure texts...


	Noel


--------


For the gory details: to start with, attempting to run a pure-text program
(whether SUID or not) under the debugger produced a "Can't execute
{program-name} Process terminated."  error message.

'cdb' is printing this error message just after the call to exec() (if that
fails, and returns). I modified it to print the error number when that
happens, and it's ETXTBSY. I had a quick look at the V6 source, to see if I
could see what the problem is, and it seems to be (in sys1$exec()):

	if(u.u_arg[1]!=0 && (ip->i_flag&ITEXT)==0 && ip->i_count!=1) {
		u.u_error = ETXTBSY;
		goto bad;
	}

What that code does is a little obscure; I'm not sure I understand it. The
first term checks to see if the size of the text segment is non-zero (which it
is not, in both 0407 and 0410 files). The second is, I think, looking to see
if the inode is marked as being in use for a pure text (which it isn't, until
later in exec()). The third checks to make sure nobody else is using the file.

So I guess this prevents exec() of a file which is already open, and not for a
pure text. (Why this is the Right Thing is not instantly clear to me...)
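
As a rough illustration of that reading, the test can be pulled out into a
standalone predicate. This is only a sketch: the struct, the function name,
and the numeric values given to ITEXT and ETXTBSY are stand-ins chosen for
this example rather than the V6 header definitions; only the three-term
condition itself is taken from the code quoted above.

```c
/* Sketch only: stand-in types and constants, not the V6 headers. */
#define ITEXT	040	/* assumed flag bit: inode is in use as a pure text */
#define ETXTBSY	26	/* assumed error number for "text file busy" */

struct inode_sketch {
	int i_flag;	/* in-core inode flags */
	int i_count;	/* number of references to the in-core inode */
};

/*
 * Return ETXTBSY if exec() would refuse the file, 0 otherwise.
 * text_size plays the role of u.u_arg[1] in the quoted code.
 */
int
exec_txtbsy_check(int text_size, const struct inode_sketch *ip)
{
	if (text_size != 0 && (ip->i_flag & ITEXT) == 0 && ip->i_count != 1)
		return ETXTBSY;
	return 0;
}
```

With a second reference to the inode outstanding (i_count of 2) and ITEXT not
yet set, the check fires; with a single reference it passes.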

Anyway, the reason this fails under 'cdb' is that the debugger already has it
open (to be able to read the code). So I munged the debugger to close it
before doing the exec(), and then the error went away.

Then I ran into a long series of issues, the details of which are not at all
interesting, connected with the fact that the version of 'cdb' I was using
(one I got off a Tim Shoppa modified V6 disk) doesn't correspond to either of
the sources I have for 'cdb'.

When I switched to the latest source (so I could fix the issue above), it had
some bug where it wouldn't work unless there was a 'core' file. But eventually
I kludged it enough to get the 'can't set breakpoints' message, at which point
I threw in the towel.




* [TUHS] The evolution of Unix facilities and architecture
  2017-05-14  4:30           ` Theodore Ts'o
@ 2017-05-14 17:40             ` Clem Cole
  0 siblings, 0 replies; 77+ messages in thread
From: Clem Cole @ 2017-05-14 17:40 UTC (permalink / raw)


Ted -- thank you -- excellent write up.  Love it and I could not agree
more!!  Your 'worse is better' is the same idea as what is 'good enough,'
an argument I used to have at DEC, where being 'perfect' cost years, and in
the end we lost because of it.

FWIW:  I put it slightly differently: make sure you pick a couple of things
that matter, and nail them... be the best on those >>few<< items, but then
the rest needs to be 'good enough' and in time you can make those parts
better.   But if you wait for all parts to be great or, worse, 'perfect' - it
doesn't matter -- so says an ex-Alpha guy now working on INTEL*64 --
sigh...

BTW: Not to quibble, but you might also remember, traditional UNIX took the
same path as you did.   Ken's v6/v7 FS were not that great about write
ordering either.   Remember FFS is replacing Ken's work 15 years down the
road.   Kirk did not do all the careful ordered write stuff until after
George Goble taught us all how to, which was a few years later.   When
Kirk implemented FFS, it was after the Purdue patches had been released to
make Unix's original FS more 'reliable' - and yes - Ted (Kowalski) and my
version of the original fsck was not nearly as careful as you were years
later.   But again, we were a huge step forward from what had been at the
time.  That said -- your stuff, as Larry has pointed out, was rock solid and
in practice 'just worked.'  Certainly post ext3, I do not have memory of
losing any real user data on any of my Linux boxes, and errors I made have
definitely been the cause of them crashing over the years ;-)

Also, WRT making the HW properly, and a lot of PC HW being trash -- yup
- which comes back to the what is good enough issue.   DEC did the same
thing for a long time as SGI in making rock solid HW.  Heck, DEC
somewhat cornered the SCSI disk business for the mid-range and upper end
rack world in the '90s.   Like SGI's and Sun's OSes of the day, Tru64 has
very good DMA controllers under the covers that were hardened and lab
tested for corner cases.   In fact, it is one of the reasons why, while
Tru64 could detect an Adaptec controller @ boot up and actually use it (I
had one on my workstation), it was not officially in the 'SPD' as a
supported device, because the HW failed as you described and TruClusters
in particular could not make a proper DLM due to the failure modes of the
Adaptec (now as I used to point out to the marketing and HW weenies, no
sane person was going to put a $150 SCSI controller in their $.5M
TruCluster system - so we could have allowed it, just not allowed it in
some configs - made it clear in the SPD -- Adaptec only in configs XXX).

As a result, the issue plays out this way... and then DEC's VPs used to say
you could not make & sell an Alpha for under $5K (one person in particular
who I will leave nameless - those who were there all know who I refer to).
My last act before I left DEC/Compaq for Paceline was to make the $1K
Alpha using a $799 (end user) Compaq system with the K7 on the motherboard
swapped with an EV6 and some mechanical shims,  Adaptec SCSI BT (I still
have the motherboard @ home, and the EV6 is on my desk at Intel).  It was
built using PC parts - case, power supply et al.   The key was the Alpha
@ $5K was a better physically built system than the $799 based PC -- but who
cared ...   your closing para WRT WiFi and PCMCIA summed up the issue
pretty well.

Clem

On Sun, May 14, 2017 at 12:30 AM, Theodore Ts'o <tytso at mit.edu> wrote:

> On Thu, May 11, 2017 at 03:25:47PM -0700, Larry McVoy wrote:
> > This is one place where I think Linux kicked Unix's ass.  And I am not
> > really sure how they did it, I have an idea but am not positive.  Unix
> > file systems up through UFS as shipped by Sun, were all vulnerable to
> > what I call the power out test.  Untar some big tarball and power off
> > the machine in the middle of it.  Reboot.  Hilarity ensues (not).
> >
> > You were dropped into some stand alone shell after fsck threw up its
> > hands and it was up to you to fix it.  Dozens and dozens of errors.
> > It was almost always faster to go to backups because figuring that
> > stuff out, file by file (which I have done more than once), gets you
> > to the point that you run "fsck -y" and go poke at lost+found when
> > fsck is done, realize that there is no hope, and reach for backups.
> >
> > Try the same thing with Linux.  The file system will come back, starting
> > with, I believe, ext2.
> >
> > My belief is that Linux orders writes such that while you may lose data
> > (as in, a process created a file, the OS said it was OK, but that file
> > will not be in the file system after a crash), but the rest of the file
> > system will be consistent.  I think it's as if you powered off the
> > machine a few seconds earlier than you actually did, some stuff is in
> > flight and until they can write stuff out in the proper order you may
> > lose data on a hard reset.
>
> So the story is a bit complicated here, and may be an example of
> "worse is better" --- which is ironically one of those things which is
> used as an explanation for why BSD/Unix won even though Lisp was
> technically superior[1] --- but in this case, it's Linux that did
> something "dirty", and BSD that did something that was supposed to be
> the "better" solution.
>
> [1] https://www.jwz.org/doc/worse-is-better.html
>
> So first let's talk about ext2 (which indeed, does not have file
> system journalling; that came in ext3).  The BSD Fast File System goes
> to a huge amount of effort to make sure that writes are sent to the
> disk in exactly the right order so that fsck can actually fix things.
> This requires that the disk not reorder writes (e.g., write caching is
> disabled or in write-through mode).  Linux, in ext2, didn't bother
> with trying to get the write order correct at all.  None.  Nada.  Zip.
> Writes would go out in whatever order dictated by the elevator
> scheduler, and so on a power failure or a kernel crash, the order in
> which metadata writes would be sent to the disk was completely
> unconstrained.
>
> Sounds horrible, right?  In many ways, it was.  And I lost count of
> how often NetBSD and FreeBSD users would talk about how primitive and
> horrible ext2 was in comparison to FFS, which had all of this
> excellent engineering work to make sure writes happened in the correct
> order such that fsck was guaranteed to always be able to fix things.
>
> So why did Linux get away with it?  When I wrote the fsck for ext2, I
> knew that anything can and would happen, so it was implemented so that
> it was extremely paranoid about not ever losing any data.  And if
> there was a chance that an expert could recover the data, e2fsck would
> stop and ask the system administrator to take a look.  In the case
> that the user ran with fsck -y, the default was to drop files into
> lost+found, whereas in some cases with the FFS fsck, it "knew" that
> in a particular case, given the order in which writes were staged out, the
> right thing to do was to let the unlink complete, so it would let the
> refcount go to zero, or stay at zero.
>
> The other thing that we did in Linux is that I made sure we had a
> highly functional "debugfs" tool.  This tool served two purposes.  The
> first was it made it very easy for me to create a regression test suite
> for fsck.  As far as I know, none of the other major file systems at
> the time had an fsck with a regression test suite --- and I was
> religious about adding tests as I added functionality, and as I fixed
> bugs.  The debugfs tool made it easy for me to create test case file
> systems that were corrupted in various interesting ways.  The other use
> of debugfs was that it made it easy for experts to do file system
> recovery after a crash, if there was some really precious file that
> they needed to try to recover.
>
> So this is why this is a great example of "worse is better".  In Linux,
> ext2 was ***incredibly*** sloppy about how it handled write ordering
> --- it didn't do anything at all.  But as a consequence we developed
> tools that were extremely good to compensate, and in practice, it was
> extremely rare (although it did happen on occasion) that files would
> get lost or the file system could end up in a state where fsck would
> not be able to recover without manual intervention by a system
> administrator using debugfs.
>
> But the other thing to note here is that in the PC era, most disk
> drives ran with write caching enabled, with writeback caching so that
> the hard drive could do its own elevator scheduling.  So having a file
> system that very carefully scheduled writes to make sure they happened
> in the right order didn't help you a *bit* unless you configured your
> hard drive to disable writeback caching --- at which point you would
> take a massive speed hit.
>
> This is ultimately also one weakness of Soft Updates --- it requires
> that you disable writeback caching, since it works by letting the OS
> control the order in which writes hit stable storage.  With
> journalling you don't have to do that; but the tradeoff is that when
> you do a journal commit, you need typically two cache flush
> operations.  (Or a cache flush followed by a FUA write of the commit
> block, if the disk supports FUA.)
>
>
>
> There is another example of how Linux embraced the "worse is better"
> philosophy in ext3, and that has to do with how we do journalling.
> The sophisticated way to do journalling is to do logical journalling.
> This is where what you write in the journal is "set bit XXX in the
> allocation bitmap", or "update the mtime to YYYY".  And in this way,
> you can batch multiple file system operations into a single block
> written to the journal.  Solaris/UFS and Irix use this much more
> sophisticated form of journalling.  (Actually, older versions of
> Solaris did use volume-level journalling, which is basically what
> ext3/ext4 uses, but they upgraded to the much more "right", more
> advanced thing, which is logical journalling.)
>
> Ext3 uses physical, or volume-level journalling.  This journalling
> works on the block level --- so if we flip a bit in an allocation
> bitmap, we log the entire 4k block to the journal.  By default, we
> only do a journal commit every five seconds (unless an fsync happens
> first), so there could be multiple changes to a single inode table
> block that can be batched together, but it's still true that for a
> given metadata-heavy workload, a file system which uses logical
> journalling will tend to require many fewer blocks written to the journal
> than a file system such as ext3/ext4 which uses physical block
> journalling.
>
> Why did Linux get away with it?  Number one, most workloads don't
> really modify metadata all _that_ intensively, and 12k of sequential
> writes versus 32k of sequential writes doesn't actually take that much
> more time.  Secondly, Ted's law of PC-class hardware ("most PC-class
> hardware is crap") comes into play, and turns physical journalling
> into an advantage.  PC class hardware tends not to have power fail
> interrupts, and when power drops, and the voltage levels on the power
> rails start drooping, DRAM tends to go insane and starts returning
> garbage long before the DMA engine and the hard drive stop
> functioning.
>
> So if your system is doing logical journalling, after the file system
> commits a transaction, it will start writing the inode table block to
> the permanent location on disk.  If at that point you get a power
> drop, garbage can get written to the inode table block, and if the
> file system is using logical journalling, on reboot the mtime field
> can get updated from the logical journal --- but the rest of the inode
> table block is still garbage.
>
> In contrast, since ext3 was using physical block journalling, even if
> various metadata blocks get corrupted due to writes from failing DRAM
> during a power drop, when we replay the journal, this will restore the
> entire metadata block, and Things Just Work.
>
> I have talked to an XFS engineer from SGI, and this was definitely a
> thing which SGI discovered the hard way.  After they discovered this
> problem, they added extra capacitors to the power supply, added a
> power fail interrupt, and taught Irix so that when the power fail
> interrupt was triggered, it would frantically cancel DMA transfers in
> order to avoid this problem.  I do not know how many of the other
> Legacy Unix systems figured out this failure mode --- and I can't
> claim that we were brilliant enough to design a system to avoid this
> problem.  It just so happened that the brute-force design that we
> chose was very well suited for crappy (but way cheaper than a Sun Fire
> E10k :-) PC-class hardware.
>
> > I copied Ted, who had his fingers deep in that code, maybe he can correct
> > me where I got it wrong.  Details aside, I think this is a place where
> > Linux moved the state of the art significantly forward.  There are other
> > places but this one is a big deal IMHO, maybe the biggest deal.
>
> So I'm not really sure we can claim to have "moved the state of the
> art".  There certainly weren't any brilliant computer science
> innovations here.  That sort of thing is more like Soft Updates, of
> which Valerie Aurora (formerly Henson) once wrote,
>
>    "I've read this paper at least 15 times, and each time I when get
>    to page 7, I'm feeling pretty good and thinking, "Yeah, okay, I
>    must be smarter now than the last time I read this because I'm
>    getting it this time," - and then I turn to page 8 and my head
>    explodes." --- https://lwn.net/Articles/339337/
>
> I will be the first to admit that with ext2/ext3/ext4, especially in
> the early days, it was much more about brute force engineering, and
> regression testing, and much less about "moving the state of the art".
> Certainly those of us who were working on Linux weren't trying to get
> papers published in peer reviewed journals or conferences!  (And I've
> always thought that Greg Ganger was _way_ smarter than I.  :-)
>
> And if the Lisp Machine hackers looked down on BSD, and complained
> that BSD adopted the "Worse is Better" philosophy, while Lisp strived
> for the true, elegant, Correct technical solution, it's perhaps
> especially interesting to consider that if anything, Linux was an even
> more radical example of the "Worse is Better" philosophy.
>
> Cheers,
>
>                                         - Ted
>
> P.S.  There is yet another example of "Worse is Better" in how Linux
> had PCMCIA support several years before FreeBSD/NetBSD.  However, if
> you ejected a PCMCIA card in a Linux system, there was a chance (in
> practice it worked out to be about 1 in 5 times for a WiFi card, in
> my experience) that the system would crash.  The *BSD's took a good
> 2-3 years longer to get PCMCIA support, but when they did, it was rock
> solid.  Of course, if you are a laptop user, and are happy to keep
> your 802.11 PCMCIA card permanently installed, guess which OS you were
> likely to prefer --- "sloppy but works, mostly", or "it'll get there
> eventually, and will be rock solid when it does, but zip, nada, right now"?
>
>
> >
> > --lm
> >
> > On Thu, May 11, 2017 at 04:37:29PM -0400, Ron Natalie wrote:
> > > I remember the pre-fsck days.   It was part of my test to become an
> > > operator at the UNIX site at JHU that I could run the various manual
> > > checks.
> > >
> > > The V6 file system wasn't exactly stable during crashes (lousy
> > > database behavior), so there was almost certainly something to clean up.
> > >
> > > The first thing we'd run was icheck.   This runs down the superblock
> > > freelist and all the allocated blocks in the inodes.     If there were
> > > missing blocks (not in a file or the free list), you could use icheck -s
> > > to rebuild it.    Similarly, if you had duplicated allocations in the
> > > freelist or between the freelist and a single file.   Anything more
> > > complicated required some clever patching (typically, we'd just mount
> > > readonly, copy the files, and then blow them away with clri).
> > >
> > > Then you'd run dcheck.   As mentioned, dcheck walks the directory
> > > path from the top of the disk counting inode references that it
> > > reconciles with the link count in the inode.   Occasionally we'd end up
> > > with a 0-0 inode (no directory entries, but allocated; typically this is
> > > caused by people removing a file while it is still open, a regular
> > > practice of some programs for their /tmp files).    clri again blew
> > > these away.
> > >
> > > Clri wrote zeros all over the inode.   This had the effect of wiping
> > > out the file, but it was dangerous if you got the i-number wrong.    We
> > > replaced it with "clrm" which just cleared the allocated bit, a lot
> > > easier to reverse.
> > >
> > > If you really had a mess of a file system, you might get a piece of
> > > the directory tree broken off from a path to the root.   Or you'd have
> > > an inode that icheck reported dups.   ncheck would try to reconcile an
> > > inumber into an absolute path.
> > >
> > > After a while a program called fsdb came around that allowed you to
> > > poke at the various file system structures.    We didn't use it much
> > > because by the time we had it, fsck was fast on its heels.
> > >
> >
> > --
> > ---
> > Larry McVoy            lm at mcvoy.com           http://www.mcvoy.com/lm
>



* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 22:25         ` Larry McVoy
  2017-05-11 22:30           ` Ron Natalie
  2017-05-11 23:47           ` Dave Horsfall
@ 2017-05-14  4:30           ` Theodore Ts'o
  2017-05-14 17:40             ` Clem Cole
  2 siblings, 1 reply; 77+ messages in thread
From: Theodore Ts'o @ 2017-05-14  4:30 UTC (permalink / raw)


On Thu, May 11, 2017 at 03:25:47PM -0700, Larry McVoy wrote:
> This is one place where I think Linux kicked Unix's ass.  And I am not
> really sure how they did it, I have an idea but am not positive.  Unix
> file systems up through UFS as shipped by Sun, were all vulnerable to
> what I call the power out test.  Untar some big tarball and power off
> the machine in the middle of it.  Reboot.  Hilarity ensues (not).
> 
> You were dropped into some stand alone shell after fsck threw up its
> hands and it was up to you to fix it.  Dozens and dozens of errors.
> It was almost always faster to go to backups because figuring that 
> stuff out, file by file (which I have done more than once), gets you
> to the point that you run "fsck -y" and go poke at lost+found when
> fsck is done, realize that there is no hope, and reach for backups.
> 
> Try the same thing with Linux.  The file system will come back, starting
> with, I believe, ext2.
> 
> My belief is that Linux orders writes such that while you may lose data
> (as in, a process created a file, the OS said it was OK, but that file 
> will not be in the file system after a crash), but the rest of the file 
> system will be consistent.  I think it's as if you powered off the
> machine a few seconds earlier than you actually did, some stuff is in
> flight and until they can write stuff out in the proper order you may
> lose data on a hard reset.

So the story is a bit complicated here, and may be an example of
"worse is better" --- which is ironically one of those things which is
used as an explanation for why BSD/Unix won even though Lisp was
technically superior[1] --- but in this case, it's Linux that did
something "dirty", and BSD that did something that was supposed to be
the "better" solution.

[1] https://www.jwz.org/doc/worse-is-better.html

So first let's talk about ext2 (which indeed, does not have file
system journalling; that came in ext3).  The BSD Fast File System goes
to a huge amount of effort to make sure that writes are sent to the
disk in exactly the right order so that fsck can actually fix things.
This requires that the disk not reorder writes (e.g., write caching is
disabled or in write-through mode).  Linux, in ext2, didn't bother
with trying to get the write order correct at all.  None.  Nada.  Zip.
Writes would go out in whatever order dictated by the elevator
scheduler, and so on a power failure or a kernel crash, the order in
which metadata writes would be sent to the disk was completely
unconstrained.

Sounds horrible, right?  In many ways, it was.  And I lost count of
how often NetBSD and FreeBSD users would talk about how primitive and
horrible ext2 was in comparison to FFS, which had all of this
excellent engineering work to make sure writes happened in the correct
order such that fsck was guaranteed to always be able to fix things.

So why did Linux get away with it?  When I wrote the fsck for ext2, I
knew that anything can and would happen, so it was implemented so that
it was extremely paranoid about not ever losing any data.  And if
there was a chance that an expert could recover the data, e2fsck would
stop and ask the system administrator to take a look.  In the case
that the user ran with fsck -y, the default was to drop files into
lost+found, whereas in some cases with the FFS fsck, it "knew" that
in a particular case, given the order in which writes were staged out, the
right thing to do was to let the unlink complete, so it would let the
refcount go to zero, or stay at zero.

The other thing that we did in Linux is that I made sure we had a
highly functional "debugfs" tool.  This tool served two purposes.  The
first was it made it very easy for me to create a regression test suite
for fsck.  As far as I know, none of the other major file systems at
the time had an fsck with a regression test suite --- and I was
religious about adding tests as I added functionality, and as I fixed
bugs.  The debugfs tool made it easy for me to create test case file
systems that were corrupted in various interesting ways.  The other use
of debugfs was that it made it easy for experts to do file system
recovery after a crash, if there was some really precious file that
they needed to try to recover.

So this is why this is a great example of "worse is better".  In Linux,
ext2 was ***incredibly*** sloppy about how it handled write ordering
--- it didn't do anything at all.  But as a consequence we developed
tools that were extremely good to compensate, and in practice, it was
extremely rare (although it did happen on occasion) that files would
get lost or the file system could end up in a state where fsck would
not be able to recover without manual intervention by a system
administrator using debugfs.

But the other thing to note here is that in the PC era, most disk
drives ran with write caching enabled, with writeback caching so that
the hard drive could do its own elevator scheduling.  So having a file
system that very carefully scheduled writes to make sure they happened
in the right order didn't help you a *bit* unless you configured your
hard drive to disable writeback caching --- at which point you would
take a massive speed hit.

This is ultimately also one weakness of Soft Updates --- it requires
that you disable writeback caching, since it works by letting the OS
control the order in which writes hit stable storage.  With
journalling you don't have to do that; but the tradeoff is that when
you do a journal commit, you need typically two cache flush
operations.  (Or a cache flush followed by a FUA write of the commit
block, if the disk supports FUA.)
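
The two-flush commit sequence can be sketched from user space as follows.
This is an illustration under stated assumptions, not how ext3's journal code
is written: the journal is modelled as an ordinary append-only file, the
function name and the "COMMIT" marker are made up for the example, and
fsync() stands in for the cache flush (whether it really reaches the drive's
write cache depends on the OS, the file system, and the drive).

```c
#include <string.h>
#include <unistd.h>

#define BLK 4096

/*
 * Append nblocks logged blocks, then a commit record, to fd.
 * Returns 0 on success, -1 on any I/O error.
 */
int
journal_commit_sketch(int fd, const char *blocks, int nblocks)
{
	char commit[BLK];
	int i;

	/* 1. Write the logged blocks. */
	for (i = 0; i < nblocks; i++)
		if (write(fd, blocks + (size_t)i * BLK, BLK) != BLK)
			return -1;

	/* 2. First flush: the blocks must be stable before the commit. */
	if (fsync(fd) != 0)
		return -1;

	/* 3. Write the commit record. */
	memset(commit, 0, sizeof commit);
	memcpy(commit, "COMMIT", 6);
	if (write(fd, commit, BLK) != BLK)
		return -1;

	/* 4. Second flush: the commit record itself must be stable. */
	return fsync(fd) == 0 ? 0 : -1;
}
```

The first flush is what keeps a commit record from reaching stable storage
ahead of the blocks it vouches for; a FUA write of the commit block would
replace steps 3 and 4 with a single operation.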



There is another example of how Linux embraced the "worse is better"
philosophy in ext3, and that has to do with how we do journalling.
The sophisticated way to do journalling is to do logical journalling.
This is where what you write in the journal is "set bit XXX in the
allocation bitmap", or "update the mtime to YYYY".  And in this way,
you can batch multiple file system operations into a single block
written to the journal.  Solaris/UFS and Irix use this much more
sophisticated form of journalling.  (Actually, older versions of
Solaris did use volume-level journalling, which is basically what
ext3/ext4 uses, but they upgraded to the much more "right", more
advanced thing, which is logical journalling.)

Ext3 uses physical, or volume-level journalling.  This journalling
works on the block level --- so if we flip a bit in an allocation
bitmap, we log the entire 4k block to the journal.  By default, we
only do a journal commit every five seconds (unless an fsync happens
first), so there could be multiple changes to a single inode table
block that can be batched together, but it's still true that for a
given metadata-heavy workload, a file system which uses logical
journalling will tend to require many fewer blocks written to the journal
than a file system such as ext3/ext4 which uses physical block
journalling.

Why did Linux get away with it?  Number one, most workloads don't
really modify metadata all _that_ intensively, and 12k of sequential
writes versus 32k of sequential writes doesn't actually take that much
more time.  Secondly, Ted's law of PC-class hardware ("most PC-class
hardware is crap") comes into play, and turns physical journalling
into an advantage.  PC class hardware tends not to have power fail
interrupts, and when power drops, and the voltage levels on the power
rails start drooping, DRAM tends to go insane and starts returning
garbage long before the DMA engine and the hard drive stop
functioning.

So if your system is doing logical journalling, after the file system
commits a transaction, it will start writing the inode table block to
the permanent location on disk.  If at that point you get a power
drop, garbage can get written to the inode table block, and if the
file system is using logical journalling, on reboot the mtime field
can get updated from the logical journal --- but the rest of the inode
table block is still garbage.

In contrast, since ext3 was using physical block journalling, even if
various metadata blocks get corrupted due to writes from failing DRAM
during a power drop, when we replay the journal, this will restore the
entire metadata block, and Things Just Work.
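
The difference between the two replay styles can be shown with a toy model.
This is only a sketch: a "metadata block" is a flat 4k byte array, MTIME_OFF
is a made-up field offset, and the function names are invented for the
example; none of it reflects the real ext3 or XFS on-disk formats.

```c
#include <string.h>

#define BLK 4096
#define MTIME_OFF 16	/* hypothetical offset of an mtime field */

/* Physical replay: the journal holds a full image of the block. */
void
replay_physical(unsigned char *disk_block, const unsigned char *journal_image)
{
	memcpy(disk_block, journal_image, BLK);
}

/* Logical replay: the journal holds only "set mtime to new_mtime". */
void
replay_logical(unsigned char *disk_block, unsigned int new_mtime)
{
	memcpy(disk_block + MTIME_OFF, &new_mtime, sizeof new_mtime);
}

/* Count the bytes that differ between two blocks. */
int
block_diff(const unsigned char *a, const unsigned char *b)
{
	int i, n = 0;

	for (i = 0; i < BLK; i++)
		if (a[i] != b[i])
			n++;
	return n;
}
```

If the on-disk block has been overwritten with garbage, replay_physical()
brings block_diff() against the intended image down to zero, while
replay_logical() repairs only the bytes of the one field and leaves the rest
of the garbage in place.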

I have talked to an XFS engineer from SGI, and this was definitely a
thing which SGI discovered the hard way.  After they discovered this
problem, they added extra capacitors to the power supply, added a
power fail interrupt, and taught Irix so that when the power fail
interrupt was triggered, it would frantically cancel DMA transfers in
order to avoid this problem.  I do not know how many of the other
Legacy Unix systems figured out this failure mode --- and I can't
claim that we were brilliant enough to design a system to avoid this
problem.  It just so happened that the brute-force design that we
chose was very well suited for crappy (but way cheaper than a Sun Fire
E10k :-) PC-class hardware.

> I copied Ted, who had his fingers deep in that code, maybe he can correct
> me where I got it wrong.  Details aside, I think this is a place where
> Linux moved the state of the art significantly forward.  There are other
> places but this one is a big deal IMHO, maybe the biggest deal.

So I'm not really sure we can claim to have "moved the state of the
art".  There certainly weren't any brilliant computer science
innovations here.  That sort of thing is more like Soft Updates, of
which Valerie Aurora (formerly Henson) once wrote,

   "I've read this paper at least 15 times, and each time I when get
   to page 7, I'm feeling pretty good and thinking, "Yeah, okay, I
   must be smarter now than the last time I read this because I'm
   getting it this time," - and then I turn to page 8 and my head
   explodes." --- https://lwn.net/Articles/339337/

I will be the first to admit that with ext2/ext3/ext4, especially in
the early days, it was much more about brute force engineering, and
regression testing, and much less about "moving the state of the art".
Certainly those of us who were working on Linux weren't trying to get
papers published in peer reviewed journals or conferences!  (And I've
always thought that Greg Ganger was _way_ smarter than I.  :-)

And if the Lisp Machine hackers looked down on BSD, and complained
that BSD adopted the "Worse is Better" philosophy, while Lisp strived
for the true, elegant, Correct technical solution, it's perhaps
especially interesting to consider that if anything, Linux was an even
more radical example of the "Worse is Better" philosophy.

Cheers,

					- Ted

P.S.  There is yet another example of "Worse is Better" in how Linux
had PCMCIA support several years before FreeBSD/NetBSD.  However, if
you ejected a PCMCIA card in a Linux system, there was a chance (in
practice it worked out to be about 1 in 5 times for a WiFi card, in
my experience) that the system would crash.  The *BSD's took a good
2-3 years longer to get PCMCIA support, but when they did, it was rock
solid.  Of course, if you are a laptop user, and are happy to keep
your 802.11 PCMCIA card permanently installed, guess which OS you were
likely to prefer --- "sloppy but works, mostly", or "it'll get there
eventually, and will be rock solid when it does, but zip, nada, right now"?


> 
> --lm
> 
> On Thu, May 11, 2017 at 04:37:29PM -0400, Ron Natalie wrote:
> > I remember the pre-fsck days.   It was part of my test to become an operator at the UNIX site at JHU that I could run the various manual checks.
> > 
> > The V6 file system wasn't exactly stable during crashes (lousy database behavior), so there was almost certainly something to clean up.
> > 
> >  
> > 
> > The first thing we'd run was icheck.   This runs down the superblock freelist and all the allocated blocks in the inodes.     If there were missing blocks (not in a file or the free list), you could use icheck -s to rebuild it.    The same went for duplicated allocations in the freelist or between the freelist and a single file.   Anything more complicated required some clever patching (typically, we'd just mount readonly, copy the files, and then blow them away with clri).
> > 
> >  
> > 
> > Then you'd run dcheck.   As mentioned, dcheck walks the directory path from the top of the disk counting inode references that it reconciles with the link count in the inode.   Occasionally we'd end up with a 0-0 inode (no directory entries, but allocated; typically this is caused by people removing a file while it is still open, a regular practice of some programs for their /tmp files).    clri again blew these away.
> > 
> >  
> > 
> > Clri wrote zeros all over the inode.   This had the effect of wiping out the file, but it was dangerous if you got the i-number wrong.    We replaced it with "clrm" which just cleared the allocated bit, a lot easier to reverse.
> > 
> >  
> > 
> > If you really had a mess of a file system, you might get a piece of the directory tree broken off from a path to the root.   Or you'd have an inode that icheck reported dups.   ncheck would try to reconcile an inumber into an absolute path.
> > 
> >  
> > 
> > After a while a program called fsdb came around that allowed you to poke at the various file system structures.    We didn't use it much because by the time we had it, fsck was fast on its heels.
> > 
> 
> -- 
> ---
> Larry McVoy            	     lm at mcvoy.com             http://www.mcvoy.com/lm 
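
The 0-0 inode case Ron describes above (a file removed while still open) is
easy to reproduce on a current POSIX system.  A minimal sketch, assuming /tmp
is writable; the function name and the file name template are made up for
illustration and come from nowhere in the V6 sources:

```c
#define _XOPEN_SOURCE 700	/* for mkstemp() under strict -std= modes */

#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a temp file, unlink it while the descriptor is still open,
 * and report the link count the kernel now shows for the open inode.
 * Returns -1 on any error.
 */
long nlink_after_unlink(void)
{
	char path[] = "/tmp/zero_link_XXXXXX";	/* hypothetical template */
	struct stat st;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	if (unlink(path) < 0 || fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	close(fd);	/* only now can the inode's storage be released */
	return (long)st.st_nlink;
}
```

On an ordinary local file system I would expect this to report zero links
while the inode is still allocated, which is the state dcheck would flag if
the machine crashed before the close.  Network file systems may behave
differently.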


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
@ 2017-05-13  1:25 Noel Chiappa
  0 siblings, 0 replies; 77+ messages in thread
From: Noel Chiappa @ 2017-05-13  1:25 UTC (permalink / raw)


    > From: Random832

    > It seems to me that this check is central to being able to (or not)
    > modify the in-core image of any process at all other than the one being
    > traced (say, by attaching to a SUID program that has already dropped
    > privileges, and making changes that will affect the next time it is
    > run).

Right, good catch: if you have a program that was _both_ sticky and SUID, when
the system is idle (so the text copy in the swap area won't get recycled),
call up a copy under the debugger, patch it, exit (leaving the patched copy),
and then re-run it without the debugger.

I'd have to check the handling of patched sticky pure texts - to see if they
are retained or not.

{Checks code.}

Well, the code to do with pure texts is _very_ different between V6 and
PWB1.

The exact approach above might not work in V6, because the modified (in-core)
copies of pure texts are simply deleted when the last user exits them. But it
might be possible for a slight variant to work; leave the copy under the
debugger (which will prevent the in-core copy from being discarded), and then
run it again without the debugger. That might do it.

Under PWB1, I'm not sure if any variant would work (very complicated, and I'm
fading). There's an extra flag bit, XWRIT, which is set when a pure text is
written into; when the last user stops using the in-core pure text, the
modified text is written to swap.  (It looks like the in-core copy is always
discarded when the last user stops using it.) But the check for sticky would
probably stop a sticky pure-text being modified? But maybe the approach that
seems like it would work under V6 (leave the patched, debugger copy running,
and start a new instance) looks like it should work here too.

So maybe the sticky thing is irrelevant? On both V6 and PWB1, it just needs a
pure text which is SETUID: start under the debugger, patch, leave running, and
start a _new_ copy, which will run the patched version as the SUID user.

      Noel



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-13  0:26     ` Dave Horsfall
@ 2017-05-13  0:48       ` Random832
  0 siblings, 0 replies; 77+ messages in thread
From: Random832 @ 2017-05-13  0:48 UTC (permalink / raw)




On Fri, May 12, 2017, at 20:26, Dave Horsfall wrote:
> On Fri, 12 May 2017, Random832 wrote:
> 
> > > > 	if (xp->x_count!=1 || xp->x_iptr->i_mode&ISVTX)
> > > > 		goto error;
> > > 
> > > Err, isn't that the sticky bit, not the setuid bit?
> > 
> > The sticky bit makes it keep the image in memory when there are no 
> > processes using it. I assume x_count is determining whether there are 
> > processes using it. So, taken together, these checks are "is there or 
> > might there be in the future a process, other than the one being 
> > debugged, using this exact copy of the image rather than loading it from 
> > the disk".
> 
> I know that, but the discussion was about the SUID bit, and the ability
> to 
> modify the in-core image of a set-uid program being run...

It seems to me that this check is central to being able to (or not)
modify the in-core image of any process at all other than the one being
traced (say, by attaching to a SUID program that has already dropped
privileges, and making changes that will affect the next time it is
run).


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 23:52   ` Random832
@ 2017-05-13  0:26     ` Dave Horsfall
  2017-05-13  0:48       ` Random832
  0 siblings, 1 reply; 77+ messages in thread
From: Dave Horsfall @ 2017-05-13  0:26 UTC (permalink / raw)


On Fri, 12 May 2017, Random832 wrote:

> > > 	if (xp->x_count!=1 || xp->x_iptr->i_mode&ISVTX)
> > > 		goto error;
> > 
> > Err, isn't that the sticky bit, not the setuid bit?
> 
> The sticky bit makes it keep the image in memory when there are no 
> processes using it. I assume x_count is determining whether there are 
> processes using it. So, taken together, these checks are "is there or 
> might there be in the future a process, other than the one being 
> debugged, using this exact copy of the image rather than loading it from 
> the disk".

I know that, but the discussion was about the SUID bit, and the ability to 
modify the in-core image of a set-uid program being run...

-- 
Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will suffer."


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-13  0:22 ` Clem Cole
@ 2017-05-13  0:23   ` Clem Cole
  0 siblings, 0 replies; 77+ messages in thread
From: Clem Cole @ 2017-05-13  0:23 UTC (permalink / raw)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 373 bytes --]

On Fri, May 12, 2017 at 8:22 PM, Clem Cole <clemc at ccc.com> wrote:

> We should try to look in the PWB 1.0 kernel.
>

As you said, you found it in the PWB1.0 sources... which is really
interesting.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://minnie.tuhs.org/pipermail/tuhs/attachments/20170512/8ddf0c55/attachment.html>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 23:30 Noel Chiappa
  2017-05-12 23:38 ` Dave Horsfall
@ 2017-05-13  0:22 ` Clem Cole
  2017-05-13  0:23   ` Clem Cole
  1 sibling, 1 reply; 77+ messages in thread
From: Clem Cole @ 2017-05-13  0:22 UTC (permalink / raw)


Interesting...   I don't remember Gettys being at the meeting (I would get
to know Jim a few years later when he was at Princeton before he came back
to MIT to work on X) and he's been a friend of mine for a number of years
(actually lives in the next town over).

I do not remember all the details of the bug at this point, too many beers
ago; but yes, the gist of the issue was being able to write to user memory
with a ptraced process, with SUID being involved.

The only thing that worries me about your response is I thought I remembered
that the MMU was somehow involved.   Just turning off SUID was not the only
part of the solution.

I do remember that the bug was in the Research kernel at the time and
Dennis had not known about it until that meeting so if PWB had it fixed,
that's an example of something that did not go back, which I would find
surprising.

I suspect MIT found and fixed it independently, but it never got passed
back for whatever reason.

We should try to look in the PWB 1.0 kernel.

Clem

On Fri, May 12, 2017 at 7:30 PM, Noel Chiappa <jnc at mercury.lcs.mit.edu>
wrote:

>     > From: Clem Cole
>
>     > I said -- profil - I intended to say  ptrace(2)
>
> Is that the one where running an SUID program under the debugger allowed
> one
> to patch the in-core image of said program?
>
> If so, I have a story, and a puzzle, about that.
>
>
> A couple of us, including Jim Gettys (later of X-windows fame) were on our
> way
> out to dinner one evening (I don't recall when, alas, but I didn't meet him
> until '80 or so), and he mentioned this horrible Unix security bug that had
> just been found. All he would tell me about it (IIRC) was that it involved
> ptrace.
>
> So, over dinner (without the source) I figured out what it had to be:
> patching SUID programs. So I asked him if that was what it was, and I don't
> recall his exact answer, but I vaguely recall he hemmed and hawed in a way
> that let me know I'd worked it out.
>
> So when we got back from dinner, I looked at the source to our system to
> see
> if I was right, and.... it had already been fixed! Here's the code:
>
>         if (xp->x_count!=1 || xp->x_iptr->i_mode&ISVTX)
>                 goto error;
>
> Now, we'd been running that system since '77 (when I joined CSR), without
> any
> changes to that part of the OS, so I'm pretty sure this fix pre-dates your
> story?
>
> So when I saw your email about this, I wondered 'did that bug get fixed at
> MIT when some undergrad used it to break in' (I _think_ ca. '77 is when
> they
> switched from an OS called Delphi on the -11/45 used for the undergrad CS
> programming course - I _think_ they switched that machine from Delphi to
> Unix), or did it come with PWB1? (Like I said, that system was mostly
> PWB1.)
>
> So I just looked in the PWB1 sources, and... there it is, the _exact_ same
> fix. So we must have got it from PWB1.
>
> So now the question is: did the PWB guys find and fix this, and forget to
> tell the research guys? Or did they tell them, and the research guys blew
> them off? Or what?
>
>         Noel
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 23:38 ` Dave Horsfall
@ 2017-05-12 23:52   ` Random832
  2017-05-13  0:26     ` Dave Horsfall
  0 siblings, 1 reply; 77+ messages in thread
From: Random832 @ 2017-05-12 23:52 UTC (permalink / raw)


On Fri, May 12, 2017, at 19:38, Dave Horsfall wrote:
> On Fri, 12 May 2017, Noel Chiappa wrote:
> 
> > So when we got back from dinner, I looked at the source to our system to see
> > if I was right, and.... it had already been fixed! Here's the code:
> > 
> > 	if (xp->x_count!=1 || xp->x_iptr->i_mode&ISVTX)
> > 		goto error;
> 
> Err, isn't that the sticky bit, not the setuid bit?

The sticky bit makes it keep the image in memory when there are no
processes using it. I assume x_count is determining whether there are
processes using it. So, taken together, these checks are "is there or
might there be in the future a process, other than the one being
debugged, using this exact copy of the image rather than loading it from
the disk".

The next line is "xp->x_iptr->i_flag &= ~ITEXT", which I assume prevents
the image from being reused for other processes started while this one
is running.

I am looking at 7th edition in the UnixTree site, the whole fix is:

		/*
		 * If text, must assure exclusive use
		 */
		if (xp = u.u_procp->p_textp) {
			if (xp->x_count!=1 || xp->x_iptr->i_mode&ISVTX)
				goto error;
			xp->x_iptr->i_flag &= ~ITEXT;
		}
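
As a rough illustration of what that test decides, here is a sketch only,
using cut-down stand-ins for the kernel's text and inode structures rather
than the real V7 declarations.  The names text_model, inode_model and
may_patch_text are made up here, and the ISVTX value is assumed to match
V7's:

```c
#include <stdbool.h>
#include <stddef.h>

#define ISVTX 01000	/* sticky bit; value assumed to match V7 */

/* Cut-down stand-ins for the kernel's inode and text table entries. */
struct inode_model {
	int i_mode;
};

struct text_model {
	int x_count;			/* processes sharing this text */
	struct inode_model *x_iptr;	/* inode the text was loaded from */
};

/*
 * Mirrors the quoted condition: a write into instruction space is
 * allowed only if nobody else is using this copy of the text now
 * (x_count == 1) and nobody could pick it up later (not sticky).
 */
bool may_patch_text(const struct text_model *xp)
{
	if (xp == NULL)
		return true;	/* no shared text segment at all */
	if (xp->x_count != 1 || (xp->x_iptr->i_mode & ISVTX))
		return false;
	return true;
}
```

The sketch leaves out the ITEXT clearing on the following line, which is a
side effect on the inode rather than part of the decision.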

The equivalent section in 6th edition doesn't have the fix and, the
comment claims, doesn't work at all:

	/* write user I (for now, always an error) */
	case 4:
		if (suiword(ipc.ip_addr, 0) < 0)
			goto error;
		suiword(ipc.ip_addr, ipc.ip_data);
		break;

This is clearly PDP-11 specific, maybe a similar bug reappeared with
demand-paged virtual memory. 


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 23:30 Noel Chiappa
@ 2017-05-12 23:38 ` Dave Horsfall
  2017-05-12 23:52   ` Random832
  2017-05-13  0:22 ` Clem Cole
  1 sibling, 1 reply; 77+ messages in thread
From: Dave Horsfall @ 2017-05-12 23:38 UTC (permalink / raw)


On Fri, 12 May 2017, Noel Chiappa wrote:

> So when we got back from dinner, I looked at the source to our system to see
> if I was right, and.... it had already been fixed! Here's the code:
> 
> 	if (xp->x_count!=1 || xp->x_iptr->i_mode&ISVTX)
> 		goto error;

Err, isn't that the sticky bit, not the setuid bit?

-- 
Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will suffer."


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
@ 2017-05-12 23:30 Noel Chiappa
  2017-05-12 23:38 ` Dave Horsfall
  2017-05-13  0:22 ` Clem Cole
  0 siblings, 2 replies; 77+ messages in thread
From: Noel Chiappa @ 2017-05-12 23:30 UTC (permalink / raw)


    > From: Clem Cole

    > I said -- profil - I intended to say  ptrace(2)

Is that the one where running an SUID program under the debugger allowed one
to patch the in-core image of said program?

If so, I have a story, and a puzzle, about that.


A couple of us, including Jim Gettys (later of X-windows fame) were on our way
out to dinner one evening (I don't recall when, alas, but I didn't meet him
until '80 or so), and he mentioned this horrible Unix security bug that had
just been found. All he would tell me about it (IIRC) was that it involved
ptrace.

So, over dinner (without the source) I figured out what it had to be:
patching SUID programs. So I asked him if that was what it was, and I don't
recall his exact answer, but I vaguely recall he hemmed and hawed in a way
that let me know I'd worked it out.

So when we got back from dinner, I looked at the source to our system to see
if I was right, and.... it had already been fixed! Here's the code:

	if (xp->x_count!=1 || xp->x_iptr->i_mode&ISVTX)
		goto error;

Now, we'd been running that system since '77 (when I joined CSR), without any
changes to that part of the OS, so I'm pretty sure this fix pre-dates your
story?

So when I saw your email about this, I wondered 'did that bug get fixed at
MIT when some undergrad used it to break in' (I _think_ ca. '77 is when they
switched from an OS called Delphi on the -11/45 used for the undergrad CS
programming course - I _think_ they switched that machine from Delphi to
Unix), or did it come with PWB1? (Like I said, that system was mostly PWB1.)

So I just looked in the PWB1 sources, and... there it is, the _exact_ same
fix. So we must have got it from PWB1.

So now the question is: did the PWB guys find and fix this, and forget to
tell the research guys? Or did they tell them, and the research guys blew
them off? Or what?

	Noel


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 21:12           ` Dave Horsfall
@ 2017-05-12 23:25             ` Hellwig Geisse
  0 siblings, 0 replies; 77+ messages in thread
From: Hellwig Geisse @ 2017-05-12 23:25 UTC (permalink / raw)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 1022 bytes --]

On Sa, 2017-05-13 at 07:12 +1000, Dave Horsfall wrote:
> 
> Let's see:
> 
>     aneurin% cdecl
>     Type `help' or `?' for help
>     explain void (*p)(int)
>     declare p as pointer to function (int) returning void
> 
> So the "fundamental" type (if there was such a thing) would be a
> pointer to a function, I guess i.e. don't treat it as anything else.
> 

Yes, of course. What I was aiming at: If you try
to declare two of these variables, neither
"void (*p,q)(int)" nor "void (*(p,q))(int)"
is allowed, so you cannot use the "fundamental
type" to declare more than one variable of this
type in a single declaration list (as you had
suggested with "char* cp1, cp2").

"void (*p)(int), (*q)(int)" in contrast is legal,
but I wouldn't call "void" the fundamental type
in these declarations. Thus my statement "list
construction (in declarations) and C declarations
don't mix well" - IMO one of the difficulties in
reading/writing C declarations, and the starting
point of this discussion.
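
One common workaround, sketched here on the assumption that a typedef is
acceptable (handler_t, set_value, add_value and demo are illustrative names
only): name the pointer type first, and the list then declares two pointers.

```c
/* Name the pointer-to-function type once (handler_t is a made-up name). */
typedef void (*handler_t)(int);

static int total;

static void set_value(int n) { total = n; }
static void add_value(int n) { total += n; }

/*
 * p and q are declared in one list; both are pointers to function (int)
 * returning void, which "void (*p,q)(int)" cannot express.
 */
int demo(int n)
{
	handler_t p = set_value, q = add_value;

	p(n);	/* total = n */
	q(n);	/* total = n + n */
	return total;
}
```

With the typedef the declared name is a plain identifier again, so the list
construction behaves the way it does for "int a, b".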

Hellwig


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 20:40       ` Jeremy C. Reed
@ 2017-05-12 21:29         ` Clem Cole
  0 siblings, 0 replies; 77+ messages in thread
From: Clem Cole @ 2017-05-12 21:29 UTC (permalink / raw)


Excuse me - I said -- profil - I intended to say  ptrace(2)....   long day
-- aging brain bits.

http://man.cat-v.org/unix_7th/2/ptrace

The debugger system call that allows you to access a process's memory.

On Fri, May 12, 2017 at 4:40 PM, Jeremy C. Reed <reed at reedmedia.net> wrote:

> profil code somewhere in here?
> https://github.com/weiss/original-bsd/commits/master/
> sys/kern/kern_clock.c?after=b44636d7febc9dcf553118bd320571864188351d+104
>
> that has the sccs history back to April 1980 for src/sys/sys/clock.c
>
>
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 18:56 ` Dan Cross
  2017-05-12 19:43   ` Clem Cole
@ 2017-05-12 21:29   ` Ron Natalie
  1 sibling, 0 replies; 77+ messages in thread
From: Ron Natalie @ 2017-05-12 21:29 UTC (permalink / raw)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 891 bytes --]

Allegedly at one point Dennis Mumaugh over at the NSA had a list of UNIX security bugs that became classified.

Mike Muuss (and the rest of us BRLers)  decided not to request a copy because we figured we’d be better off not being constrained by knowing some bug we were disseminating information on (to other system administrators) was classified.    We did have a little informal meeting with Dennis in the hall and ran through our list of known issues at the time.

 

My favorite isn’t so much a UNIX bug but a PDP-11 hardware bug.    It’s just that I think it may be impossible on a DEC operating system to create a program that manifests it (you have to fill your entire address space with SPL instructions).

 



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12  6:24         ` Hellwig Geisse
@ 2017-05-12 21:12           ` Dave Horsfall
  2017-05-12 23:25             ` Hellwig Geisse
  0 siblings, 1 reply; 77+ messages in thread
From: Dave Horsfall @ 2017-05-12 21:12 UTC (permalink / raw)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 870 bytes --]

On Fri, 12 May 2017, Hellwig Geisse wrote:

> >     char*	cp1;
> >     char*	cp2;
> > 
> > etc, which IMHO makes it clear (which is every programmer's duty). I 
> > used  to write that way in a previous life, and the boss didn't 
> > complain.
> 
> This view does not work well with more complicated declarations like 
> "void (*p)(int)". What is the "fundamental type" here? One could argue 
> that the real culprit is the list construction, which does not mix well 
> with C declarations.

Let's see:

    aneurin% cdecl
    Type `help' or `?' for help
    explain void (*p)(int)
    declare p as pointer to function (int) returning void

So the "fundamental" type (if there was such a thing) would be a pointer 
to a function, I guess i.e. don't treat it as anything else.

-- 
Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will suffer."


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 20:06     ` Clem Cole
@ 2017-05-12 20:40       ` Jeremy C. Reed
  2017-05-12 21:29         ` Clem Cole
  0 siblings, 1 reply; 77+ messages in thread
From: Jeremy C. Reed @ 2017-05-12 20:40 UTC (permalink / raw)


profil code somewhere in here?
https://github.com/weiss/original-bsd/commits/master/sys/kern/kern_clock.c?after=b44636d7febc9dcf553118bd320571864188351d+104

that has the sccs history back to April 1980 for src/sys/sys/clock.c




^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 19:43   ` Clem Cole
@ 2017-05-12 20:06     ` Clem Cole
  2017-05-12 20:40       ` Jeremy C. Reed
  0 siblings, 1 reply; 77+ messages in thread
From: Clem Cole @ 2017-05-12 20:06 UTC (permalink / raw)


BTW:  As I think more about it, I believe that it is probable that George
found it on the PDP-11 first (V7) not yet BSD 3.0/4.0 (although I've
forgotten the date or which USENIX we had the meeting).  I'm not sure why,
but I remember putting the fix into multiple kernels (due to specifics of
the different MMUs).   Because of that memory, I'm going to date it as
probably 1981, because I would have had 3 different kernels to play with:
PDP-11, Vax, and Magix (Glaser and I were writing Magix, an OS for the Tek
Magnolia, a 68000-based workstation being built in Tek Labs).

But I'm sure if we go hunting through different people's records around
that time, changes to the profile kernel code for security will show up.
As I said, George originally found it and we all grabbed his fix... so I
would expect to see a lot of updated kernels with similar changes around
the same time; but all put in by different people - but I fear, it was also
early enough that this was before SCCS was widely being used; so tracking
it can only be done by looking at distribution tapes from those times.

Clem

On Fri, May 12, 2017 at 3:43 PM, Clem Cole <clemc at ccc.com> wrote:

> Could be.   The scare was that the anti-UNIX folks would get wind of it
> and it would be used in the fight as to why VMS was 'better.' The CS Research
> community has not yet made the switch off the 36 bit world to Vaxen, so the
> Arpanet community is still pretty much PDP-10 central; but it was also
> right around the time when DARPA was defunding the PDP-10; had chosen the
> VAX but was arguing VMS vs UNIX.
>
> I don't think CSRG had been funded as a group yet. Joy might have done his
> 'fast vax' paper to show that UNIX was just as good as VMS, but that work
> might be on the horizon.  Certainly all of 4.1a/b/c, 4.2, 4.3, NET-x was
> years away.
>
> The point is that you didn't (yet) have a mass of students on the systems
> 'in the field', but some folks had that as a vision (and want it to be that
> way) and are scared that if something bad happens 'in the press' - it would
> cause a setback.
>
> At that time, I think a couple of Universities are >>starting<< to use
> UNIX for general CS classes/teaching (Purdue & UCB being two of them),
> maybe Michigan and U of I, but I think CMU and Stanford are still using
> PDP-20's [not sure about MIT] (where Princeton and UCLA I think were still
> IBM shops for undergrads).
>
> So the whole reason to keep it quiet @ the USENIX conference was because
> it was felt at the time, the folks in that room were the primary people
> hacking the kernel and if we all took the couple of lines of fix back to
> our shops, the problem was solved.
>
> It sort of blows my mind if Doug never knew about it, in hindsight it
> seems George got his wish!!
>
> Clem
>
> On Fri, May 12, 2017 at 2:56 PM, Dan Cross <crossd at gmail.com> wrote:
>
>> On Fri, May 12, 2017 at 2:43 PM, Doug McIlroy <doug at cs.dartmouth.edu>
>> wrote:
>>>
>>> >  We all took the code back and promised to get patches out ASAP and
>>> not tell any one about it.
>>>
>>> Fascinating. Changes were installed frequently in the Unix lab, mostly
>>> at night without fanfare. But an actual zero-day should have been big
>>> enough news for me to have heard about. I'm pretty sure I didn't; Dennis
>>> evidently kept his counsel.
>>
>>
>> I wonder if such a thing would have been treated the same way within Bell
>> Labs as outside?
>>
>> Presumably you didn't have to worry about hordes of undergraduates
>> picking over your systems looking for ways to get root access. Or, indeed,
>> undergraduates doing anything on your systems, save for the occasional
>> intern or precocious child of an employee. For that matter, this raises a
>> question: what was the attitude towards root access within the labs? Was it
>> constrained to the anointed few or did a large-ish number of people have it?
>>
>> Anyway, I could well imagine a scenario where Dennis comes back but
>> thinks fairly little of it and makes vague mention of a fairly serious bug
>> but gives it little more thought than any other fairly serious bug. It's
>> patched and folks go on with their lives, since it's much less likely to be
>> the source of irritation in a corporate research department than it would be
>> in, say, a university.
>>
>>         - Dan C.
>>
>>
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 18:56 ` Dan Cross
@ 2017-05-12 19:43   ` Clem Cole
  2017-05-12 20:06     ` Clem Cole
  2017-05-12 21:29   ` Ron Natalie
  1 sibling, 1 reply; 77+ messages in thread
From: Clem Cole @ 2017-05-12 19:43 UTC (permalink / raw)


Could be.   The scare was that the anti-UNIX folks would get wind of it and
it would be used in the fight as to why VMS was 'better.' The CS Research
community has not yet made the switch off the 36 bit world to Vaxen, so the
Arpanet community is still pretty much PDP-10 central; but it was also
right around the time when DARPA was defunding the PDP-10; had chosen the
VAX but was arguing VMS vs UNIX.

I don't think CSRG had been funded as a group yet. Joy might have done his
'fast vax' paper to show that UNIX was just as good as VMS, but that work
might be on the horizon.  Certainly all of 4.1a/b/c, 4.2, 4.3, NET-x was
years away.

The point is that you didn't (yet) have a mass of students on the systems
'in the field', but some folks had that as a vision (and want it to be that
way) and are scared that if something bad happens 'in the press' - it would
cause a setback.

At that time, I think a couple of Universities are >>starting<< to use
UNIX for general CS classes/teaching (Purdue & UCB being two of them),
maybe Michigan and U of I, but I think CMU and Stanford are still using
PDP-20's [not sure about MIT] (where Princeton and UCLA I think were still
IBM shops for undergrads).

So the whole reason to keep it quiet @ the USENIX conference was because it
was felt at the time, the folks in that room were the primary people
hacking the kernel and if we all took the couple of lines of fix back to
our shops, the problem was solved.

It sort of blows my mind if Doug never knew about it, in hindsight it
seems George got his wish!!

Clem

On Fri, May 12, 2017 at 2:56 PM, Dan Cross <crossd at gmail.com> wrote:

> On Fri, May 12, 2017 at 2:43 PM, Doug McIlroy <doug at cs.dartmouth.edu>
> wrote:
>>
>> >  We all took the code back and promised to get patches out ASAP and not
>> tell any one about it.
>>
>> Fascinating. Changes were installed frequently in the Unix lab, mostly
>> at night without fanfare. But an actual zero-day should have been big
>> enough news for me to have heard about. I'm pretty sure I didn't; Dennis
>> evidently kept his counsel.
>
>
> I wonder if such a thing would have been treated the same way within Bell
> Labs as outside?
>
> Presumably you didn't have to worry about hordes of undergraduates picking
> over your systems looking for ways to get root access. Or, indeed,
> undergraduates doing anything on your systems, save for the occasional
> intern or precocious child of an employee. For that matter, this raises a
> question: what was the attitude towards root access within the labs? Was it
> constrained to the anointed few or did a large-ish number of people have it?
>
> Anyway, I could well imagine a scenario where Dennis comes back but thinks
> fairly little of it and makes vague mention of a fairly serious bug but
> gives it little more thought than any other fairly serious bug. It's
> patched and folks go on with their lives, since it's much less likely to be
> the source of irritation in a corporate research department than it would be
> in, say, a university.
>
>         - Dan C.
>
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 18:43 Doug McIlroy
@ 2017-05-12 18:56 ` Dan Cross
  2017-05-12 19:43   ` Clem Cole
  2017-05-12 21:29   ` Ron Natalie
  0 siblings, 2 replies; 77+ messages in thread
From: Dan Cross @ 2017-05-12 18:56 UTC (permalink / raw)


On Fri, May 12, 2017 at 2:43 PM, Doug McIlroy <doug at cs.dartmouth.edu> wrote:
>
> >  We all took the code back and promised to get patches out ASAP and not
> tell any one about it.
>
> Fascinating. Changes were installed frequently in the Unix lab, mostly
> at night without fanfare. But an actual zero-day should have been big
> enough news for me to have heard about. I'm pretty sure I didn't; Dennis
> evidently kept his counsel.


I wonder if such a thing would have been treated the same way within Bell
Labs as outside?

Presumably you didn't have to worry about hordes of undergraduates picking
over your systems looking for ways to get root access. Or, indeed,
undergraduates doing anything on your systems, save for the occasional
intern or precocious child of an employee. For that matter, this raises a
question: what was the attitude towards root access within the labs? Was it
constrained to the anointed few or did a large-ish number of people have it?

Anyway, I could well imagine a scenario where Dennis comes back but thinks
fairly little of it and makes vague mention of a fairly serious bug but
gives it little more thought than any other fairly serious bug. It's
patched and folks go on with their lives, since it's much less likely to be
the source of irritation in a corporate research department than it would be
in, say, a university.

        - Dan C.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
@ 2017-05-12 18:43 Doug McIlroy
  2017-05-12 18:56 ` Dan Cross
  0 siblings, 1 reply; 77+ messages in thread
From: Doug McIlroy @ 2017-05-12 18:43 UTC (permalink / raw)



>  We all took the code back and promised to get patches out ASAP and not tell any one about it.

Fascinating. Changes were installed frequently in the Unix lab, mostly
at night without fanfare. But an actual zero-day should have been big
enough news for me to have heard about. I'm pretty sure I didn't; Dennis
evidently kept his counsel.

Doug


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 15:52                     ` Chet Ramey
@ 2017-05-12 16:21                       ` Warner Losh
  0 siblings, 0 replies; 77+ messages in thread
From: Warner Losh @ 2017-05-12 16:21 UTC (permalink / raw)


On Fri, May 12, 2017 at 9:52 AM, Chet Ramey <chet.ramey at case.edu> wrote:
> On 5/12/17 10:30 AM, Larry McVoy wrote:
>> On Fri, May 12, 2017 at 02:56:59PM +0100, Tim Bradshaw wrote:
>>> When I found out about this I thought seriously of shorting Sun's
>>> stock (if I knew how to do that).  I would have made money.  As it was
>>> we stuck with logged UFS which, by 2007 or so was seriously bulletproof.
>>
>> Wait, someone added logging to UFS?  Is there a writeup of that anywhere?
>
> You could look at the soft updates paper from 1999 for Kirk's perspective.
>
> https://www.usenix.org/legacy/event/usenix99/full_papers/mckusick/mckusick.pdf
>
> There was a paper about journaled soft updates, too:
>
> https://www.mckusick.com/softdep/suj.pdf

There's also Margo's paper on LFS, which added logging to UFS, though
by the time it was over it wasn't recognizable.

https://www.usenix.org/publications/library/proceedings/sd93/seltzer.pdf

Journaled soft updates, though, is where someone added an intent log to
UFS + SoftUpdates. This was used to replay the last few operations
when doing fsck coming back up after a crash. There's also something
called gjournal, which adds a different type of journaling to UFS, but
the less said about that train-wreck the better.
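
As a rough illustration of the "replay the last few operations" idea, here
is a minimal sketch in C. It is not the SUJ on-disk format or the FreeBSD
fsck code; the record layout, NBLOCKS and replay_log() are all made up for
the example. It only shows the general shape: apply fully written log
records in order and stop at the first incomplete one.

```c
#include <assert.h>
#include <stddef.h>

enum { NBLOCKS = 8 };           /* size of the toy "disk" (hypothetical) */

/* One intent-log record: "block blkno should contain value". */
struct log_rec {
    int blkno;
    int value;
    int committed;              /* nonzero once the record was fully logged */
};

/*
 * Replay the log against the block array after a crash.
 * Records are applied in order; replay stops at the first record that
 * was not committed (a torn tail) or that names an invalid block.
 * Returns the number of records applied.
 */
size_t replay_log(const struct log_rec *log, size_t n, int blocks[NBLOCKS])
{
    size_t applied = 0;

    assert(log != NULL && blocks != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!log[i].committed)
            break;
        if (log[i].blkno < 0 || log[i].blkno >= NBLOCKS)
            break;
        blocks[log[i].blkno] = log[i].value;   /* idempotent: safe to redo */
        applied++;
    }
    return applied;
}
```

Because each record just sets a block to a value, running the replay twice
gives the same result, which is the property that makes replay after a
crash safe in this toy model.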

Warner


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 14:30                   ` Larry McVoy
  2017-05-12 15:11                     ` Tim Bradshaw
@ 2017-05-12 15:52                     ` Chet Ramey
  2017-05-12 16:21                       ` Warner Losh
  1 sibling, 1 reply; 77+ messages in thread
From: Chet Ramey @ 2017-05-12 15:52 UTC (permalink / raw)


On 5/12/17 10:30 AM, Larry McVoy wrote:
> On Fri, May 12, 2017 at 02:56:59PM +0100, Tim Bradshaw wrote:
>> When I found out about this I thought seriously of shorting Sun's
>> stock (if I knew how to do that).  I would have made money.  As it was
>> we stuck with logged UFS which, by 2007 or so was seriously bulletproof.
> 
> Wait, someone added logging to UFS?  Is there a writeup of that anywhere?

You could look at the soft updates paper from 1999 for Kirk's perspective.

https://www.usenix.org/legacy/event/usenix99/full_papers/mckusick/mckusick.pdf

There was a paper about journaled soft updates, too:

https://www.mckusick.com/softdep/suj.pdf

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet at case.edu    http://cnswww.cns.cwru.edu/~chet/


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 15:18   ` Clem Cole
@ 2017-05-12 15:46     ` Clem Cole
  0 siblings, 0 replies; 77+ messages in thread
From: Clem Cole @ 2017-05-12 15:46 UTC (permalink / raw)


I just did a small amount of hunting.  The oldest printed USENIX Proceedings
seems to be from 1983 [which was the one where Rob gave 'cat -v considered
harmful' - although only the abstract is in it].

George did the ordered writes work earlier as I was still at Tektronix,
because I remember getting a tape from him and putting the changes into our
V7 system.

If we hunt around for a 'Purdue-EE' distribution circa '79-'81 we should be
able to find it.  BTW: as a piece of history for Diomidis, on that same tape
is a fix for one of the first '0-day' UNIX exploits I can remember.  I'll see
if I can find it and identify it for you.   That would be a good piece of
history to call out.

The story is this ...

George was very upset when he found it.  But this was during the time when
UNIX was fighting a bit for its life in the press as not being a 'real'
OS.   DEC and IBM were making claims that it was a toy, *etc*.   So most of
the hacker community took it pretty seriously.   It is funny, today we
would react in the opposite manner.  But there was a big 'hush-hush'
meeting at a Summer USENIX that was very exclusive to be invited to.    We
were in a private conference room, the door was locked, etc.   I remember
that Dennis was there, Joy was there. Ron's old friend Mike must have been
in it.  I think a couple of the Rand folks.   Anyway - it was an issue with
profile(2) -- surprise, surprise.   Pretty easy fix.   We all took the code
back and promised to get patches out ASAP and not tell anyone about it.

Clem

On Fri, May 12, 2017 at 11:18 AM, Clem Cole <clemc at ccc.com> wrote:

> I should have said -- it was not hypothetical -- George implemented it
> and published the code and we all picked it up,
>
> On Fri, May 12, 2017 at 11:17 AM, Clem Cole <clemc at ccc.com> wrote:
>
>> George Goble of Purdue did the FS work to V7/4.1 to fix the FS
>> corruption issues.   That was taken back by Kirk (wnj) and incorporated in
>> 4.1A.    It may have been before USENIX was creating proceedings.   I'll
>> have to look on my shelf at home or maybe ask George.
>>
>> Clem
>>
>> On Fri, May 12, 2017 at 11:12 AM, Noel Chiappa <jnc at mercury.lcs.mit.edu>
>> wrote:
>>
>>>     > From: "Ron Natalie"
>>>
>>>     > Ordered writes go back to the original BSD fast file system, no?
>>> I seem
>>>     > to recall that when we switched from our V6/V7 disks, the
>>> filesystem got
>>>     > a lot more stable in crashes.
>>>
>>> I had a vague memory of reading about that, so I looked in the canonical
>>> FFS paper (McKusick et al, "A Fast File System for UNIX" [1984]) but
>>> found no mention of it.
>>>
>>> I did find a paper about 'fsck' (McKusick, Kowalski, "Fsck: The UNIX File
>>> System Check Program") which talks (in Section 2.5. "Updates to the file
>>> system") about how "problem[s] with asynchronous inode updates can be
>>> avoided
>>> by doing all inode deallocations synchronously", but it's not clear if
>>> they're
>>> talking about something that was actually done, or just saying
>>> (hypothetically) that that's how one would fix it.
>>>
>>> Is it possible that the changes to the file system (e.g. the way free
>>> blocks were kept) made it more crash-proof?
>>>
>>>      Noel
>>>
>>>
>>
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 15:17 ` Clem Cole
@ 2017-05-12 15:18   ` Clem Cole
  2017-05-12 15:46     ` Clem Cole
  0 siblings, 1 reply; 77+ messages in thread
From: Clem Cole @ 2017-05-12 15:18 UTC (permalink / raw)


I should have said -- it was not hypothetical -- George implemented it and
published the code and we all picked it up.

On Fri, May 12, 2017 at 11:17 AM, Clem Cole <clemc at ccc.com> wrote:

> George Goble of Purdue did the FS work to V7/4.1 to fix the FS corruption
> issues.   That was taken back by Kirk (wnj) and incorporated in 4.1A.    It
> may have been before USENIX was creating proceedings.   I'll have to look
> on my shelf at home or maybe ask George.
>
> Clem
>
> On Fri, May 12, 2017 at 11:12 AM, Noel Chiappa <jnc at mercury.lcs.mit.edu>
> wrote:
>
>>     > From: "Ron Natalie"
>>
>>     > Ordered writes go back to the original BSD fast file system, no?  I
>> seem
>>     > to recall that when we switched from our V6/V7 disks, the
>> filesystem got
>>     > a lot more stable in crashes.
>>
>> I had a vague memory of reading about that, so I looked in the canonical
>> FFS paper (McKusick et al, "A Fast File System for UNIX" [1984]) but
>> found no mention of it.
>>
>> I did find a paper about 'fsck' (McKusick, Kowalski, "Fsck: The UNIX File
>> System Check Program") which talks (in Section 2.5. "Updates to the file
>> system") about how "problem[s] with asynchronous inode updates can be
>> avoided
>> by doing all inode deallocations synchronously", but it's not clear if
>> they're
>> talking about something that was actually done, or just saying
>> (hypothetically) that that's how one would fix it.
>>
>> Is it possible that the changes to the file system (e.g. the way free
>> blocks were kept) made it more crash-proof?
>>
>>      Noel
>>
>>
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 15:12 Noel Chiappa
@ 2017-05-12 15:17 ` Clem Cole
  2017-05-12 15:18   ` Clem Cole
  0 siblings, 1 reply; 77+ messages in thread
From: Clem Cole @ 2017-05-12 15:17 UTC (permalink / raw)


George Goble of Purdue did the FS work to V7/4.1 to fix the FS corruption
issues.   That was taken back by Kirk (wnj) and incorporated in 4.1A.    It
may have been before USENIX was creating proceedings.   I'll have to look
on my shelf at home or maybe ask George.

Clem

On Fri, May 12, 2017 at 11:12 AM, Noel Chiappa <jnc at mercury.lcs.mit.edu>
wrote:

>     > From: "Ron Natalie"
>
>     > Ordered writes go back to the original BSD fast file system, no?  I
> seem
>     > to recall that when we switched from our V6/V7 disks, the filesystem
> got
>     > a lot more stable in crashes.
>
> I had a vague memory of reading about that, so I looked in the canonical
> FFS paper (McKusick et al, "A Fast File System for UNIX" [1984]) but
> found no mention of it.
>
> I did find a paper about 'fsck' (McKusick, Kowalski, "Fsck: The UNIX File
> System Check Program") which talks (in Section 2.5. "Updates to the file
> system") about how "problem[s] with asynchronous inode updates can be
> avoided
> by doing all inode deallocations synchronously", but it's not clear if
> they're
> talking about something that was actually done, or just saying
> (hypothetically) that that's how one would fix it.
>
> Is it possible that the changes to the file system (e.g. the way free
> blocks were kept) made it more crash-proof?
>
>      Noel
>
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
@ 2017-05-12 15:12 Noel Chiappa
  2017-05-12 15:17 ` Clem Cole
  0 siblings, 1 reply; 77+ messages in thread
From: Noel Chiappa @ 2017-05-12 15:12 UTC (permalink / raw)


    > From: "Ron Natalie"

    > Ordered writes go back to the original BSD fast file system, no?  I seem
    > to recall that when we switched from our V6/V7 disks, the filesystem got
    > a lot more stable in crashes.

I had a vague memory of reading about that, so I looked in the canonical FFS
paper (McKusick et al, "A Fast File System for UNIX" [1984]) but found no
mention of it.

I did find a paper about 'fsck' (McKusick, Kowalski, "Fsck: The UNIX File
System Check Program") which talks (in Section 2.5. "Updates to the file
system") about how "problem[s] with asynchronous inode updates can be avoided
by doing all inode deallocations synchronously", but it's not clear if they're
talking about something that was actually done, or just saying
(hypothetically) that that's how one would fix it.
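
To make the ordering argument concrete, here is a toy model in C. It is
only a sketch under stated assumptions: the struct, the function names and
the single-inode "disk" are invented for illustration and are not taken
from any Unix kernel. The invariant checked is the one an fsck would care
about: a directory entry must never name a free inode, no matter where a
crash interrupts the sequence of writes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy "disk": one inode and one directory slot that may point at it. */
struct disk {
    bool inode_allocated;
    bool dirent_present;
};

/* Invariant: a directory entry never refers to an unallocated inode. */
bool consistent(const struct disk *d)
{
    return !d->dirent_present || d->inode_allocated;
}

/* Create a file with ordered writes; "crash" after `steps` writes (0..2). */
struct disk create_ordered(int steps)
{
    struct disk d = { false, false };

    assert(steps >= 0);
    if (steps >= 1) d.inode_allocated = true;   /* write 1: the inode    */
    if (steps >= 2) d.dirent_present = true;    /* write 2: the dir entry */
    return d;
}

/* Same operation with the writes issued in the opposite order. */
struct disk create_unordered(int steps)
{
    struct disk d = { false, false };

    assert(steps >= 0);
    if (steps >= 1) d.dirent_present = true;    /* dir entry hits disk first */
    if (steps >= 2) d.inode_allocated = true;
    return d;
}

/* Remove a file: drop the directory entry first, then free the inode. */
struct disk remove_ordered(int steps)
{
    struct disk d = { true, true };

    assert(steps >= 0);
    if (steps >= 1) d.dirent_present = false;   /* write 1: clear dir entry */
    if (steps >= 2) d.inode_allocated = false;  /* write 2: free the inode  */
    return d;
}
```

In this model a crash during the ordered sequences can at worst leave an
allocated inode with no name, which is a leak that fsck can reclaim, while
the unordered create can leave a name pointing at a free inode.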

Is it possible that the changes to the file system (e.g. the way free blocks
were kept) made it more crash-proof?

     Noel



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 14:30                   ` Larry McVoy
@ 2017-05-12 15:11                     ` Tim Bradshaw
  2017-05-12 15:52                     ` Chet Ramey
  1 sibling, 0 replies; 77+ messages in thread
From: Tim Bradshaw @ 2017-05-12 15:11 UTC (permalink / raw)


On 12 May 2017, at 15:30, Larry McVoy <lm at mcvoy.com> wrote:
> 
> Wait, someone added logging to UFS?

Oh, yes.  I forget when it came in (Solaris 2.5?).  It's been the default (i.e. you need to turn it off in vfstab rather than turn it on) for some time, perhaps since Solaris 9?

> 
> Yep, someone did.  I'd like to know who.  I found this:
> 
> http://www.oracle.com/technetwork/systems/linux/fs-performance-149840.pdf
> 
> Can anyone confirm those results?

I can't confirm them, but I can confirm that a lot of rubbish has been talked about filesystem performance by people with various agendas.  I had an argument (well over 10 years ago now) with someone who claimed that ext2 (I guess, might have been ext3 by then) was just way faster than UFS for various operations (lots of file creation/deletion I think).  It was ... if you left logging off.  If you turned logging on, not so much.

--tim


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 13:56                 ` Tim Bradshaw
  2017-05-12 14:22                   ` Michael Kjörling
@ 2017-05-12 14:30                   ` Larry McVoy
  2017-05-12 15:11                     ` Tim Bradshaw
  2017-05-12 15:52                     ` Chet Ramey
  1 sibling, 2 replies; 77+ messages in thread
From: Larry McVoy @ 2017-05-12 14:30 UTC (permalink / raw)


On Fri, May 12, 2017 at 02:56:59PM +0100, Tim Bradshaw wrote:
> When I found out about this I thought seriously of shorting Sun's
> stock (if I knew how to do that).  I would have made money.  As it was
> we stuck with logged UFS which, by 2007 or so was seriously bulletproof.

Wait, someone added logging to UFS?  Is there a writeup of that anywhere?
That would stomp all over my claim that nobody has hacked on UFS since
I did (which would be fine with me, I liked UFS, be cool if someone moved
it forward).

(pause while I google)

Yep, someone did.  I'd like to know who.  I found this:

http://www.oracle.com/technetwork/systems/linux/fs-performance-149840.pdf

Can anyone confirm those results?  That would be the first I've heard
of Solaris being faster than Linux.  If that's true, has Linux tried
to implement the same sort of logging?

--lm

P.S.  I realize this isn't ancient Unix so I could move this to the 
linux-kernel mailing list.  Though maybe it is appropriate, it's tech
from the 1990's - is that ancient enough?


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12 13:56                 ` Tim Bradshaw
@ 2017-05-12 14:22                   ` Michael Kjörling
  2017-05-12 14:30                   ` Larry McVoy
  1 sibling, 0 replies; 77+ messages in thread
From: Michael Kjörling @ 2017-05-12 14:22 UTC (permalink / raw)



On 12 May 2017 14:56 +0100, from tfb at tfeb.org (Tim Bradshaw):
> So if you have a filesystem (pool, whatever) which you think
> something bad might have happened to, you check it *by mounting it*,
> where the checker runs *in the kernel,

Easy peasy! No need to remember obscure fsck parameters; zpool import
is all you need.

Irony aside, I didn't say it was perfect.

-- 
Michael Kjörling • https://michael.kjorling.se • michael at kjorling.se
                 “People who think they know everything really annoy
                 those of us who know we don’t.” (Bjarne Stroustrup)


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12  8:17               ` Michael Kjörling
@ 2017-05-12 13:56                 ` Tim Bradshaw
  2017-05-12 14:22                   ` Michael Kjörling
  2017-05-12 14:30                   ` Larry McVoy
  0 siblings, 2 replies; 77+ messages in thread
From: Tim Bradshaw @ 2017-05-12 13:56 UTC (permalink / raw)



On 12 May 2017, at 09:17, Michael Kjörling <michael at kjorling.se> wrote:
> 
> These days, for me, it's pretty much all ZFS

One of ZFS's particularly lovely features was that there was no offline filesystem checker at all.  So if you have a filesystem (pool, whatever) which you think something bad might have happened to, you check it *by mounting it*, where the checker runs *in the kernel*, so any serious error in the code means a panic if you're lucky, and something worse if you're not.

When I found out about this I thought seriously of shorting Sun's stock (if I knew how to do that).  I would have made money.  As it was we stuck with logged UFS which, by 2007 or so, was seriously bulletproof.

--tim


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12  1:05             ` Toby Thain
@ 2017-05-12  8:17               ` Michael Kjörling
  2017-05-12 13:56                 ` Tim Bradshaw
  0 siblings, 1 reply; 77+ messages in thread
From: Michael Kjörling @ 2017-05-12  8:17 UTC (permalink / raw)



On 11 May 2017 21:05 -0400, from toby at telegraphics.com.au (Toby Thain):
>>> ext2
>> 
>> That's a journalled FS, isn't it?  In which case the transactions get
>> replayed.
> 
> No, I think ext3fs was the first version that was journaled.

Correct. ext2 doesn't have journaling, but if you do `mkfs.ext2 -j` on
Linux it creates a _journaled ext2_ A.K.A. _ext3_ file system.

With the resulting selling point of ext3 being mainly _much_ shorter
fsck times.


> So was reiserfs. With pull-plug tests I could get ext3fs to toss
> cookies but not reiserfs.

ReiserFS seems to be one of those where people's experiences really
differ. I've had massive crashes involving reiserfs myself, but don't
think I have ever actually lost any significant amounts of data to
ext2/3. Maybe it was just bad luck, but I have been bit sufficiently
badly by it to relegate it to the scrap heap of history. At least at
the time lack of good recovery tools didn't help (but then again if
ZFS breaks sufficiently that the pool doesn't import, you're pretty
hosed, too). Backups, backups.

These days, for me, it's pretty much all ZFS where I have any say in
it, and in addition even my home desktop is on a UPS (with automated,
controlled shutdown when approaching battery depletion), so unplanned
shutdowns are far less likely to happen. About the only way that can
realistically happen is either a kernel crash, or the UPS misreporting
(overestimating) the remaining battery time, resulting in a hard
shutdown before or during the shutdown process. Neither is impossible,
of course.

-- 
Michael Kjörling • https://michael.kjorling.se • michael at kjorling.se
                 “People who think they know everything really annoy
                 those of us who know we don’t.” (Bjarne Stroustrup)


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 23:47           ` Dave Horsfall
                               ` (2 preceding siblings ...)
  2017-05-12  1:05             ` Toby Thain
@ 2017-05-12  8:15             ` Harald Arnesen
  3 siblings, 0 replies; 77+ messages in thread
From: Harald Arnesen @ 2017-05-12  8:15 UTC (permalink / raw)


Dave Horsfall [2017-05-12 01:47]:

> On Thu, 11 May 2017, Larry McVoy wrote:
>> Try the same thing with Linux.  The file system will come back, starting 
>> with, I believe, ext2.
> 
> That's a journalled FS, isn't it?  In which case the transactions get
> replayed.

No, journalling started with ext3. However, ext2 was (and is) a very
solid fs, which survived many power-outs and crashes when I used it back
in the 90s.
-- 
Hilsen Harald


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 21:44       ` Dave Horsfall
  2017-05-11 22:06         ` Warner Losh
@ 2017-05-12  6:24         ` Hellwig Geisse
  2017-05-12 21:12           ` Dave Horsfall
  1 sibling, 1 reply; 77+ messages in thread
From: Hellwig Geisse @ 2017-05-12  6:24 UTC (permalink / raw)



On Fr, 2017-05-12 at 07:44 +1000, Dave Horsfall wrote:
> 
> Am I the only one here who thinks that e.g. a char pointer should be 
> "char* cp1, cp2" instead of "char *cp1, *cp2"?  I.e. the fundamental
> type is "char*", not "char", and to this day I still write:
> 
>     char*	cp1;
>     char*	cp2;
> 
> etc, which IMHO makes it clear (which is every programmer's duty).
> I used  to write that way in a previous life, and the boss didn't 
> complain.

This view does not work well with more complicated
declarations like "void (*p)(int)". What is the
"fundamental type" here? One could argue that the
real culprit is the list construction, which does
not mix well with C declarations.
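
A small C sketch of the point, for anyone who wants to try it. The typedef
name char_ptr and the size_of_* helpers are made up for this example; the
rest is plain standard C (C11 for static_assert). It shows that in
"char* cp1, cp2" only cp1 is a pointer, and that a typedef is one way to
get a real "pointer type" that survives the list form.

```c
#include <assert.h>
#include <stddef.h>

/* The '*' belongs to the declarator, not to the type:
 * cp1 is a pointer to char, cp2 is a plain char. */
char* cp1, cp2;

/* The usual spelling: both names are pointers. */
char *cp3, *cp4;

/* With a typedef (char_ptr is a hypothetical name) the list form works. */
typedef char *char_ptr;
char_ptr cp5, cp6;

/* Pointer to a function taking int and returning void. The declared name
 * sits in the middle, so there is no "type, then name" spelling without
 * a typedef. */
void (*p)(int);

/* Checked at compile time: the typedef really is a pointer type. */
static_assert(sizeof(char_ptr) == sizeof(char *), "char_ptr is char *");

/* Helpers so the sizes can be inspected from a test or a debugger. */
size_t size_of_cp1(void) { return sizeof cp1; }
size_t size_of_cp2(void) { return sizeof cp2; }
size_t size_of_cp6(void) { return sizeof cp6; }
```

Here sizeof cp2 is 1 because cp2 is a char, while cp1 and cp6 have the size
of a pointer on whatever machine compiles this.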

Hellwig


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12  0:21               ` Larry McVoy
@ 2017-05-12  2:42                 ` Warner Losh
  0 siblings, 0 replies; 77+ messages in thread
From: Warner Losh @ 2017-05-12  2:42 UTC (permalink / raw)


On Thu, May 11, 2017 at 6:21 PM, Larry McVoy <lm at mcvoy.com> wrote:
> Yeah, I get ordered writes, I taught a CS course at Stanford and I made
> my students learn all about them.  I'm a UFS guy, so far as I know I'm
> the last guy to push UFS/FFS forward (which is sort of sad).
>
> The Linux stuff is better.  It just is.  And we should all respect that,
> I know we sit around and love on ancient Unix, and believe me, I love
> that stuff it changed the world, but we should respect people who have
> moved it past what Unix did.  And I think Linux moved the file system
> past what Unix did.

The Linux stuff was incrementally better in that it didn't fry the
filesystem, but instead would flush large amounts of changes if there
was no synchronous write before the power failed. This was progress.
However, it got its speed by doing async writes even of the metadata,
and that led to the famous story where Linus was able to recover an
entire kernel he'd just accidentally rm -rf'd by hitting reset on his
machine before the syncer ran. Better than a scrambled filesystem?
Sure. But consistent and robust? Not so much. Plus, driver bugs could
still wreak havoc.

Soon after, things like soft-updates came along in the BSD world which
solved the scrambled filesystem problems (for the most part), then SU
Journalling to make it robust. On the Linux side, ext2fs begat ext3,
which added journalling, and then ext4fs came along. ZFS came into being.
As did btrfs and many others. Some on Linux, some on Solaris, some on
BSD. The competition between them all has helped to make them all
better.

I've been running a v7 derivative (Venix) on my Rainbow lately.
There are issues with the motherboard that cause many insta-panics (it
turns off the interrupts due to something bad on the mobo --- a new
mobo doesn't do that). I've yet to lose anything, but the performance
isn't that great. So it's not an 'all the time' sort of thing.

Warner

> --lm
>
> On Thu, May 11, 2017 at 07:48:27PM -0400, Ron Natalie wrote:
>> Ordered writes go back to the original BSD fast file system, no?   I  seem
>> to recall that when we switched from our V6/V7 disks,
>> the filesystem got a lot more stable in crashes.
>>
>> -----Original Message-----
>> From: TUHS [mailto:tuhs-bounces at minnie.tuhs.org] On Behalf Of Dave Horsfall
>> Sent: Thursday, May 11, 2017 7:47 PM
>> To: The Eunuchs Hysterical Society
>> Subject: Re: [TUHS] The evolution of Unix facilities and architecture
>>
>> On Thu, 11 May 2017, Larry McVoy wrote:
>>
>> [...]
>>
>> > Try the same thing with Linux.  The file system will come back,
>> > starting with, I believe, ext2.
>>
>> That's a journalled FS, isn't it?  In which case the transactions get
>> replayed.
>>
>> > My belief is that Linux orders writes such that while you may lose
>> > data (as in, a process created a file, the OS said it was OK, but that
>> > file will not be in the file system after a crash), but the rest of
>> > the file system will be consistent.  I think it's as if you powered
>> > off the machine a few seconds earlier than you actually did, some
>> > stuff is in flight and until they can write stuff out in the proper
>> > order you may lose data on a hard reset.
>>
>> And FreeBSD (at least) has been doing ordered writes for quite some time.
>>
>> --
>> Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will
>> suffer."
>
> --
> ---
> Larry McVoy                  lm at mcvoy.com             http://www.mcvoy.com/lm


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-12  0:16             ` Larry McVoy
@ 2017-05-12  1:41               ` Wesley Parish
  0 siblings, 0 replies; 77+ messages in thread
From: Wesley Parish @ 2017-05-12  1:41 UTC (permalink / raw)


Consistent with what I remember of running fsck on Slackware in the 90s after unscheduled shutdowns. 
I longed for the time I'd get back when ext3 was incorporated into Linux and ext2 was relegated to 
legacy.

As far as I can remember ext2 was never journaled. I remembered getting quite excited reading about 
the log-structured file system in the O'Reilly 4.4BSD-Lite CD. That made so much sense to me.

Quoting Larry McVoy <lm at mcvoy.com>:

> On Fri, May 12, 2017 at 09:47:01AM +1000, Dave Horsfall wrote:
> > On Thu, 11 May 2017, Larry McVoy wrote:
> > 
> > [...]
> > 
> > > Try the same thing with Linux. The file system will come back,
> starting 
> > > with, I believe, ext2.
> > 
> > That's a journalled FS, isn't it? In which case the transactions get
> > replayed.
> 
> My memory is ext2 is not journaled, I think that happened in ext3. Or 
> maybe it was an option on ext2? Either way, I think ext2 did the right
> thing without the journal.
>  



"I have supposed that he who buys a Method means to learn it." - Ferdinand Sor,
Method for Guitar

"A verbal contract isn't worth the paper it's written on." -- Samuel Goldwyn


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 23:47           ` Dave Horsfall
  2017-05-11 23:48             ` Ron Natalie
  2017-05-12  0:16             ` Larry McVoy
@ 2017-05-12  1:05             ` Toby Thain
  2017-05-12  8:17               ` Michael Kjörling
  2017-05-12  8:15             ` Harald Arnesen
  3 siblings, 1 reply; 77+ messages in thread
From: Toby Thain @ 2017-05-12  1:05 UTC (permalink / raw)


On 2017-05-11 7:47 PM, Dave Horsfall wrote:
> On Thu, 11 May 2017, Larry McVoy wrote:
>
> [...]
>
>> Try the same thing with Linux.  The file system will come back, starting
>> with, I believe, ext2.
>
> That's a journalled FS, isn't it?  In which case the transactions get
> replayed.

No, I think ext3fs was the first version that was journaled.

So was reiserfs. With pull-plug tests I could get ext3fs to toss cookies 
but not reiserfs.

Now of course the state of the art is copy-on-write, like ZFS.

--Toby


>
>> My belief is that Linux orders writes such that while you may lose data
>> (as in, a process created a file, the OS said it was OK, but that file
>> will not be in the file system after a crash), but the rest of the file
>> system will be consistent.  I think it's as if you powered off the
>> machine a few seconds earlier than you actually did, some stuff is in
>> flight and until they can write stuff out in the proper order you may
>> lose data on a hard reset.
>
> And FreeBSD (at least) has been doing ordered writes for quite some time.
>



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 23:48             ` Ron Natalie
@ 2017-05-12  0:21               ` Larry McVoy
  2017-05-12  2:42                 ` Warner Losh
  0 siblings, 1 reply; 77+ messages in thread
From: Larry McVoy @ 2017-05-12  0:21 UTC (permalink / raw)


Yeah, I get ordered writes, I taught a CS course at Stanford and I made
my students learn all about them.  I'm a UFS guy, so far as I know I'm
the last guy to push UFS/FFS forward (which is sort of sad).  

The Linux stuff is better.  It just is.  And we should all respect that.
I know we sit around and love on ancient Unix, and believe me, I love
that stuff, it changed the world, but we should respect people who have
moved it past what Unix did.  And I think Linux moved the file system
past what Unix did.

--lm

On Thu, May 11, 2017 at 07:48:27PM -0400, Ron Natalie wrote:
> Ordered writes go back to the original BSD fast file system, no?   I  seem
> to recall that when we switched from our V6/V7 disks,
> the filesystem got a lot more stable in crashes.
> 
> -----Original Message-----
> From: TUHS [mailto:tuhs-bounces at minnie.tuhs.org] On Behalf Of Dave Horsfall
> Sent: Thursday, May 11, 2017 7:47 PM
> To: The Eunuchs Hysterical Society
> Subject: Re: [TUHS] The evolution of Unix facilities and architecture
> 
> On Thu, 11 May 2017, Larry McVoy wrote:
> 
> [...]
> 
> > Try the same thing with Linux.  The file system will come back, 
> > starting with, I believe, ext2.
> 
> That's a journalled FS, isn't it?  In which case the transactions get
> replayed.
> 
> > My belief is that Linux orders writes such that while you may lose 
> > data (as in, a process created a file, the OS said it was OK, but that 
> > file will not be in the file system after a crash), but the rest of 
> > the file system will be consistent.  I think it's as if you powered 
> > off the machine a few seconds earlier than you actually did, some 
> > stuff is in flight and until they can write stuff out in the proper 
> > order you may lose data on a hard reset.
> 
> And FreeBSD (at least) has been doing ordered writes for quite some time.
> 
> --
> Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will
> suffer."

-- 
---
Larry McVoy            	     lm at mcvoy.com             http://www.mcvoy.com/lm 


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 23:47           ` Dave Horsfall
  2017-05-11 23:48             ` Ron Natalie
@ 2017-05-12  0:16             ` Larry McVoy
  2017-05-12  1:41               ` Wesley Parish
  2017-05-12  1:05             ` Toby Thain
  2017-05-12  8:15             ` Harald Arnesen
  3 siblings, 1 reply; 77+ messages in thread
From: Larry McVoy @ 2017-05-12  0:16 UTC (permalink / raw)


On Fri, May 12, 2017 at 09:47:01AM +1000, Dave Horsfall wrote:
> On Thu, 11 May 2017, Larry McVoy wrote:
> 
> [...]
> 
> > Try the same thing with Linux.  The file system will come back, starting 
> > with, I believe, ext2.
> 
> That's a journalled FS, isn't it?  In which case the transactions get
> replayed.

My memory is that ext2 is not journaled; I think journaling arrived in ext3.
Or maybe it was an option on ext2?  Either way, I think ext2 did the right
thing without the journal.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 23:47           ` Dave Horsfall
@ 2017-05-11 23:48             ` Ron Natalie
  2017-05-12  0:21               ` Larry McVoy
  2017-05-12  0:16             ` Larry McVoy
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 77+ messages in thread
From: Ron Natalie @ 2017-05-11 23:48 UTC (permalink / raw)


Ordered writes go back to the original BSD fast file system, no?   I  seem
to recall that when we switched from our V6/V7 disks,
the filesystem got a lot more stable in crashes.

-----Original Message-----
From: TUHS [mailto:tuhs-bounces@minnie.tuhs.org] On Behalf Of Dave Horsfall
Sent: Thursday, May 11, 2017 7:47 PM
To: The Eunuchs Hysterical Society
Subject: Re: [TUHS] The evolution of Unix facilities and architecture

On Thu, 11 May 2017, Larry McVoy wrote:

[...]

> Try the same thing with Linux.  The file system will come back, 
> starting with, I believe, ext2.

That's a journalled FS, isn't it?  In which case the transactions get
replayed.

> My belief is that Linux orders writes such that while you may lose 
> data (as in, a process created a file, the OS said it was OK, but that 
> file will not be in the file system after a crash), but the rest of 
> the file system will be consistent.  I think it's as if you powered 
> off the machine a few seconds earlier than you actually did, some 
> stuff is in flight and until they can write stuff out in the proper 
> order you may lose data on a hard reset.

And FreeBSD (at least) has been doing ordered writes for quite some time.

--
Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will
suffer."



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 22:25         ` Larry McVoy
  2017-05-11 22:30           ` Ron Natalie
@ 2017-05-11 23:47           ` Dave Horsfall
  2017-05-11 23:48             ` Ron Natalie
                               ` (3 more replies)
  2017-05-14  4:30           ` Theodore Ts'o
  2 siblings, 4 replies; 77+ messages in thread
From: Dave Horsfall @ 2017-05-11 23:47 UTC (permalink / raw)


On Thu, 11 May 2017, Larry McVoy wrote:

[...]

> Try the same thing with Linux.  The file system will come back, starting 
> with, I believe, ext2.

That's a journalled FS, isn't it?  In which case the transactions get
replayed.

> My belief is that Linux orders writes such that while you may lose data 
> (as in, a process created a file, the OS said it was OK, but that file 
> will not be in the file system after a crash), but the rest of the file 
> system will be consistent.  I think it's as if you powered off the 
> machine a few seconds earlier than you actually did, some stuff is in 
> flight and until they can write stuff out in the proper order you may 
> lose data on a hard reset.

And FreeBSD (at least) has been doing ordered writes for quite some time.

-- 
Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will suffer."


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 22:25         ` Larry McVoy
@ 2017-05-11 22:30           ` Ron Natalie
  2017-05-11 23:47           ` Dave Horsfall
  2017-05-14  4:30           ` Theodore Ts'o
  2 siblings, 0 replies; 77+ messages in thread
From: Ron Natalie @ 2017-05-11 22:30 UTC (permalink / raw)


I got a phone call one day from one of the operators that "fsck was stuck in
a loop."    I thought maybe it was wedged or it was just printing out
continuously.
I got there and found that fsck had repaired the root partition and rebooted
the machine and then come up and decided the root needed fixing again and
then rebooted...
The operator didn't mention that the "fsck loop" involved a reboot every
time through.

The other one I got was that "fsck has been printing for an hour."   After
getting them to read me one of the errors I suggested they scroll back the
printout and find the first message.
RP01 OFFLINE.

Now, go over to the disk drive and look in the glass top and tell me...is it
spinning?   No?

Now put the run/stop switch to stop.   Now put it to Run.     Now reboot the
machine and DON'T LEAN ON THE FRONT OF THE DRIVE AGAIN.
(On this one RP06, if you leaned on the lid, it would spin down the disk.)
 



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 20:37       ` Ron Natalie
@ 2017-05-11 22:25         ` Larry McVoy
  2017-05-11 22:30           ` Ron Natalie
                             ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Larry McVoy @ 2017-05-11 22:25 UTC (permalink / raw)


This is one place where I think Linux kicked Unix's ass.  And I am not
really sure how they did it, I have an idea but am not positive.  Unix
file systems up through UFS as shipped by Sun, were all vulnerable to
what I call the power out test.  Untar some big tarball and power off
the machine in the middle of it.  Reboot.  Hilarity ensues (not).

You were dropped into some stand alone shell after fsck threw up its
hands and it was up to you to fix it.  Dozens and dozens of errors.
It was almost always faster to go to backups because figuring that 
stuff out, file by file (which I have done more than once), gets you
to the point that you run "fsck -y" and go poke at lost+found when
fsck is done, realize that there is no hope, and reach for backups.

Try the same thing with Linux.  The file system will come back, starting
with, I believe, ext2.

My belief is that Linux orders writes such that while you may lose data
(as in, a process created a file, the OS said it was OK, but that file 
will not be in the file system after a crash), but the rest of the file 
system will be consistent.  I think it's as if you powered off the
machine a few seconds earlier than you actually did, some stuff is in
flight and until they can write stuff out in the proper order you may
lose data on a hard reset.

But it doesn't leave your file system in a mess and it's not the brute
force slow way that DOS does it.
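
As a rough illustration of the ordering idea described above, here is a toy
model in C.  It is only a sketch under stated assumptions: creating a file is
reduced to two disk writes (initialize the inode, add the directory entry),
and the names create_then_crash() and is_consistent() are made up for this
example.  It does not claim to show what ext2 or any real kernel does.

```c
#include <assert.h>

/* Toy on-disk state for one file creation (hypothetical model). */
struct disk {
    int inode_initialized;  /* 1 once the inode has been written */
    int dirent_present;     /* 1 once the directory entry has been written */
};

enum order { INODE_FIRST, DIRENT_FIRST };

/*
 * Apply the first `writes_done` (0..2) writes of a file creation in the
 * given order, then "crash" and return whatever reached the disk.
 */
struct disk create_then_crash(enum order o, int writes_done)
{
    struct disk d = {0, 0};

    assert(writes_done >= 0 && writes_done <= 2);
    for (int step = 0; step < writes_done; step++) {
        int inode_step = (o == INODE_FIRST) ? (step == 0) : (step == 1);

        if (inode_step)
            d.inode_initialized = 1;
        else
            d.dirent_present = 1;
    }
    return d;
}

/* Consistent here means: no directory entry names an uninitialized inode. */
int is_consistent(struct disk d)
{
    return !d.dirent_present || d.inode_initialized;
}
```

With INODE_FIRST, a crash after zero, one or two writes always leaves a
consistent state; the worst case is a new file that is simply not there yet.
With DIRENT_FIRST, a crash after the first write leaves a directory entry
pointing at garbage, which is the kind of mess fsck then has to sort out.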

I copied Ted, who had his fingers deep in that code, maybe he can correct
me where I got it wrong.  Details aside, I think this is a place where
Linux moved the state of the art significantly forward.  There are other
places but this one is a big deal IMHO, maybe the biggest deal.

--lm

On Thu, May 11, 2017 at 04:37:29PM -0400, Ron Natalie wrote:
> I remember the pre-fsck days.   It was part of my test to become an operator at the UNIX site at JHU that I could run the various manual checks.
> 
> The V6 file system wasn't exactly stable during crashes (lousy database behavior), so there was almost certainly something to clean up.
> 
>  
> 
> The first thing we'd run was icheck.   This runs down the superblock freelist and all the allocated blocks in the inodes.     If there were missing blocks (not in a file or the free list), you could use icheck -s to rebuild it.    Similarly, if you had duplicated allocations in the freelist or between the freelist and a single file.   Anything more complicated required some clever patching (typically, we'd just mount readonly, copy the files, and then blow them away with clri).
> 
>  
> 
> Then you'd run dcheck.   As mentioned dcheck walks the directory path from the top of the disk counting inode references that it reconciles with the link count in the inode.   Occasionally we'd end up with a 0-0 inode (no directory entries, but allocated...typically this is caused by people removing a file while it is still open, a regular practice of some programs for their /tmp files).    clri again blew these away.
> 
>  
> 
> Clri wrote zeros all over the inode.   This had the effect of wiping out the file, but it was dangerous if you got the i-number wrong.    We replaced it with "clrm" which just cleared the allocated bit, a lot easier to reverse.
> 
>  
> 
> If you really had a mess of a file system, you might get a piece of the directory tree broken off from a path to the root.   Or you'd have an inode that icheck reported dups.   ncheck would try to reconcile an inumber into an absolute path.
> 
>  
> 
> After a while a program called fsdb came around that allowed you to poke at the various file system structures.    We didn't use it much because by the time we had it, fsck was fast on its heels.
> 

-- 
---
Larry McVoy            	     lm at mcvoy.com             http://www.mcvoy.com/lm 


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 21:44       ` Dave Horsfall
@ 2017-05-11 22:06         ` Warner Losh
  2017-05-12  6:24         ` Hellwig Geisse
  1 sibling, 0 replies; 77+ messages in thread
From: Warner Losh @ 2017-05-11 22:06 UTC (permalink / raw)


On Thu, May 11, 2017 at 3:44 PM, Dave Horsfall <dave at horsfall.org> wrote:
> On Thu, 11 May 2017, Michael Kjörling wrote:
>
>> On the flip side, it certainly does beat `char* x, y, z[100];` or `FILE*
>> fpsrc, fpdst;`. I wonder how many aspiring C programmers have been
>> tripped up by constructs like those? It's perfectly reasonable _once you
>> know about it_, but if you don't, then, well...
>
> Am I the only one here who thinks that e.g. a char pointer should be
> "char* cp1, cp2" instead of "char *cp1, *cp2"?  I.e. the fundamental type
> is "char*", not "char", and to this day I still write:
>
>     char*       cp1;
>     char*       cp2;
>
> etc, which IMHO makes it clear (which is every programmer's duty).  I used
> to write that way in a previous life, and the boss didn't complain.

I've encountered several people with that world view, so you aren't
alone. I take a contrary view. Since C doesn't behave that way, it
encourages people to think that char* cp1, cp2 is equivalent to what
you wrote, which it's not. * is a modifier of char rather than char *
being a fundamental type. Been burned too many times by it I guess
over the years.
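
For anyone who does want "pointer to char" to behave like a single type
name, a typedef gets you there.  The following is a minimal C11 sketch; the
typedef name char_ptr and the helper function are made up for illustration
and are not from anyone's code in this thread.

```c
#include <assert.h>   /* static_assert (C11) */

typedef char *char_ptr;   /* hypothetical name, for illustration only */

char_ptr p1, p2;          /* both p1 and p2 are pointers to char */
char    *q1, q2;          /* q1 is a pointer to char; q2 is a plain char */

static_assert(sizeof p2 == sizeof(char *), "typedef applies to every name");
static_assert(sizeof q2 == 1, "the bare '*' applies only to q1");

/* Returns 1 when both typedef'd names have pointer size. */
int typedef_makes_both_pointers(void)
{
    return sizeof p1 == sizeof(char *) && sizeof p2 == sizeof(char *);
}
```

With the typedef the pointer-ness is part of the type name, so it applies to
every declarator in the list; with a bare '*' it belongs to one declarator
only.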

Warner


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 17:11     ` Michael Kjörling
@ 2017-05-11 21:44       ` Dave Horsfall
  2017-05-11 22:06         ` Warner Losh
  2017-05-12  6:24         ` Hellwig Geisse
  0 siblings, 2 replies; 77+ messages in thread
From: Dave Horsfall @ 2017-05-11 21:44 UTC (permalink / raw)


On Thu, 11 May 2017, Michael Kjörling wrote:

> On the flip side, it certainly does beat `char* x, y, z[100];` or `FILE* 
> fpsrc, fpdst;`. I wonder how many aspiring C programmers have been 
> tripped up by constructs like those? It's perfectly reasonable _once you 
> know about it_, but if you don't, then, well...

Am I the only one here who thinks that e.g. a char pointer should be 
"char* cp1, cp2" instead of "char *cp1, *cp2"?  I.e. the fundamental type 
is "char*", not "char", and to this day I still write:

    char*	cp1;
    char*	cp2;

etc, which IMHO makes it clear (which is every programmer's duty).  I used 
to write that way in a previous life, and the boss didn't complain.

-- 
Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will suffer."


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 17:08 Noel Chiappa
@ 2017-05-11 21:34 ` Dave Horsfall
  0 siblings, 0 replies; 77+ messages in thread
From: Dave Horsfall @ 2017-05-11 21:34 UTC (permalink / raw)


On Thu, 11 May 2017, Noel Chiappa wrote:

> Another tool, 'icheck', consistency checks file blocks and the free 
> list.

Named so because it checked the inode list (for which the sledge-hammer 
fix was "clri").

Somewhere in the AUUGN archives is a paper I wrote detailing the proper 
use of "[ind]check" and "clri".

And then along came "fsdb", wherein you could edit the inode directly...

-- 
Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will suffer."


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 17:12     ` Clem Cole
@ 2017-05-11 20:37       ` Ron Natalie
  2017-05-11 22:25         ` Larry McVoy
  0 siblings, 1 reply; 77+ messages in thread
From: Ron Natalie @ 2017-05-11 20:37 UTC (permalink / raw)


I remember the pre-fsck days.   It was part of my test to become an operator at the UNIX site at JHU that I could run the various manual checks.

The V6 file system wasn’t exactly stable during crashes (lousy database behavior), so there was almost certainly something to clean up.

 

The first thing we’d run was icheck.   This runs down the superblock freelist and all the allocated blocks in the inodes.     If there were missing blocks (not in a file or the free list), you could use icheck -s to rebuild it.    Similarly, if you had duplicated allocations in the freelist or between the freelist and a single file.   Anything more complicated required some clever patching (typically, we’d just mount readonly, copy the files, and then blow them away with clri).
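
The block accounting described above can be sketched in a few lines of C.
This is a toy model, not the real icheck: the disk is assumed to have only
NBLOCKS data blocks, and the function name check_blocks() and the struct
block_report are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 8   /* hypothetical tiny disk: data blocks 0..7 */

struct block_report {
    int missing;    /* blocks neither in a file nor on the free list */
    int dups;       /* blocks claimed more than once */
};

/*
 * file_blocks: block numbers claimed by inodes.
 * free_blocks: block numbers found on the free list.
 */
struct block_report check_blocks(const int *file_blocks, size_t nfile,
                                 const int *free_blocks, size_t nfree)
{
    int refs[NBLOCKS] = {0};
    struct block_report r = {0, 0};
    size_t i;

    for (i = 0; i < nfile; i++) {
        assert(file_blocks[i] >= 0 && file_blocks[i] < NBLOCKS);
        refs[file_blocks[i]]++;
    }
    for (i = 0; i < nfree; i++) {
        assert(free_blocks[i] >= 0 && free_blocks[i] < NBLOCKS);
        refs[free_blocks[i]]++;
    }
    for (int b = 0; b < NBLOCKS; b++) {
        if (refs[b] == 0)
            r.missing++;
        else if (refs[b] > 1)
            r.dups++;
    }
    return r;
}
```

Missing blocks are the harmless case, since rebuilding the free list picks
them up again.  A duplicate means two owners for one block, and that is
where the clever patching came in.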

 

Then you’d run dcheck.   As mentioned dcheck walks the directory path from the top of the disk counting inode references that it reconciles with the link count in the inode.   Occasionally we’d end up with a 0-0 inode (no directory entries, but allocated…typically this is caused by people removing a file while it is still open, a regular practice of some programs for their /tmp files).    clri again blew these away.
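
The link-count reconciliation can be sketched the same way.  Again this is a
toy model and not the real dcheck: the directory walk is assumed to have
already produced a flat list of i-numbers, "0-0" is read as "zero entries,
zero link count, still allocated", and the names inode_model and
count_link_mismatches() are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NINODES 6   /* hypothetical tiny inode table: i-numbers 0..5 */

struct inode_model {
    int allocated;  /* nonzero if the inode is in use */
    int nlink;      /* link count stored in the inode */
};

/*
 * dir_refs holds one i-number per directory entry found while walking the
 * tree.  Returns how many allocated inodes have a link count that disagrees
 * with the number of entries seen; *zero_zero receives the number of
 * allocated inodes with no entries and a zero link count.
 * References to unallocated inodes are ignored in this sketch.
 */
int count_link_mismatches(const struct inode_model *inodes,
                          const int *dir_refs, size_t nrefs,
                          int *zero_zero)
{
    int entries[NINODES] = {0};
    int mismatches = 0;
    size_t i;

    *zero_zero = 0;
    for (i = 0; i < nrefs; i++) {
        assert(dir_refs[i] >= 0 && dir_refs[i] < NINODES);
        entries[dir_refs[i]]++;
    }
    for (int ino = 0; ino < NINODES; ino++) {
        if (!inodes[ino].allocated)
            continue;
        if (entries[ino] == 0 && inodes[ino].nlink == 0)
            (*zero_zero)++;
        else if (entries[ino] != inodes[ino].nlink)
            mismatches++;
    }
    return mismatches;
}
```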

 

Clri wrote zeros all over the inode.   This had the effect of wiping out the file, but it was dangerous if you got the i-number wrong.    We replaced it with “clrm” which just cleared the allocated bit, a lot easier to reverse.
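
The difference between the two approaches is easy to see in a small C
sketch.  Everything here is an assumption made for illustration: the struct
layout, the ALLOC_BIT value and the function names are not taken from the
actual clri or clrm sources.

```c
#include <assert.h>
#include <string.h>

#define ALLOC_BIT 0100000u   /* assumed "inode is allocated" flag in mode */

struct inode_model {
    unsigned mode;
    int nlink;
    int size;
    int addr[8];             /* block addresses */
};

/* clri style: zero the whole inode; the block addresses are gone for good. */
void clri_model(struct inode_model *ip)
{
    memset(ip, 0, sizeof *ip);
}

/* clrm style: clear only the allocated bit; everything else survives. */
void clrm_model(struct inode_model *ip)
{
    ip->mode &= ~ALLOC_BIT;
}

/* Undo a mistaken clrm by setting the bit again. */
void undo_clrm_model(struct inode_model *ip)
{
    ip->mode |= ALLOC_BIT;
}
```

After clrm_model() on the wrong i-number, undo_clrm_model() restores the
inode exactly; after clri_model() there is nothing left to restore.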

 

If you really had a mess of a file system, you might get a piece of the directory tree broken off from a path to the root.   Or you’d have an inode that icheck reported dups.   ncheck would try to reconcile an inumber into an absolute path.

 

After a while a program called fsdb came around that allowed you to poke at the various file system structures.    We didn’t use it much because by the time we had it, fsck was fast on its heels.



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 16:52   ` Warner Losh
@ 2017-05-11 17:12     ` Clem Cole
  2017-05-11 20:37       ` Ron Natalie
  0 siblings, 1 reply; 77+ messages in thread
From: Clem Cole @ 2017-05-11 17:12 UTC (permalink / raw)


On Thu, May 11, 2017 at 12:52 PM, Warner Losh <imp at bsdimp.com> wrote:

> On Thu, May 11, 2017 at 10:15 AM, Clem Cole <clemc at ccc.com> wrote:
> >
> > On Thu, May 11, 2017 at 10:07 AM, Noel Chiappa <jnc at mercury.lcs.mit.edu>
> > wrote:
> >>
> >>     > From: Clem Cole
> >>
> >>     > it was originally written for the 6th edition FS (which I
> >>     > hope I still have the sources in my files) ...
> >>     > I believe Noel recovered a copy in his files recently.
> >>
> >> Well, I have _something_. It's called 'fcheck', not 'fsck', but it looks
> >> like what we're talking about - maybe it was originally named, or renamed,
> >> to be in the same series as {d,i,n}check? But it does have the upper-case
> >> error messages... :-) Anyway, here it is:
> >>
> >>   http://ana-3.lcs.mit.edu/~jnc/tech/unix/s1/fcheck.c
> >>   http://ana-3.lcs.mit.edu/~jnc/tech/unix/man8/fcheck.8
> >
> >
> > fcheck ---> aka fsick  -- aka fsck -- that's it.
>
> There's a dcheck.c in the TUHS v7 sources. How's that related?

Directory CHECK - it was a pass down the upper-level pathname structure of
the FS.  In fact it was the model for fsck.  Ted had me stare at it.  One of
the passes is pretty much pulled from that code directly.

The problem was that originally there were a couple of tools to put things
back together, but until Ted wrote fsck there was not one single tool that
pretty much did what you wanted and got it right most of the time.

Clem





> Warner
>
>
> >> Interestingly, the man page for it makes reference to a 'check' command,
> >> which I didn't recall at all; here it is:
> >>
> >>   http://ana-3.lcs.mit.edu/~jnc/tech/unix/s1/check.c
> >>   http://ana-3.lcs.mit.edu/~jnc/tech/unix/man8/check.8
> >>
> >> for those who are interested.
> >>
> >>
> >>     > Noel has pointed out that MIT had it in the late 1970s also, probably
> >>     > brought back from BTL by one of their summer students.
> >>
> >> I think most of the Unix stuff we got from Bell (e.g. the OS, which is
> >> clearly PWB1, not V6) came from someone who was in a Scout unit there in
> >> high school,
> >
> > Jon Stienhart maybe???   He & Paul Rubinfield were in that scout group
> > years ago and were both long time UNIX hackers, but I've forgotten where
> > Stienhart did his undergrad.
> >
> >> of all bizarre connections! ISTR this came the same way, but maybe I'm
> >> wrong.  It definitely arrived later than the OS - we'd been using
> >> icheck/dcheck for quite a while before it arrived - so maybe it was
> >> another channel?
> >
> > This is Ted's code and my error messages.
> >
> >>
> >> The only thing that for sure (that I recall) didn't come this way was
> >> Emacs. Since the author had been a grad student in our group at MIT, I
> >> think you all can guess how we got that!
> >>
> >>         Noel
> >>
> > Clem
> >
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 16:17   ` Clem Cole
@ 2017-05-11 17:11     ` Michael Kjörling
  2017-05-11 21:44       ` Dave Horsfall
  0 siblings, 1 reply; 77+ messages in thread
From: Michael Kjörling @ 2017-05-11 17:11 UTC (permalink / raw)


On 11 May 2017 12:17 -0400, from clemc at ccc.com (Clem Cole):
> On Thu, May 11, 2017 at 10:21 AM, Larry McVoy <lm at mcvoy.com> wrote:
>> Is this style of declarations common?
>> 
>> char
>>         *bbit,
>>         *abbit,
>>         *state,
>>         *lc,
>>         pathname[200],
>> /.../
> 
> Ted certainly did that a lot.
> (It drove me nuts.   I hated it and argued a bit about it.)  One of the
> reasons I hated C when I first learned it.

On the flip side, it certainly does beat `char* x, y, z[100];` or
`FILE* fpsrc, fpdst;`. I wonder how many aspiring C programmers have
been tripped up by constructs like those? It's perfectly reasonable
_once you know about it_, but if you don't, then, well...

It's even more fun if your system doesn't have memory protection. No,
I'm not speaking from experience; what made you think that I did? ;-)
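
To make the pitfall concrete, here is a minimal C11 sketch that lets the
compiler confirm what each name in the first declaration really is.  The
helper function name is made up for illustration; the size checks assume an
ordinary platform where a char pointer is larger than one byte.

```c
#include <assert.h>   /* static_assert (C11) */

/* The '*' binds to the declarator x only, not to the type name. */
char* x, y, z[100];

static_assert(sizeof x == sizeof(char *), "x is a pointer to char");
static_assert(sizeof y == 1, "y is a plain char, not a pointer");
static_assert(sizeof z == 100, "z is an array of 100 chars, not 100 pointers");

/* Returns 1 when the three sizes match what the C grammar dictates. */
int declarator_sizes_ok(void)
{
    return sizeof x == sizeof(char *) && sizeof y == 1 && sizeof z == 100;
}
```

So only x is a pointer.  Anyone who reads "char*" as the type of the whole
line ends up treating a single char, or a 100-byte array, as a pointer.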

-- 
Michael Kjörling • https://michael.kjorling.se • michael at kjorling.se
                 “People who think they know everything really annoy
                 those of us who know we don’t.” (Bjarne Stroustrup)


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
@ 2017-05-11 17:08 Noel Chiappa
  2017-05-11 21:34 ` Dave Horsfall
  0 siblings, 1 reply; 77+ messages in thread
From: Noel Chiappa @ 2017-05-11 17:08 UTC (permalink / raw)


    > From: Warner Losh

    > There's a dcheck.c in the TUHS v7 sources. How's that related?

That was one of the earlier tools - not sure how far back it goes, but it's in
V6, but not V5. It consistency checks the directory tree. Another tool, 'icheck',
consistency checks file blocks and the free list.

	Noel



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 16:15 ` Clem Cole
@ 2017-05-11 16:52   ` Warner Losh
  2017-05-11 17:12     ` Clem Cole
  0 siblings, 1 reply; 77+ messages in thread
From: Warner Losh @ 2017-05-11 16:52 UTC (permalink / raw)


On Thu, May 11, 2017 at 10:15 AM, Clem Cole <clemc at ccc.com> wrote:
>
> On Thu, May 11, 2017 at 10:07 AM, Noel Chiappa <jnc at mercury.lcs.mit.edu>
> wrote:
>>
>>     > From: Clem Cole
>>
>>     > it was originally written for the 6th edition FS (which I
>>     > hope I still have the sources in my files) ...
>>     > I believe Noel recovered a copy in his files recently.
>>
>> Well, I have _something_. It's called 'fcheck', not 'fsck', but it looks
>> like what we're talking about - maybe it was originally named, or renamed,
>> to be in the same series as {d,i,n}check? But it does have the upper-case
>> error messages... :-) Anyway, here it is:
>>
>>   http://ana-3.lcs.mit.edu/~jnc/tech/unix/s1/fcheck.c
>>   http://ana-3.lcs.mit.edu/~jnc/tech/unix/man8/fcheck.8
>
>
> fcheck ---> aka fsick  -- aka fsck -- that's it.

There's a dcheck.c in the TUHS v7 sources. How's that related?

Warner


>> Interestingly, the man page for it makes reference to a 'check' command,
>> which I didn't recall at all; here it is:
>>
>>   http://ana-3.lcs.mit.edu/~jnc/tech/unix/s1/check.c
>>   http://ana-3.lcs.mit.edu/~jnc/tech/unix/man8/check.8
>>
>> for those who are interested.
>>
>>
>>     > Noel has pointed out that MIT had it in the late 1970s also, probably
>>     > brought back from BTL by one of their summer students.
>>
>> I think most of the Unix stuff we got from Bell (e.g. the OS, which is
>> clearly PWB1, not V6) came from someone who was in a Scout unit there in
>> high school,
>
> Jon Stienhart maybe???   He & Paul Rubinfield were in that scout group years
> ago and were both long time UNIX hackers, but I've forgotten where Stienhart
> did his undergrad.
>
>> of all bizarre connections! ISTR this came the same way, but maybe I'm
>> wrong.  It definitely arrived later than the OS - we'd been using
>> icheck/dcheck for quite a while before it arrived - so maybe it was another
>> channel?
>
> This is Ted's code and my error messages.
>
>>
>> The only thing that for sure (that I recall) didn't come this way was
>> Emacs. Since the author had been a grad student in our group at MIT, I
>> think you all can guess how we got that!
>>
>>         Noel
>>
> Clem
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 14:21 ` Larry McVoy
@ 2017-05-11 16:17   ` Clem Cole
  2017-05-11 17:11     ` Michael Kjörling
  0 siblings, 1 reply; 77+ messages in thread
From: Clem Cole @ 2017-05-11 16:17 UTC (permalink / raw)


Ted certainly did that a lot.
(It drove me nuts.   I hated it and argued a bit about it.)  One of the
reasons I hated C when I first learned it.
Clem

On Thu, May 11, 2017 at 10:21 AM, Larry McVoy <lm at mcvoy.com> wrote:

> On Thu, May 11, 2017 at 10:07:29AM -0400, Noel Chiappa wrote:
> >   http://ana-3.lcs.mit.edu/~jnc/tech/unix/s1/fcheck.c
>
> Is this style of declarations common?
>
> char
>         *bbit,
>         *abbit,
>         *state,
>         *lc,
>         pathname[200],
>         *pp,
>         *name,
>         sflag,
>         nflag,
>         yflag,
> ;
>
> unsigned
>         dsize,
>         fmin,
>         fmax
> ;
>
> I've not seen that before, if it's fairly unique then we might be able to
> figure out who wrote this stuff (or did I miss that and we know already?)
>
> --lm
>
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 14:07 Noel Chiappa
  2017-05-11 14:21 ` Larry McVoy
@ 2017-05-11 16:15 ` Clem Cole
  2017-05-11 16:52   ` Warner Losh
  1 sibling, 1 reply; 77+ messages in thread
From: Clem Cole @ 2017-05-11 16:15 UTC (permalink / raw)


On Thu, May 11, 2017 at 10:07 AM, Noel Chiappa <jnc at mercury.lcs.mit.edu>
wrote:

>     > From: Clem Cole
>
>     > it was originally written for the 6th edition FS (which I
>     > hope I still have the sources in my files) ...
>     > I believe Noel recovered a copy in his files recently.
>
> Well, I have _something_. It's called 'fcheck', not 'fsck', but it looks
> like what we're talking about - maybe it was originally named, or renamed,
> to be in the same series as {d,i,n}check? But it does have the upper-case
> error messages... :-) Anyway, here it is:
>
>   http://ana-3.lcs.mit.edu/~jnc/tech/unix/s1/fcheck.c
>   http://ana-3.lcs.mit.edu/~jnc/tech/unix/man8/fcheck.8


fcheck ---> aka fsick  -- aka fsck -- that's it.


>
>
> Interestingly, the man page for it makes reference to a 'check' command,
> which I didn't recall at all; here it is:
>
>   http://ana-3.lcs.mit.edu/~jnc/tech/unix/s1/check.c
>   http://ana-3.lcs.mit.edu/~jnc/tech/unix/man8/check.8
>
> for those who are interested.
>
>
>     > Noel has pointed out that MIT had it in the late 1970s also, probably
>     > brought back from BTL by one of their summer students.
>
> I think most of the Unix stuff we got from Bell (e.g. the OS, which is
> clearly PWB1, not V6) came from someone who was in a Scout unit there in
> high school,
>
Jon Stienhart maybe???   He & Paul Rubinfield were in that scout group
years ago and were both long time UNIX hackers, but I've forgotten where
Stienhart did his undergrad.

> of all bizarre connections! ISTR this came the same way, but maybe I'm
> wrong.  It definitely arrived later than the OS - we'd been using
> icheck/dcheck for quite a while before it arrived - so maybe it was another
> channel?
>
This is Ted's code and my error messages.


> The only thing that for sure (that I recall) didn't come this way was
> Emacs. Since the author had been a grad student in our group at MIT, I
> think you all can guess how we got that!
>
>         Noel
>
Clem


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-11 14:07 Noel Chiappa
@ 2017-05-11 14:21 ` Larry McVoy
  2017-05-11 16:17   ` Clem Cole
  2017-05-11 16:15 ` Clem Cole
  1 sibling, 1 reply; 77+ messages in thread
From: Larry McVoy @ 2017-05-11 14:21 UTC (permalink / raw)


On Thu, May 11, 2017 at 10:07:29AM -0400, Noel Chiappa wrote:
>   http://ana-3.lcs.mit.edu/~jnc/tech/unix/s1/fcheck.c

Is this style of declarations common?

char
        *bbit, 
        *abbit,
        *state, 
        *lc, 
        pathname[200], 
        *pp, 
        *name, 
        sflag, 
        nflag, 
        yflag, 
;

unsigned
        dsize,
        fmin,
        fmax
;

I've not seen that before, if it's fairly unique then we might be able to
figure out who wrote this stuff (or did I miss that and we know already?)

--lm



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
@ 2017-05-11 14:07 Noel Chiappa
  2017-05-11 14:21 ` Larry McVoy
  2017-05-11 16:15 ` Clem Cole
  0 siblings, 2 replies; 77+ messages in thread
From: Noel Chiappa @ 2017-05-11 14:07 UTC (permalink / raw)


    > From: Clem Cole

    > it was originally written for the 6th edition FS (which I
    > hope I still have the sources in my files) ...
    > I believe Noel recovered a copy in his files recently.

Well, I have _something_. It's called 'fcheck', not 'fsck', but it looks like
what we're talking about - maybe it was originally named, or renamed, to be in
the same series as {d,i,n}check? But it does have the upper-case error
messages... :-) Anyway, here it is:

  http://ana-3.lcs.mit.edu/~jnc/tech/unix/s1/fcheck.c
  http://ana-3.lcs.mit.edu/~jnc/tech/unix/man8/fcheck.8

Interestingly, the man page for it makes reference to a 'check' command, which
I didn't recall at all; here it is:

  http://ana-3.lcs.mit.edu/~jnc/tech/unix/s1/check.c
  http://ana-3.lcs.mit.edu/~jnc/tech/unix/man8/check.8

for those who are interested.


    > Noel has pointed out that MIT had it in the late 1970s also, probably
    > brought back from BTL by one of their summer students.

I think most of the Unix stuff we got from Bell (e.g. the OS, which is clearly
PWB1, not V6) came from someone who was in a Scout unit there in high school,
of all bizarre connections! ISTR this came the same way, but maybe I'm wrong.
It definitely arrived later than the OS - we'd been using icheck/dcheck for
quite a while before it arrived - so maybe it was another channel?

The only thing that for sure (that I recall) didn't come this way was
Emacs. Since the author had been a grad student in our group at MIT, I think
you all can guess how we got that!

	Noel



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-10 23:09   ` Erik Berls
@ 2017-05-11 12:40     ` Steffen Nurpmeso
  0 siblings, 0 replies; 77+ messages in thread
From: Steffen Nurpmeso @ 2017-05-11 12:40 UTC (permalink / raw)


Erik Berls <erik at ono-sendai.com> wrote:
 |groff came into NetBSD (of which, FreeBSD 1.0 incorporated a
 |pre-release NetBSD 0.8 tarball), with the 386bsd + patchkit initial
 |import.  Sun Mar 21 09:45:37 1993 UTC, by cgd.  It was upgraded to
 |groff release 1.08 about 4 months later, by jtc.
 |
 |Source: NetBSD’s cvsweb

So come, ey, Mr. Spinellis has a devilish red focus on FreeBSD.
No other BSD in sight.  For the git clone of FreeBSD that i have
i see groff-1.09 import as a series of commits starting with
[b4b083cfbe2] around 1995-01-17.  But i better should not have
looked, back then perl was still part of the base system, and
today Lua is missing, too.

P.S.: despite the fact that my VServer (still) runs Alpine happily.
At least there is awk, i currently have a mail-server rotation
period of ~12 hours because of otherwise valid nonsense
connections that blacklistd is not covering, and isn't it absurd
that i need to parse log files to recollect state that the server
has readily available.  But why complain and not coding something
better, yes.
Ciao.

--steffen
|
|Ralph says i must not use signatures which spread the light!



* [TUHS] The evolution of Unix facilities and architecture
  2017-05-10 14:08 Diomidis Spinellis
  2017-05-10 14:38 ` Steffen Nurpmeso
@ 2017-05-11  0:49 ` Clem Cole
  1 sibling, 0 replies; 77+ messages in thread
From: Clem Cole @ 2017-05-11  0:49 UTC (permalink / raw)



On Wed, May 10, 2017 at 10:08 AM, Diomidis Spinellis <dds at aueb.gr> wrote:

> I've made available on GitHub a series of tables showing the evolution of
> Unix facilities (as documented in the man pages) over the system's lifetime
> [1] and two diagrams where I attempted to draw the corresponding
> architecture [2].  I've also documented the process in a short blog post
> [3].  I'd welcome any suggestions for corrections and improvements you may
> have, particularly for the architecture diagrams.
>
> [1] https://dspinellis.github.io/unix-history-man/

fsck(8) is not a BSD program.   It is a CMU program originally, although
Michigan can sort of claim it also. Ted Kowalski is (was) the primary
author of fsck.  He started writing an earlier idea for it at Michigan,
which came from something like it he had seen on MTS, I believe called
"Scavenger", and IIRC we also had one on TSS called 'Vulture' - which cleaned
up the 'carrion' after a disk crash (MTS and TSS are brothers for the IBM
360/67, on which a number of early UNIX hackers also cut their teeth).   Anyway,
fsck, scavenger and vulture were all of the same idea.   The primary work
was done on the CMU 11/34 EE Digital lab system, on the first floor of
Hamerschlag Hall in the mid-1970s.   Note, I had a >>very<< small hand in
fsck, as Ted was teaching me about C at the time (you can blame me for the
upper case error messages - that's how MTS and TSS worked in those days).

If you look at edits and style it's clearly Ted's code.  BTW:  Ted took the
sources to fsck back to the labs when he finished it, and the program was
first released via the Summit streams, but I cannot tell which one [I think
PWB 2.0 was the first 'official' AT&T version - aps, his old office mate at
Summit, might know].
I believe one of the AT&T features was support for RP06, which took
'swapping' and the temporary paging file stuff on the PDP-11.   That was
not needed at CMU because we did not have disks that large on our PDP-11s.

Note it was originally written for the 6th edition FS (which I hope I still
have the sources for in my files) as well as later for the 7th edition (or as
Ted would have called it, UNIX/TS - which we had at CMU shortly after he had
upgraded us - we ran a hybrid system until 1979, when V7 was finally
released).   I believe Noel recovered a copy in his
files recently.     As I have said previously, fsck migrated to a number of
sites independently - via mag tape most likely; Noel has pointed out that
MIT had it in the late 1970s also, probably brought back from BTL by one of
their summer students.   That said, it went more mainstream via the BSD
4.x tapes when Joy passed it on, but all of that pre-dates BSD 4.x.

How UCB (Joy) got it is unknown, although I have also pointed out that Ted
was Joy's housemate at Michigan when they were undergrads, and Ted quite
likely sent him a tape or someone like Ken or any number of other BTL folks
could have brought it with him when they were there.

Clem

* [TUHS] The evolution of Unix facilities and architecture
  2017-05-10 14:38 ` Steffen Nurpmeso
@ 2017-05-10 23:09   ` Erik Berls
  2017-05-11 12:40     ` Steffen Nurpmeso
  0 siblings, 1 reply; 77+ messages in thread
From: Erik Berls @ 2017-05-10 23:09 UTC (permalink / raw)



groff came into NetBSD (of which, FreeBSD 1.0 incorporated a
pre-release NetBSD 0.8 tarball), with the 386bsd + patchkit initial
import.  Sun Mar 21 09:45:37 1993 UTC, by cgd.  It was upgraded to
groff release 1.08 about 4 months later, by jtc.

Source: NetBSD’s cvsweb


-----Original Message-----
From: Steffen Nurpmeso <steffen@sdaoden.eu>
Reply: tuhs at minnie.tuhs.org <tuhs at minnie.tuhs.org>
Date: May 10, 2017 at 2:16:16 PM
To: Diomidis Spinellis <dds at aueb.gr>
Cc: tuhs at minnie.tuhs.org <tuhs at minnie.tuhs.org>
Subject:  Re: [TUHS] The evolution of Unix facilities and architecture

> Yes, hi,
>
> Diomidis Spinellis wrote:
> |I've made available on GitHub a series of tables showing the evolution
> |of Unix facilities (as documented in the man pages) over the system's
> |lifetime [1] and two diagrams where I attempted to draw the
> |corresponding architecture [2]. I've also documented the process in a
> |short blog post [3]. I'd welcome any suggestions for corrections and
> |improvements you may have, particularly for the architecture diagrams.
> |
> |[1] https://dspinellis.github.io/unix-history-man/
>
> i am confident groff was part of (Free)BSD even before it was
> released. That is, i have used groff with FreeBSD 4.9 (?) base
> systems onwards, and i know from the CSRG history that they worked
> to get there. (It is not part of the repo, though, it is only
> deduced. I still don't have the McKusick CD's, shame on me.)
>
> |[2] https://dspinellis.github.io/unix-architecture/
> |[3] https://www.spinellis.gr/blog/20170510/
>
> --steffen
> |
> |Ralph says i must not use signatures which spread the light!
>

--
Erik Berls



* [TUHS] The evolution of Unix facilities and architecture
  2017-05-10 14:08 Diomidis Spinellis
@ 2017-05-10 14:38 ` Steffen Nurpmeso
  2017-05-10 23:09   ` Erik Berls
  2017-05-11  0:49 ` Clem Cole
  1 sibling, 1 reply; 77+ messages in thread
From: Steffen Nurpmeso @ 2017-05-10 14:38 UTC (permalink / raw)


Yes, hi,

Diomidis Spinellis <dds at aueb.gr> wrote:
 |I've made available on GitHub a series of tables showing the evolution 
 |of Unix facilities (as documented in the man pages) over the system's 
 |lifetime [1] and two diagrams where I attempted to draw the 
 |corresponding architecture [2].  I've also documented the process in a 
 |short blog post [3].  I'd welcome any suggestions for corrections and 
 |improvements you may have, particularly for the architecture diagrams.
 |
 |[1] https://dspinellis.github.io/unix-history-man/

i am confident groff was part of (Free)BSD even before it was
released.  That is, i have used groff with FreeBSD 4.9 (?) base
systems onwards, and i know from the CSRG history that they worked
to get there.  (It is not part of the repo, though, it is only
deduced.  I still don't have the McKusick CD's, shame on me.)

 |[2] https://dspinellis.github.io/unix-architecture/
 |[3] https://www.spinellis.gr/blog/20170510/

--steffen
|
|Ralph says i must not use signatures which spread the light!



* [TUHS] The evolution of Unix facilities and architecture
@ 2017-05-10 14:08 Diomidis Spinellis
  2017-05-10 14:38 ` Steffen Nurpmeso
  2017-05-11  0:49 ` Clem Cole
  0 siblings, 2 replies; 77+ messages in thread
From: Diomidis Spinellis @ 2017-05-10 14:08 UTC (permalink / raw)


I've made available on GitHub a series of tables showing the evolution 
of Unix facilities (as documented in the man pages) over the system's 
lifetime [1] and two diagrams where I attempted to draw the 
corresponding architecture [2].  I've also documented the process in a 
short blog post [3].  I'd welcome any suggestions for corrections and 
improvements you may have, particularly for the architecture diagrams.

[1] https://dspinellis.github.io/unix-history-man/
[2] https://dspinellis.github.io/unix-architecture/
[3] https://www.spinellis.gr/blog/20170510/

Cheers,

Diomidis



