[COFF] Re: DevOps/SRE [was Re: [TUHS] Re: LOC [was Re: Re: Re.: Princeton's "Unix: An Oral History": who was in the team in "The Attic"?]

Computer Old Farts Forum

From: Michael Parson @ 2022-11-09 19:38 UTC (permalink / raw)
  To: COFF

(Moving to COFF, probably drifted enough from UNIX history)

On 2022-11-09 03:01, steve jenkin wrote:
>> On 9 Nov 2022, at 19:41, Dan Cross <crossd@gmail.com> wrote:
>> 
>> To tie this back to TUHS a little bit...when did being a "sysadmin"
>> become a thing unto itself? And is it just me, or has that largely
>> been superceded by SRE (which I think of as what one used to,
>> perhaps, call a "system programmer") and DevOps, which feels like a
>> more traditional Unix-y kind of thing?
>> 
>>         - Dan C.
> 
> In The Beginning, We were All Programmers…

<snip>

I got started in this field in the mid '90s, just as the Internet
started moving from mostly EDU & military use to the first dial-up ISPs.
My first job was at a small community college/satellite campus of
UTexas, where my co-worker and I set up the first website for a UTexas
satellite campus. I'd played with VMS and SunOS; Linux was brand new
and was something we could install on a system we built out of spare
parts from the closet. At the time, my job title was "Assistant Systems
Manager," and my main job was to add/remove users from the VMS system,
reset stuck terminal lines, clean out the print queue, etc. Linux was
very much a toy and the Linux system we installed was a playground. Its
users were mostly me, a few others on the team, and a few CS students
who wanted to use something that looked more like Unix than VMS.

> SRE roles & as a discipline has developed, alongside DevOps, into
> managing & fault finding in large clusters of physical and virtual
> machines.

My next several years were spent dot-com hopping as a sysadmin, mostly
in IT shops where we kept the systems the company used online and
working: the mail server(s), web servers, FTP sites, database servers,
NFS/CIFS, etc.

My job-title for most of my jobs through the mid '00s was (senior)
sysadmin.

I then spent 8 years as a senior product support "engineer" at IBM
(I was CAG/SWAT, for anyone that's familiar with IBM/Rational's job
roles), during which time I started seeing the rise of what they
eventually started calling DevOps in the early 2010s.

As the web grew bigger and bigger, and the concept of Software as a
Service and so-called "Cloud" services (AWS, Azure, etc.) became more
and more of a thing, the job of keeping the systems behind those
services running started splitting off from IT into its own teams.

They took what they had learned in IT and tried to codify some "best
practices" around monitoring, automation and tooling. They started using
more shrink-wrapped stuff like ansible/chef/saltstack instead of the
home-grown stuff we (re)wrote with each job, started forcing themselves
to be part of the dev/test/deploy cycle of the products they were
supporting, and someone branded the new work-flow as 'DevOps'. I've
glossed over the dev side of that a bit, as they also got more and
better build tools, IDEs, and for better or worse, all things git.

My current day-job is being a DevOps manager.  I started here 8 years
ago on the DevOps team and was promoted to manager 4 years ago.

> Never done it myself, but it’d seem the potential for screw-ups is
> now infinite and unlimited in time :)

Yup, the potential for pushing a bad config or bit of code to dozens,
hundreds, or even thousands of systems with the click of a mouse or a
single command line has never been higher, but only if the dev/test
cycle failed to find the error (or wasn't properly followed) before
someone decided to deploy.

The guys on my team are supposed to have tested their stuff in their
own environments before even committing it to the repo; then it spends
some time in the QA/test lab before it gets pushed to production.
They're not even supposed to commit directly to the main repo. Changes
should go in as a pull-request, where someone else at least does an
eye-ball review to look for obvious mistakes, which the originator
should already have caught if they were doing proper testing in their
dev environment first.
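
The rules in that paragraph boil down to a few checks a change has to
pass before it reaches production. A minimal sketch of those checks
follows. The Change record, its field names, and the blockers() helper
are all made up for illustration; this is not something our tooling
actually runs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Change:
    """Hypothetical record of one change on its way to production."""
    author: str
    reviewer: Optional[str] = None  # who did the eye-ball review, if anyone
    tested_in_dev: bool = False     # author tested it in their own environment
    qa_cleared: bool = False        # the QA/test lab signed off


def blockers(change: Change) -> List[str]:
    """Return the reasons this change should not go to production yet."""
    reasons = []
    if not change.tested_in_dev:
        reasons.append("not tested in the author's dev environment")
    if change.reviewer is None:
        reasons.append("no pull-request review")
    elif change.reviewer == change.author:
        reasons.append("reviewer is the author")
    if not change.qa_cleared:
        reasons.append("not cleared by QA")
    return reasons
```

An empty list means nothing is blocking the change; anything else is the
list of steps that were skipped.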

Our basic tooling is GitHub Enterprise for source control and SaltStack
as our config management/automation framework.

Their work-flow is supposed to basically be:

1 pull latest copy of main repo
2 branch a working set
3 make their changes
4 use something like vagrant to spin up test VMs to test their changes
   (some people use docker instead of vagrant/virtualbox)
5 loop over 3-4 until it works
6 commit their changes to their branch
7 pull-request to main
   a. someone else on the team does an eyeball code-review
   b. other team member performs the merge
8 cherry-pick changes to the next release branch if changes need to
   go in the next release, PR those picks to the release branches, same
   process as above for merges.
9 push changes to the test env (test env is running on the next release
   branch)
10 when QA clears the release, we push to prod on release day.
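
Steps 1 through 7 map fairly directly onto ordinary git and vagrant
commands. The sketch below only builds the command list and prints it;
it does not run anything. The branch name, commit message, and the
"origin"/"main" defaults are placeholders, and a bare "vagrant up" stands
in for whatever test setup a given person really uses. The pull-request
itself is opened in GitHub and is left as a comment.

```python
import shlex


def workflow_commands(branch, message, base="main", remote="origin"):
    """Return the commands for steps 1-7, in order, without running them."""
    return [
        ["git", "checkout", base],
        ["git", "pull", remote, base],          # 1: pull latest copy of main
        ["git", "checkout", "-b", branch],      # 2: branch a working set
        # 3: make the changes (editing happens here)
        ["vagrant", "up"],                      # 4: spin up test VMs
        # 5: loop over 3-4 until it works
        ["vagrant", "destroy", "-f"],           # tear the test VMs down again
        ["git", "add", "-A"],
        ["git", "commit", "-m", message],       # 6: commit to the branch
        ["git", "push", "-u", remote, branch],  # 7: publish, then open the PR
    ]


if __name__ == "__main__":
    # Hypothetical branch name and commit message, for illustration only.
    for argv in workflow_commands("fix-ntp-config", "Fix NTP server list"):
        print(shlex.join(argv))
```

Keeping the commands as argument lists rather than one shell string means
a commit message with spaces or quotes in it can't be mis-parsed if the
list is later handed to something like subprocess.run().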

The developers who actually write the software offering have similar
workflows for their stuff, except they have a build system involved to
compile & pkg stuff up & put the packages into the package repo, from
which they get deployed to test (and eventually prod) with saltstack
rules.

Our SRE is mostly concerned with making sure the monitoring of
everything is up to snuff, that the playbooks for acting on alerts are
up to date, and that the on-call person can follow them. We have a
meeting every other week to go over the alerts & playbooks to make sure
that we're keeping things up to date there. He doesn't manage the
systems at all; he just makes sure all the moving pieces are properly
monitored and we know how to deal with the problems as they come up.
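
The alert/playbook review can be thought of as two questions: does every
alert have a playbook, and has that playbook been looked at recently? A
rough sketch of that check is below. The alert names, the dict layout,
and the 14-day cutoff (picked only to echo the every-other-week meeting)
are assumptions for illustration, not how our monitoring is set up.

```python
from datetime import date, timedelta


def audit_playbooks(alerts, playbooks, today, max_age_days=14):
    """Find alerts with no playbook or with a playbook not reviewed lately.

    alerts:    iterable of alert names
    playbooks: dict mapping alert name -> date the playbook was last reviewed
    Returns (missing, stale) as two sorted lists of alert names.
    """
    cutoff = today - timedelta(days=max_age_days)
    missing = sorted(a for a in alerts if a not in playbooks)
    stale = sorted(a for a in alerts
                   if a in playbooks and playbooks[a] < cutoff)
    return missing, stale


if __name__ == "__main__":
    # Made-up alert names and review dates.
    reviewed = {"disk_full": date(2022, 11, 1), "cert_expiry": date(2022, 9, 1)}
    missing, stale = audit_playbooks(
        ["disk_full", "cert_expiry", "high_load"], reviewed, date(2022, 11, 9))
    print("no playbook:", missing)
    print("needs review:", stale)
```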

-- 
Michael Parson
Pflugerville, TX
KF5LGQ
