From: Clem Cole
X-MailFrom: clemc@ccc.com
Date: Mon, 4 Mar 2024 11:57:15 -0500
To: Will Senn
CC: TUHS
Subject: [TUHS] Re: regex early discussions
List-Id: The Unix Heritage Society mailing list

I've already had a chat with Will, but I wanted to add some other thoughts to the group as a whole:

- As was pointed out by others, computer life (and certainly interactive computing) does not begin with UNIX (*i.e.* interactive text editors have been around since the beginning of interactive computing).
  I'll use Thomas Haigh and Paul Ceruzzi's text, "A New History of Modern Computing", which basically pegs that as CTSS. I don't know what the original editor was for CTSS. [If someone like Doug or Ken remembers, I'd be curious to know.]
- Numerous editors show up on different systems, including STOPGAP on the MIT PDP-6, eventually SOS, TECO, EMACS, *etc*., and most have some concept of a 'line of text' to distinguish from a 'card image.'
- Common to all is some way to search or find text and some way to replace it - usually on a line of input.
- One of them is Lampson and Deutsch's "quick editor" or QED for SDS.
- Language theory was definitely a hot item by the mid-1960s, and lots of papers discussing automata and the like appear, including Ken's 1968 CACM article describing his reg-ex search algorithm implementation for the IBM 7094 [it should be findable with a search -- send me an email offline; I have a copy of a crappy scan, but it is readable].
- Most editors like SOS, TECO and the like do not have support for reg-ex, but do have some way to do sophisticated searching (and replacement).
- Ken wrote an implementation of QED for CTSS and included his search algorithm as an integral part of this new implementation.
- When Ken writes the original UNIX editor, he bases it on the above.
- UNIX builds up this idea of a pipeline, so building separate tools that connect together makes sense and is natural.
- When Rudd, Doug, Ken, Dennis, *et al* start to develop UNIX - they are building a system for *themselves.*
- One member of the group (Lee McMahon) is using the g/re/p command to find things and gets the brilliant idea of a separate tool; grep(1) would be born.
- The most important item here is that said team is a group of programmers, so it was logical that the system would be useful to, and easily understood by, other programmers.

Will asked how people learned about reg-ex. The answer, of course, is: it depends.

But if you were to take college-level CS courses in the late 60s or the 70s, as Bakul mentioned (I also had a similar experience), you were going to be taught about automata and simple language theory -- likely in your first data structures and algorithms class, and certainly by the time you took a compiler course. My memory is that I learned basic automata theory in the first, but did not see the idea of regular expressions until compilers [in my case, this is all pre-dragon book]. For all of you later in the 70s, Aho and Ullman's classic text would have exposed you to it. FWIW: in my daughter's college CS training in the 2000s, she never had to take a compiler or comparative languages course, but she was taught about reg-ex in her data structures course.

The key is you were taught a bit about automata theory, but if you really started to study it, you looked at things like the performance of the different algorithms. As Rob says, the key takeaway from learning about the reg-ex idea is its linear performance. So, if you were trained in some of the formal CS ideas, *using reg-ex was not a huge lift*. It was natural.

That said, if you were coming from other systems using things like SOS or TECO (like me), they offered search functions too, but not with expressions in the same way. It was a different way to do things, but people like me quickly realized it was a lot more powerful and could do much more. *"Ah ha .. cool beans, apply something I already knew about in a way I had not seen before ... next item ..."*

So there are a few things to realize from this.

1. Adding reg-ex to tools like sed(1) and awk(1) was a natural follow-on to things like grep(1) and ed(1).
2. If you were a CS person, it was not a big deal - just the more powerful "UNIX-way" as it were. But...
3. If you came from another world of computing (say DEC or a PC) where such tools were not exposed in a manner that was easy to build upon *and/or you had never been taught much of any core CS theory* [which is where Will cut his teeth], reg-ex might be astonishing.

So I think it's not a question of why -- it was just how UNIX did things. It was a natural way for a programmer to express something.