From: Will Senn
To: TUHS main list
Date: Fri, 31 Jul 2020 17:57:37 -0500
Subject: [TUHS] Regular Expressions
Message-ID: <6e9ca056-dfb0-376d-effd-e41c9ed3ef2a@gmail.com>

I've always been intrigued with regexes. When I was first exposed to them, I was mystified and lost in the greediness of matches. Now, I use them regularly, but still have trouble using them. I think it is because I don't really understand how they work.

My question for y'all has to do with early unix. I have a copy of Thompson, K. (1968). Regular expression search algorithm. Communications of the ACM, 11(6), 419-422. It is interesting as an example of Thompson's thinking about regexes. In this paper, he presents an efficient, non-backtracking algorithm for converting a regex into an IBM 7094 (whatever that is) program that can be run against text input to generate matches. It's cool. It got me thinking that maybe the way to understand the unix regex lies in a careful investigation of how it is implemented (original thought, right?). So, here I am again to ask your indulgence as the latecomer wannabe unix apprentice. My thought is that ed is where it begins and might be a good starting point, but I'm not sure - what say y'all?
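
One way to picture the non-backtracking approach is to keep track of every pattern position the match could currently be in, and to advance all of them together, one input character at a time. What follows is a minimal Python sketch of that idea only. It is not Thompson's 7094 code generator and is not taken from any unix source. The function names (`parse`, `closure`, `step`, `fullmatch`) are made up for illustration, and it assumes a tiny pattern language: literal characters, `.`, and a postfix `*` on a single character, matched against the whole string.

```python
def parse(pattern):
    """Split a pattern into (char, starred) items.

    Assumed toy syntax: literals, '.', and a postfix '*' on one character.
    """
    items = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        items.append((pattern[i], starred))
        i += 2 if starred else 1
    return items


def closure(items, states):
    """Add every state reachable without consuming input.

    State s means "items[:s] have been matched". A starred item may match
    zero characters, so from such a state we may also be in state s + 1.
    """
    result = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        if s < len(items) and items[s][1] and s + 1 not in result:
            result.add(s + 1)
            stack.append(s + 1)
    return result


def step(items, states, ch):
    """Advance all live states over one input character at once."""
    nxt = set()
    for s in states:
        if s == len(items):
            continue  # pattern already exhausted; ch cannot be consumed here
        c, starred = items[s]
        if c == "." or c == ch:
            # A starred item stays put (it may repeat); a plain item moves on.
            nxt.add(s if starred else s + 1)
    return closure(items, nxt)


def fullmatch(pattern, text):
    """Return True if the whole of text matches pattern."""
    items = parse(pattern)
    states = closure(items, {0})
    for ch in text:
        states = step(items, states, ch)
        if not states:
            return False  # no possible state survives; stop early
    return len(items) in states


print(fullmatch("a*b", "aaab"))
print(fullmatch("a.c", "abc"))
print(fullmatch("ab", "abc"))
```

Each input character is looked at once, against a set of at most `len(items) + 1` states, so the text is never re-read. That is the property that separates this style from a matcher that tries one path and backs up when it fails.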

I also have a copy of the O'Reilly Mastering Regular Expressions book, but that's not really the kind of thing I'm talking about. My question is more basic than how to use regexes practically. I would like to understand them at a parsing/state-change level (not sure that's the correct way to say it, but I'm really new to this kind of lingo). When I'm done stepping through the source, I want to be able to reason that this is why that search matched that text and not this text, and why the search was greedy, or not greedy, because of this logic here...
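
As a small illustration of the kind of reasoning meant here, greediness is easy to see in a backtracking matcher, where the order in which the choices for `*` are tried decides which match gets reported. The Python sketch below is an assumption-laden toy, not a transcription of ed or of any other unix source: the names `match_here` and `search` and the `greedy` flag are hypothetical, and the pattern language is limited to literals, `.`, and a postfix `*` on a single character.

```python
def match_here(pattern, text, pos, greedy=True):
    """Return the end index of a match of pattern at text[pos], or None."""
    if not pattern:
        return pos  # nothing left to match: succeed without consuming more
    if len(pattern) >= 2 and pattern[1] == "*":
        # Find how far the starred character could possibly extend.
        limit = pos
        while limit < len(text) and pattern[0] in (".", text[limit]):
            limit += 1
        # Greedy: try the longest run first, then back off one at a time.
        # Non-greedy: try the shortest run first, then extend.
        if greedy:
            candidates = range(limit, pos - 1, -1)
        else:
            candidates = range(pos, limit + 1)
        for stop in candidates:
            end = match_here(pattern[2:], text, stop, greedy)
            if end is not None:
                return end  # first success wins, so trial order matters
        return None
    if pos < len(text) and pattern[0] in (".", text[pos]):
        return match_here(pattern[1:], text, pos + 1, greedy)
    return None


def search(pattern, text, greedy=True):
    """Return (start, end) of the leftmost match, or None if there is none."""
    for start in range(len(text) + 1):
        end = match_here(pattern, text, start, greedy)
        if end is not None:
            return (start, end)
    return None


text = "xaabab"
print(search("a.*b", text))                 # (1, 6): matches "aabab"
print(search("a.*b", text, greedy=False))   # (1, 4): matches "aab"
```

Both calls start at the same leftmost position; the only difference is the order of `candidates`. The "why was it greedy" question then reduces to pointing at that one loop.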

If my question above isn't focused or on topic enough, here's an alternative set to ruminate on and hopefully discuss:

1. What's the provenance of regex in unix (when did it appear, in what form, etc)?
2. What are the 'best' implementations throughout unix (keep it pre-1980s)?
3. What are some of the milestones along the way (major changes, forks, disagreements)?
4. Where, in the source or in a paper, would you point someone wanting to better understand the mechanics of regex?

Thanks!

Will

-- 
GPG Fingerprint: 68F4 B3BD 1730 555A 4462  7D45 3EAA 5B6D A982 BAAF