From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 27173 invoked from network); 1 Aug 2020 01:40:53 -0000 Received: from minnie.tuhs.org (45.79.103.53) by inbox.vuxu.org with ESMTPUTF8; 1 Aug 2020 01:40:53 -0000 Received: by minnie.tuhs.org (Postfix, from userid 112) id AD3129CB4A; Sat, 1 Aug 2020 11:40:51 +1000 (AEST) Received: from minnie.tuhs.org (localhost [127.0.0.1]) by minnie.tuhs.org (Postfix) with ESMTP id 269289C9E3; Sat, 1 Aug 2020 11:39:58 +1000 (AEST) Received: by minnie.tuhs.org (Postfix, from userid 112) id CED349C9E3; Sat, 1 Aug 2020 11:39:55 +1000 (AEST) Received: from mcvoy.com (mcvoy.com [192.169.23.250]) by minnie.tuhs.org (Postfix) with ESMTPS id 49A9E93DFC for ; Sat, 1 Aug 2020 11:39:55 +1000 (AEST) Received: by mcvoy.com (Postfix, from userid 3546) id F3CD235E0BB; Fri, 31 Jul 2020 18:39:54 -0700 (PDT) Date: Fri, 31 Jul 2020 18:39:54 -0700 From: Larry McVoy To: Rob Pike Message-ID: <20200801013954.GH10778@mcvoy.com> References: <6e9ca056-dfb0-376d-effd-e41c9ed3ef2a@gmail.com> <402F66B1-12C1-4FED-89FD-38AB99D919B4@iitbombay.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Subject: Re: [TUHS] Regular Expressions X-BeenThere: tuhs@minnie.tuhs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: The Unix Heritage Society mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Bakul Shah , TUHS main list Errors-To: tuhs-bounces@minnie.tuhs.org Sender: "TUHS" That link is a great read. On Sat, Aug 01, 2020 at 10:36:48AM +1000, Rob Pike wrote: > I think this link - https://swtch.com/~rsc/regexp/regexp1.html i- s the > best place to start. Superb exposition on the background, theory, and > implementation as well as a bit of history of how the industry lost its way > with regular expressions. > > Regular expressions are beautiful, simple, and widely misunderstood. > > -rob > > > On Sat, Aug 1, 2020 at 10:03 AM Bakul Shah wrote: > > > On Jul 31, 2020, at 3:57 PM, Will Senn wrote: > > > > > > I've always been intrigued with regexes. When I was first exposed to > > them, I was mystified and lost in the greediness of matches. Now, I use > > them regularly, but still have trouble using them. I think it is because I > > don't really understand how they work. > > > ... > > > 1. What's the provenance of regex in unix (when did it appear, in what > > form, etc)? > > > 2. What are the 'best' implementations throughout unix (keep it pre > > 1980s)? > > > 3. What are some of the milestones along the way (major changes, forks, > > disagreements)? > > > 4. Where, in the source, or in a paper, would you point someone to > > wanting to better understand the mechanics of regex? > > > > Start here: https://en.wikipedia.org/wiki/Thompson%27s_construction > > > > [I learned about regular expressions in an automata theory class, > > before I knew anything about Unix. What helped me was learning > > about finite state machines. You won't need more than paper and > > pencil to construct one. Reading source code would make more > > sense once you grasp how to construct a FSM corresponding to a RE.] -- --- Larry McVoy lm at mcvoy.com http://www.mcvoy.com/lm