From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-1.4 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FROM,HTML_MESSAGE,MAILING_LIST_MULTI, NICE_REPLY_A,RCVD_IN_DNSWL_NONE autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 4993 invoked from network); 1 Aug 2020 03:07:53 -0000 Received: from minnie.tuhs.org (45.79.103.53) by inbox.vuxu.org with ESMTPUTF8; 1 Aug 2020 03:07:53 -0000 Received: by minnie.tuhs.org (Postfix, from userid 112) id 8EB819CB47; Sat, 1 Aug 2020 13:07:52 +1000 (AEST) Received: from minnie.tuhs.org (localhost [127.0.0.1]) by minnie.tuhs.org (Postfix) with ESMTP id 2FD2B9C9E3; Sat, 1 Aug 2020 13:07:16 +1000 (AEST) Authentication-Results: minnie.tuhs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="qaUupmp/"; dkim-atps=neutral Received: by minnie.tuhs.org (Postfix, from userid 112) id D5B1D9C9E3; Sat, 1 Aug 2020 13:07:12 +1000 (AEST) Received: from mail-oi1-f170.google.com (mail-oi1-f170.google.com [209.85.167.170]) by minnie.tuhs.org (Postfix) with ESMTPS id 0560793DFC for ; Sat, 1 Aug 2020 13:07:12 +1000 (AEST) Received: by mail-oi1-f170.google.com with SMTP id e6so8271117oii.4 for ; Fri, 31 Jul 2020 20:07:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language; bh=jF0a9AMCobm7p/zWdDlCwYPYDkFrk3X18kP7YuuLQ8w=; b=qaUupmp/ycoF6e6HAlkdd3jAFgZ1nKRlLekVarASd/HD02wIQVSgPiQqm/oF9fexlJ NHBmVUagTh4yJ/3FLr+V2BGWPz09OB8GvQOUaVzaIKiHzSLbRAVQHCSyvU73ceGnbmOJ tcZ0C3GsXapp3k7D156xRunV4z5l4pHGyCtnmHOPNSthwgsQ57UCW3P1zsgTSdrO5S6C w6/clbI2lxe0cEmOZExniZips3QcZzS2q39SlVfiS6ArIvSSbJpsvtchyQJunlJ5TF2K 6wtwCcd/qh0cTZIYT5/lN1CjXLPTnWzioMZogYTyL37gMXGs4PiWG0bwDUmhgKvKOGp8 7USA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language; bh=jF0a9AMCobm7p/zWdDlCwYPYDkFrk3X18kP7YuuLQ8w=; b=dTAGfJy4mOXa4rlum9DiyJRzT7xgpbRkZz4zXz0KBIVIxA6qWJoCjSjpm8HLmBUdI4 6K0lq3JpN55uziZZGlO3MaGbF0PMvckvlRAdS8U95cRqDBczcKsCHd26q7CgWOarnmfk SKLOgNQIwcdsWmq0ifwbnhc9rF6Y8kso925wlphZ6feuG7EmpDsiSf+SULmoZ3mdI7yn B51XT0liR3h2Lw/q4GnTIcSTSgM6i920n1Jx9wr6OnO9wlY/YTfB3GVfJqO+lyGvMqXE DVgAR10M+JGczOTJMgL4lNP+JxSe6OP6kXGYq2a4ru/z6LtoqVpanQbjBcVB0dlgOPC1 W4nA== X-Gm-Message-State: AOAM5339/JBKVOLetRiZtKU9E8thUCx1K7K5jSx+3wR+iLJq94wTGE6r M5N51cRdH+09pmcS0BFYHR/t9HtEQvc= X-Google-Smtp-Source: ABdhPJw/JBrCr/IR2KkgaI60EitTt2mZJ1oZD7058/M/XnSM/rboGJ3ZAU5iq80DR6fbRwDrTNe8nQ== X-Received: by 2002:a05:6808:204:: with SMTP id l4mr4006844oie.16.1596251230848; Fri, 31 Jul 2020 20:07:10 -0700 (PDT) Received: from terra.local (99-139-148-170.lightspeed.rcsntx.sbcglobal.net. [99.139.148.170]) by smtp.gmail.com with ESMTPSA id n17sm1700419ota.37.2020.07.31.20.07.10 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 31 Jul 2020 20:07:10 -0700 (PDT) To: Rob Pike , Bakul Shah References: <6e9ca056-dfb0-376d-effd-e41c9ed3ef2a@gmail.com> <402F66B1-12C1-4FED-89FD-38AB99D919B4@iitbombay.org> From: Will Senn Message-ID: <50453bb3-af58-043d-07f5-cedb14e4224b@gmail.com> Date: Fri, 31 Jul 2020 22:07:09 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Thunderbird/68.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/alternative; boundary="------------A2B9C1E2EC35D2A6EBF23D0A" Content-Language: en-US Subject: Re: [TUHS] Regular Expressions X-BeenThere: tuhs@minnie.tuhs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: The Unix Heritage Society mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: TUHS main list Errors-To: tuhs-bounces@minnie.tuhs.org Sender: "TUHS" This is a multi-part message in MIME format. --------------A2B9C1E2EC35D2A6EBF23D0A Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Oh, and one good google over another, I also found this: https://www.drdobbs.com/architecture-and-design/regular-expressions/184410904 Which is also great and simple to follow :). Will On 7/31/20 7:36 PM, Rob Pike wrote: > I think this link - https://swtch.com/~rsc/regexp/regexp1.html i- s > the best place to start. Superb exposition on the background, theory, > and implementation as well as a bit of history of how the industry > lost its way with regular expressions. > > Regular expressions are beautiful, simple, and widely misunderstood. > > -rob > > > On Sat, Aug 1, 2020 at 10:03 AM Bakul Shah > wrote: > > On Jul 31, 2020, at 3:57 PM, Will Senn > wrote: > > > > I've always been intrigued with regexes. When I was first > exposed to them, I was mystified and lost in the greediness of > matches. Now, I use them regularly, but still have trouble using > them. I think it is because I don't really understand how they work. > > ... > > 1. What's the provenance of regex in unix (when did it appear, > in what form, etc)? > > 2. What are the 'best' implementations throughout unix (keep it > pre 1980s)? > > 3. What are some of the milestones along the way (major changes, > forks, disagreements)? > > 4. Where, in the source, or in a paper, would you point someone > to wanting to better understand the mechanics of regex? > > Start here: https://en.wikipedia.org/wiki/Thompson%27s_construction > > [I learned about regular expressions in an automata theory class, >  before I knew anything about Unix. What helped me was learning >  about finite state machines. You won't need more than paper and >  pencil to construct one. Reading source code would make more >  sense once you grasp how to construct a FSM corresponding to a RE.] > -- GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF --------------A2B9C1E2EC35D2A6EBF23D0A Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 8bit
Oh, and one good google over another, I also found this:

https://www.drdobbs.com/architecture-and-design/regular-expressions/184410904

Which is also great and simple to follow :).

Will

On 7/31/20 7:36 PM, Rob Pike wrote:
I think this link -  https://swtch.com/~rsc/regexp/regexp1.html i- s the best place to start. Superb exposition on the background, theory, and implementation as well as a bit of history of how the industry lost its way with regular expressions.

Regular expressions are beautiful, simple, and widely misunderstood.

-rob


On Sat, Aug 1, 2020 at 10:03 AM Bakul Shah <bakul@iitbombay.org> wrote:
On Jul 31, 2020, at 3:57 PM, Will Senn <will.senn@gmail.com> wrote:
>
> I've always been intrigued with regexes. When I was first exposed to them, I was mystified and lost in the greediness of matches. Now, I use them regularly, but still have trouble using them. I think it is because I don't really understand how they work.
> ...
> 1. What's the provenance of regex in unix (when did it appear, in what form, etc)?
> 2. What are the 'best' implementations throughout unix (keep it pre 1980s)?
> 3. What are some of the milestones along the way (major changes, forks, disagreements)?
> 4. Where, in the source, or in a paper, would you point someone to wanting to better understand the mechanics of regex?

Start here: https://en.wikipedia.org/wiki/Thompson%27s_construction

[I learned about regular expressions in an automata theory class,
 before I knew anything about Unix. What helped me was learning
 about finite state machines. You won't need more than paper and
 pencil to construct one. Reading source code would make more
 sense once you grasp how to construct a FSM corresponding to a RE.]


-- 
GPG Fingerprint: 68F4 B3BD 1730 555A 4462  7D45 3EAA 5B6D A982 BAAF
--------------A2B9C1E2EC35D2A6EBF23D0A--