From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-1.4 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FROM,HTML_MESSAGE,MAILING_LIST_MULTI, NICE_REPLY_A,RCVD_IN_DNSWL_NONE autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 1028 invoked from network); 1 Aug 2020 02:34:42 -0000 Received: from minnie.tuhs.org (45.79.103.53) by inbox.vuxu.org with ESMTPUTF8; 1 Aug 2020 02:34:42 -0000 Received: by minnie.tuhs.org (Postfix, from userid 112) id 34D5A9C9E3; Sat, 1 Aug 2020 12:34:35 +1000 (AEST) Received: from minnie.tuhs.org (localhost [127.0.0.1]) by minnie.tuhs.org (Postfix) with ESMTP id 91E269C9FB; Sat, 1 Aug 2020 12:33:49 +1000 (AEST) Authentication-Results: minnie.tuhs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="HtYiv+bE"; dkim-atps=neutral Received: by minnie.tuhs.org (Postfix, from userid 112) id 858F49C9FB; Sat, 1 Aug 2020 12:33:43 +1000 (AEST) Received: from mail-ot1-f43.google.com (mail-ot1-f43.google.com [209.85.210.43]) by minnie.tuhs.org (Postfix) with ESMTPS id E4A4C9C9E3 for ; Sat, 1 Aug 2020 12:33:41 +1000 (AEST) Received: by mail-ot1-f43.google.com with SMTP id r21so13573498ota.10 for ; Fri, 31 Jul 2020 19:33:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language; bh=Uus4sg4aMvzAvE1PxqkdJn6HQyRsM8pl7dk3iXPsCPM=; b=HtYiv+bEzWKzJGD2z9UO0znCz8O87YOGbzu2T+TC5jpI1imdC8V4UHG1Kh2wtVtjDB hvOQVpFbFBfhTBwXnFxC5TAPeNRmzpXwMG3sL7VqTgp2byaRQOdq66JvxP08RIVONwDe sZ8bjqVkR0McCsrB6o3iEijFsLxP9lgv+M1NCagnzW6GZuls1pSCftJsbu4myfv9R0I7 PNxlaH3YkB6Tbl+IF/oishrB567NVzaNeRZmt+HEjy9UYLVXcgwBOgILmPGUjdZ3aCyE TwEe/MhDhmNv22f60eEtQAraCqufeO3V5icjGep8cbfcC3kCP/hc8qOm4AzZpHOE2daq G+RQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language; bh=Uus4sg4aMvzAvE1PxqkdJn6HQyRsM8pl7dk3iXPsCPM=; b=hzL6RbMOCa2VadOG4UllV1Ywwn+mYef/EZxNOS3TRySmzTWUNVDsDaYBzIOI3u1zfN Lg7uwHr22GCUAJdW7AowEkzbXthYtWcsjE/65pIeFaCMcqDwnQtLew7IBoC1D+GFereL GkOcy9RMAAt9cGDN927AEPTfssdl8WV4E/vWj7et3GHK+fCQt1zMFXLK9Byqs3YzC6u3 IAeCfokjvWoWHNJ5sCFjRRWHBIWsFafUL3KMPbyx29kmv4g893sEdJDuGfOGUqBWuMB6 JtjxbGdF0d5FKsrVvCYhl36dJw4aGvhYWqPB/tvRMsdwxH8YCZtC5b48w0IJWMfpswyB 1mWg== X-Gm-Message-State: AOAM533NU+E+N6B+L9qcsJ4oV57re6znXfRmYz8lt13IonvkMfjBYYA6 7AV8hNbFeU9j63+FYwEW93nQ5mQcsFM= X-Google-Smtp-Source: ABdhPJwu/Od8ck0SYZQDz7AQTgM7cUmCAvoy+QDJuOVUFix07raeiDdnJlSj2ErrkXdoRNfrVypOcA== X-Received: by 2002:a9d:66cd:: with SMTP id t13mr5281037otm.101.1596249220557; Fri, 31 Jul 2020 19:33:40 -0700 (PDT) Received: from terra.local (99-139-148-170.lightspeed.rcsntx.sbcglobal.net. [99.139.148.170]) by smtp.gmail.com with ESMTPSA id p63sm1553066oig.8.2020.07.31.19.33.39 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 31 Jul 2020 19:33:39 -0700 (PDT) To: Rob Pike , Bakul Shah References: <6e9ca056-dfb0-376d-effd-e41c9ed3ef2a@gmail.com> <402F66B1-12C1-4FED-89FD-38AB99D919B4@iitbombay.org> From: Will Senn Message-ID: Date: Fri, 31 Jul 2020 21:33:39 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Thunderbird/68.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/alternative; boundary="------------B3EE20D95649D47A2E4268F7" Content-Language: en-US Subject: Re: [TUHS] Regular Expressions X-BeenThere: tuhs@minnie.tuhs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: The Unix Heritage Society mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: TUHS main list Errors-To: tuhs-bounces@minnie.tuhs.org Sender: "TUHS" This is a multi-part message in MIME format. --------------B3EE20D95649D47A2E4268F7 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Yes, I googled it per Clem's suggestion and wound up on that exact link after wandering around admiring the scenery. I envy the more mathematically inclined among us their view of matters technical. This piece, being in C and having step by step articulation of the diagrams, is better for me than the more formal wikipedia article, although, when I get enough background, that looks like it'll be good, too. Thanks Rob, Clem and Bakul. Will On 7/31/20 7:36 PM, Rob Pike wrote: > I think this link - https://swtch.com/~rsc/regexp/regexp1.html i- s > the best place to start. Superb exposition on the background, theory, > and implementation as well as a bit of history of how the industry > lost its way with regular expressions. > > Regular expressions are beautiful, simple, and widely misunderstood. > > -rob > > > On Sat, Aug 1, 2020 at 10:03 AM Bakul Shah > wrote: > > On Jul 31, 2020, at 3:57 PM, Will Senn > wrote: > > > > I've always been intrigued with regexes. When I was first > exposed to them, I was mystified and lost in the greediness of > matches. Now, I use them regularly, but still have trouble using > them. I think it is because I don't really understand how they work. > > ... > > 1. What's the provenance of regex in unix (when did it appear, > in what form, etc)? > > 2. What are the 'best' implementations throughout unix (keep it > pre 1980s)? > > 3. What are some of the milestones along the way (major changes, > forks, disagreements)? > > 4. Where, in the source, or in a paper, would you point someone > to wanting to better understand the mechanics of regex? > > Start here: https://en.wikipedia.org/wiki/Thompson%27s_construction > > [I learned about regular expressions in an automata theory class, >  before I knew anything about Unix. What helped me was learning >  about finite state machines. You won't need more than paper and >  pencil to construct one. Reading source code would make more >  sense once you grasp how to construct a FSM corresponding to a RE.] > -- GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF --------------B3EE20D95649D47A2E4268F7 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 8bit
Yes, I googled it per Clem's suggestion and wound up on that exact link after wandering around admiring the scenery. I envy the more mathematically inclined among us their view of matters technical. This piece, being in C and having step by step articulation of the diagrams, is better for me than the more formal wikipedia article, although, when I get enough background, that looks like it'll be good, too.

Thanks Rob, Clem and Bakul.

Will


On 7/31/20 7:36 PM, Rob Pike wrote:
I think this link -  https://swtch.com/~rsc/regexp/regexp1.html i- s the best place to start. Superb exposition on the background, theory, and implementation as well as a bit of history of how the industry lost its way with regular expressions.

Regular expressions are beautiful, simple, and widely misunderstood.

-rob


On Sat, Aug 1, 2020 at 10:03 AM Bakul Shah <bakul@iitbombay.org> wrote:
On Jul 31, 2020, at 3:57 PM, Will Senn <will.senn@gmail.com> wrote:
>
> I've always been intrigued with regexes. When I was first exposed to them, I was mystified and lost in the greediness of matches. Now, I use them regularly, but still have trouble using them. I think it is because I don't really understand how they work.
> ...
> 1. What's the provenance of regex in unix (when did it appear, in what form, etc)?
> 2. What are the 'best' implementations throughout unix (keep it pre 1980s)?
> 3. What are some of the milestones along the way (major changes, forks, disagreements)?
> 4. Where, in the source, or in a paper, would you point someone to wanting to better understand the mechanics of regex?

Start here: https://en.wikipedia.org/wiki/Thompson%27s_construction

[I learned about regular expressions in an automata theory class,
 before I knew anything about Unix. What helped me was learning
 about finite state machines. You won't need more than paper and
 pencil to construct one. Reading source code would make more
 sense once you grasp how to construct a FSM corresponding to a RE.]


-- 
GPG Fingerprint: 68F4 B3BD 1730 555A 4462  7D45 3EAA 5B6D A982 BAAF
--------------B3EE20D95649D47A2E4268F7--