From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-1.1 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FROM,HTML_MESSAGE,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 21760 invoked from network); 1 Aug 2020 00:54:51 -0000 Received: from minnie.tuhs.org (45.79.103.53) by inbox.vuxu.org with ESMTPUTF8; 1 Aug 2020 00:54:51 -0000 Received: by minnie.tuhs.org (Postfix, from userid 112) id A0CFC9CB42; Sat, 1 Aug 2020 10:54:47 +1000 (AEST) Received: from minnie.tuhs.org (localhost [127.0.0.1]) by minnie.tuhs.org (Postfix) with ESMTP id 0F8E59C9E3; Sat, 1 Aug 2020 10:54:00 +1000 (AEST) Authentication-Results: minnie.tuhs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="QcxJ0zI9"; dkim-atps=neutral Received: by minnie.tuhs.org (Postfix, from userid 112) id A6EF29C9E3; Sat, 1 Aug 2020 10:53:57 +1000 (AEST) Received: from mail-vk1-f179.google.com (mail-vk1-f179.google.com [209.85.221.179]) by minnie.tuhs.org (Postfix) with ESMTPS id 9806293DFC for ; Sat, 1 Aug 2020 10:53:55 +1000 (AEST) Received: by mail-vk1-f179.google.com with SMTP id x187so5268500vkc.1 for ; Fri, 31 Jul 2020 17:53:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=U9dxF/9Fp3B1cpKplFgb6xAGt0+M96uRzaO30AbOmYs=; b=QcxJ0zI9VNf3rBGstsgLZ6dscICm/oAcm2MprK8xCvV/e1B2bt5FZe276i3/mEqAtm /A5wDYI7Gk1eyQfv06iqMm2T8Xktw2HH8IiixuzKHdmtP8zlmGXTSb50HakFapBSBGpz AL9PDNmE39I/wSxPcCJS9Uhs4Xvo03gG9D/Hqki6p55GgmZLHGARNXsr2gOqRSBt79ZX oFckJBhVr4179hnbKK7ltDJ5/Lp+8gJC4+ZWOguEGpgUQQ+4MRbToRp42ZMplH88OOWf e0kUpaaI1ar/u1j0DxY30YeSbVJCZHe7C4IQQt5jSLxkywkLc2R7CuJ63Q5khhwiKfJ1 9yWQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=U9dxF/9Fp3B1cpKplFgb6xAGt0+M96uRzaO30AbOmYs=; b=kfAiHEp/3WpL0ZVjZJZ6Py0jjhOiGh84/Rm0RXJvbzh2YiimZ656aCxlaO2ls9V84m gFq4zZkSrVZVkn8wE+ScnAv2CDgSlC+b+x2EPct9MiwTlo9sfK8+u89OKWK/ksVAvNMb TjDlnQWGyHQtDoV8kbA35iWqRnPvPaYV4zjjw3VIJPNBBDxbfwqRrVnEelon/C90nfPb EmkvHAZlxZsmzA2BCymgABDS6w5pmr+F0cUyXdoYk84xI0Z+mqymekb23+iAJzWg9sLL yii1K4croDmaWxmr5BbWtmPP5PKU47kofh1+0EwiruB0c0u1yz04PUFewHYmeVABtO9y z4hA== X-Gm-Message-State: AOAM533se8DHiPxwUJyfueoh8rxfTk1KIVWZwWOJ3un32SsTWJuaG1gw hyohdMQYuJ5mC3bj+IcXSyErQBEc4xmPGqry/Bw= X-Google-Smtp-Source: ABdhPJwA8lFzecHJhAQRdFwa49Frjjzznmi5vLhxdUX+azz4h+Q5YAR4Leg0pv5hpqmR3BvhjFCW+20yii+Mjq2eXkQ= X-Received: by 2002:a1f:2895:: with SMTP id o143mr2851289vko.59.1596243234692; Fri, 31 Jul 2020 17:53:54 -0700 (PDT) MIME-Version: 1.0 References: <6e9ca056-dfb0-376d-effd-e41c9ed3ef2a@gmail.com> <402F66B1-12C1-4FED-89FD-38AB99D919B4@iitbombay.org> In-Reply-To: From: "John P. Linderman" Date: Fri, 31 Jul 2020 20:53:43 -0400 Message-ID: To: Rob Pike Content-Type: multipart/alternative; boundary="000000000000faa0b705abc65688" Subject: Re: [TUHS] Regular Expressions X-BeenThere: tuhs@minnie.tuhs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: The Unix Heritage Society mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Bakul Shah , TUHS main list Errors-To: tuhs-bounces@minnie.tuhs.org Sender: "TUHS" --000000000000faa0b705abc65688 Content-Type: text/plain; charset="UTF-8" Hear hear. And I think it should be possible to analyze regular expressions to determine if "simple" (hence fast) regexps are adequate. "Fancy" re's are rare. At least from those of us where they couldn't be expressed. On Fri, Jul 31, 2020 at 8:38 PM Rob Pike wrote: > I think this link - https://swtch.com/~rsc/regexp/regexp1.html i- s the > best place to start. Superb exposition on the background, theory, and > implementation as well as a bit of history of how the industry lost its way > with regular expressions. > > Regular expressions are beautiful, simple, and widely misunderstood. > > -rob > > > On Sat, Aug 1, 2020 at 10:03 AM Bakul Shah wrote: > >> On Jul 31, 2020, at 3:57 PM, Will Senn wrote: >> > >> > I've always been intrigued with regexes. When I was first exposed to >> them, I was mystified and lost in the greediness of matches. Now, I use >> them regularly, but still have trouble using them. I think it is because I >> don't really understand how they work. >> > ... >> > 1. What's the provenance of regex in unix (when did it appear, in what >> form, etc)? >> > 2. What are the 'best' implementations throughout unix (keep it pre >> 1980s)? >> > 3. What are some of the milestones along the way (major changes, forks, >> disagreements)? >> > 4. Where, in the source, or in a paper, would you point someone to >> wanting to better understand the mechanics of regex? >> >> Start here: https://en.wikipedia.org/wiki/Thompson%27s_construction >> >> [I learned about regular expressions in an automata theory class, >> before I knew anything about Unix. What helped me was learning >> about finite state machines. You won't need more than paper and >> pencil to construct one. Reading source code would make more >> sense once you grasp how to construct a FSM corresponding to a RE.] > > --000000000000faa0b705abc65688 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hea= r hear. And I think it should be possible to analyze regular expressions to= determine=C2=A0if "simple" (hence fast) regexps are adequate. &q= uot;Fancy" re's are rare. At least from those=C2=A0of us where the= y couldn't be expressed.

On Fri, Jul 31, 2020 at 8:38 PM Rob Pike = <robpike@gmail.co= m> wrote:
I think this link -=C2=A0=C2=A0https://swtch.com/~rsc/regex= p/regexp1.html=C2=A0i- s the best place to start. Superb exposition on = the background, theory, and implementation as well as a bit of history of h= ow the industry lost its way with regular expressions.

R= egular expressions are beautiful, simple, and widely misunderstood.

-rob


On Sat, Aug 1, 2020 at 10:03= AM Bakul Shah <bakul@iitbombay.org> wrote:
On Jul 31, 2020, at 3:57 PM, Will Senn <will.senn@gmail.com> wr= ote:
>
> I've always been intrigued with regexes. When I was first exposed = to them, I was mystified and lost in the greediness of matches. Now, I use = them regularly, but still have trouble using them. I think it is because I = don't really understand how they work.
> ...
> 1. What's the provenance of regex in unix (when did it appear, in = what form, etc)?
> 2. What are the 'best' implementations throughout unix (keep i= t pre 1980s)?
> 3. What are some of the milestones along the way (major changes, forks= , disagreements)?
> 4. Where, in the source, or in a paper, would you point someone to wan= ting to better understand the mechanics of regex?

Start here: https://en.wikipedia.org/wiki/Tho= mpson%27s_construction

[I learned about regular expressions in an automata theory class,
=C2=A0before I knew anything about Unix. What helped me was learning
=C2=A0about finite state machines. You won't need more than paper and =C2=A0pencil to construct one. Reading source code would make more
=C2=A0sense once you grasp how to construct a FSM corresponding to a RE.]
--000000000000faa0b705abc65688--