From: Rob Pike
Date: Sat, 1 Aug 2020 10:36:48 +1000
To: Bakul Shah
Cc: TUHS main list
Subject: Re: [TUHS] Regular Expressions

I think this link - https://swtch.com/~rsc/regexp/regexp1.html - is the
best place to start. Superb exposition on the background, theory, and
implementation as well as a bit of history of how the industry lost its
way with regular expressions.

Regular expressions are beautiful, simple, and widely misunderstood.

-rob

On Sat, Aug 1, 2020 at 10:03 AM Bakul Shah <bakul@iitbombay.org> wrote:

> On Jul 31, 2020, at 3:57 PM, Will Senn <will.senn@gmail.com> wrote:
> >
> > I've always been intrigued with regexes. When I was first exposed to
> > them, I was mystified and lost in the greediness of matches. Now, I
> > use them regularly, but still have trouble using them. I think it is
> > because I don't really understand how they work.
> > ...
> > 1. What's the provenance of regex in unix (when did it appear, in
> > what form, etc)?
> > 2. What are the 'best' implementations throughout unix (keep it pre
> > 1980s)?
> > 3. What are some of the milestones along the way (major changes,
> > forks, disagreements)?
> > 4. Where, in the source, or in a paper, would you point someone to
> > wanting to better understand the mechanics of regex?
>
> Start here: https://en.wikipedia.org/wiki/Thompson%27s_construction
>
> [I learned about regular expressions in an automata theory class,
> before I knew anything about Unix. What helped me was learning
> about finite state machines. You won't need more than paper and
> pencil to construct one. Reading source code would make more
> sense once you grasp how to construct a FSM corresponding to a RE.]