From: ron minnich
Date: Fri, 31 Jul 2020 21:53:00 -0700
To: Earl Baugh
Cc: TUHS main list, Bakul Shah
Subject: Re: [TUHS] Regular Expressions
In-Reply-To: <763C08F0-0AF5-48B8-988A-3D417CB0E3FA@gmail.com>
List-Id: The Unix Heritage Society mailing list

kernighan did this one at a talk at udel in 1976: find all the words in the dictionary that you can see on a calculator looking at it upside down.

On Fri, Jul 31, 2020 at 9:32 PM Earl Baugh wrote:
>
> I have a 30k+ character regex that I used to parse US street addresses from a single line of input into component pieces. That’s the largest (and yes it was crazy...
> but it just grew and grew... and made sense to me as I decomposed the problem). That’s the largest one I ever saw... and it was the fastest way I could solve the problem, both mentally and processing wise.
>
> (There was some post processing to deal with soundex matching and data base matching of the pieces to confirm the right answer. But it solved the problem with 95%+ accuracy... that I got into the high 99%’s with the post processing.)
>
> So that’s at least an example of an arbitrary problem that was solved with regex.
>
> Earl
>
> Sent from my iPhone
>
> On Jul 31, 2020, at 11:08 PM, Will Senn wrote:
>
> Oh, and one good google over another, I also found this:
>
> https://www.drdobbs.com/architecture-and-design/regular-expressions/184410904
>
> Which is also great and simple to follow :).
>
> Will
>
> On 7/31/20 7:36 PM, Rob Pike wrote:
>
> I think this link - https://swtch.com/~rsc/regexp/regexp1.html - is the best place to start. Superb exposition on the background, theory, and implementation as well as a bit of history of how the industry lost its way with regular expressions.
>
> Regular expressions are beautiful, simple, and widely misunderstood.
>
> -rob
>
> On Sat, Aug 1, 2020 at 10:03 AM Bakul Shah wrote:
>>
>> On Jul 31, 2020, at 3:57 PM, Will Senn wrote:
>> >
>> > I've always been intrigued with regexes. When I was first exposed to them, I was mystified and lost in the greediness of matches. Now, I use them regularly, but still have trouble using them. I think it is because I don't really understand how they work.
>> > ...
>> > 1. What's the provenance of regex in unix (when did it appear, in what form, etc)?
>> > 2. What are the 'best' implementations throughout unix (keep it pre 1980s)?
>> > 3. What are some of the milestones along the way (major changes, forks, disagreements)?
>> > 4. Where, in the source, or in a paper, would you point someone wanting to better understand the mechanics of regex?
>>
>> Start here: https://en.wikipedia.org/wiki/Thompson%27s_construction
>>
>> [I learned about regular expressions in an automata theory class,
>> before I knew anything about Unix. What helped me was learning
>> about finite state machines. You won't need more than paper and
>> pencil to construct one. Reading source code would make more
>> sense once you grasp how to construct an FSM corresponding to a RE.]
>
> --
> GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF
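
For anyone who wants to try the calculator puzzle mentioned at the top of this message, here is a minimal sketch in Python. It is not a reconstruction of what was shown in the 1976 talk. The letter set is an assumption (0=O, 1=I, 2=Z, 3=E, 4=H, 5=S, 6=G, 7=L, 8=B on a seven-segment display read upside down), and the names `CALCULATOR_LETTERS`, `CALCULATOR_WORD`, and `calculator_words` are made up for illustration.

```python
import re

# Assumption: letters a seven-segment display can show when read upside down
# (0=O, 1=I, 2=Z, 3=E, 4=H, 5=S, 6=G, 7=L, 8=B). Other mappings are possible.
CALCULATOR_LETTERS = "beghilosz"

# A single character class, repeated one or more times. Used with fullmatch(),
# a word qualifies only if every one of its letters is in the set.
CALCULATOR_WORD = re.compile(rf"[{CALCULATOR_LETTERS}]+", re.IGNORECASE)


def calculator_words(words):
    """Return the words made up solely of upside-down calculator letters."""
    return [w for w in words if CALCULATOR_WORD.fullmatch(w)]


if __name__ == "__main__":
    # Hypothetical sample list; a system word list could be read from a file
    # instead (its location varies from system to system).
    sample = ["hello", "shell", "regex", "boggle", "unix", "legible"]
    print(calculator_words(sample))  # ['hello', 'shell', 'boggle', 'legible']
```

Under the same assumption about the letter set, the equivalent with grep would be an anchored class such as `grep -i '^[beghilosz]*$'` run against a word list.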