From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_ADSP_CUSTOM_MED, DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FROM,HTML_MESSAGE,MAILING_LIST_MULTI, MIME_QP_LONG_LINE,RCVD_IN_DNSWL_NONE autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 14318 invoked from network); 1 Aug 2020 04:31:51 -0000 Received: from minnie.tuhs.org (45.79.103.53) by inbox.vuxu.org with ESMTPUTF8; 1 Aug 2020 04:31:51 -0000 Received: by minnie.tuhs.org (Postfix, from userid 112) id 3DE219C9FB; Sat, 1 Aug 2020 14:31:49 +1000 (AEST) Received: from minnie.tuhs.org (localhost [127.0.0.1]) by minnie.tuhs.org (Postfix) with ESMTP id D61989C9E3; Sat, 1 Aug 2020 14:31:09 +1000 (AEST) Authentication-Results: minnie.tuhs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="d3Nd6hzp"; dkim-atps=neutral Received: by minnie.tuhs.org (Postfix, from userid 112) id 5F5079C9FB; Sat, 1 Aug 2020 14:31:06 +1000 (AEST) Received: from mail-qk1-f177.google.com (mail-qk1-f177.google.com [209.85.222.177]) by minnie.tuhs.org (Postfix) with ESMTPS id A6E139C9E3 for ; Sat, 1 Aug 2020 14:31:02 +1000 (AEST) Received: by mail-qk1-f177.google.com with SMTP id 11so30850043qkn.2 for ; Fri, 31 Jul 2020 21:31:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=content-transfer-encoding:from:mime-version:subject:date:message-id :references:cc:in-reply-to:to; bh=QDS4G5aq0GR0w+Mb5D0/6KlzKwT3l7XgWNCH02e0Yk0=; b=d3Nd6hzpAantQu3d0NZhXiU6IB1orVbv00xQX6XcgywN+46njUjZg3b72XZUOBnAHf a8rsF33iYU5ImpLE6qzBs7Qem5hMOPisMGVSy00OivHohsux3a3bmMgfopkRoZyrWbmQ 5oRzZbHuec1W8zmP88nUHbf2i1uyiGmeXlefIbhWK3hac2EadfonSUFmch+2NvyWBLug QvOEKQ5ccTGXxyRffBx/QKpfLC54TUYTGQEG3UjyrKdHzyX0K11ErtRPSNnjDNJWMmJ+ /nYQGQQc05jYlFwl4TbcUxCsDAoKMo5d08wAsvjzYxMJHYfTzRpv/IiwO/CYw9a5Nwun InMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:content-transfer-encoding:from:mime-version :subject:date:message-id:references:cc:in-reply-to:to; bh=QDS4G5aq0GR0w+Mb5D0/6KlzKwT3l7XgWNCH02e0Yk0=; b=Dv2azEypMSPTQ0uU3TYWL2jjf/ybSMQl9/jlVYQhOX85klKrEuQqppDlIbaKFEPL3Z WTjOuZrj0rArhxW4rwSmLmJmsyHUAccVx4Y3Jme65+AR2eAPywGYl+lajs1i2c7aMAM9 /cTjVOWrYAKwLWmH2g1YAPJAG3GYKhZSbiZ8CYAa4wO9YTTsgVOwWFKFRi6SajHBsD/V oef6/9XAX/VwKrgvqVfXOp1IbosNQoERsHBAXFAUVdFrpwVG8Er2tCYkC8ORDLSvmQC/ Ds9nDRntNE80SrAdVopBS960E1ivrm+ycA8yia18VJNZdZinHEIQnpEfKUF8jJC/kCK2 WqGg== X-Gm-Message-State: AOAM530hlGTHuGEeXZT1HlN7EPDtU+eh8i6i7ieJrYNwOFV+yrXr30Yd IQnNU9BejFbR9JmbN/ps6Z1NBK52Jfk= X-Google-Smtp-Source: ABdhPJyb9hjSlE0qxv5zURsrDrn2CtZ+J4kmjqDaDxBb7rqlXPfbX3o+wfN7btWiOQ5QpowRUvoPPA== X-Received: by 2002:a37:9ccf:: with SMTP id f198mr7385511qke.168.1596256261648; Fri, 31 Jul 2020 21:31:01 -0700 (PDT) Received: from ?IPv6:2600:1700:1960:6f09:3493:fec4:7a42:7b46? ([2600:1700:1960:6f09:3493:fec4:7a42:7b46]) by smtp.gmail.com with ESMTPSA id u21sm11033334qkk.1.2020.07.31.21.31.01 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 31 Jul 2020 21:31:01 -0700 (PDT) Content-Type: multipart/alternative; boundary=Apple-Mail-EE3F70D4-1613-4377-BDF6-2C5E3BB8969B Content-Transfer-Encoding: 7bit From: Earl Baugh Mime-Version: 1.0 (1.0) Date: Sat, 1 Aug 2020 00:31:00 -0400 Message-Id: <763C08F0-0AF5-48B8-988A-3D417CB0E3FA@gmail.com> References: <50453bb3-af58-043d-07f5-cedb14e4224b@gmail.com> In-Reply-To: <50453bb3-af58-043d-07f5-cedb14e4224b@gmail.com> To: Will Senn X-Mailer: iPhone Mail (17F80) Subject: Re: [TUHS] Regular Expressions X-BeenThere: tuhs@minnie.tuhs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: The Unix Heritage Society mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: TUHS main list , Bakul Shah Errors-To: tuhs-bounces@minnie.tuhs.org Sender: "TUHS" --Apple-Mail-EE3F70D4-1613-4377-BDF6-2C5E3BB8969B Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable I have a 30k+ character regex that I used to parse US street addresses from a= single line of input into component pieces. That=E2=80=99s the largest ( an= d yes it was crazy... but it just grew and grew...and made sense to me as I d= ecomposed the problem ). That=E2=80=99s the largest one I ever saw.... and= it was the fastest way I could solve the problem.. both mentally and proces= sing wise. =20 ( there was some post processing to deal with soundex matching and data base= matching of the pieces to confirm the right answer. But it solved the prob= lem with 95%+ accuracy... that I got into the high 99%=E2=80=99s with the po= st processing. ) So that=E2=80=99s at least an example of an arbitrary problem that was solve= d with regex.=20 Earl=20 Sent from my iPhone > On Jul 31, 2020, at 11:08 PM, Will Senn wrote: >=20 > =EF=BB=BF > Oh, and one good google over another, I also found this: >=20 > https://www.drdobbs.com/architecture-and-design/regular-expressions/184410= 904 >=20 > Which is also great and simple to follow :). >=20 > Will >=20 >> On 7/31/20 7:36 PM, Rob Pike wrote: >> I think this link - https://swtch.com/~rsc/regexp/regexp1.html i- s the b= est place to start. Superb exposition on the background, theory, and impleme= ntation as well as a bit of history of how the industry lost its way with re= gular expressions. >>=20 >> Regular expressions are beautiful, simple, and widely misunderstood. >>=20 >> -rob >>=20 >>=20 >> On Sat, Aug 1, 2020 at 10:03 AM Bakul Shah wrote: >>> On Jul 31, 2020, at 3:57 PM, Will Senn wrote: >>> >=20 >>> > I've always been intrigued with regexes. When I was first exposed to t= hem, I was mystified and lost in the greediness of matches. Now, I use them r= egularly, but still have trouble using them. I think it is because I don't r= eally understand how they work. >>> > ... >>> > 1. What's the provenance of regex in unix (when did it appear, in what= form, etc)? >>> > 2. What are the 'best' implementations throughout unix (keep it pre 19= 80s)? >>> > 3. What are some of the milestones along the way (major changes, forks= , disagreements)? >>> > 4. Where, in the source, or in a paper, would you point someone to wan= ting to better understand the mechanics of regex? >>>=20 >>> Start here: https://en.wikipedia.org/wiki/Thompson%27s_construction >>>=20 >>> [I learned about regular expressions in an automata theory class,=20 >>> before I knew anything about Unix. What helped me was learning >>> about finite state machines. You won't need more than paper and >>> pencil to construct one. Reading source code would make more >>> sense once you grasp how to construct a FSM corresponding to a RE.] >=20 >=20 > --=20 > GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF --Apple-Mail-EE3F70D4-1613-4377-BDF6-2C5E3BB8969B Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: quoted-printable I have a 30k+ character regex that I used t= o parse US street addresses from a single line of input into component piece= s. That=E2=80=99s the largest ( and yes it was crazy... but it just grew and= grew...and made sense to me as I decomposed the problem ).   That=E2=80= =99s the largest one I ever saw.... and it was the fastest way I could solve= the problem.. both mentally and processing wise.  

= ( there was some post processing to deal with soundex matching and data base= matching of the pieces to confirm the right answer.  But it solved the= problem with 95%+ accuracy... that I got into the high 99%=E2=80=99s with t= he post processing. )

So that=E2=80=99s at least an= example of an arbitrary problem that was solved with regex. 

Earl 


Sent from my i= Phone

On Jul 31, 2020, a= t 11:08 PM, Will Senn <will.senn@gmail.com> wrote:

=EF=BB=BF =20 =20 =20
Oh, and one good google over another, I also found this:

https://www.drdobbs.com/archit= ecture-and-design/regular-expressions/184410904

Which is also great and simple to follow :).

Will

On 7/31/20 7:36 PM, Rob Pike wrote:
I think this link -  https://swtch.com/= ~rsc/regexp/regexp1.html i- s the best place to start. Superb exposition on the background, theory, and implementation as well as a bit of history of how the industry lost its way with regular expressions.

Regular expressions are beautiful, simple, and widely misunderstood.

-rob




--=20
GPG Fingerprint: 68F4 B3BD 1730 555A 4462  7D45 3EAA 5B6D A982 BAAF
=20
= --Apple-Mail-EE3F70D4-1613-4377-BDF6-2C5E3BB8969B--