From: Ed Bradford
Date: Mon, 6 Mar 2023 22:19:42 -0600
To: Dan Cross
CC: Grant Taylor, COFF
Subject: [COFF] Re: Requesting thoughts on extended regular expressions in grep.
List-Id: Computer Old Farts Forum

Hi Dan,

It sounds to me like an "optimizer" is needed. There is already a compiler
that uses FA's. Is someone else going to create a program
to look for dates without using regular expressions?

Today, I write small-sized RE's. If I write a giant RE, nothing prevents
the owner of the RE world from changing how they are used. For instance:
compile your RE, and a subroutine/function is produced that performs the
RE search.

RE is a *language*, not necessarily an implementation.
At least that is my understanding.
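
The "compile your RE and get a function" idea resembles what many RE libraries already expose. A minimal sketch in Python, assuming the standard `re` module; the pattern and the name `find_iso_dates` are illustrative and not from the thread:

```python
import re

# Hypothetical pattern: ISO-style dates such as 2023-03-06.
# re.compile turns the pattern text into a reusable matcher object.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def find_iso_dates(text):
    """Return every ISO-style date substring found in text."""
    return ISO_DATE.findall(text)


print(find_iso_dates("sent 2023-03-06, read 2023-03-07"))
```

The pattern string is the language-level part; how the compiled object searches is left to the library, which could in principle change without the pattern changing.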

Ed


On Mon, Mar 6, 2023 at 3:02 PM Dan Cross <crossd@gmail.com> wrote:
On Mon, Mar 6, 2023 at 5:02 AM Ed Bradford <egbegb2@gmail.com> wrote:
>[snip]
> I would like to extend my program to
> any date format. That would require
> a much bigger RE. I have been led to
> believe that a 50Kbyte or 500Kbyte
> RE works just as well (if not
> as fast) as a 100 byte RE. I think
> with parentheses and
> pipe-symbols suitably used,
> one could match
>
>   Monday, March 6, 2023
>   2023-03-06
>   Mar 6, 2023
>   or
>   ...

This reminds me of something that I wanted to bring up.

Perhaps one _could_ define a sufficiently rich regular expression that
one could match a number of date formats. However, I submit that one
_should not_. REs may be sufficiently powerful, but in all likelihood
what you'll end up with is an unreadable mess; it's like people who
abuse `sed` or whatever to execute complex, general purpose programs:
yeah, it's a clever hack, but that doesn't mean you should do it.

Pick the right tool for the job. REs are a powerful tool, but they're
not the right tool for _every_ job, and I'd argue that once you hit a
threshold of complexity that'll be mostly self-evident, it's time to
move on to something else.
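
One possible reading of "something else" for the three date formats quoted above, sketched in Python: try a short list of `datetime.strptime` formats in turn instead of growing one RE. The format list and the name `parse_date` are assumptions made for illustration, the month and weekday names assume an English locale, and the function checks a candidate string rather than searching free text.

```python
from datetime import datetime

# Illustrative formats for the three examples quoted in the thread.
FORMATS = [
    "%A, %B %d, %Y",  # Monday, March 6, 2023
    "%Y-%m-%d",       # 2023-03-06
    "%b %d, %Y",      # Mar 6, 2023
]


def parse_date(candidate):
    """Return a date if candidate matches a known format, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(candidate, fmt).date()
        except ValueError:
            continue  # Not this format; try the next one.
    return None
```

Supporting another format then means adding one line to the list, and every accepted string is also checked for being a real calendar date.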

As for large vs small REs.... When we start talking about differences
of orders of magnitude in size, we start talking about real
performance implications; in general an NDFA simulation of a regular
expression will have on the order of the length of the RE in states,
so when the length of the RE is half a million symbols, that's
half-a-million states, which practically speaking is a pretty big
number, even though it's bounded, and even on modern CPUs.
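
To make the size argument concrete, here is a toy sketch in Python that simulates an NFA for the restricted case of an alternation of literal strings (`alt1|alt2|...`). It is not taken from grep or any real engine, and `build_states` and `matches` are hypothetical names. There is one state per pattern character, so the state count grows linearly with the length of the RE, and each input character costs work proportional to the number of active states.

```python
def build_states(alternatives):
    """One NFA state per pattern character: (alternative index, offset)."""
    return [(i, j) for i, alt in enumerate(alternatives) for j in range(len(alt))]


def matches(alternatives, text):
    """Simulate the NFA for alt1|alt2|..., anchored at both ends of text."""
    # Active set: every alternative starts at offset 0.
    active = {(i, 0) for i in range(len(alternatives))}
    for ch in text:
        # Advance each active state whose next pattern character equals ch.
        active = {
            (i, j + 1)
            for (i, j) in active
            if j < len(alternatives[i]) and alternatives[i][j] == ch
        }
        if not active:
            return False  # No alternative can still match.
    # Accept if some alternative has been consumed completely.
    return any(j == len(alternatives[i]) for (i, j) in active)
```

With `["2023-03-06", "Mar 6, 2023"]` this yields 21 states; an RE of half a million symbols would yield on the order of half a million, all of which may need to be tracked while scanning.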

I wouldn't want to poke that bear.

        - Dan C.


--
Advice is judged by results, not by intentions.
  Cicero
