From: Clem Cole
Date: Thu, 2 Mar 2023 14:23:25 -0500
To: Grant Taylor
CC: COFF
Subject: [COFF] Re: Requesting thoughts on extended regular expressions in grep.

Grant - check out Russ Cox's web page on this very subject: Implementing Regular Expressions

On Thu, Mar 2, 2023 at 1:55 PM Grant Taylor via COFF <coff@tuhs.org> wrote:
Hi,

I'd like some thoughts ~> input on extended regular expressions used
with grep, specifically GNU grep -E / egrep.

What are the pros / cons to creating extended regular expressions like
the following:

    ^\w{3}

vs:

    ^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)

Or:

    [ :[:digit:]]{11}

vs:

    ( 1| 2| 3| 4| 5| 6| 7| 8|
    9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31)
    (0|1|2)[[:digit:]]:(0|1|2|3|4|5)[[:digit:]]:(0|1|2|3|4|5)[[:digit:]]

I'm currently eliding the 61st (60) second, the 32nd day, and dealing
with February having fewer days for simplicity.

For matching patterns like the following in log files?

    Mar  2 03:23:38
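
One way to see the trade-off is to run both styles over a few strings. The following is a minimal Python sketch, not GNU grep itself. It rests on several assumptions: Python's re module has no POSIX bracket classes, so [0-9] stands in for [[:digit:]]; the month, day, and time pieces are joined with single spaces, which the line wrapping in the message does not show explicitly; and every sample line other than "Mar  2 03:23:38" is made up for the comparison. The names LOOSE, STRICT, and check are hypothetical.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
# " 1| 2|...| 9|10|...|31": space-padded days, as in the message.
DAYS = "|".join(f"{d:2d}" for d in range(1, 32))

# Loose style: any three word characters, then eleven characters drawn
# from space, colon, and digits ([0-9] replaces [[:digit:]]).
LOOSE = re.compile(r"^\w{3} [ :0-9]{11}")

# Strict style: explicit alternations. The single spaces between the
# pieces are an assumption about how the fragments are meant to join.
STRICT = re.compile(
    rf"^({MONTHS}) ({DAYS}) "
    r"(0|1|2)[0-9]:(0|1|2|3|4|5)[0-9]:(0|1|2|3|4|5)[0-9]"
)


def check(line):
    """Return (loose_matches, strict_matches) for one log line."""
    return bool(LOOSE.search(line)), bool(STRICT.search(line))


# Illustrative lines; only the first timestamp comes from the message.
SAMPLES = [
    "Mar  2 03:23:38 host daemon: ok",
    "Foo 99 99:99:99 host daemon: ok",
    "Mar  2 29:00:00 host daemon: ok",
]

for line in SAMPLES:
    loose_ok, strict_ok = check(line)
    print(f"{line!r:40} loose={loose_ok} strict={strict_ok}")
```

In this sketch the loose pattern accepts all three lines, while the strict one rejects the "Foo 99" line. The strict pattern still accepts hour 29, because (0|1|2) followed by any digit covers 00 through 29.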

I'm working on organically training logcheck to match known good log
entries.  So I'm *DEEP* in the bowels of extended regular expressions
(GNU egrep) that runs over all logs hourly.  As such, I'm interested in
making sure that my REs are both efficient and accurate or at least not
WILDLY badly structured.  The pedantic part of me wants to avoid
wildcard type matches (\w), even if they are bounded (\w{3}), unless it
truly is for unpredictable text.
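
A possible middle ground between wildcards and long alternations is one bracket range per digit position. The Python sketch below checks that idea exhaustively over every two-character string built from space and digits. The compact patterns are an illustrative assumption and were not proposed in the thread. As before, [0-9] stands in for [[:digit:]], and all names are hypothetical.

```python
import re
from itertools import product

# The 31-way day alternation from the message, and a compact
# bracket-range spelling intended to accept the same strings.
VERBOSE_DAY = re.compile("|".join(f"{d:2d}" for d in range(1, 32)))
COMPACT_DAY = re.compile(r" [1-9]|[12][0-9]|3[01]")

# The hour pattern from the message, and a tighter variant limited
# to 00 through 23.
ORIGINAL_HOUR = re.compile(r"(0|1|2)[0-9]")
TIGHT_HOUR = re.compile(r"[01][0-9]|2[0-3]")

# Every two-character string over space and the ten digits (121 total).
CANDIDATES = ["".join(pair) for pair in product(" 0123456789", repeat=2)]


def accepted(pattern):
    """Return the set of candidates the pattern matches in full."""
    return {s for s in CANDIDATES if pattern.fullmatch(s)}


same_days = accepted(VERBOSE_DAY) == accepted(COMPACT_DAY)
extra_hours = sorted(accepted(ORIGINAL_HOUR) - accepted(TIGHT_HOUR))
print("day patterns equivalent:", same_days)
print("hours only the original accepts:", extra_hours)
```

An exhaustive check like this is cheap for fixed-width fields and gives some confidence that a shorter spelling has not widened or narrowed what the pattern accepts.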

I'd appreciate any feedback and recommendations from people who have been using and / or optimizing (extended) regular expressions for longer than I have been using them.

Thank you for your time and input.



--
Grant. . . .
unix || die
