Grant - check out Russ Cox's web page on this very subject: Implementing Regular Expressions

On Thu, Mar 2, 2023 at 1:55 PM Grant Taylor via COFF <coff@tuhs.org> wrote:
Hi,

I'd like some thoughts / input on extended regular expressions used
with grep, specifically GNU grep -E / egrep.

What are the pros / cons to creating extended regular expressions like
the following:

    ^\w{3}

vs:

    ^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)

Or:

    [ :[:digit:]]{11}

vs:

    ( 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31) (0|1|2)[[:digit:]]:(0|1|2|3|4|5)[[:digit:]]:(0|1|2|3|4|5)[[:digit:]]

I'm currently eliding the 61st (60) second, the 32nd day, and dealing
with February having fewer days for simplicity.

For matching patterns like the following in log files?

    Mar  2 03:23:38

I'm working on organically training logcheck to match known good log
entries.  So I'm *DEEP* in the bowels of extended regular expressions
(GNU egrep) that run over all logs hourly.  As such, I'm interested in
making sure that my REs are both efficient and accurate or at least not
WILDLY badly structured.  The pedantic part of me wants to avoid
wildcard type matches (\w), even if they are bounded (\w{3}), unless it
truly is for unpredictable text.
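
The accuracy side of that trade-off can be checked directly. Below is a
minimal sketch in Python, not egrep itself: Python's re module has no POSIX
bracket classes, so [[:digit:]] is written as [0-9], and joining the month
part to the day/time part with a single space is an assumption based on the
sample line "Mar  2 03:23:38". The names LOOSE, STRICT, and classify, and
the test lines, are hypothetical and only illustrate the idea.

```python
import re

# Loose form: any three word characters, then 11 characters drawn from
# space, colon, and digits. [0-9] stands in for [[:digit:]] (assumption).
LOOSE = re.compile(r"^\w{3} [ :0-9]{11}")

# Strict form, rebuilt from the alternations in the question.
MONTH = "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
# Space-padded days " 1" through "31", joined as one alternation.
DAY = "(" + "|".join(f"{d:2d}" for d in range(1, 32)) + ")"
TIME = "(0|1|2)[0-9]:(0|1|2|3|4|5)[0-9]:(0|1|2|3|4|5)[0-9]"
STRICT = re.compile("^" + MONTH + " " + DAY + " " + TIME)


def classify(line):
    """Return (loose_matches, strict_matches) for one log line."""
    return (LOOSE.search(line) is not None,
            STRICT.search(line) is not None)


if __name__ == "__main__":
    # Hypothetical log lines, made up for this sketch.
    samples = [
        "Mar  2 03:23:38 host sshd[1]: ok",
        "Foo 99 99:99:99 host bogus",
        "Mar 32 03:23:38 host x",
        "Mar  2 29:23:38 host x",
    ]
    for line in samples:
        print(classify(line), line)
```

In this sketch the loose pattern accepts all four lines, including the
"Foo 99 99:99:99" one. The strict pattern rejects the bogus month and the
32nd day, but it still accepts hour 29, because (0|1|2)[0-9] allows any
second digit after a leading 2.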

I'd appreciate any feedback and recommendations from people who have
been using and / or optimizing (extended) regular expressions for longer
than I have been using them.

Thank you for your time and input.



--
Grant. . . .
unix || die