Computer Old Farts Forum
 help / color / mirror / Atom feed
From: Dan Cross <crossd@gmail.com>
To: Ed Bradford <egbegb2@gmail.com>
Cc: Grant Taylor <gtaylor@tnetconsulting.net>, COFF <coff@tuhs.org>
Subject: [COFF] Re: Requesting thoughts on extended regular expressions in grep.
Date: Mon, 6 Mar 2023 16:01:51 -0500	[thread overview]
Message-ID: <CAEoi9W6tZ+55MSPxPoZqfS3k9RO9MOQqB0yu=MO_vzzw0K6Lhw@mail.gmail.com> (raw)
In-Reply-To: <CAHTagfH97hXvW4=pMPYfQuJsVCtGtUvfTgDjUhw2kRe6FOUqTg@mail.gmail.com>

On Mon, Mar 6, 2023 at 5:02 AM Ed Bradford <egbegb2@gmail.com> wrote:
>[snip]
> I would like to extend my program to
> any date format. That would require
> a much bigger RE. I have been led to
> believe that a 50Kbyte or 500Kbyte
> RE works just as well (if not
> as fast) as a 100 byte RE. I think
> with parentheses and
> pipe-symbols suitably used,
> one could match
>
>   Monday, March 6, 2023
>   2023-03-06
>   Mar 6, 2023
>   or
>   ...

This reminds me of something that I wanted to bring up.

Perhaps one _could_ define a sufficiently rich regular expression that
one could match a number of date formats. However, I submit that one
_should not_. REs may be sufficiently powerful, but in all likelihood
what you'll end up with is an unreadable mess; it's like people who
abuse `sed` or whatever to execute complex, general purpose programs:
yeah, it's a clever hack, but that doesn't mean you should do it.

Pick the right tool for the job. REs are a powerful tool, but they're
not the right tool for _every_ job, and I'd argue that once you hit a
threshold of complexity that'll be mostly self-evident, it's time to
move on to something else.

As for large vs small REs.... When we start talking about differences
of orders of magnitude in size, we start talking about real
performance implications; in general an NDFA simulation of a regular
expression will have on the order of the length of the RE in states,
so when the length of the RE is half a million symbols, that's
half-a-million states, which practically speaking is a pretty big
number, even though it's bounded is still a pretty big number, and
even on modern CPUs.

I wouldn't want to poke that bear.

        - Dan C.

  reply	other threads:[~2023-03-06 21:02 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-02 18:54 [COFF] " Grant Taylor via COFF
2023-03-02 19:23 ` [COFF] " Clem Cole
2023-03-02 19:38   ` Grant Taylor via COFF
2023-03-02 23:01   ` Stuff Received
2023-03-02 23:46     ` Steffen Nurpmeso
2023-03-03  1:08     ` Grant Taylor via COFF
2023-03-03  2:10       ` Dave Horsfall
2023-03-03  3:34         ` Grant Taylor via COFF
2023-03-02 21:53 ` Dan Cross
2023-03-03  1:05   ` Grant Taylor via COFF
2023-03-03  3:04     ` Dan Cross
2023-03-03  3:53       ` Grant Taylor via COFF
2023-03-03 13:47         ` Dan Cross
2023-03-03 19:26           ` Grant Taylor via COFF
2023-03-03 10:59 ` Ralph Corderoy
2023-03-03 13:11   ` Dan Cross
2023-03-03 13:42     ` Ralph Corderoy
2023-03-03 19:19       ` Grant Taylor via COFF
2023-03-04 10:15         ` [COFF] Reading PDFs on a mobile. (Was: Requesting thoughts on extended regular expressions in grep.) Ralph Corderoy
2023-03-07 21:49           ` [COFF] " Tomasz Rola
2023-03-07 22:46             ` Tomasz Rola
2023-06-20 16:02           ` Michael Parson
2023-06-20 21:26             ` Tomasz Rola
2023-06-22 15:45               ` Michael Parson
2023-07-10  9:08                 ` [COFF] Re: Reader, paper, tablet, phone (was: Re: Reading PDFs on a mobile. (Was: Requesting thoughts on extended regular expressions in grep.)) Tomasz Rola
2023-03-03 16:12   ` [COFF] Re: Requesting thoughts on extended regular expressions in grep Dave Horsfall
2023-03-03 17:13     ` Dan Cross
2023-03-03 17:38       ` Ralph Corderoy
2023-03-03 19:09         ` Dan Cross
2023-03-03 19:36     ` Grant Taylor via COFF
2023-03-04 10:26       ` Ralph Corderoy
2023-03-03 19:06 ` Grant Taylor via COFF
2023-03-03 19:31   ` Dan Cross
2023-03-04 10:07   ` Ralph Corderoy
2023-03-06 10:01 ` Ed Bradford
2023-03-06 21:01   ` Dan Cross [this message]
2023-03-06 21:49     ` Steffen Nurpmeso
2023-03-07  1:43     ` Larry McVoy
2023-03-07  4:01       ` Ed Bradford
2023-03-07 11:39         ` [COFF] " Ralph Corderoy
2023-03-07 18:31           ` [COFF] " Grant Taylor via COFF
2023-03-08 11:22           ` Ed Bradford
2023-03-07 16:14         ` Dan Cross
2023-03-07 17:34           ` [COFF] " Ralph Corderoy
2023-03-07 18:33             ` [COFF] " Dan Cross
2023-03-07  4:19     ` Ed Bradford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEoi9W6tZ+55MSPxPoZqfS3k9RO9MOQqB0yu=MO_vzzw0K6Lhw@mail.gmail.com' \
    --to=crossd@gmail.com \
    --cc=coff@tuhs.org \
    --cc=egbegb2@gmail.com \
    --cc=gtaylor@tnetconsulting.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).