Computer Old Farts Forum
 help / color / mirror / Atom feed
From: Dan Cross <crossd@gmail.com>
To: Ed Bradford <egbegb2@gmail.com>
Cc: Grant Taylor <gtaylor@tnetconsulting.net>, COFF <coff@tuhs.org>
Subject: [COFF] Re: Requesting thoughts on extended regular expressions in grep.
Date: Tue, 7 Mar 2023 11:14:49 -0500	[thread overview]
Message-ID: <CAEoi9W6sLfpoGo8pGF_1RYA62Pi8aCWdyo53vNKqdtxyFKTGpA@mail.gmail.com> (raw)
In-Reply-To: <CAHTagfFqfP3eVSgQOgV29O=JJkGdhjiv40pw-LNsvNvORC1XTA@mail.gmail.com>

On Mon, Mar 6, 2023 at 11:01 PM Ed Bradford <egbegb2@gmail.com> wrote:
>[snip]
> I think it is possible to make a 50K RE that is understandable. However, it requires
> a lot of 'splainin' throughout the code. I'm naive though; I will eventually discover
> a lack of truth in that belief, if such exists.

Actually, I believe you. I'm sure that with enough effort, it _is_
possible to make a 50K RE that is understandable to mere mortals. But
it begs the question: why bother? The answer to that question, in my
mind, shows the difference between a clever programmer and a pragmatic
engineer. I submit that it's time to reach for another tool well
before you get to an RE that big, and if one is still considering such
a thing, one must really ask what properties of REs and the problem at
hand one thinks lend itself to that as the solution.

>[snip]
> It sounds to me like an "optimizer" is needed. There is alreay a compiler
> that uses FA's.

I'm not sure what you're referring to here, though you were replying
to me. There are a couple of different threads floating around:

1. Writing really big regular expressions: this is probably a bad
idea. Don't do it (see below).
2. Writing a recognizer for dates. Yeah, the small REs you have for
that are fine. If you want to extend those to arbitrary date formats,
I think you'll find it starts getting ugly.
3. Optimizing regular expressions. You're still bound by the known
theoretical properties of finite automata here.

> Is someone else going to create a program
> to look for dates without using regular expressions?

Many people have already done so. :-)

> Today, I write small-sized RE's. If I write a giant RE, there is nothing preventing
> the owner of RE world to change how they are used. For instance. Compile your RE
> and a subroutine/function is produced that performs the RE search.

I'm not sure I understand what you mean.

The theory here is well-understood: we know recognizers for regular
languages can be built from DFAs, that run in time linear in the size
of their inputs, but we also know that constructing such a DFA can be
exponential in space and time, and thus impractical for many REs.

We know that NDFA simulators can be built in time and space linear in
the length of the RE, but that the resulting recognizers will be
superlinear at runtime, proportional to the product of the length of
input, number of states, and number edges between states in the state
transition graph. For a very large regular expression, that's going to
be a pretty big number, and even on modern CPUs won't be particularly
fast. Compilation to native code won't really help you.

There is no "owner of RE world" that can change that. If you can find
some way to do so, I think that would qualify as a major breakthrough
in computer science.

> RE is a language, not necessarily an implementation.
> At least that is my understanding.

Regular expressions describe regular languages, but as I mentioned
above, the theory gives the currently understood bounds for their
performance characteristics. It's kinda like the speed of light in
this regard; we can't really make it go faster.

        - Dan C.

  parent reply	other threads:[~2023-03-07 16:15 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-02 18:54 [COFF] " Grant Taylor via COFF
2023-03-02 19:23 ` [COFF] " Clem Cole
2023-03-02 19:38   ` Grant Taylor via COFF
2023-03-02 23:01   ` Stuff Received
2023-03-02 23:46     ` Steffen Nurpmeso
2023-03-03  1:08     ` Grant Taylor via COFF
2023-03-03  2:10       ` Dave Horsfall
2023-03-03  3:34         ` Grant Taylor via COFF
2023-03-02 21:53 ` Dan Cross
2023-03-03  1:05   ` Grant Taylor via COFF
2023-03-03  3:04     ` Dan Cross
2023-03-03  3:53       ` Grant Taylor via COFF
2023-03-03 13:47         ` Dan Cross
2023-03-03 19:26           ` Grant Taylor via COFF
2023-03-03 10:59 ` Ralph Corderoy
2023-03-03 13:11   ` Dan Cross
2023-03-03 13:42     ` Ralph Corderoy
2023-03-03 19:19       ` Grant Taylor via COFF
2023-03-04 10:15         ` [COFF] Reading PDFs on a mobile. (Was: Requesting thoughts on extended regular expressions in grep.) Ralph Corderoy
2023-03-07 21:49           ` [COFF] " Tomasz Rola
2023-03-07 22:46             ` Tomasz Rola
2023-06-20 16:02           ` Michael Parson
2023-06-20 21:26             ` Tomasz Rola
2023-06-22 15:45               ` Michael Parson
2023-07-10  9:08                 ` [COFF] Re: Reader, paper, tablet, phone (was: Re: Reading PDFs on a mobile. (Was: Requesting thoughts on extended regular expressions in grep.)) Tomasz Rola
2023-03-03 16:12   ` [COFF] Re: Requesting thoughts on extended regular expressions in grep Dave Horsfall
2023-03-03 17:13     ` Dan Cross
2023-03-03 17:38       ` Ralph Corderoy
2023-03-03 19:09         ` Dan Cross
2023-03-03 19:36     ` Grant Taylor via COFF
2023-03-04 10:26       ` Ralph Corderoy
2023-03-03 19:06 ` Grant Taylor via COFF
2023-03-03 19:31   ` Dan Cross
2023-03-04 10:07   ` Ralph Corderoy
2023-03-06 10:01 ` Ed Bradford
2023-03-06 21:01   ` Dan Cross
2023-03-06 21:49     ` Steffen Nurpmeso
2023-03-07  1:43     ` Larry McVoy
2023-03-07  4:01       ` Ed Bradford
2023-03-07 11:39         ` [COFF] " Ralph Corderoy
2023-03-07 18:31           ` [COFF] " Grant Taylor via COFF
2023-03-08 11:22           ` Ed Bradford
2023-03-07 16:14         ` Dan Cross [this message]
2023-03-07 17:34           ` [COFF] " Ralph Corderoy
2023-03-07 18:33             ` [COFF] " Dan Cross
2023-03-07  4:19     ` Ed Bradford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEoi9W6sLfpoGo8pGF_1RYA62Pi8aCWdyo53vNKqdtxyFKTGpA@mail.gmail.com \
    --to=crossd@gmail.com \
    --cc=coff@tuhs.org \
    --cc=egbegb2@gmail.com \
    --cc=gtaylor@tnetconsulting.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).