Computer Old Farts Forum
From: Ed Bradford <egbegb2@gmail.com>
To: Larry McVoy <lm@mcvoy.com>
Cc: Grant Taylor <gtaylor@tnetconsulting.net>, COFF <coff@tuhs.org>
Subject: [COFF] Re: Requesting thoughts on extended regular expressions in grep.
Date: Mon, 6 Mar 2023 22:01:14 -0600
Message-ID: <CAHTagfFqfP3eVSgQOgV29O=JJkGdhjiv40pw-LNsvNvORC1XTA@mail.gmail.com>
In-Reply-To: <20230307014311.GN5398@mcvoy.com>

I have made an attempt to make my RE stuff readable and supportable. I
think I write more description than I do RE "code". As for *it won't be
comprehensible*: machine language was unreadable, and then along came
assembly language. Assembly language was unreadable, and then came
higher-level languages. Even higher-level languages are unsupportable if
not well documented and mostly simple to understand ("you are not expected
to understand this" notwithstanding). The jump from machine language to
Python today was unimagined in early times.

    [
     As an old timer, I see inflection points
     between:

       machine language and assembly language
       assembly language and high level languages
       and
       high level languages and python.

      But that's just me.
     ]



I think it is possible to make a 50K RE that is understandable. However, it
requires a lot of 'splainin' throughout the code. I'm naive, though; I will
eventually discover a lack of truth in that belief, if such exists.

I repeat: I put stuff down for months at a time. My metric is *coming back
to it and understanding where I left off*. So far, I can do that for this RE
program, which works for small files, large files, binary files, and text
files, for exactly one pattern:

    YYYY[-MM-DD]

I constructed this RE with code like this:

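# ymdt is the YYYY[-MM-DD] RE for text.
# lastYearRE and _INTERNALSEP are defined elsewhere (not shown here).

# Years: 1900s and 2000s only, and no later than the current year.
_YYYY = r"(19\d\d|20[01]\d|202" + "[0-" + lastYearRE + "]" + "){1}"

# Months: 01 through 12.
_MM   = r"(0[1-9]|1[012])"

# Days: 01 through 31.
_DD   = r"(0[1-9]|[12]\d|3[01])"

# A year, optionally followed by separator, month, separator, day.
ymdt = (_YYYY + '(' + _INTERNALSEP +
                      _MM          +
                      _INTERNALSEP +
                      _DD          +
                ')' + '{0,1}')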

For the whole-file RE, I used

ymdthf = _FRSEP + ymdt + _BASEP

where _FRSEP is the front separator: a bunch of possible separators,
excluding digits and letters, or-ed with the caret ("^") beginning-of-line
RE mark. _BASEP is the back separator: the same as _FRSEP, with "^" replaced
by "$".

I then aimed ymdthf at "data", the object that represents the entire
memory-mapped file (where there is only one beginning and one end).
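
A minimal, self-contained sketch of the approach described above, assuming
Python's re and mmap modules. The real _INTERNALSEP, _FRSEP, _BASEP and
lastYearRE are not shown in this message, so the hyphen-only separator, the
"anything that is not an ASCII letter or digit" separator class, the
hard-coded year digit, and the find_dates helper are all assumptions made
for illustration:

```python
import mmap
import re

# Assumptions (not shown in the message): "-" is the only internal
# separator, and a separator is any byte that is not an ASCII letter or digit.
INTERNALSEP = rb"-"
LAST_YEAR_DIGIT = b"3"  # stand-in for lastYearRE; last digit of 2023

YYYY = rb"(?:19\d\d|20[01]\d|202[0-" + LAST_YEAR_DIGIT + rb"])"
MM = rb"(?:0[1-9]|1[012])"
DD = rb"(?:0[1-9]|[12]\d|3[01])"

# YYYY with an optional -MM-DD tail, captured as group 1.
YMDT = rb"(" + YYYY + rb"(?:" + INTERNALSEP + MM + INTERNALSEP + DD + rb")?)"

# Without re.MULTILINE, ^ and $ anchor only to the start and end of the data.
FRSEP = rb"(?:^|[^0-9A-Za-z])"
BASEP = rb"(?:$|[^0-9A-Za-z])"

YMDTHF = re.compile(FRSEP + YMDT + BASEP)


def find_dates(path):
    """Return every YYYY or YYYY-MM-DD byte string found in the file."""
    with open(path, "rb") as f:
        # Map the whole (non-empty) file read-only and search it in place.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
            return [m.group(1) for m in YMDTHF.finditer(data)]
```

Because the pattern is compiled from bytes, it can be run over binary and
text files alike. Note that a bare year such as "1975" is a match here,
since the -MM-DD part is optional.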

Again, I say validating an RE is at least as difficult as writing one.
What does it miss?
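
One way to start answering that question is to cross-check every match
against a real calendar. The sketch below is an illustration only: DATE_RE
is a simplified stand-in for the RE discussed here (full dates only, "-" as
the separator, years capped at 2023), it uses lookarounds instead of
consuming separators so that the match is just the date, and the helper
names is_real_date and audit are hypothetical:

```python
import re
from datetime import date

# Assumption: a simplified stand-in for the date RE, not the original.
DATE_RE = re.compile(
    r"(?<![0-9A-Za-z])"
    r"(19\d\d|20[01]\d|202[0-3])-(0[1-9]|1[012])-(0[1-9]|[12]\d|3[01])"
    r"(?![0-9A-Za-z])"
)


def is_real_date(y, m, d):
    """True if year/month/day name a day that exists on the calendar."""
    try:
        date(int(y), int(m), int(d))
        return True
    except ValueError:
        return False


def audit(text):
    """Split RE matches into calendar-valid dates and false positives."""
    good, bad = [], []
    for m in DATE_RE.finditer(text):
        (good if is_real_date(*m.groups()) else bad).append(m.group(0))
    return good, bad
```

Run over a string containing "2023-02-31" or "1999-04-31", audit puts both
in the false-positive list: the RE checks digit shapes, not month lengths
or leap years.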

Dates are an excellent test ground for REs. Latitude and longitude are
another.

Ed

PS: I thought I was on the COFF mailing list. I received this email
by direct mail from Larry. I haven't seen any other comments
on my submission. I might have unsubscribed, but now I regret it. Dear
powers that be: Please resubscribe me.
