Computer Old Farts Forum
From: Grant Taylor via COFF <coff@tuhs.org>
To: coff@tuhs.org
Subject: [COFF] Re: Requesting thoughts on extended regular expressions in grep.
Date: Tue, 7 Mar 2023 11:31:55 -0700
Message-ID: <ef8945e4-c25c-eed5-2480-78f18d9bc75a@spamtrap.tnetconsulting.net>
In-Reply-To: <20230307113949.501602135B@orac.inputplus.co.uk>

On 3/7/23 4:39 AM, Ralph Corderoy wrote:
> Readable to you, which is fine because you're the prime future 
> reader.  But it's less readable than the regexp to those that know 
> and read them because of the indirection introduced by the variables. 
> You've created your own little language of CAPITALS rather than the 
> lingua franca of regexps.  :-)

I want to agree, but then I run into things like this:

    ^\w{3} [ :[:digit:]]{11} [._[:alnum:]-]+ postfix(/smtps)?/smtpd\[[[:digit:]]+\]: disconnect from [._[:alnum:]-]+\[[.:[:xdigit:]]+\]( helo=[[:digit:]]+(/[[:digit:]]+)?)?( ehlo=[[:digit:]]+(/[[:digit:]]+)?)?( starttls=[[:digit:]]+(/[[:digit:]]+)?)?( auth=[[:digit:]]+(/[[:digit:]]+)?)?( mail=[[:digit:]]+(/[[:digit:]]+)?)?( rcpt=[[:digit:]]+(/[[:digit:]]+)?)?( data=[[:digit:]]+(/[[:digit:]]+)?)?( bdat=[[:digit:]]+(/[[:digit:]]+)?)?( rset=[[:digit:]]+(/[[:digit:]]+)?)?( noop=[[:digit:]]+(/[[:digit:]]+)?)?( quit=[[:digit:]]+(/[[:digit:]]+)?)?( unknown=[[:digit:]]+(/[[:digit:]]+)?)?( commands=[[:digit:]]+(/[[:digit:]]+)?)?$

Which is produced by this m4:

    define(`DAEMONPID', `$1\[DIGITS\]:')dnl
    define(`DATE', `\w{3} [ :[:digit:]]{11}')dnl
    define(`DIGIT', `[[:digit:]]')dnl
    define(`DIGITS', `DIGIT+')dnl
    define(`HOST', `[._[:alnum:]-]+')dnl
    define(`HOSTIP', `HOST\[IP\]')dnl
    define(`IP', `[.:[:xdigit:]]+')dnl
    define(`VERB', `( $1=DIGITS`'(/DIGITS)?)?')dnl
    ^DATE HOST DAEMONPID(`postfix(/smtps)?/smtpd') disconnect from HOSTIP`'VERB(`helo')VERB(`ehlo')VERB(`starttls')VERB(`auth')VERB(`mail')VERB(`rcpt')VERB(`data')VERB(`bdat')VERB(`rset')VERB(`noop')VERB(`quit')VERB(`unknown')VERB(`commands')$
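For comparison, the same constructor approach can be sketched in Python. This is an illustrative sketch, not a translation of the m4: Python's `re` module does not understand POSIX classes such as `[[:digit:]]`, so the pieces below substitute `[0-9]`, `[._0-9A-Za-z-]` and `[.:0-9A-Fa-f]`. The names `daemonpid`, `verb`, `VERBS`, `DISCONNECT` and the sample log line are made up for the example.

```python
import re

# Hypothetical Python analogue of the m4 defines. Python's re has no POSIX
# classes like [[:digit:]], so plain ranges are used instead.
DIGIT = r"[0-9]"
DIGITS = DIGIT + "+"
HOST = r"[._0-9A-Za-z-]+"
IP = r"[.:0-9A-Fa-f]+"
HOSTIP = HOST + r"\[" + IP + r"\]"
DATE = r"\w{3} [ :0-9]{11}"


def daemonpid(name: str) -> str:
    """Like DAEMONPID(`name'): the daemon name followed by [pid]:"""
    return name + r"\[" + DIGITS + r"\]:"


def verb(name: str) -> str:
    """Like VERB(`name'): an optional ' name=N' or ' name=N/M'."""
    return "( " + name + "=" + DIGITS + "(/" + DIGITS + ")?)?"


VERBS = ["helo", "ehlo", "starttls", "auth", "mail", "rcpt", "data",
         "bdat", "rset", "noop", "quit", "unknown", "commands"]

DISCONNECT = (
    "^" + DATE + " " + HOST + " "
    + daemonpid("postfix(/smtps)?/smtpd")
    + " disconnect from " + HOSTIP
    + "".join(verb(v) for v in VERBS)
    + "$"
)

# Made-up sample line in the shape the pattern expects.
line = ("Mar  7 11:31:55 mail postfix/smtpd[1234]: disconnect from "
        "mx.example.net[192.0.2.1] ehlo=1 mail=1 rcpt=1 data=1 quit=1 commands=5")
print(bool(re.search(DISCONNECT, line)))  # True
```

Printing `DISCONNECT` yields one long line much like the generated RE above, while the source that builds it stays about as readable as the m4.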

I consider myself only an /adequate/ m4 user, though I've done some 
things with it that arguably amount to creating new languages.

I personally find the generated regular expression onerous to read and 
understand, let alone modify.  I would be highly dependent on my 
editor's (vim's) parenthesis / square bracket matching (%) capability 
and / or would need to explode the RE into multiple components on 
multiple lines to have a hope of accurately understanding or modifying it.

Conversely, I think the m4 is /largely/ find and replace with a little 
syntactic sugar around the definitions.

I also think that anyone who understands regular expressions and the 
concept of find & replace is likely to be able to recognize the 
patterns: that "VERB(...)" corresponds to "( $1=DIGITS`'(/DIGITS)?)?", 
that "DIGITS" corresponds to "DIGIT+", and that "DIGIT" corresponds to 
"[[:digit:]]".
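To illustrate the find-and-replace view, here is a rough Python sketch of a rescanning substituter covering only the argument-free macros (DIGIT, DIGITS, HOST, IP, HOSTIP). The `MACROS` table and `expand` function are hypothetical; m4's quoting and argument handling (needed for VERB and DAEMONPID) are not modelled.

```python
import re

# Hypothetical table mirroring the argument-free m4 defines.
MACROS = {
    "DIGIT": "[[:digit:]]",
    "DIGITS": "DIGIT+",
    "HOST": "[._[:alnum:]-]+",
    "IP": "[.:[:xdigit:]]+",
    "HOSTIP": r"HOST\[IP\]",
}

# \b makes only whole names match, so DIGIT never matches inside DIGITS.
NAME = re.compile(r"\b(" + "|".join(MACROS) + r")\b")


def expand(text: str) -> str:
    """Substitute macro names repeatedly until the text stops changing."""
    while True:
        # A function replacement is inserted literally, so "\[" stays "\[".
        new = NAME.sub(lambda m: MACROS[m.group(1)], text)
        if new == text:
            return text
        text = new


print(expand("HOSTIP"))      # [._[:alnum:]-]+\[[.:[:xdigit:]]+\]
print(expand("(/DIGITS)?"))  # (/[[:digit:]]+)?
```

Each pass is plain textual substitution; rescanning the result is what lets HOSTIP expand through HOST and IP, much as m4 rescans its output.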

There seems to be a point, somewhere between simple REs without any 
supporting constructors and complex REs with them, past which I think it 
is better to have the constructors, especially when duplication comes 
into play.

If nothing else, the constructors are likely to reduce one-off typos.  
A typo will either be everywhere the constructor was used, or be fixed 
everywhere at the same time.  Conversely, finding an unmatched 
parenthesis or square bracket in the RE above will be annoying at best, 
and more likely daunting.
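A small helper can at least point at the first unbalanced delimiter. The sketch below is hypothetical and deliberately simple: a backslash escapes the next character, and `(`/`)` and `[`/`]` are matched with a stack. It does not model bracket-expression rules (for example a literal `]` right after `[`, or a literal `(` inside `[...]`), but POSIX classes such as `[[:digit:]]` balance naturally.

```python
def find_unbalanced(pattern: str):
    """Return the index of the first unbalanced (, ), [ or ] in pattern, or None.

    Simplified on purpose: a backslash escapes the next character, and
    bracket-expression rules (a literal "]" right after "[", a literal "("
    inside "[...]") are not modelled.
    """
    opener_for = {")": "(", "]": "["}
    stack = []  # (delimiter, index) of still-open delimiters
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            i += 2  # skip the escaped character, e.g. \[ or \]
            continue
        if c in "([":
            stack.append((c, i))
        elif c in ")]":
            if not stack or stack[-1][0] != opener_for[c]:
                return i  # closer with no matching opener
            stack.pop()
        i += 1
    return stack[-1][1] if stack else None  # last unclosed opener, if any


print(find_unbalanced(r"postfix(/smtps)?/smtpd\[[[:digit:]]+\]:"))  # None
print(find_unbalanced(r"( helo=[[:digit:]+(/[[:digit:]]+)?)?"))     # 34
```

In the second call the real mistake is a missing `]` near the start, but the checker can only report the later `)` that found an unclosed `[` where it expected its `(`; the symptom surfaces well after the cause.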

> Each time the original language was readable because practitioners 
> had to read and write it.  When its replacement came along, the old 
> skill was no longer learnt and the language became ‘unreadable’.

I feel like there is an analogy here between machine code and assembly 
language, as well as between assembly language and higher level languages.

My understanding is that the computer industry has largely agreed that 
higher level languages are easier to understand and maintain.

> ‘{1}’ is redundant.

That may very well be.  But which will be more maintainable / easier to 
correct in the future: adding `{2}` when necessary, or changing the `1` 
to a `2`?

I think this is an example of a tradeoff: including something that is 
not strictly required in order to make things more maintainable down 
the road.  Sort of like fleet vehicles vs non-fleet vehicles.

> BTW, ‘{0,1}’ is more readable to those who know regexps as ‘?’.

I think this is another example of the same maintainability tradeoff.
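The equivalences in question are easy to spot-check. A minimal sketch using Python's `re.fullmatch`, with a hypothetical `agree` helper and made-up sample strings; agreeing on a handful of samples illustrates the point but does not prove two patterns equivalent.

```python
import re


def agree(pattern_a: str, pattern_b: str, samples: list[str]) -> bool:
    """True if both patterns fully match exactly the same sample strings.

    Only the given samples are checked; this is not a proof of equivalence.
    """
    return all(
        (re.fullmatch(pattern_a, s) is None) == (re.fullmatch(pattern_b, s) is None)
        for s in samples
    )


SAMPLES = ["ac", "abc", "abbc", "abbbc"]

print(agree(r"ab{1}c", r"abc", SAMPLES))     # True: {1} adds nothing
print(agree(r"ab{0,1}c", r"ab?c", SAMPLES))  # True: {0,1} is the same as ?
# Going from one b to two: ab{1}c becomes ab{2}c by editing the count,
# while abc needs a {2} added (or a second b).
print(agree(r"ab{2}c", r"abbc", SAMPLES))    # True
```

Since the spellings accept the same strings, the choice between them is purely about readability and ease of later edits.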

> I'm sending this to just the list.

I'm also replying to only the COFF mailing list.

> Perhaps your account on the list is configured to not send you an 
> email if it sees your address in the header's fields.

There is a reasonable chance that the COFF mailing list and / or your 
account therein is configured to minimize duplicates, meaning the COFF 
mailing list won't send you a copy if it sees that your subscribed 
address is receiving a copy directly.

I personally always prefer the mailing list copy and shun the direct 
copies.  I think that the copy from the mailing list keeps the 
discussion on the mailing list and avoids accidental replies bypassing 
the mailing list.



-- 
Grant. . . .
unix || die




Thread overview: 46+ messages
2023-03-02 18:54 [COFF] " Grant Taylor via COFF
2023-03-02 19:23 ` [COFF] " Clem Cole
2023-03-02 19:38   ` Grant Taylor via COFF
2023-03-02 23:01   ` Stuff Received
2023-03-02 23:46     ` Steffen Nurpmeso
2023-03-03  1:08     ` Grant Taylor via COFF
2023-03-03  2:10       ` Dave Horsfall
2023-03-03  3:34         ` Grant Taylor via COFF
2023-03-02 21:53 ` Dan Cross
2023-03-03  1:05   ` Grant Taylor via COFF
2023-03-03  3:04     ` Dan Cross
2023-03-03  3:53       ` Grant Taylor via COFF
2023-03-03 13:47         ` Dan Cross
2023-03-03 19:26           ` Grant Taylor via COFF
2023-03-03 10:59 ` Ralph Corderoy
2023-03-03 13:11   ` Dan Cross
2023-03-03 13:42     ` Ralph Corderoy
2023-03-03 19:19       ` Grant Taylor via COFF
2023-03-04 10:15         ` [COFF] Reading PDFs on a mobile. (Was: Requesting thoughts on extended regular expressions in grep.) Ralph Corderoy
2023-03-07 21:49           ` [COFF] " Tomasz Rola
2023-03-07 22:46             ` Tomasz Rola
2023-06-20 16:02           ` Michael Parson
2023-06-20 21:26             ` Tomasz Rola
2023-06-22 15:45               ` Michael Parson
2023-07-10  9:08                 ` [COFF] Re: Reader, paper, tablet, phone (was: Re: Reading PDFs on a mobile. (Was: Requesting thoughts on extended regular expressions in grep.)) Tomasz Rola
2023-03-03 16:12   ` [COFF] Re: Requesting thoughts on extended regular expressions in grep Dave Horsfall
2023-03-03 17:13     ` Dan Cross
2023-03-03 17:38       ` Ralph Corderoy
2023-03-03 19:09         ` Dan Cross
2023-03-03 19:36     ` Grant Taylor via COFF
2023-03-04 10:26       ` Ralph Corderoy
2023-03-03 19:06 ` Grant Taylor via COFF
2023-03-03 19:31   ` Dan Cross
2023-03-04 10:07   ` Ralph Corderoy
2023-03-06 10:01 ` Ed Bradford
2023-03-06 21:01   ` Dan Cross
2023-03-06 21:49     ` Steffen Nurpmeso
2023-03-07  1:43     ` Larry McVoy
2023-03-07  4:01       ` Ed Bradford
2023-03-07 11:39         ` [COFF] " Ralph Corderoy
2023-03-07 18:31           ` Grant Taylor via COFF [this message]
2023-03-08 11:22           ` [COFF] " Ed Bradford
2023-03-07 16:14         ` Dan Cross
2023-03-07 17:34           ` [COFF] " Ralph Corderoy
2023-03-07 18:33             ` [COFF] " Dan Cross
2023-03-07  4:19     ` Ed Bradford
