From: Ed Bradford
Date: Mon, 6 Mar 2023 22:01:14 -0600
To: Larry McVoy
CC: Grant Taylor, COFF
Subject: [COFF] Re: Requesting thoughts on extended regular expressions in grep.
In-Reply-To: <20230307014311.GN5398@mcvoy.com>

I have made an attempt to make my RE stuff readable and supportable.
I think I write more description than I do RE "code". As for
*it won't be comprehensible*: machine language was unreadable, and
then along came assembly language. Assembly language was unreadable,
and then came higher level languages. Even higher level languages are
unsupportable if not well documented and mostly simple to understand
("you are not expected to understand this" notwithstanding). The jump
from machine language to Python today was unimagined in early times.

    [
     As an old timer, I see inflection points between:

        machine language and assembly language,
        assembly language and high level languages,
        and
        high level languages and Python.

     But that's just me.
    ]



I think it is possible to make a 50K RE that is understandable.
However, it requires a lot of 'splainin' throughout the code. I'm
naive though; I will eventually discover a lack of truth in that
belief, if such exists.

I repeat. I put stuff down for months at a time. My metric is
*coming back to it and understanding where I left off*. So far, I can
do that for this RE program that works for small files, large files,
binary files and text files for exactly one pattern:

    YYYY[-MM-DD]

I constructed this RE with code like this:

# ymdt is the YYYY[-MM-DD] RE in text.
# lastYearRE and _INTERNALSEP are defined elsewhere in the program.

# looking only for 1900s and 2000s years and no later than today.
_YYYY = r"(19\d\d|20[01]\d|202" + "[0-" + str(lastYearRE) + "]" + "){1}"

# months
_MM   = r"(0[1-9]|1[012])"

# days
_DD   = r"(0[1-9]|[12]\d|3[01])"

# the year, optionally followed by separator, month, separator, day
ymdt = (_YYYY + "(" + _INTERNALSEP +
                      _MM          +
                      _INTERNALSEP +
                      _DD          +
                ")" + "{0,1}")
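
A minimal runnable sketch of that construction, assuming Python's re
module. The helper name build_ymdt and its parameters are hypothetical:
last_year_digit stands in for lastYearRE (taken here to be the last
digit of the current year) and internal_sep stands in for _INTERNALSEP
(taken here to be a single hyphen). The real program may define both
differently.

```python
import re


def build_ymdt(last_year_digit, internal_sep="-"):
    """Build the YYYY[-MM-DD] pattern as a string.

    last_year_digit: assumed to be the last digit of the current year,
        so the pattern stops at 202<digit> (only meaningful in the 2020s).
    internal_sep: assumed literal separator between the fields.
    """
    yyyy = r"(19\d\d|20[01]\d|202[0-" + str(last_year_digit) + "])"
    mm = r"(0[1-9]|1[012])"
    dd = r"(0[1-9]|[12]\d|3[01])"
    sep = re.escape(internal_sep)  # treat the separator as literal text
    # The year, optionally followed by separator, month, separator, day.
    return yyyy + "(" + sep + mm + sep + dd + "){0,1}"


def is_ymdt(text, last_year_digit=3):
    """True if the whole of text is a YYYY or YYYY-MM-DD token."""
    return re.fullmatch(build_ymdt(last_year_digit), text) is not None
```

With last_year_digit=3, is_ymdt accepts "1970" and "2023-03-06" and
rejects "2024", "2023-13-01" and the partial form "2023-03".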

For the whole file, the RE I used was

ymdthf = _FRSEP + ymdt + _BASEP

where _FRSEP is the front separator, which includes a bunch of
possible separators (excluding numbers and letters), or-ed with the
up arrow "beginning of line" RE mark ("^"). _BASEP is the back
separator; it is the same as _FRSEP with "^" replaced with "$".

I then aimed ymdthf at "data", the thing that represents the entire
memory-mapped file (where there is only one beginning and one end).
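
The separators and the whole-file search described above could look
roughly like this in Python. Several things here are assumptions: the
separator set (any byte that is not an ASCII digit or letter), the
fixed year range in the date core, and the helper name find_dates. The
patterns are bytes because a memory-mapped file is searched as bytes.
This sketch also makes the back separator a lookahead, so that one
separator between two dates can serve both; that is a choice of the
sketch, not something stated above.

```python
import mmap
import re

# Stand-in date core: YYYY optionally followed by -MM-DD (years 1900-2029).
ymdt = rb"(?:19\d\d|20[012]\d)(?:-(?:0[1-9]|1[012])-(?:0[1-9]|[12]\d|3[01]))?"

# Assumption: a separator is any byte that is not an ASCII digit or letter.
_SEP = rb"[^0-9A-Za-z]"
_FRSEP = rb"(?:^|" + _SEP + rb")"  # front: start of data, or a separator
_BASEP = rb"(?=$|" + _SEP + rb")"  # back: end of data, or a separator (not consumed)

ymdthf = re.compile(_FRSEP + rb"(" + ymdt + rb")" + _BASEP)


def find_dates(path):
    """Return every date-like token in the file at path, in order."""
    with open(path, "rb") as f:
        # Map the whole file read-only: one beginning and one end.
        # Note: mmap raises ValueError for an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
            return [m.group(1).decode("ascii") for m in ymdthf.finditer(data)]
```

Because the pattern is not compiled with re.MULTILINE, "^" and "$"
refer to the ends of the mapped data rather than to each line; newlines
count as ordinary separators.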

Again, I say validating an RE is as difficult as writing one, or more
so. What does it miss?
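
One way to probe that question is to compare the RE against an
independent check. The sketch below assumes Python's datetime module as
the reference; DATE_RE, is_real_date and false_positives are
hypothetical names, and the year range is fixed for the sake of the
example. It lists strings that a shape-only date pattern accepts but
the calendar rejects.

```python
import datetime
import re

# Shape-only pattern in the spirit of the RE above (full YYYY-MM-DD form).
DATE_RE = re.compile(r"(19\d\d|20[012]\d)-(0[1-9]|1[012])-(0[1-9]|[12]\d|3[01])")


def is_real_date(text):
    """Independent reference check: does the calendar accept this date?"""
    try:
        datetime.date.fromisoformat(text)
        return True
    except ValueError:
        return False


def false_positives(candidates):
    """Candidates the RE accepts in full but the calendar rejects."""
    return [c for c in candidates
            if DATE_RE.fullmatch(c) and not is_real_date(c)]
```

Given "2023-02-30", "2023-04-31" and "2023-02-29", all three come back
as false positives: the day pattern checks only the shape of the
number, not the length of the month or leap years.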

Dates are an excellent test ground for REs. Latitude and longitude
are another.

Ed

PS: I thought I was on the COFF mailing list. I received this email
by direct mail from Larry. I haven't seen any other comments on my
submission. I might have unsubscribed, but now I regret it. Dear
powers that be: Please resubscribe me.
