From: Ed Bradford <egbegb2@gmail.com>
Date: Wed, 8 Mar 2023 05:22:56 -0600
To: Ralph Corderoy
CC: coff@tuhs.org
Subject: [COFF] Re: Requesting thoughts on extended regular expressions in grep.

Thank you for the very useful comments. However,
I disagree with you about the RE language. While
I agree RE experts don't need that, when I was
hiring and gave some software to a new hire (whether
an experienced programmer or a recent college grad),
simply handing over huge RE's was a daunting task
for that person. I wrote that stuff that way to help
remind me and anyone who might use the Python program.

I don't claim success. It does help me.

When you say '{1}' is redundant, I think I did that
to avoid any possibility of conflicts with the
next string that is concatenated to the Y_
(e.g. '*' or '+' or '{4,7}'). I am embarrassed
I did not communicate that in the code. I had to
think about it for a couple of hours before I
recalled the "why". I will fix that.

  (It would be difficult to discuss
   this RE if I had to write
       "(19\d\d|20[01]\d|202" + "[0-" + lastYearRE + "]" + ")"
   rather than just Y_.)
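
The '{1}' question can be checked directly. The following is only a minimal sketch: last_year_digit is a hypothetical stand-in for lastYearRE, and the helper name full is made up for illustration. It shows that the parentheses already make the year alternation a single unit, so a quantifier concatenated afterwards applies to the whole group with or without '{1}'.

```python
import re

# Hypothetical stand-in for lastYearRE: last digit of the final year accepted.
last_year_digit = "3"

# The parentheses already make the alternation a single unit.
year = r"(19\d\d|20[01]\d|202[0-" + last_year_digit + r"])"


def full(pattern, text):
    """Return True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None


# With or without '{1}', the same strings match.
same = all(
    full(year, s) == full(year + "{1}", s)
    for s in ["1999", "2023", "2024", "1899", "19990"]
)

# A quantifier appended later applies to the whole group.
two_years = full(year + "{2}", "19992023")
optional = full(year + "?", "")
```

If the pieces were concatenated without the enclosing parentheses, a trailing quantifier would bind only to the last element, which is the situation the group guards against.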

My initial thoughts on naming were that I wanted
the definition to be defined in exactly one place
in the software. Python and the BTL folks told me
to never use a constant in code. Always name it.
Hence, I gave it a name. Each name might
be used in multiple places. They might be imported.

You are correct, the expression is unbalanced. I tried
to remove the text2bytes(lastYearRE) call so the
expression in this email was all text. I failed to
remove the trailing ) when I removed the call to
text2bytes(). My hasty transcriptions might have
produced similar errors in my email.

Recall, my focus was on any file of any size.
I'm on Windows 10 and an M1 MacBook.
Python works on both. I don't have
a Linux machine or enough desktop space to
host one. I'm also mildly fed up with
virtual machines.

Friedl taught me one thing. Most
RE implementations are different. I'm trying
to write a program that I could give
to anyone and that could reliably find a date (an RE) in
any file. YYYY, MM, DD, HR, MI, SE, TH are words
my user could use in the command line or in
an options dialog. LAT and LON might also be
possibilities. CST, EST, MST, PST, ... also.
A 500 gigabyte archive or directory/folder
of pictures and movies would be
a great test target.

I very much appreciate your comments. If this
discussion is boring to others, I would be happy
to take it to private email.

I like your program. My experience
with RE, grep, Python, and sed suggests that
anything but GNU grep and sed might not work due to the
different implementations.
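
Since Python runs on both Windows and the Mac, one way to sidestep the grep/sed differences might be to carry the same pattern into Python's re module. What follows is only a sketch under assumptions: the pattern is transcribed from the grep example quoted below, \b stands in for \< and \>, the year ceiling of 2023 is hard-coded, and DATE_RE is a name made up for illustration.

```python
import re

# A bytes pattern, so it can be applied to binary data as well as text.
# re.VERBOSE ignores the whitespace and allows the trailing comments.
DATE_RE = re.compile(
    rb"""
    \b
    (19[0-9][0-9] | 20[01][0-9] | 202[0-3])   # year, 1900 through 2023
    (?:
        ([-:._])                              # separator, kept as group 2
        (0[1-9] | 1[0-2])                     # month
        \2                                    # the same separator again
        (0[1-9] | [12][0-9] | 3[01])          # day
    )?
    \b
    """,
    re.VERBOSE,
)

# The same sample data as the printf in the quoted grep example.
sample = b"2001-04-01,1999_12_31\n1944.03.01,1914!\n2000-01.01\n"
found = [(m.start(), m.group(0)) for m in DATE_RE.finditer(sample)]
```

On that sample the matches start at byte offsets 0, 11, 22, 33 and 39, the same offsets grep -b reports, although here the neighbouring bytes are not included in the match. Note that '_' counts as a word character for \b, so a year followed directly by '_' is not at a word boundary.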

I've been out of the Unix software business
for 30 years after starting work at BTL in the 1970s
and working on Version 6. I didn't know "printf" was now
built into bash! That was a surprise. It's an incremental
improvement, but doesn't compare with f-strings in Python.
The interactive interpreter for Python should have
a "bash" mode?!

Does grep use a memory-mapped file for its search, thereby
avoiding all buffering boundaries? That, too, would
be new information to me. The additional complexity
of dealing with buffering is more than annoying.
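
Whatever grep does internally (this sketch makes no claim about that), Python's standard mmap module is one way to avoid handling buffer boundaries by hand, because re accepts a memory-mapped file as the thing to search. The function name find_dates and the simplified pattern YMD are assumptions made for illustration only.

```python
import mmap
import re

# Simplified, illustrative pattern: YYYY-MM-DD with a fixed '-' separator.
YMD = re.compile(rb"(19\d\d|20[0-2]\d)-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")


def find_dates(path):
    """Return (byte offset, matched bytes) for each date found in the file."""
    with open(path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            return []  # mmap refuses to map an empty file
        with mm:
            # The regex scans the mapping as one contiguous buffer, so a
            # match cannot be split across a read-buffer boundary.
            return [(m.start(), m.group(0)) for m in YMD.finditer(mm)]
```

The operating system pages the file in as the scan proceeds, so the file size is limited by address space rather than by RAM; on a 64-bit Python that should cover very large files.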

Do you have any thoughts on how to verify
a program that uses RE's? I've given it no thought
until now. My first thought for dates would be
to write a separate module that simply searched
through the file looking for 4 digits in a row
without using RE's, recording the offsets and 16 characters
after and 1 character before in a Python list of (offset, str)
tuples, ddddList, and using ddddList
as a proxy for the entire file. I could then
aim my RE's at ddddList. [A list of tuples in Python
is wonderful!]

It seems to me '*' and '+' and {x,y} are the performance
hogs in RE's. My RE's avoid them. One pass, I think, should
suffice. What do you think? I haven't "archived" my 350 GB
of pictures and movies, but one pass over all files therein
ought to suffice, right? Two different programs that use different
algorithms should be pretty good proof of correctness, wouldn't
you think?
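
The ddddList idea might look something like the sketch below. It is an illustration under assumptions, not the prototype's code: the names dddd_list, years_via_proxy and years_via_regex are hypothetical, the year pattern is simplified, and every offset where four digits begin is recorded, including overlapping ones inside longer digit runs.

```python
import re

DIGITS = frozenset(b"0123456789")


def dddd_list(data, before=1, after=16):
    """Scan bytes without regexes; return (offset, context) for each 4-digit run."""
    hits = []
    run = 0  # length of the current run of consecutive digits
    for i, byte in enumerate(data):
        run = run + 1 if byte in DIGITS else 0
        if run >= 4:
            start = i - 3  # offset of the first of the four digits
            context = data[max(0, start - before): start + 4 + after]
            hits.append((start, context))
    return hits


# Simplified, illustrative year pattern.
YEAR = re.compile(rb"19\d\d|20[0-2]\d")


def years_via_proxy(data):
    """Apply the RE only to the contexts collected by the non-RE scanner."""
    offsets = []
    for start, context in dddd_list(data):
        lead = min(start, 1)  # how many 'before' bytes the context holds
        if YEAR.match(context, lead):
            offsets.append(start)
    return offsets


def years_via_regex(data):
    """Apply the RE to the whole buffer; the lookahead finds overlapping hits."""
    return [m.start() for m in re.finditer(rb"(?=19\d\d|20[0-2]\d)", data)]
```

If the two functions ever disagree on the same input, one of them has a bug. Agreement is evidence rather than proof, since both share the same year pattern.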

My RE's have no stars or pluses. If there is a mismatch before
a match, give up and move on.

On my Windows 10 machine, I have Cygwin.
Microsoft says my CPU doesn't have a TPM and
the specific Intel Core i7 in my system is not
supported, so Windows 11 is not happening.
Microsoft is DOS personified.
 (An unkind editorial remark about the low
  quality of software coming from Microsoft.)

Anyway, I thank you again for your patience with me
and your observations. I value your views and the
other views I've seen here on coff@tuhs.org.

I welcome all input to my education and will share
all I have done so far with anyone who wants to
collaborate, test, or is just curious.

    GOAL: run a Python program from an at-cost thumb drive that:
          reaps all media files from a user-specified
          directory/folder tree and

          adds files to the thumb drive.

          "Adds files" means
            Original file system is untouched

            Adds only unique files (hash codes are unique)

            Creates on the thumb drive a relative directory
              wherein the original file was found

            Prepends a "YYYY-MM-DD-" string to the filename
              if one can be found (EXIF is a great shortcut).

            Copies
                      srcroot/relative_path/oldfilename
              to
                    thumbdrive/relative_path/YYYY-MM-DD-oldfilename
                     or
                    thumbdrive/relative_path/0000-oldfilename.

          Can also incrementally add new files by just
            scanning anywhere in any other computer
            file system or any other computer.
          Must work on Mac, Windows, and Linux
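
The "adds files" rules above can be sketched roughly as follows. This is not the working prototype, only an illustration under assumptions: the names file_hash, add_files and find_date are hypothetical, SHA-256 is an assumed choice (the goal only says "hash codes"), the media-file filter and EXIF lookup are left out, and the date is supplied by a caller-provided function.

```python
import hashlib
import shutil
from pathlib import Path


def file_hash(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large movies need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def add_files(srcroot, thumbdrive, find_date=lambda path: None):
    """Copy files not yet on thumbdrive; return the list of new destinations."""
    srcroot, thumbdrive = Path(srcroot), Path(thumbdrive)
    # Hash what is already on the drive so repeated runs stay incremental.
    seen = {file_hash(p) for p in thumbdrive.rglob("*") if p.is_file()}
    copied = []
    for src in sorted(srcroot.rglob("*")):
        if not src.is_file():
            continue
        digest = file_hash(src)
        if digest in seen:
            continue  # same content is already on the drive
        prefix = find_date(src) or "0000"  # "YYYY-MM-DD" if known
        rel_dir = src.relative_to(srcroot).parent
        dest = thumbdrive / rel_dir / f"{prefix}-{src.name}"
        if dest.exists():
            continue  # name clash with different content: left unresolved here
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # the source is only read, never modified
        seen.add(digest)
        copied.append(dest)
    return copied
```

Re-hashing the whole drive on every run is slow for 400 GB; storing the hashes in an index file on the drive would be one way around that, at the cost of keeping the index in step with the files.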

What I have is a working prototype. It works
on Mac and Windows. It doesn't do the
date thing very well, and there are other shortcomings.

I have delivered exactly one Christmas present to my favorite person
in the world - a 400 GB SSD drive with all the pictures and media
we have ever taken. The next things are to add more media,
re-unique-ify (check) what is already present on the SSD drive,
and improve the proper choice of "YYYY-MM-DD-" prefix for
filenames.

I am retired and this is fun.
I'm too old to want to get rich.

Ed Bradford
Pflugerville, TX
egbegb2@gmail.com

On Tue, Mar 7, 2023 at 5:40 AM Ralph Corderoy <ralph@inputplus.co.uk> wrote:

> Hi Ed,
>
> > I have made an attempt to make my RE stuff readable and supportable.
>
> Readable to you, which is fine because you're the prime future reader.
> But it's less readable than the regexp to those that know and read them
> because of the indirection introduced by the variables.  You've created
> your own little language of CAPITALS rather than the lingua franca of
> regexps.  :-)
>
> > Machine language was unreadable and then along came assembly language.
> > Assembly language was unreadable, then came higher level languages.
>
> Each time the original language was readable because practitioners had
> to read and write it.  When its replacement came along, the old skill
> was no longer learnt and the language became ‘unreadable’.
>
> > So far, I can do that for this RE program that works for small files,
> > large files, binary files and text files for exactly one pattern:
> >     YYYY[-MM-DD]
> > I constructed this RE with code like this:
> >     # ymdt is YYYY-MM-DD RE in text.
> >     # looking only for 1900s and 2000s years and no later than today.
> >     _YYYY = "(19\d\d|20[01]\d|202" + "[0-" + lastYearRE) + "]" + "){1}"
>
> ‘{1}’ is redundant.
>
> >     # months
> >     _MM   = "(0[1-9]|1[012])"
> >     # days
> >     _DD   = "(0[1-9]|[12]\d|3[01])"
> >     ymdt = _YYYY + '[' + _INTERNALSEP +
> >                          _MM          +
> >                          _INTERNALSEP +
> >                    ']'{0,1)
>
> I think we're missing something as the ‘'['’ is starting a character
> class which is odd for wrapping the month and the ‘{0,1)’ doesn't have
> matching brackets and is outside the string.
>
> BTW, ‘{0,1}’ is more readable to those who know regexps as ‘?’.
>
> > For the whole file, RE I used
> >     ymdthf = _FRSEP + ymdt + _BASEP
> > where FRSEP is front separator which includes
> > a bunch of possible separators, excluding numbers and letters, or-ed
> > with the up arrow "beginning of line" RE mark.
>
> It sounds like you're wanting a word boundary; something provided by
> regexps.  In Python, it's ‘\b’.
>
>     >>> re.search(r'\bfoo\b', 'endfoo foostart foo ends'),
>     (<re.Match object; span=(16, 19), match='foo'>,)
>
> Are you aware of the /x modifier to a regexp which ignores internal
> whitespace, including linefeeds?  This allows a large regexp to be split
> over lines.  There's a comment syntax too.  See
> https://docs.python.org/3/library/re.html#re.X
>
> GNU grep isn't too shabby at looking through binary files.  I can't use
> /x with grep so in a bash script, I'd do it manually.  \< and \> match
> the start and end of a word, a bit like Python's \b.
>
>     re='
>         .?\<
>             (19[0-9][0-9]|20[01][0-9]|202[0-3])
>             (
>                 ([-:._])
>                 (0[1-9]|1[0-2])
>                 \3
>                 (0[1-9]|[12][0-9]|3[01])
>             )?
>         \>.?
>     '
>     re=${re//$'\n'/}
>     re=${re// /}
>
>     printf '%s\n' 2001-04-01,1999_12_31 1944.03.01,1914! 2000-01.01 >big-binary-file
>     LC_ALL=C grep -Eboa "$re" big-binary-file | sed -n l
>
> which gives
>
>     0:2001-04-01,$
>     11:1999_12_31$
>     22:1944.03.01,$
>     33:1914!$
>     39:2000-$
>
> showing:
>
> - the byte offset within the file of each match,
> - along with the any before and after byte if it's not a \n and not
>   already matched, just to show the word-boundary at work,
> - with any non-printables escaped into octal by sed.
>
> > I thought I was on the COFF mailing list.
>
> I'm sending this to just the list.
>
> > I received this email by direct mail to from Larry.
>
> Perhaps your account on the list is configured to not send you an email
> if it sees your address in the header's fields.
>
> --
> Cheers, Ralph.


--
Advice is judged by results, not by intentions.
  Cicero
