From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <8d1de5c8-1f34-3d37-395d-0f1da7b062ec@spamtrap.tnetconsulting.net>
 <CAHTagfH97hXvW4=pMPYfQuJsVCtGtUvfTgDjUhw2kRe6FOUqTg@mail.gmail.com>
 <CAEoi9W6tZ+55MSPxPoZqfS3k9RO9MOQqB0yu=MO_vzzw0K6Lhw@mail.gmail.com>
 <20230307014311.GN5398@mcvoy.com> <CAHTagfFqfP3eVSgQOgV29O=JJkGdhjiv40pw-LNsvNvORC1XTA@mail.gmail.com>
 <20230307113949.501602135B@orac.inputplus.co.uk>
In-Reply-To: <20230307113949.501602135B@orac.inputplus.co.uk>
From: Ed Bradford <egbegb2@gmail.com>
Date: Wed, 8 Mar 2023 05:22:56 -0600
Message-ID: <CAHTagfGYNi-TvkPMsXBf36a3g-b7D7qtk-xn9k6kiwu0YM7DcA@mail.gmail.com>
To: Ralph Corderoy <ralph@inputplus.co.uk>
Content-Type: multipart/alternative; boundary="0000000000003acf5505f661c282"
CC: coff@tuhs.org
X-Mailman-Version: 3.3.6b1
Precedence: list
Subject: [COFF] Re: Requesting thoughts on extended regular expressions in grep.
List-Id: Computer Old Farts Forum <coff.tuhs.org>
Archived-At: <https://www.tuhs.org/mailman3/hyperkitty/list/coff@tuhs.org/message/J3JSUHZXTZ2SZKBFCSGSUZUHYNL6BCC6/>
List-Archive: <https://www.tuhs.org/mailman3/hyperkitty/list/coff@tuhs.org/>
List-Help: <mailto:coff-request@tuhs.org?subject=help>
List-Owner: <mailto:coff-owner@tuhs.org>
List-Post: <mailto:coff@tuhs.org>
List-Subscribe: <mailto:coff-join@tuhs.org>
List-Unsubscribe: <mailto:coff-leave@tuhs.org>

--0000000000003acf5505f661c282
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Thank you for the very useful comments. However,
I disagree with you about the RE language. While
I agree RE experts don't need it, when I was
hiring and gave some software to a new hire (whether
an experienced programmer or a recent college grad),
simply being handed huge RE's was
a daunting task for that person. I wrote that stuff
that way to help remind me and anyone who might
use the python program.

I don't claim success. It does help me.

When you say '{1}' is redundant, I think I did that
to avoid any possibility of conflicts with the
next string that is
concatenated to the *Y_* (e.g. '*' or '+' or '{4,7}').
I am embarrassed I did not communicate that
in the code. I had to think about it for a couple of hours
before I recalled the "why". I will fix that.

  (it would be difficult to discuss
   this RE if I had to write
       "(19\d\d|20[01]\d|202" + "[0-" + lastYearRE + "]" + ")"
   rather than just *Y_*).
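
A small sketch of the '{1}' question, assuming python's re module.
The names last_year_digit and matches are stand-ins made up for this
email, not my actual code. Because Y_ is wrapped in parentheses, a
quantifier concatenated after it applies to the whole group, so the
matches come out the same with or without '{1}':

```python
import re

# Hypothetical stand-ins for the names in my program.
last_year_digit = "3"  # assumed: last digit of the current year, 2023
Y_ = r"(19\d\d|20[01]\d|202[0-" + last_year_digit + "])"

def matches(pattern, text):
    """Return every non-overlapping match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]

sample = "1999 2023 2031 19992001"

# Same matches with and without the trailing '{1}'.
plain = matches(Y_, sample)
with_one = matches(Y_ + "{1}", sample)

# The parentheses make a later quantifier apply to the whole group.
two_years = matches(Y_ + "{2}", sample)
print(plain, with_one, two_years)
```

Here 2031 is rejected by both forms, and only the run 19992001
satisfies the '{2}' version.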

My initial thought
on naming was that I wanted each RE to be defined
in exactly one place in the software.
Python and the BTL folks told me to never
use a bare constant in code. Always name it.
Hence, I gave each one a name. Each name might
be used in multiple places. They might be imported.

You are correct, the expression is unbalanced. I tried
to remove the text2bytes(lastYearRE) call so the expression in
this email was all text. I failed to remove the trailing ')' when
I removed the call to text2bytes(). My hasty transcriptions
might have produced similar errors elsewhere in my email.

Recall, my focus was on any file of any size.
I'm on Windows 10 and an M1 MacBook.
Python works on both. I don't have
a Linux machine or enough desktop space to
host one. I'm also mildly fed-up with
virtual machines.

Friedl taught me one thing. Most
RE implementations are different. I'm trying
to write a program that I could give
to anyone and could reliably find a date (an RE) in
any file. YYYY, MM, DD, HR, MI, SE, TH are words
my user could use in the command line or in
an options dialog. LAT and LON might also be
possibilities. CST, EST, MST, PST, ... also.
A 500 gigabyte archive or directory/folder
of pictures and movies would be
a great test target.
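
To make the idea of user-level words concrete, here is a rough
sketch of how a pattern typed as YYYY-MM-DD could be turned into an
RE. The table WORDS and the function words_to_re are hypothetical
names, the RE fragments are only illustrative, and just a few of
the words are covered:

```python
import re

# Hypothetical word-to-RE table; the fragments are illustrative.
WORDS = {
    "YYYY": r"(?:19\d\d|20[01]\d|202[0-3])",
    "MM": r"(?:0[1-9]|1[012])",
    "DD": r"(?:0[1-9]|[12]\d|3[01])",
    "HR": r"(?:[01]\d|2[0-3])",
    "MI": r"(?:[0-5]\d)",
    "SE": r"(?:[0-5]\d)",
}

# Longest words first so YYYY is tried before any shorter word.
_WORD_RE = re.compile("|".join(sorted(WORDS, key=len, reverse=True)))

def words_to_re(user_pattern):
    """Translate e.g. 'YYYY-MM-DD' into a compiled bytes RE."""
    parts = []
    pos = 0
    for m in _WORD_RE.finditer(user_pattern):
        # Text between words is literal and must be escaped.
        parts.append(re.escape(user_pattern[pos:m.start()]))
        parts.append(WORDS[m.group(0)])
        pos = m.end()
    parts.append(re.escape(user_pattern[pos:]))
    return re.compile("".join(parts).encode("ascii"))

date_re = words_to_re("YYYY-MM-DD")
found = date_re.findall(b"\x00IMG 2021-07-04\xff 2021-13-01")
```

The second candidate is skipped because 13 is not a valid month.
Compiling to a bytes RE lets the same pattern run over binary files.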

I very much appreciate your comments. If this
discussion is boring to others, I would be happy
to take it to private email.

I like your program. My experience
with RE, grep, python, and sed suggests that
anything but gnu grep and sed might not work due to the
different implementations.

I've been out of the Unix software business
for 30 years after starting work at BTL in the 1970s
and working on Version 6. I didn't know "printf" was now
built into bash! That was a surprise. It's an incremental
improvement, but doesn't compare with f-strings in python.
*The interactive interpreter for python should have*
*a "bash" mode?!*

Does grep use a memory mapped file for its search, thereby
avoiding all buffering boundaries? That, too, would
be new information to me. The additional complexity
of dealing with buffering is more than annoying.
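
I don't know what grep does internally, but in python a
memory-mapped file can be handed straight to the re module, so
there are no read-buffer boundaries for a match to straddle.
A minimal sketch, assuming a non-empty file and a bytes RE;
find_offsets is a name I made up for this email:

```python
import mmap
import os
import re
import tempfile

def find_offsets(path, pattern):
    """Return (offset, match) pairs for a bytes RE over a whole file."""
    hits = []
    with open(path, "rb") as f:
        # Length 0 maps the entire file; an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for m in re.finditer(pattern, mm):
                hits.append((m.start(), m.group(0)))
    return hits

# Tiny demonstration on a throwaway binary file.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"\x00\x01 2001-04-01 \xff 1999")
    hits = find_offsets(path, rb"(?:19|20)\d\d")
```

The operating system pages the file in as needed, so the same call
should work on a file far larger than memory on a 64-bit python.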

Do you have any thoughts on how to verify
a program that uses RE's? I've given it no thought
until now. My first thought for dates would be
to write a separate module that simply searched
through the file looking for 4 digits in a row
without using RE's, recording the offset plus 1 character
before and 16 characters after in a python list of
(offset,str) tuples, *ddddList*, and using *ddddList*
as a proxy for the entire file. I could then
aim my RE's at *ddddList*. *[A list of tuples in python*
*is wonderful!!]* It seems to me '*' and '+' and {x,y} are the performance
hogs in RE's. My RE's avoid them. One pass, I think, should
suffice. What do you think? I haven't "archived" my 350 GB
of pictures and movies, but one pass over all files therein
ought to suffice, right? Two different programs that use different
algorithms should be pretty good proof of correctness, wouldn't
you think?
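
Here is roughly what I have in mind for that second, RE-free
program. It is only a sketch: dddd_list, cross_check and YEAR_RE are
illustrative names, the data is assumed to be a bytes object already
in memory, and the year RE is a simplified stand-in for mine.

```python
import re

def dddd_list(data: bytes, before: int = 1, after: int = 16):
    """Scan bytes without RE's for 4 ASCII digits in a row.

    Returns a list of (offset, snippet) tuples, where snippet holds up
    to `before` bytes ahead of the digits, the digits, and up to
    `after` bytes following them. A longer run of digits yields one
    tuple for every 4-digit window inside it.
    """
    hits = []
    run = 0  # length of the current run of digits
    for i, byte in enumerate(data):
        if 0x30 <= byte <= 0x39:  # ASCII '0'..'9'
            run += 1
            if run >= 4:
                start = i - 3
                lo = max(0, start - before)
                hits.append((start, data[lo:i + 1 + after]))
        else:
            run = 0
    return hits

# Simplified year RE, assumed for illustration only.
YEAR_RE = re.compile(rb"(?<!\d)(?:19\d\d|20[01]\d|202[0-3])(?!\d)")

def cross_check(data: bytes):
    """Return RE match offsets that the plain scan did NOT find."""
    proxy_offsets = {off for off, _ in dddd_list(data)}
    re_offsets = {m.start() for m in YEAR_RE.finditer(data)}
    return re_offsets - proxy_offsets  # empty set means agreement

data = b"x1999-12-31 and 123 then 20230308"
offsets = [off for off, _ in dddd_list(data)]
disagreements = cross_check(data)
```

An empty result from cross_check only says the RE never matched
somewhere the plain scan saw no digits. It is a sanity check, not
a proof that the RE finds every date.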

My RE's have no stars or pluses. If there is a mismatch before
a match, give up and move on.

On my Windows 10 machine, I have cygwin.
Microsoft says my CPU doesn't have a TPM and
the specific Intel Core i7 on my system is not
supported so Windows 11 is not happening.
Microsoft is DOS personified.
 (An unkind editorial remark about the low
  quality of software coming from Microsoft.)

Anyway, I thank you again for your patience with me
and your observations. I value your views and the
other views I've seen here on coff@tuhs.org.

I welcome all input to my education and will share
all I have done so far with anyone who wants to
collaborate, test, or is just curious.

    GOAL: run python program from an at-cost thumb drive that:
          reaps all media files from a user specified
          directory/folder tree and

          Adds files to the thumb drive.

          *Adds files* means
            Original file system is untouched

            Adds only unique files (those whose hash codes are unique)

            Creates on the thumb drive a relative directory
              wherein the original file was found

            Prepends a "YYYY-MM-DD-" string to the filename
              if one can be found (EXIF is a great shortcut).

            Copies
                      srcroot/relative_path/oldfilename
              to
                   thumbdrive/relative_path/YYYY-MM-DD-oldfilename
                     or
                   thumbdrive/relative_path/0000-oldfilename.

          Can also incrementally add new files by just
            scanning anywhere in any other
            file system or on any other computer.

          Must work on Mac, Windows, and Linux
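
The *Adds files* rules above can be sketched in a few lines of
python. This is not my prototype, just an outline under stated
assumptions: file_hash and add_files are made-up names, find_date
is a hypothetical callback that returns 'YYYY-MM-DD' or None, and
media-type filtering and EXIF handling are left out.

```python
import hashlib
import shutil
from pathlib import Path

def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so size does not matter."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def add_files(srcroot: Path, thumbdrive: Path, find_date) -> int:
    """Copy unique files from srcroot to thumbdrive; return the count.

    find_date(path) is a hypothetical callback returning 'YYYY-MM-DD'
    or None. The source tree is only ever read.
    """
    # Hashes of everything already on the drive, so re-runs add nothing.
    seen = {file_hash(p) for p in thumbdrive.rglob("*") if p.is_file()}
    added = 0
    for src in sorted(srcroot.rglob("*")):
        if not src.is_file():
            continue
        h = file_hash(src)
        if h in seen:
            continue  # same content is already on the drive
        seen.add(h)
        prefix = find_date(src) or "0000"
        dest_dir = thumbdrive / src.relative_to(srcroot).parent
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest_dir / f"{prefix}-{src.name}")
        added += 1
    return added
```

Hashing what is already on the drive first is what makes the
incremental case work: a second run over the same tree, or a run
over another computer's copy of the same pictures, adds nothing.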

What I have is a working prototype. It works
on Mac and Windows. It doesn't do the
date thing very well, and there are other shortcomings.

I have delivered exactly one Christmas present to my favorite person
in the world - a 400 GB SSD with all the pictures and media
we have ever taken. The next things are to *add* more media,
*re-unique-ify* (check) what is already present on the SSD,
and *improve the proper choice of "YYYY-MM-DD-" prefix* for
filenames.

I am retired and this is fun.
I'm too old to want to get rich.

Ed Bradford
Pflugerville, TX
egbegb2@gmail.com


On Tue, Mar 7, 2023 at 5:40=E2=80=AFAM Ralph Corderoy <ralph@inputplus.co.u=
k> wrote:

> Hi Ed,
>
> > I have made an attempt to make my RE stuff readable and supportable.
>
> Readable to you, which is fine because you're the prime future reader.
> But it's less readable than the regexp to those that know and read them
> because of the indirection introduced by the variables.  You've created
> your own little language of CAPITALS rather than the lingua franca of
> regexps.  :-)
>
> > Machine language was unreadable and then along came assembly language.
> > Assembly language was unreadable, then came higher level languages.
>
> Each time the original language was readable because practitioners had
> to read and write it.  When its replacement came along, the old skill
> was no longer learnt and the language became =E2=80=98unreadable=E2=80=99=
.
>
> > So far, I can do that for this RE program that works for small files,
> > large files, binary files and text files for exactly one pattern:
> >     YYYY[-MM-DD]
> > I constructed this RE with code like this:
> >     # ymdt is YYYY-MM-DD RE in text.
> >     # looking only for 1900s and 2000s years and no later than today.
> >     _YYYY =3D "(19\d\d|20[01]\d|202" + "[0-" + lastYearRE) + "]" + "){1=
}"
>
> =E2=80=98{1}=E2=80=99 is redundant.
>
> >     # months
> >     _MM   =3D "(0[1-9]|1[012])"
> >     # days
> >     _DD   =3D "(0[1-9]|[12]\d|3[01])"
> >     ymdt =3D _YYYY + '[' + _INTERNALSEP +
> >                          _MM          +
> >                          _INTERNALSEP +
> >                    ']'{0,1)
>
> I think we're missing something as the =E2=80=98'['=E2=80=99 is starting =
a character
> class which is odd for wrapping the month and the =E2=80=98{0,1)=E2=80=99=
 doesn't have
> matching brackets and is outside the string.
>
> BTW, =E2=80=98{0,1}=E2=80=99 is more readable to those who know regexps a=
s =E2=80=98?=E2=80=99.
>
> > For the whole file, RE I used
> >     ymdthf =3D _FRSEP + ymdt + _BASEP
> > where FRSEP is front separator which includes
> > a bunch of possible separators, excluding numbers and letters, or-ed
> > with the up arrow "beginning of line" RE mark.
>
> It sounds like you're wanting a word boundary; something provided by
> regexps.  In Python, it's =E2=80=98\b=E2=80=99.
>
>     >>> re.search(r'\bfoo\b', 'endfoo foostart foo ends'),
>     (<re.Match object; span=3D(16, 19), match=3D'foo'>,)
>
> Are you aware of the /x modifier to a regexp which ignores internal
> whitespace, including linefeeds?  This allows a large regexp to be split
> over lines.  There's a comment syntax too.  See
> https://docs.python.org/3/library/re.html#re.X
>
> GNU grep isn't too shabby at looking through binary files.  I can't use
> /x with grep so in a bash script, I'd do it manually.  \< and \> match
> the start and end of a word, a bit like Python's \b.
>
>     re=3D'
>         .?\<
>             (19[0-9][0-9]|20[01][0-9]|202[0-3])
>             (
>                 ([-:._])
>                 (0[1-9]|1[0-2])
>                 \3
>                 (0[1-9]|[12][0-9]|3[01])
>             )?
>         \>.?
>     '
>     re=3D${re//$'\n'/}
>     re=3D${re// /}
>
>     printf '%s\n' 2001-04-01,1999_12_31 1944.03.01,1914! 2000-01.01
> >big-binary-file
>     LC_ALL=3DC grep -Eboa "$re" big-binary-file | sed -n l
>
> which gives
>
>     0:2001-04-01,$
>     11:1999_12_31$
>     22:1944.03.01,$
>     33:1914!$
>     39:2000-$
>
> showing:
>
> - the byte offset within the file of each match,
> - along with the any before and after byte if it's not a \n and not
>   already matched, just to show the word-boundary at work,
> - with any non-printables escaped into octal by sed.
>
> > I thought I was on the COFF mailing list.
>
> I'm sending this to just the list.
>
> > I received this email by direct mail to from Larry.
>
> Perhaps your account on the list is configured to not send you an email
> if it sees your address in the header's fields.
>
> --
> Cheers, Ralph.
>


--=20
Advice is judged by results, not by intentions.
  Cicero

--0000000000003acf5505f661c282
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div class=3D"gmail_default" style=3D"font-family:monospac=
e,monospace">Thank you for the very useful comments. However,</div><div cla=
ss=3D"gmail_default" style=3D"font-family:monospace,monospace">I disagree w=
ith you about the RE language. While</div><div class=3D"gmail_default" styl=
e=3D"font-family:monospace,monospace">I agree all RE experts don&#39;t need=
 that, when I was</div><div class=3D"gmail_default" style=3D"font-family:mo=
nospace,monospace">hiring and gave some software to a new hire=C2=A0(whethe=
r</div><div class=3D"gmail_default" style=3D"font-family:monospace,monospac=
e">an experienced programmer or a recent college grad)</div><div class=3D"g=
mail_default" style=3D"font-family:monospace,monospace">simply handing over=
 huge RE&#39;s to my=C2=A0new hire was</div><div class=3D"gmail_default" st=
yle=3D"font-family:monospace,monospace">a daunting task to that person. I w=
rote that stuff</div><div class=3D"gmail_default" style=3D"font-family:mono=
space,monospace">that way to help remind me and anyone who might</div><div =
class=3D"gmail_default" style=3D"font-family:monospace,monospace">use the p=
ython program.</div><div class=3D"gmail_default" style=3D"font-family:monos=
pace,monospace"><br></div><div class=3D"gmail_default" style=3D"font-family=
:monospace,monospace">I don&#39;t claim success. It does help me.</div><div=
 class=3D"gmail_default" style=3D"font-family:monospace,monospace"><br></di=
v><div class=3D"gmail_default" style=3D"font-family:monospace,monospace">Wh=
en you say &#39;{1}&#39; is redundant, I think I did that</div><div class=
=3D"gmail_default" style=3D"font-family:monospace,monospace">to avoid any p=
ossibility of conflicts with the</div><div class=3D"gmail_default" style=3D=
"font-family:monospace,monospace">next string that is</div><div class=3D"gm=
ail_default" style=3D"font-family:monospace,monospace">concatenated to the=
 <b>Y_</b> (e.g. &#39;*&#39; or &#39;+&#39; or &#39;{4,7}&#39;).</div><div =
class=3D"gmail_default" style=3D"font-family:monospace,monospace">I am emba=
rrassed I did not communicate that</div><div class=3D"gmail_default" style=
=3D"font-family:monospace,monospace">in the code. I had to think about it f=
or a couple of hours</div><div class=3D"gmail_default" style=3D"font-family=
:monospace,monospace">before I recalled the &quot;why&quot;. I will fix tha=
t.</div><div class=3D"gmail_default" style=3D"font-family:monospace,monospa=
ce"><br></div><div class=3D"gmail_default" style=3D"font-family:monospace,m=
onospace">=C2=A0 (it would be difficult to discuss</div><div class=3D"gmail=
_default" style=3D"font-family:monospace,monospace">=C2=A0 =C2=A0this RE if=
 I had to write</div><div class=3D"gmail_default" style=3D""><font face=3D"=
monospace, monospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0</font><span style=3D"font=
-family:Arial,Helvetica,sans-serif">&quot;(19\d\d|20[01]\d|202&quot; + &quo=
t;[0-&quot; + lastYearRE + &quot;]&quot; + &quot;)</span></div><div class=
=3D"gmail_default" style=3D""><font face=3D"monospace">=C2=A0 =C2=A0rather =
than just </font><b style=3D""><font face=3D"arial, sans-serif">Y_</font></=
b><font face=3D"monospace">).</font></div><div class=3D"gmail_default" styl=
e=3D""><font face=3D"monospace"><br></font></div><div class=3D"gmail_defaul=
t" style=3D""><font face=3D"monospace">My initial thoughts</font></div><div=
 class=3D"gmail_default" style=3D""><font face=3D"monospace">on naming were=
=C2=A0</font><span style=3D"font-family:monospace">I wanted the definition =
to be defined</span></div><div class=3D"gmail_default" style=3D""><span sty=
le=3D"font-family:monospace">in exactly one place=C2=A0</span><span style=
=3D"font-family:monospace">in the software.</span></div><div class=3D"gmail=
_default" style=3D""><span style=3D"font-family:monospace">Python and the=
=C2=A0BTL=C2=A0folks told me to never</span></div><div class=3D"gmail_defau=
lt" style=3D""><span style=3D"font-family:monospace">use a constant in=C2=
=A0code. Always name it.</span></div><div class=3D"gmail_default" style=3D"=
"><span style=3D"font-family:monospace">Hence, I gave it a name. Each name =
might</span></div><div class=3D"gmail_default" style=3D""><span style=3D"fo=
nt-family:monospace">be used in multiple places. They might be imported.</s=
pan></div><div class=3D"gmail_default" style=3D""><font face=3D"monospace">=
<br></font></div><div class=3D"gmail_default" style=3D""><font face=3D"mono=
space">You are correct, the expression is unbalanced. I tried</font></div><=
div class=3D"gmail_default" style=3D""><font face=3D"monospace">to remove t=
he text2bytes(lastYearRE<b><font size=3D"4">)</font></b>=C2=A0call so the e=
xpression in</font></div><div class=3D"gmail_default" style=3D""><font face=
=3D"monospace">this email was all text. I failed to remove the trailing=C2=
=A0<b><font size=3D"4">)</font></b>=C2=A0when</font></div><div class=3D"gma=
il_default" style=3D""><font face=3D"monospace">I removed the call to text2=
bytes(). My hasty transcriptions</font></div><div class=3D"gmail_default" s=
tyle=3D""><font face=3D"monospace">might have produced similar errors in my=
 email.</font></div><div class=3D"gmail_default" style=3D""><font face=3D"m=
onospace"><br></font></div><div class=3D"gmail_default" style=3D""><font fa=
ce=3D"monospace">Recall, my focus was=C2=A0</font><span style=3D"font-famil=
y:monospace">on any file of any size.</span></div><div class=3D"gmail_defau=
lt" style=3D""><span style=3D"font-family:monospace">I&#39;m on Windows 10 =
and an=C2=A0m1 MacBook.</span></div><div class=3D"gmail_default" style=3D""=
><font face=3D"monospace">Python works on both. I don&#39;t have</font></di=
v><div class=3D"gmail_default" style=3D""><font face=3D"monospace">a Linux =
machine or enough=C2=A0desktop space to</font></div><div class=3D"gmail_def=
ault" style=3D""><font face=3D"monospace">host one. I&#39;m also mildly fed=
-up with</font></div><div class=3D"gmail_default" style=3D""><font face=3D"=
monospace">virtual machines.=C2=A0</font></div><div class=3D"gmail_default"=
 style=3D""><font face=3D"monospace"><br></font></div><div class=3D"gmail_d=
efault" style=3D""><font face=3D"monospace">Friedl taught me one thing. Mos=
t</font></div><div class=3D"gmail_default" style=3D""><font face=3D"monospa=
ce">RE implementations are different. I&#39;m trying</font></div><div class=
=3D"gmail_default" style=3D""><font face=3D"monospace">to write a program=
=C2=A0that I could give</font></div><div class=3D"gmail_default" style=3D""=
><font face=3D"monospace">to anyone and could reliably find a date (an RE) =
in</font></div><div class=3D"gmail_default" style=3D""><font face=3D"monosp=
ace">any file. YYYY, MM, DD, HR, MI, SE, TH are words</font></div><div clas=
s=3D"gmail_default" style=3D""><font face=3D"monospace">my user could use i=
n the command line or in</font></div><div class=3D"gmail_default" style=3D"=
"><font face=3D"monospace">an options dialog. LAT and LON might also be</fo=
nt></div><div class=3D"gmail_default" style=3D""><font face=3D"monospace">p=
ossibilities. CST, EST, MST, PST, ... also.</font></div><div class=3D"gmail=
_default" style=3D""><font face=3D"monospace">A 500 gigabyte archive or dir=
ectory/folder</font></div><div class=3D"gmail_default" style=3D""><font fac=
e=3D"monospace">of pictures and movies would be</font></div><div class=3D"g=
mail_default" style=3D""><font face=3D"monospace">a great test target.</fon=
t></div><div class=3D"gmail_default" style=3D""><font face=3D"monospace"><b=
r></font></div><div class=3D"gmail_default" style=3D""><font face=3D"monosp=
ace">I very much appreciate your comments. If this</font></div><div class=
=3D"gmail_default" style=3D""><font face=3D"monospace">discussion is boring=
 to others, I would be happy</font></div><div class=3D"gmail_default" style=
=3D""><font face=3D"monospace">to take it to emails.</font></div><div class=
=3D"gmail_default" style=3D""><font face=3D"monospace"><br></font></div><di=
v class=3D"gmail_default" style=3D""><font face=3D"monospace">I like your p=
rogram. My experience</font></div><div class=3D"gmail_default" style=3D""><=
font face=3D"monospace">with RE, grep, python, and sed suggests that</font>=
</div><div class=3D"gmail_default" style=3D""><font face=3D"monospace">anyt=
hing but gnu grep and sed might not work due to the</font></div><div class=
=3D"gmail_default" style=3D""><font face=3D"monospace">different implementa=
tions.</font></div><div class=3D"gmail_default" style=3D""><font face=3D"mo=
nospace"><br></font></div><div class=3D"gmail_default" style=3D""><font fac=
e=3D"monospace">I&#39;ve been out of the=C2=A0</font><span style=3D"font-fa=
mily:monospace">Unix software business</span></div><div class=3D"gmail_defa=
ult" style=3D""><span style=3D"font-family:monospace">for 30 years after st=
arting work at BTL in the 1970s</span></div><div class=3D"gmail_default" st=
yle=3D""><span style=3D"font-family:monospace">and working on Version 6. I =
didn&#39;t know &quot;printf&quot; was now</span></div><div class=3D"gmail_=
default" style=3D""><span style=3D"font-family:monospace">built into bash! =
That was a surprise. It&#39;s an incremental</span></div><div class=3D"gmai=
l_default" style=3D""><span style=3D"font-family:monospace">improvement, bu=
t doesn&#39;t compare with f-strings in python.</span></div><div class=3D"g=
mail_default" style=3D""><span style=3D"font-family:monospace"><i><font siz=
e=3D"1">The interactive interpreter for python should have</font></i></span=
></div><div class=3D"gmail_default" style=3D""><span style=3D"font-family:m=
onospace"><i><font size=3D"1">a &quot;bash&quot; mode?!</font></i></span></=
div><div class=3D"gmail_default" style=3D""><span style=3D"font-family:mono=
space"><br></span></div><div class=3D"gmail_default" style=3D""><font face=
=3D"monospace">Does grep use a memory mapped file for its search, thereby</=
font></div><div class=3D"gmail_default" style=3D""><font face=3D"monospace"=
>avoiding all buffering boundaries? That too, would</font></div><div class=
=3D"gmail_default" style=3D""><font face=3D"monospace">be new information t=
o me. The additional complexity</font></div><div class=3D"gmail_default" st=
yle=3D""><font face=3D"monospace">of dealing with buffering is more than an=
noying.</font></div><div class=3D"gmail_default" style=3D""><font face=3D"m=
onospace"><br></font></div><div class=3D"gmail_default" style=3D""><font fa=
ce=3D"monospace">Do you have any thoughts on how to verify</font></div><div=
 class=3D"gmail_default" style=3D""><font face=3D"monospace">a program that=
 uses RE&#39;s. I&#39;ve given no thought</font></div><div class=3D"gmail_d=
efault" style=3D""><font face=3D"monospace">until now. My first thought for=
 dates would be</font></div><div class=3D"gmail_default" style=3D""><font f=
ace=3D"monospace">to write a separate module that simply searched</font></d=
iv><div class=3D"gmail_default" style=3D""><font face=3D"monospace">through=
 the file looking for 4 numbers in a row</font></div><div class=3D"gmail_de=
fault" style=3D""><font face=3D"monospace">without using RE&#39;s, recordin=
g the offsets and 16 characters</font></div><div class=3D"gmail_default" st=
yle=3D""><font face=3D"monospace">after and 1 character before in a python =
list of (offset,str)</font></div><div class=3D"gmail_default" style=3D""><f=
ont face=3D"monospace">of tuples,=C2=A0</font><span style=3D"font-family:ar=
ial,sans-serif;font-style:italic">ddddList</span><span style=3D"font-family=
:arial,sans-serif;font-style:italic">,=C2=A0</span><font face=3D"monospace"=
>and using=C2=A0</font><font face=3D"arial, sans-serif"><i>dddd</i></font><=
i style=3D"font-family:arial,sans-serif">List</i></div><div class=3D"gmail_=
default" style=3D""><span style=3D"font-family:monospace">as a proxy for th=
e entire file. I could then</span></div><div class=3D"gmail_default" style=
=3D""><font face=3D"monospace">aim my RE&#39;s at </font><font face=3D"aria=
l, sans-serif"><i><b>ddddList</b></i></font><font face=3D"monospace">. </fo=
nt><b style=3D""><font face=3D"comic sans ms, sans-serif">[A list of tuples=
 in python</font></b></div><div class=3D"gmail_default" style=3D""><b style=
=3D""><font face=3D"comic sans ms, sans-serif">is wonderful! !]</font></b><=
font face=3D"monospace"> It seems to me &#39;*&#39; and &#39;+&#39; and {x,=
y} are the performance</font></div><div class=3D"gmail_default" style=3D"">=
<font face=3D"monospace">hogs in RE&#39;s. My RE&#39;s avoid them. One pass=
, I think, should</font></div><div class=3D"gmail_default" style=3D""><font=
 face=3D"monospace">suffice. What do you think? I haven&#39;t &quot;archive=
d&quot; my 350 GB</font></div><div class=3D"gmail_default" style=3D""><font=
 face=3D"monospace">of pictures and movies, but one pass over all files the=
rein</font></div><div class=3D"gmail_default" style=3D""><font face=3D"mono=
space">ought to suffice, right? Two different programs that use different</=
font></div><div class=3D"gmail_default" style=3D""><font face=3D"monospace"=
>algorithms should be pretty good proof of correctness wouldn&#39;t</font><=
/div><div class=3D"gmail_default" style=3D""><font face=3D"monospace">you t=
hink?</font></div><div class=3D"gmail_default" style=3D""><font face=3D"mon=
ospace"><br></font></div><div class=3D"gmail_default" style=3D""><font face=
=3D"monospace">My RE&#39;s have no stars or pluses. If there is a mismatch =
before</font></div><div class=3D"gmail_default" style=3D""><font face=3D"mo=
nospace">a match, give up and move on.</font></div><div class=3D"gmail_defa=
ult" style=3D""><font face=3D"monospace"><br></font></div><div class=3D"gma=
il_default" style=3D""><font face=3D"monospace">On my Windows 10 machine, I=
 have cygwin.</font></div><div class=3D"gmail_default" style=3D""><font fac=
e=3D"monospace">Microsoft says my CPU doesn&#39;t have a TPM and</font></di=
v><div class=3D"gmail_default" style=3D""><font face=3D"monospace">the spec=
ific Intel Core I7 on my system is not</font></div><div class=3D"gmail_defa=
ult" style=3D""><font face=3D"monospace">supported so Windows 11 is not hap=
pening.</font></div><div class=3D"gmail_default" style=3D""><font face=3D"m=
onospace">Microsoft is DOS personified.</font></div><div class=3D"gmail_def=
ault" style=3D""><font face=3D"monospace">=C2=A0(An unkind editorial remark=
 about the low</font></div><div class=3D"gmail_default" style=3D""><font fa=
ce=3D"monospace">=C2=A0 quality of software coming from Microsoft.)</font><=
/div><div class=3D"gmail_default" style=3D""><font face=3D"monospace"><br><=
/font></div><div class=3D"gmail_default"><font face=3D"monospace">Anyway, I=
 thank you again for your patience with me</font></div><div class=3D"gmail_=
default"><font face=3D"monospace">and your observations. I value your views=
 and the</font></div><div class=3D"gmail_default"><font face=3D"monospace">=
other views I&#39;ve seen here on <a href=3D"mailto:coff@tuhs.org">coff@tuh=
s.org</a>.</font></div><div class=3D"gmail_default"><font face=3D"monospace=
"><br></font></div><div class=3D"gmail_default"><font face=3D"monospace">I =
welcome all input to my education and will share</font></div><div class=3D"=
gmail_default"><font face=3D"monospace">all I have done so far with anyone =
who wants to</font></div><div class=3D"gmail_default"><font face=3D"monospa=
ce">collaborate, test, or is just curious.</font></div><div class=3D"gmail_=
default"><font face=3D"monospace"><br></font></div><div class=3D"gmail_defa=
ult"><font face=3D"monospace">=C2=A0 =C2=A0 GOAL: run python program from a=
n at-cost thumb drive that:</font></div><div class=3D"gmail_default"><font =
face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 reaps all media files=
 from a user specified</font></div><div class=3D"gmail_default"><font face=
=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 directory/folder tree and=
</font></div><div class=3D"gmail_default"><font face=3D"monospace"><br></fo=
nt></div><div class=3D"gmail_default"><font face=3D"monospace">=C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 Adds files to the thumb drive.</font></div><div cl=
ass=3D"gmail_default"><font face=3D"monospace"><br></font></div><div class=
=3D"gmail_default"><div class=3D"gmail_default"><font face=3D"monospace">=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 <b>Adds files</b> means</font></div><div=
 class=3D"gmail_default"><font face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 Original file system is untouched</font></div><div class=
=3D"gmail_default"><font face=3D"monospace"><br></font></div><div class=3D"=
gmail_default"><font face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 Adds only unique files (hash codes are unique)</font></div><div clas=
s=3D"gmail_default"><font face=3D"monospace"><br></font></div><div class=3D=
"gmail_default"><font face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 Creates on the thumb drive a relative directory</font></div><div cl=
ass=3D"gmail_default"><font face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 wherein the=C2=A0</font><span style=3D"font-family:mon=
ospace">original=C2=A0file was found</span></div><div class=3D"gmail_defaul=
t"><span style=3D"font-family:monospace"><br></span></div><div class=3D"gma=
il_default"><font face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 Prepends a &quot;YYYY-MM-DD-&quot; string to the filename</font></div><=
div class=3D"gmail_default"><font face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 if one can be found (EXIF is a great shortcut).=
</font></div><div class=3D"gmail_default"><font face=3D"monospace"><br>=
</font=
></div><div class=3D"gmail_default"><font face=3D"monospace">=C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Copies</font></div><div class=3D"gmail_default"=
><font face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 srcroot/relative_path/oldfilename</font></div><=
div class=3D"gmail_default"><font face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 to</font></div><div class=3D"gmail_default"><fo=
nt face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0thumbdrive/relative_path/YYYY-MM-DD-oldfilename</font></di=
v><div class=3D"gmail_default"><font face=3D"monospace">=C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0or</font></div><=
div class=3D"gmail_default"><font face=3D"monospace">=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0thumbdrive/relative_path/00=
00-oldfilename.</font></div></div><div class=3D"gmail_default"><font face=
=3D"monospace"><br></font></div><div class=3D"gmail_default"><font face=3D"=
monospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Can also incrementally add ne=
w files by just</font></div><div class=3D"gmail_default"><font face=3D"mono=
space">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 scanning anywhere in any o=
ther file system</font></div><div class=3D"gmail_default"><font face=3D"mon=
ospace">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 or on any other computer.=
</font></div><div class=3D"gmail_default"><font face=3D"monospace"><b=
r></font></div><div class=3D"gmail_default"><font face=3D"monospace">=C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Must work on Mac, Windows, and Linux</font></d=
iv><div class=3D"gmail_default"><font face=3D"monospace"><br></font></div><=
div class=3D"gmail_default"><font face=3D"monospace">What I have is a worki=
ng prototype. It works</font></div><div class=3D"gmail_default"><font face=
=3D"monospace">on Mac and Windows. It doesn&#39;t do the</font></div><div c=
lass=3D"gmail_default"><font face=3D"monospace">date thing very well, and t=
here are other shortcomings.</font></div><div class=3D"gmail_default"><font=
 face=3D"monospace"><br></font></div><div class=3D"gmail_default"><font fac=
e=3D"monospace">I have delivered exactly=C2=A0one Christmas present to my f=
avorite person</font></div><div class=3D"gmail_default"><font face=3D"monos=
pace">in the world - a 400 GB SSD drive with all our pictures and media</fo=
nt></div><div class=3D"gmail_default"><font face=3D"monospace">we have ever=
 taken. The next things are=C2=A0</font><span style=3D"font-family:monospac=
e">to </span><b style=3D"font-family:monospace">add </b><span style=3D"font=
-family:monospace">more media</span></div><div class=3D"gmail_default"><spa=
n style=3D"font-family:monospace">and</span><span style=3D"font-family:mono=
space">=C2=A0<b>re-unique-ify</b>=C2=A0</span><span style=3D"font-family:mo=
nospace">(check) what is already present on the SSD drive</span></div><div =
class=3D"gmail_default"><span style=3D"font-family:monospace">and=C2=A0=C2=
=A0<b>improve=C2=A0the proper choice of &quot;YYYY-MM-DD-&quot; prefix</b> =
to</span></div><div class=3D"gmail_default"><span style=3D"font-family:mono=
space">filenames.</span></div><div class=3D"gmail_default" style=3D""><br><=
/div><div class=3D"gmail_default" style=3D""><font face=3D"monospace">I am =
retired and this is fun.</font></div><div class=3D"gmail_default" style=3D"=
"><font face=3D"monospace">I&#39;m too old to want to get rich.</font></div=
><div class=3D"gmail_default" style=3D""><font face=3D"monospace"><br></fon=
t></div><div class=3D"gmail_default" style=3D""><font face=3D"monospace">Ed=
 Bradford</font></div><div class=3D"gmail_default" style=3D""><font face=3D=
"monospace">Pflugerville, TX</font></div><div class=3D"gmail_default" style=
=3D""><font face=3D"monospace"><a href=3D"mailto:egbegb2@gmail.com">egbegb2=
@gmail.com</a></font></div><div class=3D"gmail_default" style=3D""><br></di=
v><div class=3D"gmail_default" style=3D""><br></div><div class=3D"gmail_def=
ault" style=3D""><br></div><div class=3D"gmail_default" style=3D""><font fa=
ce=3D"monospace"><br></font></div></div><br><div class=3D"gmail_quote"><div=
 dir=3D"ltr" class=3D"gmail_attr">On Tue, Mar 7, 2023 at 5:40=E2=80=AFAM Ra=
lph Corderoy &lt;<a href=3D"mailto:ralph@inputplus.co.uk">ralph@inputplus.c=
o.uk</a>&gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"mar=
gin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1=
ex">Hi Ed,<br>
<br>
&gt; I have made an attempt=C2=A0to make my RE stuff readable and supportab=
le.<br>
<br>
Readable to you, which is fine because you&#39;re the prime future reader.<=
br>
But it&#39;s less readable than the regexp to those that know and read them=
<br>
because of the indirection introduced by the variables.=C2=A0 You&#39;ve cr=
eated<br>
your own little language of CAPITALS rather than the lingua franca of<br>
regexps.=C2=A0 :-)<br>
<br>
&gt; Machine language was unreadable and then along came assembly=C2=A0lang=
uage.<br>
&gt; Assembly=C2=A0language was unreadable, then came higher level language=
s.<br>
<br>
Each time the original language was readable because practitioners had<br>
to read and write it.=C2=A0 When its replacement came along, the old skill<=
br>
was no longer learnt and the language became =E2=80=98unreadable=E2=80=99.<=
br>
<br>
&gt; So far, I can do that for this RE program that works for small files,<=
br>
&gt; large files, binary files and text files for exactly one pattern:<br>
&gt; =C2=A0 =C2=A0 YYYY[-MM-DD]<br>
&gt; I constructed this RE with code like this:<br>
&gt;=C2=A0 =C2=A0 =C2=A0# ymdt is YYYY-MM-DD RE in text.<br>
&gt;=C2=A0 =C2=A0 =C2=A0# looking only for 1900s and 2000s years and no lat=
er than today.<br>
&gt;=C2=A0 =C2=A0 =C2=A0_YYYY =3D=C2=A0&quot;(19\d\d|20[01]\d|202&quot; + &=
quot;[0-&quot; + lastYearRE)=C2=A0+ &quot;]&quot; + &quot;){1}&quot;<br>
<br>
=E2=80=98{1}=E2=80=99 is redundant.<br>
<br>
&gt;=C2=A0 =C2=A0 =C2=A0# months<br>
&gt;=C2=A0 =C2=A0 =C2=A0_MM=C2=A0 =C2=A0=3D &quot;(0[1-9]|1[012])&quot;<br>
&gt;=C2=A0 =C2=A0 =C2=A0# days<br>
&gt;=C2=A0 =C2=A0 =C2=A0_DD=C2=A0 =C2=A0=3D &quot;(0[1-9]|[12]\d|3[01])&quo=
t;<br>
&gt;=C2=A0 =C2=A0 =C2=A0ymdt =3D _YYYY=C2=A0+ &#39;[&#39; + _INTERNALSEP=C2=
=A0+<br>
&gt;=C2=A0 =C2=A0 =C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0_MM=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 +<br>
&gt;=C2=A0 =C2=A0 =C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0_INTERNALSEP=C2=A0+<br>
&gt;=C2=A0 =C2=A0 =C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0&#39;]&#39;{0,1)<br>
<br>
I think we&#39;re missing something as the =E2=80=98&#39;[&#39;=E2=80=99 is=
 starting a character<br>
class which is odd for wrapping the month and the =E2=80=98{0,1)=E2=80=99 d=
oesn&#39;t have<br>
matching brackets and is outside the string.<br>
<br>
BTW, =E2=80=98{0,1}=E2=80=99 is more readable to those who know regexps as =
=E2=80=98?=E2=80=99.<br>
<br>
&gt; For the whole file, RE I used<br>
&gt;=C2=A0 =C2=A0 =C2=A0ymdthf =3D _FRSEP=C2=A0+ ymdt=C2=A0+ _BASEP<br>
&gt; where FRSEP=C2=A0is front separator which includes<br>
&gt; a bunch of possible=C2=A0separators, excluding numbers and letters, or=
-ed<br>
&gt; with the up arrow &quot;beginning of line&quot; RE mark.<br>
<br>
It sounds like you&#39;re wanting a word boundary; something provided by<br=
>
regexps.=C2=A0 In Python, it&#39;s =E2=80=98\b=E2=80=99.<br>
<br>
=C2=A0 =C2=A0 &gt;&gt;&gt; re.search(r&#39;\bfoo\b&#39;, &#39;endfoo foosta=
rt foo ends&#39;),<br>
=C2=A0 =C2=A0 (&lt;re.Match object; span=3D(16, 19), match=3D&#39;foo&#39;&=
gt;,)<br>
<br>
Are you aware of the /x modifier to a regexp which ignores internal<br>
whitespace, including linefeeds?=C2=A0 This allows a large regexp to be spl=
it<br>
over lines.=C2=A0 There&#39;s a comment syntax too.=C2=A0 See<br>
<a href=3D"https://docs.python.org/3/library/re.html#re.X" rel=3D"noreferre=
r" target=3D"_blank">https://docs.python.org/3/library/re.html#re.X</a><br>
<br>
GNU grep isn&#39;t too shabby at looking through binary files.=C2=A0 I can&=
#39;t use<br>
/x with grep so in a bash script, I&#39;d do it manually.=C2=A0 \&lt; and \=
&gt; match<br>
the start and end of a word, a bit like Python&#39;s \b.<br>
<br>
=C2=A0 =C2=A0 re=3D&#39;<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 .?\&lt;<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (19[0-9][0-9]|20[01][0-9]|202[0-3=
])<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ([-:._])<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (0[1-9]|1[0-2])<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 \3<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (0[1-9]|[12][0-9]|3=
[01])<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 )?<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 \&gt;.?<br>
=C2=A0 =C2=A0 &#39;<br>
=C2=A0 =C2=A0 re=3D${re//$&#39;\n&#39;/}<br>
=C2=A0 =C2=A0 re=3D${re// /}<br>
<br>
=C2=A0 =C2=A0 printf &#39;%s\n&#39; 2001-04-01,1999_12_31 1944.03.01,1914! =
2000-01.01 &gt;big-binary-file<br>
=C2=A0 =C2=A0 LC_ALL=3DC grep -Eboa &quot;$re&quot; big-binary-file | sed -=
n l<br>
<br>
which gives<br>
<br>
=C2=A0 =C2=A0 0:2001-04-01,$<br>
=C2=A0 =C2=A0 11:1999_12_31$<br>
=C2=A0 =C2=A0 22:1944.03.01,$<br>
=C2=A0 =C2=A0 33:1914!$<br>
=C2=A0 =C2=A0 39:2000-$<br>
<br>
showing:<br>
<br>
- the byte offset within the file of each match,<br>
- along with the byte before and after, if it&#39;s not a \n and not<br>
=C2=A0 already matched, just to show the word-boundary at work,<br>
- with any non-printables escaped into octal by sed.<br>
<br>
&gt; I thought I was on the COFF mailing list.<br>
<br>
I&#39;m sending this to just the list.<br>
<br>
&gt; I received this email by direct mail from Larry.<br>
<br>
Perhaps your account on the list is configured to not send you an email<br>
if it sees your address in the header&#39;s fields.<br>
<br>
-- <br>
Cheers, Ralph.<br>
</blockquote></div><br clear=3D"all"><div><br></div><span class=3D"gmail_si=
gnature_prefix">-- </span><br><div dir=3D"ltr" class=3D"gmail_signature"><f=
ont face=3D"&#39;courier new&#39;, monospace"><span style=3D"font-weight:90=
0"><div>Advice is judged by results, not by intentions.</div><div>=C2=A0 Ci=
cero</div></span></font><div><br></div></div>

--0000000000003acf5505f661c282--