From: Grant Taylor via COFF
Reply-To: Grant Taylor
To: coff@tuhs.org
Subject: [COFF] Re: Requesting thoughts on extended regular expressions in grep.
Date: Thu, 2 Mar 2023 20:53:08 -0700
Organization: TNet Consulting
List-Id: Computer Old Farts Forum
Message-ID: <1519cce3-1c38-8a9c-cfdd-b39484bd163b@spamtrap.tnetconsulting.net>
References: <8d1de5c8-1f34-3d37-395d-0f1da7b062ec@spamtrap.tnetconsulting.net> <688396c8-7a25-5cd6-282c-49f1b13117d4@spamtrap.tnetconsulting.net>

On 3/2/23 8:04 PM, Dan Cross wrote:
> I guess what I'm saying is, match what you want to match and don't sweat
> the small stuff.

ACK

> Not exactly.
> :-)
>
> What I understand you to mean, based on this and the rest of your note,
> is that you want to find a good division point between overly specific,
> complex REs and simpler, easy to understand REs that are less specific.
> The danger with the latter is that they may match things you don't
> intend, while the former are harder to maintain and (arguably) more
> brittle. I can sympathize.

You got it.

> For the purposes of grep/egrep, that'll be a logical "line" of text,
> terminated by a newline, though the newline itself isn't considered part
> of the text for matching. I believe the `-z` option can be used to set a
> NUL byte as the "line" terminator; presumably this lets one match
> strings with embedded newlines, though I haven't tried.

Fair enough. That's also sort of what I thought might be the case.

> "string" in this context is the input you're attempting to match
> against. `egrep` will attempt to match your pattern against each "line"
> of text it reads from the files it's searching. That is, each line in
> your log file(s).

*nod*

> But consider what `[ :[:digit:]]{11}` means: you've got a character
> class consisting of space, colon and a digit; {11} means "match any of
> the characters in that class exactly 11 times" (as opposed to other
> variations on the '{}' syntax that say "at least m times", "at most n
> times", or "between n and m times").

Yep, I'm well aware of that.

> But that'll match all sorts of things that don't look like 'dd
> hh:mm:ss':

That's one of the reasons that I'm interested in coming up with a more
precise regular expression ... without being overly complex.

> (The first line is my typing; the second is output from egrep except for
> the short line of 9 '1's, for which egrep had no output. The last two
> lines are matching space characters and egrep echoing the match, but I'm
> guessing gmail will eat those.)
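
A minimal sketch of that over-matching, assuming Python's `re` module as a
stand-in for egrep. Python has no POSIX bracket classes, so
`[ :[:digit:]]` is spelled `[ :0-9]` here. The sample strings and the
tighter `TIGHT` pattern are made up for illustration; neither comes from
the thread or from any real log.

```python
import re

# Stand-in for the egrep pattern `[ :[:digit:]]{11}`; Python's re lacks
# POSIX classes, so [:digit:] is written as 0-9 (an assumption of this sketch).
LOOSE = re.compile(r"[ :0-9]{11}")

# A tighter, hypothetical take on ' d hh:mm:ss' / 'dd hh:mm:ss'.
TIGHT = re.compile(
    r"(?: [1-9]|[12][0-9]|3[01]) (?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]"
)

samples = [
    " 2 20:53:08",    # a plausible space-padded day and time
    "11111111111",    # eleven digits
    ":::::::::::",    # eleven colons
    "           ",    # eleven spaces
    "111111111",      # only nine characters
]


def classify(text):
    """Return (loose_hit, tight_hit) using search, as egrep would on a line."""
    return (LOOSE.search(text) is not None, TIGHT.search(text) is not None)


results = {s: classify(s) for s in samples}

for s, (loose, tight) in results.items():
    print(f"{s!r:15} loose={loose} tight={tight}")
```

Using `search` rather than a whole-string match is deliberate: egrep
reports a line when any substring matches, so longer inputs that contain
an 11-character run from the class would hit the loose pattern as well.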
>
> Note that there are inputs with more than 11 characters that match; this
> is because there is some 11-character substring that matches the RE in
> those lines. In any event, I suspect this would generally not be what
> you want. But if nothing else in your input can match the RE (which you
> might know a priori because of domain knowledge about whatever is
> generating those logs) then it's no big deal, even if the RE was capable
> of matching more things generally.

Yep.

Here's an example of the full RE:

    ^\w{3} [ :[:digit:]]{11} [._[:alnum:]-]+ postfix/msa/smtpd\[[[:digit:]]+\]: timeout after STARTTLS from [._[:alnum:]-]+\[[.:[:xdigit:]]+\]$

As you can see the "[ :[:digit:]]{11}" is actually only a sub-part of a
larger RE and there is bounding & delimiting around the subpart.

This is to match a standard message from postfix via standard SYSLOG.

> Ah. I suspect this relies on domain knowledge about the format of log
> lines to match reliably. Otherwise it could match, `___ 123 456:789`
> which is probably not what you are expecting.

Yep.

Though said domain knowledge isn't anything special in and of itself.

> Sure. One nice thing about `egrep` et al is that you can put the REs
> into a file and include them with `-f`, as opposed to having them all
> directly on the command line.

Yep. logcheck makes extensive use of many files like this to do its work.

> Typo. :-)

ACKK

> That seems reasonable.

Thank you for the logic CRC.

> Aside: I found the note on its website amusing: Brought to you by the
> UK's best gambling sites! "Only gamble with what you can afford to
> lose." Yikes!

Um ... that's concerning.

> I'd proceed with caution here; it also seems to be in the FreeBSD and
> DragonFly ports collections and Homebrew on the Mac (but so is GNU grep
> for all of those).

Fair enough. My use case is on Linux where GNU egrep is a thing.

> Yeah.
> IMHO `\w` is too general for what you're trying to do.

I think that `\w` is a good primer, but not where I want things to end
up long term.

> Basically, a regular expression is a regular expression if you can build
> a machine with no additional memory that can tell you whether or not a
> given string matches the RE examining its input one character at a time.

I /think/ that I could build a complex nested tree of switch statements
to test each character to see if things match what they should or not.
Though I would need at least one variable / memory to hold absolutely
minimal state to know where I am in the switch tree. I think a number
to identify the switch statement in question would be sufficient. So
I'm guessing two bytes of variable and uncounted bytes of program code.

> I think that's about right.

Thank you again Dan.

> Sure thing!

:-)

-- 
Grant. . . .
unix || die
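
On the switch-tree idea from the state-machine paragraph above: a rough
sketch, assuming Python and a deliberately tiny pattern,
`[0-9]{2}:[0-9]{2}:[0-9]{2}`, chosen here for illustration and not taken
from logcheck. The function name `matches_hms` is made up. The single
integer `state` plays the role of the "number to identify the switch
statement"; nothing else is remembered between characters.

```python
def matches_hms(text):
    """Whole-string check for [0-9]{2}:[0-9]{2}:[0-9]{2} using one integer of state."""
    state = 0  # the only memory: which branch of the "switch tree" we are in
    for ch in text:
        if state in (0, 1, 3, 4, 6, 7):
            # Positions that must hold a digit.
            if ch in "0123456789":
                state += 1
            else:
                return False
        elif state in (2, 5):
            # Positions that must hold a colon.
            if ch == ":":
                state += 1
            else:
                return False
        else:
            # state == 8: the pattern is complete but input keeps coming.
            return False
    # Accept only if exactly eight characters were consumed.
    return state == 8


for candidate in ["20:53:08", "20:53:0", "20:53:088", "2a:53:08"]:
    print(candidate, matches_hms(candidate))
```

Each character is examined once and then forgotten, which is the
"no additional memory" property: the state number is bounded by the
size of the pattern, not by the length of the input.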