From: Dan Cross
Date: Mon, 6 Mar 2023 16:01:51 -0500
To: Ed Bradford
CC: Grant Taylor, COFF
Subject: [COFF] Re: Requesting thoughts on extended regular expressions in grep.
List-Id: Computer Old Farts Forum

On Mon, Mar 6, 2023 at 5:02 AM Ed Bradford wrote:
> [snip]
> I would like to extend my program to
> any date format. That would require
> a much bigger RE. I have been led to
> believe that a 50Kbyte or 500Kbyte
> RE works just as well (if not
> as fast) as a 100 byte RE.
> I think
> with parentheses and
> pipe-symbols suitably used,
> one could match
>
> Monday, March 6, 2023
> 2023-03-06
> Mar 6, 2023
> or
> ...

This reminds me of something that I wanted to bring up.

Perhaps one _could_ define a regular expression rich enough to match
a number of date formats. However, I submit that one _should not_.
REs may be sufficiently powerful, but in all likelihood what you'll
end up with is an unreadable mess; it's like people who abuse `sed`
or whatever to execute complex, general purpose programs: yeah, it's
a clever hack, but that doesn't mean you should do it.

Pick the right tool for the job. REs are a powerful tool, but they're
not the right tool for _every_ job, and I'd argue that once you hit a
threshold of complexity that'll be mostly self-evident, it's time to
move on to something else.

As for large vs small REs... When we start talking about differences
of orders of magnitude in size, we start talking about real
performance implications; in general an NDFA simulation of a regular
expression will have a number of states on the order of the length of
the RE, and the simulation may have to track all of them for each
input character. So when the length of the RE is half a million
symbols, that's half a million states, which, even though it's
bounded, is still a pretty big number in practice, even on modern
CPUs. I wouldn't want to poke that bear.

- Dan C.
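
As one illustration of "something else" for the date case: a minimal sketch, assuming Python and its standard `datetime.strptime`, that tries a short list of explicit formats instead of one giant alternation. The `DATE_FORMATS` list and the `parse_date` helper are hypothetical names, not something from the thread, and the month and weekday names assume an English (C) locale.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical format list covering the three examples quoted above.
# Names of months and weekdays are matched per the current locale
# (assumed here to be the default C/English locale).
DATE_FORMATS = (
    "%A, %B %d, %Y",  # Monday, March 6, 2023
    "%Y-%m-%d",       # 2023-03-06
    "%b %d, %Y",      # Mar 6, 2023
)


def parse_date(text: str) -> Optional[date]:
    """Return the first successful parse of text, or None if no format fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            # This format did not match; fall through to the next one.
            continue
    return None


if __name__ == "__main__":
    for sample in ("Monday, March 6, 2023", "2023-03-06", "Mar 6, 2023"):
        print(sample, "->", parse_date(sample))
```

Supporting another format is then one more entry in the list rather than another branch in an RE, and each entry stays readable on its own.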