From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_ADSP_CUSTOM_MED, DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FROM,MAILING_LIST_MULTI autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 22426 invoked from network); 7 Mar 2023 16:15:40 -0000 Received: from minnie.tuhs.org (50.116.15.146) by inbox.vuxu.org with ESMTPUTF8; 7 Mar 2023 16:15:40 -0000 Received: from minnie.tuhs.org (localhost [IPv6:::1]) by minnie.tuhs.org (Postfix) with ESMTP id 48113411F6; Wed, 8 Mar 2023 02:15:37 +1000 (AEST) Received: from mail-lj1-x229.google.com (mail-lj1-x229.google.com [IPv6:2a00:1450:4864:20::229]) by minnie.tuhs.org (Postfix) with ESMTPS id 5235C411F4 for ; Wed, 8 Mar 2023 02:15:28 +1000 (AEST) Received: by mail-lj1-x229.google.com with SMTP id g18so13733395ljl.3 for ; Tue, 07 Mar 2023 08:15:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1678205726; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=C0OgD9Hhr1EUD3u9SKqHAxCzzorPzdh6v7f+04/H40Y=; b=A1sXCcq8snPO9V2XwU0CDtvEUkiIfBkSas1At9EXlvnaHCjnKq/ot5Lk/SXDZykuON kGsoru1jqn/kIQ174SYl0XKnYL5i8DdRpal9Evc9J5V+Jj/wDRfiluQIfNGAoFQpLhNP kBr6yRM8/6JsIWwcS03Yjxv1kt29m9E7UPU+d4HHAPshv9iIbSQ5HM/BEqJq1Ax30LHr tnkZ6YMzFI9vafyUAGULenkDfI3AnlZdF7RlXMTUb0hznIwShRqe6YwYY8yFvHEPtuOp IhKbLdnjxc4fTKTYwDVndjINDU2sC1QAOXsIBOja2l19JxDkRSJYAoq2H744CuWHPbf5 I4xg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1678205726; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=C0OgD9Hhr1EUD3u9SKqHAxCzzorPzdh6v7f+04/H40Y=; b=dmV8s1B3iSajpKNg9OwYyvEYNBfKZQCaHfXh8xroUsQ/fBY6AwOBnNQpVOQWeSj/fa EJA0R+m41s6vxaNgEBGU8KMUV4iNWGToiWNUlTBcvH33yb3DjbUVvutV1vT8b+Bz54fL 5qs1X0x+kqVIDwdGeU+8LBQ5G+GpUtiqYq55D8/4FlVTORIUl45kN8jpb8beS1xOoVjc QQF8b7SW0guMvobyVxr4PIY+m09kYwFZv+8CSpp3UKKmJFjB5PnwfB03aiF0B1n+7rux Ybfreo6KRo76TpDle4I0KBXnzEHeIIc5Vcl1q3xh9rugZgd9xY5AjsK4+nZ+aOC4+Cq5 0qQQ== X-Gm-Message-State: AO0yUKV7FZ3Fjnt59SqcndR+gXcWOUGRSDKwpY/rZ6nqyKJZWixFZhD6 BbFfI0YeNACn4Gp4Z7DXps9usY3UzUowYw+9WHcN2TRPqZU= X-Google-Smtp-Source: AK7set9UZOGLEhLtITJtglkpijgeppuIwCouZu/JRR85zsVQHqMdLDU2BdjNHkzVZj5QFMTR9lSRRGpzQYpso+J4q8M= X-Received: by 2002:a2e:b988:0:b0:295:a699:8cef with SMTP id p8-20020a2eb988000000b00295a6998cefmr4459884ljp.1.1678205726100; Tue, 07 Mar 2023 08:15:26 -0800 (PST) MIME-Version: 1.0 References: <8d1de5c8-1f34-3d37-395d-0f1da7b062ec@spamtrap.tnetconsulting.net> <20230307014311.GN5398@mcvoy.com> In-Reply-To: From: Dan Cross Date: Tue, 7 Mar 2023 11:14:49 -0500 Message-ID: To: Ed Bradford Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Message-ID-Hash: GDW5R6TGUEPAPNOC3DXWIBUJO7NNVA37 X-Message-ID-Hash: GDW5R6TGUEPAPNOC3DXWIBUJO7NNVA37 X-MailFrom: crossd@gmail.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: Grant Taylor , COFF X-Mailman-Version: 3.3.6b1 Precedence: list Subject: [COFF] Re: Requesting thoughts on extended regular expressions in grep. List-Id: Computer Old Farts Forum Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: On Mon, Mar 6, 2023 at 11:01=E2=80=AFPM Ed Bradford wro= te: >[snip] > I think it is possible to make a 50K RE that is understandable. However, = it requires > a lot of 'splainin' throughout the code. I'm naive though; I will eventua= lly discover > a lack of truth in that belief, if such exists. Actually, I believe you. I'm sure that with enough effort, it _is_ possible to make a 50K RE that is understandable to mere mortals. But it begs the question: why bother? The answer to that question, in my mind, shows the difference between a clever programmer and a pragmatic engineer. I submit that it's time to reach for another tool well before you get to an RE that big, and if one is still considering such a thing, one must really ask what properties of REs and the problem at hand one thinks lend itself to that as the solution. >[snip] > It sounds to me like an "optimizer" is needed. There is alreay a compiler > that uses FA's. I'm not sure what you're referring to here, though you were replying to me. There are a couple of different threads floating around: 1. Writing really big regular expressions: this is probably a bad idea. Don't do it (see below). 2. Writing a recognizer for dates. Yeah, the small REs you have for that are fine. If you want to extend those to arbitrary date formats, I think you'll find it starts getting ugly. 3. Optimizing regular expressions. You're still bound by the known theoretical properties of finite automata here. > Is someone else going to create a program > to look for dates without using regular expressions? Many people have already done so. :-) > Today, I write small-sized RE's. If I write a giant RE, there is nothing = preventing > the owner of RE world to change how they are used. For instance. Compile = your RE > and a subroutine/function is produced that performs the RE search. I'm not sure I understand what you mean. The theory here is well-understood: we know recognizers for regular languages can be built from DFAs, that run in time linear in the size of their inputs, but we also know that constructing such a DFA can be exponential in space and time, and thus impractical for many REs. We know that NDFA simulators can be built in time and space linear in the length of the RE, but that the resulting recognizers will be superlinear at runtime, proportional to the product of the length of input, number of states, and number edges between states in the state transition graph. For a very large regular expression, that's going to be a pretty big number, and even on modern CPUs won't be particularly fast. Compilation to native code won't really help you. There is no "owner of RE world" that can change that. If you can find some way to do so, I think that would qualify as a major breakthrough in computer science. > RE is a language, not necessarily an implementation. > At least that is my understanding. Regular expressions describe regular languages, but as I mentioned above, the theory gives the currently understood bounds for their performance characteristics. It's kinda like the speed of light in this regard; we can't really make it go faster. - Dan C.