From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-0.6 required=5.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,MAILING_LIST_MULTI autolearn=ham autolearn_force=no version=3.4.4 Received: from minnie.tuhs.org (minnie.tuhs.org [IPv6:2600:3c01:e000:146::1]) by inbox.vuxu.org (Postfix) with ESMTP id 55E27266B1 for ; Thu, 23 May 2024 15:49:55 +0200 (CEST) Received: from minnie.tuhs.org (localhost [IPv6:::1]) by minnie.tuhs.org (Postfix) with ESMTP id 1364443B70; Thu, 23 May 2024 23:49:46 +1000 (AEST) Received: from mail-ua1-x932.google.com (mail-ua1-x932.google.com [IPv6:2607:f8b0:4864:20::932]) by minnie.tuhs.org (Postfix) with ESMTPS id 065A943B37 for ; Thu, 23 May 2024 23:49:36 +1000 (AEST) Received: by mail-ua1-x932.google.com with SMTP id a1e0cc1a2514c-8031c5bfe3bso806153241.0 for ; Thu, 23 May 2024 06:49:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dartmouth.edu; s=google1; t=1716472175; x=1717076975; darn=tuhs.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=YUeiDDCtjk4Iqs673s292uaV0SGDe0ar8j406ctd/d4=; b=AqzIWZw+LyQYQROQ+r1ldzCsquMdiGu26odp3fxRe3ltjlNhdoc6UnGTLnO9SAw3CA OlYLGD76ZPWSONlMcz9J+5kqWN/OFdYMjvWr+9DjMJyJNhczAqem4xnnogz4J5MhoWPk q2L5f0qTHI9l0qEOR7QsIJvx30LFpgypFfkm6+Eq07eZ09IGqpBCbue51j82n3r/9go4 kJoov2agxhfnTalFe/bXFd7t+kPG4SXBF/OmA0n5fVy84g6czANnrK4Ok4ZG9LqW4euM jFYr+rfxTrKGG/U+Tmwlfxz7hBCRDC+xe1nfIRuEgFU/o5CyeYxTaYIM+H6vfewpnNuO Cdxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716472175; x=1717076975; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=YUeiDDCtjk4Iqs673s292uaV0SGDe0ar8j406ctd/d4=; b=ZgyYPSQxDEg1Zddwx2mEJAp88fkdkcL+TS0+Fc9QWKVTj9un8p2nkmc8OOSj75upfZ 8Zy+03Ob3n6a9ODZOhbNCb6485UtIo+9DvEgLZ3Xk/ilu18rc+TaSE4/LNiLg+tpJxXf NFxQG/GGibM9uc8XsSeUcIuvZAd3yrZUYxgR2ZueT/bRUxSACloiKjVdZf7uhXqMsCGG BMN0+6hCgVtT/h4YAC+zPV5lzyi/jf7BKD1wpDiyeK1ip+RJZNSo+rDrQ2n4CMIEiZun n47czx/xLa1alD8xcc1fB9gUX6u3h861EQaGzuKpM+GzAmYU34/MUCz/1eNk0zkukUqL Ekiw== X-Gm-Message-State: AOJu0YxOxcuQ2MZ7L074VXW309N7qAbF519XLN5vFrAOWJxUBTYzvMxL sqhOOPbTAnT/MzlPoJYVLk7hX8eQGSsks4XJ4pYK21QuYBt4AVfuumFwAUbkE8M7VVtn+sqBoVX AWamhFYHTL9GmIfZxu9vYyCnB0XpdlDbpXw2j8jeDiwm7D39IhQw= X-Google-Smtp-Source: AGHT+IH52iO8d9Q94rATreqJ12gpKQaJmNUAzKk2XIx++F0oMGKYgvT0oawkjaHq4NLgCRLqBKdszZczauCguuOgUCs= X-Received: by 2002:a05:6102:390d:b0:480:70d2:34c9 with SMTP id ada2fe7eead31-48a2bc73378mr2143380137.15.1716472174769; Thu, 23 May 2024 06:49:34 -0700 (PDT) MIME-Version: 1.0 From: Douglas McIlroy Date: Thu, 23 May 2024 09:49:18 -0400 Message-ID: To: TUHS main list Content-Type: multipart/alternative; boundary="0000000000003e934106191f523f" Message-ID-Hash: 7KQJPRT44BQPD3KQZFWTGUAZBYG3GDDH X-Message-ID-Hash: 7KQJPRT44BQPD3KQZFWTGUAZBYG3GDDH X-MailFrom: douglas.mcilroy@dartmouth.edu X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header X-Mailman-Version: 3.3.6b1 Precedence: list Subject: [TUHS] Re: A fuzzy awk List-Id: The Unix Heritage Society mailing list Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --0000000000003e934106191f523f Content-Type: text/plain; charset="UTF-8" > Doug McIlroy was generating random regular expressions Actually not. I exhaustively (within limits) tested an RE recognizer without knowingly generating any RE either mechanically or by hand. The trick: From recursive equations (easily derived from the grammar of REs), I counted how many REs exist up to various limits on token counts, Then I generated all strings that satisfied those limits, turned the recognizer loose on them and counted how many it accepted. Any disagreement of counts revealed the existence (but not any symptom) of bugs. Unlike most diagnostic techniques, this scheme produces a certificate of (very high odds on) correctness over a representative subdomain. The scheme also agnostically checks behavior on bad inputs as well as good. It does not, however, provide a stress test of a recognizer's capacity limits. And its exponential nature limits its applicability to rather small domains. (REs have only 5 distinct kinds of token.) Doug --0000000000003e934106191f523f Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
> Doug McIlroy was generating random regular expres= sions

Actually no= t. I exhaustively (within limits) tested an RE recognizer without knowingly= generating any RE either mechanically or by hand.

The trick: From recursive equations (easily derived from the grammar = of REs), I counted how many REs exist up to various limits on token counts,= Then I generated all strings that satisfied those limits, turned the recog= nizer loose on them and counted how many it accepted. Any disagreement of c= ounts revealed the existence (but not any symptom) of bugs.=C2=A0

Unlike most diagnostic techniques, this scheme produ= ces a certificate of (very high odds on) correctness over a representative = subdomain.=C2=A0The sch= eme also agnostically checks behavior on bad inputs as well as good.=C2=A0= =C2=A0It does not, howe= ver, provide a stress test of a recognizer's capacity limits.=C2=A0And its exponential nature = limits its applicability to rather small domains. (REs have only 5 distinct= kinds of token.)

Dou= g
--0000000000003e934106191f523f--