From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tuhs-bounces@tuhs.org>
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.5 required=5.0 tests=DKIM_ADSP_CUSTOM_MED,
	DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,MAILING_LIST_MULTI
	autolearn=ham autolearn_force=no version=3.4.4
Received: from minnie.tuhs.org (minnie.tuhs.org [IPv6:2600:3c01:e000:146::1])
	by inbox.vuxu.org (Postfix) with ESMTP id 522702862B
	for <ml@inbox.vuxu.org>; Thu, 23 May 2024 22:52:57 +0200 (CEST)
Received: from minnie.tuhs.org (localhost [IPv6:::1])
	by minnie.tuhs.org (Postfix) with ESMTP id 5ADC743A91;
	Fri, 24 May 2024 06:52:54 +1000 (AEST)
Received: from mail-pj1-x1031.google.com (mail-pj1-x1031.google.com [IPv6:2607:f8b0:4864:20::1031])
	by minnie.tuhs.org (Postfix) with ESMTPS id E2762432DC
	for <tuhs@tuhs.org>; Fri, 24 May 2024 06:52:47 +1000 (AEST)
Received: by mail-pj1-x1031.google.com with SMTP id 98e67ed59e1d1-2bdf11888a5so723434a91.0
        for <tuhs@tuhs.org>; Thu, 23 May 2024 13:52:47 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20230601; t=1716497567; x=1717102367; darn=tuhs.org;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:from:to:cc:subject:date:message-id:reply-to;
        bh=Oa5tXjgTXQUkznQoY4CBvD+tsZoxMwBso3wtDc2f1JM=;
        b=GViByd0udG0/YJsvl7SdtR/dc3BFVaWlQz1FmD89Lr8M/nE2jGF/NaBWfBMSwZlW4N
         FPPe/nGXoo178WHtsU+yX9dJcBgYQYnAYvQS+uqCubfGiXmDuUZ+3Jmu/hT5DKVDNqGH
         rQ1hvwBUPNP0SxLvCNb025IVTOyWGMiWdKy76Om54no3mQCzLgckVPUgaZgz2xRW/J9J
         fWsSHWU1U7Vwx4MRvHOgunBEFYeQI8AxPEvQ2toHrRgTNf0sBXMJ145mkIi/nQKRrcPC
         nWSlKK89c+TZNDacQwiSMP3jj7DVctGHGKm8KrSiAoUSJhMxGo78t0LmlUbkd/iiRk+q
         hprQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1716497567; x=1717102367;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
         :reply-to;
        bh=Oa5tXjgTXQUkznQoY4CBvD+tsZoxMwBso3wtDc2f1JM=;
        b=u/7Gtjw/Nn3/AQGexGcFQhn9HS0pLxQUZFHwhAcVyCR+P9TgyNEw/VuPcnyl+tUIY/
         5wMXTINgnx0Oz6dPKXrsgRMHxAiV5MjQaxNiyjOl6tllR5k1KmaYo5jbIK0rs5hhUihM
         pa5aKSIt7dOOS4zDXsbkkFAvXhYDRQC3XNNKOiv2HKSA0TLxMlQAVM9mqR3tMQ4fNTtm
         ysWLZZVmgo+xYWIFuIn+A6p58nUFjFWi19QdR5buMnxPJy0vGksipnoBSpWcZHCNw1a3
         jTBC5YySM/NGFw4bzY+JXKxO+OkBhgk+pf5IbsjpX/5HqBTZV0T7Wh1Ln85LsOIHhL7B
         cdfA==
X-Gm-Message-State: AOJu0YxlC5CR+YnMeUue35vhDwpsiYJYQnNdcQiFASN9fqBBdBmtb+Ox
	8S5wKdd+goJT6pZSFb5+C+P5rMsTqy6Cpb1r2aIa1t5nFyFWyIqeaWwAS7ZZlAMNkmaHQidYPJT
	dk+3z+6PzfGCZ+NMCDkXMmFDSjNYbrw==
X-Google-Smtp-Source: AGHT+IFTn9VaBqhZ5A3y6K6xMQ+qSaTxDY48QnUx7FDLpTxOlsbbWiB0JG+s3wnCrcMAz72X9kxEJw0iOssxef6vDXU=
X-Received: by 2002:a17:90a:ead8:b0:2b2:ce88:c68c with SMTP id
 98e67ed59e1d1-2bf5e170f1bmr299035a91.19.1716497566909; Thu, 23 May 2024
 13:52:46 -0700 (PDT)
MIME-Version: 1.0
References: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
In-Reply-To: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
From: Rob Pike <robpike@gmail.com>
Date: Fri, 24 May 2024 06:52:35 +1000
Message-ID: <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>
To: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Content-Type: multipart/alternative; boundary="000000000000bbcc6b0619253bd1"
Message-ID-Hash: BAA6OIKPXOKJ3BBZL7W3QGVCFL6XDUSQ
X-Message-ID-Hash: BAA6OIKPXOKJ3BBZL7W3QGVCFL6XDUSQ
X-MailFrom: robpike@gmail.com
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
CC: TUHS main list <tuhs@tuhs.org>
X-Mailman-Version: 3.3.6b1
Precedence: list
Subject: [TUHS] Re: A fuzzy awk
List-Id: The Unix Heritage Society mailing list <tuhs.tuhs.org>
Archived-At: <https://www.tuhs.org/mailman3/hyperkitty/list/tuhs@tuhs.org/message/BAA6OIKPXOKJ3BBZL7W3QGVCFL6XDUSQ/>
List-Archive: <https://www.tuhs.org/mailman3/hyperkitty/list/tuhs@tuhs.org/>
List-Help: <mailto:tuhs-request@tuhs.org?subject=help>
List-Owner: <mailto:tuhs-owner@tuhs.org>
List-Post: <mailto:tuhs@tuhs.org>
List-Subscribe: <mailto:tuhs-join@tuhs.org>
List-Unsubscribe: <mailto:tuhs-leave@tuhs.org>

--000000000000bbcc6b0619253bd1
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

The semantic distinction is important but the end result is very similar.
"Fuzzing" as it is now called (for no reason I can intuit) tries to get to
the troublesome cases faster by a sort of depth-first search, but
exhaustive will always beat it for value. Our exhaustive tester for bitblt,
first done by John Reiser if I remember right, set the stage for my own
thinking about how you properly test something.

-rob


On Thu, May 23, 2024 at 11:49=E2=80=AFPM Douglas McIlroy <
douglas.mcilroy@dartmouth.edu> wrote:

> > Doug McIlroy was generating random regular expressions
>
> Actually not. I exhaustively (within limits) tested an RE recognizer
> without knowingly generating any RE either mechanically or by hand.
>
> The trick: From recursive equations (easily derived from the grammar of
> REs), I counted how many REs exist up to various limits on token counts,
> Then I generated all strings that satisfied those limits, turned the
> recognizer loose on them and counted how many it accepted. Any disagreeme=
nt
> of counts revealed the existence (but not any symptom) of bugs.
>
> Unlike most diagnostic techniques, this scheme produces a certificate of
> (very high odds on) correctness over a representative subdomain. The
> scheme also agnostically checks behavior on bad inputs as well as good.  =
It
> does not, however, provide a stress test of a recognizer's capacity limit=
s. And
> its exponential nature limits its applicability to rather small domains.
> (REs have only 5 distinct kinds of token.)
>
> Doug
>

--000000000000bbcc6b0619253bd1
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div class=3D"gmail_default" style=3D"font-family:arial,sa=
ns-serif">The semantic distinction is important but the end result is very =
similar. &quot;Fuzzing&quot; as it is now called (for no reason I can intui=
t) tries to get to the troublesome cases faster by a sort of depth-first se=
arch, but exhaustive will always beat it for value. Our exhaustive tester f=
or bitblt, first done by John Reiser if I remember right, set the stage for=
 my own thinking about how you properly test something.</div><div class=3D"=
gmail_default" style=3D"font-family:arial,sans-serif"><br></div><div class=
=3D"gmail_default" style=3D"font-family:arial,sans-serif">-rob</div><div cl=
ass=3D"gmail_default" style=3D"font-family:arial,sans-serif"><br></div></di=
v><br><div class=3D"gmail_quote"><div dir=3D"ltr" class=3D"gmail_attr">On T=
hu, May 23, 2024 at 11:49=E2=80=AFPM Douglas McIlroy &lt;<a href=3D"mailto:=
douglas.mcilroy@dartmouth.edu">douglas.mcilroy@dartmouth.edu</a>&gt; wrote:=
<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8=
ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir=3D"ltr=
"><div dir=3D"ltr"><div dir=3D"ltr"><span style=3D"font-family:arial,sans-s=
erif">&gt; Doug McIlroy was generating random regular expressions</span><br=
></div><div dir=3D"ltr"><span style=3D"font-family:arial,sans-serif"><br></=
span></div><div><font face=3D"arial, sans-serif">Actually not. I exhaustive=
ly (within limits) tested an RE recognizer without knowingly generating any=
 RE either mechanically or by hand.</font></div><div><font face=3D"arial, s=
ans-serif"><br></font></div><div><font face=3D"arial, sans-serif">The trick=
: From recursive equations (easily derived from the grammar of REs), I coun=
ted how many REs exist up to various limits on token counts, Then I generat=
ed all strings that satisfied those limits, turned the recognizer loose on =
them and counted how many it accepted. Any disagreement of counts revealed =
the existence (but not any symptom) of bugs.=C2=A0</font></div><div><font f=
ace=3D"arial, sans-serif"><br></font></div><div><font face=3D"arial, sans-s=
erif">Unlike most diagnostic techniques, this scheme produces a certificate=
 of (very high odds on) correctness over a representative subdomain.=C2=A0<=
/font><span style=3D"font-family:arial,sans-serif">The scheme also agnostic=
ally checks behavior on bad inputs as well as good.=C2=A0=C2=A0</span><span=
 style=3D"font-family:arial,sans-serif">It does not, however, provide a str=
ess test of a recognizer&#39;s capacity limits.=C2=A0</span><span style=3D"=
font-family:arial,sans-serif">And its exponential nature limits its applica=
bility to rather small domains. (REs have only 5 distinct kinds of token.)<=
/span></div><div><span style=3D"font-family:arial,sans-serif"><br></span></=
div><div><span style=3D"font-family:arial,sans-serif">Doug</span></div></di=
v></div>
</blockquote></div>

--000000000000bbcc6b0619253bd1--