From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-0.5 required=5.0 tests=DKIM_ADSP_CUSTOM_MED, DKIM_INVALID,DKIM_SIGNED,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,MAILING_LIST_MULTI autolearn=ham autolearn_force=no version=3.4.4 Received: from minnie.tuhs.org (minnie.tuhs.org [IPv6:2600:3c01:e000:146::1]) by inbox.vuxu.org (Postfix) with ESMTP id 522702862B for ; Thu, 23 May 2024 22:52:57 +0200 (CEST) Received: from minnie.tuhs.org (localhost [IPv6:::1]) by minnie.tuhs.org (Postfix) with ESMTP id 5ADC743A91; Fri, 24 May 2024 06:52:54 +1000 (AEST) Received: from mail-pj1-x1031.google.com (mail-pj1-x1031.google.com [IPv6:2607:f8b0:4864:20::1031]) by minnie.tuhs.org (Postfix) with ESMTPS id E2762432DC for ; Fri, 24 May 2024 06:52:47 +1000 (AEST) Received: by mail-pj1-x1031.google.com with SMTP id 98e67ed59e1d1-2bdf11888a5so723434a91.0 for ; Thu, 23 May 2024 13:52:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1716497567; x=1717102367; darn=tuhs.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=Oa5tXjgTXQUkznQoY4CBvD+tsZoxMwBso3wtDc2f1JM=; b=GViByd0udG0/YJsvl7SdtR/dc3BFVaWlQz1FmD89Lr8M/nE2jGF/NaBWfBMSwZlW4N FPPe/nGXoo178WHtsU+yX9dJcBgYQYnAYvQS+uqCubfGiXmDuUZ+3Jmu/hT5DKVDNqGH rQ1hvwBUPNP0SxLvCNb025IVTOyWGMiWdKy76Om54no3mQCzLgckVPUgaZgz2xRW/J9J fWsSHWU1U7Vwx4MRvHOgunBEFYeQI8AxPEvQ2toHrRgTNf0sBXMJ145mkIi/nQKRrcPC nWSlKK89c+TZNDacQwiSMP3jj7DVctGHGKm8KrSiAoUSJhMxGo78t0LmlUbkd/iiRk+q hprQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716497567; x=1717102367; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Oa5tXjgTXQUkznQoY4CBvD+tsZoxMwBso3wtDc2f1JM=; b=u/7Gtjw/Nn3/AQGexGcFQhn9HS0pLxQUZFHwhAcVyCR+P9TgyNEw/VuPcnyl+tUIY/ 5wMXTINgnx0Oz6dPKXrsgRMHxAiV5MjQaxNiyjOl6tllR5k1KmaYo5jbIK0rs5hhUihM pa5aKSIt7dOOS4zDXsbkkFAvXhYDRQC3XNNKOiv2HKSA0TLxMlQAVM9mqR3tMQ4fNTtm ysWLZZVmgo+xYWIFuIn+A6p58nUFjFWi19QdR5buMnxPJy0vGksipnoBSpWcZHCNw1a3 jTBC5YySM/NGFw4bzY+JXKxO+OkBhgk+pf5IbsjpX/5HqBTZV0T7Wh1Ln85LsOIHhL7B cdfA== X-Gm-Message-State: AOJu0YxlC5CR+YnMeUue35vhDwpsiYJYQnNdcQiFASN9fqBBdBmtb+Ox 8S5wKdd+goJT6pZSFb5+C+P5rMsTqy6Cpb1r2aIa1t5nFyFWyIqeaWwAS7ZZlAMNkmaHQidYPJT dk+3z+6PzfGCZ+NMCDkXMmFDSjNYbrw== X-Google-Smtp-Source: AGHT+IFTn9VaBqhZ5A3y6K6xMQ+qSaTxDY48QnUx7FDLpTxOlsbbWiB0JG+s3wnCrcMAz72X9kxEJw0iOssxef6vDXU= X-Received: by 2002:a17:90a:ead8:b0:2b2:ce88:c68c with SMTP id 98e67ed59e1d1-2bf5e170f1bmr299035a91.19.1716497566909; Thu, 23 May 2024 13:52:46 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Rob Pike Date: Fri, 24 May 2024 06:52:35 +1000 Message-ID: To: Douglas McIlroy Content-Type: multipart/alternative; boundary="000000000000bbcc6b0619253bd1" Message-ID-Hash: BAA6OIKPXOKJ3BBZL7W3QGVCFL6XDUSQ X-Message-ID-Hash: BAA6OIKPXOKJ3BBZL7W3QGVCFL6XDUSQ X-MailFrom: robpike@gmail.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header CC: TUHS main list X-Mailman-Version: 3.3.6b1 Precedence: list Subject: [TUHS] Re: A fuzzy awk List-Id: The Unix Heritage Society mailing list Archived-At: List-Archive: List-Help: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: --000000000000bbcc6b0619253bd1 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable The semantic distinction is important but the end result is very similar. "Fuzzing" as it is now called (for no reason I can intuit) tries to get to the troublesome cases faster by a sort of depth-first search, but exhaustive will always beat it for value. Our exhaustive tester for bitblt, first done by John Reiser if I remember right, set the stage for my own thinking about how you properly test something. -rob On Thu, May 23, 2024 at 11:49=E2=80=AFPM Douglas McIlroy < douglas.mcilroy@dartmouth.edu> wrote: > > Doug McIlroy was generating random regular expressions > > Actually not. I exhaustively (within limits) tested an RE recognizer > without knowingly generating any RE either mechanically or by hand. > > The trick: From recursive equations (easily derived from the grammar of > REs), I counted how many REs exist up to various limits on token counts, > Then I generated all strings that satisfied those limits, turned the > recognizer loose on them and counted how many it accepted. Any disagreeme= nt > of counts revealed the existence (but not any symptom) of bugs. > > Unlike most diagnostic techniques, this scheme produces a certificate of > (very high odds on) correctness over a representative subdomain. The > scheme also agnostically checks behavior on bad inputs as well as good. = It > does not, however, provide a stress test of a recognizer's capacity limit= s. And > its exponential nature limits its applicability to rather small domains. > (REs have only 5 distinct kinds of token.) > > Doug > --000000000000bbcc6b0619253bd1 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
The semantic distinction is important but the end result is very = similar. "Fuzzing" as it is now called (for no reason I can intui= t) tries to get to the troublesome cases faster by a sort of depth-first se= arch, but exhaustive will always beat it for value. Our exhaustive tester f= or bitblt, first done by John Reiser if I remember right, set the stage for= my own thinking about how you properly test something.

-rob


On T= hu, May 23, 2024 at 11:49=E2=80=AFPM Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:=
> Doug McIlroy was generating random regular expressions

Actually not. I exhaustive= ly (within limits) tested an RE recognizer without knowingly generating any= RE either mechanically or by hand.

The trick= : From recursive equations (easily derived from the grammar of REs), I coun= ted how many REs exist up to various limits on token counts, Then I generat= ed all strings that satisfied those limits, turned the recognizer loose on = them and counted how many it accepted. Any disagreement of counts revealed = the existence (but not any symptom) of bugs.=C2=A0

Unlike most diagnostic techniques, this scheme produces a certificate= of (very high odds on) correctness over a representative subdomain.=C2=A0<= /font>The scheme also agnostic= ally checks behavior on bad inputs as well as good.=C2=A0=C2=A0It does not, however, provide a str= ess test of a recognizer's capacity limits.=C2=A0And its exponential nature limits its applica= bility to rather small domains. (REs have only 5 distinct kinds of token.)<= /span>

Doug
--000000000000bbcc6b0619253bd1--