The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
From: Ralph Corderoy <ralph@inputplus.co.uk>
To: TUHS <tuhs@tuhs.org>
Subject: [TUHS] Testing an RE recogniser exhaustively.  (Was: A fuzzy awk)
Date: Mon, 27 May 2024 10:39:09 +0100	[thread overview]
Message-ID: <20240527093909.91CAD21F18@orac.inputplus.co.uk> (raw)
In-Reply-To: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>

Hi,

Doug wrote:
> The trick: From recursive equations (easily derived from the grammar
> of REs), I counted how many REs exist up to various limits on token
> counts,  Then I generated all strings that satisfied those limits,
> turned the recognizer loose on them and counted how many it accepted.

Which reminded me of Doug's paper.

    Enumerating the strings of regular languages,
    J. Functional Programming 14 (2004) 503-518

    Haskell code is developed for two ways to list the strings of the
    language defined by a regular expression: directly by set operations
    and indirectly by converting to and simulating an equivalent
    automaton.  The exercise illustrates techniques for dealing with
    infinite ordered domains and leads to an effective standard form for
    nondeterministic finite automata.

    PDF preprint: https://www.cs.dartmouth.edu/~doug/nfa.pdf

It's also nice for the NFA construction with one state per symbol plus
one final state, and no epsilon transitions.  Doug writes:

    The even-a language (ab*a|b)* is defined by automaton h, with three
    start states.

	h0 = State 0 ’~’ []
	h1 = State 1 ’b’ [h4,h1,h0]
	h2 = State 2 ’a’ [h4,h1,h0]
	h3 = State 3 ’b’ [h2,h3]
	h4 = State 4 ’a’ [h2,h3]
	h = [h4,h1,h0]

The symbols replaced by their state numbers gives (43*2|1)*; state 0 is
the sole final state.

-- 
Cheers, Ralph.

  parent reply	other threads:[~2024-05-27  9:39 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-23 13:49 [TUHS] Re: A fuzzy awk Douglas McIlroy
2024-05-23 20:52 ` Rob Pike
2024-05-24  5:41   ` andrew
2024-05-24  7:17   ` Ralph Corderoy
2024-05-24  7:41     ` Rob Pike
2024-05-24 10:00       ` [TUHS] Is fuzz testing random? (Was: A fuzzy awk) Ralph Corderoy
2024-05-24 11:56     ` [TUHS] Re: A fuzzy awk Dan Halbert
2024-05-25  0:17   ` Bakul Shah via TUHS
2024-05-25  0:57     ` G. Branden Robinson
2024-05-25 13:56     ` David Arnold
2024-05-25 17:18     ` Paul Winalski
2024-05-25 17:36       ` Tom Perrine
2024-05-25 17:53         ` [TUHS] Prof Don Good [was " Charles H Sauer (he/him)
2024-05-27  9:39 ` Ralph Corderoy [this message]
2024-05-27 13:03   ` [TUHS] Re: Testing an RE recogniser exhaustively. (Was: A fuzzy awk) Hellwig Geisse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240527093909.91CAD21F18@orac.inputplus.co.uk \
    --to=ralph@inputplus.co.uk \
    --cc=tuhs@tuhs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).