The Unix Heritage Society mailing list
* [TUHS] Doug McIlroy's C++ regular expression library (mostly) revived
@ 2018-07-22 14:16 Arnold Robbins
  0 siblings, 0 replies; 4+ messages in thread
From: Arnold Robbins @ 2018-07-22 14:16 UTC
  To: tuhs

Hi All.

I have (mostly) revived Doug McIlroy's C++ regular expression parsing
library.  I gratefully acknowledge and thank him for allowing me to
publish the code and for his help in finding all the bits and pieces.

It's available at https://github.com/arnoldrobbins/mcilroy-regex .
The main things I've done are to gather all the bits and pieces, rename files
to have a .cpp extension, and get everything to compile using current g++
and standard make.

I'm at the point where I could use some help. The various tests
do not all run successfully. 

1. make retest - a number of tests fail
2. ./testgrep.sh - a number of tests fail
3. ./testsed.sh - tests fail with core dumps

Looking briefly, some of the code in sed plays C games, casting various
things around to pointers of different types and dereferencing them;
these things tend to cause trouble in C++.
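
A minimal sketch of the kind of fix this usually calls for, assuming the
problem is a value read or written through a pointer cast to an unrelated
type. This is not the actual sed code; load_int and store_int are
hypothetical names used only for illustration. Copying the bytes with
std::memcpy avoids the alignment and aliasing problems of the
cast-and-dereference style.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helpers, not taken from the library.
// The C-style idiom would be:  *(int *)(buf + off)
// which dereferences a pointer to a type the bytes were never declared as.

// Read an int stored at byte offset `off` by copying its bytes.
int load_int(const unsigned char* buf, std::size_t off) {
    int value;
    std::memcpy(&value, buf + off, sizeof value);
    return value;
}

// Write an int at byte offset `off` by copying its bytes.
void store_int(unsigned char* buf, std::size_t off, int value) {
    std::memcpy(buf + off, &value, sizeof value);
}
```

Because only bytes are copied, the offset does not have to be suitably
aligned for int, and compilers typically turn the memcpy into a single
load or store.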

I'm hopeful that more eyes on this code will help it come back to life
more quickly.  Any and all help will be appreciated.

Thanks,

Arnold

P.S. Let's not start a flame war about C vs. C++ etc. etc.  If you can
help, please just dive in. Otherwise, just go, "wow, neat work" and
move on to something else. :-) Thanks.

* Re: [TUHS] Doug McIlroy's C++ regular expression library (mostly) revived
@ 2018-07-28 22:31 Doug McIlroy
  2018-07-29  6:02 ` arnold
  0 siblings, 1 reply; 4+ messages in thread
From: Doug McIlroy @ 2018-07-28 22:31 UTC
  To: tuhs


Why would anyone be interested in an old regex package that never was
a part of any Unix distro?

The driving force was Posix, whose regex spec was quite inscrutable. Could
there be a reference implementation? It was easy to fool every
implementation I could get my hands on, including Gnu's over-the-top
9000-line implementation.

But as I got into it, I got fascinated by regexes per se. In making a
recognizer, there's a tradeoff between construction time and execution
time. Linear execution can be achieved, but at a potentially exponential
cost in construction time (and space). Backreferencing takes the regex
languages out of the class of regular languages.
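
The exponential construction cost can be seen in a textbook example that
is not taken from the package: the regex (a|b)*a(a|b)^(n-1), meaning "the
n-th symbol from the end is an a". Its NFA has n+1 states, but the subset
construction reaches 2^n DFA states. The sketch below counts them; the
function name and the bitmask encoding are assumptions made for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <queue>
#include <unordered_set>

// Count the DFA states reached by subset construction for the NFA of
// (a|b)*a(a|b)^(n-1).  NFA states are 0..n: state 0 loops on a and b,
// moves to state 1 on a; states 1..n-1 advance on either symbol;
// state n accepts.  Bit i of a mask means "NFA state i is active".
std::size_t dfa_states_for_nth_from_end(unsigned n) {
    assert(n >= 1 && n < 63);
    const std::uint64_t all = (std::uint64_t{1} << (n + 1)) - 1;  // bits 0..n
    std::unordered_set<std::uint64_t> seen;
    std::queue<std::uint64_t> work;
    const std::uint64_t start = 1;  // only NFA state 0 is active
    seen.insert(start);
    work.push(start);
    while (!work.empty()) {
        const std::uint64_t s = work.front();
        work.pop();
        // States 1..n-1 step forward on any symbol; state n has no successor.
        const std::uint64_t shifted = ((s & ~std::uint64_t{1}) << 1) & all;
        const std::uint64_t on_b = shifted | 1;      // state 0 stays active
        const std::uint64_t on_a = shifted | 1 | 2;  // state 0 also enters state 1
        for (std::uint64_t next : {on_a, on_b}) {
            if (seen.insert(next).second) work.push(next);
        }
    }
    return seen.size();
}
```

Each added unit of n doubles the count, while matching with the finished
DFA remains one table step per input character.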

Recalling that regular languages are closed under intersection and
negation, I wondered about how to implement new regex operators, &
and -. I came up with a scheme for this optional non-Posix feature that
involved layering continuation-passing over more traditional methods. And
while I was at it, I broke out smaller sublanguages for special treatment
(as does Gnu), all the way down to Knuth-Morris-Pratt for expressions
in which the only operation is catenation.
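
For the catenation-only case, a minimal sketch of Knuth-Morris-Pratt in
its textbook form follows. It is not the package's code; kmp_failure and
kmp_find are hypothetical names. The failure table is built in time linear
in the pattern, and the search never backs up in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// fail[i] is the length of the longest proper prefix of pat[0..i]
// that is also a suffix of pat[0..i].
std::vector<std::size_t> kmp_failure(const std::string& pat) {
    std::vector<std::size_t> fail(pat.size(), 0);
    std::size_t k = 0;  // length of the currently matched prefix
    for (std::size_t i = 1; i < pat.size(); ++i) {
        while (k > 0 && pat[i] != pat[k]) k = fail[k - 1];
        if (pat[i] == pat[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Return the index of the first occurrence of pat in text,
// or std::string::npos if there is none.
std::size_t kmp_find(const std::string& text, const std::string& pat) {
    if (pat.empty()) return 0;
    const std::vector<std::size_t> fail = kmp_failure(pat);
    std::size_t k = 0;  // number of pattern characters matched so far
    for (std::size_t i = 0; i < text.size(); ++i) {
        while (k > 0 && text[i] != pat[k]) k = fail[k - 1];
        if (text[i] == pat[k]) ++k;
        if (k == pat.size()) return i + 1 - k;
    }
    return std::string::npos;
}
```

For example, kmp_find("abababca", "abca") locates the match that starts at
index 4 without re-reading any earlier character of the text.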

And finally, having followed the development of C++ from its infancy,
I wanted to try out its new template facility, so there's a bit of
that in the package, too. Arnold has discovered that not only has C++
evolved, but also that without the discipline of -Wall to force clean
code, I was rather cavalier about casting, both explicitly and implicitly.

The only real customer the code ever had was the AST project, which
translated it to C. After the C++ had sat idle for a half-dozen years, I
thought to revive it in Linux, but found it riddled with incompatibilities
with that new environment and gave up. Arnold deserves a citation for
bravery in pushing that through 15 years further on.

Doug



Thread overview: 4+ messages
2018-07-22 14:16 [TUHS] Doug McIlroy's C++ regular expression library (mostly) revived Arnold Robbins
2018-07-28 22:31 Doug McIlroy
2018-07-29  6:02 ` arnold
2018-08-01  4:15   ` Larry McVoy
