The Unix Heritage Society mailing list
* [TUHS] Doug McIlroy's C++ regular expression library (mostly) revived
From: Arnold Robbins @ 2018-07-22 14:16 UTC (permalink / raw)
  To: tuhs

Hi All.

I have (mostly) revived Doug McIlroy's C++ regular expression parsing
library.  I gratefully acknowledge and thank him for allowing me to
publish the code and for his help in finding all the bits and pieces.

It's available at https://github.com/arnoldrobbins/mcilroy-regex .
The main things I've done are to gather all the bits and pieces, rename files
to have a .cpp extension, and get everything to compile using current g++
and standard make.

I'm at the point where I could use some help. The various tests
do not all run successfully. 

1. make retest - a number of tests fail
2. ./testgrep.sh - a number of tests fail
3. ./testsed.sh - tests fail with core dumps

Looking briefly, some of the code in sed plays C games, casting various
things around to pointers of different types and dereferencing them;
these things tend to cause trouble in C++.
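
For readers unfamiliar with the problem, here is a minimal sketch of the
general pattern. It is not taken from the sed sources, and the helper names
load_u32 and store_u32 are hypothetical. Reading a byte buffer through a
pointer cast to a wider type, e.g. *(std::uint32_t *)p, can violate C++
aliasing and alignment rules; copying the bytes with std::memcpy is the
well-defined alternative.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only (not from the sed code).
// Instead of dereferencing a cast pointer, copy the object representation.

// Read a 32-bit value from an arbitrary (possibly unaligned) byte address.
std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);   // well-defined for any alignment
    return v;
}

// Write a 32-bit value to an arbitrary (possibly unaligned) byte address.
void store_u32(unsigned char* p, std::uint32_t v) {
    std::memcpy(p, &v, sizeof v);
}
```

Current compilers typically turn such a fixed-size memcpy into a single
load or store, so the safe form usually costs nothing.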

I'm hopeful that more eyes on this code will help it come back to life
more quickly.  Any and all help will be appreciated.

Thanks,

Arnold

P.S. Let's not start a flame war about C vs. C++ etc. etc.  If you can
help, please just dive in. Otherwise, just go, "wow, neat work" and
move on to something else. :-) Thanks.

* Re: [TUHS] Doug McIlroy's C++ regular expression library (mostly) revived
From: Larry McVoy @ 2018-08-01  4:15 UTC (permalink / raw)
  To: arnold; +Cc: tuhs, rsc, doug

The old school Unix guys are just like this.  I asked BWK about awk and
he tarred up ~bwk/awk and sent me the source to awk, to the awk book,
it was crazy cool.

And it's crazy cool that us "younger" (I feel pretty old but younger
than the Unix guys) folks get to hang out with the people who were 
there at the beginning.  I can't tell you how grateful I am to be on
this list with these people.

-- 
---
Larry McVoy            	     lm at mcvoy.com             http://www.mcvoy.com/lm 

* Re: [TUHS] Doug McIlroy's C++ regular expression library (mostly) revived
From: arnold @ 2018-07-29  6:02 UTC (permalink / raw)
  To: tuhs, doug; +Cc: rsc

Dr. McIlroy,

Much thanks for this!  If you don't object, I will add this note to the
repo as it provides insight into the wherefores of the package.

At this point, I must also give credit where credit is due:

* Chet Ramey, who suggested that I ask Russ Cox to take a look at the package,
* Russ Cox, who fixed the major problems and got all the tests to pass,
* Rares Aioanei, who volunteered to tackle fixing things but did not get
  to do so before Russ beat him to it.

As implied by the above, the package is now up-to-date and functional!
I hope it's of interest to the broader community.

My own reason for seeking this out is that I have (likely vain) hopes
of one day finding a better regex package to use for gawk.  But
regular expressions are interesting in their own right. Russ Cox has
a series of papers on his web site about them that are worth reading.

Finally, thanks again to Dr. McIlroy for humoring me and giving me
his code to play with.

Arnold


* Re: [TUHS] Doug McIlroy's C++ regular expression library (mostly) revived
From: Doug McIlroy @ 2018-07-28 22:31 UTC (permalink / raw)
  To: tuhs


Why would anyone be interested in an old regex package that never was
a part of any Unix distro?

The driving force was Posix, whose regex spec was quite inscrutable. Could
there be a reference implementation? It was easy to fool every
implementation I could get my hands on, including Gnu's over-the-top
9000-line implementation.

But as I got into it, I got fascinated by regexes per se. In making a
recognizer, there's a tradeoff between construction time and execution
time. Linear execution can be achieved, but at a potentially exponential
cost in construction time (and space). Backreferencing takes the regex
languages out of the class of regular languages.
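
A small sketch of the construction-time blowup, using the textbook subset
construction rather than anything from the package itself. The function name
dfa_state_count is an assumption made for this illustration. The language
"the n-th symbol from the end is a" over {a, b} has an NFA with n+1 states,
but the equivalent DFA built from it needs 2^n states.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <queue>
#include <unordered_set>

// Count the DFA states reachable by subset construction from the NFA for
// "the n-th symbol from the end is 'a'" over the alphabet {a, b}.
// NFA: state 0 loops on a and b, 0 -a-> 1, and i -> i+1 on either symbol
// for 1 <= i < n; state n is accepting. Precondition: 1 <= n <= 30.
std::size_t dfa_state_count(unsigned n) {
    using Set = std::uint32_t;                 // bit i set <=> NFA state i active
    const Set mask = (Set{1} << (n + 1)) - 1;  // keep only states 0..n

    auto step = [mask](Set s, bool is_a) -> Set {
        Set next = ((s & ~Set{1}) << 1) & mask;  // states 1..n-1 advance, n drops out
        next |= Set{1};                          // state 0 loops on both symbols
        if (is_a) next |= Set{2};                // state 0 also enters state 1 on 'a'
        return next;
    };

    std::unordered_set<Set> seen{Set{1}};      // start subset is {0}
    std::queue<Set> work;
    work.push(Set{1});
    while (!work.empty()) {
        const Set s = work.front();
        work.pop();
        for (bool is_a : {false, true}) {
            const Set t = step(s, is_a);
            if (seen.insert(t).second) work.push(t);  // newly discovered subset
        }
    }
    return seen.size();
}
```

Each reachable subset records which of the last n input symbols were a,
which is why none of them can be merged. Simulating the NFA directly avoids
the blowup at the price of more work per input character.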

Recalling that regular languages are closed under intersection and
negation, I wondered about how to implement new regex operators, &
and -. I came up with a scheme for this optional non-Posix feature that
involved layering continuation-passing over more traditional methods. And
while I was at it, I broke out smaller sublanguages for special treatment
(as does Gnu), all the way down to Knuth-Morris-Pratt for expressions
in which the only operation is catenation.
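
The closure properties mentioned above can be sketched with the textbook
automaton constructions. This is not the continuation-passing scheme used in
the package; the Dfa struct and the function names are assumptions made for
this illustration. Negation flips the accepting states of a complete DFA,
and intersection runs two DFAs in lockstep (the product construction).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A complete DFA over the alphabet {a, b}; state 0 is the start state.
struct Dfa {
    std::vector<std::size_t> on_a;   // on_a[s]: successor of s on 'a'
    std::vector<std::size_t> on_b;   // on_b[s]: successor of s on 'b'
    std::vector<bool> accepting;     // accepting[s]: is s a final state?

    bool accepts(const std::string& w) const {
        std::size_t s = 0;
        for (char c : w) {
            if (c == 'a') s = on_a[s];
            else if (c == 'b') s = on_b[s];
            else return false;       // symbol outside the alphabet
        }
        return accepting[s];
    }
};

// Negation: flip the accepting states (valid because the DFA is complete).
Dfa complement(Dfa d) {
    d.accepting.flip();
    return d;
}

// Intersection: product construction; the pair (p, q) is state p * ny + q.
Dfa intersect(const Dfa& x, const Dfa& y) {
    const std::size_t nx = x.accepting.size();
    const std::size_t ny = y.accepting.size();
    Dfa r{std::vector<std::size_t>(nx * ny),
          std::vector<std::size_t>(nx * ny),
          std::vector<bool>(nx * ny)};
    for (std::size_t p = 0; p < nx; ++p) {
        for (std::size_t q = 0; q < ny; ++q) {
            const std::size_t s = p * ny + q;
            r.on_a[s] = x.on_a[p] * ny + y.on_a[q];
            r.on_b[s] = x.on_b[p] * ny + y.on_b[q];
            r.accepting[s] = x.accepting[p] && y.accepting[q];
        }
    }
    return r;
}

// Example language: strings with an even number of a's.
Dfa even_as() { return Dfa{{1, 0}, {0, 1}, {true, false}}; }

// Example language: strings whose last symbol is b.
Dfa ends_in_b() { return Dfa{{0, 0}, {1, 1}, {false, true}}; }
```

For example, intersect(even_as(), ends_in_b()) accepts "aab" but not "ab",
and intersect(even_as(), complement(ends_in_b())) gives the difference of
the two languages, accepting "aa" but not "aab".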

And finally, having followed the development of C++ from its infancy,
I wanted to try out its new template facility, so there's a bit of
that in the package, too. Arnold has discovered that not only has C++
evolved, but also that without the discipline of -Wall to force clean
code, I was rather cavalier about casting, both explicitly and implicitly.

The only real customer the code ever had was the AST project, which
translated it to C. After the C++ had sat idle for a half-dozen years, I
thought to revive it in Linux, but found it riddled with incompatibilities
with that new environment and gave up. Arnold deserves a citation for
bravery in pushing that through 15 years further on.

Doug

