The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
From: Rob Pike <robpike@gmail.com>
To: Steve Johnson <scj@yaccman.com>
Cc: Bakul Shah <bakul@iitbombay.org>,
	The Unix Heritage Society mailing list <tuhs@minnie.tuhs.org>
Subject: Re: [TUHS] [tuhs] Dennis Ritchie's couch
Date: Fri, 2 Jul 2021 13:34:27 +1000	[thread overview]
Message-ID: <CAKzdPgxGOZLLQzzPKe3kyAq1soAYPHwyeVP2iePiE8zMSk=f7g@mail.gmail.com> (raw)
In-Reply-To: <CAKzdPgw6zkXg9tB8uoPOZBv2C5nV8=N4kuW8-tHCUoby=9ki+Q@mail.gmail.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 8286 bytes --]

<resending with the program as an attachment, as 100K is considered big
here? moderator, you can kill the prior message>

To show you what I mean, here is a parser I wrote recently for a simple Go
expression evaluator, in Go. It has all Go's precedence levels and
operators. The one odd detail is that >= etc. can be combined, as in 1 >= 2
> 3, but that doesn't matter in the context of this program and is easily
avoided with a few more lines in cmpList.

I'm using a screen grab because GMail refuses to leave my indentation
alone. I left factor off. It's got all the usual details but it's the leaf
of the grammar and of no interest here.

Note that this could easily have been made a table instead of a bunch of
one-line functions.

Parsers are easy to write. It took us a generation (or more) to understand
that, but it's remarkable nonetheless. The first big step might have been
realizing that recursion was a good idea, even if you weren't writing LISP,
if the data structure is itself recursive.

-rob


On Fri, Jul 2, 2021 at 1:30 PM Rob Pike <robpike@gmail.com> wrote:

> To show you what I mean, here is a parser I wrote recently for a simple Go
> expression evaluator, in Go. It has all Go's precedence levels and
> operators. The one odd detail is that >= etc. can be combined, as in 1 >= 2
> > 3, but that doesn't matter in the context of this program and is easily
> avoided with a few more lines in cmpList.
>
> I'm using a screen grab because GMail refuses to leave my indentation
> alone. I left factor off. It's got all the usual details but it's the leaf
> of the grammar and of no interest here.
>
> Note that this could easily have been made a table instead of a bunch of
> one-line functions.
>
> Parsers are easy to write. It took us a generation (or more) to understand
> that, but it's remarkable nonetheless. The first big step might have been
> realizing that recursion was a good idea, even if you weren't writing LISP,
> if the data structure is itself recursive.
>
> -rob
>
>
> [image: image.png]
>
>
> On Fri, Jul 2, 2021 at 12:13 PM <scj@yaccman.com> wrote:
>
>> I found Dennis's compiler to be a thing of beauty when compiling
>> statements, but hell on wheels when doing expressions.  One of the
>> motivations for writing Yacc was that I wanted to add the exclusive or into
>> the expression syntax and we had agreed on the character (^) and the
>> precedence.  Practically the whole expression grammar needed to be
>> rewritten, and small typos created un-debuggable chaos.  The state of the
>> art at the time was to write multi-layered grammars for each precedence
>> level.  A paper was published on how to shrink the resulting parsing tables
>> by eliminating redundant states.   I realized that the same results could
>> be obtained by writing an ambiguous expression grammar and using a
>> precedence table to resolve the ambiguities.  The initial response in the
>> academic community to programming with ambiguous grammars was somewhere
>> between skeptical and horrified -- as if I had shown porn to a Victorian.
>> So Al and I worked out a proof that we would get the same optimized parser
>> in a much more intuitive way.
>>
>> I do agree with Rob that some of the languages that Yacc gave birth to
>> should never have been born.  Remember, though, that the dominant language
>> at the time was FORTRAN, and it had all sorts of strange implementation
>> issues in their hand-written compilers.  Things like subscript indices had
>> to be single variables in some places, and in others could have a constant
>> added to the index.  One of Yacc's best features was that it made
>> consistency of usage the path of least resistance when designing the
>> language, and the grammar was often easier to understand than the
>> programming manual.  At Bell Labs, Barbara Ryder wrote a program that would
>> read FORTRAN and detect things that would not work on one or more of the
>> six major FORTRAN's at the time.  It was an inspiration for me, later, do
>> the same thing with Lint.
>>
>> I do suggest that having languages like C++ that have bloated up to over
>> 1000 pages in the programmer reference doesn't feel like a real advance,
>> especially since the real language problems of today are how to program
>> very large numbers of processor-like objects on a single chip.  We need new
>> ways to think, and I doubt that we'll get there by making C++ require a
>> 2000-page manual.
>> ---
>>
>>
>>
>> On 2021-05-25 23:52, Rob Pike wrote:
>>
>> I enjoy writing recursive descent parsers, and I enjoy the grammars that
>> result from such parsers when cleanly done.
>>
>> I do agree though that if you grammar is being invented as you go, yacc
>> can be a boon. But in a sense, that's also it's biggest failing: it makes
>> it too easy to write bad grammars.
>>
>> -rob
>>
>>
>> On Wed, May 26, 2021 at 4:22 PM Bakul Shah <bakul@iitbombay.org> wrote:
>>
>> Many existing programming languages do have a simple enough
>> syntax to make it easy to write a recursive descent parser for them
>> but not alll.
>>
>> Handwritten recursive descent parsers are often LL(1) with may be
>> a separate operator-precedence parsing for expressions.
>>
>> If you are defining a new language syntax you can make sure parsing
>> is easy but not all languages are LL(1) (which is a subset of LALR(1),
>> which is a subset of LR(1), which is a subset of GLR). Handwritten
>> parsers for these more general grammars are bound to get more
>> complicated.
>>
>> Even *we* understand parsing, writing a parser for some existing
>> languages  which grew "organically" can become tedious, or
>> complicated or adhoc. Often such languages have no well specified
>> grammar (the code is the specification!). A yacc grammar would help.
>>
>> Often one writes a yacc grammar while a new language & its syntax
>> is evolving. Changing a yacc file is more localized & easier than fixing
>> up a handwritten parser. Even Go has such a grammar initially.
>>
>> -- Bakul
>>
>> > On May 25, 2021, at 8:03 PM, Larry McVoy <lm@mcvoy.com> wrote:
>> >
>> > You do, I don't.  I'm not alone in my lack of understanding.
>> >
>> > I think that all the things that yacc solved, Steve gets some kudos.
>> > I've used it a bunch and I did not need to be as smart as you or
>> > Steve to get the job done.
>> >
>> > You getting past that is cool but it doesn't make his work less.
>> >
>> > On Wed, May 26, 2021 at 10:37:45AM +1000, Rob Pike wrote:
>> >> And today, we understand parsing so well we don't need yacc.
>> >>
>> >> -rob
>> >>
>> >>
>> >> On Wed, May 26, 2021 at 10:20 AM Nelson H. F. Beebe <
>> beebe@math.utah.edu>
>> >> wrote:
>> >>
>> >>> The last article of the latest issue of the Communications of the ACM
>> >>> that appeared electronically earlier today is a brief interview with
>> >>> this year's ACM Turing Award winners, Al Aho and Jeff Ullman.
>> >>>
>> >>> The article is
>> >>>
>> >>>        Last byte: Shaping the foundations of programming languages
>> >>>        https://doi.org/10.1145/3460442
>> >>>        Comm. ACM 64(6), 120, 119, June 2021.
>> >>>
>> >>> and it includes a picture of the two winners sitting on Dennis
>> >>> Ritchie's couch.
>> >>>
>> >>> I liked this snippet from Jeff Ullman, praising fellow list member
>> >>> Steve Johnson's landmark program, yacc:
>> >>>
>> >>>>> ...
>> >>>>> At the time of the first Fortran compiler, it took several
>> >>>>> person-years to write a parser.  By the time yacc came around,
>> >>>>> you could do it in an afternoon.
>> >>>>> ...
>> >>>
>> >>>
>> >>>
>> -------------------------------------------------------------------------------
>> >>> - Nelson H. F. Beebe                    Tel: +1 801 581 5254
>> >>>    -
>> >>> - University of Utah                    FAX: +1 801 581 4148
>> >>>    -
>> >>> - Department of Mathematics, 110 LCB    Internet e-mail:
>> >>> beebe@math.utah.edu  -
>> >>> - 155 S 1400 E RM 233                       beebe@acm.org
>> >>> beebe@computer.org -
>> >>> - Salt Lake City, UT 84112-0090, USA    URL:
>> >>> http://www.math.utah.edu/~beebe/ -
>> >>>
>> >>>
>> -------------------------------------------------------------------------------
>> >>>
>> >
>> > --
>> > ---
>> > Larry McVoy                        lm at mcvoy.com
>> http://www.mcvoy.com/lm
>>
>>

[-- Attachment #1.1.2: Type: text/html, Size: 11298 bytes --]

[-- Attachment #1.2: image.png --]
[-- Type: image/png, Size: 246447 bytes --]

[-- Attachment #2: parse.txt --]
[-- Type: text/plain, Size: 1314 bytes --]

// parse implements a production in the expression parse hierarchy. Singles and
// doubles are strings holding the operators that are available at at this precedence
// level, while nextLevel implements the next higher precendence level.
func (p *parser) parse(singles, doubles string, nextLevel func(*parser) *Expr) *Expr {
	e := nextLevel(p)
	for {
		if p.peek(true) == eof {
			return e
		}
		op := p.op(singles, doubles)
		if op == "" {
			return e
		}
		e = &Expr{
			op:    op,
			left:  e,
			right: nextLevel(p),
		}
	}
}

// orlist = andList | andList '||' orList.
func orList(p *parser) *Expr {
	return p.parse("", "||", andList)
}

// andlist = cmpList | cmpList '&&' andList.
func andList(p *parser) *Expr {
	return p.parse("", "&&", cmpList)
}

// cmpList = expr | expr ('>' | '<' | '==' | '!=' | '>=' | '<=') expr.
func cmpList(p *parser) *Expr {
	return p.parse("+-|^!><", "==!=>=<=", expr)
}

// expr = term | term ('+' | '-' | '|' | '^') term.
func expr(p *parser) *Expr {
	return p.parse("+-|^!", "", term)
}

// term = factor | factor ('*' | '/' | '%' | '>>' | '<<' | '&' | '&^') factor
func term(p *parser) *Expr {
	return p.parse("*/%&", ">><<&^", factor)
}

// factor = constant | identifier | '+' factor | '-' factor | '^' factor | '!' factor | '(' orList ')'
func factor(p *parser) *Expr {

  reply	other threads:[~2021-07-03  7:10 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-26  0:12 Nelson H. F. Beebe
2021-05-26  0:37 ` Rob Pike
2021-05-26  3:03   ` Larry McVoy
2021-05-26  4:02     ` Rob Pike
2021-05-26  6:20     ` Bakul Shah
2021-05-26  6:52       ` Rob Pike
2021-07-02  2:13         ` scj
2021-07-02  3:30           ` Rob Pike
2021-07-02  3:34             ` Rob Pike [this message]
2021-07-02  4:40               ` Rob Pike
2021-07-02  6:29                 ` George Michaelson
2021-07-02  7:00                   ` Bakul Shah
2021-07-02 17:17                     ` [TUHS] " Tony Finch
2021-07-04 23:40                       ` George Michaelson
2021-07-05  3:41                         ` Bakul Shah
2021-05-26  5:13   ` [TUHS] [tuhs] " arnold
2021-05-26  4:10 ` Avindra G

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKzdPgxGOZLLQzzPKe3kyAq1soAYPHwyeVP2iePiE8zMSk=f7g@mail.gmail.com' \
    --to=robpike@gmail.com \
    --cc=bakul@iitbombay.org \
    --cc=scj@yaccman.com \
    --cc=tuhs@minnie.tuhs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).