9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] cc lexer bug?
From: Joel Salomon @ 2007-02-18 20:19 UTC (permalink / raw)
  To: 9fans

cpu% cat t.c
void foo (void)
{
	double d;
	d = 08.7;
	USED(d);
}
cpu% 8c t.c
t.c:4 syntax error, last name: 8.7
cpu% 

This came up as I’m making my lexer for C able to scan numbers.  I
tried to understand ken’s code, but it gets very hairy right around
/sys/src/cmd/cc/lex.c:751 — and I think there’s a bug.  Or, at least,
an undocumented departure from the ANSI standard; Harbison & Steele
(5E) suggest that “08.7” is a valid floating point constant.

As far as my lexer is concerned (http://www.tip9ug.jp/who/chesky/comp/lex.c,
if anyone cares), it’s using line-at-a-time buffering courtesy of
Brdstr(2), so I’m back to thinking that regcomp(2) + strtod(2) et al.
is the way to go.  It won’t handle hex floats, but who cares?

--Joel
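
A minimal sketch of the strtod route mentioned above, assuming the ISO C strtod from <stdlib.h> rather than Plan 9's strtod(2).  The name scanfloat is hypothetical and does not come from lex.c.  It only shows that strtod itself has no objection to a leading 0 in "08.7".

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not taken from lex.c: convert the text at s
 * with strtod and report how many characters were consumed.
 * The value is stored in *out; a return of 0 means no conversion.
 */
size_t
scanfloat(const char *s, double *out)
{
	char *end;

	*out = strtod(s, &end);
	return (size_t)(end - s);
}
```

strtod accepts more than the C grammar does (leading white space, a sign, "inf" and "nan"), and it will happily convert "087" as well, so a lexer would presumably still need its own check to decide whether the token is an integer or a floating constant before calling it.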




* Re: [9fans] cc lexer bug?
From: erik quanstrom @ 2007-02-18 21:22 UTC (permalink / raw)
  To: 9fans

you're right, that's wrong.  but the question is do
you really want that to work?

in that case, one could always remove the goto ncu
on line 757.

however, for better or worse, octal constants start with a 0.
08.7 feels like a syntax error to me.  i am a little
surprised that gcc accepts 08.7.  are you sure that that's
actually in the standard?

this testcase should do:

double d = 08.7;

- erik



* Re: [9fans] cc lexer bug?
From: Joel Salomon @ 2007-02-18 22:07 UTC (permalink / raw)
  To: 9fans

> > cpu% cat t.c
> > void foo (void)
> > {
> > 	double d = 08.7;
> > 	USED(d);
> > }
> > cpu% 8c t.c
> > t.c:3 syntax error, last name: 8.7
> > cpu% 
> > 
> > This came up as I’m making my lexer for C able to scan numbers.  I
> > tried to understand ken’s code, but it gets very hairy right around
> > /sys/src/cmd/cc/lex.c:751 — and I think there’s a bug.
> 
> you're right, that's wrong.  but the question is do you really want
> that to work?

In kencc, maybe not; in a class project designed to implement a C89 compiler, definitely yes.

> in that case, one could always remove the goto ncu on line 757.

cpu% grep -n 'goto ncu;' /sys/src/cmd/cc/lex.c
778: 			goto ncu;
788: 		goto ncu;
801: 		goto ncu;
808: 		goto ncu;
cpu% 

I think you mean the one on 788.  I'm far from comfortable enough with
the code here to suggest a patch, though.  Certainly not comfortable
enough to incorporate it into my lexer.

> however, for better or worse, octal constants start with a 0.  08.7
> feels like a syntax error to me.  i am a little surprised that gcc
> accepts 08.7.  are you sure that that's actually in the standard?

From the ISO standard (the freely available “draft” C99+TC1+TC2 standard):

6.4.4.2 Floating constants
Syntax
floating-constant:
	decimal-floating-constant
	hexadecimal-floating-constant

decimal-floating-constant:
	fractional-constant exponent-part_{opt} floating-suffix_{opt}
	digit-sequence exponent-part floating-suffix_{opt}
…

fractional-constant:
	digit-sequence_{opt} . digit-sequence
	digit-sequence .
…

digit-sequence:
	digit
	digit-sequence digit


— no restrictions on the initial digit.  You get the same table in
Harbison & Steele, 5th Ed. §2.7.2.

--Joel
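
The grammar excerpt above can be sketched as a small hand-written recognizer.  This is only an illustration under stated assumptions: the names skipdigits and decfloatlen are hypothetical, hexadecimal floating constants are left out, and nothing here is taken from lex.c.

```c
#include <ctype.h>
#include <stddef.h>

/* Skip a (possibly empty) digit-sequence; return a pointer past it. */
const char*
skipdigits(const char *p)
{
	while(isdigit((unsigned char)*p))
		p++;
	return p;
}

/*
 * Return the length of the decimal-floating-constant at the start
 * of s, or 0 if there is none.  A suffix f, F, l or L is included.
 * An incomplete exponent such as "e+" is left unconsumed.
 */
size_t
decfloatlen(const char *s)
{
	const char *p, *q;
	int ndig, hasdot, hasexp;

	p = skipdigits(s);
	ndig = (int)(p - s);
	hasdot = 0;
	if(*p == '.'){
		hasdot = 1;
		q = skipdigits(p+1);
		ndig += (int)(q - (p+1));
		p = q;
	}
	if(ndig == 0)
		return 0;	/* a lone "." is not a constant */
	hasexp = 0;
	if(*p == 'e' || *p == 'E'){
		q = p+1;
		if(*q == '+' || *q == '-')
			q++;
		if(isdigit((unsigned char)*q)){
			hasexp = 1;
			p = skipdigits(q);
		}
	}
	if(!hasdot && !hasexp)
		return 0;	/* bare digit-sequence: an integer, not a float */
	if(*p == 'f' || *p == 'F' || *p == 'l' || *p == 'L')
		p++;
	return (size_t)(p - s);
}
```

Note that the first digit is never inspected: "08.7" is accepted for the same reason "18.7" is, while "087" is rejected only because it has neither a '.' nor an exponent.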




* Re: [9fans] cc lexer bug?
From: erik quanstrom @ 2007-02-18 22:40 UTC (permalink / raw)
  To: 9fans

On Sun Feb 18 17:08:00 EST 2007, JoelCSalomon@Gmail.com wrote:
> I think you mean the one on 788.  I'm far from comfortable enough with
> the code here to suggest a patch, though.  Certainly not comfortable
> enough to incorporate it into my lexer.
> 

sorry.  i was using an older version of the lexer.  line 788 is the correct
one to delete.

i read the current code like this:

		[...]
	assert(first digit is '0');

	if(not octal(c))
		goto dc;

	while(is octal(c))
		GETC();
	goto ncu;		// i.e. this can't be floating point.

dc:
	/* check for floating point */
	[...]

ncu:
	/* check integer size suffixes */
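
One way the flow above might be changed so that a leading 0 no longer rules out floating point is sketched here.  It is not a patch against lex.c: the function classify0 and the token kinds Toct, Tfloat and Tbad are hypothetical, the caller is assumed to have checked that the first character is '0', and hex constants are out of scope.

```c
#include <ctype.h>

enum { Toct, Tfloat, Tbad };	/* hypothetical token kinds */

/*
 * Classify a number whose first character is '0'.
 * Instead of jumping to the integer-suffix check as soon as the
 * octal digits run out, scan all decimal digits, remember whether
 * an 8 or 9 was seen, and only then decide.
 */
int
classify0(const char *s)
{
	const char *e;
	int bad;

	bad = 0;
	for(; isdigit((unsigned char)*s); s++)
		if(*s == '8' || *s == '9')
			bad = 1;	/* not a legal octal digit */
	if(*s == '.')
		return Tfloat;
	if(*s == 'e' || *s == 'E'){
		e = s+1;
		if(*e == '+' || *e == '-')
			e++;
		if(isdigit((unsigned char)*e))
			return Tfloat;
	}
	return bad ? Tbad : Toct;
}
```

With this ordering "0777" stays octal, "08.7" and "09e2" become floating constants, and "089" is still an error.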


