[9fans] Yacc question

9fans - fans of the OS Plan 9 from Bell Labs

* [9fans] Yacc question
@ 2004-10-15 13:35 Brantley Coile
  0 siblings, 0 replies; 12+ messages in thread
From: Brantley Coile @ 2004-10-15 13:35 UTC (permalink / raw)
  To: 9fans

What goes into the y.debug that is included if yydebug
is defined?

  Brantley




* Re: [9fans] yacc question
  2007-02-05  3:06           ` geoff
@ 2007-02-05  9:36             ` Bruce Ellis
  0 siblings, 0 replies; 12+ messages in thread
From: Bruce Ellis @ 2007-02-05  9:36 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

cool hack, ESCAPE ESCAPE ... too much confusion.

On 2/5/07, geoff@plan9.bell-labs.com <geoff@plan9.bell-labs.com> wrote:
> I don't have access to the code currently, but I believe the trick was
> to have the function that lex calls to get its next input character
> notice the start of a non-ASCII UTF sequence, read the whole rune,
> convert it to \033 followed by the 4 hex digits of the rune's value,
> and pass those bytes consecutively to lex.  Then yylex() would do the
> reverse translation from escaped hex back to a rune, so yacc's parser
> would see full runes.
>
>



* Re: [9fans] yacc question
  2007-02-05  2:49         ` Joel Salomon
@ 2007-02-05  3:06           ` geoff
  2007-02-05  9:36             ` Bruce Ellis
  0 siblings, 1 reply; 12+ messages in thread
From: geoff @ 2007-02-05  3:06 UTC (permalink / raw)
  To: 9fans

I don't have access to the code currently, but I believe the trick was
to have the function that lex calls to get its next input character
notice the start of a non-ASCII UTF sequence, read the whole rune,
convert it to \033 followed by the 4 hex digits of the rune's value,
and pass those bytes consecutively to lex.  Then yylex() would do the
reverse translation from escaped hex back to a rune, so yacc's parser
would see full runes.
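
A minimal sketch of the escape encoding described above, written in portable C rather than Plan 9 C. The names escrune, unescrune, and hexval are hypothetical and not taken from the original code, which the message says was not at hand. The four-digit form assumes rune values that fit in 16 bits.

```c
#include <stdio.h>

enum { ESC = 033 };	/* the escape byte named in the message */

/*
 * Encode one rune as ESC followed by 4 hex digits.
 * buf must hold at least 6 bytes; returns the bytes written (5).
 * Assumption: only the low 16 bits of r are kept.
 */
int
escrune(char *buf, unsigned int r)
{
	return snprintf(buf, 6, "%c%04X", ESC, r & 0xFFFF);
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int
hexval(int c)
{
	if(c >= '0' && c <= '9')
		return c - '0';
	if(c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	if(c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Reverse translation: s points at ESC followed by 4 hex digits.
 * Returns the rune value, or -1 if s is not a well-formed escape.
 */
long
unescrune(const char *s)
{
	long r;
	int i, v;

	if(s[0] != ESC)
		return -1;
	r = 0;
	for(i = 1; i <= 4; i++){
		v = hexval((unsigned char)s[i]);
		if(v < 0)	/* also stops at a premature NUL */
			return -1;
		r = r<<4 | v;
	}
	return r;
}
```

In a scheme like this, the input hook would hand the five bytes from escrune to lex one at a time, so lex only ever sees ASCII, and yylex would call something like unescrune on the matched text before returning to yacc.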


* Re: [9fans] yacc question
  2007-02-05  1:06       ` geoff
@ 2007-02-05  2:49         ` Joel Salomon
  2007-02-05  3:06           ` geoff
  0 siblings, 1 reply; 12+ messages in thread
From: Joel Salomon @ 2007-02-05  2:49 UTC (permalink / raw)
  To: 9fans

> I did use an encoding trick to smuggle UTF through lex when I once
> wanted to.

Was this just having ‘宁静’ match ‘..’ or something more clever?

--Joel




* Re: [9fans] yacc question
  2007-02-04 21:01     ` Russ Cox
  2007-02-04 22:05       ` Joel Salomon
@ 2007-02-05  1:06       ` geoff
  2007-02-05  2:49         ` Joel Salomon
  1 sibling, 1 reply; 12+ messages in thread
From: geoff @ 2007-02-05  1:06 UTC (permalink / raw)
  To: 9fans

It's lex that doesn't understand UTF, though I did use
an encoding trick to smuggle UTF through lex when I once
wanted to.



* Re: [9fans] yacc question
  2007-02-04 21:01     ` Russ Cox
@ 2007-02-04 22:05       ` Joel Salomon
  2007-02-05  1:06       ` geoff
  1 sibling, 0 replies; 12+ messages in thread
From: Joel Salomon @ 2007-02-04 22:05 UTC (permalink / raw)
  To: 9fans

> I just tried your program and it worked fine for
> me once I changed yylex to return actual Unicode
> values instead of byte values.  (There is a difference
> between Bgetc and Bgetrune.)

D’oh!

> > Depending on how I write the constants, yacc may or may not accept
> > the grammar, and when it does accept it the resulting program
> > suicides at start-up.
> 
> Oh, I did have to fix this too.  But this has nothing to do with
> Unicode.  Use '*' and it will still die.  This one you should be able
> to figure out on your own, with help from acid or db.

Can I get a hint on how to use these for my program?

cpu% hoc1
hoc1 38691: suicide: sys: trap: fault write addr=0x10 pc=0x000068d7
cpu% db 38691
386 binary
page fault
/sys/src/libbio/binit.c:66 Binits+c0/               	MOVL	$1,10(BP)

But I can’t tell how I’m using Binit(2) wrong—

—never mind; added
	src = malloc(sizeof *src);
and all is well.  Now I need to guess what was different in the
version of the code that worked…

Thanks for the help,

--Joel
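
The fix above is consistent with the db output: src is a global pointer, so it starts out nil, and an init routine that stores into a field at offset 0x10 of it would fault writing address 0x10. Below is a sketch of the pattern in portable C. Buf, bufinit, and bufnew are hypothetical stand-ins for Biobuf and Binit, not the libbio definitions.

```c
#include <stdlib.h>

/* Hypothetical stand-in for Biobuf: just enough state to show the pattern. */
typedef struct Buf Buf;
struct Buf {
	int	fd;
	int	mode;
	int	count;	/* bytes currently buffered */
};

/*
 * Stand-in for Binit: fills in *b, so b must already point at
 * real storage.  Without the nil check, the stores below would
 * fault when b is nil.
 */
int
bufinit(Buf *b, int fd, int mode)
{
	if(b == NULL)
		return -1;
	b->fd = fd;
	b->mode = mode;
	b->count = 0;
	return 0;
}

/* Allocate first, then initialise, as in the fix reported above. */
Buf*
bufnew(int fd, int mode)
{
	Buf *b;

	b = malloc(sizeof *b);
	if(b == NULL)
		return NULL;
	bufinit(b, fd, mode);
	return b;
}
```

The other common choice is to declare the buffer itself rather than a pointer (Buf b; then bufinit(&b, 0, mode)), which needs no malloc at all. In Plan 9 proper, Bfdopen is the libbio call that allocates and initialises in one step.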




* Re: [9fans] yacc question
  2007-02-04 21:20 erik quanstrom
@ 2007-02-04 21:26 ` erik quanstrom
  0 siblings, 0 replies; 12+ messages in thread
From: erik quanstrom @ 2007-02-04 21:26 UTC (permalink / raw)
  To: 9fans

given russ' observation, i'm probably wrong about the suicide.

- erik



* Re: [9fans] yacc question
@ 2007-02-04 21:20 erik quanstrom
  2007-02-04 21:26 ` erik quanstrom
  0 siblings, 1 reply; 12+ messages in thread
From: erik quanstrom @ 2007-02-04 21:20 UTC (permalink / raw)
  To: 9fans

i think the key is that you need to define
	%left MUL
rather than trying to use '×'.  this is going to try to index
an array of len 255 by the codepoint of '×', which is why you're
getting a suicide.

also, if you are doing this, you need to replace each Bgetc with Bgetrune
and each Bungetc with Bungetrune.  here's about what you want.  but, warning,
i didn't compile this.


	int
	getr(void *)
	{
		return Bgetrune(src);
	}

	int
	yylex(void)
	{
		int r;

		while((r = Bgetrune(src)) == ' ' || r == '\t')
			;
		if(r == Beof)
			return 0;
		/* Runeself (0x80) bounds the ASCII range, so isdigit is safe here */
		if(r == '.' || (r < Runeself && isdigit(r))){
			Bungetrune(src);	/* let charstod see the first rune */
			yylval = charstod(getr, 0);
			Bungetrune(src);	/* charstod reads one rune past the number */
			return NUMBER;
		}
		if(r == L'×')
			return MUL;
		if(r == '\n')
			lineno++;
		return r;
	}

you can also look at the patch i submitted to hoc
last summer to allow runic constants like π and γ.

- erik



* Re: [9fans] yacc question
  2007-02-04 20:52   ` Joel Salomon
@ 2007-02-04 21:01     ` Russ Cox
  2007-02-04 22:05       ` Joel Salomon
  2007-02-05  1:06       ` geoff
  0 siblings, 2 replies; 12+ messages in thread
From: Russ Cox @ 2007-02-04 21:01 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> Then what does yacc(1) mean when it says that yacc accepts UTF input?
> Why should the value for non-terminals start at 0xE000 if yylex can't
> return 0x2254 and have yacc understand it?

I learn something new every day!  Neat.
I've never chosen to depend on that behavior.

I just tried your program and it worked fine for
me once I changed yylex to return actual Unicode
values instead of byte values.  (There is a difference
between Bgetc and Bgetrune.)

> I've attached a version of hoc1.y (from Kernighan & Pike) to which
> I've attempted to add '×' as a valid multiplication symbol.  Depending
> on how I write the constants, yacc may or may not accept the grammar,
> and when it does accept it the resulting program suicides at start-up.

Oh, I did have to fix this too.  But this has nothing to
do with Unicode.  Use '*' and it will still die.  This one
you should be able to figure out on your own, with help
from acid or db.

Russ
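
To make the Bgetc versus Bgetrune difference concrete: '×' is U+00D7, which UTF-8 stores as the two bytes 0xC3 0x97. A byte-at-a-time reader hands yylex 0xC3 and then 0x97, and neither equals the 0xD7 that the grammar expects. The sketch below is a portable C decoder, not the Plan 9 implementation. The name decoderune is hypothetical, it handles sequences of at most three bytes, and it does not reject overlong encodings.

```c
/*
 * Decode one UTF-8 sequence starting at s into *r.
 * Returns the number of bytes consumed, or 0 on malformed input.
 * Covers code points up to U+FFFF only; s must be NUL-terminated.
 */
int
decoderune(const unsigned char *s, unsigned int *r)
{
	if(s[0] < 0x80){
		/* plain ASCII: the byte is the code point */
		*r = s[0];
		return 1;
	}
	if((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80){
		/* 110xxxxx 10xxxxxx */
		*r = (s[0] & 0x1F) << 6 | (s[1] & 0x3F);
		return 2;
	}
	if((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
	&& (s[2] & 0xC0) == 0x80){
		/* 1110xxxx 10xxxxxx 10xxxxxx */
		*r = (s[0] & 0x0F) << 12 | (s[1] & 0x3F) << 6 | (s[2] & 0x3F);
		return 3;
	}
	return 0;
}
```

Fed the bytes 0xC3 0x97, this consumes two bytes and yields 0xD7, which is the value a rune-returning yylex would pass to the parser.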


* Re: [9fans] yacc question
  2007-02-04 20:40 ` Russ Cox
@ 2007-02-04 20:52   ` Joel Salomon
  2007-02-04 21:01     ` Russ Cox
  0 siblings, 1 reply; 12+ messages in thread
From: Joel Salomon @ 2007-02-04 20:52 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 714 bytes --]

> You need to treat non-ASCII UTF-8 the same way
> that you treat multiple characters.  That is, you
> implement '≔' the same way you'd implement ':=' or '+=':
> in the lexer as a named symbol like NUMBER and VAR.

Then what does yacc(1) mean when it says that yacc accepts UTF input?
Why should the value for non-terminals start at 0xE000 if yylex can’t
return 0x2254 and have yacc understand it?

I’ve attached a version of hoc1.y (from Kernighan & Pike) to which
I’ve attempted to add ‘×’ as a valid multiplication symbol.  Depending
on how I write the constants, yacc may or may not accept the grammar,
and when it does accept it the resulting program suicides at start-up.

--Joel

[-- Attachment #2: hoc1.y --]
[-- Type: text/plain, Size: 1214 bytes --]

%{
#define YYSTYPE double
%}
%token	NUMBER
%left	'+' '-'
%left	'*'  '/'
%%
list:	/* nothing */
	| list '\n'
	| list expr '\n'	{ print("\t%.8g\n", $2); }
	;
expr:	NUMBER		{ $$ = $1; }
	| expr '+' expr	{ $$ = $1 + $3; }
	| expr '-' expr	{ $$ = $1 - $3; }
	| expr '*' expr	{ $$ = $1 * $3; }
	| expr '×' expr	{ $$ = $1 * $3; }
	| expr '/' expr	{ $$ = $1 / $3; }
	| '(' expr ')'		{ $$ = $2; }
	;
%%
#include <u.h>
#include <libc.h>
#include <ctype.h>
#include <bio.h>

int	lineno = 1;
Biobuf	*src;

int	yylex(void);
int	yyparse(void);
void	yyerror(char *);


void
main(int, char *argv[])
{
	argv0 = argv[0];
	Binit(src, 0, OREAD);
	yyparse();
}

int
getc(void *)
{
	return Bgetc(src);
}

int
yylex(void)
{
	int c;
	
	while((c=Bgetc(src)) == ' ' || c == '\t')
		;
	if(c == Beof)
		return 0;
	if(c == '.' || (isascii(c) && isdigit(c))){
		Bungetc(src);
		yylval = charstod(getc, 0);
		Bungetc(src);
		return NUMBER;
	}
	if(c == '\n')
		lineno++;
	return c;	
}	

void
warning(char *s, char *t)
{
	fprint(2, "%s: %s", argv0, s);
	if(t)
		fprint(2, " %s", t);
	fprint(2, " near line %d\n", lineno);
}

void
yyerror(char *s)
{
	warning(s, 0);
}


* Re: [9fans] yacc question
  2007-02-04 20:12 [9fans] yacc question Joel Salomon
@ 2007-02-04 20:40 ` Russ Cox
  2007-02-04 20:52   ` Joel Salomon
  0 siblings, 1 reply; 12+ messages in thread
From: Russ Cox @ 2007-02-04 20:40 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

You need to treat non-ASCII UTF-8 the same way
that you treat multiple characters.  That is, you
implement '≔' the same way you'd implement ':=' or '+=':
in the lexer as a named symbol like NUMBER and VAR.

Russ
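
A sketch of that advice in portable C: the scanner recognises both ':=' and the UTF-8 bytes of '≔' (U+2254, encoded as 0xE2 0x89 0x94) and returns the same named token for either, so the grammar only ever mentions the name. The function nexttoken and the token value 257 are assumptions for illustration; yacc would assign the real value in y.tab.h.

```c
#include <string.h>

/* Hypothetical token code; yacc would normally generate this. */
enum { ASSIGN = 257 };

/*
 * Scan one token from *pp and advance *pp past it.
 * Both ":=" and the UTF-8 bytes of U+2254 yield ASSIGN;
 * any other byte is returned as itself, and end of input as 0.
 */
int
nexttoken(const char **pp)
{
	const char *p = *pp;

	while(*p == ' ' || *p == '\t')
		p++;
	if(*p == '\0'){
		*pp = p;
		return 0;
	}
	if(strncmp(p, ":=", 2) == 0){
		*pp = p + 2;
		return ASSIGN;
	}
	if(strncmp(p, "\xE2\x89\x94", 3) == 0){
		*pp = p + 3;
		return ASSIGN;
	}
	*pp = p + 1;
	return (unsigned char)*p;
}
```

The grammar would then declare the name (for example %right ASSIGN) and use ASSIGN in its rules, with no non-ASCII literal in the .y file.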


* [9fans] yacc question
@ 2007-02-04 20:12 Joel Salomon
  2007-02-04 20:40 ` Russ Cox
  0 siblings, 1 reply; 12+ messages in thread
From: Joel Salomon @ 2007-02-04 20:12 UTC (permalink / raw)
  To: 9fans

How do I best tell yacc to expect a Rune?  For example, defining a
“colon-equals” assignment operator with
	%right	L'≔'
then adding it to the grammar with
	expr:	NUMBER
		| VAR		{ $$ = mem[$1]; }
		| VAR L'≔' expr	{ $$ = mem[$1] = $3; }
		| …
results in the yacc error message:
	fatal error:must specify type for  ≔, /usr/chesky/src/hak/hoc/hoc2.y:23
Expressing the character constant as '≔' rather than L'≔' gets rid of
that error message, but now I’m not confident that it’s looking for
the correct value; I find no reference to the character ≔ or to the
numbers 2254 or 8788 in y.tab.c.  (Although y.output does have it in
the proper place.) More disturbing, if I say
	%right	L'≔'
	…
	expr:	NUMBER
		| VAR		{ $$ = mem[$1]; }
		| VAR '≔' expr	{ $$ = mem[$1] = $3; }
(keeping the L prefix on one instance), yacc is satisfied as well.

What is the “right” way of using Rune literals in yacc?

--Joel





Thread overview: 12+ messages
2004-10-15 13:35 [9fans] Yacc question Brantley Coile
2007-02-04 20:12 [9fans] yacc question Joel Salomon
2007-02-04 20:40 ` Russ Cox
2007-02-04 20:52   ` Joel Salomon
2007-02-04 21:01     ` Russ Cox
2007-02-04 22:05       ` Joel Salomon
2007-02-05  1:06       ` geoff
2007-02-05  2:49         ` Joel Salomon
2007-02-05  3:06           ` geoff
2007-02-05  9:36             ` Bruce Ellis
2007-02-04 21:20 erik quanstrom
2007-02-04 21:26 ` erik quanstrom
