* [9fans] Yacc question
@ 2004-10-15 13:35 Brantley Coile
0 siblings, 0 replies; 12+ messages in thread
From: Brantley Coile @ 2004-10-15 13:35 UTC (permalink / raw)
To: 9fans
What goes into the y.debug that is included if yydebug
is defined?
Brantley
* Re: [9fans] yacc question
2007-02-05 3:06 ` geoff
@ 2007-02-05 9:36 ` Bruce Ellis
0 siblings, 0 replies; 12+ messages in thread
From: Bruce Ellis @ 2007-02-05 9:36 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
cool hack, ESCAPE ESCAPE ... too much confusion.
On 2/5/07, geoff@plan9.bell-labs.com <geoff@plan9.bell-labs.com> wrote:
> I don't have access to the code currently, but I believe the trick was
> to have the function that lex calls to get its next input character
> notice the start of a non-ASCII UTF sequence, read the whole rune,
> convert it to \033 followed by the 4 hex digits of the rune's value,
> and pass those bytes consecutively to lex. Then yylex() would do the
> reverse translation from escaped hex back to a rune, so yacc's parser
> would see full runes.
>
>
* Re: [9fans] yacc question
2007-02-05 2:49 ` Joel Salomon
@ 2007-02-05 3:06 ` geoff
2007-02-05 9:36 ` Bruce Ellis
0 siblings, 1 reply; 12+ messages in thread
From: geoff @ 2007-02-05 3:06 UTC (permalink / raw)
To: 9fans
I don't have access to the code currently, but I believe the trick was
to have the function that lex calls to get its next input character
notice the start of a non-ASCII UTF sequence, read the whole rune,
convert it to \033 followed by the 4 hex digits of the rune's value,
and pass those bytes consecutively to lex. Then yylex() would do the
reverse translation from escaped hex back to a rune, so yacc's parser
would see full runes.
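The round trip described above can be sketched in portable C. This is not the original code (which was not available when the message was written); the names escrune and unescrune are made up for illustration, and the sketch assumes runes fit in 16 bits, so four hex digits are enough.

```c
#include <stdio.h>

enum { ESC = 033 };	/* escape byte that marks an encoded rune */

/*
 * Encode a code point (assumed <= 0xFFFF) as ESC followed by
 * four hex digits. buf needs room for 6 bytes (5 plus the NUL).
 * Returns the number of bytes written, not counting the NUL.
 */
int
escrune(char *buf, unsigned long r)
{
	return sprintf(buf, "%c%04lX", ESC, r & 0xFFFF);
}

/*
 * Decode one ESC+4-hex sequence back to a code point.
 * Returns the code point, or -1 if s is not a well-formed escape.
 */
long
unescrune(const char *s)
{
	long r = 0;
	int i, c, d;

	if(s[0] != ESC)
		return -1;
	for(i = 1; i <= 4; i++){
		c = (unsigned char)s[i];
		if(c >= '0' && c <= '9')
			d = c - '0';
		else if(c >= 'A' && c <= 'F')
			d = c - 'A' + 10;
		else if(c >= 'a' && c <= 'f')
			d = c - 'a' + 10;
		else
			return -1;	/* also catches a premature NUL */
		r = r*16 + d;
	}
	return r;
}
```

In this scheme the input function handed to lex would call something like escrune for each non-ASCII rune, the lex rules would match the escape byte plus four hex digits, and yylex would call something like unescrune before returning the value to the parser.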
* Re: [9fans] yacc question
2007-02-05 1:06 ` geoff
@ 2007-02-05 2:49 ` Joel Salomon
2007-02-05 3:06 ` geoff
0 siblings, 1 reply; 12+ messages in thread
From: Joel Salomon @ 2007-02-05 2:49 UTC (permalink / raw)
To: 9fans
> I did use an encoding trick to smuggle UTF through lex when I once
> wanted to.
Was this just having ‘宁静’ match ‘..’ or something more clever?
--Joel
* Re: [9fans] yacc question
2007-02-04 21:01 ` Russ Cox
2007-02-04 22:05 ` Joel Salomon
@ 2007-02-05 1:06 ` geoff
2007-02-05 2:49 ` Joel Salomon
1 sibling, 1 reply; 12+ messages in thread
From: geoff @ 2007-02-05 1:06 UTC (permalink / raw)
To: 9fans
It's lex that doesn't understand UTF, though I did use
an encoding trick to smuggle UTF through lex when I once
wanted to.
* Re: [9fans] yacc question
2007-02-04 21:01 ` Russ Cox
@ 2007-02-04 22:05 ` Joel Salomon
2007-02-05 1:06 ` geoff
1 sibling, 0 replies; 12+ messages in thread
From: Joel Salomon @ 2007-02-04 22:05 UTC (permalink / raw)
To: 9fans
> I just tried your program and it worked fine for
> me once I changed yylex to return actual Unicode
> values instead of byte values. (There is a difference
> between Bgetc and Bgetrune.)
D’oh!
> > Depending on how I write the constants, yacc may or may not accept
> > the grammar, and when it does accept it the resulting program
> > suicides at start-up.
>
> Oh, I did have to fix this too. But this has nothing to do with
> Unicode. Use '*' and it will still die. This one you should be able
> to figure out on your own, with help from acid or db.
Can I get a hint on how to use these for my program?
cpu% hoc1
hoc1 38691: suicide: sys: trap: fault write addr=0x10 pc=0x000068d7
cpu% db 38691
386 binary
page fault
/sys/src/libbio/binit.c:66 Binits+c0/ MOVL $1,10(BP)
But I can’t tell how I’m using Binit(2) wrong—
—never mind; added
src = malloc(sizeof *src);
and all is well. Now I need to guess what was different in the
version of the code that worked…
Thanks for the help,
--Joel
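The failure mode above can be shown with a small portable C analogue. Buf, bufinit and bufnew are hypothetical stand-ins, not the libbio API: the point is only that an init routine which fills in caller-supplied storage needs real storage behind the pointer. A global pointer such as src starts out nil, and the fault address 0x10 in the db output is consistent with a store to a field at a small offset from a nil pointer.

```c
#include <stdlib.h>

typedef struct Buf Buf;
struct Buf {	/* hypothetical stand-in for Biobuf */
	int fd;
	int mode;
	int nread;
};

/*
 * Initialise caller-supplied storage, in the style of Binit(2).
 * This sketch checks for nil and returns -1; the crash in the
 * message above suggests the real routine simply stores through
 * the pointer.
 */
int
bufinit(Buf *bp, int fd, int mode)
{
	if(bp == NULL)
		return -1;
	bp->fd = fd;
	bp->mode = mode;
	bp->nread = 0;
	return 0;
}

/* Allocate the storage first, then initialise it. */
Buf*
bufnew(int fd, int mode)
{
	Buf *bp;

	bp = malloc(sizeof *bp);
	if(bp == NULL)
		return NULL;
	bufinit(bp, fd, mode);
	return bp;
}
```

An alternative that avoids malloc altogether, assuming the usual idiom applies here, is to declare the buffer itself rather than a pointer to it and pass its address to the init routine.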
* Re: [9fans] yacc question
2007-02-04 21:20 erik quanstrom
@ 2007-02-04 21:26 ` erik quanstrom
0 siblings, 0 replies; 12+ messages in thread
From: erik quanstrom @ 2007-02-04 21:26 UTC (permalink / raw)
To: 9fans
given russ' observation, i'm probably wrong about the suicide.
- erik
* Re: [9fans] yacc question
@ 2007-02-04 21:20 erik quanstrom
2007-02-04 21:26 ` erik quanstrom
0 siblings, 1 reply; 12+ messages in thread
From: erik quanstrom @ 2007-02-04 21:20 UTC (permalink / raw)
To: 9fans
i think the key is that you need to define
%left MUL
rather than trying to use '×'. this is going to try to index
an array of len 255 by the codepoint of '×', which is why you're
getting a suicide.
also, if you are doing this, you need to replace each Bgetc with Bgetrune,
and Bungetc with Bungetrune. here's about what you want. but, warning,
i didn't compile this.
int
getr(void *)
{
	return Bgetrune(src);
}
int
yylex(void)
{
	int r;
	while((r = Bgetrune(src)) == ' ' || r == '\t')
		;
	if(r == Beof)
		return 0;
	if(r == '.' || (isasciirune(r) && isdigitrune(r))){
		Bungetrune(src);
		yylval = charstod(getr, 0);
		Bungetrune(src);
		return NUMBER;
	}
	if(r == L'×')
		return MUL;
	if(r == '\n')
		lineno++;
	return r;
}
you can also look at the patch i submitted to hoc
last summer to allow runic constants like π and γ.
- erik
* Re: [9fans] yacc question
2007-02-04 20:52 ` Joel Salomon
@ 2007-02-04 21:01 ` Russ Cox
2007-02-04 22:05 ` Joel Salomon
2007-02-05 1:06 ` geoff
0 siblings, 2 replies; 12+ messages in thread
From: Russ Cox @ 2007-02-04 21:01 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
> Then what does yacc(1) mean when it says that yacc accepts UTF input?
> Why should the value for non-terminals start at 0xE000 if yylex can't
> return 0x2254 and have yacc understand it?
I learn something new every day! Neat.
I've never chosen to depend on that behavior.
I just tried your program and it worked fine for
me once I changed yylex to return actual Unicode
values instead of byte values. (There is a difference
between Bgetc and Bgetrune.)
> I've attached a version of hoc1.y (from Kernighan & Pike) to which
> I've attempted to add '×' as a valid multiplication symbol. Depending
> on how I write the constants, yacc may or may not accept the grammar,
> and when it does accept it the resulting program suicides at start-up.
Oh, I did have to fix this too. But this has nothing to
do with Unicode. Use '*' and it will still die. This one
you should be able to figure out on your own, with help
from acid or db.
Russ
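The difference between a byte value and a Unicode value can be made concrete with a small portable C sketch. runetoutf is a made-up helper, not a libc or libbio routine, and it assumes code points no larger than 0xFFFF. It shows that '×' (U+00D7) is stored as the two bytes 0xC3 0x97, so a byte-at-a-time reader hands yylex 0xC3 and then 0x97, never 0xD7.

```c
/*
 * Encode code point r (assumed <= 0xFFFF) as UTF-8 into buf,
 * which needs room for 3 bytes. Returns the number of bytes used.
 */
int
runetoutf(unsigned char *buf, unsigned long r)
{
	r &= 0xFFFF;
	if(r < 0x80){
		/* ASCII: the byte value and the code point coincide */
		buf[0] = r;
		return 1;
	}
	if(r < 0x800){
		/* two bytes: 110xxxxx 10xxxxxx */
		buf[0] = 0xC0 | (r >> 6);
		buf[1] = 0x80 | (r & 0x3F);
		return 2;
	}
	/* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
	buf[0] = 0xE0 | (r >> 12);
	buf[1] = 0x80 | ((r >> 6) & 0x3F);
	buf[2] = 0x80 | (r & 0x3F);
	return 3;
}
```

For plain ASCII such as '*' the two views agree, which is why a byte-oriented yylex appears to work until the first non-ASCII character arrives.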
* Re: [9fans] yacc question
2007-02-04 20:40 ` Russ Cox
@ 2007-02-04 20:52 ` Joel Salomon
2007-02-04 21:01 ` Russ Cox
0 siblings, 1 reply; 12+ messages in thread
From: Joel Salomon @ 2007-02-04 20:52 UTC (permalink / raw)
To: 9fans
[-- Attachment #1: Type: text/plain, Size: 714 bytes --]
> You need to treat non-ASCII UTF-8 the same way
> that you treat multiple characters. That is, you
> implement '≔' the same way you'd implement ':=' or '+=':
> in the lexer as a named symbol like NUMBER and VAR.
Then what does yacc(1) mean when it says that yacc accepts UTF input?
Why should the value for non-terminals start at 0xE000 if yylex can’t
return 0x2254 and have yacc understand it?
I’ve attached a version of hoc1.y (from Kernighan & Pike) to which
I’ve attempted to add ‘×’ as a valid multiplication symbol. Depending
on how I write the constants, yacc may or may not accept the grammar,
and when it does accept it the resulting program suicides at start-up.
--Joel
[-- Attachment #2: hoc1.y --]
[-- Type: text/plain, Size: 1214 bytes --]
%{
#define YYSTYPE double
%}
%token	NUMBER
%left	'+' '-'
%left	'*' '/'
%%
list:	  /* nothing */
	| list '\n'
	| list expr '\n'	{ print("\t%.8g\n", $2); }
	;
expr:	  NUMBER		{ $$ = $1; }
	| expr '+' expr		{ $$ = $1 + $3; }
	| expr '-' expr		{ $$ = $1 - $3; }
	| expr '*' expr		{ $$ = $1 * $3; }
	| expr '×' expr		{ $$ = $1 * $3; }
	| expr '/' expr		{ $$ = $1 / $3; }
	| '(' expr ')'		{ $$ = $2; }
	;
%%
#include <u.h>
#include <libc.h>
#include <ctype.h>
#include <bio.h>
int lineno = 1;
Biobuf *src;
int yylex(void);
int yyparse(void);
void yyerror(char *);
void
main(int, char *argv[])
{
	argv0 = argv[0];
	Binit(src, 0, OREAD);
	yyparse();
}
int
getc(void *)
{
	return Bgetc(src);
}
int
yylex(void)
{
	int c;
	while((c=Bgetc(src)) == ' ' || c == '\t')
		;
	if(c == Beof)
		return 0;
	if(c == '.' || (isascii(c) && isdigit(c))){
		Bungetc(src);
		yylval = charstod(getc, 0);
		Bungetc(src);
		return NUMBER;
	}
	if(c == '\n')
		lineno++;
	return c;
}
void
warning(char *s, char *t)
{
	fprint(2, "%s: %s", argv0, s);
	if(t)
		fprint(2, " %s", t);
	fprint(2, " near line %d\n", lineno);
}
void
yyerror(char *s)
{
	warning(s, 0);
}
* Re: [9fans] yacc question
2007-02-04 20:12 [9fans] yacc question Joel Salomon
@ 2007-02-04 20:40 ` Russ Cox
2007-02-04 20:52 ` Joel Salomon
0 siblings, 1 reply; 12+ messages in thread
From: Russ Cox @ 2007-02-04 20:40 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
You need to treat non-ASCII UTF-8 the same way
that you treat multiple characters. That is, you
implement '≔' the same way you'd implement ':=' or '+=':
in the lexer as a named symbol like NUMBER and VAR.
Russ
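A minimal sketch of that approach in portable C, under stated assumptions: decoderune and nexttok are hypothetical names, ASSIGN stands in for a token that yacc would normally define from a %token or %right declaration (the value 0xE000 is only a placeholder), and the decoder handles sequences of at most three bytes without rejecting overlong forms. The lexer maps both ':=' and '≔' (U+2254) to the same named token.

```c
enum { ASSIGN = 0xE000 };	/* placeholder; yacc would assign the real value */

/*
 * Decode one UTF-8 sequence of up to 3 bytes at s into *r.
 * Returns the number of bytes consumed; malformed input
 * yields U+FFFD and consumes one byte.
 */
int
decoderune(const unsigned char *s, long *r)
{
	if(s[0] < 0x80){
		*r = s[0];
		return 1;
	}
	if((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80){
		*r = ((long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
		return 2;
	}
	if((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80){
		*r = ((long)(s[0] & 0x0F) << 12) | ((long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
		return 3;
	}
	*r = 0xFFFD;
	return 1;
}

/*
 * Return the next token from *pp and advance *pp past it.
 * ':=' and U+2254 both become ASSIGN; any other character is
 * returned as its own code point; 0 marks the end of input.
 */
long
nexttok(const unsigned char **pp)
{
	const unsigned char *p = *pp;
	long r;

	while(*p == ' ' || *p == '\t')
		p++;
	if(*p == 0){
		*pp = p;
		return 0;
	}
	p += decoderune(p, &r);
	if(r == ':' && *p == '='){
		p++;
		r = ASSIGN;
	}else if(r == 0x2254)
		r = ASSIGN;
	*pp = p;
	return r;
}
```

With this arrangement the grammar refers only to ASSIGN, so the question of how to write a rune literal in the .y file does not arise.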
* [9fans] yacc question
@ 2007-02-04 20:12 Joel Salomon
2007-02-04 20:40 ` Russ Cox
0 siblings, 1 reply; 12+ messages in thread
From: Joel Salomon @ 2007-02-04 20:12 UTC (permalink / raw)
To: 9fans
How do I best tell yacc to expect a Rune? For example, defining a
“colon-equals” assignment operator with
%right L'≔'
then adding it to the grammar with
expr: NUMBER
| VAR { $$ = mem[$1]; }
| VAR L'≔' expr { $$ = mem[$1] = $3; }
| …
results in the yacc error message:
fatal error:must specify type for ≔, /usr/chesky/src/hak/hoc/hoc2.y:23
Expressing the character constant as '≔' rather than L'≔' gets rid of
that error message, but now I’m not confident that it’s looking for
the correct value; I find no reference to the character ≔ or to the
numbers 2254 or 8788 in y.tab.c. (Although y.output does have it in
the proper place.) More disturbing, if I say
%right L'≔'
…
expr: NUMBER
| VAR { $$ = mem[$1]; }
| VAR '≔' expr { $$ = mem[$1] = $3; }
(keeping the L prefix on one instance), yacc is satisfied as well.
What is the “right” way of using Rune literals in yacc?
--Joel
end of thread
Thread overview: 12+ messages
2004-10-15 13:35 [9fans] Yacc question Brantley Coile
2007-02-04 20:12 [9fans] yacc question Joel Salomon
2007-02-04 20:40 ` Russ Cox
2007-02-04 20:52 ` Joel Salomon
2007-02-04 21:01 ` Russ Cox
2007-02-04 22:05 ` Joel Salomon
2007-02-05 1:06 ` geoff
2007-02-05 2:49 ` Joel Salomon
2007-02-05 3:06 ` geoff
2007-02-05 9:36 ` Bruce Ellis
2007-02-04 21:20 erik quanstrom
2007-02-04 21:26 ` erik quanstrom