9front - general discussion about 9front
 help / color / mirror / Atom feed
* [9front] cpp bug?
@ 2024-12-18 21:13 James Cook
  2024-12-18 21:47 ` ori
  0 siblings, 1 reply; 7+ messages in thread
From: James Cook @ 2024-12-18 21:13 UTC (permalink / raw)
  To: 9front

cpp seems to evaluate arguments too eagerly when a macro expands
to another macro with arguments. Discovered while trying to build
a recent perl.

My knowledge on the topic is pretty fuzzy, but I'm guessing it would
be better to behave like Clang and gcc; at least, perl's source code
seems to assume things work that way. (First and third examples directly
inspired by problems I'm having to work around.)

A fix would be lovely of course but otherwise I might make an attempt
one at some point if it's agreed cpp is doing the wrong thing.

First example:

	#define X 5
	#define Y(A) A ## foo
	#define Z Y(X)
	Z

Ignoring whitespace, cpp -P says

	5foo

bot both Clang's and gcc's C preprocessors (cc -E and egcc -E on
OpenBSD) say

	Xfoo

Second example:

	#define X 5
	#define Y(A) #A
	#define Z Y(X)
	Z

cpp says

	"5"

Clang and gcc say

	"X"

Third example:

	#define X
	#define Y(A) foo
	#define Z Y(X)
	Z

cpp emits an error "cpp: f.c:4 Disagreement in number of macro
arguments" and its output (sans whitespace) is Y(). Clang and gcc
say foo.

-- 
James

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [9front] cpp bug?
  2024-12-18 21:13 [9front] cpp bug? James Cook
@ 2024-12-18 21:47 ` ori
  2024-12-18 23:59   ` Michael Forney
  0 siblings, 1 reply; 7+ messages in thread
From: ori @ 2024-12-18 21:47 UTC (permalink / raw)
  To: 9front

ugh. yeah, probably bugs; cpp is at least easy to debug,
but the algorithm for expansion is annoying, and perl
exercises it in all the worst ways.

it would be useful to see what the standard wants us tu
do here.

Quoth James Cook <falsifian@falsifian.org>:
> cpp seems to evaluate arguments too eagerly when a macro expands
> to another macro with arguments. Discovered while trying to build
> a recent perl.
> 
> My knowledge on the topic is pretty fuzzy, but I'm guessing it would
> be better to behave like Clang and gcc; at least, perl's source code
> seems to assume things work that way. (First and third examples directly
> inspired by problems I'm having to work around.)
> 
> A fix would be lovely of course but otherwise I might make an attempt
> one at some point if it's agreed cpp is doing the wrong thing.
> 
> First example:
> 
> 	#define X 5
> 	#define Y(A) A ## foo
> 	#define Z Y(X)
> 	Z
> 
> Ignoring whitespace, cpp -P says
> 
> 	5foo
> 
> bot both Clang's and gcc's C preprocessors (cc -E and egcc -E on
> OpenBSD) say
> 
> 	Xfoo
> 
> Second example:
> 
> 	#define X 5
> 	#define Y(A) #A
> 	#define Z Y(X)
> 	Z
> 
> cpp says
> 
> 	"5"
> 
> Clang and gcc say
> 
> 	"X"
> 
> Third example:
> 
> 	#define X
> 	#define Y(A) foo
> 	#define Z Y(X)
> 	Z
> 
> cpp emits an error "cpp: f.c:4 Disagreement in number of macro
> arguments" and its output (sans whitespace) is Y(). Clang and gcc
> say foo.
> 
> -- 
> James


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [9front] cpp bug?
  2024-12-18 21:47 ` ori
@ 2024-12-18 23:59   ` Michael Forney
  2024-12-21 16:33     ` James Cook
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Forney @ 2024-12-18 23:59 UTC (permalink / raw)
  To: 9front

ori@eigenstate.org wrote:
> ugh. yeah, probably bugs; cpp is at least easy to debug,
> but the algorithm for expansion is annoying, and perl
> exercises it in all the worst ways.
> 
> it would be useful to see what the standard wants us tu
> do here.

The key detail is in C99 6.10.3.1p1 (emphasis mine):
> After the arguments for the invocation of a function-like macro
> have been identified, argument substitution takes place. A parameter
> in the replacement list, **unless preceded by a # or ## preprocessing
> token or followed by a ## preprocessing token (see below)**, is
> replaced by the corresponding argument after all macros contained
> therein have been expanded. Before being substituted, each
> argument's preprocessing tokens are completely macro replaced as
> if they formed the rest of the preprocessing file; no other
> preprocessing tokens are available.

Since in the body of the Y macro, A is followed by the ##, it
is does not undergo argument substitution.

Instead, by 6.10.3.3p2:
> the parameter is replaced by the corresponding argument's
> preprocessing token sequence

Something similar happens for the # operator.

It's also possible that a parameter appears both with and without
a token operator. Here's another example:

	#define X 5
	#define Y(A) A ## foo A #A
	Y(X)

This should result in

	Xfoo 5 "X"

since only the second instance of A undergoes argument substitution
and expansion. The others are simply replaced with the argument
token sequence.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [9front] cpp bug?
  2024-12-18 23:59   ` Michael Forney
@ 2024-12-21 16:33     ` James Cook
  2024-12-21 17:27       ` rod
  2024-12-21 20:57       ` ori
  0 siblings, 2 replies; 7+ messages in thread
From: James Cook @ 2024-12-21 16:33 UTC (permalink / raw)
  To: 9front

Michael Forney <mforney@mforney.org> wrote:
> ori@eigenstate.org wrote:
> > ugh. yeah, probably bugs; cpp is at least easy to debug,
> > but the algorithm for expansion is annoying, and perl
> > exercises it in all the worst ways.
> > 
> > it would be useful to see what the standard wants us tu
> > do here.
> 
> The key detail is in C99 6.10.3.1p1 (emphasis mine):
> > After the arguments for the invocation of a function-like macro
> > have been identified, argument substitution takes place. A parameter
> > in the replacement list, **unless preceded by a # or ## preprocessing
> > token or followed by a ## preprocessing token (see below)**, is
> > replaced by the corresponding argument after all macros contained
> > therein have been expanded. Before being substituted, each
> > argument's preprocessing tokens are completely macro replaced as
> > if they formed the rest of the preprocessing file; no other
> > preprocessing tokens are available.
> 
> Since in the body of the Y macro, A is followed by the ##, it
> is does not undergo argument substitution.
> 
> Instead, by 6.10.3.3p2:
> > the parameter is replaced by the corresponding argument's
> > preprocessing token sequence
> 
> Something similar happens for the # operator.
> 
> It's also possible that a parameter appears both with and without
> a token operator. Here's another example:
> 
> 	#define X 5
> 	#define Y(A) A ## foo A #A
> 	Y(X)
> 
> This should result in
> 
> 	Xfoo 5 "X"
> 
> since only the second instance of A undergoes argument substitution
> and expansion. The others are simply replaced with the argument
> token sequence.

Thanks for the explanation. This shows cpp is wrong in my first and
second examples.

I looked into my third example a bit more. I simplified it to this:

        #define X(A) foo
        X()

Clang and gcc both say foo. cpp complains: "cpp: f.c:2 Disagreement
in number of macro arguments" and outputs X().

I don't own the official standard, but I've been looking at the
draft at https://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf

I found a hint in 6.10.3 point 4 of the draft (emphasis mine):

> If the identifier-list in the macro definition does not end with
> an ellipsis, the number of arguments (**including those arguments
> consisting of no preprocessing tokens**) in an invocation of a
> function-like macro shal equal the number of parameters in the
> macro definition. Otherwise...

That seems to indicate the invocation X() can be considered as
passing a single empty argument to the function-like macro X, so
cpp is wrong to complain, and should have performed the substitution
like Clang and gcc.

I wasn't able to find a more explicit statement that an empty
argument list can count as a single empty argument. The closest I
could find is that point 11 says the arguments are separated by
commas (which at least doesn't contradict the possibility of an
empty argument).

-- 
James

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [9front] cpp bug?
  2024-12-21 16:33     ` James Cook
@ 2024-12-21 17:27       ` rod
  2024-12-22  1:30         ` ori
  2024-12-21 20:57       ` ori
  1 sibling, 1 reply; 7+ messages in thread
From: rod @ 2024-12-21 17:27 UTC (permalink / raw)
  To: 9front

[-- Attachment #1: Type: text/plain, Size: 353 bytes --]

>Would you mind sharing your change with the 9front list?

As requested, attached is a diff of cpp/macro.c with a fix
for the incorrect macro expansions.

Please do note that the diff is off the original sources version
of cpp, so may be of limited use. The 9front version has undergone
some subsequent changes that I haven't kept up to date with.

-rod

[-- Attachment #2: macro.diff --]
[-- Type: text/plain, Size: 4004 bytes --]

/n/sources/plan9/sys/src/cmd/cpp/macro.c:74,79 - macro.c:74,84
  	if (((trp->lp)-1)->type==NL)
  		trp->lp -= 1;
  	def = normtokenrow(trp);
+ 
+ 	/* ## token in macro body is concatenation */
+ 	for(tp = def->bp; tp < def->lp; tp++)
+ 		if(tp->type == DSHARP) tp->type = DSHARP1;
+ 
  	if (np->flag&ISDEFINED) {
  		if (comparetokens(def, np->vp)
  		 || (np->ap==NULL) != (args==NULL)
/n/sources/plan9/sys/src/cmd/cpp/macro.c:187,197 - macro.c:192,204
  expand(Tokenrow *trp, Nlist *np, int inmacro)
  {
  	Tokenrow ntr;
- 	int ntokc, narg, i;
+ 	int ntokc, narg, nparam, i;
  	Token *tp;
  	Tokenrow *atr[NARG+1];
  	int hs;
  
+ 	USED(inmacro);
+ 
  	copytokenrow(&ntr, np->vp);		/* copy macro value */
  	if (np->ap==NULL)			/* parameterless */
  		ntokc = 1;
/n/sources/plan9/sys/src/cmd/cpp/macro.c:202,208 - macro.c:209,223
  			/* gatherargs has already pushed trp->tr to the next token */
  			return;
  		}
- 		if (narg != rowlen(np->ap)) {
+ 		nparam = rowlen(np->ap);
+ 
+ 		if(narg == 0 && nparam == 1) {
+ 			atr[0] = new(Tokenrow);
+ 			maketokenrow(0, atr[0]);
+ 			narg = 1;
+ 		}
+ 
+ 		if (narg != nparam) {
  			error(ERROR, "Disagreement in number of macro arguments");
  			trp->tp->hideset = newhideset(trp->tp->hideset, np);
  			trp->tp += ntokc;
/n/sources/plan9/sys/src/cmd/cpp/macro.c:214,221 - macro.c:229,235
  			dofree(atr[i]);
  		}
  	}
- 	if(!inmacro)
- 		doconcat(&ntr);				/* execute ## operators */
+ 	doconcat(&ntr);				/* execute ## operators */
  	hs = newhideset(trp->tp->hideset, np);
  	for (tp=ntr.bp; tp<ntr.lp; tp++) {	/* distribute hidesets */
  		if (tp->type==NAME) {
/n/sources/plan9/sys/src/cmd/cpp/macro.c:311,318 - macro.c:325,330
  		}
  		if (lp->type==RP)
  			parens--;
- 		if (lp->type==DSHARP)
- 			lp->type = DSHARP1;	/* ## not special in arg */
  		if ((lp->type==COMMA && parens==0) || (parens<0 && (lp-1)->type!=LP)) {
  			if (lp->type == COMMA && dots && *narg == dots-1)
  				continue;
/n/sources/plan9/sys/src/cmd/cpp/macro.c:337,342 - macro.c:349,358
  	Tokenrow tatr;
  	Token *tp;
  	int ntok, argno;
+ 	int before, after, blank;
+ 	static Token t_placeholder[1] = {NAME,0,0,0,0,(uchar*)""};
+ 	static Tokenrow placeholder = { t_placeholder,
+ 		t_placeholder, &t_placeholder[1] };
  
  	for (rtr->tp=rtr->bp; rtr->tp<rtr->lp; ) {
  		if (rtr->tp->type==SHARP) {	/* string operator */
/n/sources/plan9/sys/src/cmd/cpp/macro.c:353,363 - macro.c:369,382
  		}
  		if (rtr->tp->type==NAME
  		 && (argno = lookuparg(np, rtr->tp)) >= 0) {
- 			if (rtr->tp < rtr->bp)
- 				error(ERROR, "access out of bounds");
- 			if ((rtr->tp+1)->type==DSHARP
- 			 || rtr->tp!=rtr->bp && (rtr->tp-1)->type==DSHARP)
- 				insertrow(rtr, 1, atr[argno]);
+ 
+ 			after = rtr->tp < rtr->lp-1 && rtr->tp[1].type == DSHARP1;
+ 			before = rtr->tp > rtr->bp && rtr->tp[-1].type == DSHARP1;
+ 			blank = atr[argno]->tp == atr[argno]->lp;
+ 
+ 			if(blank || before || after) {
+ 				insertrow(rtr, 1, blank ? &placeholder : atr[argno]);
+ 			}
  			else {
  				copytokenrow(&tatr, atr[argno]);
  				expandrow(&tatr, "<macro>", Inmacro);
/n/sources/plan9/sys/src/cmd/cpp/macro.c:381,389 - macro.c:400,406
  	int len;
  
  	for (trp->tp=trp->bp; trp->tp<trp->lp; trp->tp++) {
- 		if (trp->tp->type==DSHARP1)
- 			trp->tp->type = DSHARP;
- 		else if (trp->tp->type==DSHARP) {
+ 		if (trp->tp->type==DSHARP1) {
  			char tt[128];
  			ltp = trp->tp-1;
  			ntp = trp->tp+1;
/n/sources/plan9/sys/src/cmd/cpp/macro.c:391,396 - macro.c:408,428
  				error(ERROR, "## occurs at border of replacement");
  				continue;
  			}
+ 
+ 			if(ltp->len == 0) {
+ 				trp->tp++;
+ 				adjustrow(trp, -2); /* delete (placeholder, ##) */
+ 				trp->tp -= 3;
+ 				continue;
+ 			}
+ 
+ 			if(ntp->len == 0) {
+ 				trp->tp += 2;
+ 				adjustrow(trp, -2); /* delete (##, placeholder) */
+ 				trp->tp -= 3;
+ 				continue;
+ 			}
+ 
  			len = ltp->len + ntp->len;
  			strncpy((char*)tt, (char*)ltp->t, ltp->len);
  			strncpy((char*)tt+ltp->len, (char*)ntp->t, ntp->len);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [9front] cpp bug?
  2024-12-21 16:33     ` James Cook
  2024-12-21 17:27       ` rod
@ 2024-12-21 20:57       ` ori
  1 sibling, 0 replies; 7+ messages in thread
From: ori @ 2024-12-21 20:57 UTC (permalink / raw)
  To: 9front

Quoth James Cook <falsifian@falsifian.org>:
> 
> I wasn't able to find a more explicit statement that an empty
> argument list can count as a single empty argument. The closest I
> could find is that point 11 says the arguments are separated by
> commas (which at least doesn't contradict the possibility of an
> empty argument).

Yes, there are a couple of known cases; see test/edges.in:

	/*
	 * CURRENTLY BROKEN:
	 *     __VA_ARGS__ requires at least one item.
	 *     It should accept an empty list.
	#define xprint(a, ...)	print(a, __VA_ARGS__)
	xprint("hi", "there")
	xprint("hi")
	*/
	
	/*
	 * CURRENTLY BROKEN:
	 *     __VA_ARGS__ requires at least one item.
	 *     It should accept an empty list.
	#define xprint(a, ...)	print(a, __VA_ARGS__)
	xprint("hi", "there")
	xprint("hi")
	*/



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [9front] cpp bug?
  2024-12-21 17:27       ` rod
@ 2024-12-22  1:30         ` ori
  0 siblings, 0 replies; 7+ messages in thread
From: ori @ 2024-12-22  1:30 UTC (permalink / raw)
  To: 9front

Quoth rod@hemiola.co.uk:
> >Would you mind sharing your change with the 9front list?
> 
> As requested, attached is a diff of cpp/macro.c with a fix
> for the incorrect macro expansions.
> 
> Please do note that the diff is off the original sources version
> of cpp, so may be of limited use. The 9front version has undergone
> some subsequent changes that I haven't kept up to date with.
> 
> -rod


Thanks!

I committed a part of this with for the empty args, with
modifications to handle empty __VA_ARGS__ lists.

I've got a bit more work to do to figure out how to deal
with the token pasting quirks; that stuff is pretty subtle,
and even OpenBSD's clang and gcc don't fully agree on it.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-12-22  1:33 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-12-18 21:13 [9front] cpp bug? James Cook
2024-12-18 21:47 ` ori
2024-12-18 23:59   ` Michael Forney
2024-12-21 16:33     ` James Cook
2024-12-21 17:27       ` rod
2024-12-22  1:30         ` ori
2024-12-21 20:57       ` ori

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).