On Sat, Feb 17, 2024 at 08:34:28PM -0500, Rich Felker wrote:
> On Sun, Feb 18, 2024 at 12:45:35AM +0300, Valery Ushakov wrote:
> > On Fri, Feb 16, 2024 at 19:48:27 -0600, Rob Landley wrote:
> > 
> > > https://git.musl-libc.org/cgit/musl/tree/src/signal/sh/sigsetjmp.s
> > 
> > I haven't touched superh asm in a while and the code has zero
> > comments (*ugh*), but I *think* sigsetjmp clobbers caller's r8.
> > 
> > r8 is callee saved.  sigsetjmp wants to use it to save its env
> > argument across the call to __setjmp.  So it saves caller's r8 and
> > uses r8 to save its env b/c __setjmp it's about to call will clobber
> > it.  Then __setjmp saves this r8 = env in the jump buf, not caller's
> > r8.  The instruction in the delay slot of the tail call to
> > __sigsetjmp_tail vaguely looks like it might have been intended to
> > patch it, but it loads r8, not stores it. I'm not sure why it would
> > want to load r8 at that point.
> > 
> > Sorry, I only skimmed through the code and as I said, there're no
> > comments (which for asm code is borderline criminal, IMHO :) so I
> > might be completely misinterpreting what this code does...
> 
> I think you've stumbled across the bug even if you don't realize it.
> :-)
> 
> The idea of the sigsetjmp implementation is explained in commit
> 583e55122e767b1586286a0d9c35e2a4027998ab. As you've noted, comments
> would be nice, but I'm not sure where they belong since it's the same
> logic on every arch.
> 
> sigsetjmp wants to save a position in itself for longjmp to return to,
> but because sigsetjmp will be returning, it can't save any state on
> the stack; it doesn't have a stack frame. The only storage it has
> available is call-saved registers, which longjmp will necessarily
> restore for it. So, it picks one of its caller's call-saved registers
> (r8) and stores that inside the sigjmp_buf, then uses that register to
> preserve the pointer to the sigjmp_buf to access after setjmp returns
> (either the first time or the second time).
> 
> The only problem with the sh implementation is that, due to limited
> displacement range for load/store instructions, we have to add 60 to
> the sigjmp_buf base address to get the base register for accessing the
> saved return address and r8. This takes place in a temp register r6.
> However, the last line of this code path, the delay slot after the
> tail call to __sigsetjmp_tail, erroneously uses r4 (the base
> sigjmp_buf pointer) as the base for restoring r8:
> 
> 	 mov.l @(4+8,r4), r8
> 
> This corrupts the caller's saved register file. If I'm not mistaken,
> it overwrites the caller's r8 with the caller's r11.
> 
> Presumably Rob didn't actually hit a crash in sigsetjmp, but just
> inferred the crash was there via printf debugging. The actual crash
> must be in the caller once it gets back control with the wrong value
> in r8.
> 
> This has been an area I've wanted to have testing for for a long time,
> but I don't have a really good idea for how to implement the test. We
> would want the compiler to generate code that puts lots of
> intermediates in as many call-saved registers as possible, calls
> setjmp, then calls another function that *also* puts pressure on the
> register allocation and then calls longjmp. If it's okay for the test
> to be arch-specific, this could just work via __asm__ register
> constraints and a call to setjmp from asm, but I'm not sure if that
> would be a good way to do things in libc-test...
> 
> In any case, thanks to everyone who participated in this. This is a
> rather bad bug and a nice one to get fixed up in the release window!

The attached patch should fix it.

Rich