mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] Not sure how to debug this one.
@ 2024-02-17  1:48 Rob Landley
  2024-02-17  3:23 ` [musl] Re: [Toybox] " Mouse
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Rob Landley @ 2024-02-17  1:48 UTC (permalink / raw)
  To: toybox, musl

While grinding away at release prep, I hit a WEIRD one. The qemu-system-sh4
target got broken by commit 3e0e8c687eee (PID 1 exits trying to run the init
script), which is the commit that changed the stdout buffering type.

It's not the kernel, if I use the last release kernel with the new root
filesystem I see the problem, and newly built kernel from today's git with last
release's initramfs.cpio.gz boots to a shell prompt.

The actual _problem_ is that sigsetjmp() is faulting (in sh.c function
run_command()), for NO OBVIOUS REASON. Calling memset() to zero the struct
before the sigsetjmp() works fine, but the sigsetjmp() call (built against
musl-libc) never returns.

Not siglongjmp, _sigsetjmp_. Which means it's failing somewhere in:

https://git.musl-libc.org/cgit/musl/tree/src/signal/sh/sigsetjmp.s

And I dunno how to stick a printf into superh assembly code.

The sigjmp_buf lives on the stack, but I confirmed it's 8 byte aligned, and not
even straddling a page boundary. I can access variables I stick before and after
it, so it can't be some kind of "fault due to guard page" weirdness? (I suppose
the optimizer may be invalidating that test, I could try adding "volatile"...)

While debugging I made the problem GO AWAY more than once by sticking printfs()
and similar into the code, but that's not FIXING it. Adding another sigjmp_buf
declaration and call to sigsetjmp() right at the start of the function works
fine (although the other one in the place it's in now still fails). I confirmed
that sigsetjmp() is annotated returns_twice in musl (but even if it _wasn't_
problems would show up when you did a longjmp, it wouldn't manifest as the first
call to setjmp never returning. This isn't making it to the line after the
function on the first pass through, even if I move it outside the if().

I confirmed it happens with both gcc 11.2 (musl 1.2.4?) and the older gcc 9.4
toolchain (musl 1.2.3? I think? It would be nice if musl actually had a way to
identify the version of installed library binaries).

I do not currently have a superh build of gdbserver and corresponding host gdb
that understands foreign binaries, and the last time I built gdb as part of a
cross compiler was many moons ago.

I'm open to suggestions, this one's funky. (The problem with trying to configure
the kernel to produce core dumps and compare against the readelf -d output is
it's running as PID 1. Um... maybe kgdb? Still think I need to build a host
cross-gdb to connect to it though...)

It would be really nice if somebody who understood the assembly could spot
something...

Rob

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [musl] Re: [Toybox] Not sure how to debug this one.
  2024-02-17  1:48 [musl] Not sure how to debug this one Rob Landley
@ 2024-02-17  3:23 ` Mouse
  2024-02-17 13:32   ` Rob Landley
  2024-02-17 17:02 ` [musl] " Rich Felker
  2024-02-17 21:45 ` [musl] " Valery Ushakov
  2 siblings, 1 reply; 21+ messages in thread
From: Mouse @ 2024-02-17  3:23 UTC (permalink / raw)
  To: toybox, musl

> While grinding away at release prep, I hit a WEIRD one.  The
> qemu-system-sh4 target got broken [...by...] the commit that changed
> the stdout buffering type.

> The actual _problem_ is that sigsetjmp() is faulting [...]
[...]
> While debugging I made the problem GO AWAY more than once by sticking
> printfs() and similar into the code, [...]

This smells to me like depending on uninitialized stack trash.

> Not siglongjmp, _sigsetjmp_.  Which means it's failing somewhere in:
> 
> https://git.musl-libc.org/cgit/musl/tree/src/signal/sh/sigsetjmp.s

> And I dunno how to stick a printf into superh assembly code.

The simple way to figure that out is to compile something that uses
printf and look at the assembly, either by using -save-temps or
equivalent or by disassembling the binary.

But, given what sigsetjmp is, sticking a printf in there is likely to
be more difficult than usual.

I know a little about Super-H from some Dreamcast hackery I did a while
back.  I had a look at the .s file you cite - thank you, musl-libc.org,
for resisting the stampede to try to ram HTTPS down everyone's
throat[%]! - and, while I can read it, there is too much I don't know
to really claim to understand it.  I can convert the assembly into
English, certainly, but I don't know how much that would help
(especially since it's the machine language, not assembly language, I
know; the SH assembler I've used is my own, with its own syntax, so I'm
having to guess at the meaning of some parts).

[%] Having HTTP support meant I could just look at the http: version
    instead of needing to wait until I could use a work machine.

> (The problem with trying to configure the kernel to produce core
> dumps and compare against the readelf -d output is it's running as
> PID 1.  [...])

Why is that a problem?  I don't see any statement of what kernel you're
running under, but I can think of two plausible reasons offhand: (1)
the kernel refuses to coredump PID 1 under any circumstances or (2)
there's no writable filesytem to take a coredump on at that point.

To address (1), I'd just build a kernel with that test diked out.

To address (2), I'd normally netboot.  It that's not feasible for some
reason, I'd probably hack on the kernel to remount / read-write before
starting userland.

Of course, you said qemu-something, so you are presumably running under
emulation.  In principle, you could figure this out from emulator
traces, but that is likely to be both extremely difficult and extremely
tedious.

But - you said memset-to-zero on the struct ran but didn't stop it from
failing.  I'd try memset to various other values, to see if you can
find one that makes it stop crashing.  If so, maybe the do two runs,
one with it memset to one value and one with it set to another, take
instruction traces, and see where they differ...?

> It would be really nice if somebody who understood the assembly could
> spot something...

Well, as I said, I can read it, mostly, but I don't know enough of the
context to know whether it's right or not.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [musl] Re: [Toybox] Not sure how to debug this one.
  2024-02-17  3:23 ` [musl] Re: [Toybox] " Mouse
@ 2024-02-17 13:32   ` Rob Landley
  2024-02-17 15:01     ` [musl] " Thorsten Glaser
  2024-02-17 15:21     ` [musl] " Mouse
  0 siblings, 2 replies; 21+ messages in thread
From: Rob Landley @ 2024-02-17 13:32 UTC (permalink / raw)
  To: Mouse, toybox, musl

On 2/16/24 21:23, Mouse wrote:
>> While grinding away at release prep, I hit a WEIRD one.  The
>> qemu-system-sh4 target got broken [...by...] the commit that changed
>> the stdout buffering type.
> 
>> The actual _problem_ is that sigsetjmp() is faulting [...]
> [...]
>> While debugging I made the problem GO AWAY more than once by sticking
>> printfs() and similar into the code, [...]
> 
> This smells to me like depending on uninitialized stack trash.

A write-only function that didn't change its behavior when I memset the
structure before calling it?

Define "uninitialized". (Unless you mean an uninitialized variable inside a
function written entirely in assembly, that's part of a C library shipped and
used by many people for many years?)

>> Not siglongjmp, _sigsetjmp_.  Which means it's failing somewhere in:
>> 
>> https://git.musl-libc.org/cgit/musl/tree/src/signal/sh/sigsetjmp.s
> 
>> And I dunno how to stick a printf into superh assembly code.
> 
> The simple way to figure that out is to compile something that uses
> printf and look at the assembly,

$ ccc/sh4-linux-musl-cross/bin/sh4-linux-musl-objdump -d
generated/unstripped/toybox | grep -A 60 '<printf>:'
0045341c <printf>:
  45341c:	86 2f       	mov.l	r8,@-r15
  45341e:	f8 e2       	mov	#-8,r2
  453420:	22 4f       	sts.l	pr,@-r15
  453422:	a8 7f       	add	#-88,r15
  453424:	18 d0       	mov.l	453488 <printf+0x6c>,r0	! 45622c <memcpy>
  453426:	f3 61       	mov	r15,r1
  453428:	18 71       	add	#24,r1
  45342a:	29 21       	and	r2,r1
  45342c:	43 68       	mov	r4,r8
  45342e:	13 62       	mov	r1,r2
  453430:	18 72       	add	#24,r2
  453432:	7a 11       	mov.l	r7,@(40,r1)
  453434:	04 72       	add	#4,r2
  453436:	58 11       	mov.l	r5,@(32,r1)
  453438:	13 63       	mov	r1,r3
  45343a:	69 11       	mov.l	r6,@(36,r1)
  45343c:	f3 65       	mov	r15,r5
  45343e:	aa f2       	fmov	fr10,@r2
  453440:	44 75       	add	#68,r5
  453442:	bb f2       	fmov	fr11,@-r2
  453444:	20 73       	add	#32,r3
  453446:	13 62       	mov	r1,r2
  453448:	10 72       	add	#16,r2
  45344a:	04 72       	add	#4,r2
  45344c:	f3 64       	mov	r15,r4
  45344e:	8a f2       	fmov	fr8,@r2
  453450:	14 e6       	mov	#20,r6
  453452:	9b f2       	fmov	fr9,@-r2
  453454:	13 62       	mov	r1,r2
  453456:	08 72       	add	#8,r2
  453458:	04 72       	add	#4,r2
  45345a:	6a f2       	fmov	fr6,@r2
  45345c:	04 71       	add	#4,r1
  45345e:	7b f2       	fmov	fr7,@-r2
  453460:	4a f1       	fmov	fr4,@r1
  453462:	5b f1       	fmov	fr5,@-r1
  453464:	12 15       	mov.l	r1,@(8,r5)
  453466:	2c 71       	add	#44,r1
  453468:	11 15       	mov.l	r1,@(4,r5)
  45346a:	60 e1       	mov	#96,r1
  45346c:	fc 31       	add	r15,r1
  45346e:	33 15       	mov.l	r3,@(12,r5)
  453470:	32 25       	mov.l	r3,@r5
  453472:	0b 40       	jsr	@r0
  453474:	14 15       	mov.l	r1,@(16,r5)
  453476:	05 d0       	mov.l	45348c <printf+0x70>,r0	! 45500c <vfprintf>
  453478:	05 d4       	mov.l	453490 <printf+0x74>,r4	! 4cc9ec <__stdout_FILE>
  45347a:	0b 40       	jsr	@r0
  45347c:	83 65       	mov	r8,r5
  45347e:	58 7f       	add	#88,r15
  453480:	26 4f       	lds.l	@r15+,pr
  453482:	0b 00       	rts	
  453484:	f6 68       	mov.l	@r15+,r8
  453486:	09 00       	nop	
  453488:	2c 62       	extu.b	r2,r2
  45348a:	45 00       	mov.w	r4,@(r0,r0)
  45348c:	0c 50       	mov.l	@(48,r0),r0
  45348e:	45 00       	mov.w	r4,@(r0,r0)
  453490:	ec c9       	and	#-20,r0
  453492:	4c 00       	mov.b	@(r0,r4),r0

That's just the vfprintf() wrapper, which has the actual plumbing for escape
parsing and such, and is of course running its output through the ascii FILE *
infrastructure.

No, I would wind up CALLING the function, meaning set up a call stack, but how
you're supposed to do that in the middle of setjmp() without corrupting the
registers you're supposed to be saving... even manually making a _system_call_
in that context is... I mean it's _documented_ in
https://man7.org/linux/man-pages/man2/syscall.2.html:

Arch/ABI    Instruction           System  Ret  Ret  Error    Notes
                                  call #  val  val2
───────────────────────────────────────────────────────────────────
superh      trapa #31             r3      r0   r1   -        4, 6

But again, the point is to SAVE those registers, in a defined order, and there's
no WAY to insert something that big into delicate assembly non-intrusively. This
already heisenbugs if my dprintf() is too elaborate.

> either by using -save-temps or
> equivalent or by disassembling the binary.

Did I mention I once stuck print-to-stderr debugging into the uclibc dynamic
loader while doing system bringup on the hexagon architecture? Which couldn't
use any global variables, function calls, or string constants because it hadn't
relocated itself yet so I assembled a message into a char buffer[] on the stack
and did a syscall(_nr_write).

Similar to debugging uboot before it relocated itself from NOR flash to sram
(and thus all the locations the linker had provided for symbols outside the
current function and stack were wrong), where debug output was a loop that wrote
a byte at a time to the serial port spinning checking the ready-for-next-byte
status bit. In that case I worked out the constants I needed to subtract from
"string constants" (because a string constant resolves to a pointer of type char
so you can "hello"-0x40800300 and that's a byte offset).

In theory the same technique would apply to function pointers (every function
name is a pointer) but the TYPE of said pointer is sizeof(function) and doing
math on them isn't really a thing, so you need to typecast to char and then BACK
again (and the syntax for function pointer typecasting has too many parentheses
in non-obvious locations, I generally find it easier to declare a function
pointer variable and then (void *) typecast assign to that), but that doesn't
help if the function then tries to call ANOTHER function, as so many of them do,
so... didn't turn out to be very useful.

But that's not what I was asking about HERE. "Magic blob of assembly for
architecture I'm not hugely familiar with is throwing an interrupt, I wonder why?"

> But, given what sigsetjmp is, sticking a printf in there is likely to
> be more difficult than usual.

Define "usual".

Oh, I forgot to mention that qemu-system-blah also has a -s option to launch a
gdbserver on a port. (Which as with all the classic qemu options is now
described in the --help text as "-s    shorthand for -gdb tcp::1234" which is
just sad. And that's _after_ they renamed it from
https://landley.net/notes-2008.html#19-03-2008 when it was apparently -g ?)

I believe qemu -s is emulating a jtag, kgdb is SORT of emulating a jtag, and
then normal gdbserver is providing userspace context debugging. Same protocol,
what differs is what the registers mean and symbol visibility/namespace context.
This is why having an unstripped "vmlinux" is so useful: it's an ELF kernel with
all the symbols so gdb can load it and give you kernel namespace context. Even
if what you actually RAN is one of the repackaged versions, the linking's
already been done so the memory layout's fixed.

Except on sparc, with RELOCATES ITSELF. No, I don't know why either, but I broke
it back under aboriginal and had to get help debugging it:

https://lkml.org/lkml/2011/11/12/57

I do not always have the relevant domain expertise, which is why I try to ask
people who _do_:

https://lkml.org/lkml/2011/12/14/324

(One of the big goals of aboriginal linux and now mkroot is the ability to
package up a test case that somebody can reproduce on their machine without
needing specific hardware, INCLUDING a portable build environment that lets them
rebuild the provided binaries. Hence self-contained qemu-system builds built
with provided portable toolchains that plug into a a build that's both "do this,
here's the output" AND "you don't have to use my wrapper, it should be obvious
what it does".)

> I know a little about Super-H from some Dreamcast hackery I did a while
> back.  I had a look at the .s file you cite - thank you, musl-libc.org,
> for resisting the stampede to try to ram HTTPS down everyone's
> throat[%]! - and, while I can read it, there is too much I don't know
> to really claim to understand it.  I can convert the assembly into
> English, certainly, but I don't know how much that would help
> (especially since it's the machine language, not assembly language, I
> know; the SH assembler I've used is my own, with its own syntax, so I'm
> having to guess at the meaning of some parts).

One of the private email replies that didn't go to the list (so I can't politely
publicly reply to it and maybe get more people who know stuff chiming in)
suggested trying it under qemu-user (which reproduced the issue! MUCH easier),
and provided better debug output: I got a register dump (with a program counter
I can probably dig through the sh4-linux-musl-objdump -d
generated/unstripped/toybox (or readelf -a) to identify the failing instruction):

Unhandled trap: 0x180
pc=0x3fffe6b0 sr=0x00000001 pr=0x00427c40 fpscr=0x00080000
spc=0x00000000 ssr=0x00000000 gbr=0x004cd9e0 vbr=0x00000000
sgr=0x00000000 dbr=0x00000000 delayed_pc=0x00451644 fpul=0x00000000
r0=0x3fffe6b0 r1=0x00000000 r2=0x00000000 r3=0x000000af
r4=0x00000002 r5=0x00481afc r6=0x407fffd0 r7=0x00000008
r8=0x3fffe6b0 r9=0x00456bb0 r10=0x004cea74 r11=0x3fffe6b0
r12=0x3fffe510 r13=0x00000000 r14=0x00456fd0 r15=0x407ffe88
r16=0x00000000 r17=0x00000000 r18=0x00000000 r19=0x00000000
r20=0x00000000 r21=0x00000000 r22=0x00000000 r23=0x00000000

And that ALSO says it's a trap 0x180 which in qemu:

sh7750_regs.h:#define SH7750_EVT_ILLEGAL_INSTR       0x180 /* General
Illegal Instruction */

I boggle. (I also tried backing up in qemu to see where it's generated from, but
alas this is MODERN qemu: the macro defined there is never used in the code, and
the fprintf() is the return code from a function that wraps a function pointer
call for a variable that is never assigned to in the sh architecture, so
probably initialized by a macro I can't grep for. Digging is ongoing.

He also pointed me at https://sourceware.org/bugzilla/show_bug.cgi?id=27543
which is interesting, but neither sigsetjmp.s nor the setjmp.S it calls have
those two floating point instructions. (Although it saves floating point
registers by number so... is this a synonym for the same thing? Floating point
flags in weird state throwing an exception that's showing up as illegal
instruction but is actually closer to a division by zero error or overflow or
something? Touched floating point register before setting FPU mode? Dunno. Hmmm,
is any of the code between the start of the function and the failure point doing
floating point math? There isn't any in toysh, but I can't guaratantee libc
functions like sprintf() don't use some, and somehow leave the FPU in a weird
state that faults trying to dump its registers? I'm guessing here...)

> [%] Having HTTP support meant I could just look at the http: version
>     instead of needing to wait until I could use a work machine.
> 
>> (The problem with trying to configure the kernel to produce core
>> dumps and compare against the readelf -d output is it's running as
>> PID 1.  [...])
> 
> Why is that a problem?  I don't see any statement of what kernel you're
> running under, but I can think of two plausible reasons offhand: (1)
> the kernel refuses to coredump PID 1 under any circumstances or (2)
> there's no writable filesytem to take a coredump on at that point.

The kernel panics immediately upon PID 1 exiting and even if the panic is
deferred until after it's written the core dump instead of a check at the START
of exiting, the writeable filesystem is initramfs which is transient.

Best case scenario would be _if_ the panic happens at the _end_ of exiting
(highly unlikely, but maybe patchable) setting up a network block device and
making it O_DIRECT somehow so the data goes out before the exit without being
delayed by disk cache or nagle or kernel tasklets being asynchronous or anything.

Once upon a time (like 2.0 or something) the kernel continued processing network
packets and such after panic, so setting up your firewall rules and then
intentionally panicing the kernel was considered the most secure way to set up a
Linux router. (Try exploiting a system with NO USERSPACE.) But alas, the kernel
got "improved" so that no longer works. (The theory was freeze file IO _now_
because we dunno what's corrupted, so flushing caches to disk and/or network
filesystems may make things worse, so STOP EVERYTHING and preserve as much
forensic evidence as possible in case of kernel crash dumps or kgdb or kexec on
panic or similar. Needing to keep the device you're writing kernel crash dumps
to active was, of course, one of those truly funky sequencing issues the kernel
got subtly wrong for many years, but the plumbing rewrite that gave us sysfs and
years of working on suspend sequencing finally straightened out the dependencies
I think?)

> To address (1), I'd just build a kernel with that test diked out.
> 
> To address (2), I'd normally netboot.  It that's not feasible for some
> reason, I'd probably hack on the kernel to remount / read-write before
> starting userland.

A) I believe you can still pass rw on the kernel command line, B) you can run a
dumb little statically linked shim.c as rdinit= to do stuff and then have it
exec() the next PID 1 process, that's fairly standard procedure in this context.

Don't have to modify the kernel for either, but "file reliably written out as
kernel is in the process of panicing"... I already mentioned kgdb, right?
There's a way to get a serial console out of it so the kernel itself is acting
as your debugger:

https://www.kernel.org/doc/html/v4.14/dev-tools/kgdb.html

There's some sort of unholy sacrifice a chicken in the summoning circle layering
violations going on when this happens, but yes you can panic to a kgdb console.
I've done it! Not recently though. (I suspending linux with kgdb and then
resuming still RCU timeout city on a modern kernel? Or did they fix that?)

But I only pull out gdb when I'm REALLY annoyed. (Cure worse than the disease.
Can't STAND the user interface...)

> Of course, you said qemu-something, so you are presumably running under
> emulation.  In principle, you could figure this out from emulator
> traces, but that is likely to be both extremely difficult and extremely
> tedious.
> 
> But - you said memset-to-zero on the struct ran but didn't stop it from
> failing.  I'd try memset to various other values, to see if you can
> find one that makes it stop crashing.

Except sigsetjmp() is writing to the structure. The function is not supposed to
be reading from the structure. The memset() was to dirty the memory so I could
be sure there wasn't some sort of -EACCESS or a soft fault from stack growth
somehow(?) causing a hiccup.

Rob

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [musl] [Toybox] Not sure how to debug this one.
  2024-02-17 13:32   ` Rob Landley
@ 2024-02-17 15:01     ` Thorsten Glaser
  2024-02-17 15:21     ` [musl] " Mouse
  1 sibling, 0 replies; 21+ messages in thread
From: Thorsten Glaser @ 2024-02-17 15:01 UTC (permalink / raw)
  To: musl; +Cc: toybox

Rob Landley dixit:

>One of the private email replies that didn't go to the list (so I can't politely
>publicly reply to it and maybe get more people who know stuff chiming in)

Sorry, I should have mentioned I only replied privately because
I threw a ton of vague ideas which I wasn’t even sure were at all
relevant at you and wanted to avoid spamming the list. Feel free
to quote.

>On 2/16/24 21:23, Mouse wrote:

>> This smells to me like depending on uninitialized stack trash.
>
>A write-only function that didn't change its behavior when I memset the
>structure before calling it?

More shoots into the blue:

• run under valgrind
• see if the compiler didn’t optimise the memset away or use
  explicit_bzero instead

>> The simple way to figure that out is to compile something that uses
>> printf and look at the assembly,
>
>$ ccc/sh4-linux-musl-cross/bin/sh4-linux-musl-objdump -d
>generated/unstripped/toybox | grep -A 60 '<printf>:'
>0045341c <printf>:
[…]

I think Mouse meant the call site, not the callee.

>But again, the point is to SAVE those registers, in a defined order, and there's
>no WAY to insert something that big into delicate assembly non-intrusively. This

Yeah, tricky if one does not know the architecture…

>relocated itself yet so I assembled a message into a char buffer[] on the stack
>and did a syscall(_nr_write).

That would be another idea for here, but again, complexity…

>Oh, I forgot to mention that qemu-system-blah also has a -s option to launch a

This is useful, but often not so much to debug userspace.
I tend to try and figure out where userspace is running at,
then set a breakpoint and let the kernel run until there,
but you’d most likely also get interrupts and stuff, hence
the qemu-user suggestion.

>I do not always have the relevant domain expertise, which is why I try to ask
>people who _do_:
>
>https://lkml.org/lkml/2011/12/14/324

(Yes, the reloc should totally be split; it’s also possible
to have only a HI load, for example if it’s known the lower
ten bits to be zero. I’ve seen that on BSD/sparc.)

>suggested trying it under qemu-user (which reproduced the issue! MUCH easier),

Oh, good!

>And that ALSO says it's a trap 0x180 which in qemu:
>
>sh7750_regs.h:#define SH7750_EVT_ILLEGAL_INSTR       0x180 /* General
>Illegal Instruction */

O̲U̲C̲H̲.̲

>The kernel panics immediately upon PID 1 exiting and even if the panic is
>deferred until after it's written the core dump instead of a check at the START
>of exiting, the writeable filesystem is initramfs which is transient.

Can you do something like put the stuff from the initramfs instead
onto a normal filesystem on loopback of a file on your host, then
provide that as nbd, then boot the kernel with root=/dev/nbd0… or
something, nbd needs a tool to set this up first… or NFS, is there
a kernel-side NFS that doesn’t rely on initramfs setup?

Or, hey, for on-qemu-system debugging, just plug that as -hda or so.

>A) I believe you can still pass rw on the kernel command line

probably even rdsetroot…

>B) you can run a
>dumb little statically linked shim.c as rdinit= to do stuff and then have it
>exec() the next PID 1 process, that's fairly standard procedure in this context.

Indeed, that would be the next avenue.

>But I only pull out gdb when I'm REALLY annoyed. (Cure worse than the disease.
>Can't STAND the user interface...)

Having used Borland Turbo Debugger before, I concur, but I’ve had to
pull out gdb often enough to at least find my way around enough for
these kinds of debugging necessary.

Good luck,
//mirabilos
-- 
<ch> you introduced a merge commit        │<mika> % g rebase -i HEAD^^
<mika> sorry, no idea and rebasing just fscked │<mika> Segmentation
<ch> should have cloned into a clean repo      │  fault (core dumped)
<ch> if I rebase that now, it's really ugh     │<mika:#grml> wuahhhhhh

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [musl] Re: [Toybox] Not sure how to debug this one.
  2024-02-17 13:32   ` Rob Landley
  2024-02-17 15:01     ` [musl] " Thorsten Glaser
@ 2024-02-17 15:21     ` Mouse
  1 sibling, 0 replies; 21+ messages in thread
From: Mouse @ 2024-02-17 15:21 UTC (permalink / raw)
  To: toybox, musl

>> This smells to me like depending on uninitialized stack trash.
> A write-only function that didn't change its behavior when I memset
> the structure before calling it?

The intent may be write-only.  The implementation may not be.

It also may be using _other_ stack trash, what would in a normal
function be its own local variables.  Continuing to fail when you've
zeroed the struct argues for this, actually; while it's _possible_ that
it would succeed with nonzero values somewhere in the struct, it
strikes me - and apparently you :-) - as a good deal less likely.

> Define "uninitialized".

"Contains junk left around from previous use of that memory",
approximately.

> (Unless you mean an uninitialized variable inside a function written
> entirely in assembly, that's part of a C library shipped and used by
> many people for many years?)

Hey, I found a use of uninitialized memory in NetBSD's libc, once.
(Admittedly, that one was concealed.)

>>> And I dunno how to stick a printf into superh assembly code.
>> The simple way to figure that out is to compile something that uses
>> printf and look at the assembly,

> 0045341c <printf>:
>   45341c:	86 2f       	mov.l	r8,@-r15
>   45341e:	f8 e2       	mov	#-8,r2
>   453420:	22 4f       	sts.l	pr,@-r15
[...]

> That's just the vfprintf() wrapper,

I actually meant to look at the assembly for the _call_ to printf.
That's what you'd want to mimic if you want to hand-code such a call,
after all.

> No, I would wind up CALLING the function, meaning set up a call
> stack, but how you're supposed to do that in the middle of setjmp()
> without corrupting the registers you're supposed to be saving...

That's what I was talking about when I wrote

>> But, given what sigsetjmp is, sticking a printf in there is likely
>> to be more difficult than usual.

> even manually making a _system_call_ in that context is... I mean
> it's _documented_ [...]

> But again, the point is to SAVE those registers, in a defined order,
> and there's no WAY to insert something that big into delicate
> assembly non-intrusively.  This already heisenbugs if my dprintf() is
> too elaborate.

Yes; that's in large part why I think it feels like use of stack trash.
That's probably the commonest cause of heisenbugs in my experience.

I think I would do that by saving registers in a global save area and
switching the stack pointer to separate, data-space, memory for the
duration of the printf call(s), so as to not disturb whatever is on the
stack.

Another thing I'd try is to insert code to modify (zero, set to all 1
bits, whatever - I'd experiment with various things) the theoretically
unused stack space below/above the current top-of-stack.  (Below in
terms of memory addresses, assuming of course that the ABI in use uses
a downward-growing stack; above in terms of stack depth.)

> But that's not what I was asking about HERE. "Magic blob of assembly
> for architecture I'm not hugely familiar with is throwing an
> interrupt, I wonder why?"

Fair point.  I was forgetting whom I was writing to.

>> But, given what sigsetjmp is, sticking a printf in there is likely
>> to be more difficult than usual.
> Define "usual".

Doing "the same thing" in an ordinary function, one that's not playing
unusual games with registers (and maybe stack space).

> One of the private email replies [...] suggested trying it under
> qemu-user (which reproduced the issue! MUCH easier),

Oh, good!  But....

> Unhandled trap: 0x180

> And that ALSO says it's a trap 0x180 which in qemu:

> sh7750_regs.h:#define SH7750_EVT_ILLEGAL_INSTR       0x180 /* General
> Illegal Instruction */

This leads me to wonder if it really _did_ reproduce the issue, as
opposed to breaking in a different way.

>>> (The problem with trying to configure the kernel to produce core
>>> dumps and compare against the readelf -d output is it's running as
>>> PID 1.  [...])
>> Why is that a problem?  I don't see any statement of what kernel
>> you're running under, but I can think of two plausible reasons
>> offhand: (1) the kernel refuses to coredump PID 1 under any
>> circumstances or (2) there's no writable filesytem to take a
>> coredump on at that point.
> The kernel panics immediately upon PID 1 exiting and even if the
> panic is deferred until after it's written the core dump instead of a
> check at the START of exiting, the writeable filesystem is initramfs
> which is transient.

:-(

The more I see of Linux the more I wonder why so many developers like
it so much.  But I suppose plenty of Linux people feel, or would feel,
analogously about my preferred environments....

> Once upon a time (like 2.0 or something) the kernel continued
> processing network packets and such after panic, so setting up your
> firewall rules and then intentionally panicing the kernel was
> considered the most secure way to set up a Linux router.

Elegant!  Twisted, but elegant!

> But I only pull out gdb when I'm REALLY annoyed.  (Cure worse than
> the disease.  Can't STAND the user interface...)

I can...tolerate it.  Though the gdb people really seem to be doing
their best to make it less and less tolerable; every once in a while,
when I tangle with a newer version at work, I have to go digging to
figure out how to turn off yet another obnoxious piece.  (The last
iteration of this, I actually had to dive into the source; their
documentation is only barely better than nothing.)

The closest thing I can recall using to a _good_ debugger for C was
probably ups.  Wish I'd kept a copy; it'd be an interesting task to
port it to something more modern, and if I succeeded it would be really
nice to have.

>> [...]
> Except sigsetjmp() is writing to the structure.  The function is not
> supposed to be reading from the structure.

It's not supposed to be crashing, either.

But, given what you said above, I now think it more likely it's using
stack trash from elsewhere on the stack.

Does qemu have the ability to track the initialized status of memory, a
la valgrind?  (I'd suggest valgrind itself except that (a) I think it
doesn't support Super-H, (b) I'd be surprised if you hadn't already
thought of it, and (c) it's intrusive enough it could well disturb a
heisenbug like this one.)  I added uninitialized tracking to my
userland SPARC emulator; it's what let me find that uninitialized
memory use in libc I mentioned above.  Something like that would be the
next tool I'd be looking for for this one.  (Indeed, if it were my
headache I'd likely take my userland SPARC emulator and replace the CPU
code with Super-H emulation.)

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4Bl

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [musl] Not sure how to debug this one.
  2024-02-17  1:48 [musl] Not sure how to debug this one Rob Landley
  2024-02-17  3:23 ` [musl] Re: [Toybox] " Mouse
@ 2024-02-17 17:02 ` Rich Felker
  2024-02-17 21:45 ` [musl] " Valery Ushakov
  2 siblings, 0 replies; 21+ messages in thread
From: Rich Felker @ 2024-02-17 17:02 UTC (permalink / raw)
  To: Rob Landley; +Cc: toybox, musl

On Fri, Feb 16, 2024 at 07:48:27PM -0600, Rob Landley wrote:
> While grinding away at release prep, I hit a WEIRD one. The qemu-system-sh4
> target got broken by commit 3e0e8c687eee (PID 1 exits trying to run the init
> script), which is the commit that changed the stdout buffering type.
> 
> It's not the kernel, if I use the last release kernel with the new root
> filesystem I see the problem, and newly built kernel from today's git with last
> release's initramfs.cpio.gz boots to a shell prompt.
> 
> The actual _problem_ is that sigsetjmp() is faulting (in sh.c function
> run_command()), for NO OBVIOUS REASON. Calling memset() to zero the struct
> before the sigsetjmp() works fine, but the sigsetjmp() call (built against
> musl-libc) never returns.
> 
> Not siglongjmp, _sigsetjmp_. Which means it's failing somewhere in:
> 
> https://git.musl-libc.org/cgit/musl/tree/src/signal/sh/sigsetjmp.s
> 
> And I dunno how to stick a printf into superh assembly code.

Rather than "stick a printf in there", can you identify (with gdb or
strace or qemu user execution tracing) exactly which instruction it's
crashing at, and the register values at the point of crash?

Provided it was called with a valid pointer to the sigjmp_buf, there
should be no way the initial call to sigsetjmp can segfault. The only
memory accesses it makes are to that object. It does make a call to
setjmp, which in theory could clobber the call-saved r8 containing
sigjmp_buf address, but setjmp does not do that. It's possible that,
on second return, this has been clobbered; even a single-byte buffer
overflow into the sigjmp_buf would do that, and sh may be unique in
having the relevant register at the beginning of the buffer, which
could explain it happening only on sh. But that would affect second
return not the first.

> The sigjmp_buf lives on the stack, but I confirmed it's 8 byte aligned, and not
> even straddling a page boundary. I can access variables I stick before and after
> it, so it can't be some kind of "fault due to guard page" weirdness? (I suppose
> the optimizer may be invalidating that test, I could try adding "volatile"...)
> 
> While debugging I made the problem GO AWAY more than once by sticking printfs()
> and similar into the code, but that's not FIXING it. Adding another sigjmp_buf
> declaration and call to sigsetjmp() right at the start of the function works
> fine (although the other one in the place it's in now still fails). I confirmed

This all suggests that there's a buffer overflow and shuffling things
around on the stack is preventing it. Have you tried running (even on
unaffected archs) under valgrind to look for such errors?

Rich

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [musl] Re: Not sure how to debug this one.
  2024-02-17  1:48 [musl] Not sure how to debug this one Rob Landley
  2024-02-17  3:23 ` [musl] Re: [Toybox] " Mouse
  2024-02-17 17:02 ` [musl] " Rich Felker
@ 2024-02-17 21:45 ` Valery Ushakov
  2024-02-17 23:09   ` Thorsten Glaser
  2024-02-18  1:34   ` Rich Felker
  2 siblings, 2 replies; 21+ messages in thread
From: Valery Ushakov @ 2024-02-17 21:45 UTC (permalink / raw)
  To: musl

On Fri, Feb 16, 2024 at 19:48:27 -0600, Rob Landley wrote:

> https://git.musl-libc.org/cgit/musl/tree/src/signal/sh/sigsetjmp.s

I haven't touched superh asm in a while and the code has zero
comments (*ugh*), but I *think* sigsetjmp clobbers caller's r8.

r8 is callee saved.  sigsetjmp wants to use it to save its env
argument across the call to __setjmp.  So it saves caller's r8 and
uses r8 to save its env b/c __setjmp it's about to call will clobber
it.  Then __setjmp saves this r8 = env in the jump buf, not caller's
r8.  The instruction in the delay slot of the tail call to
__sigsetjmp_tail vaguely looks like it might have been intended to
patch it, but it loads r8, not stores it. I'm not sure why it would
want to load r8 at that point.

Sorry, I only skimmed through the code and as I said, there're no
comments (which for asm code is borderline criminal, IMHO :) so I
might be completely misinterpreting what this code does...

-uwe

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [musl] Re: Not sure how to debug this one.
  2024-02-17 21:45 ` [musl] " Valery Ushakov
@ 2024-02-17 23:09   ` Thorsten Glaser
  2024-02-18 12:15     ` [musl] " Valery Ushakov
  2024-02-18  1:34   ` Rich Felker
  1 sibling, 1 reply; 21+ messages in thread
From: Thorsten Glaser @ 2024-02-17 23:09 UTC (permalink / raw)
  To: musl

Valery Ushakov dixit:

>comments (*ugh*), but I *think* sigsetjmp clobbers caller's r8.
>
>r8 is callee saved.

Say, is there a handy list of which registers are preserved
or clobbered across function calls like the one for syscalls
from Linux man-pages’ syscall(2) for all arches?

I have indeed chased this same problem on unfamiliar arches
often enough as well…

bye,
//mirabilos
-- 
When he found out that the m68k port was in a pretty bad shape, he did
not, like many before him, shrug and move on; instead, he took it upon
himself to start compiling things, just so he could compile his shell.
How's that for dedication. -- Wouter, about my Debian/m68k revival

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [musl] Re: Not sure how to debug this one.
  2024-02-17 21:45 ` [musl] " Valery Ushakov
  2024-02-17 23:09   ` Thorsten Glaser
@ 2024-02-18  1:34   ` Rich Felker
  2024-02-18  1:40     ` Rich Felker
                       ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: Rich Felker @ 2024-02-18  1:34 UTC (permalink / raw)
  To: Valery Ushakov; +Cc: musl, Rob Landley, toybox

On Sun, Feb 18, 2024 at 12:45:35AM +0300, Valery Ushakov wrote:
> On Fri, Feb 16, 2024 at 19:48:27 -0600, Rob Landley wrote:
> 
> > https://git.musl-libc.org/cgit/musl/tree/src/signal/sh/sigsetjmp.s
> 
> I haven't touched superh asm in a while and the code has zero
> comments (*ugh*), but I *think* sigsetjmp clobbers caller's r8.
> 
> r8 is callee saved.  sigsetjmp wants to use it to save its env
> argument across the call to __setjmp.  So it saves caller's r8 and
> uses r8 to save its env b/c __setjmp it's about to call will clobber
> it.  Then __setjmp saves this r8 = env in the jump buf, not caller's
> r8.  The instruction in the delay slot of the tail call to
> __sigsetjmp_tail vaguely looks like it might have been intended to
> patch it, but it loads r8, not stores it. I'm not sure why it would
> want to load r8 at that point.
> 
> Sorry, I only skimmed through the code and as I said, there're no
> comments (which for asm code is borderline criminal, IMHO :) so I
> might be completely misinterpreting what this code does...

I think you've stumbled across the bug even if you don't realize it.
:-)

The idea of the sigsetjmp implementation is explained in commit
583e55122e767b1586286a0d9c35e2a4027998ab. As you've noted, comments
would be nice, but I'm not sure where they belong since it's the same
logic on every arch.

sigsetjmp wants to save a position in itself for longjmp to return to,
but because sigsetjmp will be returning, it can't save any state on
the stack; it doesn't have a stack frame. The only storage it has
available is call-saved registers, which longjmp will necessarily
restore for it. So, it picks one of its caller's call-saved registers
(r8) and stores that inside the sigjmp_buf, then uses that register to
preserve the pointer to the sigjmp_buf to access after setjmp returns
(either the first time or the second time).

The only problem with the sh implementation is that, due to limited
displacement range for load/store instructions, we have to add 60 to
the sigjmp_buf base address to get the base register for accessing the
saved return address and r8. This takes place in a temp register r6.
However, the last line of this code path, the delay slot after the
tail call to __sigsetjmp_tail, erroneously uses r4 (the base
sigjmp_buf pointer) as the base for restoring r8:

	 mov.l @(4+8,r4), r8

This corrupts the caller's saved register file. If I'm not mistaken,
it overwrites the caller's r8 with the caller's r11.

Presumably Rob didn't actually hit a crash in sigsetjmp, but just
inferred the crash was there via printf debugging. The actual crash
must be in the caller once it gets back control with the wrong value
in r8.

This has been an area I've wanted to have testing for for a long time,
but I don't have a really good idea for how to implement the test. We
would want the compiler to generate code that puts lots of
intermediates in as many call-saved registers as possible, calls
setjmp, then calls another function that *also* puts pressure on the
register allocation and then calls longjmp. If it's okay for the test
to be arch-specific, this could just work via __asm__ register
constraints and a call to setjmp from asm, but I'm not sure if that
would be a good way to do things in libc-test...

In any case, thanks to everyone who participated in this. This is a
rather bad bug and a nice one to get fixed up in the release window!

Rich

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [musl] Re: Not sure how to debug this one.
  2024-02-18  1:34   ` Rich Felker
@ 2024-02-18  1:40     ` Rich Felker
  2024-02-18 12:55       ` Valery Ushakov
  2024-02-19 17:54       ` Rob Landley
  2024-02-18 12:47     ` Valery Ushakov
  2024-02-19 13:12     ` Rob Landley
  2 siblings, 2 replies; 21+ messages in thread
From: Rich Felker @ 2024-02-18  1:40 UTC (permalink / raw)
  To: Valery Ushakov; +Cc: musl, Rob Landley, toybox

[-- Attachment #1: Type: text/plain, Size: 3643 bytes --]

On Sat, Feb 17, 2024 at 08:34:28PM -0500, Rich Felker wrote:
> On Sun, Feb 18, 2024 at 12:45:35AM +0300, Valery Ushakov wrote:
> > On Fri, Feb 16, 2024 at 19:48:27 -0600, Rob Landley wrote:
> > 
> > > https://git.musl-libc.org/cgit/musl/tree/src/signal/sh/sigsetjmp.s
> > 
> > I haven't touched superh asm in a while and the code has zero
> > comments (*ugh*), but I *think* sigsetjmp clobbers caller's r8.
> > 
> > r8 is callee saved.  sigsetjmp wants to use it to save its env
> > argument across the call to __setjmp.  So it saves caller's r8 and
> > uses r8 to save its env b/c __setjmp it's about to call will clobber
> > it.  Then __setjmp saves this r8 = env in the jump buf, not caller's
> > r8.  The instruction in the delay slot of the tail call to
> > __sigsetjmp_tail vaguely looks like it might have been intended to
> > patch it, but it loads r8, not stores it. I'm not sure why it would
> > want to load r8 at that point.
> > 
> > Sorry, I only skimmed through the code and as I said, there're no
> > comments (which for asm code is borderline criminal, IMHO :) so I
> > might be completely misinterpreting what this code does...
> 
> I think you've stumbled across the bug even if you don't realize it.
> :-)
> 
> The idea of the sigsetjmp implementation is explained in commit
> 583e55122e767b1586286a0d9c35e2a4027998ab. As you've noted, comments
> would be nice, but I'm not sure where they belong since it's the same
> logic on every arch.
> 
> sigsetjmp wants to save a position in itself for longjmp to return to,
> but because sigsetjmp will be returning, it can't save any state on
> the stack; it doesn't have a stack frame. The only storage it has
> available is call-saved registers, which longjmp will necessarily
> restore for it. So, it picks one of its caller's call-saved registers
> (r8) and stores that inside the sigjmp_buf, then uses that register to
> preserve the pointer to the sigjmp_buf to access after setjmp returns
> (either the first time or the second time).
> 
> The only problem with the sh implementation is that, due to limited
> displacement range for load/store instructions, we have to add 60 to
> the sigjmp_buf base address to get the base register for accessing the
> saved return address and r8. This takes place in a temp register r6.
> However, the last line of this code path, the delay slot after the
> tail call to __sigsetjmp_tail, erroneously uses r4 (the base
> sigjmp_buf pointer) as the base for restoring r8:
> 
> 	 mov.l @(4+8,r4), r8
> 
> This corrupts the caller's saved register file. If I'm not mistaken,
> it overwrites the caller's r8 with the caller's r11.
> 
> Presumably Rob didn't actually hit a crash in sigsetjmp, but just
> inferred the crash was there via printf debugging. The actual crash
> must be in the caller once it gets back control with the wrong value
> in r8.
> 
> This has been an area I've wanted to have testing for for a long time,
> but I don't have a really good idea for how to implement the test. We
> would want the compiler to generate code that puts lots of
> intermediates in as many call-saved registers as possible, calls
> setjmp, then calls another function that *also* puts pressure on the
> register allocation and then calls longjmp. If it's okay for the test
> to be arch-specific, this could just work via __asm__ register
> constraints and a call to setjmp from asm, but I'm not sure if that
> would be a good way to do things in libc-test...
> 
> In any case, thanks to everyone who participated in this. This is a
> rather bad bug and a nice one to get fixed up in the release window!

The attached patch should fix it.

Rich

[-- Attachment #2: 0001-sh-fix-sigsetjmp-corrupting-call-saved-register-r8.patch --]
[-- Type: text/plain, Size: 747 bytes --]

From 7020e85fd768be02e7f5971f1707229407cfa1e4 Mon Sep 17 00:00:00 2001
From: Rich Felker <dalias@aerifal.cx>
Date: Sat, 17 Feb 2024 20:36:42 -0500
Subject: [PATCH] sh: fix sigsetjmp corrupting call-saved register r8

due to incorrect base address register when attempting to reload the
saved value of r8, the caller's value of r8 was not preserved.
---
 src/signal/sh/sigsetjmp.s | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/signal/sh/sigsetjmp.s b/src/signal/sh/sigsetjmp.s
index 1e2270be..f0f604e2 100644
--- a/src/signal/sh/sigsetjmp.s
+++ b/src/signal/sh/sigsetjmp.s
@@ -27,7 +27,7 @@ __sigsetjmp:
 
 	mov.l 3f, r0
 4:	braf r0
-	 mov.l @(4+8,r4), r8
+	 mov.l @(4+8,r6), r8
 
 9:	mov.l 5f, r0
 6:	braf r0
-- 
2.21.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [musl] Re: Re: Not sure how to debug this one.
  2024-02-17 23:09   ` Thorsten Glaser
@ 2024-02-18 12:15     ` Valery Ushakov
  2024-02-18 22:51       ` [musl] " Thorsten Glaser
  0 siblings, 1 reply; 21+ messages in thread
From: Valery Ushakov @ 2024-02-18 12:15 UTC (permalink / raw)
  To: musl

On Sat, Feb 17, 2024 at 23:09:24 +0000, Thorsten Glaser wrote:

> Valery Ushakov dixit:
> 
> >comments (*ugh*), but I *think* sigsetjmp clobbers caller's r8.
> >
> >r8 is callee saved.
> 
> Say, is there a handy list of which registers are preserved
> or clobbered across function calls like the one for syscalls
> from Linux man-pages’ syscall(2) for all arches?
> 
> I have indeed chased this same problem on unfamiliar arches
> often enough as well…

Calling conventions etc are defined in architecture's ELF psABI (ps =
processor specific, as opposed to generic (gABI)).  Though it seems
this terminology is now a bit dated.  E.g. Renesas ABI document still
has a running title of "SH-4 Generic and Specific ABI", where you can
see the echo of "generic" and "specific", but the word "processor"
never even occurs in it once.

The ELF ABI spec used to be hosted by (the old) SCO, iirc.  I'm not
sure what is the official source of truth nowadays of if even there is
one.

I would just google for "$arch psABI" and/or "$arch calling convention" :)

-uwe

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [musl] Re: Not sure how to debug this one.
  2024-02-18  1:34   ` Rich Felker
  2024-02-18  1:40     ` Rich Felker
@ 2024-02-18 12:47     ` Valery Ushakov
  2024-02-19 13:12     ` Rob Landley
  2 siblings, 0 replies; 21+ messages in thread
From: Valery Ushakov @ 2024-02-18 12:47 UTC (permalink / raw)
  To: musl

On Sat, Feb 17, 2024 at 20:34:28 -0500, Rich Felker wrote:

> As you've noted, comments would be nice, but I'm not sure where they
> belong since it's the same logic on every arch.

In all of them?  You are looking at it from the PoV of someone who
maintains all of them and for you they are all in the picture at once.
But consider someone who tries to read one specific file in isolation.
Also in asm code the gory details of the specific CPU/ABI are mixed in
with the actual logic of what the code does, so at least in that
regard the comments will differ.  After 20+ years of working on and
off on netbsd code base, hopping all over the tree component-wise and
across half a dozen cpus for asm code, sometimes with gaps measured in
years - I've grown to appreciate well commented code that lets me
quickly swap the context into my brain when get back to that code a
decade later.

Like, what is jmp_buf::__fl? :) It's probably referred to in asm files
by a magic number, so there's just one instance of this name in the
source tree and it's not commented and is not cross-referencable with
the asm code without doing the math (after checking MD definitions).

-uwe

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [musl] Re: Not sure how to debug this one.
  2024-02-18  1:40     ` Rich Felker
@ 2024-02-18 12:55       ` Valery Ushakov
  2024-02-18 14:33         ` Rich Felker
  2024-02-19 17:54       ` Rob Landley
  1 sibling, 1 reply; 21+ messages in thread
From: Valery Ushakov @ 2024-02-18 12:55 UTC (permalink / raw)
  To: musl, toybox

On Sat, Feb 17, 2024 at 20:40:50 -0500, Rich Felker wrote:

> due to incorrect base address register when attempting to reload the
> saved value of r8, the caller's value of r8 was not preserved.
> ---
>  src/signal/sh/sigsetjmp.s | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/signal/sh/sigsetjmp.s b/src/signal/sh/sigsetjmp.s
> index 1e2270be..f0f604e2 100644
> --- a/src/signal/sh/sigsetjmp.s
> +++ b/src/signal/sh/sigsetjmp.s
> @@ -27,7 +27,7 @@ __sigsetjmp:
>  
>  	mov.l 3f, r0
>  4:	braf r0
> -	 mov.l @(4+8,r4), r8
> +	 mov.l @(4+8,r6), r8
>  
>  9:	mov.l 5f, r0
>  6:	braf r0

That takes care of restoring caller's r8 for the first return from
sigsetjmp, but isn't there still the problem that the jump buffer
contains the wrong one, so on the second return from sigsetjmp the
caller will have clobbered r8?

Sorry for a drive-by reply.  I'll try to take a closer look in the
evening.


-uwe

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [musl] Re: Not sure how to debug this one.
  2024-02-18 12:55       ` Valery Ushakov
@ 2024-02-18 14:33         ` Rich Felker
  2024-02-18 15:06           ` Valery Ushakov
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2024-02-18 14:33 UTC (permalink / raw)
  To: Valery Ushakov; +Cc: musl, toybox

[-- Attachment #1: Type: text/plain, Size: 1520 bytes --]

On Sun, Feb 18, 2024 at 03:55:36PM +0300, Valery Ushakov wrote:
> On Sat, Feb 17, 2024 at 20:40:50 -0500, Rich Felker wrote:
> 
> > due to incorrect base address register when attempting to reload the
> > saved value of r8, the caller's value of r8 was not preserved.
> > ---
> >  src/signal/sh/sigsetjmp.s | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/src/signal/sh/sigsetjmp.s b/src/signal/sh/sigsetjmp.s
> > index 1e2270be..f0f604e2 100644
> > --- a/src/signal/sh/sigsetjmp.s
> > +++ b/src/signal/sh/sigsetjmp.s
> > @@ -27,7 +27,7 @@ __sigsetjmp:
> >  
> >  	mov.l 3f, r0
> >  4:	braf r0
> > -	 mov.l @(4+8,r4), r8
> > +	 mov.l @(4+8,r6), r8
> >  
> >  9:	mov.l 5f, r0
> >  6:	braf r0
> 
> That takes care of restoring caller's r8 for the first return from
> sigsetjmp, but isn't there still the problem that the jump buffer
> contains the wrong one, so on the second return from sigsetjmp the
> caller will have clobbered r8?
> 
> Sorry for a drive-by reply.  I'll try to take a closer look in the
> evening.

No, that's the return path for both returns.

The whole reason a call-saved register like r8 is used here is so that
we can return twice into the body of sigsetjmp, in order to tailcall
__sigsetjmp_tail at both the first return and subsequent return. This
is what makes it possible to restore the signal mask from the
returned-to frame rather than the returning-from frame (which is why
the attached doesn't crash with stack overflow on musl like it does on
glibc).

Rich

[-- Attachment #2: sigsetjmp_of.c --]
[-- Type: text/plain, Size: 260 bytes --]

#include <signal.h>
#include <setjmp.h>

static volatile long cnt = 1000000;
sigjmp_buf jb;

void handle(int s)
{
	if (!cnt--) return;
	raise(s);
	siglongjmp(jb, 1);
}

int main()
{
	if (sigsetjmp(jb, 1)) return 0;
	signal(SIGALRM, handle);
	raise(SIGALRM);
}

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [musl] Re: Not sure how to debug this one.
  2024-02-18 14:33         ` Rich Felker
@ 2024-02-18 15:06           ` Valery Ushakov
  2024-02-18 20:33             ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Valery Ushakov @ 2024-02-18 15:06 UTC (permalink / raw)
  To: musl, toybox

On Sun, Feb 18, 2024 at 09:33:13 -0500, Rich Felker wrote:

> On Sun, Feb 18, 2024 at 03:55:36PM +0300, Valery Ushakov wrote:
> > On Sat, Feb 17, 2024 at 20:40:50 -0500, Rich Felker wrote:
> > 
> > > due to incorrect base address register when attempting to reload the
> > > saved value of r8, the caller's value of r8 was not preserved.
> > > ---
> > >  src/signal/sh/sigsetjmp.s | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/src/signal/sh/sigsetjmp.s b/src/signal/sh/sigsetjmp.s
> > > index 1e2270be..f0f604e2 100644
> > > --- a/src/signal/sh/sigsetjmp.s
> > > +++ b/src/signal/sh/sigsetjmp.s
> > > @@ -27,7 +27,7 @@ __sigsetjmp:
> > >  
> > >  	mov.l 3f, r0
> > >  4:	braf r0
> > > -	 mov.l @(4+8,r4), r8
> > > +	 mov.l @(4+8,r6), r8
> > >  
> > >  9:	mov.l 5f, r0
> > >  6:	braf r0
> > 
> > That takes care of restoring caller's r8 for the first return from
> > sigsetjmp, but isn't there still the problem that the jump buffer
> > contains the wrong one, so on the second return from sigsetjmp the
> > caller will have clobbered r8?
> > 
> > Sorry for a drive-by reply.  I'll try to take a closer look in the
> > evening.
> 
> No, that's the return path for both returns.
>
> The whole reason a call-saved register like r8 is used here is so
> that we can return twice into the body of sigsetjmp, in order to
> tailcall __sigsetjmp_tail at both the first return and subsequent
> return.

Doh, right!  Sorry.  A comment to that effect to alert the reader
would certainly have helped :) Neat trick that I missed on the quick
reading.


> This is what makes it possible to restore the signal mask from the
> returned-to frame rather than the returning-from frame (which is why
> the attached doesn't crash with stack overflow on musl like it does
> on glibc).

Restoring the context in siglongjmp should not be a problem per-se.
NetBSD libc does that and the example code doesn't crash there (quick
unscientific test on a ppc that I happen to have a terminal open on).
But then NetBSD libc doesn't bother to carefully factor that code to
minimize the need for MD asm.

Thanks, and sorry for the noise.

-uwe

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [musl] Re: Not sure how to debug this one.
  2024-02-18 15:06           ` Valery Ushakov
@ 2024-02-18 20:33             ` Rich Felker
  2024-02-19 11:00               ` Valery Ushakov
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2024-02-18 20:33 UTC (permalink / raw)
  To: Valery Ushakov; +Cc: musl, toybox

On Sun, Feb 18, 2024 at 06:06:46PM +0300, Valery Ushakov wrote:
> On Sun, Feb 18, 2024 at 09:33:13 -0500, Rich Felker wrote:
> 
> > On Sun, Feb 18, 2024 at 03:55:36PM +0300, Valery Ushakov wrote:
> > > On Sat, Feb 17, 2024 at 20:40:50 -0500, Rich Felker wrote:
> > > 
> > > > due to incorrect base address register when attempting to reload the
> > > > saved value of r8, the caller's value of r8 was not preserved.
> > > > ---
> > > >  src/signal/sh/sigsetjmp.s | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/src/signal/sh/sigsetjmp.s b/src/signal/sh/sigsetjmp.s
> > > > index 1e2270be..f0f604e2 100644
> > > > --- a/src/signal/sh/sigsetjmp.s
> > > > +++ b/src/signal/sh/sigsetjmp.s
> > > > @@ -27,7 +27,7 @@ __sigsetjmp:
> > > >  
> > > >  	mov.l 3f, r0
> > > >  4:	braf r0
> > > > -	 mov.l @(4+8,r4), r8
> > > > +	 mov.l @(4+8,r6), r8
> > > >  
> > > >  9:	mov.l 5f, r0
> > > >  6:	braf r0
> > > 
> > > That takes care of restoring caller's r8 for the first return from
> > > sigsetjmp, but isn't there still the problem that the jump buffer
> > > contains the wrong one, so on the second return from sigsetjmp the
> > > caller will have clobbered r8?
> > > 
> > > Sorry for a drive-by reply.  I'll try to take a closer look in the
> > > evening.
> > 
> > No, that's the return path for both returns.
> >
> > The whole reason a call-saved register like r8 is used here is so
> > that we can return twice into the body of sigsetjmp, in order to
> > tailcall __sigsetjmp_tail at both the first return and subsequent
> > return.
> 
> Doh, right!  Sorry.  A comment to that effect to alert the reader
> would certainly have helped :) Neat trick that I missed on the quick
> reading.

Yes. Perhaps a single comment in each asm file pointing to a common
document location (the dummy sigsetjmp.c file would be a good
candidate) would be a good approach. This could also document what
needs to be done when writing a new port.

> > This is what makes it possible to restore the signal mask from the
> > returned-to frame rather than the returning-from frame (which is why
> > the attached doesn't crash with stack overflow on musl like it does
> > on glibc).
> 
> Restoring the context in siglongjmp should not be a problem per-se.
> NetBSD libc does that and the example code doesn't crash there (quick
> unscientific test on a ppc that I happen to have a terminal open on).
> But then NetBSD libc doesn't bother to carefully factor that code to
> minimize the need for MD asm.
> 
> Thanks, and sorry for the noise.

If you restore the signal mask from the returning context rather than
in the returned-to context, there's always the possibility of stack
overflow; in the worst case, this happens on the sigaltstack where
you're specifically taking measures to avoid stack overflow being a
fatal error. The test program is artificial, but the real-world way
this would happen is getting a flood of signals like SIGINT or SIGTSTP
or something coming in faster than you can respond to them, so that
every time you try to return via siglongjmp, you actually consume
another stack frame on the signal stack.

If NetBSD didn't crash, maybe it just has a much larger default stack
size limit? Or maybe they reload sp before calling sigprocmask? That
would work too, but the reason musl doesn't do it that way is that our
setjmp/longjmp are compatible with an old ABI where there is no extra
space in the jmp_buf for sigjmp_buf stuff.

Rich

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [musl] Re: Not sure how to debug this one.
  2024-02-18 12:15     ` [musl] " Valery Ushakov
@ 2024-02-18 22:51       ` Thorsten Glaser
  0 siblings, 0 replies; 21+ messages in thread
From: Thorsten Glaser @ 2024-02-18 22:51 UTC (permalink / raw)
  To: musl

Valery Ushakov dixit:

>Calling conventions etc are defined in architecture's ELF psABI (ps =

I know, but digging through all these documents is tiresome,
and sometimes, they’re not applicable to the OS in question
(I have a vague thing in the back of my head that OpenBSD
derivates on sparc differ in some aspect), or the OS defines
additional nōn-standard extensions, PIC, TLS or whatever…

Having a nice per-arch quick overview of all registers in a
sort of standard order, with quick indicators of preserved,
trashed, stack pointer, instruction pointer, link register,
etc. would be nice (even hardcoded-0 register could be
annotated there, to make things even easier for casual
cross-arch debugging), and then for *all* arches.

Oh well…

bye,
//mirabilos
-- 
<cnuke> den AGP stecker anfeilen, damit er in den slot aufm 440BX board passt…
oder netzteile, an die man auch den monitor angeschlossen hat und die dann für
ein elektrisch aufgeladenes gehäuse gesorgt haben […] für lacher gut auf jeder
LAN party │ <nvb> damals, als der pizzateig noch auf dem monior "gegangen" ist

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [musl] Re: Not sure how to debug this one.
  2024-02-18 20:33             ` Rich Felker
@ 2024-02-19 11:00               ` Valery Ushakov
  0 siblings, 0 replies; 21+ messages in thread
From: Valery Ushakov @ 2024-02-19 11:00 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Sun, Feb 18, 2024 at 15:33:06 -0500, Rich Felker wrote:

> > Restoring the context in siglongjmp should not be a problem per-se.
> > NetBSD libc does that and the example code doesn't crash there (quick
> > unscientific test on a ppc that I happen to have a terminal open on).
> > But then NetBSD libc doesn't bother to carefully factor that code to
> > minimize the need for MD asm.
> > 
> > Thanks, and sorry for the noise.
> 
> If you restore the signal mask from the returning context rather than
> in the returned-to context, there's always the possibility of stack
> overflow; in the worst case, this happens on the sigaltstack where
> you're specifically taking measures to avoid stack overflow being a
> fatal error. The test program is artificial, but the real-world way
> this would happen is getting a flood of signals like SIGINT or SIGTSTP
> or something coming in faster than you can respond to them, so that
> every time you try to return via siglongjmp, you actually consume
> another stack frame on the signal stack.
> 
> If NetBSD didn't crash, maybe it just has a much larger default stack
> size limit? Or maybe they reload sp before calling sigprocmask? That
> would work too, but the reason musl doesn't do it that way is that our
> setjmp/longjmp are compatible with an old ABI where there is no extra
> space in the jmp_buf for sigjmp_buf stuff.

As luck would have it, powerpc was doing the right thing.  We filed a
bug to fix the arches that don't.

Thanks!

-uwe

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [musl] Re: Not sure how to debug this one.
  2024-02-18  1:34   ` Rich Felker
  2024-02-18  1:40     ` Rich Felker
  2024-02-18 12:47     ` Valery Ushakov
@ 2024-02-19 13:12     ` Rob Landley
  2 siblings, 0 replies; 21+ messages in thread
From: Rob Landley @ 2024-02-19 13:12 UTC (permalink / raw)
  To: Rich Felker, Valery Ushakov; +Cc: musl, toybox

On 2/17/24 19:34, Rich Felker wrote:
> On Sun, Feb 18, 2024 at 12:45:35AM +0300, Valery Ushakov wrote:
>> On Fri, Feb 16, 2024 at 19:48:27 -0600, Rob Landley wrote:
>> 
>> > https://git.musl-libc.org/cgit/musl/tree/src/signal/sh/sigsetjmp.s

FYI the next round of debugging I had queued up was cutting and pasting the
sigsetjmp+setjmp assembly blob inline in the code (either as a C asm { }
directive or my own .s->.o nailed to the side of the link) and removing one
instruction from the end of the testfunc() at a time until the crash went away.
(Doesn't matter if it _works_, the point is does it _return_. Find the
instruction that poked the bear.)

I was just AFK most of the weekend...

> The only problem with the sh implementation is that, due to limited
> displacement range for load/store instructions, we have to add 60 to
> the sigjmp_buf base address to get the base register for accessing the
> saved return address and r8. This takes place in a temp register r6.
> However, the last line of this code path, the delay slot after the
> tail call to __sigsetjmp_tail, erroneously uses r4 (the base
> sigjmp_buf pointer) as the base for restoring r8:
> 
> 	 mov.l @(4+8,r4), r8
> 
> This corrupts the caller's saved register file. If I'm not mistaken,
> it overwrites the caller's r8 with the caller's r11.
> 
> Presumably Rob didn't actually hit a crash in sigsetjmp, but just
> inferred the crash was there via printf debugging. The actual crash
> must be in the caller once it gets back control with the wrong value
> in r8.

I saw that the printf after the sigsetjmp did not trigger, yes. Why was the
pending question...

> This has been an area I've wanted to have testing for for a long time,
> but I don't have a really good idea for how to implement the test. We
> would want the compiler to generate code that puts lots of
> intermediates in as many call-saved registers as possible, calls
> setjmp, then calls another function that *also* puts pressure on the
> register allocation and then calls longjmp. If it's okay for the test
> to be arch-specific, this could just work via __asm__ register
> constraints and a call to setjmp from asm, but I'm not sure if that
> would be a good way to do things in libc-test...
> 
> In any case, thanks to everyone who participated in this. This is a
> rather bad bug and a nice one to get fixed up in the release window!

Very much so. Thank you.

I pulled musl-git but didn't see a commit yet, lemme know when I can build a new
test toolchain to confirm the fix.

> Rich

Thanks,

Rob

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [musl] Re: Not sure how to debug this one.
  2024-02-18  1:40     ` Rich Felker
  2024-02-18 12:55       ` Valery Ushakov
@ 2024-02-19 17:54       ` Rob Landley
  2024-02-19 23:05         ` Rich Felker
  1 sibling, 1 reply; 21+ messages in thread
From: Rob Landley @ 2024-02-19 17:54 UTC (permalink / raw)
  To: Rich Felker, Valery Ushakov; +Cc: musl, toybox

On 2/17/24 19:40, Rich Felker wrote:
> The attached patch should fix it.

Confirmed: worked for me.

Thanks,

Rob

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [musl] Re: Not sure how to debug this one.
  2024-02-19 17:54       ` Rob Landley
@ 2024-02-19 23:05         ` Rich Felker
  0 siblings, 0 replies; 21+ messages in thread
From: Rich Felker @ 2024-02-19 23:05 UTC (permalink / raw)
  To: Rob Landley; +Cc: Valery Ushakov, musl, toybox

On Mon, Feb 19, 2024 at 11:54:00AM -0600, Rob Landley wrote:
> On 2/17/24 19:40, Rich Felker wrote:
> > The attached patch should fix it.
> 
> Confirmed: worked for me.
> 
> Thanks,

Thanks for confirming!

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2024-02-19 23:05 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-17  1:48 [musl] Not sure how to debug this one Rob Landley
2024-02-17  3:23 ` [musl] Re: [Toybox] " Mouse
2024-02-17 13:32   ` Rob Landley
2024-02-17 15:01     ` [musl] " Thorsten Glaser
2024-02-17 15:21     ` [musl] " Mouse
2024-02-17 17:02 ` [musl] " Rich Felker
2024-02-17 21:45 ` [musl] " Valery Ushakov
2024-02-17 23:09   ` Thorsten Glaser
2024-02-18 12:15     ` [musl] " Valery Ushakov
2024-02-18 22:51       ` [musl] " Thorsten Glaser
2024-02-18  1:34   ` Rich Felker
2024-02-18  1:40     ` Rich Felker
2024-02-18 12:55       ` Valery Ushakov
2024-02-18 14:33         ` Rich Felker
2024-02-18 15:06           ` Valery Ushakov
2024-02-18 20:33             ` Rich Felker
2024-02-19 11:00               ` Valery Ushakov
2024-02-19 17:54       ` Rob Landley
2024-02-19 23:05         ` Rich Felker
2024-02-18 12:47     ` Valery Ushakov
2024-02-19 13:12     ` Rob Landley

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).