musl without atomic instructions?

mailing list of musl libc

* musl without atomic instructions?
@ 2016-03-12 23:47 Masanori Ogino
  2016-03-13  0:21 ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Masanori Ogino @ 2016-03-12 23:47 UTC (permalink / raw)
  To: musl

Hello,

While working on my GSoC proposal, I started to wonder whether musl can
be built without hardware support for atomic operations.

Could we build musl without such instructions?
If so, what would happen to musl's features?

-- 
Masanori Ogino

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: musl without atomic instructions?
  2016-03-12 23:47 musl without atomic instructions? Masanori Ogino
@ 2016-03-13  0:21 ` Rich Felker
  2016-03-13  0:54   ` Masanori Ogino
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2016-03-13  0:21 UTC (permalink / raw)
  To: musl

On Sun, Mar 13, 2016 at 08:47:36AM +0900, Masanori Ogino wrote:
> Hello,
> 
> While I work on my GSoC proposal, I doubt whether musl can be built
> without hardware atomic operation supports.
> 
> Could we build musl without such instructions?
> If we could, what will happen with the features of musl?

Atomic compare and swap (usually provided by either a direct cas
instruction or ll/sc pair type) is a hard requirement for musl. The
normal profiles of riscv have at least ll/sc style and possibly cas
too. Minimal profiles for microcontroller use lack it (this was a
mistake in the riscv ISA specification, IMO), so if supporting these
ISA levels is interesting, there are at least three options:

1. Have the kernel trap the unimplemented instructions and emulate
   them.

2. Have userspace issue a system call to have the kernel mediate
   atomic accesses.

3. Integrate atomic sequence restart with the scheduler: at scheduling
   time, the kernel determines if the task being resumed was
   interrupted in the middle of a sequence of instructions that's
   supposed to be atomic, and if so, resets the program counter to the
   beginning of the sequence. (This is how pre-v6 ARM and most SH
   models work.)

Option 3 offers by far the best performance but inherently only works
on uniprocessor. Options 1 and 2 could theoretically support SMP as
long as the kernel has some other way of ensuring mutual exclusion and
memory synchronization between the processors.
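
As background to the options above, here is a minimal sketch of why a
single compare-and-swap primitive is enough to build on: other atomic
read-modify-write operations can be layered on top of it with a retry
loop. The sketch uses C11 <stdatomic.h> rather than musl's internal
atomics, and the function name fetch_add_via_cas is made up for
illustration.

```c
#include <stdatomic.h>

/* Atomically add v to *p using nothing but compare-and-swap.
   Returns the value *p held before the addition. */
int fetch_add_via_cas(atomic_int *p, int v)
{
    int old = atomic_load(p);
    /* On failure, atomic_compare_exchange_weak stores the value it
       actually observed into old, so each retry uses fresh data. */
    while (!atomic_compare_exchange_weak(p, &old, old + v))
        ;
    return old;
}
```

Whichever of the three options supplies the CAS, a loop of this shape
stays the same; only the primitive underneath changes.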

Of course the best of all worlds is to have the kernel provide a vdso
function for atomic cas which it can then provide an optimal
implementation of for the particular processor being used. Then
baseline-ISA-level riscv binaries would use the vdso, and ones
targeting an ISA level that's known to have native atomic instructions
would use the inline instructions.
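
The split described above might be sketched as follows. This is an
illustration under assumptions, not an existing interface: the names
cas_fn, vdso_cas and stub_vdso_cas are hypothetical, and the stub only
marks the place where a kernel-provided helper would plug in. The
__riscv_atomic macro is what compilers predefine when targeting the "A"
extension, and __atomic_compare_exchange_n is the GCC/Clang builtin.

```c
#include <stddef.h>

/* Hypothetical signature for a kernel-provided CAS helper.
   Returns the value *p held before the operation. */
typedef int (*cas_fn)(volatile int *p, int expected, int desired);

/* Stand-in for the function the kernel would export from the vdso.
   This plain C version is NOT atomic; the real helper would be
   whatever the kernel considers optimal for the processor. */
static int stub_vdso_cas(volatile int *p, int expected, int desired)
{
    int old = *p;
    if (old == expected)
        *p = desired;
    return old;
}

/* In a real libc this pointer would be filled in at startup from a
   vdso symbol lookup; here it simply points at the stub. */
static cas_fn vdso_cas = stub_vdso_cas;

static inline int a_cas(volatile int *p, int t, int s)
{
#if defined(__riscv_atomic)
    /* Built for an ISA level known to have "A": use inline atomics.
       On failure the builtin writes the observed value into t. */
    __atomic_compare_exchange_n(p, &t, s, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return t;
#else
    /* Baseline ISA level: defer to the kernel-provided helper. */
    return vdso_cas(p, t, s);
#endif
}
```

The choice between the two paths is made at build time, so binaries
built for "A" never pay for the indirect call.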

Rich


* Re: musl without atomic instructions?
  2016-03-13  0:21 ` Rich Felker
@ 2016-03-13  0:54   ` Masanori Ogino
  2016-03-14  1:34     ` Masanori Ogino
  0 siblings, 1 reply; 8+ messages in thread
From: Masanori Ogino @ 2016-03-13  0:54 UTC (permalink / raw)
  To: musl

2016-03-13 9:21 GMT+09:00 Rich Felker <dalias@libc.org>:
> On Sun, Mar 13, 2016 at 08:47:36AM +0900, Masanori Ogino wrote:
>> Hello,
>>
>> While I work on my GSoC proposal, I doubt whether musl can be built
>> without hardware atomic operation supports.
>>
>> Could we build musl without such instructions?
>> If we could, what will happen with the features of musl?
>
> Atomic compare and swap (usually provided by either a direct cas
> instruction or ll/sc pair type) is a hard requirement for musl.

OK, understood.

> The normal profiles of riscv have at least ll/sc style and possibly cas
> too.

Yes, ll/sc style primitives are provided by the "A" standard extension,
according to the ISA spec v2.0, section 5.3.

> Minimal profiles for microcontroller use lack it (this was a
> mistake in the riscv ISA specification, IMO), so if supporting these
> ISA levels is interesting, there are at least three options:
>
> 1. Have the kernel trap the unimplemented instructions and emulate
>    them.
>
> 2. Have userspace issue a system call to have the kernel mediate
>    atomic accesses.
>
> 3. Integrate atomic sequence restart with the scheduler: at scheduling
>    time, the kernel determines if the task being resumed was
>    interrupted in the middle of a sequence of instructions that's
>    supposed to be atomic, and if so, resets the program counter to the
>    beginning of the sequence. (This is how pre-v6 ARM and most SH
>    models work.)
>
> Option 3 offers by far the best performance but inherently only works
> on uniprocessor. Options 1 and 2 could theoretically support SMP as
> long as the kernel has some other way of ensuring mutual exclusion and
> memory synchronization between the processors.
>
> Of course the best of all worlds is to have the kernel provide a vdso
> function for atomic cas which it can then provide an optimal
> implementation of for the particular processor being used. Then
> baseline-ISA-level riscv binaries would use the vdso, and ones
> targeting an ISA level that's known to have native atomic instructions
> would use the inline instructions.

OK, I will ask about the current status on the RISC-V sw-dev ML.

Thank you for the clarification.

-- 
Masanori Ogino



* Re: musl without atomic instructions?
  2016-03-13  0:54   ` Masanori Ogino
@ 2016-03-14  1:34     ` Masanori Ogino
  2016-03-14  2:13       ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Masanori Ogino @ 2016-03-14  1:34 UTC (permalink / raw)
  To: musl

2016-03-13 9:54 GMT+09:00 Masanori Ogino <masanori.ogino@gmail.com>:
> OK, I will ask about the current status on the RISC-V sw-dev ML.

On sw-dev, Darius Rad told me that there is a syscall that performs CAS
on RISC-V without the "A" standard extension. CONFIG_RV_SYSRISCV_ATOMIC
enables it (with the RISC-V patches applied).

For reference, the source code is here:
https://github.com/riscv/riscv-linux/blob/master/arch/riscv/kernel/sys_riscv.c

-- 
Masanori Ogino



* Re: musl without atomic instructions?
  2016-03-14  1:34     ` Masanori Ogino
@ 2016-03-14  2:13       ` Rich Felker
  2016-03-14  2:55         ` Masanori Ogino
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2016-03-14  2:13 UTC (permalink / raw)
  To: musl

On Mon, Mar 14, 2016 at 10:34:22AM +0900, Masanori Ogino wrote:
> 2016-03-13 9:54 GMT+09:00 Masanori Ogino <masanori.ogino@gmail.com>:
> > OK, I will ask about the current status on the RISC-V sw-dev ML.
> 
> On sw-dev, Darius Rad taught me that there is a syscall to perform CAS
> on RISC-V without the A standard extension. CONFIG_RV_SYSRISCV_ATOMIC
> enables it (with RISC-V patches.)
> 
> For reference, the source code is here:
> https://github.com/riscv/riscv-linux/blob/master/arch/riscv/kernel/sys_riscv.c

IMO a vdso function should be added that makes the syscall, rather
than having libc call the syscall directly; this would allow the
kernel to automatically provide a better implementation in the future
without the need to rebuild applications. Using a syscall for this is
very slow. Working with kernel people to propose such a thing (or even
implementing it and submitting kernel patches) is certainly one option
for something to add to a GSoC project proposal to make it more
substantial.

Rich



* Re: musl without atomic instructions?
  2016-03-14  2:13       ` Rich Felker
@ 2016-03-14  2:55         ` Masanori Ogino
  2016-03-14  3:43           ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Masanori Ogino @ 2016-03-14  2:55 UTC (permalink / raw)
  To: musl

2016-03-14 11:13 GMT+09:00 Rich Felker <dalias@libc.org>:
> IMO a vdso function should be added that makes the syscall, rather
> than having libc call the syscall directly; this would allow the
> kernel to automatically provide a better implementation in the future
> without the need to rebuild applications. Using a syscall for this is
> very slow. Working with kernel people to propose such a thing (or even
> implementing it and submitting kernel patches) is certainly one option
> for something to add to a GSoC project proposal to make it more
> substantial.

Well, it seems that I don't really understand vDSO.

My current understanding is that a vDSO function makes the following
possible:

1. programs targeting without-A processors use syscalls on without-A
processors, and
2. the same programs use atomic instructions on with-A processors (no
interruption, no context switching!), and
(3. programs targeting with-A processors run normally, without
calling the vDSO function.)

Is that correct? If so, it would be really nice.

-- 
Masanori Ogino


* Re: musl without atomic instructions?
  2016-03-14  2:55         ` Masanori Ogino
@ 2016-03-14  3:43           ` Rich Felker
  2016-03-14  4:24             ` Masanori Ogino
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2016-03-14  3:43 UTC (permalink / raw)
  To: musl

On Mon, Mar 14, 2016 at 11:55:15AM +0900, Masanori Ogino wrote:
> 2016-03-14 11:13 GMT+09:00 Rich Felker <dalias@libc.org>:
> > IMO a vdso function should be added that makes the syscall, rather
> > than having libc call the syscall directly; this would allow the
> > kernel to automatically provide a better implementation in the future
> > without the need to rebuild applications. Using a syscall for this is
> > very slow. Working with kernel people to propose such a thing (or even
> > implementing it and submitting kernel patches) is certainly one option
> > for something to add to a GSoC project proposal to make it more
> > substantial.
> 
> Well, it seems that I don't really understand vDSO.

The way vdso works is that the kernel contains an image of a small ELF
shared library file, and maps it into the virtual address space of
each user process, and exposes its address as part of the "aux vector"
that the dynamic linker or main program entry point receives and can
process.

While anything could be included in the vdso, normally what the kernel
puts there are functions that allow userspace to bypass actually
making a system call for some things that _can_ be done without a
system call (no need for kernel privs) but where the _way_ to do them
is only known by the kernel (e.g. hardware model specific, or
dependent on memory structures the kernel writes and exposes to
userspace but does not guarantee stability for). Some examples are
time/gettimeofday/clock_gettime, getcpu, etc.

If userspace chooses to use the vdso, it does symbol lookups in it
using the same mechanisms used for dynamic library symbol lookup, then
calls the resulting function instead of making a syscall.
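
As a small illustration of the first step, the sketch below finds the
vdso image through the aux vector on Linux. getauxval() and
AT_SYSINFO_EHDR are the real interface for this; the helper names
vdso_base and vdso_has_elf_magic are made up here. The actual symbol
lookup (walking the image's dynamic symbol table) is left out.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Address of the vdso image the kernel mapped into this process,
   or NULL if the aux vector carries no AT_SYSINFO_EHDR entry. */
const void *vdso_base(void)
{
    return (const void *)getauxval(AT_SYSINFO_EHDR);
}

/* The image is an ordinary ELF shared object, so it should start
   with the 4-byte ELF magic "\177ELF". */
int vdso_has_elf_magic(const void *base)
{
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

From that base address, a libc can parse the ELF headers and resolve a
symbol by name, in the same way the dynamic linker does for any other
shared library.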

> My current understanding is, vDSO make it possible that:
> 
> 1. programs targeting without-A processors use syscalls on without-A
> processors, and
> 2. the programs use atomic instructions on with-A processors. (no
> interruption, no context switching!)
> (3. programs targeting with-A processors runs normally, without
> calling such vDSO function)
> 
> Is it correct? If so, it would be really nice.

Even better.

Indeed, a baseline vdso-based compare-and-swap for riscv would look
like your above items 1 and 2, and item 3 if you build binaries that
depend on a processor with the "A" option.

But in the future, for non-SMP setups, case 1 could be replaced with a
scheduler-based restart approach like pre-v6 ARM and SH3/SH4 use,
yielding a huge performance boost (maybe around 100x speedup in
locking/atomics). The way this works is that, when resuming a task
that was preempted, the scheduler just has to check if the program
counter is in the cas function in the vdso. If so, it resets the
program counter to the start of that function before resuming
userspace. At one point there was a good article on how the ARM
implementation of this works, but I can't find it right now.
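
The scheduler-side check can be sketched roughly as below. This is a
simplified model, not code from any kernel: struct task_state and
restart_if_in_sequence are hypothetical names, and the sketch assumes
the sequence is described by the address of its first instruction and
the address just past its committing store, so that a task which has
already completed the store is not rewound.

```c
#include <stdint.h>

/* Minimal stand-in for the saved user context of a preempted task. */
struct task_state {
    uintptr_t pc; /* next instruction to execute on resume */
};

/* Called when a preempted task is about to be resumed.
   seq_start: address of the first instruction of the CAS sequence.
   seq_end:   address just past the store that commits the result.
   Returns 1 if the program counter was rewound, 0 otherwise. */
static int restart_if_in_sequence(struct task_state *t,
                                  uintptr_t seq_start,
                                  uintptr_t seq_end)
{
    if (t->pc > seq_start && t->pc < seq_end) {
        /* Interrupted between the load and the store: the loaded
           value may be stale, so run the sequence again from the top. */
        t->pc = seq_start;
        return 1;
    }
    return 0;
}
```

Because the fixup costs only a range comparison per resume, the CAS
itself remains a handful of ordinary instructions in userspace, which
is where the speedup over a syscall comes from.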

Rich


* Re: musl without atomic instructions?
  2016-03-14  3:43           ` Rich Felker
@ 2016-03-14  4:24             ` Masanori Ogino
  0 siblings, 0 replies; 8+ messages in thread
From: Masanori Ogino @ 2016-03-14  4:24 UTC (permalink / raw)
  To: musl

2016-03-14 12:43 GMT+09:00 Rich Felker <dalias@libc.org>:
> On Mon, Mar 14, 2016 at 11:55:15AM +0900, Masanori Ogino wrote:
>> Well, it seems that I don't really understand vDSO.
>
> The way vdso works is that the kernel contains an image of a small ELF
> shared library file, and maps it into the virtual address space of
> each user process, and exposes its address as part of the "aux vector"
> that the dynamic linker or main program entry point receives and can
> process.
>
> While anything could be included in the vdso, normally what the kernel
> puts there are functions that allow userspace to bypass actually
> making a system call for some things that _can_ be done without a
> system call (no need for kernel privs) but where the _way_ to do them
> is only known by the kernel (e.g. hardware model specific, or
> dependent on memory structures the kernel writes and exposes to
> userspace but does not guarantee stability for). Some examples are
> time/gettimeofday/clock_gettime, getcpu, etc.
>
> If userspace chooses to use the vdso, it does symbol lookups in it
> using the same mechanisms used for dynamic library symbol lookup, then
> calls the resulting function instead of making a syscall.

OK, it is getting clear to me now. Thank you.

>> My current understanding is, vDSO make it possible that:
>>
>> 1. programs targeting without-A processors use syscalls on without-A
>> processors, and
>> 2. the programs use atomic instructions on with-A processors. (no
>> interruption, no context switching!)
>> (3. programs targeting with-A processors runs normally, without
>> calling such vDSO function)
>>
>> Is it correct? If so, it would be really nice.
>
> Even better.
>
> Indeed, a baseline vdso-based compare-and-swap for riscv would look
> like your above items 1 and 2, and item 3 if you build binaries that
> depend on a processor with the "A" option.
>
> But in the future, for non-SMP setups, case 1 could be replaced with a
> scheduler-based restart approach like pre-v6 ARM and SH3/SH4 use,
> yielding a huge performance boost (maybe around 100x speedup in
> locking/atomics). The way this works is that, when resuming a task
> that was preempted, the scheduler just has to check if the program
> counter is in the cas function in the vdso. If so, it resets the
> program counter to the start of that function before resuming
> userspace. At one point there was a good article on how the ARM
> implementation of this works, but I can't find it right now.

Fantastic! I will add this to the work list. It is really
worth working on.

-- 
Masanori Ogino



