vfork on ARM

mailing list of musl libc

* vfork on ARM
@ 2016-04-01  0:42 Patrick Oppenlander
  2016-04-01  1:53 ` Rich Felker
  0 siblings, 1 reply; 7+ messages in thread
From: Patrick Oppenlander @ 2016-04-01  0:42 UTC (permalink / raw)
  To: musl

I'm looking at what would be involved in using musl on a nommu arm system.

As far as I know SYS_vfork is available on ARM, but musl is currently 
falling back to fork.

Are there any plans to support vfork on ARM and other architectures?

         Patrick


* Re: vfork on ARM
  2016-04-01  0:42 vfork on ARM Patrick Oppenlander
@ 2016-04-01  1:53 ` Rich Felker
  2016-04-03 23:18   ` Patrick Oppenlander
  0 siblings, 1 reply; 7+ messages in thread
From: Rich Felker @ 2016-04-01  1:53 UTC (permalink / raw)
  To: musl

On Fri, Apr 01, 2016 at 11:42:47AM +1100, Patrick Oppenlander wrote:
> I'm looking at what would be involved in using musl on a nommu arm system.
> 
> As far as I know SYS_vfork is available on ARM, but musl is
> currently falling back to fork.
> 
> Are there any plans to support vfork on ARM and other architectures?

It's trivial to add vfork, but the usefulness is limited without other
changes:

1. To my knowledge, all nommu ARM systems are thumb[2]-only, so
supporting them as targets requires adapting all the asm files to
support building as thumb. This is a task in progress and, as long as
we only care about thumb2 (available on armv7-m, i.e. Cortex-M3 and
up, I think) it's almost done.

2. For pre-v7, there's no way to do atomics without kernel help, and
no established kernel API for this as far as I know. For v7-m this is
probably not a problem.

3. Running on nommu without shareable program text is not much fun;
execve is really slow (memcpy of full program) and you need lots of
memory. Some people at ST have implemented an FDPIC abi for ARM which
solves this problem, but it's not upstream in the toolchain or kernel,
and the relocation types it needs are not officially assigned. Getting
it officially stabilized, supported, and forward-ported to modern tool
versions is going to be a lot of work. Here are some slides on it:

http://www.slideshare.net/linaroorg/sfo15406-arm-fdpic-toolset-kernel-libraries-for-cortexm-cortexr-mmuless-cores

Without FDPIC, it's possible to build a toolchain that produces
static-PIE executables that will work on nommu (with my recently
committed kernel patch for running non-FDPIC PIE ELF files on nommu,
and some additional work still needed to hook it up to work on ARM)
but these cannot use a shared mapping of the program.
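
A back-of-the-envelope model of the memory cost described above, under the simplifying assumption that each running process needs exactly one copy of its read-write data, and that its text is either copied per process (non-FDPIC static PIE, no shared mapping) or loaded once and shared (what FDPIC makes possible). The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical model of RAM used by `n` running instances of one program
 * on nommu.  Without a shared mapping every instance gets its own copy of
 * text and data; with one, text is loaded once and only data is per
 * instance. */
size_t nommu_ram_bytes(size_t text, size_t data, size_t n, int shared_text)
{
    if (n == 0)
        return 0;
    if (shared_text)
        return text + n * data;   /* one text copy, n data copies */
    return n * (text + data);     /* everything duplicated per process */
}
```

For example, three instances of a program with 400 KiB of text and 100 KiB of data come to 1500 KiB without a shared mapping and 700 KiB with one.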

If you or anyone else is up for helping with these tasks that would be
great.

Rich


* Re: vfork on ARM
  2016-04-01  1:53 ` Rich Felker
@ 2016-04-03 23:18   ` Patrick Oppenlander
  2016-04-04  0:14     ` Rich Felker
  0 siblings, 1 reply; 7+ messages in thread
From: Patrick Oppenlander @ 2016-04-03 23:18 UTC (permalink / raw)
  To: musl

On 01/04/16 12:53, Rich Felker wrote:
> On Fri, Apr 01, 2016 at 11:42:47AM +1100, Patrick Oppenlander wrote:
>> I'm looking at what would be involved in using musl on a nommu arm system.
>>
>> As far as I know SYS_vfork is available on ARM, but musl is
>> currently falling back to fork.
>>
>> Are there any plans to support vfork on ARM and other architectures?
> It's trivial to add vfork, but the usefulness is limited without other
> changes:

vfork should also be more efficient than fork which may be motivation 
for supporting it as an optimisation on mmu targets.
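
For reference, vfork() is only safe in the "vfork, then immediately exec or _exit" pattern: the child borrows the parent's memory and stack until it does one of those, which is also why it can work on nommu, where fork() cannot be implemented. A minimal C sketch, assuming a Linux/POSIX system where vfork() is declared in <unistd.h> with _GNU_SOURCE; the helper name run_with_vfork is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` in a vfork()ed child and
 * return its exit status, or -1 on error.  Until the child calls execv()
 * or _exit(), it runs on the parent's memory and stack, so it must not
 * return, call exit(), use stdio, or modify parent variables. */
int run_with_vfork(const char *path, char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;              /* vfork() itself failed */
    if (pid == 0) {
        execv(path, argv);      /* returns only on failure */
        _exit(127);             /* conventional "could not exec" status */
    }
    /* The parent is suspended until the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On an MMU system the saving over fork() comes from not duplicating the parent's page tables; on nommu this pattern is the only way to start a new program.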

It's also theoretically possible to run a nommu kernel on an mmu capable 
target.

> 1. To my knowledge, all nommu ARM systems are thumb[2]-only, so
> supporting them as targets requires adapting all the asm files to
> support building as thumb. This is a task in progress and, as long as
> we only care about thumb2 (available on armv7-m, i.e. Cortex-M3 and
> up, I think) it's almost done.

OK that's great!

> 2. For pre-v7, there's no way to do atomics without kernel help, and
> no established kernel API for this as far as I know. For v7-m this is
> probably not a problem.

V6K has support for hardware atomics too.

v7-m supports 32-bit atomics but drops support for 64-bit (no LDREXD or 
STREXD). Is that a problem for musl?

Do you know if v7-m has the hardware TLS registers?

> 3. Running on nommu without shareable program text is not much fun;
> execve is really slow (memcpy of full program) and you need lots of
> memory. Some people at ST have implemented an FDPIC abi for ARM which
> solves this problem, but it's not upstream in the toolchain or kernel,
> and the relocation types it needs are not officially assigned. Getting
> it officially stabilized, supported, and forward-ported to modern tool
> versions is going to be a lot of work. Here are some slides on it:
>
> http://www.slideshare.net/linaroorg/sfo15406-arm-fdpic-toolset-kernel-libraries-for-cortexm-cortexr-mmuless-cores

Thanks for the link. I wasn't aware of this.

> Without FDPIC, it's possible to build a toolchain that produces
> static-PIE executables that will work on nommu (with my recently
> committed kernel patch for running non-FDPIC PIE ELF files on nommu,
> and some additional work still needed to hook it up to work on ARM)
> but these cannot use a shared mapping of the program.
>
> If you or anyone else is up for helping with these tasks that would be
> great.

Right now I'm working on my own small kernel which will (hopefully) 
implement enough of the linux syscall interface to be useful. It's meant 
for small embedded microcontrollers where 4MiB of RAM is considered 
luxurious.

It's based on the now abandoned Prex operating system 
(http://prex.sourceforge.net/) but is a major fork which goes back to a 
traditional monolithic kernel model. I've replaced the C library with 
musl and userspace is currently toybox.

I'm planning on releasing on github (BSD or no-license) once I can boot 
the first targets (arm-mmu and arm-nommu) to a working userspace and 
pass some unit tests.

Maybe once I've learnt enough about how all this stuff works I'll be 
able to contribute to other projects like linux/musl.

         Patrick


* Re: vfork on ARM
  2016-04-03 23:18   ` Patrick Oppenlander
@ 2016-04-04  0:14     ` Rich Felker
  2016-04-04  2:25       ` Patrick Oppenlander
  0 siblings, 1 reply; 7+ messages in thread
From: Rich Felker @ 2016-04-04  0:14 UTC (permalink / raw)
  To: musl

On Mon, Apr 04, 2016 at 09:18:15AM +1000, Patrick Oppenlander wrote:
> On 01/04/16 12:53, Rich Felker wrote:
> >On Fri, Apr 01, 2016 at 11:42:47AM +1100, Patrick Oppenlander wrote:
> >>I'm looking at what would be involved in using musl on a nommu arm system.
> >>
> >>As far as I know SYS_vfork is available on ARM, but musl is
> >>currently falling back to fork.
> >>
> >>Are there any plans to support vfork on ARM and other architectures?
> >It's trivial to add vfork, but the usefulness is limited without other
> >changes:
> 
> vfork should also be more efficient than fork which may be
> motivation for supporting it as an optimisation on mmu targets.
> 
> It's also theoretically possible to run a nommu kernel on an mmu
> capable target.
> 
> >1. To my knowledge, all nommu ARM systems are thumb[2]-only, so
> >supporting them as targets requires adapting all the asm files to
> >support building as thumb. This is a task in progress and, as long as
> >we only care about thumb2 (available on armv7-m, i.e. Cortex-M3 and
> >up, I think) it's almost done.
> 
> OK that's great!
> 
> >2. For pre-v7, there's no way to do atomics without kernel help, and
> >no established kernel API for this as far as I know. For v7-m this is
> >probably not a problem.
> 
> V6K has support for hardware atomics too.

AFAIK even baseline v6 has ldrex/strex if you don't care about
non-32-bit sizes (which musl doesn't). However it lacks a barrier
instruction which is needed to make them useful. (Technically you can
omit the barrier on UP but then you have dangerous binaries that break
subtly when you move them to a SMP machine, and musl won't support
making those, at least not upstream, as a matter of policy.)
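
A sketch of the kind of 32-bit compare-and-swap a libc builds its locks on, written with C11 <stdatomic.h> instead of inline asm so it stays portable. For ARMv7 targets, including v7-m, compilers typically lower the sequentially consistent operations below to an LDREX/STREX retry loop with DMB barriers around it; those barriers are the part that must not be dropped if the binary is to stay correct on SMP. The names cas32, spin_lock and spin_unlock are hypothetical; cas32 loosely follows the convention of musl's internal a_cas (compare against t, store s, return the old value).

```c
#include <stdatomic.h>

/* Hypothetical 32-bit CAS: if *p == t, store s.  Returns the value that
 * was in *p beforehand, so the caller succeeded iff the result equals t. */
int cas32(atomic_int *p, int t, int s)
{
    int old = t;
    /* seq_cst: on ARMv7 typically LDREX/STREX loop plus DMB barriers.
     * On failure, `old` is overwritten with the current value. */
    atomic_compare_exchange_strong(p, &old, s);
    return old;
}

/* Hypothetical minimal spinlock built on cas32(). */
void spin_lock(atomic_int *lock)
{
    while (cas32(lock, 0, 1) != 0)
        ;                       /* busy-wait until we swap 0 -> 1 */
}

void spin_unlock(atomic_int *lock)
{
    atomic_store(lock, 0);      /* seq_cst store publishes prior writes */
}
```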

> v7-m supports 32-bit atomics but drops support for 64-bit (no LDREXD
> or STREXD). Is that a problem for musl?

v7-m is fine with regard to atomics...

> Do you know if v7-m has the hardware TLS registers?

...but it lacks the coprocessor register for TLS. However since the
instruction to access it is representable in thumb2, the kernel could
trap and emulate it. I think the people doing nommu ARM Linux stuff
added a syscall for get_tls, but in theory that's just as costly as
trap-and-emulate, so I'd rather get trap-and-emulate working so that
the same binaries can run on v7-a without runtime selection of the TLS
method.
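
To make the trap-and-emulate idea concrete: the ARM TLS read is mrc p15, 0, Rt, c13, c0, 3 (read TPIDRURO), which in Thumb-2 has the same 32-bit bit pattern as in ARM mode, 0xEE1D0F70 for Rt = r0, with Rt in bits 15:12. An undefined-instruction handler on v7-m would only need to recognize that pattern, write the thread's TLS pointer into Rt, and step past the 4-byte instruction. A sketch, assuming a hypothetical trap frame layout and handler name; a real kernel would also have to fetch the two halfwords from the faulting PC.

```c
#include <stdint.h>

/* Hypothetical saved user register state on an undefined-instruction trap. */
struct trap_frame {
    uint32_t r[16];            /* r0-r15; r[15] holds the faulting PC */
};

/* mrc p15, 0, Rt, c13, c0, 3 as a 32-bit Thumb-2 word: first halfword in
 * bits 31:16, second halfword in bits 15:0, Rt in bits 15:12. */
#define MRC_TPIDRURO_MASK 0xFFFF0FFFu
#define MRC_TPIDRURO_VAL  0xEE1D0F70u

/* Emulate the TLS read if `insn` is that instruction.  Returns 1 if it
 * was handled, 0 if the trap should be treated as a real fault. */
int emulate_tls_read(struct trap_frame *f, uint32_t insn, uint32_t tls)
{
    if ((insn & MRC_TPIDRURO_MASK) != MRC_TPIDRURO_VAL)
        return 0;
    unsigned rt = (insn >> 12) & 0xF;
    if (rt == 13 || rt == 15)  /* SP/PC destinations: don't emulate */
        return 0;
    f->r[rt] = tls;            /* value the thread registered via set_tls */
    f->r[15] += 4;             /* skip the 32-bit Thumb-2 instruction */
    return 1;
}
```

Because the same instruction runs natively on v7-a, binaries built this way need no runtime choice between a syscall and the coprocessor read.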

> >3. Running on nommu without shareable program text is not much fun;
> >execve is really slow (memcpy of full program) and you need lots of
> >memory. Some people at ST have implemented an FDPIC abi for ARM which
> >solves this problem, but it's not upstream in the toolchain or kernel,
> >and the relocation types it needs are not officially assigned. Getting
> >it officially stabilized, supported, and forward-ported to modern tool
> >versions is going to be a lot of work. Here are some slides on it:
> >
> >http://www.slideshare.net/linaroorg/sfo15406-arm-fdpic-toolset-kernel-libraries-for-cortexm-cortexr-mmuless-cores
> 
> Thanks for the link. I wasn't aware of this.
> 
> >Without FDPIC, it's possible to build a toolchain that produces
> >static-PIE executables that will work on nommu (with my recently
> >committed kernel patch for running non-FDPIC PIE ELF files on nommu,
> >and some additional work still needed to hook it up to work on ARM)
> >but these cannot use a shared mapping of the program.
> >
> >If you or anyone else is up for helping with these tasks that would be
> >great.
> 
> Right now I'm working on my own small kernel which will (hopefully)
> implement enough of the linux syscall interface to be useful. It's
> meant for small embedded microcontrollers where 4MiB of RAM is
> considered luxurious.
> 
> It's based on the now abandoned Prex operating system
> (http://prex.sourceforge.net/) but is a major fork which goes back
> to a traditional monolithic kernel model. I've replaced the C library
> with musl and userspace is currently toybox.
> 
> I'm planning on releasing on github (BSD or no-license) once I can
> boot the first targets (arm-mmu and arm-nommu) to a working
> userspace and pass some unit tests.
> 
> Maybe once I've learnt enough about how all this stuff works I'll be
> able to contribute to other projects like linux/musl.

Is your intent to run a whole userspace environment on it, or just a
single process? If the latter, plain (non-FDPIC) PIE ELF is not a bad
solution at all. It precludes XIP from ROM, but at least you don't
have repeated per-process overhead from many instances of same
executable.

Rich



* Re: vfork on ARM
  2016-04-04  0:14     ` Rich Felker
@ 2016-04-04  2:25       ` Patrick Oppenlander
  2016-04-04  3:37         ` Rich Felker
  0 siblings, 1 reply; 7+ messages in thread
From: Patrick Oppenlander @ 2016-04-04  2:25 UTC (permalink / raw)
  To: musl

On 04/04/16 10:14, Rich Felker wrote:
>> Do you know if v7-m has the hardware TLS registers?
> ...but it lacks the coprocessor register for TLS. However since the
> instruction to access it is representable in thumb2, the kernel could
> trap and emulate it. I think the people doing nommu ARM Linux stuff
> added a syscall for get_tls, but in theory that's just as costly as
> trap-and-emulate, so I'd rather get trap-and-emulate working so that
> the same binaries can run on v7-a without runtime selection of the TLS
> method.

Trap-and-emulate makes perfect sense to me. It's common to fix floating 
point behaviours like this so why not TLS.

Actually, I had a question on this point. I never got to the bottom of 
why ARM uses an architecture specific set_tls syscall rather than 
SYS_set_thread_area like i386 & others. Is this just a historic thing?

>>
>> Right now I'm working on my own small kernel which will (hopefully)
>> implement enough of the linux syscall interface to be useful. It's
>> meant for small embedded microcontrollers where 4MiB of RAM is
>> considered luxurious.
>>
>> It's based on the now abandoned Prex operating system
>> (http://prex.sourceforge.net/) but is a major fork which goes back
>> to a traditional monolithic kernel model. I've replaced the C library
>> with musl and userspace is currently toybox.
>>
>> I'm planning on releasing on github (BSD or no-license) once I can
>> boot the first targets (arm-mmu and arm-nommu) to a working
>> userspace and pass some unit tests.
>>
>> Maybe once I've learnt enough about how all this stuff works I'll be
>> able to contribute to other projects like linux/musl.
> Is your intent to run a whole userspace environment on it, or just a
> single process? If the latter, plain (non-FDPIC) PIE ELF is not a bad
> solution at all. It precludes XIP from ROM, but at least you don't
> have repeated per-process overhead from many instances of same
> executable.

It will be single user, single session, multi process. One long term 
goal is to be self hosting.

Why does PIE preclude XIP? I hoped that it would still be possible to 
XIP a static PIE ELF if the XIP address is known at link time, then use 
a GOT. I haven't thoroughly studied the ABIs here yet and may well be 
barking up the wrong tree.

Worst case scenario I'll just start with relocatable code for nommu and 
work from there.

FDPIC is quite a compelling solution. Hopefully this gains some momentum.

         Patrick




* Re: vfork on ARM
  2016-04-04  2:25       ` Patrick Oppenlander
@ 2016-04-04  3:37         ` Rich Felker
  2016-04-04  6:56           ` Patrick Oppenlander
  0 siblings, 1 reply; 7+ messages in thread
From: Rich Felker @ 2016-04-04  3:37 UTC (permalink / raw)
  To: musl

On Mon, Apr 04, 2016 at 12:25:00PM +1000, Patrick Oppenlander wrote:
> On 04/04/16 10:14, Rich Felker wrote:
> >>Do you know if v7-m has the hardware TLS registers?
> >...but it lacks the coprocessor register for TLS. However since the
> >instruction to access it is representable in thumb2, the kernel could
> >trap and emulate it. I think the people doing nommu ARM Linux stuff
> >added a syscall for get_tls, but in theory that's just as costly as
> >trap-and-emulate, so I'd rather get trap-and-emulate working so that
> >the same binaries can run on v7-a without runtime selection of the TLS
> >method.
> 
> Trap-and-emulate makes perfect sense to me. It's common to fix
> floating point behaviours like this so why not TLS.
> 
> Actually, I had a question on this point. I never got to the bottom
> of why ARM uses an architecture specific set_tls syscall rather than
> SYS_set_thread_area like i386 & others. Is this just a historic
> thing?

I think it's just a historical mistake.

> >>Right now I'm working on my own small kernel which will (hopefully)
> >>implement enough of the linux syscall interface to be useful. It's
> >>meant for small embedded microcontrollers where 4MiB of RAM is
> >>considered luxurious.
> >>
> >>It's based on the now abandoned Prex operating system
> >>(http://prex.sourceforge.net/) but is a major fork which goes back
> >>to a traditional monolithic kernel model. I've replaced the C library
> >>with musl and userspace is currently toybox.
> >>
> >>I'm planning on releasing on github (BSD or no-license) once I can
> >>boot the first targets (arm-mmu and arm-nommu) to a working
> >>userspace and pass some unit tests.
> >>
> >>Maybe once I've learnt enough about how all this stuff works I'll be
> >>able to contribute to other projects like linux/musl.
> >Is your intent to run a whole userspace environment on it, or just a
> >single process? If the latter, plain (non-FDPIC) PIE ELF is not a bad
> >solution at all. It precludes XIP from ROM, but at least you don't
> >have repeated per-process overhead from many instances of same
> >executable.
> 
> It will be single user, single session, multi process. One long term
> goal is to be self hosting.
> 
> Why does PIE preclude XIP? I hoped that it would still be possible
> to XIP a static PIE ELF if the XIP address is known at link time,
> then use a GOT. I haven't thoroughly studied the ABIs here yet and
> may well be barking up the wrong tree.

PIE does not hard-code a load address (the loader can pick the load
address, and could match it to ROM) but the relative offset between
load segments (the read-only text and read-write data) is fixed at
ld-time as usual for ELF. This certainly precludes using the text
in-place if there can be more than one instance executing (since they
can't both have their data at the same offset from text) and makes it
difficult to even run one instance in-place (only possible if you can
arrange for free RAM to exist at the right fixed offset). If you really
wanted to hack up such a setup, you would want non-PIE ELF files where
you pick the absolute addresses for load segments, not PIE where you
can only pick the relative address.
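
The constraint can be made concrete with ELF program headers: the distance between the executable PT_LOAD segment and the writable one is fixed at link time, so choosing a ROM address for text dictates exactly where data must go. A sketch using Elf32_Phdr from <elf.h>; the function names and the one-text-segment, one-data-segment layout are assumptions.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Link-time distance from the executable PT_LOAD segment to the writable
 * one.  Assumes one of each, as in a typical static executable.
 * Returns 0 on success, -1 if either segment is missing. */
int text_to_data_delta(const Elf32_Phdr *ph, size_t n, uint32_t *delta)
{
    const Elf32_Phdr *text = NULL, *data = NULL;
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        if ((ph[i].p_flags & PF_X) && !text)
            text = &ph[i];
        else if ((ph[i].p_flags & PF_W) && !data)
            data = &ph[i];
    }
    if (!text || !data)
        return -1;
    *delta = data->p_vaddr - text->p_vaddr;   /* wraps if data is below */
    return 0;
}

/* Can text run in place at `rom_addr`?  Only if the data segment, forced
 * to rom_addr + delta, fits inside the free RAM window [ram_lo, ram_hi).
 * A second instance would need the very same data address. */
int can_xip_at(uint32_t rom_addr, uint32_t delta, uint32_t data_memsz,
               uint32_t ram_lo, uint32_t ram_hi)
{
    uint32_t lo = rom_addr + delta;           /* modulo-2^32 address */
    uint64_t hi = (uint64_t)lo + data_memsz;
    return lo >= ram_lo && hi <= ram_hi;
}
```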

> Worst case scenario I'll just start with relocatable code for nommu
> and work from there.

I'm not sure what you mean by relocatable code here.

> FDPIC is quite a compelling solution. Hopefully this gains some momentum.

Yes, it's the right solution for nommu.

Rich



* Re: vfork on ARM
  2016-04-04  3:37         ` Rich Felker
@ 2016-04-04  6:56           ` Patrick Oppenlander
  0 siblings, 0 replies; 7+ messages in thread
From: Patrick Oppenlander @ 2016-04-04  6:56 UTC (permalink / raw)
  To: musl

On 04/04/16 13:37, Rich Felker wrote:
> PIE does not hard-code a load address (the loader can pick the load
> address, and could match it to ROM) but the relative offset between
> load segments (the read-only text and read-write data) is fixed at
> ld-time as usual for ELF. This certainly precludes using the text
> in-place if there can be more than one instance executing (since they
> can't both have their data at the same offset from text) and makes it
> difficult to even run one instance in-place (only possible if you can
> arrange for free RAM to exist at the right fixed offset). If you really
> wanted to hack up such a setup, you would want non-PIE ELF files where
> you pick the absolute addresses for load segments, not PIE where you
> can only pick the relative address.

I see the problem in my understanding. I had assumed that for a PIC 
executable the GOT was always accessed through a register rather than 
linked at a fixed location. In my testing this only happens for ARM 
under gcc with a combination of "-fpic -msingle-pic-base 
-mno-pic-data-is-text-relative". Then r9 is used to hold the GOT 
location and needs to be initialised by the program loader.

That might be enough to be able to XIP on arm for static executables.
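
A rough model of what a program loader would have to do in that scheme, assuming every GOT entry holds a link-time address that falls either in the text range (pointed at the shared, in-place text) or in the data range (rebased onto this instance's private RAM copy of data and GOT). The names and the two-range assumption are hypothetical; instance_r9 gives the value that would be placed in r9 before jumping to the entry point.

```c
#include <stddef.h>
#include <stdint.h>

/* Rebase one instance's GOT in place.  `got` is this instance's private
 * copy (already in RAM), holding link-time addresses.  Entries inside the
 * text range are moved to where the shared text runs; entries inside the
 * data range are moved to this instance's data copy. */
void rebase_got(uint32_t *got, size_t n,
                uint32_t text_lo, uint32_t text_hi, uint32_t text_base,
                uint32_t data_lo, uint32_t data_hi, uint32_t data_base)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t a = got[i];
        if (a >= text_lo && a < text_hi)
            got[i] = a - text_lo + text_base;
        else if (a >= data_lo && a < data_hi)
            got[i] = a - data_lo + data_base;
        /* entries outside both ranges are left untouched */
    }
}

/* Run-time GOT address for this instance: the value to load into r9. */
uint32_t instance_r9(uint32_t got_link_addr, uint32_t data_lo,
                     uint32_t data_base)
{
    return got_link_addr - data_lo + data_base;
}
```

Since each instance gets its own data copy and its own r9, several instances could in principle share one copy of the text.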

I think I have some issues with my compiler flags or link script. -fpie 
is generating identical code to -fpic and the resultant ELF is still 
flagged EXEC_P rather than DYNAMIC.
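
One way to check what the link produced is to read e_type from the ELF header: BFD's EXEC_P and DYNAMIC flags correspond to ET_EXEC and ET_DYN, and a PIE (static or not) has to be ET_DYN. Note that -fpie only changes code generation; the file type is decided at link time, where -pie is needed to get ET_DYN. A minimal sketch using <elf.h>; the function name is hypothetical and it assumes the file's byte order matches the host.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Classify a 32-bit ELF file from its first bytes: returns ET_EXEC for a
 * fixed-address executable, ET_DYN for PIE (or a shared library), or -1
 * if the buffer is too short or not a 32-bit ELF header. */
int elf32_type(const unsigned char *buf, size_t len)
{
    Elf32_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS32)
        return -1;
    return eh.e_type;          /* assumes file byte order == host order */
}
```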

> I'm not sure what you mean by relocatable code here.

I was talking about linking with the --relocatable option to ld then 
processing the relocations at program load time. This is how the project 
I forked from works. The result is still a complete copy of text/data 
for each process.
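
Roughly, load-time processing of ld --relocatable output means walking the REL entries and patching each target word in the private copy. A heavily simplified sketch, assuming a single contiguous image linked at address 0, R_ARM_ABS32 as the only relocation type, and target words that already hold image-relative addresses (so applying a relocation reduces to adding the load address). Elf32_Rel, ELF32_R_TYPE and R_ARM_ABS32 come from <elf.h>; the function name is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Apply R_ARM_ABS32 relocations to a private copy of an image.  Assumes
 * each target word already holds an image-relative address and that host
 * and target byte order match (little-endian ARM).  Returns the number of
 * entries applied, or -1 on an unsupported type or out-of-range offset. */
int apply_abs32(uint8_t *image, size_t size, uint32_t load_addr,
                const Elf32_Rel *rel, size_t nrel)
{
    for (size_t i = 0; i < nrel; i++) {
        if (ELF32_R_TYPE(rel[i].r_info) != R_ARM_ABS32)
            return -1;
        if (rel[i].r_offset > size || size - rel[i].r_offset < 4)
            return -1;
        uint32_t word;
        memcpy(&word, image + rel[i].r_offset, 4);  /* may be unaligned */
        word += load_addr;
        memcpy(image + rel[i].r_offset, &word, 4);
    }
    return (int)nrel;
}
```

Because the patched words end up in the text as well as the data, every process needs its own copy of both, which is the cost mentioned above.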

Also, thank you for taking the time to discuss this now off-topic subject.

         Patrick




Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
