mailing list of musl libc
* musl/SH-FDPIC progress
@ 2015-09-11  1:48 Rich Felker
  2015-09-11 19:05 ` [0pf] " Rich Felker
  2015-09-22  5:27 ` Updated " Rich Felker
  0 siblings, 2 replies; 7+ messages in thread
From: Rich Felker @ 2015-09-11  1:48 UTC (permalink / raw)
  To: musl; +Cc: 0pf

I now have a working prototype of static-linked FDPIC binary support
in musl libc using gcc 5.2 (with the forward-ported SH-FDPIC patch)
and binutils 2.25.1. I've tested simple example programs and some
non-trivial examples with threads and they work both under qemu-sh4eb
with FDPIC support added (needs a small patch) and on real J2
hardware, where they successfully share text/execute-in-place.

The gcc patch is presently against gcc with all the other patches from
my musl-cross-make repo applied, so some refactoring will be needed to
propose it upstream, but I think it's ready for initial review. After
a little bit more cleanup (mainly bad specs logic) I'll go ahead and
put a version of this patch in the toolchain repo.

On the musl side, the changes are not ready for upstream. Adding FDPIC
revealed that the way we're bootstrapping the dynamic linker and PIE
entry point does not make sense, but I already knew that anyway --
having to use -export-dynamic for static-linked PIE was already a
problem. I might however be able to get just the non-PIE, static
linking only version of FDPIC support upstreamable in the next few
days and go ahead and commit that before working on the bigger
upstream changes that will be needed to make full FDPIC support
(including dynamic linking) possible.

On the kernel side, a really ugly issue is blocking FDPIC deployment:
the kernel interprets an ELF header bit the opposite of how it's
specified and how ld sets it. Details are in this thread:

http://www.spinics.net/lists/linux-sh/msg44965.html

Until that's resolved, it's impossible to make future-proof FDPIC
binaries that will reliably share text. (They'll work regardless, but
won't share text.)

I'll follow up soon with details on patches needed to make this all
work.

Rich



* Re: [0pf] musl/SH-FDPIC progress
  2015-09-11  1:48 musl/SH-FDPIC progress Rich Felker
@ 2015-09-11 19:05 ` Rich Felker
  2015-09-12  3:29   ` Rich Felker
  2015-09-13 23:45   ` Rob Landley
  2015-09-22  5:27 ` Updated " Rich Felker
  1 sibling, 2 replies; 7+ messages in thread
From: Rich Felker @ 2015-09-11 19:05 UTC (permalink / raw)
  To: musl; +Cc: 0pf

On Thu, Sep 10, 2015 at 09:48:50PM -0400, Rich Felker wrote:
> I now have a working prototype of static-linked FDPIC binary support
> in musl libc using gcc 5.2 (with the forward-ported SH-FDPIC patch)
> and binutils 2.25.1. I've tested simple example programs and some
> non-trivial examples with threads and they work both under qemu-sh4eb
> with FDPIC support added (needs a small patch) and on real J2
> hardware, where they successfully share text/execute-in-place.
> 
> The gcc patch is presently against gcc with all the other patches from
> my musl-cross-make repo applied, so some refactoring will be needed to
> propose it upstream, but I think it's ready for initial review. After
> a little bit more cleanup (mainly bad specs logic) I'll go ahead and
> put a version of this patch in the toolchain repo.

I've posted the current version of my patches in my toolchain repo:

https://github.com/richfelker/musl-cross-make/commit/3a0c9775b69ae311d89b6a9df6788d43f206cf6c

As noted in the commit message, a patch on the musl side is needed to
get actual working binaries. I'll post a version of this (not
appropriate for upstream) soon.

Rich



* Re: Re: [0pf] musl/SH-FDPIC progress
  2015-09-11 19:05 ` [0pf] " Rich Felker
@ 2015-09-12  3:29   ` Rich Felker
  2015-09-13 23:45   ` Rob Landley
  1 sibling, 0 replies; 7+ messages in thread
From: Rich Felker @ 2015-09-12  3:29 UTC (permalink / raw)
  To: musl; +Cc: 0pf

On Fri, Sep 11, 2015 at 03:05:22PM -0400, Rich Felker wrote:
> On Thu, Sep 10, 2015 at 09:48:50PM -0400, Rich Felker wrote:
> > I now have a working prototype of static-linked FDPIC binary support
> > in musl libc using gcc 5.2 (with the forward-ported SH-FDPIC patch)
> > and binutils 2.25.1. I've tested simple example programs and some
> > non-trivial examples with threads and they work both under qemu-sh4eb
> > with FDPIC support added (needs a small patch) and on real J2
> > hardware, where they successfully share text/execute-in-place.
> > 
> > The gcc patch is presently against gcc with all the other patches from
> > my musl-cross-make repo applied, so some refactoring will be needed to
> > propose it upstream, but I think it's ready for initial review. After
> > a little bit more cleanup (mainly bad specs logic) I'll go ahead and
> > put a version of this patch in the toolchain repo.
> 
> I've posted the current version of my patches in my toolchain repo:
> 
> https://github.com/richfelker/musl-cross-make/commit/3a0c9775b69ae311d89b6a9df6788d43f206cf6c
> 
> As noted in the commit message, a patch on the musl side is needed to
> get actual working binaries. I'll post a version of this (not
> appropriate for upstream) soon.

Actually I just went ahead and got the static-linking-only part
suitable for upstream and committed it. As of this commit to musl:

http://git.musl-libc.org/cgit/musl/commit/?id=d4c82d05b8d0ee97f6356d60986799a95ed5bd74

it should be possible to get a working fdpic build. Note that PIE is
not supported yet and my toolchain build has a GCC patch to always
force -pie. So patches/gcc-5.2.0/0006-defaultpie.diff needs to be
removed before starting to get a working fdpic toolchain right now.

Rich



* Re: [0pf] musl/SH-FDPIC progress
  2015-09-11 19:05 ` [0pf] " Rich Felker
  2015-09-12  3:29   ` Rich Felker
@ 2015-09-13 23:45   ` Rob Landley
  2015-09-14  1:36     ` Rich Felker
  2015-09-14 23:10     ` Rich Felker
  1 sibling, 2 replies; 7+ messages in thread
From: Rob Landley @ 2015-09-13 23:45 UTC (permalink / raw)
  To: Rich Felker, musl; +Cc: 0pf

On 09/11/2015 02:05 PM, Rich Felker wrote:
> As noted in the commit message, a patch on the musl side is needed to
> get actual working binaries. I'll post a version of this (not
> appropriate for upstream) soon.

Did you ever write more documentation beyond:

http://www.aerifal.cx/~dalias/binfmts.html

I should add a link to that from the web page, but I dunno if there's
more. I'd like to get a section on nommu.org comparing binflt, static
pie, and fdpic.

Jeff Dionne commented on your toolchain:

> The only issue is he seems to be exclusively targeting fdpic.  The
> issue is you lose a few registers.   I don't know what gcc will do
> performance wise in that case, we need to test.  Hopefully we don't
> need a 'fall back' to bFLT, or something.
>
> A few % hit in an embedded system is a lot...

To which I don't know how to respond. Register starvation mostly seems
to crop up on x86 (where they have horrible behind the scenes hardware
hacks with register renaming and multiple register profiles and so on to
get decent performance out of a lousy assembly design). But sh2 has 16
general purpose registers, which is much less constrained...

I'd say "get 'em both out and benchmark them" but the binflt toolchain
I've got (cutting an aboriginal linux release as we speak by the way) is
gcc 4.2.1+binutils 2.17, and yours is something like 8 years later so
the whole code generation backend is basically redone. Not remotely
apples to apples there.

And then there's llvm and you said you got libfirm working? What's
involved in making a libfirm toolchain, trying to build a system with
it, and getting j2 code generation out of it?

Rob



* Re: [0pf] musl/SH-FDPIC progress
  2015-09-13 23:45   ` Rob Landley
@ 2015-09-14  1:36     ` Rich Felker
  2015-09-14 23:10     ` Rich Felker
  1 sibling, 0 replies; 7+ messages in thread
From: Rich Felker @ 2015-09-14  1:36 UTC (permalink / raw)
  To: Rob Landley; +Cc: musl, 0pf

On Sun, Sep 13, 2015 at 06:45:49PM -0500, Rob Landley wrote:
> On 09/11/2015 02:05 PM, Rich Felker wrote:
> > As noted in the commit message, a patch on the musl side is needed to
> > get actual working binaries. I'll post a version of this (not
> > appropriate for upstream) soon.
> 
> Did you ever write more documentation beyond:
> 
> http://www.aerifal.cx/~dalias/binfmts.html
> 
> I should add a link to that from the web page, but I dunno if there's
> more. I'd like to get a section on nommu.org comparing binflt, static
> pie, and fdpic.

No, that was more brainstorming. I think there's a certain audience
that would find the form I worked out there useful, but I agree
something simpler would be nice for a general audience. I'll see what
I can put together. Some more discussions with people (yourself or
others) interested in the issue but who don't really understand the
motivations for fdpic would be really helpful for me to figure out how
to best get this across.

> Jeff Dionne commented on your toolchain:
> 
> > The only issue is he seems to be exclusively targeting fdpic.  The
> > issue is you lose a few registers.   I don't know what gcc will do
> > performance wise in that case, we need to test.  Hopefully we don't
> > need a 'fall back' to bFLT, or something.
> >
> > A few % hit in an embedded system is a lot...
> 
> To which I don't know how to respond. Register starvation mostly seems
> to crop up on x86 (where they have horrible behind the scenes hardware
> hacks with register renaming and multiple register profiles and so on to
> get decent performance out of a lousy assembly design). But sh2 has 16
> general purpose registers, which is much less constrained...

There's only one register "lost", r12, and the loss is much less
severe than in "normal" pic code because on fdpic r12 is
call-clobbered rather than call-saved. This means that leaf functions
which do not access global data can freely clobber r12 -- a situation
even better than the normal non-fdpic ABI, since you have an extra
free register you don't have to save/restore -- and in functions that
do need to make calls, but which also have high register pressure, the
got pointer can be spilled to the stack and only reloaded at call
time.

> I'd say "get 'em both out and benchmark them" but the binflt toolchain
> I've got (cutting an aboriginal linux release as we speak by the way) is
> gcc 4.2.1+binutils 2.17, and yours is something like 8 years later so
> the whole code generation backend is basically redone. Not remotely
> apples to apples there.

We could do some measurements, but in theory the only thing that's
more expensive than normal PIC is indirect calls via function pointers
or PLT. Calls within the same DSO/main-program can still be direct.
Versus non-PIC code, you of course end up doing a little more work to
load globals, as in:

	mov.l 1f, rn   // load got slot offset
	add r12, rn    // add got pointer
	mov.l @rn, rn  // load address from got
	mov.l @rn, rn  // load data

vs:

	mov.l 1f, rn   // load absolute address of data
	mov.l @rn, rn  // load data
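
As a rough illustration of the two sequences above, here is a small C model. It is only a sketch: the got array, the counter variable and both function names are made up for the example, and a real FDPIC process has its GOT filled in by the loader and addressed through r12 rather than through a C pointer argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global that the code wants to read. */
static int counter = 42;

/* Stand-in for the GOT: a table of addresses. In a real process the
 * loader writes these slots; here the C initializer does it. */
static void *got[] = { &counter };

/* PIC/FDPIC-style access: index the GOT, load the address, load the data. */
static int load_via_got(void **got_base, size_t slot)
{
	assert(got_base != NULL);
	int *addr = got_base[slot];  /* slot offset + GOT pointer, load address */
	return *addr;                /* load data */
}

/* Non-PIC access: the address of counter is fixed in the code itself. */
static int load_absolute(void)
{
	return counter;
}
```

Calling load_via_got(got, 0) and load_absolute() returns the same value; the GOT version simply goes through one more memory load to find the address first.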

Of course if we're comparing against bFLT, the only way bFLT can have
faster code is by having TEXTRELs all over the place, i.e. not being
PIC at all. If you want shared-flat, you need to be using a GOT
register and you have essentially the same costs as fdpic but almost
none of the advantages. I'm not sure which was used in practice:
non-shareable bFLT with TEXTRELs all over the place or PIC bFLT.

> And then there's llvm and you said you got libfirm working? What's
> involved in making a libfirm toolchain, trying to build a system with
> it, and getting j2 code generation out of it?

I think you misinterpreted one of my tweets. I was announcing that
libfirm now has working PIC support to make a working libc.so, but
only on the archs where it has mature codegen to begin with. That's
mainly i386, but other archs including x86_64, arm, and mips are
progressing. If there's interest I could push for or work on sh
support; afaik there's none at all right now. IMO a good starting
point would be a target-generic framework for fdpic-style pic
variants. Being able to offer fdpic as a security measure for MMU-ful
archs like x86 would be quite interesting (there are published papers
on why ASLR is not very useful because of the known constant
displacement between text and data) and something I could pitch to
them as a chance to be the first to offer it. But this is all
longer-term.

Rich



* Re: [0pf] musl/SH-FDPIC progress
  2015-09-13 23:45   ` Rob Landley
  2015-09-14  1:36     ` Rich Felker
@ 2015-09-14 23:10     ` Rich Felker
  1 sibling, 0 replies; 7+ messages in thread
From: Rich Felker @ 2015-09-14 23:10 UTC (permalink / raw)
  To: Rob Landley; +Cc: musl, 0pf

On Sun, Sep 13, 2015 at 06:45:49PM -0500, Rob Landley wrote:
> Jeff Dionne commented on your toolchain:
> 
> > The only issue is he seems to be exclusively targeting fdpic.  The
> > issue is you lose a few registers.   I don't know what gcc will do
> > performance wise in that case, we need to test.  Hopefully we don't
> > need a 'fall back' to bFLT, or something.
> >
> > A few % hit in an embedded system is a lot...
> 
> To which I don't know how to respond. Register starvation mostly seems
> to crop up on x86 (where they have horrible behind the scenes hardware
> hacks with register renaming and multiple register profiles and so on to
> get decent performance out of a lousy assembly design). But sh2 has 16
> general purpose registers, which is much less constrained...
> 
> I'd say "get 'em both out and benchmark them" but the binflt toolchain
> I've got (cutting an aboriginal linux release as we speak by the way) is
> gcc 4.2.1+binutils 2.17, and yours is something like 8 years later so
> the whole code generation backend is basically redone. Not remotely
> apples to apples there.

With an aim of assessing what the old bFLT toolchain for sh2 is doing,
I dug through a lot more legacy uClinux docs and other nommu targets
in GCC. The bFLT format is documented as being able to do shared-text
and XIP, but of course for this to be possible, the compiler's codegen
needs to be compatible with data being loaded at an arbitrary
location. In both upstream gcc and the old sh2-uclinux toolchains I
have, the only targets which have the -msep-data option, which is
what's needed for shared-text/XIP, are Blackfin and m68k.

What the old sh2-uclinux bFLT toolchain is producing are "Fully
Relocated Binaries". These are (although they don't necessarily have
to be) non-PIC, and thus have the most efficient possible code in
terms of register availability, code density, and performance. On the
other hand they cannot share text or execute in place, and they have a
lot of relocations, which increases file size and startup time -- but
of course the biggest contributor to startup time is the need to
memcpy the whole file in kernelspace, since it's not shareable.
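
To show why load-time relocations rule out sharing, here is a simplified C sketch of what a loader for a fully relocated image has to do. It is not the kernel's bFLT loader: apply_relocs and its parameters are hypothetical, the relocation list is assumed to be a plain array of byte offsets, and words are handled in host byte order for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adds the load address to each 32-bit word named by relocs[]. */
static void apply_relocs(unsigned char *image, size_t image_len,
                         uint32_t load_base,
                         const uint32_t *relocs, size_t nrelocs)
{
	assert(image != NULL && relocs != NULL);
	for (size_t i = 0; i < nrelocs; i++) {
		uint32_t off = relocs[i], word;
		/* Skip entries that do not fit inside the image. */
		if (off > image_len || image_len - off < sizeof word)
			continue;
		memcpy(&word, image + off, sizeof word);
		word += load_base;
		/* This write into the image is why it needs a private copy. */
		memcpy(image + off, &word, sizeof word);
	}
}
```

Every relocated word is a write into the loaded image, so each process needs its own modified copy and the text cannot stay in place in ROM or flash.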

Aside from a few bytes of header data and slightly different entry
point code sequences, binaries of the above form are completely
equivalent to what you get with my PIE toolchain using the options
-static -fno-pic -pie, an unconventional but valid combination that
gives you an ET_DYN format binary that can be loaded at arbitrary
address but that uses TEXTRELs instead of PIC to achieve that.
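
One way to confirm which kind of file a given set of options produced is to look at e_type in the ELF header. The helper below is a minimal sketch with a made-up name (elf_type); it uses the standard ELF header offsets and values (EI_DATA at byte 5, e_type at byte 16, ET_EXEC = 2, ET_DYN = 3) and handles both byte orders, since the sh2eb targets discussed here are big-endian.

```c
#include <assert.h>
#include <stddef.h>

enum { EI_DATA_OFF = 5, E_TYPE_OFF = 16 };  /* offsets in the ELF header */
enum { DATA_LSB = 1, DATA_MSB = 2 };        /* ELFDATA2LSB, ELFDATA2MSB */
enum { TYPE_EXEC = 2, TYPE_DYN = 3 };       /* ET_EXEC, ET_DYN */

/* Returns e_type from the start of an ELF file, or -1 if the buffer is
 * too short, lacks the ELF magic, or has an unknown byte order. */
static int elf_type(const unsigned char *h, size_t len)
{
	assert(h != NULL);
	if (len < 18 || h[0] != 0x7f || h[1] != 'E' || h[2] != 'L' || h[3] != 'F')
		return -1;
	if (h[EI_DATA_OFF] == DATA_MSB)
		return h[E_TYPE_OFF] << 8 | h[E_TYPE_OFF + 1];
	if (h[EI_DATA_OFF] == DATA_LSB)
		return h[E_TYPE_OFF + 1] << 8 | h[E_TYPE_OFF];
	return -1;
}
```

Fed the first 18 bytes of a binary, this returns TYPE_DYN for the ET_DYN output described above and TYPE_EXEC for a conventional fixed-address executable.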

If we did have -msep-data bFLT for sh (note: afaik no such gcc patch
exists, but it would not be hard to make one) the results would be
near-identical to my new fdpic toolchain with static linking. Since ld
will optimize any calls to func@PLT to a direct call to func (simply
by resolving the address that way), the GOT register never has to get
reloaded in a static-linked program except in the case of indirect
calls (via a function pointer) in which case there are 1-2 extra
instructions involved in making the call.

TL;DR:

I don't see any cases where bFLT could give a measurable advantage.

- The bFLT toolchain we have now is equivalent to my non-FDPIC PIE
  toolchain with -static -fno-pic -pie.

- A hypothetical shared-text/XIP bFLT toolchain is essentially
  equivalent to static linking with my FDPIC toolchain.

- Both of my toolchains are much more flexible (and free of hacks)
  than anything bFLT-based, support dynamic linking, etc.

Rich



* Updated musl/SH-FDPIC progress
  2015-09-11  1:48 musl/SH-FDPIC progress Rich Felker
  2015-09-11 19:05 ` [0pf] " Rich Felker
@ 2015-09-22  5:27 ` Rich Felker
  1 sibling, 0 replies; 7+ messages in thread
From: Rich Felker @ 2015-09-22  5:27 UTC (permalink / raw)
  To: musl; +Cc: 0pf

musl's fdpic dynamic linker is now working with basic functionality.
There are some limitations I hope to lift soon; see the commit
message:

http://git.musl-libc.org/cgit/musl/commit/?id=7a9669e977e5f750cf72ccbd2614f8b72ce02c4c

I've also made some important fixes to the toolchain patches. The
issue with the backwards fdpic flag is now fixed on the binutils side,
and some gcc codegen issues are fixed on the gcc side. They're all
available (with build scripts) here:

https://github.com/richfelker/musl-cross-make

Make sure to enable fdpic in config.mak if you want to use it.

Binaries produced with this toolchain will run out of the box on:

- qemu-sh4eb (with or without fdpic loader added)
- Real sh3/4 kernels, under qemu-system-sh4eb or on real hardware
- Real sh2/j2 (but unified syscall trap patch is needed; latest is
  attached)

This is all new stuff so I'd love to hear feedback from anyone who
tries it, good or bad.

Rich

