mailing list of musl libc
 help / color / mirror / code / Atom feed
* Some questions
@ 2018-04-30  2:52 Patrick Oppenlander
  2018-04-30  3:16 ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick Oppenlander @ 2018-04-30  2:52 UTC (permalink / raw)
  To: musl

Hi,

while working on another project I've been looking through quite a bit
of the musl code and have come up with a few minor questions I think
worth raising:

- Is there a way that spinlocks could be disabled or bypassed on
uniprocessor systems?

- sigisemptyset uses bss. Could be implemented in a similar fashion to
sigemptyset, save bss & would probably be faster.

- getcwd returned buffer size can be incorrect. If you call
getcwd(NULL, 1234) the returned buffer is sized to match the path
length but should be 1234 to be compatible with the glibc extension.

- Are there any plans to support static linking with LTO?

Thanks,

Patrick


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-04-30  2:52 Some questions Patrick Oppenlander
@ 2018-04-30  3:16 ` Rich Felker
  2018-04-30  3:55   ` Patrick Oppenlander
                     ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Rich Felker @ 2018-04-30  3:16 UTC (permalink / raw)
  To: musl

On Mon, Apr 30, 2018 at 12:52:06PM +1000, Patrick Oppenlander wrote:
> Hi,
> 
> while working on another project I've been looking through quite a bit
> of the musl code and have come up with a few minor questions I think
> worth raising:
> 
> - Is there a way that spinlocks could be disabled or bypassed on
> uniprocessor systems?

Whether locks are needed is a matter of whether there are multiple
threads, not whether it's uniprocessor or multiprocessor. For some
things where it's likely to matter (stdio, malloc, some other
internals), locks are already optimized out when there is only one
thread. In other cases it was deemed either too costly/difficult or
irrelevant to overall performance.

> - sigisemptyset uses bss. Could be implemented in a similar fashion to
> sigemptyset, save bss & would probably be faster.

Ahh, yes. I wonder if gcc has any way to force const zero objects into
rodata rather than bss..? In any case memcmp is not an efficient way
to implement this, so maybe it should be changed.

> - getcwd returned buffer size can be incorrect. If you call
> getcwd(NULL, 1234) the returned buffer is sized to match the path
> length but should be 1234 to be compatible with the glibc extension.

I'll look at this. It seems like a worse behavior for most callers,
but maybe it should match.

> - Are there any plans to support static linking with LTO?

It should be possible, but gcc mishandles the crt files when doing
LTO. I think if you suppress LTO for them it works. Someone posted
results experimenting with it either on the list or IRC channel
recently. I'll see if I can find anything useful.

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-04-30  3:16 ` Rich Felker
@ 2018-04-30  3:55   ` Patrick Oppenlander
  2018-04-30 15:35     ` Rich Felker
  2018-04-30  5:17   ` Patrick Oppenlander
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Patrick Oppenlander @ 2018-04-30  3:55 UTC (permalink / raw)
  To: musl

On Mon, Apr 30, 2018 at 1:16 PM, Rich Felker <dalias@libc.org> wrote:
> On Mon, Apr 30, 2018 at 12:52:06PM +1000, Patrick Oppenlander wrote:
>> - Is there a way that spinlocks could be disabled or bypassed on
>> uniprocessor systems?
>
> Whether locks are needed is a matter of whether there are multiple
> threads, not whether it's uniprocessor or multiprocessor. For some
> things where it's likely to matter (stdio, malloc, some other
> internals), locks are already optimized out when there is only one
> thread. In other cases it was deemed either too costly/difficult or
> irrelevant to overall performance.

I was talking about the case of a uniprocessor system running a multi
theaded process.

In that case the "spin" part of spinlock just burns time & electrons.
The "lock" part obviously can't be omitted. Calling straight through
to the kernel is the most efficient thing to do.

Patrick


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-04-30  3:16 ` Rich Felker
  2018-04-30  3:55   ` Patrick Oppenlander
@ 2018-04-30  5:17   ` Patrick Oppenlander
  2018-04-30 15:29     ` Rich Felker
  2018-04-30  5:29   ` Patrick Oppenlander
  2018-05-01  0:10   ` Patrick Oppenlander
  3 siblings, 1 reply; 21+ messages in thread
From: Patrick Oppenlander @ 2018-04-30  5:17 UTC (permalink / raw)
  To: musl

On Mon, Apr 30, 2018 at 1:16 PM, Rich Felker <dalias@libc.org> wrote:
>> - Are there any plans to support static linking with LTO?
>
> It should be possible, but gcc mishandles the crt files when doing
> LTO. I think if you suppress LTO for them it works. Someone posted
> results experimenting with it either on the list or IRC channel
> recently. I'll see if I can find anything useful.

Last time I tried to LTO a C library I ran into this:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63215#c1

I ended up building parts of the library as -fno-lto which worked but
was fairly painful.

But that was 2014, so maybe the situation is different now.

Patrick


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-04-30  3:16 ` Rich Felker
  2018-04-30  3:55   ` Patrick Oppenlander
  2018-04-30  5:17   ` Patrick Oppenlander
@ 2018-04-30  5:29   ` Patrick Oppenlander
  2018-04-30 15:31     ` Rich Felker
  2018-05-01  0:10   ` Patrick Oppenlander
  3 siblings, 1 reply; 21+ messages in thread
From: Patrick Oppenlander @ 2018-04-30  5:29 UTC (permalink / raw)
  To: musl

On Mon, Apr 30, 2018 at 1:16 PM, Rich Felker <dalias@libc.org> wrote:
> On Mon, Apr 30, 2018 at 12:52:06PM +1000, Patrick Oppenlander wrote:
>> - getcwd returned buffer size can be incorrect. If you call
>> getcwd(NULL, 1234) the returned buffer is sized to match the path
>> length but should be 1234 to be compatible with the glibc extension.
>
> I'll look at this. It seems like a worse behavior for most callers,
> but maybe it should match.

Actually, my biggest issue with getcwd is that it allocates a PATH_MAX
sized buffer on the stack. That's painful on deeply embedded stuff.

Patrick


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-04-30  5:17   ` Patrick Oppenlander
@ 2018-04-30 15:29     ` Rich Felker
  2018-05-01  2:32       ` Patrick Oppenlander
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2018-04-30 15:29 UTC (permalink / raw)
  To: Patrick Oppenlander; +Cc: musl

On Mon, Apr 30, 2018 at 03:17:11PM +1000, Patrick Oppenlander wrote:
> On Mon, Apr 30, 2018 at 1:16 PM, Rich Felker <dalias@libc.org> wrote:
> >> - Are there any plans to support static linking with LTO?
> >
> > It should be possible, but gcc mishandles the crt files when doing
> > LTO. I think if you suppress LTO for them it works. Someone posted
> > results experimenting with it either on the list or IRC channel
> > recently. I'll see if I can find anything useful.
> 
> Last time I tried to LTO a C library I ran into this:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63215#c1

I don't quite understand the issue as reported there. The example is
defining abs in a way that could be self-referential in the absence of
-fno-builtin or -ffreestanding (which implies -fno-builtin). But the
use of -ffreestanding is always necessary to build a libc. This is in
no way specific to LTO.

Anyway I don't think this issue affects musl but I'm not sure.

> I ended up building parts of the library as -fno-lto which worked but
> was fairly painful.
> 
> But that was 2014, so maybe the situation is different now.

The issue I'm aware of is that the reference to _start_c from the
top-level asm _start function is not seen by LTO. This should be
possible to work around just by teaching configure to check if
-fno-lto is supported, and if so having the Makefile pass it for crt
files. But I'm skeptical that you'd get much benefit applying LTO to
musl itself. It's known that you don't for producing a shared library
(ppl have tested this), but static may fare better. In theory a really
smart LTO could optimize out the whole floating point support from
printf when all formst strings are literals and none contain %f's, but
in practice none are anywhere near that smart.

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-04-30  5:29   ` Patrick Oppenlander
@ 2018-04-30 15:31     ` Rich Felker
  2018-05-01  2:34       ` Patrick Oppenlander
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2018-04-30 15:31 UTC (permalink / raw)
  To: Patrick Oppenlander; +Cc: musl

[-- Attachment #1: Type: text/plain, Size: 759 bytes --]

On Mon, Apr 30, 2018 at 03:29:39PM +1000, Patrick Oppenlander wrote:
> On Mon, Apr 30, 2018 at 1:16 PM, Rich Felker <dalias@libc.org> wrote:
> > On Mon, Apr 30, 2018 at 12:52:06PM +1000, Patrick Oppenlander wrote:
> >> - getcwd returned buffer size can be incorrect. If you call
> >> getcwd(NULL, 1234) the returned buffer is sized to match the path
> >> length but should be 1234 to be compatible with the glibc extension.
> >
> > I'll look at this. It seems like a worse behavior for most callers,
> > but maybe it should match.
> 
> Actually, my biggest issue with getcwd is that it allocates a PATH_MAX
> sized buffer on the stack. That's painful on deeply embedded stuff.

That's unrelated, and could/should be fixed by the attached patch I
think.

Rich

[-- Attachment #2: getcwd-stack.diff --]
[-- Type: text/plain, Size: 376 bytes --]

diff --git a/src/unistd/getcwd.c b/src/unistd/getcwd.c
index 103fbbb..f407ffe 100644
--- a/src/unistd/getcwd.c
+++ b/src/unistd/getcwd.c
@@ -6,10 +6,10 @@
 
 char *getcwd(char *buf, size_t size)
 {
-	char tmp[PATH_MAX];
+	char tmp[buf ? 1 : PATH_MAX];
 	if (!buf) {
 		buf = tmp;
-		size = PATH_MAX;
+		size = sizeof tmp;
 	} else if (!size) {
 		errno = EINVAL;
 		return 0;

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-04-30  3:55   ` Patrick Oppenlander
@ 2018-04-30 15:35     ` Rich Felker
  2018-05-01  2:35       ` Patrick Oppenlander
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2018-04-30 15:35 UTC (permalink / raw)
  To: Patrick Oppenlander; +Cc: musl

On Mon, Apr 30, 2018 at 01:55:16PM +1000, Patrick Oppenlander wrote:
> On Mon, Apr 30, 2018 at 1:16 PM, Rich Felker <dalias@libc.org> wrote:
> > On Mon, Apr 30, 2018 at 12:52:06PM +1000, Patrick Oppenlander wrote:
> >> - Is there a way that spinlocks could be disabled or bypassed on
> >> uniprocessor systems?
> >
> > Whether locks are needed is a matter of whether there are multiple
> > threads, not whether it's uniprocessor or multiprocessor. For some
> > things where it's likely to matter (stdio, malloc, some other
> > internals), locks are already optimized out when there is only one
> > thread. In other cases it was deemed either too costly/difficult or
> > irrelevant to overall performance.
> 
> I was talking about the case of a uniprocessor system running a multi
> theaded process.
> 
> In that case the "spin" part of spinlock just burns time & electrons.
> The "lock" part obviously can't be omitted. Calling straight through
> to the kernel is the most efficient thing to do.

I see. Is this an issue you've actually hit? I don't see any obvious
way to make this decision at runtime that doesn't incur unwanted costs
or failure modes, and I suspect we're spinning way too many times
anyway even for SMP (i.e. the ideal solution might just be
significantly reducing the # of spins).

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-04-30  3:16 ` Rich Felker
                     ` (2 preceding siblings ...)
  2018-04-30  5:29   ` Patrick Oppenlander
@ 2018-05-01  0:10   ` Patrick Oppenlander
  2018-05-01 14:19     ` Szabolcs Nagy
  2018-05-01 21:05     ` Rich Felker
  3 siblings, 2 replies; 21+ messages in thread
From: Patrick Oppenlander @ 2018-05-01  0:10 UTC (permalink / raw)
  To: musl

On Mon, Apr 30, 2018 at 1:16 PM, Rich Felker <dalias@libc.org> wrote:
> On Mon, Apr 30, 2018 at 12:52:06PM +1000, Patrick Oppenlander wrote:
>> - sigisemptyset uses bss. Could be implemented in a similar fashion to
>> sigemptyset, save bss & would probably be faster.
>
> Ahh, yes. I wonder if gcc has any way to force const zero objects into
> rodata rather than bss..? In any case memcmp is not an efficient way
> to implement this, so maybe it should be changed.

Doing this:

diff --git a/src/signal/sigisemptyset.c b/src/signal/sigisemptyset.c
index 312c66cf..28f74203 100644
--- a/src/signal/sigisemptyset.c
+++ b/src/signal/sigisemptyset.c
@@ -4,6 +4,6 @@

 int sigisemptyset(const sigset_t *set)
 {
-       static const unsigned long zeroset[_NSIG/8/sizeof(long)];
+       static const unsigned long zeroset[_NSIG/8/sizeof(long)] = {};
        return !memcmp(set, &zeroset, _NSIG/8);
 }

Makes it go into rodata.

I agree that it should probably be rewritten to not use memcmp.

Patrick


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-04-30 15:29     ` Rich Felker
@ 2018-05-01  2:32       ` Patrick Oppenlander
  0 siblings, 0 replies; 21+ messages in thread
From: Patrick Oppenlander @ 2018-05-01  2:32 UTC (permalink / raw)
  Cc: musl

On Tue, May 1, 2018 at 1:29 AM, Rich Felker <dalias@libc.org> wrote:
> On Mon, Apr 30, 2018 at 03:17:11PM +1000, Patrick Oppenlander wrote:
>> Last time I tried to LTO a C library I ran into this:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63215#c1
>
> I don't quite understand the issue as reported there. The example is
> defining abs in a way that could be self-referential in the absence of
> -fno-builtin or -ffreestanding (which implies -fno-builtin). But the
> use of -ffreestanding is always necessary to build a libc. This is in
> no way specific to LTO.

There were other related bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60395
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58203

It all boiled down to the case where gcc was emitting a runtime symbol
and couldn't resolve the reference at static link time.

I've just run through the various test cases and they appear to be
resolved in the gcc on this arch box (7.3.1).

> Anyway I don't think this issue affects musl but I'm not sure.

You're probably right.

> The issue I'm aware of is that the reference to _start_c from the
> top-level asm _start function is not seen by LTO. This should be
> possible to work around just by teaching configure to check if
> -fno-lto is supported, and if so having the Makefile pass it for crt
> files. But I'm skeptical that you'd get much benefit applying LTO to
> musl itself. It's known that you don't for producing a shared library
> (ppl have tested this), but static may fare better. In theory a really
> smart LTO could optimize out the whole floating point support from
> printf when all formst strings are literals and none contain %f's, but
> in practice none are anywhere near that smart.

Marking _start_c with __attribute__((used)) should also work.

In a static link LTO will also inline many trivial functions from the
C library which can reduce binary size and help with instruction cache
footprint & locality. I've had good results especially on targets with
small instruction caches executing in place.

Another benefit is constant propagation across the library boundary
which can also eliminate reasonable amounts of code.

Patrick


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-04-30 15:31     ` Rich Felker
@ 2018-05-01  2:34       ` Patrick Oppenlander
  2018-05-01 15:52         ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick Oppenlander @ 2018-05-01  2:34 UTC (permalink / raw)
  Cc: musl

[-- Attachment #1: Type: text/plain, Size: 649 bytes --]

On Tue, May 1, 2018 at 1:31 AM, Rich Felker <dalias@libc.org> wrote:
> On Mon, Apr 30, 2018 at 03:29:39PM +1000, Patrick Oppenlander wrote:
>> Actually, my biggest issue with getcwd is that it allocates a PATH_MAX
>> sized buffer on the stack. That's painful on deeply embedded stuff.
>
> That's unrelated, and could/should be fixed by the attached patch I
> think.

Unfortunately that fails to build on arm with:

src/unistd/getcwd.c: In function 'getcwd':
src/unistd/getcwd.c:25:1: error: r7 cannot be used in asm here

I was also having a go at resolving the stack & the buffer size issue
and came up with the attached (untested) patch.

Patrick

[-- Attachment #2: getcwd2.diff --]
[-- Type: text/x-patch, Size: 1208 bytes --]

diff --git a/src/unistd/getcwd.c b/src/unistd/getcwd.c
index 103fbbb5..306dbc4f 100644
--- a/src/unistd/getcwd.c
+++ b/src/unistd/getcwd.c
@@ -3,17 +3,10 @@
 #include <limits.h>
 #include <string.h>
 #include "syscall.h"
+#include "libc.h"
 
-char *getcwd(char *buf, size_t size)
+static char *do_getcwd(char *buf, size_t size)
 {
-	char tmp[PATH_MAX];
-	if (!buf) {
-		buf = tmp;
-		size = PATH_MAX;
-	} else if (!size) {
-		errno = EINVAL;
-		return 0;
-	}
 	long ret = syscall(SYS_getcwd, buf, size);
 	if (ret < 0)
 		return 0;
@@ -21,5 +14,37 @@ char *getcwd(char *buf, size_t size)
 		errno = ENOENT;
 		return 0;
 	}
-	return buf == tmp ? strdup(buf) : buf;
+	return buf;
+}
+
+static char *getcwd_glibc(size_t size)
+{
+	char tmp[PATH_MAX];
+	if (!do_getcwd(tmp, sizeof tmp))
+		return 0;
+	size_t len = strlen(tmp) + 1;
+	if (!size)
+		size = len;
+	else if (size < len) {
+		errno = ERANGE;
+		return 0;
+	}
+	char *buf = malloc(size);
+	if (!buf) {
+		errno = ENOMEM;
+		return 0;
+	}
+	memcpy(buf, tmp, len);
+	return buf;
+}
+
+char *getcwd(char *buf, size_t size)
+{
+	if (!buf)
+		return getcwd_glibc(size);
+	if (!size) {
+		errno = EINVAL;
+		return 0;
+	}
+	return do_getcwd(buf, size);
 }

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-04-30 15:35     ` Rich Felker
@ 2018-05-01  2:35       ` Patrick Oppenlander
  2018-05-01 21:03         ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick Oppenlander @ 2018-05-01  2:35 UTC (permalink / raw)
  Cc: musl

On Tue, May 1, 2018 at 1:35 AM, Rich Felker <dalias@libc.org> wrote:
> On Mon, Apr 30, 2018 at 01:55:16PM +1000, Patrick Oppenlander wrote:
>> I was talking about the case of a uniprocessor system running a multi
>> theaded process.
>>
>> In that case the "spin" part of spinlock just burns time & electrons.
>> The "lock" part obviously can't be omitted. Calling straight through
>> to the kernel is the most efficient thing to do.
>
> I see. Is this an issue you've actually hit? I don't see any obvious
> way to make this decision at runtime that doesn't incur unwanted costs
> or failure modes, and I suspect we're spinning way too many times
> anyway even for SMP (i.e. the ideal solution might just be
> significantly reducing the # of spins).

I haven't measured the performance impact of it.

One option could be to configure the number of spins at compile time
and set to zero for known uniprocessor architectures (like armv7m). Or
have a configure override. Really this is just performance tuning,
there's no danger of generating incorrect code.

I can't find a way of detecting a SMP kernel other than parsing the
result of uname(2) which sucks. I was hoping there might be something
in auxv hwcap.

Patrick


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-05-01  0:10   ` Patrick Oppenlander
@ 2018-05-01 14:19     ` Szabolcs Nagy
  2018-05-01 21:05     ` Rich Felker
  1 sibling, 0 replies; 21+ messages in thread
From: Szabolcs Nagy @ 2018-05-01 14:19 UTC (permalink / raw)
  To: musl

* Patrick Oppenlander <patrick.oppenlander@gmail.com> [2018-05-01 10:10:45 +1000]:
> On Mon, Apr 30, 2018 at 1:16 PM, Rich Felker <dalias@libc.org> wrote:
> > On Mon, Apr 30, 2018 at 12:52:06PM +1000, Patrick Oppenlander wrote:
> >> - sigisemptyset uses bss. Could be implemented in a similar fashion to
> >> sigemptyset, save bss & would probably be faster.
> >
> > Ahh, yes. I wonder if gcc has any way to force const zero objects into
> > rodata rather than bss..? In any case memcmp is not an efficient way
> > to implement this, so maybe it should be changed.
> 
> Doing this:
> -       static const unsigned long zeroset[_NSIG/8/sizeof(long)];
> +       static const unsigned long zeroset[_NSIG/8/sizeof(long)] = {};
> 
> Makes it go into rodata.

{} is invalid initializer in c, use {0}.

it was a bug in gcc that it treated

 static const int x;

and

 static const int x = {0};

differently, they are completely equivalent in c,
it seems to me that gcc-trunk fixed it and clang
was never broken.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-05-01  2:34       ` Patrick Oppenlander
@ 2018-05-01 15:52         ` Rich Felker
  2018-05-01 17:35           ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2018-05-01 15:52 UTC (permalink / raw)
  To: Patrick Oppenlander; +Cc: musl

On Tue, May 01, 2018 at 12:34:13PM +1000, Patrick Oppenlander wrote:
> On Tue, May 1, 2018 at 1:31 AM, Rich Felker <dalias@libc.org> wrote:
> > On Mon, Apr 30, 2018 at 03:29:39PM +1000, Patrick Oppenlander wrote:
> >> Actually, my biggest issue with getcwd is that it allocates a PATH_MAX
> >> sized buffer on the stack. That's painful on deeply embedded stuff.
> >
> > That's unrelated, and could/should be fixed by the attached patch I
> > think.
> 
> Unfortunately that fails to build on arm with:
> 
> src/unistd/getcwd.c: In function 'getcwd':
> src/unistd/getcwd.c:25:1: error: r7 cannot be used in asm here

Then that's a bug we need to fix or work around elsewhere.
Non-arch-specific source files can't be constrained not to use
perfectly valid C constructs because gcc breaks on them for particular
archs.

I believe it's related to the thumb+framepointer issue that was raised
a while back, but I forget how that ended and if we ever solved it.

> I was also having a go at resolving the stack & the buffer size issue
> and came up with the attached (untested) patch.
> 
> Patrick

> diff --git a/src/unistd/getcwd.c b/src/unistd/getcwd.c
> index 103fbbb5..306dbc4f 100644
> --- a/src/unistd/getcwd.c
> +++ b/src/unistd/getcwd.c
> @@ -3,17 +3,10 @@
>  #include <limits.h>
>  #include <string.h>
>  #include "syscall.h"
> +#include "libc.h"
>  
> -char *getcwd(char *buf, size_t size)
> +static char *do_getcwd(char *buf, size_t size)
>  {
> -	char tmp[PATH_MAX];
> -	if (!buf) {
> -		buf = tmp;
> -		size = PATH_MAX;
> -	} else if (!size) {
> -		errno = EINVAL;
> -		return 0;
> -	}
>  	long ret = syscall(SYS_getcwd, buf, size);
>  	if (ret < 0)
>  		return 0;
> @@ -21,5 +14,37 @@ char *getcwd(char *buf, size_t size)
>  		errno = ENOENT;
>  		return 0;
>  	}
> -	return buf == tmp ? strdup(buf) : buf;
> +	return buf;
> +}
> +
> +static char *getcwd_glibc(size_t size)
> +{
> +	char tmp[PATH_MAX];
> +	if (!do_getcwd(tmp, sizeof tmp))
> +		return 0;
> +	size_t len = strlen(tmp) + 1;
> +	if (!size)
> +		size = len;
> +	else if (size < len) {
> +		errno = ERANGE;
> +		return 0;
> +	}
> +	char *buf = malloc(size);
> +	if (!buf) {
> +		errno = ENOMEM;
> +		return 0;
> +	}
> +	memcpy(buf, tmp, len);
> +	return buf;
> +}
> +
> +char *getcwd(char *buf, size_t size)
> +{
> +	if (!buf)
> +		return getcwd_glibc(size);
> +	if (!size) {
> +		errno = EINVAL;
> +		return 0;
> +	}
> +	return do_getcwd(buf, size);
>  }

This isn't acceptable. It makes the code much larger (at the source
level) and harder to read, and the only reason it works is failure of
gcc to optimize heavily. It could just as easily still end up using
the full PATH_MAX space on the stack, if gcc inlines and hoists stuff,
or if gcc wanted to be really awful it could still end up using a
frame pointer.

Let's look back at the framepointer mess and see if there's a way to
get gcc not to break. If not we may need to skip inline syscalls and
call out to the extern __syscall when building for thumb, but I'd
really rather not have to do that.

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-05-01 15:52         ` Rich Felker
@ 2018-05-01 17:35           ` Rich Felker
  2018-05-01 21:49             ` Andre McCurdy
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2018-05-01 17:35 UTC (permalink / raw)
  To: Patrick Oppenlander; +Cc: musl

[-- Attachment #1: Type: text/plain, Size: 3961 bytes --]

On Tue, May 01, 2018 at 11:52:33AM -0400, Rich Felker wrote:
> On Tue, May 01, 2018 at 12:34:13PM +1000, Patrick Oppenlander wrote:
> > On Tue, May 1, 2018 at 1:31 AM, Rich Felker <dalias@libc.org> wrote:
> > > On Mon, Apr 30, 2018 at 03:29:39PM +1000, Patrick Oppenlander wrote:
> > >> Actually, my biggest issue with getcwd is that it allocates a PATH_MAX
> > >> sized buffer on the stack. That's painful on deeply embedded stuff.
> > >
> > > That's unrelated, and could/should be fixed by the attached patch I
> > > think.
> > 
> > Unfortunately that fails to build on arm with:
> > 
> > src/unistd/getcwd.c: In function 'getcwd':
> > src/unistd/getcwd.c:25:1: error: r7 cannot be used in asm here
> 
> Then that's a bug we need to fix or work around elsewhere.
> Non-arch-specific source files can't be constrained not to use
> perfectly valid C constructs because gcc breaks on them for particular
> archs.
> 
> I believe it's related to the thumb+framepointer issue that was raised
> a while back, but I forget how that ended and if we ever solved it.
> 
> > I was also having a go at resolving the stack & the buffer size issue
> > and came up with the attached (untested) patch.
> > 
> > Patrick
> 
> > diff --git a/src/unistd/getcwd.c b/src/unistd/getcwd.c
> > index 103fbbb5..306dbc4f 100644
> > --- a/src/unistd/getcwd.c
> > +++ b/src/unistd/getcwd.c
> > @@ -3,17 +3,10 @@
> >  #include <limits.h>
> >  #include <string.h>
> >  #include "syscall.h"
> > +#include "libc.h"
> >  
> > -char *getcwd(char *buf, size_t size)
> > +static char *do_getcwd(char *buf, size_t size)
> >  {
> > -	char tmp[PATH_MAX];
> > -	if (!buf) {
> > -		buf = tmp;
> > -		size = PATH_MAX;
> > -	} else if (!size) {
> > -		errno = EINVAL;
> > -		return 0;
> > -	}
> >  	long ret = syscall(SYS_getcwd, buf, size);
> >  	if (ret < 0)
> >  		return 0;
> > @@ -21,5 +14,37 @@ char *getcwd(char *buf, size_t size)
> >  		errno = ENOENT;
> >  		return 0;
> >  	}
> > -	return buf == tmp ? strdup(buf) : buf;
> > +	return buf;
> > +}
> > +
> > +static char *getcwd_glibc(size_t size)
> > +{
> > +	char tmp[PATH_MAX];
> > +	if (!do_getcwd(tmp, sizeof tmp))
> > +		return 0;
> > +	size_t len = strlen(tmp) + 1;
> > +	if (!size)
> > +		size = len;
> > +	else if (size < len) {
> > +		errno = ERANGE;
> > +		return 0;
> > +	}
> > +	char *buf = malloc(size);
> > +	if (!buf) {
> > +		errno = ENOMEM;
> > +		return 0;
> > +	}
> > +	memcpy(buf, tmp, len);
> > +	return buf;
> > +}
> > +
> > +char *getcwd(char *buf, size_t size)
> > +{
> > +	if (!buf)
> > +		return getcwd_glibc(size);
> > +	if (!size) {
> > +		errno = EINVAL;
> > +		return 0;
> > +	}
> > +	return do_getcwd(buf, size);
> >  }
> 
> This isn't acceptable. It makes the code much larger (at the source
> level) and harder to read, and the only reason it works is failure of
> gcc to optimize heavily. It could just as easily still end up using
> the full PATH_MAX space on the stack, if gcc inlines and hoists stuff,
> or if gcc wanted to be really awful it could still end up using a
> frame pointer.
> 
> Let's look back at the framepointer mess and see if there's a way to
> get gcc not to break. If not we may need to skip inline syscalls and
> call out to the extern __syscall when building for thumb, but I'd
> really rather not have to do that.

Looking back, it seems where we left it is just that you need to make
sure frame pointer is disabled if building as thumb. But that's not
reliable because gcc forcibly re-enables frame pointer (including
frame pointer ABI constraints, which it doesn't need to) if you use a
VLA or alloca.

I'm considering applying the attached patch, which would make it so
VLAs don't break thumb syscalls and eliminate the need to force frame
pointer off when building as thumb. This is all a workaround for gcc
being wrong about not letting you use r7, but it seems reasonable and
non-invasive. It just omits r7 from the constraints and uses a temp
register to save/restore it.

Rich

[-- Attachment #2: thumb-syscall.diff --]
[-- Type: text/plain, Size: 3417 bytes --]

diff --git a/arch/arm/syscall_arch.h b/arch/arm/syscall_arch.h
index 6023303..30863eb 100644
--- a/arch/arm/syscall_arch.h
+++ b/arch/arm/syscall_arch.h
@@ -3,74 +3,89 @@
 ((union { long long ll; long l[2]; }){ .ll = x }).l[1]
 #define __SYSCALL_LL_O(x) 0, __SYSCALL_LL_E((x))
 
+#ifdef __thumb__
+
+#define __ASM____R7__
+#define R7_OPERAND "ri"(r7)
+#define __asm_syscall(...) do { \
+	__asm__ __volatile__ ( "mov %1,r7 ; mov r7,%2 ; svc 0 ; mov r7,%1" \
+	: "=r"(r0), "=&r"((int){0}) : __VA_ARGS__ : "memory"); \
+	return r0; \
+	} while (0)
+
+#else
+
+#define __ASM____R7__ __asm__("r7")
+#define R7_OPERAND "r"(r7)
 #define __asm_syscall(...) do { \
 	__asm__ __volatile__ ( "svc 0" \
 	: "=r"(r0) : __VA_ARGS__ : "memory"); \
 	return r0; \
 	} while (0)
+#endif
 
 static inline long __syscall0(long n)
 {
-	register long r7 __asm__("r7") = n;
+	register long r7 __ASM____R7__ = n;
 	register long r0 __asm__("r0");
-	__asm_syscall("r"(r7));
+	__asm_syscall(R7_OPERAND);
 }
 
 static inline long __syscall1(long n, long a)
 {
-	register long r7 __asm__("r7") = n;
+	register long r7 __ASM____R7__ = n;
 	register long r0 __asm__("r0") = a;
-	__asm_syscall("r"(r7), "0"(r0));
+	__asm_syscall(R7_OPERAND, "0"(r0));
 }
 
 static inline long __syscall2(long n, long a, long b)
 {
-	register long r7 __asm__("r7") = n;
+	register long r7 __ASM____R7__ = n;
 	register long r0 __asm__("r0") = a;
 	register long r1 __asm__("r1") = b;
-	__asm_syscall("r"(r7), "0"(r0), "r"(r1));
+	__asm_syscall(R7_OPERAND, "0"(r0), "r"(r1));
 }
 
 static inline long __syscall3(long n, long a, long b, long c)
 {
-	register long r7 __asm__("r7") = n;
+	register long r7 __ASM____R7__ = n;
 	register long r0 __asm__("r0") = a;
 	register long r1 __asm__("r1") = b;
 	register long r2 __asm__("r2") = c;
-	__asm_syscall("r"(r7), "0"(r0), "r"(r1), "r"(r2));
+	__asm_syscall(R7_OPERAND, "0"(r0), "r"(r1), "r"(r2));
 }
 
 static inline long __syscall4(long n, long a, long b, long c, long d)
 {
-	register long r7 __asm__("r7") = n;
+	register long r7 __ASM____R7__ = n;
 	register long r0 __asm__("r0") = a;
 	register long r1 __asm__("r1") = b;
 	register long r2 __asm__("r2") = c;
 	register long r3 __asm__("r3") = d;
-	__asm_syscall("r"(r7), "0"(r0), "r"(r1), "r"(r2), "r"(r3));
+	__asm_syscall(R7_OPERAND, "0"(r0), "r"(r1), "r"(r2), "r"(r3));
 }
 
 static inline long __syscall5(long n, long a, long b, long c, long d, long e)
 {
-	register long r7 __asm__("r7") = n;
+	register long r7 __ASM____R7__ = n;
 	register long r0 __asm__("r0") = a;
 	register long r1 __asm__("r1") = b;
 	register long r2 __asm__("r2") = c;
 	register long r3 __asm__("r3") = d;
 	register long r4 __asm__("r4") = e;
-	__asm_syscall("r"(r7), "0"(r0), "r"(r1), "r"(r2), "r"(r3), "r"(r4));
+	__asm_syscall(R7_OPERAND, "0"(r0), "r"(r1), "r"(r2), "r"(r3), "r"(r4));
 }
 
 static inline long __syscall6(long n, long a, long b, long c, long d, long e, long f)
 {
-	register long r7 __asm__("r7") = n;
+	register long r7 __ASM____R7__ = n;
 	register long r0 __asm__("r0") = a;
 	register long r1 __asm__("r1") = b;
 	register long r2 __asm__("r2") = c;
 	register long r3 __asm__("r3") = d;
 	register long r4 __asm__("r4") = e;
 	register long r5 __asm__("r5") = f;
-	__asm_syscall("r"(r7), "0"(r0), "r"(r1), "r"(r2), "r"(r3), "r"(r4), "r"(r5));
+	__asm_syscall(R7_OPERAND, "0"(r0), "r"(r1), "r"(r2), "r"(r3), "r"(r4), "r"(r5));
 }
 
 #define VDSO_USEFUL

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-05-01  2:35       ` Patrick Oppenlander
@ 2018-05-01 21:03         ` Rich Felker
  2018-05-01 22:14           ` Patrick Oppenlander
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2018-05-01 21:03 UTC (permalink / raw)
  To: Patrick Oppenlander; +Cc: musl

On Tue, May 01, 2018 at 12:35:58PM +1000, Patrick Oppenlander wrote:
> On Tue, May 1, 2018 at 1:35 AM, Rich Felker <dalias@libc.org> wrote:
> > On Mon, Apr 30, 2018 at 01:55:16PM +1000, Patrick Oppenlander wrote:
> >> I was talking about the case of a uniprocessor system running a multi
> >> theaded process.
> >>
> >> In that case the "spin" part of spinlock just burns time & electrons.
> >> The "lock" part obviously can't be omitted. Calling straight through
> >> to the kernel is the most efficient thing to do.
> >
> > I see. Is this an issue you've actually hit? I don't see any obvious
> > way to make this decision at runtime that doesn't incur unwanted costs
> > or failure modes, and I suspect we're spinning way too many times
> > anyway even for SMP (i.e. the ideal solution might just be
> > significantly reducing the # of spins).
> 
> I haven't measured the performance impact of it.
> 
> One option could be to configure the number of spins at compile time
> and set to zero for known uniprocessor architectures (like armv7m). Or
> have a configure override. Really this is just performance tuning,
> there's no danger of generating incorrect code.
> 
> I can't find a way of detecting a SMP kernel other than parsing the
> result of uname(2) which sucks. I was hoping there might be something
> in auxv hwcap.

There doesn't seem to be any good way. Unless you find this is a real
performance bottleneck for you, I'd like to punt on it for now and
come back when someone has time to do real research on number of spins
that make sense and whether the number is low enough not to care for
non-SMP.

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-05-01  0:10   ` Patrick Oppenlander
  2018-05-01 14:19     ` Szabolcs Nagy
@ 2018-05-01 21:05     ` Rich Felker
  1 sibling, 0 replies; 21+ messages in thread
From: Rich Felker @ 2018-05-01 21:05 UTC (permalink / raw)
  To: Patrick Oppenlander; +Cc: musl

On Tue, May 01, 2018 at 10:10:45AM +1000, Patrick Oppenlander wrote:
> On Mon, Apr 30, 2018 at 1:16 PM, Rich Felker <dalias@libc.org> wrote:
> > On Mon, Apr 30, 2018 at 12:52:06PM +1000, Patrick Oppenlander wrote:
> >> - sigisemptyset uses bss. Could be implemented in a similar fashion to
> >> sigemptyset, save bss & would probably be faster.
> >
> > Ahh, yes. I wonder if gcc has any way to force const zero objects into
> > rodata rather than bss..? In any case memcmp is not an efficient way
> > to implement this, so maybe it should be changed.
> 
> Doing this:
> 
> diff --git a/src/signal/sigisemptyset.c b/src/signal/sigisemptyset.c
> index 312c66cf..28f74203 100644
> --- a/src/signal/sigisemptyset.c
> +++ b/src/signal/sigisemptyset.c
> @@ -4,6 +4,6 @@
> 
>  int sigisemptyset(const sigset_t *set)
>  {
> -       static const unsigned long zeroset[_NSIG/8/sizeof(long)];
> +       static const unsigned long zeroset[_NSIG/8/sizeof(long)] = {};
>         return !memcmp(set, &zeroset, _NSIG/8);
>  }
> 
> Makes it go into rodata.
> 
> I agree that it should probably be rewritten to not use memcmp.

I'm just changing it to avoid memcmp. This gets rid of the bss and
makes the code smaller.

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-05-01 17:35           ` Rich Felker
@ 2018-05-01 21:49             ` Andre McCurdy
  2018-05-01 22:14               ` Szabolcs Nagy
  0 siblings, 1 reply; 21+ messages in thread
From: Andre McCurdy @ 2018-05-01 21:49 UTC (permalink / raw)
  To: musl; +Cc: Patrick Oppenlander

On Tue, May 1, 2018 at 10:35 AM, Rich Felker <dalias@libc.org> wrote:
> On Tue, May 01, 2018 at 11:52:33AM -0400, Rich Felker wrote:
>> On Tue, May 01, 2018 at 12:34:13PM +1000, Patrick Oppenlander wrote:
>> > On Tue, May 1, 2018 at 1:31 AM, Rich Felker <dalias@libc.org> wrote:
>> > > On Mon, Apr 30, 2018 at 03:29:39PM +1000, Patrick Oppenlander wrote:
>> > >> Actually, my biggest issue with getcwd is that it allocates a PATH_MAX
>> > >> sized buffer on the stack. That's painful on deeply embedded stuff.
>> > >
>> > > That's unrelated, and could/should be fixed by the attached patch I
>> > > think.
>> >
>> > Unfortunately that fails to build on arm with:
>> >
>> > src/unistd/getcwd.c: In function 'getcwd':
>> > src/unistd/getcwd.c:25:1: error: r7 cannot be used in asm here
>>
>> Then that's a bug we need to fix or work around elsewhere.
>> Non-arch-specific source files can't be constrained not to use
>> perfectly valid C constructs because gcc breaks on them for particular
>> archs.
>>
>> I believe it's related to the thumb+framepointer issue that was raised
>> a while back, but I forget how that ended and if we ever solved it.
>>
>> > I was also having a go at resolving the stack & the buffer size issue
>> > and came up with the attached (untested) patch.
>> >
>> > Patrick
>>
>> > diff --git a/src/unistd/getcwd.c b/src/unistd/getcwd.c
>> > index 103fbbb5..306dbc4f 100644
>> > --- a/src/unistd/getcwd.c
>> > +++ b/src/unistd/getcwd.c
>> > @@ -3,17 +3,10 @@
>> >  #include <limits.h>
>> >  #include <string.h>
>> >  #include "syscall.h"
>> > +#include "libc.h"
>> >
>> > -char *getcwd(char *buf, size_t size)
>> > +static char *do_getcwd(char *buf, size_t size)
>> >  {
>> > -   char tmp[PATH_MAX];
>> > -   if (!buf) {
>> > -           buf = tmp;
>> > -           size = PATH_MAX;
>> > -   } else if (!size) {
>> > -           errno = EINVAL;
>> > -           return 0;
>> > -   }
>> >     long ret = syscall(SYS_getcwd, buf, size);
>> >     if (ret < 0)
>> >             return 0;
>> > @@ -21,5 +14,37 @@ char *getcwd(char *buf, size_t size)
>> >             errno = ENOENT;
>> >             return 0;
>> >     }
>> > -   return buf == tmp ? strdup(buf) : buf;
>> > +   return buf;
>> > +}
>> > +
>> > +static char *getcwd_glibc(size_t size)
>> > +{
>> > +   char tmp[PATH_MAX];
>> > +   if (!do_getcwd(tmp, sizeof tmp))
>> > +           return 0;
>> > +   size_t len = strlen(tmp) + 1;
>> > +   if (!size)
>> > +           size = len;
>> > +   else if (size < len) {
>> > +           errno = ERANGE;
>> > +           return 0;
>> > +   }
>> > +   char *buf = malloc(size);
>> > +   if (!buf) {
>> > +           errno = ENOMEM;
>> > +           return 0;
>> > +   }
>> > +   memcpy(buf, tmp, len);
>> > +   return buf;
>> > +}
>> > +
>> > +char *getcwd(char *buf, size_t size)
>> > +{
>> > +   if (!buf)
>> > +           return getcwd_glibc(size);
>> > +   if (!size) {
>> > +           errno = EINVAL;
>> > +           return 0;
>> > +   }
>> > +   return do_getcwd(buf, size);
>> >  }
>>
>> This isn't acceptable. It makes the code much larger (at the source
>> level) and harder to read, and the only reason it works is failure of
>> gcc to optimize heavily. It could just as easily still end up using
>> the full PATH_MAX space on the stack, if gcc inlines and hoists stuff,
>> or if gcc wanted to be really awful it could still end up using a
>> frame pointer.
>>
>> Let's look back at the framepointer mess and see if there's a way to
>> get gcc not to break. If not we may need to skip inline syscalls and
>> call out to the extern __syscall when building for thumb, but I'd
>> really rather not have to do that.
>
> Looking back, it seems where we left it is just that you need to make
> sure frame pointer is disabled if building as thumb. But that's not
> reliable because gcc forcibly re-enables frame pointer (including
> frame pointer ABI constraints, which it doesn't need to) if you use a
> VLA or alloca.
>
> I'm considering applying the attached patch, which would make it so
> VLAs don't break thumb syscalls and eliminate the need to force frame
> pointer off when building as thumb. This is all a workaround for gcc
> being wrong about not letting you use r7, but it seems reasonable and
> non-invasive. It just omits r7 from the constraints and uses a temp
> register to save/restore it.

This seems to fail when compiling src/thread/arm/__set_thread_area.c:

  {standard input}: Assembler messages:
  {standard input}:45: Error: invalid constant (f0005) after fixup
  make: *** [obj/src/thread/arm/__set_thread_area.o] Error 1

Without the patch, __set_thread_area() effectively compiles to:

__set_thread_area:
    push    {r7, lr}
    ldr    r7, .L2
    pop    {r7, pc}
.L2:
    .word    983045

With the patch:

__set_thread_area:
    push    {r7, lr}
    add    r7, sp, #0
    mov r3,r7 ; mov r7,#983045 ; svc 0 ; mov r7,r3
    pop    {r7, pc}

ie the immediate value 0xf0005 can't be loaded directly into r7 with a
single Thumb2 mov instruction.

I tried a quick test to replace the single mov instruction in
__asm_syscall() with a movw + movt pair:

#define __asm_syscall(...) do { \
    __asm__ __volatile__ ( "mov %1,r7 ; movw r7,%2 & 0xffff ; movt
r7,%2 >> 16 ; svc 0 ; mov r7,%1" \
    : "=r"(r0), "=&r"((int){0}) : __VA_ARGS__ : "memory"); \
    return r0; \
    } while (0)

It fixes __set_thread_area() but causes a new error in syscall() as
the syscall number is passed to __asm_syscall() as a variable rather
than an immediate:

  {standard input}: Assembler messages:
  {standard input}:75: Error: constant expression expected -- `movw
r7,r6&0xffff'
  {standard input}:75: Error: constant expression expected -- `movt r7,r6>>16'
  make: *** [obj/src/misc/syscall.o] Error 1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-05-01 21:03         ` Rich Felker
@ 2018-05-01 22:14           ` Patrick Oppenlander
  0 siblings, 0 replies; 21+ messages in thread
From: Patrick Oppenlander @ 2018-05-01 22:14 UTC (permalink / raw)
  Cc: musl

On Wed, May 2, 2018 at 7:03 AM, Rich Felker <dalias@libc.org> wrote:
> On Tue, May 01, 2018 at 12:35:58PM +1000, Patrick Oppenlander wrote:
>> On Tue, May 1, 2018 at 1:35 AM, Rich Felker <dalias@libc.org> wrote:
>> > On Mon, Apr 30, 2018 at 01:55:16PM +1000, Patrick Oppenlander wrote:
>> >> I was talking about the case of a uniprocessor system running a multi
>> >> theaded process.
>> >>
>> >> In that case the "spin" part of spinlock just burns time & electrons.
>> >> The "lock" part obviously can't be omitted. Calling straight through
>> >> to the kernel is the most efficient thing to do.
>> >
>> > I see. Is this an issue you've actually hit? I don't see any obvious
>> > way to make this decision at runtime that doesn't incur unwanted costs
>> > or failure modes, and I suspect we're spinning way too many times
>> > anyway even for SMP (i.e. the ideal solution might just be
>> > significantly reducing the # of spins).
>>
>> I haven't measured the performance impact of it.
>>
>> One option could be to configure the number of spins at compile time
>> and set to zero for known uniprocessor architectures (like armv7m). Or
>> have a configure override. Really this is just performance tuning,
>> there's no danger of generating incorrect code.
>>
>> I can't find a way of detecting a SMP kernel other than parsing the
>> result of uname(2) which sucks. I was hoping there might be something
>> in auxv hwcap.
>
> There doesn't seem to be any good way. Unless you find this is a real
> performance bottleneck for you, I'd like to punt on it for now and
> come back when someone has time to do real research on number of spins
> that make sense and whether the number is low enough not to care for
> non-SMP.

No problem.

Patrick


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-05-01 21:49             ` Andre McCurdy
@ 2018-05-01 22:14               ` Szabolcs Nagy
  2018-05-02 13:42                 ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Szabolcs Nagy @ 2018-05-01 22:14 UTC (permalink / raw)
  To: musl; +Cc: Patrick Oppenlander

* Andre McCurdy <armccurdy@gmail.com> [2018-05-01 14:49:00 -0700]:
> On Tue, May 1, 2018 at 10:35 AM, Rich Felker <dalias@libc.org> wrote:
> > I'm considering applying the attached patch, which would make it so
> > VLAs don't break thumb syscalls and eliminate the need to force frame
> > pointer off when building as thumb. This is all a workaround for gcc
> > being wrong about not letting you use r7, but it seems reasonable and
> > non-invasive. It just omits r7 from the constraints and uses a temp
> > register to save/restore it.
> 
> This seems to fail when compiling src/thread/arm/__set_thread_area.c:
> 
>   {standard input}: Assembler messages:
>   {standard input}:45: Error: invalid constant (f0005) after fixup
>   make: *** [obj/src/thread/arm/__set_thread_area.o] Error 1
> 
> Without the patch, __set_thread_area() effectively compiles to:
> 
> __set_thread_area:
>     push    {r7, lr}
>     ldr    r7, .L2
>     pop    {r7, pc}
> .L2:
>     .word    983045
> 
> With the patch:
> 
> __set_thread_area:
>     push    {r7, lr}
>     add    r7, sp, #0
>     mov r3,r7 ; mov r7,#983045 ; svc 0 ; mov r7,r3
>     pop    {r7, pc}
> 
> ie the immediate value 0xf0005 can't be loaded directly into r7 with a
> single Thumb2 mov instruction.
> 
> I tried a quick test to replace the single mov instruction in
> __asm_syscall() with a movw + movt pair:
> 

i think the syscall can be just inline asm here,
since __set_thread_area is arm specific code.

in generic code the mov r7,.. hack should work
and fixes the vla issue.

(alternatively using just "r" operand instead
of "rI" does not generate immediate, but will
use more registers/instructions)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Some questions
  2018-05-01 22:14               ` Szabolcs Nagy
@ 2018-05-02 13:42                 ` Rich Felker
  0 siblings, 0 replies; 21+ messages in thread
From: Rich Felker @ 2018-05-02 13:42 UTC (permalink / raw)
  To: musl; +Cc: Patrick Oppenlander

On Wed, May 02, 2018 at 12:14:41AM +0200, Szabolcs Nagy wrote:
> * Andre McCurdy <armccurdy@gmail.com> [2018-05-01 14:49:00 -0700]:
> > On Tue, May 1, 2018 at 10:35 AM, Rich Felker <dalias@libc.org> wrote:
> > > I'm considering applying the attached patch, which would make it so
> > > VLAs don't break thumb syscalls and eliminate the need to force frame
> > > pointer off when building as thumb. This is all a workaround for gcc
> > > being wrong about not letting you use r7, but it seems reasonable and
> > > non-invasive. It just omits r7 from the constraints and uses a temp
> > > register to save/restore it.
> > 
> > This seems to fail when compiling src/thread/arm/__set_thread_area.c:
> > 
> >   {standard input}: Assembler messages:
> >   {standard input}:45: Error: invalid constant (f0005) after fixup
> >   make: *** [obj/src/thread/arm/__set_thread_area.o] Error 1
> > 
> > Without the patch, __set_thread_area() effectively compiles to:
> > 
> > __set_thread_area:
> >     push    {r7, lr}
> >     ldr    r7, .L2
> >     pop    {r7, pc}
> > .L2:
> >     .word    983045
> > 
> > With the patch:
> > 
> > __set_thread_area:
> >     push    {r7, lr}
> >     add    r7, sp, #0
> >     mov r3,r7 ; mov r7,#983045 ; svc 0 ; mov r7,r3
> >     pop    {r7, pc}
> > 
> > ie the immediate value 0xf0005 can't be loaded directly into r7 with a
> > single Thumb2 mov instruction.
> > 
> > I tried a quick test to replace the single mov instruction in
> > __asm_syscall() with a movw + movt pair:
> > 
> 
> i think the syscall can be just inline asm here,
> since __set_thread_area is arm specific code.
> 
> in generic code the mov r7,.. hack should work
> and fixes the vla issue.
> 
> (alternatively using just "r" operand instead
> of "rI" does not generate immediate, but will
> use more registers/instructions)

The above posted patch used "ri" not "rI". The "I" constraint, which I
switched to after posting the patch, only allows immediates that work
with mov, so the issue does not arise.

Rich


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-05-02 13:42 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-30  2:52 Some questions Patrick Oppenlander
2018-04-30  3:16 ` Rich Felker
2018-04-30  3:55   ` Patrick Oppenlander
2018-04-30 15:35     ` Rich Felker
2018-05-01  2:35       ` Patrick Oppenlander
2018-05-01 21:03         ` Rich Felker
2018-05-01 22:14           ` Patrick Oppenlander
2018-04-30  5:17   ` Patrick Oppenlander
2018-04-30 15:29     ` Rich Felker
2018-05-01  2:32       ` Patrick Oppenlander
2018-04-30  5:29   ` Patrick Oppenlander
2018-04-30 15:31     ` Rich Felker
2018-05-01  2:34       ` Patrick Oppenlander
2018-05-01 15:52         ` Rich Felker
2018-05-01 17:35           ` Rich Felker
2018-05-01 21:49             ` Andre McCurdy
2018-05-01 22:14               ` Szabolcs Nagy
2018-05-02 13:42                 ` Rich Felker
2018-05-01  0:10   ` Patrick Oppenlander
2018-05-01 14:19     ` Szabolcs Nagy
2018-05-01 21:05     ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).