mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Eliminating preference for avoiding thread pointer? Cost on MIPS?
Date: Fri, 15 May 2015 23:55:44 -0400
Message-ID: <20150516035544.GA4274@brightrain.aerifal.cx>

[-- Attachment #1: Type: text/plain, Size: 3203 bytes --]

Traditionally, musl has gone to pretty great lengths to avoid
depending on the thread pointer. The original reason was that it was
not always initialized, and when it was, the init was lazy. This
resulted in a lot of cruft, where we would have lots of constructs of
the form:

	bar = some_predicate ? __pthread_self()->foo : global_foo

or similar. Because these predicates depend(ed) on globals, they
were/are rather expensive in position-independent code on most archs.
Now that the thread pointer is always initialized at startup (since
1.1.0) and assumed to have succeeded (since 1.1.9; musl now performs
HCF if it fails), this seems to be an unnecessary cost. Not only does
it cost cycles; it also has a complexity cost in terms of code to
maintain the state of the predicates (e.g. the atomics for locale
state) and in terms of libc-internal assumptions. So I'd like to just
use the thread pointer directly wherever it makes sense, and take
advantage of the fact that we have it.

Unfortunately, there's one arch where thread-pointer access may be
prohibitively costly: old MIPS. On the MIPS o32 ABI, the thread
pointer is accessed via the "rdhwr $3,$29" instruction, which was only
introduced in MIPS32rev2. MIPS-I, MIPS-II, and possibly the original
MIPS32 lack it, and while Linux has a "fast path" trap to emulate it,
I'm not clear on how "fast" it is.
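
For reference, here is a minimal sketch of how the instruction word for
"rdhwr $3,$29" is put together, assuming the standard MIPS32 SPECIAL3
encoding (opcode 0x1f, function code 0x3b). The helper name
encode_rdhwr is made up for illustration; it is not part of musl or of
any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a musl or binutils function): build the
 * 32-bit word for "rdhwr rt,rd", assuming the SPECIAL3 layout
 *   opcode[31:26]=0x1f  rs[25:21]=0  rt[20:16]  rd[15:11]
 *   sa[10:6]=0          funct[5:0]=0x3b
 */
static uint32_t encode_rdhwr(unsigned rt, unsigned rd)
{
	assert(rt < 32 && rd < 32);
	return (0x1fu << 26) | ((uint32_t)rt << 16)
	     | ((uint32_t)rd << 11) | 0x3bu;
}
```

With rt=3 (the destination, $3) and rd=29 (the hardware register that
holds the thread pointer), encode_rdhwr(3, 29) gives 0x7c03e83b, which
can be emitted with a .word directive when the assembler will not
accept the mnemonic for the selected -march.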

First, I'd like to find out how slow this trap is. If it's something
like 150 cycles, that's ugly but probably acceptable. If it's more
like 1000 cycles, that's a big problem. If anyone can run the attached
test program on real MIPS-I or MIPS-II hardware and give me the
results, please do! Compile it once with -O3 -DDO_RDHWR and once with
just -O3 and send the (one-line) output of both to the list. It
doesn't matter what libc your MIPS system is using -- any should be
fine, but you might need to link with -lrt on glibc or uclibc.
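
To turn the two timings into a per-instruction cost, the arithmetic can
be sketched as below. The helper name rdhwr_cycles and the cpu_mhz
parameter are assumptions for illustration: the test program does not
know the clock rate, so whoever runs it has to supply that number.

```c
#include <assert.h>

/* Hypothetical helper: estimate the cost of one rdhwr in CPU cycles
 * from the elapsed times printed by the two builds of the test.
 * cpu_mhz is supplied by the person running the test (an assumption,
 * not something the program measures). */
static double rdhwr_cycles(double secs_rdhwr, double secs_base,
                           double iters, double cpu_mhz)
{
	assert(iters > 0 && cpu_mhz > 0);
	/* extra time per loop iteration attributable to rdhwr, in ns */
	double ns_per_iter = (secs_rdhwr - secs_base) * 1e9 / iters;
	/* cycles = ns * (cycles per ns) = ns * MHz / 1000 */
	return ns_per_iter * cpu_mhz / 1000.0;
}
```

As a purely made-up example (not a measurement): 0.755 s with
-DDO_RDHWR, 0.005 s without, 1000000 iterations, and a 200 MHz CPU
work out to 750 ns per iteration, or about 150 cycles per rdhwr.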

Now, depending on the results, we have 2 options:

1. If rdhwr emulation on old MIPS is not horribly slow, just do the
   unconditional thread-pointer usage with no MIPS-specific changes.

2. If introducing rdhwr all over the place on old MIPS would be a
   serious performance regression, we take advantage of the fact that
   we're not using compiler-generated TLS access (which would emit
   rdhwr instructions) in musl. We control the definition of
   __pthread_self(), which musl uses internally to get the thread
   pointer (adjusted to point to the pthread structure), so when
   compiling code that might run on old MIPS (according to -march
   settings and the resulting predefined macros), we can define
   __pthread_self() to an expression or function that first checks a
   global to see if the process is multi-threaded, and if not, just reads
   the thread pointer from a global instead of using rdhwr. Basically,
   this would be keeping the same way we're doing things now, but
   tucking it away as an old-MIPS-specific hack and encapsulating it
   in __pthread_self() rather than having it in every caller.
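
A rough sketch of what option 2 could look like follows. All names in
it (pthread_sketch, threads_active, main_thread_self,
read_hw_thread_pointer, pthread_self_sketch) are hypothetical and are
not musl's actual internals, and the hardware read is simulated by an
ordinary function so that the sketch runs on any machine.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real pthread structure. */
struct pthread_sketch { int tid; };

/* Hypothetical globals, not musl's real ones. */
static int threads_active;                      /* nonzero once a second thread exists */
static struct pthread_sketch *main_thread_self; /* set once at startup */
static unsigned long hw_tp_reads;               /* counts simulated rdhwr executions */

/* Stand-in for the real thread-pointer read. On MIPS this would be
 * the rdhwr instruction plus the adjustment to the pthread structure;
 * here it only counts calls and returns the saved pointer. */
static struct pthread_sketch *read_hw_thread_pointer(void)
{
	hw_tp_reads++;
	return main_thread_self;
}

/* Old-MIPS variant of the accessor: skip the (possibly trapping)
 * instruction while the process is single-threaded. */
static struct pthread_sketch *pthread_self_sketch(void)
{
	if (!threads_active) {
		assert(main_thread_self != NULL);
		return main_thread_self;
	}
	return read_hw_thread_pointer();
}
```

In a real build this variant would only be selected when the -march
settings allow for pre-rev2 targets; everywhere else the accessor would
read the thread pointer directly, and callers would not need to know
which version they got.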

So I think, whatever the performance results end up being, we have an
acceptable path forward to use the (possibly virtual) thread pointer
unconditionally throughout musl.

Rich

[-- Attachment #2: mips_rdhwr.c --]
[-- Type: text/plain, Size: 559 bytes --]

#include <time.h>
#include <stdio.h>

int main()
{
	struct timespec t0, t;
	unsigned i, x=0;
	clock_gettime(CLOCK_REALTIME, &t0);
	for (i=0; i<1000000; i++) {
		register void *tp __asm__("$3");
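#ifdef DO_RDHWR
		/* rdhwr $3,$29, emitted as a raw instruction word */
		__asm__ __volatile__(".word 0x7c03e83b" : "=r"(tp));
#else
		/* baseline: same loop with a plain register move */
		__asm__ __volatile__("move %0,$0" : "=r"(tp));
#endif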
		x += (unsigned)tp;
	}
	clock_gettime(CLOCK_REALTIME, &t);
	t.tv_sec -= t0.tv_sec;
	if ((t.tv_nsec -= t0.tv_nsec) < 0) {
		t.tv_nsec += 1000000000;
		t.tv_sec--;
	}
	printf("%u %lld.%.9ld\n", x, (long long)t.tv_sec, t.tv_nsec);
}

