mailing list of musl libc
* Locale change ideas
@ 2015-05-27  3:12 Rich Felker
  2015-05-27  3:53 ` Rich Felker
  2015-05-27 20:37 ` Rich Felker
  0 siblings, 2 replies; 3+ messages in thread
From: Rich Felker @ 2015-05-27  3:12 UTC
  To: musl

Just some ideas I want to put in writing and put out for discussion:

Implementing a static all-C locale_t object (and, if we make C locale
byte-based, also an all-C.UTF-8 one) returned by newlocale when
possible. This would ensure that uses of newlocale "C" for robustness
against LC_NUMERIC radix points and such would be fail-safe and
high-performance. One caveat: freelocale needs to be aware of these
static locale objects so it doesn't try to free them, and newlocale
also needs to call calloc rather than modifying-in-place when using
them as a base.
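
A minimal sketch of that caveat, using hypothetical stand-in types and
function names (struct loc, loc_free, loc_set) rather than the real
internals. It only shows the two special cases: never freeing the
static object, and allocating instead of modifying it in place.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the internal types; not the real definitions. */
struct map { const char *name; };
struct loc { const struct map *cat[4]; };

/* One shared, never-freed object for the all-C locale.
 * In this sketch a null category pointer means "C". */
static struct loc c_locale_obj;
#define C_LOCALE (&c_locale_obj)

/* Plays the role of freelocale: the static object must not reach free(). */
static void loc_free(struct loc *l)
{
	if (l != C_LOCALE) free(l);
}

/* Plays the role of newlocale with a base: set category i of base to m.
 * The static object cannot be modified in place, so a fresh zeroed
 * object is allocated when it is used as the base. */
static struct loc *loc_set(struct loc *base, int i, const struct map *m)
{
	if (base == C_LOCALE) {
		if (!m) return base;            /* result is still all-C */
		base = calloc(1, sizeof *base);
		if (!base) return 0;
	}
	base->cat[i] = m;
	return base;
}
```

With this shape, a caller that only ever asks for "C" never allocates
and cannot hit an out-of-memory failure.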

Eliminating the messages_name field from the locale struct. I don't
like the way it's subject to data races. Even if changing locale out
from under other threads is not meaningful, it should not have data
races this bad. My thought is to intern a __locale_map object whenever
a new name is selected for LC_MESSAGES, even if the name does not map
to a locale file (now we intern only existent locales). When the
locale file does not exist, the map would just contain an empty/NOP mo
image and the requested name. Then (1) the object would not be
accessed until its permanent name is in place, and (2) switching
messages language names would inherit atomic semantics from the
handling of all other locale categories.
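
Roughly, the interning could look like the sketch below. The struct
layout, the 24-byte name buffer, and the names intern_name and
find_locale_file are all assumptions made for illustration; the loader
is a stub that behaves as if no locale file exists, and the lock a real
implementation would hold is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for __locale_map; the field names are assumptions. */
struct map {
	const void *image;  /* mo image, or null for the empty/NOP case */
	size_t size;
	char name[24];
	struct map *next;
};

/* List of interned maps; a real implementation would guard it with a lock. */
static struct map *interned;

/* Hypothetical loader stub: behaves as if no locale file exists. */
static const void *find_locale_file(const char *name, size_t *size)
{
	(void)name;
	*size = 0;
	return 0;
}

/* Return the unique map for name, creating one even if no file backs it. */
static const struct map *intern_name(const char *name)
{
	size_t n = strlen(name);
	struct map *p;

	for (p = interned; p; p = p->next)
		if (!strcmp(p->name, name)) return p;
	if (n >= sizeof p->name) return 0;

	p = calloc(1, sizeof *p);
	if (!p) return 0;
	p->image = find_locale_file(name, &p->size);
	memcpy(p->name, name, n + 1);
	/* Link the object only after its name and image are final. */
	p->next = interned;
	interned = p;
	return p;
}
```

Since the name lives inside the interned object and never changes
afterwards, selecting a new messages language reduces to storing one
pointer, the same as for the other categories.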

Handling non-synchronized reads of locale categories. The cat[]
pointers are updated atomically, but there's no barrier on the reading
side. This is okay and totally correct from an "I don't want
atrociously bad performance to support broken code" standpoint, but I
think we should take some effort to make sure it's safe. I don't mind
if a thread that hasn't synchronized with the changes to the locale
object doesn't see the latest settings, but I am worried about what
happens when it reads the __locale_map objects that the cat[]
pointers point to. Is there a way to guarantee that if they see p,
they also see any stores to *p sequenced before the update of p by a
barrier? This is related to the nightmare that is consume-order, I
think.
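
To make the question concrete, here is the pattern written with C11
<stdatomic.h>. That header is used only to name the ordering in
question and is an assumption for illustration; the names cat_slot,
publish and current_name are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for __locale_map. */
struct map { const char *name; };

/* Stand-in for one cat[] slot. */
static _Atomic(const struct map *) cat_slot;

/* Writer: *m is fully written before this call; the release store
 * orders those writes before the pointer becomes visible. */
static void publish(const struct map *m)
{
	atomic_store_explicit(&cat_slot, m, memory_order_release);
}

/* Reader: a consume load asks only that accesses which depend on the
 * loaded pointer (here m->name) see the writes made before publish(). */
static const char *current_name(void)
{
	const struct map *m =
		atomic_load_explicit(&cat_slot, memory_order_consume);
	return m ? m->name : "C";
}
```

Compilers commonly treat memory_order_consume as memory_order_acquire,
which puts a cost on the read side for some archs; whether a plain load
is enough is exactly the open question here.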

Rich



* Re: Locale change ideas
  2015-05-27  3:12 Locale change ideas Rich Felker
@ 2015-05-27  3:53 ` Rich Felker
  2015-05-27 20:37 ` Rich Felker
  1 sibling, 0 replies; 3+ messages in thread
From: Rich Felker @ 2015-05-27  3:53 UTC
  To: musl

On Tue, May 26, 2015 at 11:12:08PM -0400, Rich Felker wrote:
> Just some ideas I want to put in writing and put out for discussion:
> 
> Implementing a static all-C locale_t object (and, if we make C locale
> byte-based, also an all-C.UTF-8 one) returned by newlocale when
> possible. This would ensure that uses of newlocale "C" for robustness
> against LC_NUMERIC radix points and such would be fail-safe and
> high-performance. One caveat: freelocale needs to be aware of these
> static locale objects so it doesn't try to free them, and newlocale
> also needs to call calloc rather than modifying-in-place when using
> them as a base.

Trying to work out how to do this, I ran into some interesting
things...

The naive way to do the above is just to check for the C locale name
(and its aliases) and return the static object in that case. But that
misses a lot of chances for optimization when the C locale is only
selected implicitly because "" is used and env vars are not set. The
most likely case for this is when someone does something like:

	new = newlocale(LC_CTYPE_MASK, "C", (locale_t)0);

In principle, this need not be the (static) C locale since categories
other than LC_CTYPE will be initialized with the default locale.
However, in the common case where locale vars are not set, this would
yield an all-C locale.

An alternate approach is to first create the new locale_t object on
the stack, then check if it's equal to the static C locale, and only
allocate storage and copy it if it's not equal. This is what I'll
probably do, but I noticed issues that should be resolved first.
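
A sketch of that approach, with hypothetical stand-in types and a
hypothetical helper loc_finish that takes the temporary object after
all categories have been filled in:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the internal types; not the real definitions. */
struct map { const char *name; };
struct loc { const struct map *cat[4]; };

/* The static all-C locale; in this sketch every category is null. */
static const struct loc c_locale_obj;

/* tmp was built on the caller's stack. Return the static object when
 * tmp turned out to be all-C; otherwise allocate storage and copy. */
static const struct loc *loc_finish(const struct loc *tmp)
{
	struct loc *l;

	if (!memcmp(tmp, &c_locale_obj, sizeof *tmp))
		return &c_locale_obj;
	l = malloc(sizeof *l);
	if (l) *l = *tmp;
	return l;
}
```

The comparison happens after the environment has been consulted, so an
implicitly selected C locale is caught the same way as an explicit one.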

My first thought was that first creating a temp locale, then copying,
would have twice the atomic overhead, since __setlocalecat performs an
atomic operation for each category. Fortunately, it turns out that's
entirely unnecessary.

Conceptually, locale objects are immutable for their lifetimes. Even
though newlocale can modify an existing locale object, what it's
formally doing is ending the lifetime of the old one and creating a
new one. Thus there is no legal way to modify a non-global locale
object while other threads may be using it. So we can do away with the
atomics for non-global locales.

We can also get rid of atomics for the global locale simply by having
setlocale use a lock while modifying it. Since the categories might be
read concurrently without holding a lock, though, they need to be
volatile. But rather than keeping them volatile like they are now:

	struct __locale_map *volatile cat[4];

let's just make the whole global_locale object volatile:

	volatile struct __locale_struct global_locale;

Since __pthread_self()->locale might point to global_locale, it needs
to be a pointer-to-volatile now, but it's still nice to make
__locale_struct itself free of volatile members so we can memcpy it.
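
Something along these lines, as a sketch only: the types are
hypothetical stand-ins, and a pthread mutex is assumed here in place of
whatever lock setlocale would really use.

```c
#include <pthread.h>

/* Hypothetical stand-ins; note that struct loc has no volatile members. */
struct map { const char *name; };
struct loc { const struct map *cat[4]; };

/* Only the global object is volatile. */
static volatile struct loc global_loc;
static pthread_mutex_t loc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer side, in the role of setlocale: plain stores under a lock. */
static void set_global_cat(int i, const struct map *m)
{
	pthread_mutex_lock(&loc_lock);
	global_loc.cat[i] = m;
	pthread_mutex_unlock(&loc_lock);
}

/* Reader side: takes a pointer-to-volatile so the same code serves
 * both the global locale and ordinary non-volatile locale objects. */
static const char *cat_name(const volatile struct loc *l, int i)
{
	const struct map *m = l->cat[i];
	return m ? m->name : "C";
}
```

The volatile qualifier here only forces the compiler to perform each
read; it says nothing about ordering between CPUs.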

I still think we need to consider the 'consume' semantics for threads
accessing global_locale.cat[n]->... without synchronization, but
that's orthogonal to the above changes which I should be able to get
started on.

Rich



* Re: Locale change ideas
  2015-05-27  3:12 Locale change ideas Rich Felker
  2015-05-27  3:53 ` Rich Felker
@ 2015-05-27 20:37 ` Rich Felker
  1 sibling, 0 replies; 3+ messages in thread
From: Rich Felker @ 2015-05-27 20:37 UTC
  To: musl

On Tue, May 26, 2015 at 11:12:08PM -0400, Rich Felker wrote:
> Just some ideas I want to put in writing and put out for discussion:
> 
> Implementing a static all-C locale_t object (and, if we make C locale
> byte-based, also an all-C.UTF-8 one) returned by newlocale when
> possible. This would ensure that uses of newlocale "C" for robustness

Done; see commit aeeac9ca5490d7d90fe061ab72da446c01ddf746.

> Eliminating the messages_name field from the locale struct. I don't

Done; see commit 61a3364d246e72b903da8b76c2e27a225a51351e. This also
made handling of LC_CTYPE and LC_NUMERIC uniform with the rest of the
categories rather than special-cased.

> Handling non-synchronized reads of locale categories. The cat[]

This still needs to be reviewed. There is a barrier (due to UNLOCK)
between the writes to a new __locale_map object (which is immutable
once it's written) and storing that object's address anywhere. It's
not clear to me whether any further synchronization is necessary. It's
not needed for archs with strong memory order, and I'm fairly sure it
should not be needed for archs with dependency ordering, but I'm not
sure if this covers everything. The case to worry about is where the
new object gets allocated in memory that happens to already be cached
(because it was used then freed) by another thread, and whether that
thread might read stale data after obtaining/reading the pointer to it.
Anyone know if this is a real possibility, and if there are good
mitigations without putting a barrier at the point of read? (This is
surely related to the consume-order stuff everybody is so fond of
discussing and arguing about these days... :)

Rich

