From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Locale change ideas
Date: Tue, 26 May 2015 23:53:45 -0400
Message-ID: <20150527035345.GJ17573@brightrain.aerifal.cx>
In-Reply-To: <20150527031208.GA5255@brightrain.aerifal.cx>
To: musl@lists.openwall.com

On Tue, May 26, 2015 at 11:12:08PM -0400, Rich Felker wrote:
> Just some ideas I want to put in writing and put out for discussion:
>
> Implementing a static all-C locale_t object (and, if we make C locale
> byte-based, also an all-C.UTF-8 one) returned by newlocale when
> possible.
> This would ensure that uses of newlocale "C" for robustness
> against LC_NUMERIC radix points and such would be fail-safe and
> high-performance. One caveat: freelocale needs to be aware of these
> static locale objects so it doesn't try to free them, and newlocale
> also needs to call calloc rather than modifying in place when using
> them as a base.

Trying to work out how to do this, I ran into some interesting
things...

The naive way to do the above is just to check for the C locale name
(and its aliases) and return the static object in that case. But that
misses a lot of chances for optimization when the C locale is only
selected implicitly, because "" is used and the env vars are not set.
The main case where this matters is when someone does something like:

    new = newlocale(LC_CTYPE_MASK, "C", (locale_t)0);

In principle, this need not be the (static) C locale, since categories
other than LC_CTYPE will be initialized with the default locale.
However, in the common case where the locale env vars are not set,
this yields an all-C locale.

An alternate approach is to first create the new locale_t object on
the stack, then check whether it's equal to the static C locale, and
only allocate storage and copy it if it's not. This is what I'll
probably do, but I noticed some issues that should be resolved first.

My first thought was that creating a temporary locale first, then
copying it, would double the atomic overhead, since __setlocalecat
performs an atomic operation for each category. Fortunately, it turns
out those atomics are entirely unnecessary. Conceptually, locale
objects are immutable for their lifetimes. Even though newlocale can
modify an existing locale object, what it's formally doing is ending
the lifetime of the old one and creating a new one. Thus there is no
legal way to modify a non-global locale object while other threads may
be using it. So we can do away with the atomics for non-global locales.
We can also get rid of atomics for the global locale simply by having
setlocale hold a lock while modifying it. Since the categories might
still be read concurrently without holding the lock, though, they need
to be volatile. But rather than keeping them volatile like they are
now:

    struct __locale_map *volatile cat[4];

let's just make the whole global_locale object volatile:

    volatile struct __locale_struct global_locale;

Since __pthread_self()->locale might point to global_locale, it now
needs to be a pointer-to-volatile, but it's still nice to make
__locale_struct itself free of volatile members so we can memcpy it.

I still think we need to consider 'consume' semantics for threads
accessing global_locale.cat[n]->... without synchronization, but
that's orthogonal to the above changes, which I should be able to get
started on.

Rich