From: Rich Felker
Subject: Re: Deduplicating atomics written in terms of CAS
Date: Sun, 17 May 2015 13:59:18 -0400
To: musl@lists.openwall.com
Message-ID: <20150517175918.GQ17573@brightrain.aerifal.cx>
In-Reply-To: <1431881993.4219.1.camel@inria.fr>

On Sun, May 17, 2015 at 06:59:53PM +0200, Jens Gustedt wrote:
> On Sunday, 17.05.2015, at 12:28 -0400, Rich Felker wrote:
> > On Sun, May 17, 2015 at 09:37:19AM +0200, Jens Gustedt wrote:
> > > On Sunday, 17.05.2015, at 02:14 -0400, Rich Felker wrote:
> > > > - a_and_64/a_or_64 (malloc only; these are misnamed too)
> > >
> > > I should have checked the use before my last mail. They are
> > > definitely misnamed.
> > >
> > > Both uses of them look OK as far as atomicity is concerned; only
> > > one of the a_and or a_or calls triggers.
> > >
> > > The only object (mal.binmap) to which this is applied is in fact
> > > volatile, so it must actually be reloaded every time it is used.
> > >
> > > But in line 352 the code then relies on another assumption: that
> > > 64-bit loads are always atomic. I don't see why this should hold
> > > in general.
> >
> > I don't think there's such an assumption. The only assumption is
> > that each bit is read exactly the number of times it would be on
> > the abstract machine, so that we can't observe inconsistent values
> > for the same object. Lack of any heavy synchronization around
> > reading the mask may result in failure to see some changes or
> > seeing them out of order, but it doesn't matter: if a bin is
> > wrongly seen as non-empty, locking and attempting to unbin from it
> > will fail. If it is wrongly seen as empty, the worst that can
> > happen is that a less-optimal (but optimal an instant earlier)
> > larger chunk gets split instead of a smaller one being used to
> > satisfy the allocation.
>
> So to summarize what you are saying: in this special context, an
> out-of-sync load of one of the sub-words of a 64-bit word would only
> impact performance, not correctness. Nice.

Right. Technically it may also impact fragmentation-optimality, but
only in a way that would also be affected by timing/scheduling
differences, so I don't think it makes sense to insist on an
optimality condition there.
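To illustrate the pattern (a sketch of the idea, not necessarily the
exact code in musl): the 64-bit op just dispatches to the existing
32-bit atomic on whichever half of the mask actually has bits set, and
readers of binmap are allowed to see the two halves from different
moments in time:

#include <stdint.h>

/* Stand-in for the existing 32-bit a_or primitive; the real one is
 * per-arch. A compiler builtin is used here only to keep the sketch
 * self-contained. */
static void a_or(volatile int *p, int v)
{
        __sync_fetch_and_or(p, v);
}

/* Sketch of the 64-bit "or": operate on each 32-bit half separately.
 * The union yields the halves of v in memory order, matching the
 * memory order of the halves of *p, so this works on either
 * endianness. For the binmap, v always has exactly one bit set, so
 * only one of the two calls actually triggers. */
static void or_64(volatile uint64_t *p, uint64_t v)
{
        union { uint64_t v; uint32_t r[2]; } u = { v };
        if (u.r[0]) a_or((volatile int *)p, u.r[0]);
        if (u.r[1]) a_or((volatile int *)p + 1, u.r[1]);
}

A reader that loads the whole 64-bit mask on a 32-bit arch may thus
combine halves read at two different times, but per the above, a
spurious 0 bit only costs splitting a larger chunk, and a spurious 1
bit is caught when the locked unbin attempt fails.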
> Maybe a stupid question, then: why do atomics at all here? You could
> perhaps remove all that 64-bit pseudo-atomic stuff then.

We absolutely would not want two concurrent attempts to set a bit to
clobber each other's results, so that one of them would _never_ be
seen. This would result in free memory being lost -- the bin would
appear to be empty until something was added to it again. That's the
motivation for the atomics.

> > Of course it's an open question whether the complex atomics and
> > fine-grained locking in malloc help or hurt performance more on
> > average. I'd really like to measure this at some point. Overhauling
> > malloc to try to get significantly better multi-threaded
> > performance without the fragmentation-optimality sacrifices other
> > mallocs make is a long-term goal I have open.
> >
> > > We already have a similar assumption for 32-bit int all over the
> > > place, and I am not too happy with such a "silent" assumption.
> > > For 64 bit, this assumption looks wrong to me.
> >
> > I agree I wouldn't be happy with such an assumption, but I don't
> > think it's being made here.
> >
> > > I would be much happier using explicit atomic types and atomic
> > > load functions or macros everywhere. For normal builds these
> > > could be dummy types made to resolve to the actual code we have
> > > now. But this would allow hardening builds that check for
> > > consistency of all atomic accesses.
> >
> > There is no way to do an atomic 64-bit load on most of the archs
> > we support, so trying to make it explicit wouldn't help.
>
> Ah sorry, I probably went too fast. My last paragraph was meant for
> all atomic operations, so in particular 32-bit ones. A macro "a_load"
> would make intentions clearer and would perhaps allow implementing an
> optional compile-time check that each object is used consistently as
> atomic or not.

The reason I'm mildly against this is that all current reads of
atomics, except via the return value of a_cas or a_fetch_add, are
relaxed-order. We don't care if we see a stale value; if staleness
could be a problem, the caller takes care of that in an efficient way.
Having an a_load that's relaxed-order while all the existing atomics
are seq_cst order would be an inconsistent API design. Adding
a_load_relaxed or something would be an option, but I'm still not
really a fan.

Compilers always aim to perform volatile operations as a single
load/store when possible (this matters for the MMIO-type uses volatile
is intended for: some hardware treats four byte-sized writes
differently from one word-sized write, and of course the read
direction matters when reading MMIO back from hardware), so there's no
reason to expect the loads to break. I think it was more of an issue
before we moved to using volatile to model atomics.

Rich
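P.S. For concreteness, "written in terms of CAS" means roughly the
following shape. This is a sketch, not the exact per-arch code, and
a_load_relaxed at the end is purely hypothetical -- nothing by that
name exists today:

/* Stand-in for the per-arch a_cas primitive: atomically replace *p
 * with s if it still equals t, and return the value actually found.
 * A compiler builtin is used here only to keep the sketch
 * self-contained. */
static int a_cas(volatile int *p, int t, int s)
{
        return __sync_val_compare_and_swap(p, t, s);
}

/* An "or" built on CAS: if another thread modifies *p between the
 * load and the CAS, the CAS reports the newer value and we retry, so
 * concurrent bit-sets cannot wipe out each other's bits -- which is
 * exactly the property the binmap updates need. */
static void cas_or(volatile int *p, int v)
{
        int old;
        do old = *p;
        while (a_cas(p, old, old|v) != old);
}

/* Purely hypothetical relaxed load, for comparison: just a volatile
 * read, which compilers already perform as a single load for a
 * word-sized object. */
#define a_load_relaxed(p) (*(volatile int *)(p))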