Re: Deduplicating atomics written in terms of CAS

mailing list of musl libc
 help / color / mirror / code / Atom feed

From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: Deduplicating atomics written in terms of CAS
Date: Sun, 17 May 2015 13:59:18 -0400	[thread overview]
Message-ID: <20150517175918.GQ17573@brightrain.aerifal.cx> (raw)
In-Reply-To: <1431881993.4219.1.camel@inria.fr>

On Sun, May 17, 2015 at 06:59:53PM +0200, Jens Gustedt wrote:
> Am Sonntag, den 17.05.2015, 12:28 -0400 schrieb Rich Felker:
> > On Sun, May 17, 2015 at 09:37:19AM +0200, Jens Gustedt wrote:
> > > Am Sonntag, den 17.05.2015, 02:14 -0400 schrieb Rich Felker:
> > > > - a_and_64/a_or_64 (malloc only; these are misnamed too)
> > > 
> > > I should have checked the use before my last mail. They are
> > > definitively misnamed.
> > > 
> > > Both uses of them look ok concerning atomicity, only one of the a_and
> > > or a_or calls triggers.
> > > 
> > > The only object (mal.binmap) to which this is applied is in fact
> > > volatile, so it must actually be reloaded all the time it is used.
> > > 
> > > But in line 352 the code uses another assumption, then, that 64 bit
> > > loads always are atomic. I don't see why this should hold in general.
> > 
> > I don't think there's such an assumption. The only assumption is that
> > each bit is read exactly the number of times it would be on the
> > abstract machine, so that we can't observe inconsistent values for the
> > same object. Lack of any heavy synchronization around reading the mask
> > may result in failure to see some changes or seeing them out of order,
> > but it doesn't matter: If a bin is wrongly seen as non-empty, locking
> > and attempting to unbin from it will fail. If it is wrongly seen as
> > empty, the worst that can happen is a less-optimal (but would have
> > been optimal an instant earlier) larger chunk gets split instead of
> > using a smaller one to satisfy the allocation.
> 
> So to summarize what you are saying that in this special context, an
> out-of-sync load of one of the sub-words of a 64 bit word, would only
> impact on performance and not correctness. Nice.

Right. Technically it may also impact fragmentation-optimality, but
only in a way that would also be affected by timing/scheduling
differences, so I don't think it makes sense to insist on an
optimality condition there.

> A maybe stupid question, then: why do atomics at all, here? You could
> perhaps remove all that 64 bit pseudo atomic stuff then.

We absolutely would not want two concurrent attempts to set a bit to
clobber each others result, so that the other result would _never_ be
seen. This would result in free memory being lost -- the bin would
appear to be empty until something was added to it again. That's the
motivation for the atomics.

> > Of course it's an open question whether the complex atomics and
> > fine-grained locking in malloc help or hurt performance more on
> > average. I'd really like to measure this at some point. Overhauling
> > malloc to try to get significantly better multi-threaded performance
> > without the fragmentation-optimality sacrifices other mallocs make is
> > a long-term goal I have open.
> > 
> > > We already have a similar assumption for 32 bit int all over the
> > > place, and I am not too happy with such "silent" assumption. For 64
> > > bit, this assumption looks wrong to me.
> > 
> > I agree I wouldn't be happy with such an assumption, but I don't think
> > it's being made here.
> > 
> > > I would be much happier by using explicit atomic types and atomic load
> > > functions or macros everywhere. For normal builds these could be dummy
> > > types made to resolve to the actual code that we have, now. But this
> > > would allow to have hardening builds, that check for consistency of
> > > all atomic accesses.
> > 
> > There is no way to do an atomic 64-bit load on most of the archs we
> > support. So trying to make it explicit wouldn't help.
> 
> Ah sorry, I probably went too fast. My last paragraph would be for all
> atomic operations, so in particular 32 bit. A macro "a_load" would
> make intentions clearer and would perhaps allow to implement an
> optional compile time check to see if we use any object consistently
> as atomic or not.

The reason I'm mildly against this is that all current reads of
atomics, except via the return value of a_cas or a_fetch_add, are
relaxed-order. We don't care if we see a stale value; if staleness
could be a problem, the caller takes care of that in an efficient way.
Having a_load that's relaxed-order whereas all the existing atomics
are seq_cst order would be an inconsistent API design.

Adding a_load_relaxed or something would be an option, but I'm still
not really a fan. Compilers always aim to do volatile ops as a single
load/store when possible (this matters for some MMIO-type uses that
volatile is intended for; some hardware treats 4 byte writes
differently from one word write, and of course the read direction
matters when reading MMIO back from hardware) so there's no reason to
expect the loads to break. I think it was more of an issue before we
moved to using volatile to model atomics.

Rich

next prev parent reply	other threads:[~2015-05-17 17:59 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-17  4:55 Rich Felker
2015-05-17  6:00 ` Alexander Monakov
2015-05-17  6:14   ` Rich Felker
2015-05-17  7:37     ` Jens Gustedt
2015-05-17 16:28       ` Rich Felker
2015-05-17 16:59         ` Jens Gustedt
2015-05-17 17:59           ` Rich Felker [this message]
2015-05-17 22:23             ` Jens Gustedt
2015-05-17 22:33               ` Rich Felker
2015-05-17 23:22                 ` Jens Gustedt
2015-05-18 10:19               ` Szabolcs Nagy
2015-05-18 11:03                 ` Jens Gustedt
2015-05-17  6:49 ` Jens Gustedt
2015-05-17 16:22   ` Rich Felker
2015-05-17 17:19     ` Jens Gustedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150517175918.GQ17573@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).