On Sunday, 17.05.2015, 12:28 -0400, Rich Felker wrote:
> On Sun, May 17, 2015 at 09:37:19AM +0200, Jens Gustedt wrote:
> > On Sunday, 17.05.2015, 02:14 -0400, Rich Felker wrote:
> > > - a_and_64/a_or_64 (malloc only; these are misnamed too)
> > 
> > I should have checked the use before my last mail. They are
> > definitely misnamed.
> > 
> > Both uses of them look ok concerning atomicity: only one of the a_and
> > or a_or calls triggers.
> > 
> > The only object (mal.binmap) to which this is applied is in fact
> > volatile, so it must actually be reloaded every time it is used.
> > 
> > But line 352 then relies on another assumption: that 64 bit loads
> > are always atomic. I don't see why this should hold in general.
> 
> I don't think there's such an assumption. The only assumption is that
> each bit is read exactly the number of times it would be on the
> abstract machine, so that we can't observe inconsistent values for the
> same object. Lack of any heavy synchronization around reading the mask
> may result in failure to see some changes or seeing them out of order,
> but it doesn't matter: If a bin is wrongly seen as non-empty, locking
> and attempting to unbin from it will fail. If it is wrongly seen as
> empty, the worst that can happen is a less-optimal (but would have
> been optimal an instant earlier) larger chunk gets split instead of
> using a smaller one to satisfy the allocation.

So to summarize: you are saying that in this special context, an
out-of-sync load of one of the sub-words of a 64 bit word would only
impact performance, not correctness. Nice.
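
To check that I understand the argument, here is how I picture it in C.
This is only a rough sketch, not the code in malloc.c: binmap_half,
bin_nonempty, binmap_hint and try_unbin are names I made up, the locking
is only indicated by comments, and the bit clearing would have to be an
atomic operation in real code. The point is that the possibly torn read
serves only as a hint, and the decision is taken on state that is
checked under the lock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for mal.binmap: two volatile 32-bit halves, so
   each half is loaded exactly once per read of the mask. */
static volatile uint32_t binmap_half[2];

/* Hypothetical per-bin state, only examined with the bin lock held. */
static int bin_nonempty[64];

/* Unsynchronized read: the two halves may stem from different moments. */
static uint64_t binmap_hint(void)
{
	uint32_t lo = binmap_half[0];
	uint32_t hi = binmap_half[1];
	return (uint64_t)hi << 32 | lo;
}

/* Returns 1 if a chunk was taken from bin i, 0 if the hint said "empty"
   or turned out to be stale. */
static int try_unbin(int i)
{
	assert(i >= 0 && i < 64);
	if (!(binmap_hint() >> i & 1)) return 0; /* hint only */
	/* the bin lock would be taken here */
	int ok = bin_nonempty[i];                /* authoritative check */
	if (ok) {
		bin_nonempty[i] = 0;
		/* real code would clear the bit atomically */
		binmap_half[i/32] &= ~(UINT32_C(1) << (i%32));
	}
	/* the bin lock would be released here */
	return ok;
}
```

A stale "non-empty" bit makes try_unbin return 0 after the check under
the lock, and a stale "empty" bit just sends the caller on to a larger
bin, so neither case touches correctness.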

A maybe stupid question, then: why use atomics at all here? You could
perhaps remove all that 64 bit pseudo-atomic stuff.

> Of course it's an open question whether the complex atomics and
> fine-grained locking in malloc help or hurt performance more on
> average. I'd really like to measure this at some point. Overhauling
> malloc to try to get significantly better multi-threaded performance
> without the fragmentation-optimality sacrifices other mallocs make is
> a long-term goal I have open.
> 
> > We already have a similar assumption for 32 bit int all over the
> > place, and I am not too happy with such "silent" assumption. For 64
> > bit, this assumption looks wrong to me.
> 
> I agree I wouldn't be happy with such an assumption, but I don't think
> it's being made here.
> 
> > I would be much happier using explicit atomic types and atomic load
> > functions or macros everywhere. For normal builds these could be dummy
> > types made to resolve to the actual code that we have now. But this
> > would allow hardening builds that check for consistency of
> > all atomic accesses.
> 
> There is no way to do an atomic 64-bit load on most of the archs we
> support. So trying to make it explicit wouldn't help.

Ah sorry, I probably went too fast. My last paragraph was meant for all
atomic operations, so in particular 32 bit. A macro "a_load" would
make intentions clearer and would perhaps allow us to implement an
optional compile-time check to see whether each object is used
consistently as atomic or not.
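
Something along the following lines is what I have in mind. Take it as
a sketch under assumptions: HARDEN_ATOMICS, a_int, waiters and the two
helper functions are invented for illustration, and the a_store shown
here is only a placeholder without the barriers that a real store
operation needs.

```c
#include <assert.h>

#ifdef HARDEN_ATOMICS
/* Hardening build: wrap the int in a struct, so that a plain "x = 1"
   or "x + 1" on an atomic object no longer compiles. */
typedef struct { volatile int v; } a_int;
#define a_load(p)      ((p)->v)
#define a_store(p, x)  ((void)((p)->v = (x)))
#else
/* Normal build: resolves to a plain volatile access. */
typedef volatile int a_int;
#define a_load(p)      (*(p))
#define a_store(p, x)  ((void)(*(p) = (x)))
#endif

static a_int waiters; /* hypothetical example object */

static int has_waiters(void)
{
	return a_load(&waiters) != 0;
}

static void set_waiters(int n)
{
	assert(n >= 0);
	a_store(&waiters, n);
}
```

Compiled with -DHARDEN_ATOMICS, any remaining direct access such as
"waiters = 0" is rejected by the compiler, which is the consistency
check I was asking for. Without the flag the macros expand to ordinary
volatile accesses.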

Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::