mailing list of musl libc
* What's left for 1.1.11 release?
@ 2015-07-28  3:40 Rich Felker
  2015-07-28 14:09 ` Jens Gustedt
  0 siblings, 1 reply; 11+ messages in thread
From: Rich Felker @ 2015-07-28  3:40 UTC (permalink / raw)
  To: musl

This release cycle has gotten way behind schedule and I'd like to wrap
it up in the next few days. The CFI generation patch is the last
actual feature/roadmap item I still want to get committed, but I
believe there may be some important bugs to try to fix first. In
particular:

- Deadlocks in malloc due to a_store lacking an acquire barrier on x86.
- Unbounded VSZ growth under free contention.

In principle the a_store issue affects all libc-internal __lock/LOCK
uses, and stdio locks too, but it's only been observed in malloc.
Since there don't seem to be any performance-relevant uses of a_store
that don't actually need the proper barrier, I think we have to just
put an explicit barrier (lock orl $0,(%esp) or mfence) after the store
and live with the loss of performance. Our x86 a_barrier is also
"wrong" for the same reasons as a_store, but I don't think any of its
callers actually want the full strength of a barrier, just some (much
weaker) ordering guarantees. This should be revisited after release to
assess what properties the callers actually want.
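
For concreteness, a rough sketch (not a committed patch) of what such a
change might look like for the i386 a_store, with the plain store
followed by a locked no-op on the stack; the _fixed name is only for
illustration:

/* Current shape: a plain store plus a compiler barrier, which the CPU
 * may reorder ahead of a later load (e.g. the waiter-count read). */
static inline void a_store(volatile int *p, int x)
{
    __asm__ __volatile__( "movl %1, %0" : "=m"(*p) : "r"(x) : "memory" );
}

/* Proposed shape: follow the store with a serializing locked RMW on a
 * cache line we own anyway (the stack); mfence would also work. */
static inline void a_store_fixed(volatile int *p, int x)
{
    __asm__ __volatile__(
        "movl %1, %0 ; lock ; orl $0,(%%esp)"
        : "=m"(*p) : "r"(x) : "memory" );
}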

The VSZ growth issue is much harder to address before a release. I
would not be comfortable with pushing the changes needed for a proper
fix without a long testing window before a release, and even then I'm
not eager to write this code. "Big hammer" solutions are of course
possible (e.g. serializing all malloc operations with a big lock) but
undesirable. The best I can probably do is put together an optional
patch which affected users can try until a real fix is available.

I'm also aware of the following open issues with patch discussion
going on, but they're not bugs/regressions affecting existing users,
and I don't see us reaching a resolution within a short timeframe:

- Adding powerpc soft-float.
- ARM asm incompatibility with clang.

Anything else I'm missing in the way of bug reports or pending patches
that need to be addressed?

Rich



* Re: What's left for 1.1.11 release?
  2015-07-28  3:40 What's left for 1.1.11 release? Rich Felker
@ 2015-07-28 14:09 ` Jens Gustedt
  2015-07-28 14:18   ` Rich Felker
  2015-07-28 14:33   ` Alexander Monakov
  0 siblings, 2 replies; 11+ messages in thread
From: Jens Gustedt @ 2015-07-28 14:09 UTC (permalink / raw)
  To: musl


Hello,

Am Montag, den 27.07.2015, 23:40 -0400 schrieb Rich Felker:
> In principle the a_store issue affects all libc-internal __lock/LOCK
> uses,

so this worries me since I assumed that UNLOCK had release consistency
for the __atomic implementation.

> and stdio locks too, but it's only been observed in malloc.
> Since there don't seem to be any performance-relevant uses of a_store
> that don't actually need the proper barrier, I think we have to just
> put an explicit barrier (lock orl $0,(%esp) or mfence) after the store
> and live with the loss of performance.

How about using an xchg instruction? This would perhaps "waste" a
register, but that sort of optimization should not be critical in the
vicinity of code that needs memory synchronization, anyhow.
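
A minimal sketch of that idea (illustrative name, i386-style inline asm
assumed): the store itself becomes a locked read-modify-write, so it is
a full barrier on its own.

static inline void a_store_xchg(volatile int *p, int x)
{
    /* xchg with a memory operand is implicitly locked, so the store
     * doubles as a full barrier; the old value in x is discarded */
    __asm__ __volatile__( "xchgl %1, %0" : "+m"(*p), "+r"(x) : : "memory" );
}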

Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::






* Re: What's left for 1.1.11 release?
  2015-07-28 14:09 ` Jens Gustedt
@ 2015-07-28 14:18   ` Rich Felker
  2015-07-28 14:50     ` Jens Gustedt
  2015-07-28 14:33   ` Alexander Monakov
  1 sibling, 1 reply; 11+ messages in thread
From: Rich Felker @ 2015-07-28 14:18 UTC (permalink / raw)
  To: musl

On Tue, Jul 28, 2015 at 04:09:38PM +0200, Jens Gustedt wrote:
> Hello,
> 
> Am Montag, den 27.07.2015, 23:40 -0400 schrieb Rich Felker:
> > In principle the a_store issue affects all libc-internal __lock/LOCK
> > uses,
> 
> so this worries me since I assumed that UNLOCK had release consistency
> for the __atomic implementation.

It does. The problem is that it lacks acquire consistency, which we
need in order to know whether to wake.

> > and stdio locks too, but it's only been observed in malloc.
> > Since there don't seem to be any performance-relevant uses of a_store
> > that don't actually need the proper barrier, I think we have to just
> > put an explicit barrier (lock orl $0,(%esp) or mfence) after the store
> > and live with the loss of performance.
> 
> How about using an xchg instruction? This would perhaps "waste" a
> register, but that sort of optimization should not be critical in the
> vicinity of code that needs memory synchronization, anyhow.

How is this better? My intent was to avoid incurring a read on the
cache line that's being written and instead achieve the
synchronization by poking at a cache line (the stack) that should not
be shared.

Rich



* Re: What's left for 1.1.11 release?
  2015-07-28 14:09 ` Jens Gustedt
  2015-07-28 14:18   ` Rich Felker
@ 2015-07-28 14:33   ` Alexander Monakov
  2015-07-28 17:31     ` Rich Felker
  1 sibling, 1 reply; 11+ messages in thread
From: Alexander Monakov @ 2015-07-28 14:33 UTC (permalink / raw)
  To: musl

> > and stdio locks too, but it's only been observed in malloc.
> > Since there don't seem to be any performance-relevant uses of a_store
> > that don't actually need the proper barrier, I think we have to just
> > put an explicit barrier (lock orl $0,(%esp) or mfence) after the store
> > and live with the loss of performance.
> 
> How about using an xchg instruction? This would perhaps "waste" a
> register, but that sort of optimization should not be critical in the
> vicinity of code that needs memory synchronization, anyhow.

xchg is what compilers use in lieu of mfence, but Rich's preference for 'lock
orl' on the top of the stack stems from the idea that locking on the store
destination is not desired here (you might not even have the corresponding
line in the cache), so it might be better to have the store land in the store
buffers, and do a serializing 'lock orl' on the cache line you have anyhow.

Alexander



* Re: What's left for 1.1.11 release?
  2015-07-28 14:18   ` Rich Felker
@ 2015-07-28 14:50     ` Jens Gustedt
  2015-07-28 14:58       ` Rich Felker
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Gustedt @ 2015-07-28 14:50 UTC (permalink / raw)
  To: musl


Am Dienstag, den 28.07.2015, 10:18 -0400 schrieb Rich Felker:
> On Tue, Jul 28, 2015 at 04:09:38PM +0200, Jens Gustedt wrote:
> > Hello,
> > 
> > Am Montag, den 27.07.2015, 23:40 -0400 schrieb Rich Felker:
> > > In principle the a_store issue affects all libc-internal __lock/LOCK
> > > uses,
> > 
> > so this worries me since I assumed that UNLOCK had release consistency
> > for the __atomic implementation.
> 
> It does. The problem is that it lacks acquire consistency, which we
> need in order to know whether to wake.

ah, I think we are speaking of different things here. I want release
consistency for the lock operation, in the sense of being guaranteed that
all threads that are waiting for the lock will eventually know that it
has been released. So you are telling me that the current version
doesn't guarantee this?

The operation for which you need acquire consistency is in fact the
load of l[1]. Somehow the current approach is ambiguous as to which is
the atomic object. Is it l[0], is it l[1], or is it the pair of them?

> > > and stdio locks too, but it's only been observed in malloc.
> > > Since there don't seem to be any performance-relevant uses of a_store
> > > that don't actually need the proper barrier, I think we have to just
> > > put an explicit barrier (lock orl $0,(%esp) or mfence) after the store
> > > and live with the loss of performance.
> > 
> > How about using an xchg instruction? This would perhaps "waste" a
> > register, but that sort of optimization should not be critical in the
> > vicinity of code that needs memory synchronization, anyhow.
> 
> How is this better? My intent was to avoid incurring a read on the
> cache line that's being written and instead achieve the
> synchronization by poking at a cache line (the stack) that should not
> be shared.

In fact, I think you need a read on the cache line, here, don't you?
You want to know the real value of l[1], no?

To be safe, I think this needs a full cmpxchg on the pair (l[0],
l[1]), otherwise you can't know if the waiter count l[1] corresponds
to the value just before the release of the lock.


Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::






* Re: What's left for 1.1.11 release?
  2015-07-28 14:50     ` Jens Gustedt
@ 2015-07-28 14:58       ` Rich Felker
  2015-07-28 15:15         ` Jens Gustedt
  0 siblings, 1 reply; 11+ messages in thread
From: Rich Felker @ 2015-07-28 14:58 UTC (permalink / raw)
  To: musl

On Tue, Jul 28, 2015 at 04:50:33PM +0200, Jens Gustedt wrote:
> Am Dienstag, den 28.07.2015, 10:18 -0400 schrieb Rich Felker:
> > On Tue, Jul 28, 2015 at 04:09:38PM +0200, Jens Gustedt wrote:
> > > Hello,
> > > 
> > > Am Montag, den 27.07.2015, 23:40 -0400 schrieb Rich Felker:
> > > > In principle the a_store issue affects all libc-internal __lock/LOCK
> > > > uses,
> > > 
> > > so this worries me since I assumed that UNLOCK had release consistency
> > > for the __atomic implementation.
> > 
> > It does. The problem is that it lacks acquire consistency, which we
> > need in order to know whether to wake.
> 
> ah, I think we are speaking of different things here. I want release
> consistency for the lock operation, in the sense of being guaranteed that
> all threads that are waiting for the lock will eventually know that it
> has been released. So you are telling me that the current version
> doesn't guarantee this?

This is no problem; you get it for free on x86, and it's properly
achieved with explicit barriers on all other archs.

> The operation for which you need acquire consistency is in fact the
> load of l[1]. Somehow the current approach is ambiguous as to which is
> the atomic object. Is it l[0], is it l[1], or is it the pair of them?

l[0] is the lock word. l[1] is the waiters count and while it's
modified atomically, the read is relaxed-order. Contrary to my
expectations, real-world x86 chips will actually reorder the read of
l[1] before the store to l[0], resulting in a failure-to-wake
deadlock.
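
For reference, a simplified sketch of the two-word lock under
discussion (l[0] is the lock word, l[1] the waiter count); the helper
names and futex wrappers are illustrative, and the atomics are assumed
to come from musl's internal atomic.h. The hazard is the l[1] load in
unlock moving above the a_store:

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include "atomic.h"   /* musl-internal a_swap, a_inc, a_dec, a_store */

static void futex_wait(volatile int *addr, int val)
{
    syscall(SYS_futex, addr, FUTEX_WAIT, val, 0, 0, 0);
}
static void futex_wake(volatile int *addr, int cnt)
{
    syscall(SYS_futex, addr, FUTEX_WAKE, cnt, 0, 0, 0);
}

static void lock(volatile int l[2])
{
    while (a_swap(&l[0], 1)) {   /* lock already held */
        a_inc(&l[1]);            /* register as a waiter */
        futex_wait(&l[0], 1);    /* sleep while it stays locked */
        a_dec(&l[1]);
    }
}

static void unlock(volatile int l[2])
{
    a_store(&l[0], 0);           /* release the lock */
    if (l[1])                    /* must not be hoisted above the store */
        futex_wake(&l[0], 1);    /* wake one waiter */
}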

> > > > and stdio locks too, but it's only been observed in malloc.
> > > > Since there don't seem to be any performance-relevant uses of a_store
> > > > that don't actually need the proper barrier, I think we have to just
> > > > put an explicit barrier (lock orl $0,(%esp) or mfence) after the store
> > > > and live with the loss of performance.
> > > 
> > > How about using an xchg instruction? This would perhaps "waste" a
> > > register, but that sort of optimization should not be critical in the
> > > vicinity of code that needs memory synchronization, anyhow.
> > 
> > How is this better? My intent was to avoid incurring a read on the
> > cache line that's being written and instead achieve the
> > synchronization by poking at a cache line (the stack) that should not
> > be shared.
> 
> In fact, I think you need a read on the cache line, here, don't you?
> You want to know the real value of l[1], no?

These specific callers do need l[1], but that's specific to the
callers, not fundamental to a_store. Also in principle l[1] need not
even be in the same cache line (ideally, it wouldn't be, but it's
likely to be anyway) since the alignment of l[] is just 32-bit.

> To be safe, I think this needs a full cmpxchg on the pair (l[0],
> l[1]), otherwise you can't know if the waiter count l[1] corresponds
> to the value just before the release of the lock.

No. As long as a_store behaves as a full barrier (including acquire
behavior) as it was intended to, you cannot read a value of l[1] older
than the value it had at the time of a_store, because there's a
globally consistent order between the a_inc and a_store.

Rich



* Re: What's left for 1.1.11 release?
  2015-07-28 14:58       ` Rich Felker
@ 2015-07-28 15:15         ` Jens Gustedt
  2015-07-28 16:07           ` Rich Felker
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Gustedt @ 2015-07-28 15:15 UTC (permalink / raw)
  To: musl


Am Dienstag, den 28.07.2015, 10:58 -0400 schrieb Rich Felker:
> On Tue, Jul 28, 2015 at 04:50:33PM +0200, Jens Gustedt wrote:
> > Am Dienstag, den 28.07.2015, 10:18 -0400 schrieb Rich Felker:
> > > On Tue, Jul 28, 2015 at 04:09:38PM +0200, Jens Gustedt wrote:
> > > > Hello,
> > > > 
> > > > Am Montag, den 27.07.2015, 23:40 -0400 schrieb Rich Felker:
> > > > > In principle the a_store issue affects all libc-internal __lock/LOCK
> > > > > uses,
> > > > 
> > > > so this worries me since I assumed that UNLOCK had release consistency
> > > > for the __atomic implementation.
> > > 
> > > It does. The problem is that it lacks acquire consistency, which we
> > > need in order to know whether to wake.
> > 
> > ah, I think we are speaking of different things here. I want release
> > consistency for the lock operation, in the sense of being guaranteed that
> > all threads that are waiting for the lock will eventually know that it
> > has been released. So you are telling me that the current version
> > doesn't guarantee this?
> 
> This is no problem; you get it for free on x86, and it's properly
> achieved with explicit barriers on all other archs.
> 
> > The operation for which you need acquire consistency is in fact the
> > load of l[1]. Somehow the current approach is ambiguous as to which is
> > the atomic object. Is it l[0], is it l[1], or is it the pair of them?
> 
> l[0] is the lock word. l[1] is the waiters count and while it's
> modified atomically, the read is relaxed-order. Contrary to my
> expectations, real-world x86 chips will actually reorder the read of
> l[1] before the store to l[0], resulting in a failure-to-wake
> deadlock.

ok, I understand the arch issue now.

But then again, it seems that this failure-to-wake deadlock would be
relevant to my stdatomic implementation.

> > > > > and stdio locks too, but it's only been observed in malloc.
> > > > > Since there don't seem to be any performance-relevant uses of a_store
> > > > > that don't actually need the proper barrier, I think we have to just
> > > > > put an explicit barrier (lock orl $0,(%esp) or mfence) after the store
> > > > > and live with the loss of performance.
> > > > 
> > > > How about using an xchg instruction? This would perhaps "waste" a
> > > > register, but that sort of optimization should not be critical in the
> > > > vicinity of code that needs memory synchronization, anyhow.
> > > 
> > > How is this better? My intent was to avoid incurring a read on the
> > > cache line that's being written and instead achieve the
> > > synchronization by poking at a cache line (the stack) that should not
> > > be shared.
> > 
> > In fact, I think you need a read on the cache line, here, don't you?
> > You want to know the real value of l[1], no?
> 
> These specific callers do need l[1], but that's specific to the
> callers, not fundamental to a_store. Also in principle l[1] need not
> even be in the same cache line (ideally, it wouldn't be, but it's
> likely to be anyway) since the alignment of l[] is just 32-bit.
> 
> > To be safe, I think this needs a full cmpxchg on the pair (l[0],
> > l[1]), otherwise you can't know if the waiter count l[1] corresponds
> > to the value just before the release of the lock.
> 
> No. As long as a_store behaves as a full barrier (including acquire
> behavior) as it was intended to, you cannot read a value of l[1] older
> than the value it had at the time of a_store, because there's a
> globally consistent order between the a_inc and a_store.

Yes, you are talking of the intended behavior, which you said isn't
achieved. I was talking of possible scenarios to resolve that problem.

 - One possibility, that you talked about previously, is to introduce
   an additional fence after the store to l[0] and so the read of l[1]
   would be guaranteed to be no older than that.

 - The other possibility is that you force the two values to be
   exchanged in one atomic operation, since you have to do one full
   read and one full write, anyhow. Here again you'd have several
   possibilities. One would be to ensure that the atomic operation is
   a cmpxchg on the pair.

   Another possibility would be to squeeze the lock bit and the wait
   counter into a single int, and operate on the bit with some
   fetch_and and fetch_or operations. But that would probably be much
   more of a code change.
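
A minimal sketch of that last idea, using C11 atomics for clarity (bit
0 is the lock bit, the remaining bits count waiters; the names are
illustrative and the futex calls are elided):

#include <stdatomic.h>

static void lock1(atomic_int *l)
{
    /* set the lock bit; if it was already set, register as a waiter
     * (add 2), sleep, deregister, and retry */
    while (atomic_fetch_or(l, 1) & 1) {
        atomic_fetch_add(l, 2);
        /* futex wait on *l would go here, expecting the observed value */
        atomic_fetch_sub(l, 2);
    }
}

static void unlock1(atomic_int *l)
{
    /* clearing the bit and reading the waiter count is one atomic RMW,
     * so there is no separate load that could be reordered */
    if (atomic_fetch_and(l, ~1) & ~1) {
        /* futex wake of one waiter would go here */
    }
}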

Jens


-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::






* Re: What's left for 1.1.11 release?
  2015-07-28 15:15         ` Jens Gustedt
@ 2015-07-28 16:07           ` Rich Felker
  2015-07-28 16:42             ` Jens Gustedt
  0 siblings, 1 reply; 11+ messages in thread
From: Rich Felker @ 2015-07-28 16:07 UTC (permalink / raw)
  To: musl

On Tue, Jul 28, 2015 at 05:15:29PM +0200, Jens Gustedt wrote:
> > > The operation for which you need acquire consistency is in fact the
> > > load of l[1]. Somehow the current approach is ambiguous as to which is
> > > the atomic object. Is it l[0], is it l[1], or is it the pair of them?
> > 
> > l[0] is the lock word. l[1] is the waiters count and while it's
> > modified atomically, the read is relaxed-order. Contrary to my
> > expectations, real-world x86 chips will actually reorder the read of
> > l[1] before the store to l[0], resulting in a failure-to-wake
> > deadlock.
> 
> ok, I understand the arch issue now.
> 
> But then again, it seems that this failure-to-wake deadlock would be
> relevant to my stdatomic implementation.

Yes, I think it would.

> > > To be safe, I think this needs a full cmpxchg on the pair (l[0],
> > > l[1]), otherwise you can't know if the waiter count l[1] corresponds
> > > to the value just before the release of the lock.
> > 
> > No. As long as a_store behaves as a full barrier (including acquire
> > behavior) as it was intended to, you cannot read a value of l[1] older
> > than the value it had at the time of a_store, because there's a
> > globally consistent order between the a_inc and a_store.
> 
> Yes, you are talking of the intended behavior, which you said isn't
> achieved. I was talking of possible scenarios to resolve that problem.
> 
>  - One possibility, that you talked about previously, is to introduce
>    an additional fence after the store to l[0] and so the read of l[1]
>    would be guaranteed to be no older than that.

Right.

>  - The other possibility is that you force the two values to be
>    exchanged in one atomic operation, since you have to do one full
>    read and one full write, anyhow. Here again you'd have several
>    possibilities. One would be to ensure that the atomic operation is
>    a cmpxchg on the pair.

The vast majority of archs do not have double-word CAS, so I don't
want to depend on it. In any case, I don't see how it would be any
faster than doing a swap/cas type operation on the lock word followed
by a read of the waiter count, which in turn might be slower than the
solution I proposed.

>    Another possibility would be to squeeze the lock bit and the wait
>    counter into a single int, and operate on the bit with some
>    fetch_and and fetch_or operations. But that would probably be much
>    more of a code change.

Yes, this is the "new normal-type mutex" thread which I think you
already saw and commented on. I suspect it's the preferred approach in
the long term, but I don't like using redesigns as bug fixes unless
there's something fundamentally wrong with the old design that makes
it impossible to fix. In this case there is no such issue. The lock
design is perfectly valid; x86 a_store is just buggy.

Rich



* Re: What's left for 1.1.11 release?
  2015-07-28 16:07           ` Rich Felker
@ 2015-07-28 16:42             ` Jens Gustedt
  2015-07-28 17:33               ` Rich Felker
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Gustedt @ 2015-07-28 16:42 UTC (permalink / raw)
  To: musl


Am Dienstag, den 28.07.2015, 12:07 -0400 schrieb Rich Felker:
> On Tue, Jul 28, 2015 at 05:15:29PM +0200, Jens Gustedt wrote:
> >    Another possibility would be to squeeze the lock bit and the wait
> >    counter into a single int, and operate on the bit with some
> >    fetch_and and fetch_or operations. But that would probably be much
> >    more of a code change.
> 
> Yes, this is the "new normal-type mutex" thread which I think you
> already saw and commented on. I suspect it's the preferred approach in
> the long term, but I don't like using redesigns as bug fixes

agreed

> unless there's something fundamentally wrong with the old design that makes
> it impossible to fix. In this case there is no such issue. The lock
> design is perfectly valid; x86 a_store is just buggy.

Assuming just release consistency from a store operation is a
reasonable thing to do, and having such a relaxed operation in the set
makes sense. So somehow the error is more in the assumptions about the
operation than in the operation itself.

Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::






* Re: What's left for 1.1.11 release?
  2015-07-28 14:33   ` Alexander Monakov
@ 2015-07-28 17:31     ` Rich Felker
  0 siblings, 0 replies; 11+ messages in thread
From: Rich Felker @ 2015-07-28 17:31 UTC (permalink / raw)
  To: musl

On Tue, Jul 28, 2015 at 05:33:18PM +0300, Alexander Monakov wrote:
> > > and stdio locks too, but it's only been observed in malloc.
> > > Since there don't seem to be any performance-relevant uses of a_store
> > > that don't actually need the proper barrier, I think we have to just
> > > put an explicit barrier (lock orl $0,(%esp) or mfence) after the store
> > > and live with the loss of performance.
> > 
> > How about using an xchg instruction? This would perhaps "waste" a
> > register, but that sort of optimization should not be critical in the
> > vicinity of code that needs memory synchronization, anyhow.
> 
> xchg is what compilers use in lieu of mfence, but Rich's preference for 'lock
> orl' on the top of the stack stems from the idea that locking on the store
> destination is not desired here (you might not even have the corresponding
> line in the cache), so it might be better to have the store land in the store
> buffers, and do a serializing 'lock orl' on the cache line you have anyhow.

I did a quick run of my old malloc stress test with both approaches.
The outputs are not sufficiently stable to gather a lot, but on my
machine, there seems to be no loss in performance with the stack
approach and a 1-5% loss from using xchg to do the store. I'd like to
have a better measurement to confirm this, but being that my
measurements so far agree with the theoretical prediction, I think
I'll just go with the stack approach for now.

Rich



* Re: What's left for 1.1.11 release?
  2015-07-28 16:42             ` Jens Gustedt
@ 2015-07-28 17:33               ` Rich Felker
  0 siblings, 0 replies; 11+ messages in thread
From: Rich Felker @ 2015-07-28 17:33 UTC (permalink / raw)
  To: musl

On Tue, Jul 28, 2015 at 06:42:59PM +0200, Jens Gustedt wrote:
> Am Dienstag, den 28.07.2015, 12:07 -0400 schrieb Rich Felker:
> > On Tue, Jul 28, 2015 at 05:15:29PM +0200, Jens Gustedt wrote:
> > >    Another possibility would be to squeeze the lock bit and the wait
> > >    counter into a single int, and operate on the bit with some
> > >    fetch_and and fetch_or operations. But that would probably be much
> > >    more of a code change.
> > 
> > Yes, this is the "new normal-type mutex" thread which I think you
> > already saw and commented on. I suspect it's the preferred approach in
> > the long term, but I don't like using redesigns as bug fixes
> 
> agreed
> 
> > unless there's something fundamentally wrong with the old design that makes
> > it impossible to fix. In this case there is no such issue. The lock
> > design is perfectly valid; x86 a_store is just buggy.
> 
> Assuming just release consistency from a store operation is a
> reasonable thing to do, and having such a relaxed operation in the set
> makes sense. So somehow the error is more in the assumptions about the
> operation than in the operation itself.

Well, all of musl's a_* operations are intended to be seq_cst order. So
versus that intent, it really is a bug in a_store. Note that other
archs' versions of a_store perform barriers both before and after the
store to achieve seq_cst.
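
As a sketch of what that shape looks like in generic terms (pseudo-port,
not any specific arch's asm; a_barrier is musl's full-barrier primitive
and is assumed visible here):

static inline void a_store_seqcst(volatile int *p, int x)
{
    a_barrier();   /* order earlier accesses before the store */
    *p = x;
    a_barrier();   /* order the store before later loads, e.g. the l[1] read */
}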

Rich



end of thread, other threads:[~2015-07-28 17:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
2015-07-28  3:40 What's left for 1.1.11 release? Rich Felker
2015-07-28 14:09 ` Jens Gustedt
2015-07-28 14:18   ` Rich Felker
2015-07-28 14:50     ` Jens Gustedt
2015-07-28 14:58       ` Rich Felker
2015-07-28 15:15         ` Jens Gustedt
2015-07-28 16:07           ` Rich Felker
2015-07-28 16:42             ` Jens Gustedt
2015-07-28 17:33               ` Rich Felker
2015-07-28 14:33   ` Alexander Monakov
2015-07-28 17:31     ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).