mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] RE: Maybe A Bug about timer_create and pthread_barrier_wait
@ 2024-07-10 19:11 Tavian Barnes
  2024-07-11 14:56 ` 回复: " Li JinCheng
  0 siblings, 1 reply; 2+ messages in thread
From: Tavian Barnes @ 2024-07-10 19:11 UTC (permalink / raw)
  To: musl; +Cc: 250200715

(I cleaned up the HTML entities in this email, but please use plain text
mode next time)

> Hello:
>
> I had a low-probability crash in the child thread when using the
> timer_create interface. After debug, I found that the crash occured
> when the sub-thread accessed in code "if (b->_b_waiters)" which is a
> stack variable created in the main thread and passed to child thread
> by args. It looks like the main thread's timer_create has finished
> executing at this point, so the variables (start_args) on the stack
> have been cleaned up. I take a look at the pthread_barrier_wait code
> and I think it should be a scheduling problem in pthread_barrier_wait.
>
> Take the timer_create as an example, when the child thread is the
> first thread for "pthread_barrier_wait" and it is suspened after it
> executes the code "a_store(&b->_b_lock, 0)", then the main thread in
> timer_create will arrive as the last thread, it will nerver wait for
> the child thread to be rescheduled, the main thread can pass the
> barrier and continue execution, the args created in timer_create will
> be cleaned up. when the child thread is finally rescheduled, it access
> the "b->_b_waiters" which has already been cleaned up by main thread
> and the crash will occur.
>
> Is there a bug here? Looking forward to your reply.
>
>     /* First thread to enter the barrier becomes the "instance owner" */
>     if (!inst) {
>         struct instance new_inst = { 0 };
>         int spins = 200;
>         b->_b_inst = inst = &new_inst;
>         a_store(&b->_b_lock, 0);
>         if (b->_b_waiters) __wake(&b->_b_lock, 1, 1);  // crash here b->_b_waiters
>         while (spins-- && !inst->finished)
>
>     /* First thread to enter the barrier becomes the "instance owner" */
>     if (!inst) {
>         struct instance new_inst = { 0 };
>         int spins = 200;
>         b->_b_inst = inst = &new_inst;
>         a_store(&b->_b_lock, 0);
>         // when the child thread is the first thread and is scheduled out here
>
>         if (b->_b_waiters) __wake(&b->_b_lock, 1, 1);
>         while (spins-- && !inst->finished)

This looks like a real bug in timer_create() to me.  Here's an untested
fix:

From: Tavian Barnes <tavianator@tavianator.com>
Date: Wed, 10 Jul 2024 14:27:21 -0400
Subject: [PATCH] timer_create: destroy the barrier before returning

pthread_barrier_destroy() waits for all threads to be done using the
barrier before destroying it.  Without it, the storage for the barrier
can be deallocated when timer_create() returns while the other thread is
still using it inside the pthread_barrier_wait() implementation.

Link: https://www.openwall.com/lists/musl/2024/07/08/1
---
 src/time/timer_create.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/time/timer_create.c b/src/time/timer_create.c
index 9216b3ab..42c69848 100644
--- a/src/time/timer_create.c
+++ b/src/time/timer_create.c
@@ -121,6 +121,7 @@ int timer_create(clockid_t clk, struct sigevent *restrict evp, timer_t *restrict
 		}
 		td->timer_id = timerid;
 		pthread_barrier_wait(&args.b);
+		pthread_barrier_destroy(&args.b);
 		if (timerid < 0) return -1;
 		*res = (void *)(INTPTR_MIN | (uintptr_t)td>>1);
 		break;

^ permalink raw reply	[flat|nested] 2+ messages in thread

* 回复: [musl] RE: Maybe A Bug about timer_create and pthread_barrier_wait
  2024-07-10 19:11 [musl] RE: Maybe A Bug about timer_create and pthread_barrier_wait Tavian Barnes
@ 2024-07-11 14:56 ` Li JinCheng
  0 siblings, 0 replies; 2+ messages in thread
From: Li JinCheng @ 2024-07-11 14:56 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 4719 bytes --]

Hi
>pthread_barrier_destroy() waits for all threads to be done using the
>barrier before destroying it.  Without it, the storage for the barrier
>can be deallocated when timer_create() returns while the other thread is
>still using it inside the pthread_barrier_wait() implementation.

Sorry, I have changed to use outlook to send emails. Back to the original topic, I don't think add pthread_barrier_destroy in timer_create can solve this bug. In timer_create, b->_b_limit should be 1, and it will never meet the waiting condition (b->_b_limit < 0) in pthread_barrier_destroy.
And I think add additional synchronization to timer_create  is a temporary method. Maybe some bugfix is needed in pthread_barrier_wait.

int pthread_barrier_destroy(pthread_barrier_t *b)
{
    if (b->_b_limit < 0) {
        if (b->_b_lock) {
            int v;
            a_or(&b->_b_lock, INT_MIN);
            while ((v = b->_b_lock) & INT_MAX)
                __wait(&b->_b_lock, 0, v, 0);
        }
        __vm_wait();
    }
    return 0;
}


Li
________________________________
发件人: Tavian Barnes <tavianator@tavianator.com>
发送时间: 2024年7月11日 3:11
收件人: musl@lists.openwall.com <musl@lists.openwall.com>
抄送: 250200715@qq.com <250200715@qq.com>
主题: [musl] RE: Maybe A Bug about timer_create and pthread_barrier_wait

(I cleaned up the HTML entities in this email, but please use plain text
mode next time)

> Hello:
>
> I had a low-probability crash in the child thread when using the
> timer_create interface. After debug, I found that the crash occured
> when the sub-thread accessed in code "if (b->_b_waiters)" which is a
> stack variable created in the main thread and passed to child thread
> by args. It looks like the main thread's timer_create has finished
> executing at this point, so the variables (start_args) on the stack
> have been cleaned up. I take a look at the pthread_barrier_wait code
> and I think it should be a scheduling problem in pthread_barrier_wait.
>
> Take the timer_create as an example, when the child thread is the
> first thread for "pthread_barrier_wait" and it is suspened after it
> executes the code "a_store(&b->_b_lock, 0)", then the main thread in
> timer_create will arrive as the last thread, it will nerver wait for
> the child thread to be rescheduled, the main thread can pass the
> barrier and continue execution, the args created in timer_create will
> be cleaned up. when the child thread is finally rescheduled, it access
> the "b->_b_waiters" which has already been cleaned up by main thread
> and the crash will occur.
>
> Is there a bug here? Looking forward to your reply.
>
>     /* First thread to enter the barrier becomes the "instance owner" */
>     if (!inst) {
>         struct instance new_inst = { 0 };
>         int spins = 200;
>         b->_b_inst = inst = &new_inst;
>         a_store(&b->_b_lock, 0);
>         if (b->_b_waiters) __wake(&b->_b_lock, 1, 1);  // crash here b->_b_waiters
>         while (spins-- && !inst->finished)
>
>     /* First thread to enter the barrier becomes the "instance owner" */
>     if (!inst) {
>         struct instance new_inst = { 0 };
>         int spins = 200;
>         b->_b_inst = inst = &new_inst;
>         a_store(&b->_b_lock, 0);
>         // when the child thread is the first thread and is scheduled out here
>
>         if (b->_b_waiters) __wake(&b->_b_lock, 1, 1);
>         while (spins-- && !inst->finished)

This looks like a real bug in timer_create() to me.  Here's an untested
fix:

From: Tavian Barnes <tavianator@tavianator.com>
Date: Wed, 10 Jul 2024 14:27:21 -0400
Subject: [PATCH] timer_create: destroy the barrier before returning

pthread_barrier_destroy() waits for all threads to be done using the
barrier before destroying it.  Without it, the storage for the barrier
can be deallocated when timer_create() returns while the other thread is
still using it inside the pthread_barrier_wait() implementation.

Link: https://www.openwall.com/lists/musl/2024/07/08/1
---
 src/time/timer_create.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/time/timer_create.c b/src/time/timer_create.c
index 9216b3ab..42c69848 100644
--- a/src/time/timer_create.c
+++ b/src/time/timer_create.c
@@ -121,6 +121,7 @@ int timer_create(clockid_t clk, struct sigevent *restrict evp, timer_t *restrict
                 }
                 td->timer_id = timerid;
                 pthread_barrier_wait(&args.b);
+               pthread_barrier_destroy(&args.b);
                 if (timerid < 0) return -1;
                 *res = (void *)(INTPTR_MIN | (uintptr_t)td>>1);
                 break;

[-- Attachment #2: Type: text/html, Size: 10930 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2024-07-11 14:57 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-10 19:11 [musl] RE: Maybe A Bug about timer_create and pthread_barrier_wait Tavian Barnes
2024-07-11 14:56 ` 回复: " Li JinCheng

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).