Date: Thu, 18 May 2023 10:20:12 -0400
From: Rich Felker
To: Jeffrey Walton
Cc: musl@lists.openwall.com, 847567161 <847567161@qq.com>
Message-ID: <20230518142011.GR4163@brightrain.aerifal.cx>
References: <20230518122306.GU3630668@port70.net> <20230518132905.GP4163@brightrain.aerifal.cx>
Subject: Re: Re: [musl] Question：Why musl call a_barrier in __pthread_once?

On Thu, May 18, 2023 at 10:15:36AM -0400, Jeffrey Walton wrote:
> On Thu, May 18, 2023 at 9:29 AM Rich Felker wrote:
> >
> > On Thu, May 18, 2023 at 02:23:06PM +0200, Szabolcs Nagy wrote:
> > > * 847567161 <847567161@qq.com> [2023-05-18 10:49:44 +0800]:
> > > > > There is an alternate algorithm for pthread_once that doesn't require
> > > > > a barrier in the common case, which I've considered implementing.. But
> > > > > it does need efficient access to thread-local storage.
> > > > > At one time, this was a kinda bad assumption (especially legacy mips is
> > > > > horribly slow at TLS) but nowadays it's probably the right choice to
> > > > > make, and we should check that out again...
> > > >
> > > > 1、Can we move dmb after we get the value of control? like this:
> > > >
> > > > int __pthread_once(pthread_once_t *control, void (*init)(void))
> > > > {
> > > >     /* Return immediately if init finished before, but ensure that
> > > >      * effects of the init routine are visible to the caller. */
> > > >     if (*(volatile int *)control == 2) {
> > > >         // a_barrier();
> > > >         return 0;
> > > >     }
> > >
> > > writes in init may not be visible when *control==2, without
> > > the barrier. (there are many explanations on the web why
> > > double-checked locking is wrong without an acquire barrier,
> > > that's the same issue if you are interested in the details)
> > >
> > > > 2、Can we use 'ldar' to instead of dmb here? I see musl
> > > > already use 'stlxr' in a_sc. like this:
> > > >
> > > > static inline int load(volatile int *p)
> > > > {
> > > >     int v;
> > > >     __asm__ __volatile__ ("ldar %w0,%1" : "=r"(v) : "Q"(*p));
> > > >     return v;
> > > > }
> > > >
> > > > if (load((volatile int *)control) == 2) {
> > > >     return 0;
> > > > }
> > >
> > > i think acquire ordering is enough because posix does not
> > > require pthread_once to synchronize memory, but musl does
> > > not have an acquire barrier/load, so it uses a_barrier.
> >
> > POSIX does require this. It's specified where Memory Synchronization
> > is defined,
> > https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_12
> >
> > "The pthread_once() function shall synchronize memory for the
> > first call in each thread for a given pthread_once_t object."
> >
> > > it is probably not worth optimizing the memory order since
> > > we know there is an algorithm that does not need a barrier
> > > in the common case.
> >
> > Arguably the above might make the barrier-free algorithm invalid for
> > pthread_once, but I'm not sure if the lack of "synchronize memory"
> > property in this case would be observable. It probably is with an
> > intentional construct trying to observe it. There may be some way to
> > salvage this with a second thread-local counter to account for
> > gratuitous extra synchronization needed.
> >
> > Of course call_once is exempt from any such requirements (also exempt
> > from cancellation shenanigans) and is probably the optimal thing for
> > programs to use. If needed we can make call_once have a different,
> > more optimal implementation than pthread_once.
>
> Be careful of call_once.
>
> Several years ago I cut over to C++11's call_once. The problem was, it
> only worked reliably on 32-bit and 64-bit Intel platforms. It was a
> disaster on Aarch64, PowerPC and Sparc. I had to back it out.

That is about the C++ std::call_once, implemented by GNU libstdc++,
and is unrelated to the C11 call_once, which libc implements.

> The problems happened back when GCC 6 and 7 were popular. The problem
> was due to something sideways in glibc.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66146
>
> If you want a call_once-like initialization then rely on N2660:
> Dynamic Initialization and Destruction with Concurrency.

That's the general algorithm we've been talking about (though without
bad properties like gratuitously inlining it to lock-in implementation
details as ABI).

Rich
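
The acquire-ordering point raised in the thread can be illustrated with
a minimal sketch using C11 atomics. This is not musl's implementation:
once_sketch_t, ONCE_SKETCH_INIT, once_sketch and once_sketch_demo are
hypothetical names, and a plain mutex stands in for whatever slow path
a real libc would use. Acquire/release ordering alone may also be weaker
than the "synchronize memory" requirement POSIX places on pthread_once,
so the sketch is closer in spirit to what a C11 call_once could do.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical once-control object: 0 = init not finished, 2 = done. */
typedef struct {
    atomic_int state;
    pthread_mutex_t lock;
} once_sketch_t;

#define ONCE_SKETCH_INIT { 0, PTHREAD_MUTEX_INITIALIZER }

static int once_sketch(once_sketch_t *o, void (*init)(void))
{
    /* Fast path: the acquire load pairs with the release store below,
     * so a thread that observes 2 also observes the writes init() made.
     * A plain volatile read here would not give that guarantee. */
    if (atomic_load_explicit(&o->state, memory_order_acquire) == 2)
        return 0;

    /* Slow path: serialize callers and re-check under the lock. */
    pthread_mutex_lock(&o->lock);
    if (atomic_load_explicit(&o->state, memory_order_relaxed) != 2) {
        init();
        /* Publish completion only after init() has returned. */
        atomic_store_explicit(&o->state, 2, memory_order_release);
    }
    pthread_mutex_unlock(&o->lock);
    return 0;
}

static int init_calls;

static void init_once(void)
{
    init_calls++;
}

/* Calls once_sketch twice and returns how many times init ran. */
static int once_sketch_demo(void)
{
    static once_sketch_t once = ONCE_SKETCH_INIT;

    once_sketch(&once, init_once);
    once_sketch(&once, init_once);   /* takes the fast path */
    assert(init_calls == 1);
    return init_calls;
}
```

On AArch64 a compiler would typically lower the acquire load to an ldar
instruction, which is the substitution proposed in the second quoted
question.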