From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: What's left for 1.1.11 release?
Date: Tue, 28 Jul 2015 13:31:41 -0400
Message-ID: <20150728173141.GV16376@brightrain.aerifal.cx>
References: <20150728034036.GA25643@brightrain.aerifal.cx> <1438092578.19958.4.camel@inria.fr>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
User-Agent: Mutt/1.5.21 (2010-09-15)

On Tue, Jul 28, 2015 at 05:33:18PM +0300, Alexander Monakov wrote:
> > > and stdio locks too, but it's only been observed in malloc.
> > > Since there don't seem to be any performance-relevant uses of a_store
> > > that don't actually need the proper barrier, I think we have to just
> > > put an explicit barrier (lock orl $0,(%esp) or mfence) after the store
> > > and live with the loss of performance.
> > How about using a xchg as instruction? This would perhaps "waste" a
> > register, but that sort of optimization should not be critical in the
> > vicinity of code that needs memory synchronization, anyhow.
>
> xchg is what compilers use in lieu of mfence, but Rich's preference for
> 'lock orl' on the top of the stack stems from the idea that locking on
> the store destination is not desired here (you might not even have the
> corresponding line in the cache), so it might be better to have the
> store land in the store buffers, and do a serializing 'lock orl' on the
> cache line you have anyhow.

I did a quick run of my old malloc stress test with both approaches.
The outputs are not sufficiently stable to gather a lot, but on my
machine, there seems to be no loss in performance with the stack
approach and a 1-5% loss from using xchg to do the store. I'd like to
have a better measurement to confirm this, but being that my
measurements so far agree with the theoretical prediction, I think
I'll just go with the stack approach for now.

Rich