From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Solving the recursive memcpy/memset/etc. issue
Date: Thu, 1 Aug 2013 02:20:08 -0400
Message-ID: <20130801062007.GI221@brightrain.aerifal.cx>
References: <20130801004940.GA20323@brightrain.aerifal.cx> <51F9FA8D.2000403@gentoo.org>
In-Reply-To: <51F9FA8D.2000403@gentoo.org>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Thu, Aug 01, 2013 at 08:05:01AM +0200, Luca Barbato wrote:
> > The only fully viable option I see is replacing the code for these
> > functions with code that uses volatile objects so as to make
> > optimization utterly impossible. This will of course make them
> > incredibly slow, but at least we would have safe, working C code, and
> > we could add asm for each supported arch.
>
> Not exactly great.

Well, we really need to add the arch asm anyway, as ugly as it is.
Right now most archs have memcpy running 2-5x slower than it should. I
could _try_ writing C to handle the unaligned (hard) cases well,
basically mimicking what the proposed asm for arm does, but I don't
think it will be competitive, just "not as slow". And we'd still have
to worry about it getting miscompiled...

> > An alternative might be to test the compiler in configure to determine
> > if, with the selected CFLAGS, it generates recursive code for these
> > functions, and if so, defining a macro that causes musl to revert to
> > the volatile code.
>
> Sounds much better.

Well, it would be an ugly heuristic like running cc -S -o - on
src/string/memcpy.c, with -Dmemcpy=noname or something, and grepping
the output for memcpy...

> > Other ideas? For now, if -fno-tree-loop-distribute-patterns fixes it
> > (still waiting on confirmation for this) I'm going to commit that to
> > configure, but it doesn't seem like a viable long-term solution.
>
> I'd rather check and error out reporting the compiler is broken. Then
> have an explicit configure option to try to workaround it.

If it were just a temporary regression, I would agree, but I think the
GCC position is that this is not a bug...

> > My ideal outcome would be a promise from the GCC developers that, in
> > future GCC versions, -ffreestanding implies disabling any options
> > which would generate calls to the mem* functions. However that sounds
> > unlikely.
>
> They have competition, if clang works better then we could just suggest
> to use it and nowadays gcc has no deployment advantage to it anymore.

I figured someone would say that, and almost put a preemptive note in
my post. clang/LLVM was the first to have this sort of bug of ignoring
-ffreestanding, only much worse, making invalid assumptions about the
result value of malloc inside the malloc implementation...

Competition is unfortunately the source of our woes, not the solution.
GCC and clang/LLVM are facing competition to be the best at compiling
application code, and since compiling the implementation itself is an
unusual, unexciting usage case, nobody's really watching out for how
they break that one in the race to have the fastest application
code...

Rich
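
For illustration, the "volatile objects" fallback discussed in this message might look roughly like the sketch below. It is not musl's actual code: the names volatile_memcpy and volatile_memcpy_check are hypothetical, chosen so the example does not collide with the real memcpy. The idea is that, in practice, GCC and clang treat every access through a volatile-qualified lvalue as something they may not merge, reorder, or replace, so the byte loop should not be recognized as a copy idiom and turned into a call to memcpy. The price is a strict one-byte-at-a-time copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback, not taken from musl: copy one byte at a time
 * through volatile-qualified pointers so the loop is not rewritten
 * into a call to the library memcpy. Slow by design. */
void *volatile_memcpy(void *restrict dest, const void *restrict src, size_t n)
{
    volatile unsigned char *d = dest;
    const volatile unsigned char *s = src;

    while (n--)
        *d++ = *s++;   /* one volatile load and one volatile store per byte */

    return dest;
}

/* Small self-check: returns 1 if the copy matches the source. */
int volatile_memcpy_check(void)
{
    char src[] = "recursive memcpy";
    char dst[sizeof src] = {0};
    void *ret = volatile_memcpy(dst, src, sizeof src);

    assert(ret == dst);   /* like memcpy, the destination is returned */
    return memcmp(dst, src, sizeof src) == 0;
}
```

Whether a given compiler and set of CFLAGS really leaves this loop alone would still have to be confirmed by inspecting the generated assembly, which is the kind of configure-time check described earlier in the message.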