Date: Thu, 25 Jun 2020 16:50:24 -0400
From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: [musl] Release prep for 1.2.1, and afterwards
Message-ID: <20200625205024.GR6430@brightrain.aerifal.cx>
In-Reply-To: <20200625173125.GF2048759@port70.net>
References: <20200624204243.GL6430@brightrain.aerifal.cx>
 <20200625081504.GE2048759@port70.net>
 <20200625153936.GP6430@brightrain.aerifal.cx>
 <20200625173125.GF2048759@port70.net>

On Thu, Jun 25, 2020 at 07:31:25PM +0200, Szabolcs Nagy wrote:
> * Rich Felker [2020-06-25 11:39:36 -0400]:
>
> > On Thu, Jun 25, 2020 at 10:15:04AM +0200, Szabolcs Nagy wrote:
> > > * Rich Felker [2020-06-24 16:42:44 -0400]:
> > >
> > > > I'm about to do last work of merging mallocng, followed soon by
> > > > release. Is there anything in the way of overlooked bug reports or
> > > > patches that should still be addressed in this release cycle?
> > > >
> > > > Things I'm aware of:
> > > >
> > > > - "Proposal to match behaviour of gethostbyname to glibc".
> > > >   Latest patch is probably ok, but could be deferred to after
> > > >   release.
> > > >
> > > > - nsz's new sqrt{,f,l}. I'm hesitant to do all three right away
> > > >   without time to test, but replacing sqrtl.c could be appropriate
> > > >   since the current one is badly broken on archs with ld wider than
> > > >   double. However it would need to accept ld80 in order not to be
> > > >   build-breaking on m68k, or m68k would need an alternative.
> > >
> > > that's still under work
> >
> > Won't it work just to make it decode/encode the ldshape, and otherwise
> > use exactly the same code? Or are there double-rounding issues if the
> > quad code is used with ld80?
>
> i think the same code may work for ld80 too,
> but i'm still testing the single/double/quad
> code, it's not ready for inclusion.

OK. I had in mind possibly adding just sqrtl.c since it can't really
be worse than what we have now. But I'm ok with waiting too.

One alternative to getting it working for ld80 right away would be
just adding an asm version of sqrtl for m68k. However we have users
who've indicated an interest in disabling asm optimizations (see
thread "build: allow forcing generic implementations of library
functions") so in the long term I think we should aim for all generic
math functions to work on all ld formats and FLT_EVAL_METHOD rather
than just assuming they get replaced on i386/x86_64 and m68k.

> > > but it would be nice if we could get the aarch64
> > > memcpy patch in (the c implementation is really
> > > slow and i've seen ppl compare aarch64 vs x86
> > > server performance with some benchmark on alpine..)
> >
> > OK, I'll look again.
>
> thanks.
>
> (there are more aarch64 string functions in the
> optimized-routines github repo but i think they
> are not as important as memcpy/memmove/memset)

I found the code. Can you comment on performance and whether memset is
needed? (The C memset should be rather good already, moreso than
memcpy.)

As noted in the past I'd like to get rid of having high level flow
logic in the arch asm and instead have the arch provide string asm
fragments, if desired, to copy blocks, which could then be used in a
shared C skeleton. However as you noted this has been a point of
practical performance problem for a long time and I don't think it's
fair to just keep putting it off for a better solution.

Rich