On Thu, Jun 25, 2020 at 05:15:42PM -0400, Rich Felker wrote:
> On Thu, Jun 25, 2020 at 04:50:24PM -0400, Rich Felker wrote:
> > > > > but it would be nice if we could get the aarch64
> > > > > memcpy patch in (the c implementation is really
> > > > > slow and i've seen ppl compare aarch64 vs x86
> > > > > server performance with some benchmark on alpine..)
> > > >
> > > > OK, I'll look again.
> > >
> > > thanks.
> > >
> > > (there are more aarch64 string functions in the
> > > optimized-routines github repo but i think they
> > > are not as important as memcpy/memmove/memset)
> >
> > I found the code. Can you comment on performance and whether memset is
> > needed? (The C memset should be rather good already, more so than
> > memcpy.)
>
> Are the assumptions (v8-a, unaligned access) documented in memcpy.S
> valid for all presently supportable aarch64?
>
> A couple comments for merging if we do, that aren't hard requirements
> but preferences:
>
> - I'd like to expand out the macros from ../asmdefs.h since that won't
>   be available and they just hide things (I guess they're attractive
>   for Apple/macho users or something but not relevant to musl) and
>   since the symbol name lines need to be changed anyway to the public
>   name. "Local var name" macros are ok to leave; changing them would
>   be too error-prone and they make the code more readable anyway.
>
> - I'd prefer not to have memmove logic in memcpy since it makes it
>   larger and implies that misuse of memcpy when you mean memmove is
>   supported usage. I'd be happy with an approach like x86 though,
>   defining an __memcpy_fwd alias and having memmove tail call to that
>   unless len>128 and reverse is needed, or just leaving memmove.c.

Something like the attached.

Rich
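
P.S. For concreteness, here is a rough C-level sketch of the
memmove-tail-call structure described in the second bullet. This is
only an illustration, not the attached patch; __memcpy_fwd is assumed
to be a forward-only copy entry point exported by the asm memcpy, as
musl's x86 asm does.

#include <stddef.h>
#include <stdint.h>

/* Forward-only copy entry point assumed to be exported by the
 * asm memcpy alongside the public memcpy symbol. */
void *__memcpy_fwd(void *, const void *, size_t);

void *memmove(void *dest, const void *src, size_t n)
{
	char *d = dest;
	const char *s = src;

	/* A forward copy is safe unless dest starts inside
	 * (src, src+n). The unsigned subtraction covers both
	 * dest below src and dest at or above src+n in one
	 * comparison, so the common non-overlapping case costs
	 * a single branch before the tail call. */
	if ((uintptr_t)d - (uintptr_t)s >= n)
		return __memcpy_fwd(dest, src, n);

	/* Overlap with dest above src: copy backwards. */
	while (n--) d[n] = s[n];
	return dest;
}

(The aarch64 variant described above could additionally let
overlapping copies of len<=128 fall through to __memcpy_fwd, since
those small-size paths load all data before storing; the sketch uses
the simpler x86-style condition.)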