Minimal edits were made to the upstream version, both for easier updates and because this code was benchmarked across many cores. gcc generates slow code for the current C implementations.

The integer memcpy was chosen instead of the SIMD one because it performs better on little cores. I think this is the more conservative choice for now.

Note: upcoming security architecture features may require updates to these functions (BTI: landing pads; PAUTH: return address signing; MTE: the 16-byte tag granule may affect optimized strcmp etc.). None of this is relevant yet, because runtime support for these features will need other libc changes as well.