* string word-at-a-time and atomic.h FAQ on twitter @ 2016-01-05 16:46 Szabolcs Nagy 2016-01-05 17:50 ` Rich Felker 2016-01-12 21:02 ` Alexander Cherepanov 0 siblings, 2 replies; 16+ messages in thread From: Szabolcs Nagy @ 2016-01-05 16:46 UTC (permalink / raw) To: musl https://twitter.com/johnregehr/status/684126374966198281 these are old faq items but since we dont have docs about internals i try to address them based on my understanding: 1) musl strlen oob access: the os manages memory with page granularity, this is an internal detail that the user should not rely on in c, but the libc can, (otherwise malloc/free and the dynamic linker could not be implemented in c) so oob accesses in word-at-a-time string algorithms do not cause segfault on the os level. in theory the compiler is part of the implementation so it can treat libc code specially, but in practice libc code is normal freestanding c code. this means that the compiler can treat oob access arbitrarily following the abstract c semantics if it can see through the implementation with lto. however i find lto a weak excuse to rewrite strlen in asm for all targets since lto of libc is still not practical and asm implementations historically had a lot of target specific bugs in other libcs. i think compiler attributes should be used here on compilers that might break the code, but there is no attribute for this kind of oob access yet (although may_alias attribute is missing here too and should be added like in other string functions). this takes care of oob access, but the bytes outside the passed object might change concurrently i.e. strlen might introduce a data race: again this is a problem on the abstract c language level that may be solved e.g. by making all accesses to those bytes relaxed atomic, but user code is not under libc control. in practice the code works if HASZERO reads the word once so it does arithmetics with a consistent value (because the memory model of the underlying machine does not treat such race undefined and it does not propagate unspecified value bits nor has trap representations). we do not try to enforce these behaviours on the c level yet (only a very narrow set of string functions are affected which are also very performance critical), but fortunately those who are worried that the code is not correct can always generate asm and compile that into the libc. (and then one can verify that indeed the generated code is completely correct on the asm level. maybe musl will add generated asm to the repo, but there are other pending cleanup works related to asm vs c level semantics and these should be considered together.) 2) musl atomic.h sync primitives the primitives in atomic.h are carefully designed for musl's pthread implementation (which seems to me far ahead of other implementations in terms of correctness and portability). however they are not documented in the code (only in the git log) so ppl assume they understand their precise interface contract by guessing (which is usually wrong because the names are misleading). musl does not use 64-bit atomic primitives, a_and_64 and a_or_64 have secific uses in the malloc implementation which determine their semantics. 3) a_crash formally a_crash can be anything (only called if user invoked ub or underlying system broke interface contract). in practice a_crash should be __builtin_trap (i.e. the most lightweight way of terminating the process and this matters for security which is of course not c level semantics), but builtin usage is mininmized in musl which makes it possible to compile it with several c compilers with consistent behaviour (e.g. gcc does not guarantee consistent behaviour for __builtin_trap across targets and falls back to abort if a target does not have appropriate target hook defined), keeping the interface between the compiler and libc minimal is a key design choice in musl. at some point this should be cleaned up and all targets should have proper single instruction crash, but that's low priority cleanup work so on some targets this is not yet done (there are other pending atomic.h cleanup works). ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-05 16:46 string word-at-a-time and atomic.h FAQ on twitter Szabolcs Nagy @ 2016-01-05 17:50 ` Rich Felker 2016-01-05 23:39 ` Matthew Fernandez 2016-01-08 21:59 ` Alexander Cherepanov 2016-01-12 21:02 ` Alexander Cherepanov 1 sibling, 2 replies; 16+ messages in thread From: Rich Felker @ 2016-01-05 17:50 UTC (permalink / raw) To: musl On Tue, Jan 05, 2016 at 05:46:41PM +0100, Szabolcs Nagy wrote: > https://twitter.com/johnregehr/status/684126374966198281 > > these are old faq items but since we dont have docs about internals > i try to address them based on my understanding: An "internals FAQ" would be a welcome addition to the wiki. I think your text is mostly ok, but I'm including some thoughts below on improving/fixing these issues and the possible directions we might take. > 1) musl strlen oob access: > > the os manages memory with page granularity, this is an internal > detail that the user should not rely on in c, but the libc can, > (otherwise malloc/free and the dynamic linker could not be > implemented in c) so oob accesses in word-at-a-time string > algorithms do not cause segfault on the os level. > > in theory the compiler is part of the implementation so it can > treat libc code specially, but in practice libc code is normal > freestanding c code. this means that the compiler can treat oob > access arbitrarily following the abstract c semantics if it can > see through the implementation with lto. however i find lto a > weak excuse to rewrite strlen in asm for all targets since lto > of libc is still not practical and asm implementations > historically had a lot of target specific bugs in other libcs. > i think compiler attributes should be used here on compilers that > might break the code, but there is no attribute for this kind of > oob access yet (although may_alias attribute is missing here too > and should be added like in other string functions). I agree that we should use compiler attributes and fall back to an utterly naive implementation if __GNUC__ (or other indication that such attributes are supported) is not defined. Clearly we should be using may_alias for the word-at-a-time access, but as you've said there doesn't seem to be any attribute to safely access past the end of the object. In the particular case of strlen, the naive unrolled strlen with no OOB access is actually optimal on most or all 32-bit archs, better than what we have now. I suspect the same is true for strchr and other related functions. So we could just consider trying to drop the OOB accesses. Do we have a list of affected functions? That might be nice to include. > this takes care of oob access, but the bytes outside the passed > object might change concurrently i.e. strlen might introduce a > data race: again this is a problem on the abstract c language > level that may be solved e.g. by making all accesses to those > bytes relaxed atomic, but user code is not under libc control. > in practice the code works if HASZERO reads the word once so it > does arithmetics with a consistent value (because the memory > model of the underlying machine does not treat such race > undefined and it does not propagate unspecified value bits nor > has trap representations). Indeed, this seems like less of a practical concern. I think we could make it a non-concern via volatile accesses (the practical concern with data races like this is transformations that convert one read to multiple reads and expect to see the same value each time) but that would preclude most optimization by the compiler. > we do not try to enforce these behaviours on the c level yet > (only a very narrow set of string functions are affected which > are also very performance critical), but fortunately those who > are worried that the code is not correct can always generate asm > and compile that into the libc. (and then one can verify that > indeed the generated code is completely correct on the asm level. > maybe musl will add generated asm to the repo, but there are other > pending cleanup works related to asm vs c level semantics and > these should be considered together.) I would rather not add generated asm; not only is it ugly and potentially full of directives we may not want (like cfi and stuff that may not work on some as versions or alternate assemblers) but also it can't be optimized for the specific target (e.g. modern x86 with sse instead of base i386, armv7 instead of armv4t, etc.), and it can't be inlined or otherwise optimized with LTO. > 2) musl atomic.h sync primitives > > the primitives in atomic.h are carefully designed for musl's > pthread implementation (which seems to me far ahead of other > implementations in terms of correctness and portability). > > however they are not documented in the code (only in the git > log) so ppl assume they understand their precise interface > contract by guessing (which is usually wrong because the names > are misleading). > > musl does not use 64-bit atomic primitives, a_and_64 and a_or_64 > have secific uses in the malloc implementation which determine > their semantics. I think we could replace them with atomic bitset/bitclear. They're only used in a couple places, mainly malloc. > 3) a_crash > > formally a_crash can be anything (only called if user invoked > ub or underlying system broke interface contract). > > in practice a_crash should be __builtin_trap (i.e. the most > lightweight way of terminating the process and this matters for > security which is of course not c level semantics), but builtin > usage is mininmized in musl which makes it possible to compile > it with several c compilers with consistent behaviour (e.g. gcc > does not guarantee consistent behaviour for __builtin_trap > across targets and falls back to abort if a target does not > have appropriate target hook defined), keeping the interface > between the compiler and libc minimal is a key design choice > in musl. > > at some point this should be cleaned up and all targets should > have proper single instruction crash, but that's low priority > cleanup work so on some targets this is not yet done (there are > other pending atomic.h cleanup works). Indeed. Actually I'd kind of prefer improving a_crash to block signals (via inline syscall) before issuing the SIGSEGV/SIGILL to ensure that the kernel terminates the process rather than allowing a handler to run. I forget -- do you have an account on the wiki to add these things? If not we should get you one, but in the mean time someone else could add this content. Rich ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-05 17:50 ` Rich Felker @ 2016-01-05 23:39 ` Matthew Fernandez 2016-01-06 2:56 ` Szabolcs Nagy 2016-01-08 21:59 ` Alexander Cherepanov 1 sibling, 1 reply; 16+ messages in thread From: Matthew Fernandez @ 2016-01-05 23:39 UTC (permalink / raw) To: musl On 06/01/16 04:50, Rich Felker wrote: > On Tue, Jan 05, 2016 at 05:46:41PM +0100, Szabolcs Nagy wrote: >> ... however i find lto a >> weak excuse to rewrite strlen in asm for all targets since lto >> of libc is still not practical ... This is an interesting comment; would you mind expanding on this a little? Though we don't do it regularly, we have used LTO on Musl C in the past. I am not aware of any Musl-specific issues we encountered, though it's entirely possible our build contained latent bugs. >> 2) musl atomic.h sync primitives >> >> the primitives in atomic.h are carefully designed for musl's >> pthread implementation (which seems to me far ahead of other >> implementations in terms of correctness and portability). >> >> however they are not documented in the code (only in the git >> log) so ppl assume they understand their precise interface >> contract by guessing (which is usually wrong because the names >> are misleading). >> >> musl does not use 64-bit atomic primitives, a_and_64 and a_or_64 >> have secific uses in the malloc implementation which determine >> their semantics. > > I think we could replace them with atomic bitset/bitclear. They're > only used in a couple places, mainly malloc. This would certainly make us happier. We've been bitten a couple of times by newcomers (and myself once) assuming these were suitable 64-bit atomic primitives based on the naming. ________________________________ The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-05 23:39 ` Matthew Fernandez @ 2016-01-06 2:56 ` Szabolcs Nagy 0 siblings, 0 replies; 16+ messages in thread From: Szabolcs Nagy @ 2016-01-06 2:56 UTC (permalink / raw) To: musl * Matthew Fernandez <matthew.fernandez@nicta.com.au> [2016-01-06 10:39:34 +1100]: > On 06/01/16 04:50, Rich Felker wrote: > >On Tue, Jan 05, 2016 at 05:46:41PM +0100, Szabolcs Nagy wrote: > >>... however i find lto a > >>weak excuse to rewrite strlen in asm for all targets since lto > >>of libc is still not practical ... > > This is an interesting comment; would you mind expanding on this a little? Though we don't do it > regularly, we have used LTO on Musl C in the past. I am not aware of any Musl-specific issues we > encountered, though it's entirely possible our build contained latent bugs. two things: 1) does not give the expected improvements (smaller, faster) on code that's factored well. 2) wrong code genertion (because of lto implementation bugs, or because of invalid c code) i think we saw issues around file scope inline asm, we speculated about issues around fenv access because the compilers don't model that correctly and early dynamic linker code. it is easy to imagine other nasty cases especially if gcc extensions are in use: they are often not carefully specified enough to have robust semantics with lto (e.g. attribute const for tls was an example where the semantics is unclear but lto makes it easier to break code if the user and compiler disagree) in short: i dont see the benefits given the risks. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-05 17:50 ` Rich Felker 2016-01-05 23:39 ` Matthew Fernandez @ 2016-01-08 21:59 ` Alexander Cherepanov 2016-01-08 22:05 ` Rich Felker 1 sibling, 1 reply; 16+ messages in thread From: Alexander Cherepanov @ 2016-01-08 21:59 UTC (permalink / raw) To: musl On 2016-01-05 20:50, Rich Felker wrote: > So we could just consider trying to drop the OOB > accesses. Do we have a list of affected functions? That might be nice > to include. I think it would be nice to have a full list of intentional UB. For example, this: http://git.musl-libc.org/cgit/musl/tree/src/stdio/vsnprintf.c#n33 if (n > (char *)0+SIZE_MAX-s-1) n = (char *)0+SIZE_MAX-s-1; If I understand the code correctly, fixing it will require changes to the FILE structure. Are there such plans? >> this takes care of oob access, but the bytes outside the passed >> object might change concurrently i.e. strlen might introduce a >> data race: again this is a problem on the abstract c language >> level that may be solved e.g. by making all accesses to those >> bytes relaxed atomic, but user code is not under libc control. >> in practice the code works if HASZERO reads the word once so it >> does arithmetics with a consistent value (because the memory >> model of the underlying machine does not treat such race >> undefined and it does not propagate unspecified value bits nor >> has trap representations). > > Indeed, this seems like less of a practical concern. HASZERO reads the word twice so this should be a problem for unoptimized code on big-endian platforms. -- Alexander Cherepanov ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-08 21:59 ` Alexander Cherepanov @ 2016-01-08 22:05 ` Rich Felker 2016-01-08 22:39 ` Alexander Cherepanov 0 siblings, 1 reply; 16+ messages in thread From: Rich Felker @ 2016-01-08 22:05 UTC (permalink / raw) To: musl On Sat, Jan 09, 2016 at 12:59:55AM +0300, Alexander Cherepanov wrote: > On 2016-01-05 20:50, Rich Felker wrote: > >So we could just consider trying to drop the OOB > >accesses. Do we have a list of affected functions? That might be nice > >to include. > > I think it would be nice to have a full list of intentional UB. For > example, this: > > http://git.musl-libc.org/cgit/musl/tree/src/stdio/vsnprintf.c#n33 > > if (n > (char *)0+SIZE_MAX-s-1) n = (char *)0+SIZE_MAX-s-1; > > If I understand the code correctly, fixing it will require changes > to the FILE structure. Are there such plans? Fixing it requires not just changes to the structure, but abandoning compatibility with buffer reads and writes via the glibc ABI (used by glibc getc_unlocked and putc_unlocked macros). This is not as bad as it sounds; if we just abandoned use of the glibc-offset-defined fields and made them always null pointers, then legacy glibc code would always see no buffer available and make a function call instead. I'm not clear on whether this might break code using the gnulib junk that pokes at glibc FILE internals, though, or whether such code works now, nor am I clear whether we even care. > >>this takes care of oob access, but the bytes outside the passed > >>object might change concurrently i.e. strlen might introduce a > >>data race: again this is a problem on the abstract c language > >>level that may be solved e.g. by making all accesses to those > >>bytes relaxed atomic, but user code is not under libc control. > >>in practice the code works if HASZERO reads the word once so it > >>does arithmetics with a consistent value (because the memory > >>model of the underlying machine does not treat such race > >>undefined and it does not propagate unspecified value bits nor > >>has trap representations). > > > >Indeed, this seems like less of a practical concern. > > HASZERO reads the word twice so this should be a problem for > unoptimized code on big-endian platforms. The number of abstract-machine reads is irrelevant unless we use volatile here. A good compiler will always reduce it to one read, and a bad compiler is always free to turn it into multiple reads. Rich ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-08 22:05 ` Rich Felker @ 2016-01-08 22:39 ` Alexander Cherepanov 2016-01-08 22:59 ` Rich Felker 0 siblings, 1 reply; 16+ messages in thread From: Alexander Cherepanov @ 2016-01-08 22:39 UTC (permalink / raw) To: musl On 2016-01-09 01:05, Rich Felker wrote: > On Sat, Jan 09, 2016 at 12:59:55AM +0300, Alexander Cherepanov wrote: >> On 2016-01-05 20:50, Rich Felker wrote: >>> So we could just consider trying to drop the OOB >>> accesses. Do we have a list of affected functions? That might be nice >>> to include. >> >> I think it would be nice to have a full list of intentional UB. For >> example, this: >> >> http://git.musl-libc.org/cgit/musl/tree/src/stdio/vsnprintf.c#n33 >> >> if (n > (char *)0+SIZE_MAX-s-1) n = (char *)0+SIZE_MAX-s-1; >> >> If I understand the code correctly, fixing it will require changes >> to the FILE structure. Are there such plans? > > Fixing it requires not just changes to the structure, but abandoning > compatibility with buffer reads and writes via the glibc ABI (used by > glibc getc_unlocked and putc_unlocked macros). This is not as bad as > it sounds; if we just abandoned use of the glibc-offset-defined fields > and made them always null pointers, then legacy glibc code would > always see no buffer available and make a function call instead. I'm > not clear on whether this might break code using the gnulib junk that > pokes at glibc FILE internals, though, or whether such code works now, > nor am I clear whether we even care. Ugh. Documenting UB in it in the mean time seems like non-controversial thing:-) >>>> this takes care of oob access, but the bytes outside the passed >>>> object might change concurrently i.e. strlen might introduce a >>>> data race: again this is a problem on the abstract c language >>>> level that may be solved e.g. by making all accesses to those >>>> bytes relaxed atomic, but user code is not under libc control. >>>> in practice the code works if HASZERO reads the word once so it >>>> does arithmetics with a consistent value (because the memory >>>> model of the underlying machine does not treat such race >>>> undefined and it does not propagate unspecified value bits nor >>>> has trap representations). >>> >>> Indeed, this seems like less of a practical concern. >> >> HASZERO reads the word twice so this should be a problem for >> unoptimized code on big-endian platforms. > > The number of abstract-machine reads is irrelevant unless we use > volatile here. A good compiler will always reduce it to one read, and > a bad compiler is always free to turn it into multiple reads. Ok, I'll reformulate: is compiling musl on a big-endian platform with optimizations turned off officially supported? -- Alexander Cherepanov ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-08 22:39 ` Alexander Cherepanov @ 2016-01-08 22:59 ` Rich Felker 2016-01-09 1:40 ` Szabolcs Nagy 2016-01-12 12:41 ` Alexander Cherepanov 0 siblings, 2 replies; 16+ messages in thread From: Rich Felker @ 2016-01-08 22:59 UTC (permalink / raw) To: musl On Sat, Jan 09, 2016 at 01:39:10AM +0300, Alexander Cherepanov wrote: > >>>>this takes care of oob access, but the bytes outside the passed > >>>>object might change concurrently i.e. strlen might introduce a > >>>>data race: again this is a problem on the abstract c language > >>>>level that may be solved e.g. by making all accesses to those > >>>>bytes relaxed atomic, but user code is not under libc control. > >>>>in practice the code works if HASZERO reads the word once so it > >>>>does arithmetics with a consistent value (because the memory > >>>>model of the underlying machine does not treat such race > >>>>undefined and it does not propagate unspecified value bits nor > >>>>has trap representations). > >>> > >>>Indeed, this seems like less of a practical concern. > >> > >>HASZERO reads the word twice so this should be a problem for > >>unoptimized code on big-endian platforms. > > > >The number of abstract-machine reads is irrelevant unless we use > >volatile here. A good compiler will always reduce it to one read, and > >a bad compiler is always free to turn it into multiple reads. > > Ok, I'll reformulate: is compiling musl on a big-endian platform > with optimizations turned off officially supported? Yes, and I don't see why you expect this case to break due to data race issues. Rich ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-08 22:59 ` Rich Felker @ 2016-01-09 1:40 ` Szabolcs Nagy 2016-01-12 12:41 ` Alexander Cherepanov 1 sibling, 0 replies; 16+ messages in thread From: Szabolcs Nagy @ 2016-01-09 1:40 UTC (permalink / raw) To: musl * Rich Felker <dalias@libc.org> [2016-01-08 17:59:51 -0500]: > On Sat, Jan 09, 2016 at 01:39:10AM +0300, Alexander Cherepanov wrote: > > >>>>this takes care of oob access, but the bytes outside the passed > > >>>>object might change concurrently i.e. strlen might introduce a > > >>>>data race: again this is a problem on the abstract c language > > >>>>level that may be solved e.g. by making all accesses to those > > >>>>bytes relaxed atomic, but user code is not under libc control. > > >>>>in practice the code works if HASZERO reads the word once so it > > >>>>does arithmetics with a consistent value (because the memory > > >>>>model of the underlying machine does not treat such race > > >>>>undefined and it does not propagate unspecified value bits nor > > >>>>has trap representations). > > >>> > > >>>Indeed, this seems like less of a practical concern. > > >> > > >>HASZERO reads the word twice so this should be a problem for > > >>unoptimized code on big-endian platforms. > > > > > >The number of abstract-machine reads is irrelevant unless we use > > >volatile here. A good compiler will always reduce it to one read, and > > >a bad compiler is always free to turn it into multiple reads. > > > > Ok, I'll reformulate: is compiling musl on a big-endian platform > > with optimizations turned off officially supported? > > Yes, and I don't see why you expect this case to break due to data > race issues. i didnt think it through, just assumed changing bits would ruin the computation. but it seems if the zero byte is unchanged then other bits can arbitrarily change between the two evaluations of x in #define HASZERO(x) ((x)-ONES & ~(x) & HIGHS) and the predicate is still true ..unless some silly code transformation is applied like x -> x+x-x ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-08 22:59 ` Rich Felker 2016-01-09 1:40 ` Szabolcs Nagy @ 2016-01-12 12:41 ` Alexander Cherepanov 1 sibling, 0 replies; 16+ messages in thread From: Alexander Cherepanov @ 2016-01-12 12:41 UTC (permalink / raw) To: musl On 2016-01-09 01:59, Rich Felker wrote: > On Sat, Jan 09, 2016 at 01:39:10AM +0300, Alexander Cherepanov wrote: >>>>>> this takes care of oob access, but the bytes outside the passed >>>>>> object might change concurrently i.e. strlen might introduce a >>>>>> data race: again this is a problem on the abstract c language >>>>>> level that may be solved e.g. by making all accesses to those >>>>>> bytes relaxed atomic, but user code is not under libc control. >>>>>> in practice the code works if HASZERO reads the word once so it >>>>>> does arithmetics with a consistent value (because the memory >>>>>> model of the underlying machine does not treat such race >>>>>> undefined and it does not propagate unspecified value bits nor >>>>>> has trap representations). >>>>> >>>>> Indeed, this seems like less of a practical concern. >>>> >>>> HASZERO reads the word twice so this should be a problem for >>>> unoptimized code on big-endian platforms. >>> >>> The number of abstract-machine reads is irrelevant unless we use >>> volatile here. A good compiler will always reduce it to one read, and >>> a bad compiler is always free to turn it into multiple reads. >> >> Ok, I'll reformulate: is compiling musl on a big-endian platform >> with optimizations turned off officially supported? > > Yes, and I don't see why you expect this case to break due to data > race issues. Right, I was too fast. I checked that outside bytes affect computations on inside bytes (on BE platforms) but the effect is not strong enough to break the code. Sorry for the noise. Perhaps a comment is warranted in this and other hairy cases (like qsort)? As for a list of affected functions, the first approximation: $ grep -rF HASZERO | grep -v '#define' src/string/stpcpy.c: for (; !HASZERO(*ws); *wd++ = *ws++); src/string/memchr.c: for (w = (const void *)s; n>=SS && !HASZERO(*w^k); w++, n-=SS); src/string/strlen.c: for (w = (const void *)s; !HASZERO(*w); w++); src/string/strchrnul.c: for (w = (void *)s; !HASZERO(*w) && !HASZERO(*w^k); w++); src/string/memccpy.c: for (; n>=sizeof(size_t) && !HASZERO(*ws^k); src/string/strlcpy.c: for (; n>=sizeof(size_t) && !HASZERO(*ws); src/string/stpncpy.c: for (; n>=sizeof(size_t) && !HASZERO(*ws); HASZERO is not guarded by a size check in strpcpy, strlen and strchrnul. -- Alexander Cherepanov ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-05 16:46 string word-at-a-time and atomic.h FAQ on twitter Szabolcs Nagy 2016-01-05 17:50 ` Rich Felker @ 2016-01-12 21:02 ` Alexander Cherepanov 2016-01-12 21:09 ` Alexander Cherepanov 1 sibling, 1 reply; 16+ messages in thread From: Alexander Cherepanov @ 2016-01-12 21:02 UTC (permalink / raw) To: musl On 2016-01-05 19:46, Szabolcs Nagy wrote: > i think compiler attributes should be used here on compilers that > might break the code, but there is no attribute for this kind of > oob access yet (although may_alias attribute is missing here too > and should be added like in other string functions). Perhaps the noclone function attribute could be used in the meantime? https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-g_t_0040code_007bnoclone_007d-function-attribute-3205 -- Alexander Cherepanov ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-12 21:02 ` Alexander Cherepanov @ 2016-01-12 21:09 ` Alexander Cherepanov 2016-01-12 23:07 ` Szabolcs Nagy 0 siblings, 1 reply; 16+ messages in thread From: Alexander Cherepanov @ 2016-01-12 21:09 UTC (permalink / raw) To: musl On 2016-01-13 00:02, Alexander Cherepanov wrote: > On 2016-01-05 19:46, Szabolcs Nagy wrote: >> i think compiler attributes should be used here on compilers that >> might break the code, but there is no attribute for this kind of >> oob access yet (although may_alias attribute is missing here too >> and should be added like in other string functions). > > Perhaps the noclone function attribute could be used in the meantime? > > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-g_t_0040code_007bnoclone_007d-function-attribute-3205 Probably together with the noinline attribute... Another attribute which looks relevant is no_sanitize_address. https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-g_t_0040code_007bno_005fsanitize_005faddress_007d-function-attribute-3199 -- Alexander Cherepanov ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-12 21:09 ` Alexander Cherepanov @ 2016-01-12 23:07 ` Szabolcs Nagy 2016-01-13 17:30 ` Szabolcs Nagy 0 siblings, 1 reply; 16+ messages in thread From: Szabolcs Nagy @ 2016-01-12 23:07 UTC (permalink / raw) To: musl * Alexander Cherepanov <ch3root@openwall.com> [2016-01-13 00:09:56 +0300]: > On 2016-01-13 00:02, Alexander Cherepanov wrote: > >On 2016-01-05 19:46, Szabolcs Nagy wrote: > >>i think compiler attributes should be used here on compilers that > >>might break the code, but there is no attribute for this kind of > >>oob access yet (although may_alias attribute is missing here too > >>and should be added like in other string functions). > > > >Perhaps the noclone function attribute could be used in the meantime? > > > >https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-g_t_0040code_007bnoclone_007d-function-attribute-3205 > > Probably together with the noinline attribute... > > Another attribute which looks relevant is no_sanitize_address. > > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-g_t_0040code_007bno_005fsanitize_005faddress_007d-function-attribute-3199 > i think a no-lto attr should be used, maybe noinline can achieve that. no-instrumentation attrs are ugly because new instrumentations may appear in new compilers and then your code becomes decorated. (libc does not work with asan instrumentation anyway without serious changes and asan runtime implementation inside the libc) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-12 23:07 ` Szabolcs Nagy @ 2016-01-13 17:30 ` Szabolcs Nagy 2016-01-14 12:49 ` Szabolcs Nagy 2016-01-14 22:51 ` Rich Felker 0 siblings, 2 replies; 16+ messages in thread From: Szabolcs Nagy @ 2016-01-13 17:30 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 2528 bytes --] * Szabolcs Nagy <nsz@port70.net> [2016-01-13 00:07:39 +0100]: > * Alexander Cherepanov <ch3root@openwall.com> [2016-01-13 00:09:56 +0300]: > > On 2016-01-13 00:02, Alexander Cherepanov wrote: > > >On 2016-01-05 19:46, Szabolcs Nagy wrote: > > >>i think compiler attributes should be used here on compilers that > > >>might break the code, but there is no attribute for this kind of > > >>oob access yet (although may_alias attribute is missing here too > > >>and should be added like in other string functions). > > > > > >Perhaps the noclone function attribute could be used in the meantime? > > > > > >https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-g_t_0040code_007bnoclone_007d-function-attribute-3205 > > > > Probably together with the noinline attribute... > > > > Another attribute which looks relevant is no_sanitize_address. > > > > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-g_t_0040code_007bno_005fsanitize_005faddress_007d-function-attribute-3199 > > > > i think a no-lto attr should be used, maybe noinline > can achieve that. > i tried to do it with -fno-lto but it seems gcc-6 miscompiles musl with -flto anyway: lto incorrectly dead code eliminates _dlstart_c. (the libc entry point, _dlstart, is defined in toplevel inline asm in ldso/dlstart.c and it jumps to _dlstart_c) lto breaks symbol binding for environ, _environ, ___environ. (they should be weak, without that environ in a main binary has different address than in libc.so) libc.so built with -flto: $ readelf --dyn-syms -W libc.so |grep envi 22: 000000000028eb90 8 OBJECT GLOBAL DEFAULT 15 __environ 398: 000000000028eb90 8 OBJECT GLOBAL PROTECTED 15 ___environ 1034: 000000000028eb90 8 OBJECT GLOBAL PROTECTED 15 _environ 1107: 000000000028eb90 8 OBJECT GLOBAL DEFAULT 15 environ libc.so without -flto: $ readelf --dyn-syms -W libc.so |grep envi 22: 000000000028d2d8 8 OBJECT GLOBAL DEFAULT 15 __environ 398: 000000000028d2d8 8 OBJECT WEAK PROTECTED 15 ___environ 1034: 000000000028d2d8 8 OBJECT WEAK PROTECTED 15 _environ 1107: 000000000028d2d8 8 OBJECT WEAK DEFAULT 15 environ so i tried to -fno-lto to crt/*, dlstart.c and __environ.c and then libc seemed to build correctly, but during tests gcc lto1 ICE crashed. (i havent reported the bugs yet) given these issues i'm not convinced that lto build of libc is a good idea, but i attached a patch how the string issues might be worked around. [-- Attachment #2: lto.diff --] [-- Type: text/x-diff, Size: 1779 bytes --] diff --git a/Makefile b/Makefile index df20f94..3586697 100644 --- a/Makefile +++ b/Makefile @@ -113,6 +113,15 @@ NOSSP_SRCS = $(wildcard crt/*.c) \ src/ldso/dlstart.c src/ldso/dynlink.c $(NOSSP_SRCS:%.c=%.o) $(NOSSP_SRCS:%.c=%.lo): CFLAGS_ALL += $(CFLAGS_NOSSP) +# TODO: update the list when aliasing violations are fixed +NOLTO_SRCS = $(wildcard crt/*.c) \ + src/ldso/dlstart.c \ + src/string/stpcpy.c src/string/strlen.c \ + src/string/strchrnul.c src/string/memchr.c \ + src/string/memccpy.c str/string/strlcpy.c \ + src/string/strncpy.c +$(NOLTO_SRCS:%.c=%.o) $(NOLTO_SRCS:%.c=%.lo): CFLAGS_ALL += $(CFLAGS_NOLTO) + $(CRT_LIBS:lib/%=crt/%): CFLAGS_ALL += -DCRT # This incantation ensures that changes to any subarch asm files will diff --git a/configure b/configure index ee21771..4672ebb 100755 --- a/configure +++ b/configure @@ -113,6 +113,7 @@ CFLAGS_C99FSE= CFLAGS_AUTO= CFLAGS_MEMOPS= CFLAGS_NOSSP= +CFLAGS_NOLTO= CFLAGS_TRY= LDFLAGS_AUTO= LDFLAGS_TRY= @@ -344,6 +345,13 @@ tryflag CFLAGS_C99FSE -Wa,--noexecstack tryflag CFLAGS_NOSSP -fno-stack-protector # +# Check for options to disable LTO, which is needed for executable +# entry points and functions with aliasing violations. If not found, +# this is not an error; we assume the toolchain does not do LTO. +# +tryflag CFLAGS_NOLTO -fno-lto + +# # Check for options that may be needed to prevent the compiler from # generating self-referential versions of memcpy,, memmove, memcmp, # and memset. Really, we should add a check to determine if this @@ -660,6 +668,7 @@ CFLAGS_AUTO = $CFLAGS_AUTO CFLAGS_C99FSE = $CFLAGS_C99FSE CFLAGS_MEMOPS = $CFLAGS_MEMOPS CFLAGS_NOSSP = $CFLAGS_NOSSP +CFLAGS_NOLTO = $CFLAGS_NOLTO CPPFLAGS = $CPPFLAGS LDFLAGS = $LDFLAGS LDFLAGS_AUTO = $LDFLAGS_AUTO ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-13 17:30 ` Szabolcs Nagy @ 2016-01-14 12:49 ` Szabolcs Nagy 2016-01-14 22:51 ` Rich Felker 1 sibling, 0 replies; 16+ messages in thread From: Szabolcs Nagy @ 2016-01-14 12:49 UTC (permalink / raw) To: musl * Szabolcs Nagy <nsz@port70.net> [2016-01-13 18:30:50 +0100]: > > lto breaks symbol binding for environ, _environ, ___environ. > (they should be weak, without that environ in a main binary > has different address than in libc.so) > all weak symbol aliases are affected on gcc-5 and gcc-trunk, gcc-49 is fine (at least the 4.9.3 i tried). https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69271 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: string word-at-a-time and atomic.h FAQ on twitter 2016-01-13 17:30 ` Szabolcs Nagy 2016-01-14 12:49 ` Szabolcs Nagy @ 2016-01-14 22:51 ` Rich Felker 1 sibling, 0 replies; 16+ messages in thread From: Rich Felker @ 2016-01-14 22:51 UTC (permalink / raw) To: musl On Wed, Jan 13, 2016 at 06:30:50PM +0100, Szabolcs Nagy wrote: > i tried to do it with -fno-lto > > but it seems gcc-6 miscompiles musl with -flto anyway: > > lto incorrectly dead code eliminates _dlstart_c. > (the libc entry point, _dlstart, is defined in toplevel > inline asm in ldso/dlstart.c and it jumps to _dlstart_c) I think the files that receive "crt" treatment (-DCRT?) should be hard-coded as no-LTO. Even if it weren't for this issue, there are also potential issues with optimizations that might move code across relocations and initialization. At the very least there should be a full asm barrier between such init code and the rest of the program. > lto breaks symbol binding for environ, _environ, ___environ. > (they should be weak, without that environ in a main binary > has different address than in libc.so) > > libc.so built with -flto: > $ readelf --dyn-syms -W libc.so |grep envi > 22: 000000000028eb90 8 OBJECT GLOBAL DEFAULT 15 __environ > 398: 000000000028eb90 8 OBJECT GLOBAL PROTECTED 15 ___environ > 1034: 000000000028eb90 8 OBJECT GLOBAL PROTECTED 15 _environ > 1107: 000000000028eb90 8 OBJECT GLOBAL DEFAULT 15 environ > > libc.so without -flto: > $ readelf --dyn-syms -W libc.so |grep envi > 22: 000000000028d2d8 8 OBJECT GLOBAL DEFAULT 15 __environ > 398: 000000000028d2d8 8 OBJECT WEAK PROTECTED 15 ___environ > 1034: 000000000028d2d8 8 OBJECT WEAK PROTECTED 15 _environ > 1107: 000000000028d2d8 8 OBJECT WEAK DEFAULT 15 environ This is purely a gcc bug and indicates LTO is not really ready for use with libc, which means I don't see much sense in trying to support it yet. Of course that doesn't mean I'm disinterested in fixing actual C language usage errors (aliasing, etc.) that might break with LTO. Rather I'm disinterested in LTO-specific hacks for working around bugs in GCC's LTO. OTOH your patch is nice from the standpoint of being able to test LTO and find and report the bugs. Rich ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2016-01-14 22:51 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-01-05 16:46 string word-at-a-time and atomic.h FAQ on twitter Szabolcs Nagy 2016-01-05 17:50 ` Rich Felker 2016-01-05 23:39 ` Matthew Fernandez 2016-01-06 2:56 ` Szabolcs Nagy 2016-01-08 21:59 ` Alexander Cherepanov 2016-01-08 22:05 ` Rich Felker 2016-01-08 22:39 ` Alexander Cherepanov 2016-01-08 22:59 ` Rich Felker 2016-01-09 1:40 ` Szabolcs Nagy 2016-01-12 12:41 ` Alexander Cherepanov 2016-01-12 21:02 ` Alexander Cherepanov 2016-01-12 21:09 ` Alexander Cherepanov 2016-01-12 23:07 ` Szabolcs Nagy 2016-01-13 17:30 ` Szabolcs Nagy 2016-01-14 12:49 ` Szabolcs Nagy 2016-01-14 22:51 ` Rich Felker
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).