mailing list of musl libc
From: Szabolcs Nagy <nsz@port70.net>
To: musl@lists.openwall.com
Subject: string word-at-a-time and atomic.h FAQ on twitter
Date: Tue, 5 Jan 2016 17:46:41 +0100
Message-ID: <20160105164640.GL23362@port70.net>

https://twitter.com/johnregehr/status/684126374966198281

these are old faq items, but since we don't have docs about
internals i'll try to address them based on my understanding:

1) musl strlen oob access:

the os manages memory with page granularity. this is an internal
detail that user code should not rely on in c, but the libc can
(otherwise malloc/free and the dynamic linker could not be
implemented in c). an aligned word that contains at least one
byte of the string lies entirely within a mapped page, so oob
accesses in word-at-a-time string algorithms do not cause a
segfault on the os level.
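
the page argument can be checked with a little arithmetic. the
following is only a sketch: SKETCH_PAGE_SIZE and the sketch_*
helpers are made-up names, and 4096 is an assumed page size
(the real value is target specific). it relies on the word size
dividing the page size, so an aligned word never straddles a
page boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* assumed page size, for illustration only */
#define SKETCH_PAGE_SIZE 4096u

/* round an address down to the aligned word that contains it */
static uintptr_t sketch_word_base(uintptr_t addr)
{
	return addr - addr % sizeof(size_t);
}

/* 1 if the aligned word containing addr starts and ends in the
 * same page; addr lies between those two bytes, so it is in
 * that page too */
static int sketch_word_in_same_page(uintptr_t addr)
{
	uintptr_t first = sketch_word_base(addr);
	uintptr_t last = first + sizeof(size_t) - 1;
	return first / SKETCH_PAGE_SIZE == last / SKETCH_PAGE_SIZE;
}
```

so if the byte at addr is readable, every byte of the aligned
word around it is in the same page and is readable as well.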

in theory the compiler is part of the implementation so it can
treat libc code specially, but in practice libc code is normal
freestanding c code. this means that the compiler can treat oob
access arbitrarily following the abstract c semantics if it can
see through the implementation with lto.  however i find lto a
weak excuse to rewrite strlen in asm for all targets since lto
of libc is still not practical and asm implementations
historically had a lot of target specific bugs in other libcs.
i think compiler attributes should be used here on compilers that
might break the code, but there is no attribute for this kind of
oob access yet (although the may_alias attribute is missing here
too and should be added, as in other string functions).

this takes care of oob access, but the bytes outside the passed
object might change concurrently i.e. strlen might introduce a
data race: again this is a problem on the abstract c language
level that may be solved e.g. by making all accesses to those
bytes relaxed atomic, but user code is not under libc control.
in practice the code works if HASZERO reads the word once so it
does arithmetic with a consistent value (because the memory
model of the underlying machine does not treat such a race as
undefined, does not propagate unspecified value bits and has
no trap representations).
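
to make the discussion concrete, here is a minimal sketch of a
word-at-a-time strlen in the spirit described above. it is not
claimed to be the code in the musl tree: the sketch_* names are
made up, the may_alias typedef assumes a gcc compatible
compiler, and the word is copied into a local so the zero test
is evaluated on one value at the source level (the c standard
still does not bless the oob read or the race).

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define ONES  ((size_t)-1 / UCHAR_MAX)      /* 0x0101...01 */
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))  /* 0x8080...80 */

#ifdef __GNUC__
/* assumption: gcc style attribute to permit the aliasing load */
typedef size_t __attribute__((__may_alias__)) sketch_word;
#else
typedef size_t sketch_word;
#endif

/* nonzero iff some byte of x is zero; x is a by-value copy */
static int sketch_haszero(size_t x)
{
	return ((x - ONES) & ~x & HIGHS) != 0;
}

static size_t sketch_strlen(const char *s)
{
	const char *a = s;
	const sketch_word *w;

	/* byte loop until s is word aligned */
	for (; (uintptr_t)s % sizeof(size_t); s++)
		if (!*s) return s - a;
	/* aligned word loop: may read bytes past the terminator */
	for (w = (const void *)s; ; w++) {
		size_t x = *w;  /* one source level read of the word */
		if (sketch_haszero(x)) break;
	}
	/* locate the exact zero byte inside the final word */
	for (s = (const void *)w; *s; s++);
	return s - a;
}
```

the only oob bytes ever touched are in the last aligned word,
which is why the page granularity argument is sufficient on the
os level.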

we do not try to enforce these behaviours on the c level yet
(only a very narrow set of string functions are affected which
are also very performance critical), but fortunately those who
are worried that the code is not correct can always generate asm
and compile that into the libc. (and then one can verify that
indeed the generated code is completely correct on the asm level.
maybe musl will add generated asm to the repo, but there is other
pending cleanup work related to asm vs c level semantics and
these should be considered together.)


2) musl atomic.h sync primitives

the primitives in atomic.h are carefully designed for musl's
pthread implementation (which seems to me far ahead of other
implementations in terms of correctness and portability).

however they are not documented in the code (only in the git
log) so people assume they understand their precise interface
contract by guessing (which is usually wrong because the names
are misleading).

musl does not use 64-bit atomic primitives: a_and_64 and a_or_64
have specific uses in the malloc implementation which determine
their semantics.
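
as an illustration of how a name can promise more than the
contract, here is a hypothetical sketch (c11 atomics, made-up
sketch_* names, not taken from musl's atomic.h) of a 64-bit
and/or built from two 32-bit atomic operations. it assumes the
caller only needs each bit change to be atomic, e.g. for a
bitmask, and does not need the whole 64-bit value to change in
one step.

```c
#include <stdatomic.h>
#include <stdint.h>

/* hypothetical 64-bit mask kept as two independent atomic halves */
struct sketch_mask64 { _Atomic uint32_t half[2]; };

/* set bits: each half is updated atomically, the pair is not */
static void sketch_or_64(struct sketch_mask64 *m, uint64_t v)
{
	uint32_t lo = (uint32_t)v, hi = (uint32_t)(v >> 32);
	if (lo) atomic_fetch_or(&m->half[0], lo);
	if (hi) atomic_fetch_or(&m->half[1], hi);
}

/* clear bits: skip a half when its mask would change nothing */
static void sketch_and_64(struct sketch_mask64 *m, uint64_t v)
{
	uint32_t lo = (uint32_t)v, hi = (uint32_t)(v >> 32);
	if (lo != UINT32_MAX) atomic_fetch_and(&m->half[0], lo);
	if (hi != UINT32_MAX) atomic_fetch_and(&m->half[1], hi);
}

/* the halves are read separately: not a 64-bit snapshot */
static uint64_t sketch_load_64(struct sketch_mask64 *m)
{
	return atomic_load(&m->half[0])
	     | (uint64_t)atomic_load(&m->half[1]) << 32;
}
```

a reader guessing from the name alone would expect a full
64-bit atomic rmw here, which this sketch does not provide.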


3) a_crash

formally a_crash can be anything (it is only called if the user
invoked ub or the underlying system broke the interface contract).

in practice a_crash should be __builtin_trap (i.e. the most
lightweight way of terminating the process, which matters for
security, though that is of course not c level semantics).
however builtin usage is minimized in musl, which makes it
possible to compile it with several c compilers with consistent
behaviour (e.g. gcc does not guarantee consistent behaviour for
__builtin_trap across targets and falls back to abort if a
target does not have the appropriate target hook defined).
keeping the interface between the compiler and libc minimal is
a key design choice in musl.
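
a minimal sketch of what a builtin-free crash primitive could
look like, assuming gcc style inline asm on x86 and a volatile
null store as the portable fallback. sketch_crash and
sketch_checked_index are made-up names for illustration, and
the fallback is formally ub in c, so it only works as a crash
because of how the underlying system behaves.

```c
#include <stddef.h>

#if defined(__i386__) || defined(__x86_64__)
/* single privileged instruction: faults when run in user mode */
static inline void sketch_crash(void)
{
	__asm__ __volatile__("hlt" : : : "memory");
}
#else
/* fallback: store through a null pointer; not a single
 * instruction and not guaranteed by the c standard */
static inline void sketch_crash(void)
{
	*(volatile char *)0 = 0;
}
#endif

/* hypothetical caller: a failed check here means an invariant
 * is already broken, so the process is terminated on the spot */
static size_t sketch_checked_index(size_t i, size_t n)
{
	if (i >= n) sketch_crash();
	return i;
}
```

a call such as sketch_checked_index(5, 4) is expected to kill
the process with a fatal signal instead of returning.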

at some point this should be cleaned up and all targets should
have a proper single-instruction crash, but that's low priority
cleanup work so on some targets this is not yet done (there is
other pending atomic.h cleanup work).


