Word-sized reads access memory past the bound of objects

mailing list of musl libc

* Word-sized reads access memory past the bound of objects
@ 2013-04-30 15:11 Jonas Wagner
From: Jonas Wagner @ 2013-04-30 15:11 UTC
  To: musl

Hi,

I'm currently experimenting with MUSL and automated bug-finding tools. One
issue I'm facing is that the tool reports several errors in functions, such
as strlen, that perform word-sized accesses. strlen reads a word at a time,
then checks whether that word contains a zero byte. If the zero happens to
be in the first byte of the word, strlen thus reads three bytes past the
end of the string (on a machine with 4-byte words).
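
For illustration, here is a minimal sketch of the word-at-a-time technique
described above. It is loosely modeled on the approach used in musl's
src/string, but it is not musl's actual source; the names word_strlen,
word_t, ONES, HIGHS and HASZERO are hypothetical. The middle loop loads
whole aligned words, so it can read up to sizeof(size_t)-1 bytes past the
terminating NUL, which is exactly what a bounds-checking tool reports.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; a sketch of the technique only. */
#ifdef __GNUC__
/* may_alias lets char data be read through a word-sized type without
   tripping GCC's strict-aliasing assumptions. */
typedef size_t __attribute__((__may_alias__)) word_t;
#else
typedef size_t word_t;
#endif

#define ALIGN (sizeof(size_t))
#define ONES ((size_t)-1 / UCHAR_MAX)      /* 0x0101...01 */
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1)) /* 0x8080...80 */
/* Nonzero iff at least one byte of x is zero. */
#define HASZERO(x) (((x) - ONES) & ~(x) & HIGHS)

size_t word_strlen(const char *s)
{
    const char *a = s;
    const word_t *w;

    /* Byte by byte until s is word-aligned. */
    for (; (uintptr_t)s % ALIGN; s++)
        if (!*s) return (size_t)(s - a);

    /* Whole aligned words: the word holding the NUL may extend past
       the end of the object (the over-read the tool reports). */
    for (w = (const void *)s; !HASZERO(*w); w++)
        ;

    /* Find the exact NUL byte inside the final word. */
    for (s = (const void *)w; *s; s++)
        ;
    return (size_t)(s - a);
}
```

Called on a short heap-allocated string under a checker such as
AddressSanitizer, this would typically be flagged in the middle loop, even
though the extra bytes read lie in the same aligned word as the NUL.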

In principle, the tool is correct and MUSL does cause undefined behavior
here. In practice, I don't see how MUSL's behavior could cause any
damage...

My questions are:
- How prevalent is such code in MUSL?
- Would there be an easy way to find all these places and change them?
- Are there other types of "soft" undefined behavior that MUSL exploits?

I guess changing MUSL would lose a lot of performance... so maybe
I'll adapt the bug-finding tool instead...

Best,
Jonas

* Re: Word-sized reads access memory past the bound of objects
@ 2013-04-30 15:40 Rich Felker
From: Rich Felker @ 2013-04-30 15:40 UTC
  To: musl

On Tue, Apr 30, 2013 at 05:11:14PM +0200, Jonas Wagner wrote:
> Hi,
> 
> I'm currently experimenting with MUSL and automated bug-finding tools. One
> issue I'm facing is that the tool reports several errors in functions, such
> as strlen, that perform word-sized accesses. What happens is that strlen
> reads a word at a time, then checks whether there is a zero in there. If
> the zero happens to be in the first byte, it thus reads three bytes past
> the end of the string.
> 
> In principle, the tool is correct and MUSL does cause undefined behavior

Yes and no. The "underlying freestanding implementation" that musl
assumes and is built on represents all of mapped memory as arrays, in
page-size units, with mapping properties/permissions at page
granularity. However, testing and analysis tools might offer a more
restrictive underlying model.

> here. In practice, I don't see how MUSL's behavior could cause any
> damage...

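Read-only accesses that are aligned to the size of the access, and
whose initial byte is accessible, can never fault under the assumed
memory model: since the page size is a multiple of the access size, an
aligned access never crosses a page boundary, so every byte it touches
lies on the same page as the initial byte.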
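
A small sketch of why such an aligned read cannot fault, assuming a
power-of-two page size that is a multiple of the access size. The value
4096 in the usage below is chosen purely for illustration (a real program
would query it, e.g. with sysconf(_SC_PAGESIZE)), and the helper names
crosses_page and is_aligned are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: does the access [addr, addr + size) touch more
   than one page? Requires size >= 1. */
static bool crosses_page(uintptr_t addr, size_t size, size_t pagesize)
{
    return addr / pagesize != (addr + size - 1) / pagesize;
}

/* Hypothetical helper: is addr a multiple of the access size? */
static bool is_aligned(uintptr_t addr, size_t size)
{
    return addr % size == 0;
}
```

With 8-byte accesses and 4096-byte pages, every address for which
is_aligned(addr, 8) holds gives crosses_page(addr, 8, 4096) == false,
whereas an unaligned load at 4092 would span two pages.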

> My questions are:
> - How prevalent is such code in MUSL?

Not very. Probably src/string and src/multibyte are the only places.

> - Would there be an easy way to find all these places and change them?

The tool you're using is probably the best way. Alternatively, any
static analysis that can detect conversions (even indirect ones) from
character pointer types to pointers to non-character types.

> - Are there other types of "soft" undefined behavior that MUSL exploits?

I don't think so. The closest things I can think of:

- UTF-8 code depends on right shifts of negative signed values being
  sign-extending. This could easily be fixed if it can be verified
  that the standard trick to work around it generates the same (or
  equally efficient) code. Note that this is implementation-defined,
  not undefined, behavior.

- Floating point conversion to/from strings depends on IEEE arithmetic
  properties and on long double being an IEEE conforming type. (x87
  ld80 is fine, and so is IEEE quad, but IBM double-double will not
  work; systems that typically use IBM double-double should instead
  have their compiler configured for 64-bit long double.)

- calloc assumes its own implementation of malloc. Compilers and
  analysis tools that assume accesses at negative offsets from the
  pointer returned by malloc are invalid will falsely detect problems
  and/or miscompile calloc.c. This issue affected old versions of clang.

- The dynamic linker also makes some assumptions about the
  implementation of malloc and passes pointers not obtained by malloc
  to free, as part of its mechanism to reclaim wasted slack space in
  shared libraries due to page alignment.

- POSIX timers with SIGEV_THREAD perform a longjmp out of a
  cancellation handler to intercept cancellation/exit so the same
  physical thread can be kept to handle the next timer expiration. For
  an application to do this would be UB (at the POSIX level, not the C
  level) but since they're both part of the same implementation they
  can assume things about each other.
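
For reference, one widely used portable idiom for a sign-extending right
shift is sketched below; it is an assumption that this is the "standard
trick" the first item refers to. Right-shifting a negative signed value is
implementation-defined in C, whereas this expression only ever shifts
non-negative values. Optimizing compilers can often reduce it to a single
arithmetic shift, though that would need checking per compiler and
target. The name asr is hypothetical.

```c
/* Hypothetical helper: sign-extending right shift of x by n
   (0 <= n < width of int) that never right-shifts a negative value. */
static int asr(int x, int n)
{
    /* For x < 0, ~x is non-negative, so ~x >> n is well defined;
       complementing the result restores the sign bits. */
    return x < 0 ? ~(~x >> n) : x >> n;
}
```

Like an arithmetic shift, this rounds toward negative infinity:
asr(-7, 1) is -4 and asr(-1, 31) is -1.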

That's all that comes to mind right now. Thanks for bringing up this
question, because it's something that should be documented in case
people want to reuse parts of musl in contexts where some of the
assumptions may no longer be valid.
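
As one illustration of recording such an assumption for someone reusing
the code, a hypothetical guard could reject the long double formats that
the floating-point item says will not work. It relies only on
LDBL_MANT_DIG from <float.h>, which is 53 where long double is the same as
double, 64 for x87 ld80, 113 for IEEE quad, and 106 for IBM double-double.
The function name is hypothetical, and mantissa width alone is only a
heuristic for identifying the format.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical check: is long double one of the formats the
   string<->float conversion code is described as supporting? */
static bool long_double_format_supported(void)
{
#if LDBL_MANT_DIG == 53 || LDBL_MANT_DIG == 64 || LDBL_MANT_DIG == 113
    return true;   /* 64-bit (double), x87 ld80, or IEEE quad */
#else
    return false;  /* e.g. IBM double-double, where LDBL_MANT_DIG == 106 */
#endif
}
```

The same preprocessor condition could instead be turned into an #error,
so that an unsupported configuration fails at build time rather than at
run time.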

> I guess changing MUSL would lose a lot of performance... so maybe
> I'll adapt the bug-finding tool instead...

Maybe. With a compiler that can vectorize and a machine with vector
instructions, the "naive" versions of these functions can be just as
fast in practice, and perhaps even faster in theory. The big problem is
that gcc won't combine four single-byte accesses into one 32-bit access
in an ordinary 32-bit register, even though it could... Maybe in the
long term this won't matter if we have asm for the important archs
without vector ops...?
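
For comparison, a "naive" byte-at-a-time strlen is sketched below. It
never reads past the terminating NUL, so bounds-checking tools have
nothing to report; whether a given compiler vectorizes it, and how it
compares in speed with the word-at-a-time version, is not something this
sketch demonstrates. The name naive_strlen is hypothetical.

```c
#include <stddef.h>

/* Hypothetical name: byte-at-a-time strlen that only reads bytes up to
   and including the terminating NUL. */
size_t naive_strlen(const char *s)
{
    const char *p = s;
    while (*p)
        p++;
    return (size_t)(p - s);
}
```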

Rich

* Re: Word-sized reads access memory past the bound of objects
@ 2013-04-30 16:22 Jonas Wagner
From: Jonas Wagner @ 2013-04-30 16:22 UTC
  To: musl

Rich, thank you very much for the answer. It was very quick and helpful.

> - How prevalent is such code in MUSL?
>
> Not very. Probably src/string and src/multibyte are the only places.
>

In that case, I will probably adapt the copy of MUSL I'm using. I plan to
test it with several different program analyzers, and the issue would
probably come up multiple times. As you say, these program analyzers often
assume quite restrictive models.

Thank you also for the list of other places where MUSL has specific
assumptions. I will look into them.

Best regards,
Jonas

* Re: Word-sized reads access memory past the bound of objects
@ 2013-04-30 16:34 Szabolcs Nagy
From: Szabolcs Nagy @ 2013-04-30 16:34 UTC
  To: musl

* Rich Felker <dalias@aerifal.cx> [2013-04-30 11:40:45 -0400]:
> On Tue, Apr 30, 2013 at 05:11:14PM +0200, Jonas Wagner wrote:
> > - Are there other types of "soft" undefined behavior that MUSL exploits?
> 
> I don't think so. The closest things I can think of:
> 
> - UTF-8 code depends on sign-extending right-shift. This could be
>   easily fixed if it can be verified that the standard trick to work
>   around it generates the same (or equally efficient) code. Note this
>   is implementation-defined, not undefined.
> 

There are similar cases in src/math (implementation-defined signed
integer arithmetic), although those are not intentional and are planned
to be cleaned up (they came from FreeBSD's fdlibm).
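
As an illustration of the kind of implementation-defined signed integer
arithmetic meant here (an assumption; the message does not name specific
functions): fdlibm-derived code often stores a 32-bit word of a double's
representation into a signed int32_t, and converting an unsigned value
above INT32_MAX to int32_t is implementation-defined. A portable
replacement, with the hypothetical name u32_to_i32, might look like this:

```c
#include <stdint.h>

/* Hypothetical helper: convert uint32_t to int32_t with two's-complement
   wraparound, without relying on the implementation-defined conversion
   of out-of-range values. */
static int32_t u32_to_i32(uint32_t u)
{
    if (u <= INT32_MAX)
        return (int32_t)u;  /* value fits: the conversion is exact */
    /* u - 2^31 fits in int32_t; subtracting another 2^31 in signed
       arithmetic yields u - 2^32 without overflow. */
    return (int32_t)(u - 0x80000000u) - INT32_MAX - 1;
}
```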

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
