mailing list of musl libc
 help / color / mirror / code / Atom feed
* [PATCH] slim down and avoid undefined behavior in unsetenv
@ 2016-02-20 18:09 Alexander Monakov
  2016-02-20 18:18 ` Alexander Monakov
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Alexander Monakov @ 2016-02-20 18:09 UTC (permalink / raw)
  To: musl

Implementation of unsetenv used to invoke (a benign) undefined
behavior by using pointer value that was passed to free() in
comparisons against NULL in two loops that would remove that pointer
from __env_map and __environ arrays.

Factor out handling of __env_map into a separate function __env_free
and move it to putenv.c, allowing to make __env_map private to that
file. Do not attempt to preserve order in __env_map when removing the
entry (it is not important).
---
On Mon, 4 Jan 2016, Rich Felker wrote:
> Oh. In that case I guess it's unnecessary to rewind, yes.
> 
> BTW what might be best to do is something like:
> 
> 	char *tmp = __environ[i];
> 	for (j=i ; __environ[j]; j++)
> 		__environ[j] = __environ[j+1];
> 	__env_free(tmp);
> 
> where __env_free has a weak def as a nop and gets its strong def from
> setenv.c or putenv.c. This refactoring would make it possible for
> unsetenv not to pull in free, and the reordering might make it less
> likely for dangerous things to happen in a broken program that reads
> the environment concurrently with unsetenv.

Thanks -- here's a patch.

 src/env/putenv.c   | 15 ++++++++++++++-
 src/env/unsetenv.c | 29 +++++++++++++----------------
 2 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/src/env/putenv.c b/src/env/putenv.c
index 4042869..86b2024 100644
--- a/src/env/putenv.c
+++ b/src/env/putenv.c
@@ -2,7 +2,20 @@
 #include <string.h>
 
 extern char **__environ;
-char **__env_map;
+static char **__env_map;
+
+void __env_free(char *p)
+{
+	if (__env_map)
+		for (char **e = __env_map; *e; e++)
+			if (*e == p) {
+				char **ee = e;
+				while (*(ee+1)) ee++;
+				*e = *ee; *ee = 0;
+				free(p);
+				return;
+			}
+}
 
 int __putenv(char *s, int a)
 {
diff --git a/src/env/unsetenv.c b/src/env/unsetenv.c
index 3569335..f0f369f 100644
--- a/src/env/unsetenv.c
+++ b/src/env/unsetenv.c
@@ -1,31 +1,28 @@
 #include <stdlib.h>
 #include <string.h>
 #include <errno.h>
+#include "libc.h"
 
 extern char **__environ;
-extern char **__env_map;
+
+static void dummy(char *p) {}
+weak_alias(dummy, __env_free);
 
 int unsetenv(const char *name)
 {
-	int i, j;
 	size_t l = strlen(name);
 
-	if (!*name || strchr(name, '=')) {
+	if (!*name || memchr(name, '=', l)) {
 		errno = EINVAL;
 		return -1;
 	}
-again:
-	for (i=0; __environ[i] && (memcmp(name, __environ[i], l) || __environ[i][l] != '='); i++);
-	if (__environ[i]) {
-		if (__env_map) {
-			for (j=0; __env_map[j] && __env_map[j] != __environ[i]; j++);
-			free (__env_map[j]);
-			for (; __env_map[j]; j++)
-				__env_map[j] = __env_map[j+1];
-		}
-		for (; __environ[i]; i++)
-			__environ[i] = __environ[i+1];
-		goto again;
-	}
+	for (char **e = __environ; *e; )
+		if (!memcmp(name, *e, l) && l[*e] == '=') {
+			char **ee = e, *tmp = *e;
+			do *ee = *(ee+1);
+			while (*++ee);
+			__env_free(tmp);
+		} else
+			e++;
 	return 0;
 }


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] slim down and avoid undefined behavior in unsetenv
  2016-02-20 18:09 [PATCH] slim down and avoid undefined behavior in unsetenv Alexander Monakov
@ 2016-02-20 18:18 ` Alexander Monakov
  2016-02-21  9:51 ` Alexander Monakov
  2016-03-05  5:18 ` Rich Felker
  2 siblings, 0 replies; 8+ messages in thread
From: Alexander Monakov @ 2016-02-20 18:18 UTC (permalink / raw)
  To: musl

On Sat, 20 Feb 2016, Alexander Monakov wrote:
> diff --git a/src/env/unsetenv.c b/src/env/unsetenv.c
> index 3569335..f0f369f 100644
> --- a/src/env/unsetenv.c
> +++ b/src/env/unsetenv.c
> @@ -1,31 +1,28 @@
>  #include <stdlib.h>
   ^^^^^^^^^^^^^^^^^^^

This was used for the declaration of free() and can be removed with that patch.

Alexander


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] slim down and avoid undefined behavior in unsetenv
  2016-02-20 18:09 [PATCH] slim down and avoid undefined behavior in unsetenv Alexander Monakov
  2016-02-20 18:18 ` Alexander Monakov
@ 2016-02-21  9:51 ` Alexander Monakov
  2016-03-05 16:30   ` Rich Felker
  2016-03-05  5:18 ` Rich Felker
  2 siblings, 1 reply; 8+ messages in thread
From: Alexander Monakov @ 2016-02-21  9:51 UTC (permalink / raw)
  To: musl

... aaand the OCD kicks in :)

On Sat, 20 Feb 2016, Alexander Monakov wrote:
> diff --git a/src/env/unsetenv.c b/src/env/unsetenv.c
> index 3569335..f0f369f 100644
> --- a/src/env/unsetenv.c
> +++ b/src/env/unsetenv.c
[snip]
>  	size_t l = strlen(name);
>  
> -	if (!*name || strchr(name, '=')) {
> +	if (!*name || memchr(name, '=', l)) {

Here I could have changed '!*name' to '!l' for a small cleanup as well.

>  		errno = EINVAL;
>  		return -1;
>  	}

Some places in musl tail-call to __syscall_ret(-Exxx) to set errno. I wonder
if it's accidental or there's a guideline for using one style or the other?
The only place I imagine the tailcall style might be undesired is sem_trywait,
where returning failure is not expected to be rare.

What do you think about a change that introduces __set_errno that accepts
positive errno and returns -1L? With that change __syscall_ret can become

    return r < -4095UL ? r : __set_errno(-r);

> -again:
[snip]
> +	for (char **e = __environ; *e; )
> +		if (!memcmp(name, *e, l) && l[*e] == '=') {

Here the usage of memcmp requires that it scans buffers left-to-right and
stops on first mismatch. As I understand the standards do not guarantee that,
but musl's current implementation does, and is not interposable. Still, a
gotcha.

Alexander


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] slim down and avoid undefined behavior in unsetenv
  2016-02-20 18:09 [PATCH] slim down and avoid undefined behavior in unsetenv Alexander Monakov
  2016-02-20 18:18 ` Alexander Monakov
  2016-02-21  9:51 ` Alexander Monakov
@ 2016-03-05  5:18 ` Rich Felker
  2016-03-05  6:01   ` Alexander Monakov
  2 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2016-03-05  5:18 UTC (permalink / raw)
  To: musl

On Sat, Feb 20, 2016 at 09:09:57PM +0300, Alexander Monakov wrote:
> Implementation of unsetenv used to invoke (a benign) undefined
> behavior by using pointer value that was passed to free() in
> comparisons against NULL in two loops that would remove that pointer
> from __env_map and __environ arrays.
> 
> Factor out handling of __env_map into a separate function __env_free
> and move it to putenv.c, allowing to make __env_map private to that
> file. Do not attempt to preserve order in __env_map when removing the
> entry (it is not important).
> ---
> On Mon, 4 Jan 2016, Rich Felker wrote:
> > Oh. In that case I guess it's unnecessary to rewind, yes.
> > 
> > BTW what might be best to do is something like:
> > 
> > 	char *tmp = __environ[i];
> > 	for (j=i ; __environ[j]; j++)
> > 		__environ[j] = __environ[j+1];
> > 	__env_free(tmp);
> > 
> > where __env_free has a weak def as a nop and gets its strong def from
> > setenv.c or putenv.c. This refactoring would make it possible for
> > unsetenv not to pull in free, and the reordering might make it less
> > likely for dangerous things to happen in a broken program that reads
> > the environment concurrently with unsetenv.
> 
> Thanks -- here's a patch.
> 
>  src/env/putenv.c   | 15 ++++++++++++++-
>  src/env/unsetenv.c | 29 +++++++++++++----------------
>  2 files changed, 27 insertions(+), 17 deletions(-)
> 
> diff --git a/src/env/putenv.c b/src/env/putenv.c
> index 4042869..86b2024 100644
> --- a/src/env/putenv.c
> +++ b/src/env/putenv.c
> @@ -2,7 +2,20 @@
>  #include <string.h>
>  
>  extern char **__environ;
> -char **__env_map;
> +static char **__env_map;
> +
> +void __env_free(char *p)
> +{
> +	if (__env_map)

Perhaps if (!__env_map) return; to avoid gratuitous indention of the
whole rest of the function?

> +		for (char **e = __env_map; *e; e++)
> +			if (*e == p) {
> +				char **ee = e;
> +				while (*(ee+1)) ee++;
> +				*e = *ee; *ee = 0;
> +				free(p);
> +				return;
> +			}
> +}

Aside from that, I really like this, especially making __env_map
private. But perhaps we should rename it not to use __ prefix now that
it's static?

>  int __putenv(char *s, int a)
>  {
> diff --git a/src/env/unsetenv.c b/src/env/unsetenv.c
> index 3569335..f0f369f 100644
> --- a/src/env/unsetenv.c
> +++ b/src/env/unsetenv.c
> @@ -1,31 +1,28 @@
>  #include <stdlib.h>
>  #include <string.h>
>  #include <errno.h>
> +#include "libc.h"
>  
>  extern char **__environ;
> -extern char **__env_map;
> +
> +static void dummy(char *p) {}
> +weak_alias(dummy, __env_free);

This makes it so unsetenv no longer requires full malloc, I think,
right? Nice.

>  int unsetenv(const char *name)
>  {
> -	int i, j;
>  	size_t l = strlen(name);
>  
> -	if (!*name || strchr(name, '=')) {
> +	if (!*name || memchr(name, '=', l)) {
>  		errno = EINVAL;
>  		return -1;
>  	}
> -again:
> -	for (i=0; __environ[i] && (memcmp(name, __environ[i], l) || __environ[i][l] != '='); i++);
> -	if (__environ[i]) {
> -		if (__env_map) {
> -			for (j=0; __env_map[j] && __env_map[j] != __environ[i]; j++);
> -			free (__env_map[j]);
> -			for (; __env_map[j]; j++)
> -				__env_map[j] = __env_map[j+1];
> -		}
> -		for (; __environ[i]; i++)
> -			__environ[i] = __environ[i+1];
> -		goto again;
> -	}
> +	for (char **e = __environ; *e; )
> +		if (!memcmp(name, *e, l) && l[*e] == '=') {
> +			char **ee = e, *tmp = *e;
> +			do *ee = *(ee+1);
> +			while (*++ee);

We could use memmove here but I'm not sure if it's nicer or not.

> +			__env_free(tmp);
> +		} else
> +			e++;

As a matter of style, if the 'if' body is a block I generally try to
do the same for the else.

Also we're not using clause-1 declarations in for statements elsewhere
in musl afaik, but I'm not opposed to adopting their use where it
makes sense.

I think the loop logic might be clearer with indices instead of
pointers, but I'm not sure. Is there a reason you preferred switching
to pointers? One nice (I think; others may disagree) aspect of indices
is that instead of the if/else we could just have an unconditional i++
in the for's expression-3 and an i-- inside the if block.

These are all minor comments, and the patch looks like it could be
okay as-is. I didn't see any bugs. Have you done any testing?

Rich


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] slim down and avoid undefined behavior in unsetenv
  2016-03-05  5:18 ` Rich Felker
@ 2016-03-05  6:01   ` Alexander Monakov
  2016-03-05  6:14     ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Alexander Monakov @ 2016-03-05  6:01 UTC (permalink / raw)
  To: musl

On Sat, 5 Mar 2016, Rich Felker wrote:
> > -char **__env_map;
> > +static char **__env_map;
> > +
> > +void __env_free(char *p)
> > +{
> > +	if (__env_map)
> 
> Perhaps if (!__env_map) return; to avoid gratuitous indention of the
> whole rest of the function?

I don't mind; I did consider this point and went with this style because
indent increase is not too bad and it's a bit easier to see that it just
guards the for loop. But I can change it when resubmitting the patch.

> Aside from that, I really like this, especially making __env_map
> private. But perhaps we should rename it not to use __ prefix now that
> it's static?

Indeed, I haven't noticed that. Personally I'd prefer to make the rename a
separate patch, though.

> >  extern char **__environ;
> > -extern char **__env_map;
> > +
> > +static void dummy(char *p) {}
> > +weak_alias(dummy, __env_free);
> 
> This makes it so unsetenv no longer requires full malloc, I think,
> right? Nice.

That was your idea from the previous discussion :)

> > +	for (char **e = __environ; *e; )
> > +		if (!memcmp(name, *e, l) && l[*e] == '=') {
> > +			char **ee = e, *tmp = *e;
> > +			do *ee = *(ee+1);
> > +			while (*++ee);
> 
> We could use memmove here but I'm not sure if it's nicer or not.

I guess not, without additional code tracking current size?..

> > +			__env_free(tmp);
> > +		} else
> > +			e++;
> 
> As a matter of style, if the 'if' body is a block I generally try to
> do the same for the else.

In that case I'd like to make that change while swapping the if/else branches
around.

> Also we're not using clause-1 declarations in for statements elsewhere
> in musl afaik, but I'm not opposed to adopting their use where it
> makes sense.

There were a few instances of 'for (int i=0; ...)' already so I felt I have a
license to do this :)

> I think the loop logic might be clearer with indices instead of
> pointers, but I'm not sure. Is there a reason you preferred switching
> to pointers?

Well, the whole thing started with removing benign undefined behavior, so I
felt it's in line to remove int-indexing (not ssize_t) on an unbounded array.
Apart from that, I like more how the 'for' statement reads with this change.

> One nice (I think; others may disagree) aspect of indices
> is that instead of the if/else we could just have an unconditional i++
> in the for's expression-3 and an i-- inside the if block.

Yeah, that's a bit of a loss, but I hope it's alright and not too obfuscated.

> These are all minor comments, and the patch looks like it could be
> okay as-is. I didn't see any bugs. Have you done any testing?

Nope, sorry; I'm not dogfooding musl.

Alexander


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] slim down and avoid undefined behavior in unsetenv
  2016-03-05  6:01   ` Alexander Monakov
@ 2016-03-05  6:14     ` Rich Felker
  0 siblings, 0 replies; 8+ messages in thread
From: Rich Felker @ 2016-03-05  6:14 UTC (permalink / raw)
  To: musl

On Sat, Mar 05, 2016 at 09:01:54AM +0300, Alexander Monakov wrote:
> On Sat, 5 Mar 2016, Rich Felker wrote:
> > > -char **__env_map;
> > > +static char **__env_map;
> > > +
> > > +void __env_free(char *p)
> > > +{
> > > +	if (__env_map)
> > 
> > Perhaps if (!__env_map) return; to avoid gratuitous indention of the
> > whole rest of the function?
> 
> I don't mind; I did consider this point and went with this style because
> indent increase is not too bad and it's a bit easier to see that it just
> guards the for loop. But I can change it when resubmitting the patch.

OK.

> > Aside from that, I really like this, especially making __env_map
> > private. But perhaps we should rename it not to use __ prefix now that
> > it's static?
> 
> Indeed, I haven't noticed that. Personally I'd prefer to make the rename a
> separate patch, though.

Either way is fine with me.

> > >  extern char **__environ;
> > > -extern char **__env_map;
> > > +
> > > +static void dummy(char *p) {}
> > > +weak_alias(dummy, __env_free);
> > 
> > This makes it so unsetenv no longer requires full malloc, I think,
> > right? Nice.
> 
> That was your idea from the previous discussion :)

Ah. :)

> > > +	for (char **e = __environ; *e; )
> > > +		if (!memcmp(name, *e, l) && l[*e] == '=') {
> > > +			char **ee = e, *tmp = *e;
> > > +			do *ee = *(ee+1);
> > > +			while (*++ee);
> > 
> > We could use memmove here but I'm not sure if it's nicer or not.
> 
> I guess not, without additional code tracking current size?..

You're right; I missed that. Perhaps tracking the current size would
be nice, but I think it would have complexity cost we might not like.

On the other hand tracking both the number of slots currently use and
the allocated size of __env_map might be a good idea to avoid
pathological realloc behavior. But I think this should be done
separately if at all.

> 
> > > +			__env_free(tmp);
> > > +		} else
> > > +			e++;
> > 
> > As a matter of style, if the 'if' body is a block I generally try to
> > do the same for the else.
> 
> In that case I'd like to make that change while swapping the if/else branches
> around.

OK.

> > Also we're not using clause-1 declarations in for statements elsewhere
> > in musl afaik, but I'm not opposed to adopting their use where it
> > makes sense.
> 
> There were a few instances of 'for (int i=0; ...)' already so I felt I have a
> license to do this :)

Oh, I didn't remember there being any. OK.

> > I think the loop logic might be clearer with indices instead of
> > pointers, but I'm not sure. Is there a reason you preferred switching
> > to pointers?
> 
> Well, the whole thing started with removing benign undefined behavior, so I
> felt it's in line to remove int-indexing (not ssize_t) on an unbounded array.
> Apart from that, I like more how the 'for' statement reads with this change.

Uhg, I missed that the types were wrong too. For the initial
environment it's safe in practice to assume the indices fit in int, I
think, but there's no reason it couldn't grow beyond that size via
setenv/putenv.

> > One nice (I think; others may disagree) aspect of indices
> > is that instead of the if/else we could just have an unconditional i++
> > in the for's expression-3 and an i-- inside the if block.
> 
> Yeah, that's a bit of a loss, but I hope it's alright and not too obfuscated.

Yeah, I think it's okay.

> > These are all minor comments, and the patch looks like it could be
> > okay as-is. I didn't see any bugs. Have you done any testing?
> 
> Nope, sorry; I'm not dogfooding musl.

Then we should probably come up with a few good sanity-check and
corner-case tests for the env functions to add to libc-test. The
changes are probably all fine but in the 1.1.13 release cycle we had a
bunch of stupid regressions in stuff that looked trivial so I'd like
to get in a habit of adding testing when making changes in places that
could cause widespready breakage, especially when the changes are not
fixing presently-observable bugs.

Rich


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] slim down and avoid undefined behavior in unsetenv
  2016-02-21  9:51 ` Alexander Monakov
@ 2016-03-05 16:30   ` Rich Felker
  2016-03-05 17:24     ` Alexander Monakov
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2016-03-05 16:30 UTC (permalink / raw)
  To: musl

On Sun, Feb 21, 2016 at 12:51:30PM +0300, Alexander Monakov wrote:
> >  		errno = EINVAL;
> >  		return -1;
> >  	}
> 
> Some places in musl tail-call to __syscall_ret(-Exxx) to set errno. I wonder
> if it's accidental or there's a guideline for using one style or the other?

A large portion of the source files in musl are written purely with
public APIs. This is especially desirable for legacy functions, where
there's no reason to optimize them and where being maintenance-free is
the most valuable property, but it's also nice for other files from
the standpoints of maintenance, ease of reading, and ease of reuse in
other software (as drop-ins for systems lacking working versions of
these functions) or libcs.

With this in mind, __syscall_ret is mainly used in places where the
surrounding code already deals directly with making syscalls or does
other things that necessitate use of libc-internal APIs. This might
not be followed everywhere as a strict rule, but it's the intent.

> The only place I imagine the tailcall style might be undesired is sem_trywait,
> where returning failure is not expected to be rare.
> 
> What do you think about a change that introduces __set_errno that accepts
> positive errno and returns -1L? With that change __syscall_ret can become
> 
>     return r < -4095UL ? r : __set_errno(-r);

I don't see much value to it; return __set_errno(e) is functionally
equivalent to return __syscall_ret(-e) and just saves one "neg"
instruction before the tailcall. It might be a nicer abstraction in
places where we don't use syscalls directly, but for the most part
musl avoids internal APIs in those situations anyway, as explained
above.

> > -again:
> [snip]
> > +	for (char **e = __environ; *e; )
> > +		if (!memcmp(name, *e, l) && l[*e] == '=') {
> 
> Here the usage of memcmp requires that it scans buffers left-to-right and
> stops on first mismatch. As I understand the standards do not guarantee that,
> but musl's current implementation does, and is not interposable. Still, a
> gotcha.

I think you're right. Would it be better to switch such usage to
strncmp then?

Rich


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] slim down and avoid undefined behavior in unsetenv
  2016-03-05 16:30   ` Rich Felker
@ 2016-03-05 17:24     ` Alexander Monakov
  0 siblings, 0 replies; 8+ messages in thread
From: Alexander Monakov @ 2016-03-05 17:24 UTC (permalink / raw)
  To: musl

On Sat, 5 Mar 2016, Rich Felker wrote:
> I don't see much value to it; return __set_errno(e) is functionally
> equivalent to return __syscall_ret(-e) and just saves one "neg"
> instruction before the tailcall. It might be a nicer abstraction in
> places where we don't use syscalls directly, but for the most part
> musl avoids internal APIs in those situations anyway, as explained
> above.

Yeah, I was thinking about using it at call sites that currently do 'errno =
Exyz; return -1' for code size improvements; I wasn't aware of why it would be
undesired. Thanks for the explanations.

> > Here the usage of memcmp requires that it scans buffers left-to-right and
> > stops on first mismatch. As I understand the standards do not guarantee that,
> > but musl's current implementation does, and is not interposable. Still, a
> > gotcha.
> 
> I think you're right. Would it be better to switch such usage to
> strncmp then?

Given your point about reusability, I think so; were it a place where
resulting speed penalty is unwelcome, one could introduce __memcmp_fwd
to make the requirement explicitely visible, at zero runtime cost.

Alexander


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-03-05 17:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-20 18:09 [PATCH] slim down and avoid undefined behavior in unsetenv Alexander Monakov
2016-02-20 18:18 ` Alexander Monakov
2016-02-21  9:51 ` Alexander Monakov
2016-03-05 16:30   ` Rich Felker
2016-03-05 17:24     ` Alexander Monakov
2016-03-05  5:18 ` Rich Felker
2016-03-05  6:01   ` Alexander Monakov
2016-03-05  6:14     ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).