Use of size_t and ssize

mailing list of musl libc
 help / color / mirror / code / Atom feed

* Use of size_t and ssize_t in mseek
@ 2013-06-27  3:52 Matthew Fernandez
  2013-06-27  4:10 ` Rich Felker
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Fernandez @ 2013-06-27  3:52 UTC (permalink / raw)
  To: musl

Hi all,

My question refers to the latest git commit at time of writing,
b17c75a4d539d7ec5b81cc7ce7ce6b065a87e7a6. My issue was encountered on
ARM, but it applies to most 32-bit platforms.

The function mseek() accesses a size_t variable, c->size, and casts this
to a ssize_t. I know there aren't strong standards on what to expect
from ssize_t, but the Musl C constants SIZE_MAX (== UINT32_MAX) and
SSIZE_MAX (== LONG_MAX) seem to imply that you would be wise to assume
ssize_t is signed and the same width as size_t.

As a result, the cast I mentioned produces some unexpected results when
operating on a file of size greater than SSIZE_MAX. In my case I had an
in-memory file of size SIZE_MAX and was surprised to find I couldn't
fseek this file.

Is the code in mseek() correct? If so, I would recommend failing
fmemopen() when the requested size is greater than SSIZE_MAX. OTOH
perhaps I'm misunderstanding something more subtle here. If so, please
correct me.

Thanks,
Matthew

[It is perhaps worth noting that, yes, a UINT32_MAX-sized in-memory file
on a 32-bit platform is a bit odd. In my case I don't know how big the
file is until I've read its header. You could object that this is
unwise, but either way I believe fmemopen/mseek should handle this case.]

________________________________

The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-27  3:52 Use of size_t and ssize_t in mseek Matthew Fernandez
@ 2013-06-27  4:10 ` Rich Felker
  2013-06-27  4:16   ` Matthew Fernandez
  0 siblings, 1 reply; 32+ messages in thread
From: Rich Felker @ 2013-06-27  4:10 UTC (permalink / raw)
  To: musl

On Thu, Jun 27, 2013 at 01:52:01PM +1000, Matthew Fernandez wrote:
> Hi all,
> 
> My question refers to the latest git commit at time of writing,
> b17c75a4d539d7ec5b81cc7ce7ce6b065a87e7a6. My issue was encountered on
> ARM, but it applies to most 32-bit platforms.
> 
> The function mseek() accesses a size_t variable, c->size, and casts this
> to a ssize_t. I know there aren't strong standards on what to expect
> from ssize_t, but the Musl C constants SIZE_MAX (== UINT32_MAX) and
> SSIZE_MAX (== LONG_MAX) seem to imply that you would be wise to assume
> ssize_t is signed and the same width as size_t.
> 
> As a result, the cast I mentioned produces some unexpected results when
> operating on a file of size greater than SSIZE_MAX. In my case I had an
> in-memory file of size SIZE_MAX and was surprised to find I couldn't
> fseek this file.
> 
> Is the code in mseek() correct? If so, I would recommend failing
> fmemopen() when the requested size is greater than SSIZE_MAX. OTOH
> perhaps I'm misunderstanding something more subtle here. If so, please
> correct me.

The argument to a function like fmemopen that takes a size_t
representing the size of an object must actually correspond to the
size of an object. It's always been the intent in musl that objects
larger than SSIZE_MAX (which is also PTRDIFF_MAX) not be able to
exist, since they result in overflow (and undefined behavior) if you
subtract pointers within them, and implementation-defined behavior if
you pass their sizes to functions like read, write, etc. The malloc
implementation in musl explicitly rejects size arguments greater than
SSIZE_MAX, with the intent of ensuring that such objects don't exist,
so as far as I can tell, the only remaining "backdoor" to get such an
object is mmap.

Unless there are strong objections, I'd like to make mmap reject such
sizes too. On 32-bit archs, the maximum vm size is already bounded by
3gb, and once you get the program and a few shared libraries loaded
too, it's a good bit less, so it would be hard to map more than 2.5gb
or so anyway.

Once all the doors to obtaining objects of size greater than SSIZE_MAX
are closed, the issue goes away entirely. If you pass a size greater
than SSIZE_MAX to fmemopen or similar functions, it can't actually be
the correct size of the object (since no object that large exists),
and thus you must be invoking UB by passing an incorrect size, so it
doesn't matter what the function does.

Does this all make sense and sound reasonable?

Rich

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-27  4:10 ` Rich Felker
@ 2013-06-27  4:16   ` Matthew Fernandez
  2013-06-27  4:23     ` Rich Felker
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Fernandez @ 2013-06-27  4:16 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

On 27/06/13 14:10, Rich Felker wrote:
> On Thu, Jun 27, 2013 at 01:52:01PM +1000, Matthew Fernandez wrote:
>> Hi all,
>>
>> My question refers to the latest git commit at time of writing,
>> b17c75a4d539d7ec5b81cc7ce7ce6b065a87e7a6. My issue was encountered on
>> ARM, but it applies to most 32-bit platforms.
>>
>> The function mseek() accesses a size_t variable, c->size, and casts this
>> to a ssize_t. I know there aren't strong standards on what to expect
>> from ssize_t, but the Musl C constants SIZE_MAX (== UINT32_MAX) and
>> SSIZE_MAX (== LONG_MAX) seem to imply that you would be wise to assume
>> ssize_t is signed and the same width as size_t.
>>
>> As a result, the cast I mentioned produces some unexpected results when
>> operating on a file of size greater than SSIZE_MAX. In my case I had an
>> in-memory file of size SIZE_MAX and was surprised to find I couldn't
>> fseek this file.
>>
>> Is the code in mseek() correct? If so, I would recommend failing
>> fmemopen() when the requested size is greater than SSIZE_MAX. OTOH
>> perhaps I'm misunderstanding something more subtle here. If so, please
>> correct me.
>
> The argument to a function like fmemopen that takes a size_t
> representing the size of an object must actually correspond to the
> size of an object. It's always been the intent in musl that objects
> larger than SSIZE_MAX (which is also PTRDIFF_MAX) not be able to
> exist, since they result in overflow (and undefined behavior) if you
> subtract pointers within them, and implementation-defined behavior if
> you pass their sizes to functions like read, write, etc. The malloc
> implementation in musl explicitly rejects size arguments greater than
> SSIZE_MAX, with the intent of ensuring that such objects don't exist,
> so as far as I can tell, the only remaining "backdoor" to get such an
> object is mmap.
>
> Unless there are strong objections, I'd like to make mmap reject such
> sizes too. On 32-bit archs, the maximum vm size is already bounded by
> 3gb, and once you get the program and a few shared libraries loaded
> too, it's a good bit less, so it would be hard to map more than 2.5gb
> or so anyway.
>
> Once all the doors to obtaining objects of size greater than SSIZE_MAX
> are closed, the issue goes away entirely. If you pass a size greater
> than SSIZE_MAX to fmemopen or similar functions, it can't actually be
> the correct size of the object (since no object that large exists),
> and thus you must be invoking UB by passing an incorrect size, so it
> doesn't matter what the function does.
>
> Does this all make sense and sound reasonable?

Perfectly reasonable to make it UB (and I had assumed it was so
already). It just seemed to me that it would be more user-friendly to
bounds check the size parameter in fmemopen. Is there a reason not to do
this?

________________________________

The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-27  4:16   ` Matthew Fernandez
@ 2013-06-27  4:23     ` Rich Felker
  2013-06-27  4:31       ` Matthew Fernandez
                         ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Rich Felker @ 2013-06-27  4:23 UTC (permalink / raw)
  To: musl

On Thu, Jun 27, 2013 at 02:16:15PM +1000, Matthew Fernandez wrote:
> Perfectly reasonable to make it UB (and I had assumed it was so
> already).

Well the UB is just passing a wrong size. But the only way you can
guarantee that such a huge size is "wrong" is by cutting off all ways
of making such a large object.

> It just seemed to me that it would be more user-friendly to
> bounds check the size parameter in fmemopen. Is there a reason not to do
> this?

Mainly just consistency. There are a lot of places where sizes greater
than PTRDIFF_MAX would be problematic due to overflowing differences
and other issues, it's difficult (and ugly) to try to catch them all,
and even if you do catch them, in some cases, there's no obvious
"correct" course of action to take. fmemopen could check and return
some reasonable error, but I still want to find and fix any remaining
places where objects larger than PTRDIFF_MAX could come into existence
since they affect other code too, and once those are fixed, the check
in fmemopen would be obsolete.

As far as I can tell, mmap and maybe shmat are the only functions that
might be able to make such large objects. Do you know any others?

Rich

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-27  4:23     ` Rich Felker
@ 2013-06-27  4:31       ` Matthew Fernandez
  2013-06-27 15:34         ` Rich Felker
  2013-06-27 10:35       ` Szabolcs Nagy
  2013-06-27 16:47       ` Rich Felker
  2 siblings, 1 reply; 32+ messages in thread
From: Matthew Fernandez @ 2013-06-27  4:31 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

On 27/06/13 14:23, Rich Felker wrote:
> On Thu, Jun 27, 2013 at 02:16:15PM +1000, Matthew Fernandez wrote:
>> Perfectly reasonable to make it UB (and I had assumed it was so
>> already).
>
> Well the UB is just passing a wrong size. But the only way you can
> guarantee that such a huge size is "wrong" is by cutting off all ways
> of making such a large object.

In some sense this is true, but this would not have affected my
scenario. I do not know the size of the object to which I have a pointer
to and the object was not derived from Musl C functionality. As a bit of
context, this is an embedded environment where we don't have much
address space introspection. I suppose you could argue that I should
have re-implemented fmemopen in a way that didn't assume a bounded buffer.

>> It just seemed to me that it would be more user-friendly to
>> bounds check the size parameter in fmemopen. Is there a reason not to do
>> this?
>
> Mainly just consistency. There are a lot of places where sizes greater
> than PTRDIFF_MAX would be problematic due to overflowing differences
> and other issues, it's difficult (and ugly) to try to catch them all,
> and even if you do catch them, in some cases, there's no obvious
> "correct" course of action to take. fmemopen could check and return
> some reasonable error, but I still want to find and fix any remaining
> places where objects larger than PTRDIFF_MAX could come into existence
> since they affect other code too, and once those are fixed, the check
> in fmemopen would be obsolete.

Yes, I agree with this reasoning. It seems fmemopen should really take a
ssize_t, but this would require deviating from the standard which is
undesirable.

> As far as I can tell, mmap and maybe shmat are the only functions that
> might be able to make such large objects. Do you know any others?

Not that I'm aware of. I haven't explored Musl C much and would consider
myself more of a user than developer with respect to its code.

________________________________

The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-27  4:23     ` Rich Felker
  2013-06-27  4:31       ` Matthew Fernandez
@ 2013-06-27 10:35       ` Szabolcs Nagy
  2013-06-27 15:05         ` Rich Felker
  2013-06-27 16:47       ` Rich Felker
  2 siblings, 1 reply; 32+ messages in thread
From: Szabolcs Nagy @ 2013-06-27 10:35 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@aerifal.cx> [2013-06-27 00:23:14 -0400]:
> some reasonable error, but I still want to find and fix any remaining
> places where objects larger than PTRDIFF_MAX could come into existence
> since they affect other code too, and once those are fixed, the check
> in fmemopen would be obsolete.
> 
> As far as I can tell, mmap and maybe shmat are the only functions that
> might be able to make such large objects. Do you know any others?

void *p=sbrk(1<<30); sbrk(1<<30);

or

int main() { char a[1U<<31]; }

it seems compilers dont like objects >=2G size either
(is there a constraint for this in the standard?
gcc even fails if the sum of the local objects are >=2G,
but tcc, pcc generates code in that case)

i assume isoc would not allow this but you can concatenate
address ranges:

char *p,*q;
q = mmap(0, 1<<30, prot, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
p = mmap(q-(1<<30), 1<<30, prot, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if (p && q && p == q-(1<<30)) {

now p points to a 2G continous address range
you could even mprotect(p, 1U<<31, prot);


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-27 10:35       ` Szabolcs Nagy
@ 2013-06-27 15:05         ` Rich Felker
  0 siblings, 0 replies; 32+ messages in thread
From: Rich Felker @ 2013-06-27 15:05 UTC (permalink / raw)
  To: musl

On Thu, Jun 27, 2013 at 12:35:22PM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@aerifal.cx> [2013-06-27 00:23:14 -0400]:
> > some reasonable error, but I still want to find and fix any remaining
> > places where objects larger than PTRDIFF_MAX could come into existence
> > since they affect other code too, and once those are fixed, the check
> > in fmemopen would be obsolete.
> > 
> > As far as I can tell, mmap and maybe shmat are the only functions that
> > might be able to make such large objects. Do you know any others?
> 
> void *p=sbrk(1<<30); sbrk(1<<30);

Using sbrk alongside anything else in the standard library invokes
horrible UB, so I don't really care about sbrk.

> or
> 
> int main() { char a[1U<<31]; }
> 
> it seems compilers dont like objects >=2G size either
> (is there a constraint for this in the standard?
> gcc even fails if the sum of the local objects are >=2G,
> but tcc, pcc generates code in that case)

There's not a constraint, but the compiler is providing a low quality
of implementation if it allows them, since its ptrdiff_t is too small
to work with them.

> i assume isoc would not allow this but you can concatenate
> address ranges:
> 
> char *p,*q;
> q = mmap(0, 1<<30, prot, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> p = mmap(q-(1<<30), 1<<30, prot, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> if (p && q && p == q-(1<<30)) {
> 
> now p points to a 2G continous address range
> you could even mprotect(p, 1U<<31, prot);

Formally these should probably be thought of as two objects where the
address of the element one past the end of the first happens to be
equal to the address of the second (which the C language allows). Of
course I agree it could be argued both ways. However, either way, this
kind of thing is sufficiently intentional and fragile that someone
doing it would expect breakage, I think. What I'm concerned about is
the possibility that someone could inadvertently obtain such an
object, e.g. via passing a size obtained from a file or from the
network to malloc, etc. But thanks for the thorough consideration of
the issue. :-)

Rich

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-27  4:31       ` Matthew Fernandez
@ 2013-06-27 15:34         ` Rich Felker
  2013-06-28  0:49           ` Matthew Fernandez
  0 siblings, 1 reply; 32+ messages in thread
From: Rich Felker @ 2013-06-27 15:34 UTC (permalink / raw)
  To: musl

On Thu, Jun 27, 2013 at 02:31:48PM +1000, Matthew Fernandez wrote:
> On 27/06/13 14:23, Rich Felker wrote:
> >On Thu, Jun 27, 2013 at 02:16:15PM +1000, Matthew Fernandez wrote:
> >>Perfectly reasonable to make it UB (and I had assumed it was so
> >>already).
> >
> >Well the UB is just passing a wrong size. But the only way you can
> >guarantee that such a huge size is "wrong" is by cutting off all ways
> >of making such a large object.
> 
> In some sense this is true, but this would not have affected my
> scenario. I do not know the size of the object to which I have a pointer
> to and the object was not derived from Musl C functionality. As a bit of
> context, this is an embedded environment where we don't have much
> address space introspection. I suppose you could argue that I should
> have re-implemented fmemopen in a way that didn't assume a bounded buffer.

Thanks for providing the added context. It helps to understand the
situation a lot better. Am I correct in assuming that you're using
musl not on Linux, but ported to an underlying embedded environment,
possibly without a kernel? The reason I ask is that I don't see how
Linux would give you such an object without you requesting it via a
syscall of some sort.

Anyway, if I'm understanding your usage case correctly, I think we
should eventually provide documentation (perhaps in the porting
section of the future manual) about using musl like this. And either
we should document that it's undefined behavior to pass sizes larger
than PTRDIFF_MAX/SSIZE_MAX to functions in musl, or we should make an
effort to find all the functions affected by this issue and make them
either reject such large sizes, or work safely with them.

As a user of musl, what's your take on this?

> Yes, I agree with this reasoning. It seems fmemopen should really take a
> ssize_t, but this would require deviating from the standard which is
> undesirable.

There are a lot of functions (read, write, ...) that "should" take
ssize_t, in some sense of "should", but that wouldn't really fix the
problem either, since large values would just get converted to
negative values when passed to the functions.

Rich

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-27  4:23     ` Rich Felker
  2013-06-27  4:31       ` Matthew Fernandez
  2013-06-27 10:35       ` Szabolcs Nagy
@ 2013-06-27 16:47       ` Rich Felker
  2 siblings, 0 replies; 32+ messages in thread
From: Rich Felker @ 2013-06-27 16:47 UTC (permalink / raw)
  To: musl

On Thu, Jun 27, 2013 at 12:23:14AM -0400, Rich Felker wrote:
> As far as I can tell, mmap and maybe shmat are the only functions that
> might be able to make such large objects. Do you know any others?

It's actually shmget, not shmat, that takes the size argument. I
RTFS'd the kernel a bit, and there's a setting for max shm segment
size that normally caps the size much lower than PTRDIFF_MAX. I also
found an integer overflow in the kernel: if size+PAGE_SIZE-1 overflows
(which is possible if this limit is increased near or to SIZE_MAX),
the size will be accounted as 0 pages, but the full size object will
be created. The impact seems low since it only affects the case where
the administrator has attempted to allow huge individual shm objects
but limit the total shm usage on the system.

In any case, fixing shmget in userspace looks hard since the size
should not be rejected as invalid if the key already exists, only if
it's new. And by default the kernel will catch invalid sizes (and even
mildly large sizes) and reject them, so I don't think any action is
required on musl's part.

I am however preparing a patch for mmap.

Rich

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-27 15:34         ` Rich Felker
@ 2013-06-28  0:49           ` Matthew Fernandez
  2013-06-28  1:22             ` Rich Felker
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Fernandez @ 2013-06-28  0:49 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

On 28/06/13 01:34, Rich Felker wrote:
> On Thu, Jun 27, 2013 at 02:31:48PM +1000, Matthew Fernandez wrote:
>> On 27/06/13 14:23, Rich Felker wrote:
>>> On Thu, Jun 27, 2013 at 02:16:15PM +1000, Matthew Fernandez wrote:
>>>> Perfectly reasonable to make it UB (and I had assumed it was so
>>>> already).
>>>
>>> Well the UB is just passing a wrong size. But the only way you can
>>> guarantee that such a huge size is "wrong" is by cutting off all ways
>>> of making such a large object.
>>
>> In some sense this is true, but this would not have affected my
>> scenario. I do not know the size of the object to which I have a pointer
>> to and the object was not derived from Musl C functionality. As a bit of
>> context, this is an embedded environment where we don't have much
>> address space introspection. I suppose you could argue that I should
>> have re-implemented fmemopen in a way that didn't assume a bounded buffer.
>
> Thanks for providing the added context. It helps to understand the
> situation a lot better. Am I correct in assuming that you're using
> musl not on Linux, but ported to an underlying embedded environment,
> possibly without a kernel? The reason I ask is that I don't see how
> Linux would give you such an object without you requesting it via a
> syscall of some sort.

Yes, Rich, you're correct. I'm using Musl in userspace on the seL4
microkernel [0]. I'm getting this object from a bootstrapping process
that inserts it into my address space. The object is not actually
SIZE_MAX-sized, but I don't have accurate bounds for it at the time I
call fmemopen.

[0]: http://www.ertos.nicta.com.au/research/sel4/

> Anyway, if I'm understanding your usage case correctly, I think we
> should eventually provide documentation (perhaps in the porting
> section of the future manual) about using musl like this. And either
> we should document that it's undefined behavior to pass sizes larger
> than PTRDIFF_MAX/SSIZE_MAX to functions in musl, or we should make an
> effort to find all the functions affected by this issue and make them
> either reject such large sizes, or work safely with them.
>
> As a user of musl, what's your take on this?

A check in fmemopen (and other affected functions) would be my preferred
solution, as an unwitting user like myself who doesn't check all the
assumptions would still be caught out by just documenting it as
undefined. I would be happy with just an assert-fail here if that's easiest.

>> Yes, I agree with this reasoning. It seems fmemopen should really take a
>> ssize_t, but this would require deviating from the standard which is
>> undesirable.
>
> There are a lot of functions (read, write, ...) that "should" take
> ssize_t, in some sense of "should", but that wouldn't really fix the
> problem either, since large values would just get converted to
> negative values when passed to the functions.

True, but I would at least get a compiler warning. Anyway, I agree that
changing the function signature is not appropriate.

________________________________

The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-28  0:49           ` Matthew Fernandez
@ 2013-06-28  1:22             ` Rich Felker
  2013-06-28  1:34               ` Matthew Fernandez
  0 siblings, 1 reply; 32+ messages in thread
From: Rich Felker @ 2013-06-28  1:22 UTC (permalink / raw)
  To: musl

On Fri, Jun 28, 2013 at 10:49:41AM +1000, Matthew Fernandez wrote:
> >As a user of musl, what's your take on this?
> 
> A check in fmemopen (and other affected functions) would be my preferred
> solution, as an unwitting user like myself who doesn't check all the
> assumptions would still be caught out by just documenting it as
> undefined. I would be happy with just an assert-fail here if that's easiest..

The easiest might just be making fmemopen so it doesn't care if the
size is insanely large. As far as I can tell, the only place it's an
issue is in mseek, and we could use off_t instead of ssize_t. On
32-bit systems, off_t is 64-bit, so all sizes fit. On 64-bit systems,
there's no way (physically!) to have an object as large as 1UL<<63.

Alternatively, I could adjust the arithmetic to just avoid working
with signed values, and perhaps make it more obvious what it's doing
in the process.

Rich

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-28  1:22             ` Rich Felker
@ 2013-06-28  1:34               ` Matthew Fernandez
  2013-06-28  1:48                 ` Rich Felker
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Fernandez @ 2013-06-28  1:34 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

On 28/06/13 11:22, Rich Felker wrote:
> On Fri, Jun 28, 2013 at 10:49:41AM +1000, Matthew Fernandez wrote:
>>> As a user of musl, what's your take on this?
>>
>> A check in fmemopen (and other affected functions) would be my preferred
>> solution, as an unwitting user like myself who doesn't check all the
>> assumptions would still be caught out by just documenting it as
>> undefined. I would be happy with just an assert-fail here if that's easiest..
>
> The easiest might just be making fmemopen so it doesn't care if the
> size is insanely large. As far as I can tell, the only place it's an
> issue is in mseek, and we could use off_t instead of ssize_t. On
> 32-bit systems, off_t is 64-bit, so all sizes fit. On 64-bit systems,
> there's no way (physically!) to have an object as large as 1UL<<63.

I suppose this is an option, but this just isolates the problem to
64-bit systems. On x86_64 I would still be able to naïvely call fmemopen
with SIZE_MAX and end up being unable to fseek. Not being able to
physically have an object of that size seems reasonable justification
for making it undefined behaviour, but not justification for eliding
checks. Regardless of the maximum size of a valid object, nothing stops
me casting UINT_MAX to a size_t. I completely agree that one should not
expect sensible behaviour when claiming to have an object that covers
all of memory, but I don't see the harm in warning the user early by
failing in fmemopen.

> Alternatively, I could adjust the arithmetic to just avoid working
> with signed values, and perhaps make it more obvious what it's doing
> in the process.

I would also be happy with this solution. The code in mseek could
definitely be clearer. Not that I don't enjoy switch statements written
as offsets into stack structs and reverse jumps ;)

________________________________

The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-28  1:34               ` Matthew Fernandez
@ 2013-06-28  1:48                 ` Rich Felker
  2013-06-28  1:56                   ` Matthew Fernandez
  0 siblings, 1 reply; 32+ messages in thread
From: Rich Felker @ 2013-06-28  1:48 UTC (permalink / raw)
  To: musl

On Fri, Jun 28, 2013 at 11:34:23AM +1000, Matthew Fernandez wrote:
> On 28/06/13 11:22, Rich Felker wrote:
> >On Fri, Jun 28, 2013 at 10:49:41AM +1000, Matthew Fernandez wrote:
> >>>As a user of musl, what's your take on this?
> >>
> >>A check in fmemopen (and other affected functions) would be my preferred
> >>solution, as an unwitting user like myself who doesn't check all the
> >>assumptions would still be caught out by just documenting it as
> >>undefined. I would be happy with just an assert-fail here if that's easiest..
> >
> >The easiest might just be making fmemopen so it doesn't care if the
> >size is insanely large. As far as I can tell, the only place it's an
> >issue is in mseek, and we could use off_t instead of ssize_t. On
> >32-bit systems, off_t is 64-bit, so all sizes fit. On 64-bit systems,
> >there's no way (physically!) to have an object as large as 1UL<<63.
> 
> I suppose this is an option, but this just isolates the problem to
> 64-bit systems.

Well, on a 32-bit system (albeit not the one stock musl, as intended
to run on Linux, implements) it's possible to have an object larger
than 2^31 bytes. It's never possible to have an object larger than
2^63 bytes on any system.

> On x86_64 I would still be able to naïvely call fmemopen
> with SIZE_MAX and end up being unable to fseek. Not being able to

Ah, I missed the fact that you were just passing SIZE_MAX because you
don't know the size; I thought you were passing some large value equal
to the actual size. In general, passing SIZE_MAX for unknown sizes is
dangerous, even if objects larger than SSIZE_MAX are supported,
because the implementation of the function might work like:

    char *end = base + size;
    while (pos < end) ...

In this case, merely adding base+size invoked UB if size was larger
than the size of the object; in practice, what happens is that the
pointer wraps, so end is less than base, and then the loop never runs.

While I'm not opposed to making changes to reject or to support
objects larger than SSIZE_MAX, I am pretty much opposed to making any
attempt to accept a length of SIZE_MAX as "unknown/unlimited size".
This kind of usage, as described above, is flawed and error-prone. If
you want to replace it with something at least halfway portable,
instead of passing SIZE_MAX, pass SIZE_MAX-(size_t)base or some
variant of this concept. This will at least avoid the overflow; it's
the trick musl uses internally for implementing sprintf in terms of
snprintf. However, I would really recommend trying to find some better
approach to obtain and pass the correct size to fmemopen. There is no
contract that fmemopen is not allowed to read arbitrary parts of the
buffer passed to it, so passing an incorrect length and expecting it
to work is depending on the current implementation.

By the way, note that the issue here is not that the size argument is
greater than SSIZE_MAX. Even if you passed a shorter "huge object"
length, like SSIZE_MAX or SSIZE_MAX/2, it could still cause overflows
if the base of the object happened to begin at a high address. The
issue is merely that the size is not the correct size of the object.

> >Alternatively, I could adjust the arithmetic to just avoid working
> >with signed values, and perhaps make it more obvious what it's doing
> >in the process.
> 
> I would also be happy with this solution. The code in mseek could
> definitely be clearer. Not that I don't enjoy switch statements written
> as offsets into stack structs and reverse jumps ;)

Yes, I think this is probably the best solution, even if it makes the
function a few bytes larger. The code should be more clear.

Rich

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-28  1:48                 ` Rich Felker
@ 2013-06-28  1:56                   ` Matthew Fernandez
  2013-06-29  4:13                     ` Rich Felker
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Fernandez @ 2013-06-28  1:56 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

On 28/06/13 11:48, Rich Felker wrote:
> On Fri, Jun 28, 2013 at 11:34:23AM +1000, Matthew Fernandez wrote:
>>> On Fri, Jun 28, 2013 at 10:49:41AM +1000, Matthew Fernandez wrote:
>> I suppose this is an option, but this just isolates the problem to
>> 64-bit systems.
>
> Well, on a 32-bit system (albeit not the one stock musl, as intended
> to run on Linux, implements) it's possible to have an object larger
> than 2^31 bytes. It's never possible to have an object larger than
> 2^63 bytes on any system.

Yes, however as I said, nothing prevents me type casting to get around
this. Either way, this is a more or less a philosophical argument at
this point.

>> On x86_64 I would still be able to naïvely call fmemopen
>> with SIZE_MAX and end up being unable to fseek. Not being able to
>
> Ah, I missed the fact that you were just passing SIZE_MAX because you
> don't know the size; I thought you were passing some large value equal
> to the actual size. In general, passing SIZE_MAX for unknown sizes is
> dangerous, even if objects larger than SSIZE_MAX are supported,
> because the implementation of the function might work like:
>
>      char *end = base + size;
>      while (pos < end) ...
>
> In this case, merely adding base+size invoked UB if size was larger
> than the size of the object; in practice, what happens is that the
> pointer wraps, so end is less than base, and then the loop never runs.
>
> While I'm not opposed to making changes to reject or to support
> objects larger than SSIZE_MAX, I am pretty much opposed to making any
> attempt to accept a length of SIZE_MAX as "unknown/unlimited size".
> This kind of usage, as described above, is flawed and error-prone. If
> you want to replace it with something at least halfway portable,
> instead of passing SIZE_MAX, pass SIZE_MAX-(size_t)base or some
> variant of this concept. This will at least avoid the overflow; it's
> the trick musl uses internally for implementing sprintf in terms of
> snprintf. However, I would really recommend trying to find some better
> approach to obtain and pass the correct size to fmemopen. There is no
> contract that fmemopen is not allowed to read arbitrary parts of the
> buffer passed to it, so passing an incorrect length and expecting it
> to work is depending on the current implementation.
>
> By the way, note that the issue here is not that the size argument is
> greater than SSIZE_MAX. Even if you passed a shorter "huge object"
> length, like SSIZE_MAX or SSIZE_MAX/2, it could still cause overflows
> if the base of the object happened to begin at a high address. The
> issue is merely that the size is not the correct size of the object.

I completely agree with all of this.

>>> Alternatively, I could adjust the arithmetic to just avoid working
>>> with signed values, and perhaps make it more obvious what it's doing
>>> in the process.
>>
>> I would also be happy with this solution. The code in mseek could
>> definitely be clearer. Not that I don't enjoy switch statements written
>> as offsets into stack structs and reverse jumps ;)
>
> Yes, I think this is probably the best solution, even if it makes the
> function a few bytes larger. The code should be more clear.

Thanks, Rich. I appreciate you taking the time to consider this issue.
Apologies that it seems to have steamrolled into all the ways of
constructing invalid objects and possibly bored everyone else on this
list :)

________________________________

The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-28  1:56                   ` Matthew Fernandez
@ 2013-06-29  4:13                     ` Rich Felker
  2013-06-29 13:38                       ` Matthew Fernandez
  0 siblings, 1 reply; 32+ messages in thread
From: Rich Felker @ 2013-06-29  4:13 UTC (permalink / raw)
  To: musl

On Fri, Jun 28, 2013 at 11:56:15AM +1000, Matthew Fernandez wrote:
> >>>Alternatively, I could adjust the arithmetic to just avoid working
> >>>with signed values, and perhaps make it more obvious what it's doing
> >>>in the process.
> >>
> >>I would also be happy with this solution. The code in mseek could
> >>definitely be clearer. Not that I don't enjoy switch statements written
> >>as offsets into stack structs and reverse jumps ;)
> >
> >Yes, I think this is probably the best solution, even if it makes the
> >function a few bytes larger. The code should be more clear.
> 
> Thanks, Rich. I appreciate you taking the time to consider this issue.
> Apologies that it seems to have steamrolled into all the ways of
> constructing invalid objects and possibly bored everyone else on this
> list :)

Looking at the code to "fix" it now, I ran into a problem. :-)

If size_t is 64-bit, there is fundamentally no way a memory buffer (or
disk file) larger than SSIZE_MAX can be accessed, since off_t cannot
store the position in the file. I noticed this as soon as I went to
write:

	case SEEK_SET:
		if (off < 0 || off > c->size) goto fail;

I could still salvage the 32-bit case by simply leaving the code alone
except for changing base to off_t, but I'm starting to remember why I
thought it was bogus to even consider allowing object sizes greater
than the signed size max...

Not sure what the best way to proceed is.

Rich

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-29  4:13                     ` Rich Felker
@ 2013-06-29 13:38                       ` Matthew Fernandez
  2013-06-29 14:17                         ` Rich Felker
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Fernandez @ 2013-06-29 13:38 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

On 29/06/13 14:13, Rich Felker wrote:
> If size_t is 64-bit, there is fundamentally no way a memory buffer (or
> disk file) larger than SSIZE_MAX can be accessed, since off_t cannot
> store the position in the file. I noticed this as soon as I went to
> write:
>
>       case SEEK_SET:
>               if (off < 0 || off > c->size) goto fail;
>
> I could still salvage the 32-bit case by simply leaving the code alone
> except for changing base to off_t, but I'm starting to remember why I
> thought it was bogus to even consider allowing object sizes greater
> than the signed size max...
>
> Not sure what the best way to proceed is.

I still don't see the barrier to introducing a check for size greater
than SSIZE_MAX in fmemopen and leaving mseek as is. Am I missing something?

________________________________

The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-29 13:38                       ` Matthew Fernandez
@ 2013-06-29 14:17                         ` Rich Felker
  2013-06-29 14:56                           ` Jens Gustedt
  0 siblings, 1 reply; 32+ messages in thread
From: Rich Felker @ 2013-06-29 14:17 UTC (permalink / raw)
  To: musl

On Sat, Jun 29, 2013 at 11:38:09PM +1000, Matthew Fernandez wrote:
> On 29/06/13 14:13, Rich Felker wrote:
> >If size_t is 64-bit, there is fundamentally no way a memory buffer (or
> >disk file) larger than SSIZE_MAX can be accessed, since off_t cannot
> >store the position in the file. I noticed this as soon as I went to
> >write:
> >
> >      case SEEK_SET:
> >              if (off < 0 || off > c->size) goto fail;
> >
> >I could still salvage the 32-bit case by simply leaving the code alone
> >except for changing base to off_t, but I'm starting to remember why I
> >thought it was bogus to even consider allowing object sizes greater
> >than the signed size max...
> >
> >Not sure what the best way to proceed is.
> 
> I still don't see the barrier to introducing a check for size greater
> than SSIZE_MAX in fmemopen and leaving mseek as is. Am I missing something?

Just that this is one of a multitude of places that such a check could
be made, and I question the value of doing it in one place but not
others. Examples include snprintf, strnlen, memchr, and basically any
interface that takes a size_t representing the size of an already
existing object. I'm against adding checks to all these places since
it adds bloat and potentially hurts performance and for most of them
there's nothing they could do except crash if the check failed. So
what I'm questioning is the value of adding such a check to the one
interface you ran into trouble with, when there are plenty more widely
used functions that won't be checked; this inconsistency does not make
sense to me. I'd like to hear what others think, though.

Rich

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-29 14:17                         ` Rich Felker
@ 2013-06-29 14:56                           ` Jens Gustedt
  2013-06-29 15:48                             ` Rich Felker
  0 siblings, 1 reply; 32+ messages in thread
From: Jens Gustedt @ 2013-06-29 14:56 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1844 bytes --]

Am Samstag, den 29.06.2013, 10:17 -0400 schrieb Rich Felker:
> Just that this is one of a multitude of places that such a check could
> be made, and I question the value of doing it in one place but not
> others. Examples include snprintf, strnlen, memchr, and basically any
> interface that takes a size_t representing the size of an already
> existing object. I'm against adding checks to all these places since
> it adds bloat and potentially hurts performance and for most of them
> there's nothing they could do except crash if the check failed. So
> what I'm questioning is the value of adding such a check to the one
> interface you ran into trouble with, when there are plenty more widely
> used functions that won't be checked; this inconsistency does not make
> sense to me. I'd like to hear what others think, though.

I think C11 has indentified this sort of specification problems and
therefore introduces rsize_t and RSIZE_MAX in the not-loved-by-many
appendix K "bounds-checking intefaces". Interfaces that are specified
with this type are required to check that the value isn't too large
for any object.

If you'd want to go that road (of checking for the size) I'd suggest
that you'd define and use RSIZE_MAX for such a thing, and maybe even
change the interfaces to use rsize_t. Since this is only a typedef
such a change should still be compatible with size_t as in the current
and future standard(s), and it would clearly mark the intent of bounds
checking.

Jens


-- 
:: INRIA Nancy Grand Est :: http://www.loria.fr/~gustedt/   ::
:: AlGorille ::::::::::::::: office Nancy : +33 383593090   ::
:: ICube :::::::::::::: office Strasbourg : +33 368854536   ::
:: ::::::::::::::::::::::::::: gsm France : +33 651400183   ::
:: :::::::::::::::::::: gsm international : +49 15737185122 ::



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-29 14:56                           ` Jens Gustedt
@ 2013-06-29 15:48                             ` Rich Felker
  2013-06-29 16:01                               ` Jens Gustedt
  0 siblings, 1 reply; 32+ messages in thread
From: Rich Felker @ 2013-06-29 15:48 UTC (permalink / raw)
  To: musl

On Sat, Jun 29, 2013 at 04:56:42PM +0200, Jens Gustedt wrote:
> Am Samstag, den 29.06.2013, 10:17 -0400 schrieb Rich Felker:
> > Just that this is one of a multitude of places that such a check could
> > be made, and I question the value of doing it in one place but not
> > others. Examples include snprintf, strnlen, memchr, and basically any
> > interface that takes a size_t representing the size of an already
> > existing object. I'm against adding checks to all these places since
> > it adds bloat and potentially hurts performance and for most of them
> > there's nothing they could do except crash if the check failed. So
> > what I'm questioning is the value of adding such a check to the one
> > interface you ran into trouble with, when there are plenty more widely
> > used functions that won't be checked; this inconsistency does not make
> > sense to me. I'd like to hear what others think, though.
> 
> I think C11 has indentified this sort of specification problems and
> therefore introduces rsize_t and RSIZE_MAX in the not-loved-by-many
> appendix K "bounds-checking intefaces". Interfaces that are specified
> with this type are required to check that the value isn't too large
> for any object.
> 
> If you'd want to go that road (of checking for the size) I'd suggest
> that you'd define and use RSIZE_MAX for such a thing, and maybe even
> change the interfaces to use rsize_t. Since this is only a typedef
> such a change should still be compatible with size_t as in the current
> and future standard(s), and it would clearly mark the intent of bounds
> checking.

The type in the interface can't be changed. It's required by POSIX
(and for many interfaces, required by C). However if rsize_t were
defined as size_t and RSIZE_MAX were SIZE_MAX/2, it would at least
partly convey the intent.

It's the intent in musl that objects larger than the max value of the
signed type corresponding to size_t simply never exist, so they don't
have to be considered.

Matthew was only able to get such a large size_t by 'guessing' - his
bootloader (he ported musl to run on non-Linux) gave him a pointer to
an object of unknown size, and he passed SIZE_MAX to fmemopen to
attempt to read it as a file. This usage (giving size larger than the
actual size of an object) is of course undefined behavior and not
supported, but it's conceivable that his boot loader could have given
him a valid size larger than SIZE_MAX/2 due to the bootloader and musl
disagreeing on whether such objects can exist.

The reason I question just adding a check here is that he could just
as easily have happened to pass this object to snprintf, strnlen,
memchr, etc., so from a standpoint of consistency, it doesn't make
sense to me to add a check in one place but not others (and some of
the others can't even report errors).

My leaning at this point is just to document well that objects larger
than SIZE_MAX/2 are assumed not to exist. I've already fixed the last
two ways (mmap and shmget) they could be created with musl, so it
seems to me the only way they can arise now is when you've done
something like what Matthew has done and ported musl to a different
underlying system, in which case this issue is one of many issues that
should be considered as part of the port.

Rich

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-29 15:48                             ` Rich Felker
@ 2013-06-29 16:01                               ` Jens Gustedt
  2013-06-29 16:13                                 ` Rich Felker
  0 siblings, 1 reply; 32+ messages in thread
From: Jens Gustedt @ 2013-06-29 16:01 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2667 bytes --]

Am Samstag, den 29.06.2013, 11:48 -0400 schrieb Rich Felker:
> The type in the interface can't be changed. It's required by POSIX
> (and for many interfaces, required by C). However if rsize_t were
> defined as size_t and RSIZE_MAX were SIZE_MAX/2, it would at least
> partly convey the intent.

it is so defined by C11. In K 3.3 we have:

   The type is rsize_t which is the type size_t.

> It's the intent in musl that objects larger than the max value of the
> signed type corresponding to size_t simply never exist, so they don't
> have to be considered.
> 
> Matthew was only able to get such a large size_t by 'guessing' - his
> bootloader (he ported musl to run on non-Linux) gave him a pointer to
> an object of unknown size, and he passed SIZE_MAX to fmemopen to
> attempt to read it as a file. This usage (giving size larger than the
> actual size of an object) is of course undefined behavior and not
> supported, but it's conceivable that his boot loader could have given
> him a valid size larger than SIZE_MAX/2 due to the bootloader and musl
> disagreeing on whether such objects can exist.
> 
> The reason I question just adding a check here is that he could just
> as easily have happened to pass this object to snprintf, strnlen,
> memchr, etc., so from a standpoint of consistency, it doesn't make
> sense to me to add a check in one place but not others (and some of
> the others can't even report errors).
> 
> My leaning at this point is just to document well that objects larger
> than SIZE_MAX/2 are assumed not to exist. I've already fixed the last
> two ways (mmap and shmget) they could be created with musl, so it
> seems to me the only way they can arise now is when you've done
> something like what Matthew has done and ported musl to a different
> underlying system, in which case this issue is one of many issues that
> should be considered as part of the port.

I basically understood all of this, I think. My point was that when
thinking about ways to handle the maximum size of objects, it would be
good to use the means that the C standard provides for such a
thing. "Implementing" Annex K partially by typedeffing size_t to
rsize_t and defining RSIZE_MAX to SIZE_MAX/2 would be a good point in
that direction, and clearly document the choice.

Jens


-- 
:: INRIA Nancy Grand Est :: http://www.loria.fr/~gustedt/   ::
:: AlGorille ::::::::::::::: office Nancy : +33 383593090   ::
:: ICube :::::::::::::: office Strasbourg : +33 368854536   ::
:: ::::::::::::::::::::::::::: gsm France : +33 651400183   ::
:: :::::::::::::::::::: gsm international : +49 15737185122 ::




[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-29 16:01                               ` Jens Gustedt
@ 2013-06-29 16:13                                 ` Rich Felker
  2013-06-29 16:39                                   ` Jens Gustedt
  0 siblings, 1 reply; 32+ messages in thread
From: Rich Felker @ 2013-06-29 16:13 UTC (permalink / raw)
  To: musl

On Sat, Jun 29, 2013 at 06:01:18PM +0200, Jens Gustedt wrote:
> Am Samstag, den 29.06.2013, 11:48 -0400 schrieb Rich Felker:
> > The type in the interface can't be changed. It's required by POSIX
> > (and for many interfaces, required by C). However if rsize_t were
> > defined as size_t and RSIZE_MAX were SIZE_MAX/2, it would at least
> > partly convey the intent.
> 
> it is so defined by C11. In K 3.3 we have:
> 
>    The type is rsize_t which is the type size_t.

OK.

> I basically understood all of this, I think. My point was that when
> thinking about ways to handle the maximum size of objects, it would be
> good to use the means that the C standard provides for such a
> thing. "Implementing" Annex K partially by typedeffing size_t to
> rsize_t and defining RSIZE_MAX to SIZE_MAX/2 would be a good point in
> that direction, and clearly document the choice.

I'm not sure how useful it would be to applications. These macros and
types can't be exposed unless __STDC_WANT_LIB_EXT1__ is defined by the
application, so in practice they won't be exposed.

With that said, I'm not opposed to adding Annex K, but I think we
should look into how invasive it would be, i.e. whether most/all
interfaces can just be wrappers for the non-bounds-checking versions
or whether major internal changes would be required to some existing
interfaces.

Rich


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-29 16:13                                 ` Rich Felker
@ 2013-06-29 16:39                                   ` Jens Gustedt
  2013-07-04  1:28                                     ` Rich Felker
  0 siblings, 1 reply; 32+ messages in thread
From: Jens Gustedt @ 2013-06-29 16:39 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1457 bytes --]

Am Samstag, den 29.06.2013, 12:13 -0400 schrieb Rich Felker:
> With that said, I'm not opposed to adding Annex K, but I think we
> should look into how invasive it would be, i.e. whether most/all
> interfaces can just be wrappers for the non-bounds-checking versions
> or whether major internal changes would be required to some existing
> interfaces.

I implemented quite a lot of them for P99, so I don't think that there
would be major problems. Many of them are just some if/else clauses
that check the run time constraints.

There are some additional functionalities, though, so these would
demand extra coding and objects, especially the run time constraint
handling, but I think these are quite limited and wouldn't require
much effort.

Then some interfaces are clearly different such that they can't simply
be copied over, notably bsearch and qsort functions, since they
receive additional arguments to provide context to the object
comparison.

IIRC, what I couldn't handle within P99 was checking of printf
arguments, but from within musl this should be relatively straight
forward.

Jens

-- 
:: INRIA Nancy Grand Est :: http://www.loria.fr/~gustedt/   ::
:: AlGorille ::::::::::::::: office Nancy : +33 383593090   ::
:: ICube :::::::::::::: office Strasbourg : +33 368854536   ::
:: ::::::::::::::::::::::::::: gsm France : +33 651400183   ::
:: :::::::::::::::::::: gsm international : +49 15737185122 ::

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-06-29 16:39                                   ` Jens Gustedt
@ 2013-07-04  1:28                                     ` Rich Felker
  2013-07-04  6:11                                       ` Jens Gustedt
  0 siblings, 1 reply; 32+ messages in thread
From: Rich Felker @ 2013-07-04  1:28 UTC (permalink / raw)
  To: musl

On Sat, Jun 29, 2013 at 06:39:27PM +0200, Jens Gustedt wrote:
> Am Samstag, den 29.06.2013, 12:13 -0400 schrieb Rich Felker:
> > With that said, I'm not opposed to adding Annex K, but I think we
> > should look into how invasive it would be, i.e. whether most/all
> > interfaces can just be wrappers for the non-bounds-checking versions
> > or whether major internal changes would be required to some existing
> > interfaces.
> 
> I implemented quite a lot of them for P99, so I don't think that there
> would be major problems. Many of them are just some if/else clauses
> that check the run time constraints.
> 
> There are some additional functionalities, though, so these would
> demand extra coding and objects, especially the run time constraint
> handling, but I think these are quite limited and wouldn't require
> much effort.

The requirements for printf_s, scanf_s, and related functions look
quite invasive and would affect programs not using these interfaces.
Otherwise, the Annex K interfaces look like a considerable amount of
bloat with highly questionable usefulness, but mostly non-invasive. My
feeling is that we should hold off on a decision about them to see if
any applications actually start using them.

Personally, I'd much rather see a libc-agnostic implementation of
_FORTIFY_SOURCE as a set of include files installed in their own
special directory which use #include_next to get the libc versions,
then #undef all the functions and #define them to "fortify" versions,
using purely GCC features rather than any hooks into libc. This would
actually aid in security for real-world applications.

> Then some interfaces are clearly different such that they can't simply
> be copied over, notably bsearch and qsort functions, since they
> receive additional arguments to provide context to the object
> comparison.

These are much easier; the extra argument can be passed via TLS. It's
printf_s and scanf_s that are hard.

> IIRC, what I couldn't handle within P99 was checking of printf
> arguments, but from within musl this should be relatively straight
> forward.

Not really. There would need to be a way to convey to the printf core
that it's supposed to do this extra checking, and a way to make it
call the constraint handlers.

Rich

P.S. One other reason I hate Annex K is that the constraint handler
design is non-thread-safe and non-library-safe. There's only one
global constraint handler, shared by all threads and by all
libraries/modules that might be using Annex K functions. That means
there's really no valid way to write code that depends on a particular
constraint handler being installed. And the default handler is
implementation-defined, so it wouldn't even be reasonable to say
"leave the default handler there". The only thing reasonable code
using these interfaces can expect when a constraint is violated is
implementation-defined behavior, which is only a tiny step up from
undefined behavior...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-07-04  1:28                                     ` Rich Felker
@ 2013-07-04  6:11                                       ` Jens Gustedt
  2013-07-04  6:37                                         ` Rich Felker
  0 siblings, 1 reply; 32+ messages in thread
From: Jens Gustedt @ 2013-07-04  6:11 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 4093 bytes --]

Hello Rich,

Am Mittwoch, den 03.07.2013, 21:28 -0400 schrieb Rich Felker:
> The requirements for printf_s, scanf_s, and related functions look
> quite invasive and would affect programs not using these interfaces.

unless one would finally implement them separately, of course

> Otherwise, the Annex K interfaces look like a considerable amount of
> bloat with highly questionable usefulness, but mostly non-invasive. My
> feeling is that we should hold off on a decision about them to see if
> any applications actually start using them.

If just some if conditionals are bloat for you, yes.
These conditionals could easily be tagged as likely/unlikely to
privilege the fastpath.

> > Then some interfaces are clearly different such that they can't simply
> > be copied over, notably bsearch and qsort functions, since they
> > receive additional arguments to provide context to the object
> > comparison.
> 
> These are much easier; the extra argument can be passed via TLS. It's
> printf_s and scanf_s that are hard.

Hm, I don't see how this can be done "easily", and in particular such
that there is no performance loss for qsort. I think for these
functions performance is important in any type of platform.

I once implemented such context propagation for POSIX' tsearch family
and it was a real pain, IRC.

> > IIRC, what I couldn't handle within P99 was checking of printf
> > arguments, but from within musl this should be relatively straight
> > forward.
> 
> Not really. There would need to be a way to convey to the printf core
> that it's supposed to do this extra checking, and a way to make it
> call the constraint handlers.

This you could e.g easily to with TLS :) I'd think that for printf and
friends this would be much less critical than for the sort
functions. To my understanding printf functions are IO bound (or
memory bound for sprintf), so just some switching on entry on some TLS
wouldn't be much of an overhead, I think.

> P.S. One other reason I hate Annex K is that the constraint handler
> design is non-thread-safe and non-library-safe.

that is certainly a good point

> There's only one
> global constraint handler, shared by all threads and by all
> libraries/modules that might be using Annex K functions. That means
> there's really no valid way to write code that depends on a particular
> constraint handler being installed.

This is just meant to be like this. These interfaces are meant to give
means to abort more or less gracefully if constraints as they are
described in that Annex occur. They are not meant to have complicated
games that let you "repair" faulty environments and continue
execution.

> And the default handler is
> implementation-defined, so it wouldn't even be reasonable to say
> "leave the default handler there". The only thing reasonable code
> using these interfaces can expect when a constraint is violated is
> implementation-defined behavior, which is only a tiny step up from
> undefined behavior...

You are too much a library implementor :) I think it is easy for an
application to install a different constraint handler (a standard one
or of its own) during startup in its main, before creating any other
thread. I see that as the principal use pattern for this, just straight
and simple.

In particular no library should expect any particular constraint
handler to be in place. It is up to the application to determine what
is to be done if a constraint occurs.

> My feeling is that we should hold off on a decision about them to
> see if any applications actually start using them.

I think we have a hen and egg problem, here. Nobody will use them if
nobody provides an implementation.

Jens

-- 
:: INRIA Nancy Grand Est :: http://www.loria.fr/~gustedt/   ::
:: AlGorille ::::::::::::::: office Nancy : +33 383593090   ::
:: ICube :::::::::::::: office Strasbourg : +33 368854536   ::
:: ::::::::::::::::::::::::::: gsm France : +33 651400183   ::
:: :::::::::::::::::::: gsm international : +49 15737185122 ::

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-07-04  6:11                                       ` Jens Gustedt
@ 2013-07-04  6:37                                         ` Rich Felker
  2013-07-04  7:11                                           ` Jens Gustedt
  0 siblings, 1 reply; 32+ messages in thread
From: Rich Felker @ 2013-07-04  6:37 UTC (permalink / raw)
  To: musl

On Thu, Jul 04, 2013 at 08:11:58AM +0200, Jens Gustedt wrote:
> Hello Rich,
> 
> Am Mittwoch, den 03.07.2013, 21:28 -0400 schrieb Rich Felker:
> > The requirements for printf_s, scanf_s, and related functions look
> > quite invasive and would affect programs not using these interfaces.
> 
> unless one would finally implement them separately, of course

Yes but that's a huge maintenance burden (duplicate functionality) and
while it's less bloat for individual static apps that don't use Annex
K, it's much more bloat for libc.a and libc.so.

> > Otherwise, the Annex K interfaces look like a considerable amount of
> > bloat with highly questionable usefulness, but mostly non-invasive. My
> > feeling is that we should hold off on a decision about them to see if
> > any applications actually start using them.
> 
> If just some if conditionals are bloat for you, yes.
> These conditionals could easily be tagged as likely/unlikely to
> privilege the fastpath.

No, I mean just the sheer volume of interfaces to add.

> > > Then some interfaces are clearly different such that they can't simply
> > > be copied over, notably bsearch and qsort functions, since they
> > > receive additional arguments to provide context to the object
> > > comparison.
> > 
> > These are much easier; the extra argument can be passed via TLS. It's
> > printf_s and scanf_s that are hard.
> 
> Hm, I don't see how this can be done "easily", and in particular such
> that there is no performance loss for qsort. I think for these
> functions performance is important in any type of platform.

qsort_s can store the comparison function and context in TLS, and then
pass to qsort a comparison function that grabs these from TLS and
calls the original comparison function with the context pointer. This
is valid assuming qsort does not run the comparisons in new threads.

> > > IIRC, what I couldn't handle within P99 was checking of printf
> > > arguments, but from within musl this should be relatively straight
> > > forward.
> > 
> > Not really. There would need to be a way to convey to the printf core
> > that it's supposed to do this extra checking, and a way to make it
> > call the constraint handlers.
> 
> This you could e.g easily to with TLS :) I'd think that for printf and
> friends this would be much less critical than for the sort
> functions. To my understanding printf functions are IO bound (or
> memory bound for sprintf), so just some switching on entry on some TLS
> wouldn't be much of an overhead, I think.

TLS is not guaranteed to exist when these functions are called;
programs not using any multi-threaded functionality are supposed to
"basically work" on Linux 2.4. I don't mind having the Annex K
functions depend on TLS, since only new programs will use them anyway,
but I don't want to break existing programs.

For fprintf_s and and fscanf_s, it would be possible to instead pass
the special mode info in the FILE structure. However this requires
re-implementing snprintf_s and sscanf_s on top of fprintf_s and
fscanf_s (i.e. duplicating the fake FILE setup), rather than just
implementing them on top of snprintf and sscanf. (v's omitted for
clarity, but obviously we're really talking about the v versions)

> > P.S. One other reason I hate Annex K is that the constraint handler
> > design is non-thread-safe and non-library-safe.
> 
> that is certainly a good point
> 
> > There's only one
> > global constraint handler, shared by all threads and by all
> > libraries/modules that might be using Annex K functions. That means
> > there's really no valid way to write code that depends on a particular
> > constraint handler being installed.
> 
> This is just meant to be like this. These interfaces are meant to give
> means to abort more or less gracefully if constraints as they are
> described in that Annex occur. They are not meant to have complicated
> games that let you "repair" faulty environments and continue
> execution.

What I was saying is that, in library code, you can't rely on this.
The application may have installed a handler that causes the functions
to just return an error, or the default implementation-defined handler
might do so.

> > And the default handler is
> > implementation-defined, so it wouldn't even be reasonable to say
> > "leave the default handler there". The only thing reasonable code
> > using these interfaces can expect when a constraint is violated is
> > implementation-defined behavior, which is only a tiny step up from
> > undefined behavior...
> 
> You are too much a library implementor :) I think it is easy for an
> application to install a different constraint handler (a standard one
> or of its own) during startup in its main, before creating any other
> thread. I see that as the principal use pattern for this, just straight
> and simple.
> 
> In particular no library should expect any particular constraint
> handler to be in place. It is up to the application to determine what
> is to be done if a constraint occurs.

Yes, I agree with your analysis here.

> > My feeling is that we should hold off on a decision about them to
> > see if any applications actually start using them.
> 
> I think we have a hen and egg problem, here. Nobody will use them if
> nobody provides an implementation.

You presume we would want people to use them. :) I don't. I think
they're very poorly designed interfaces that were crammed into the
standards process by their sponsor's clout rather than any technical
merit of existing practice. _FORTIFY_SOURCE solves pretty much the
same problems these functions were intended to solve, but does a much
better job since it doesn't rely on the application developer to
provide truthful information about object sizes, and instead gets the
compiler to do it.

Rich

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-07-04  6:37                                         ` Rich Felker
@ 2013-07-04  7:11                                           ` Jens Gustedt
  2013-07-04  8:12                                             ` Rich Felker
  0 siblings, 1 reply; 32+ messages in thread
From: Jens Gustedt @ 2013-07-04  7:11 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 3845 bytes --]

Am Donnerstag, den 04.07.2013, 02:37 -0400 schrieb Rich Felker:
> > unless one would finally implement them separately, of course
> 
> Yes but that's a huge maintenance burden (duplicate functionality) and
> while it's less bloat for individual static apps that don't use Annex
> K, it's much more bloat for libc.a and libc.so.

sure, I mentioned this merely for completeness, I don't think that
this is a viable option

> No, I mean just the sheer volume of interfaces to add.

The number of interfaces wouldn't scare me too much, and it would be
relatively easy to start supporting that incrementally.

> qsort_s can store the comparison function and context in TLS, and then
> pass to qsort a comparison function that grabs these from TLS and
> calls the original comparison function with the context pointer. This
> is valid assuming qsort does not run the comparisons in new threads.

sure, but for an execution of qsort_s this would have a lot of
indirections and a call to TLS for every comparison. For performance
sensible functions like this, this doesn't sound very attractive.

(In P99 I do that with inlining and gcc shows to be able to expand all
comparisons in place and to optimize that smoothly.)

> TLS is not guaranteed to exist when these functions are called;
> programs not using any multi-threaded functionality are supposed to
> "basically work" on Linux 2.4. I don't mind having the Annex K
> functions depend on TLS, since only new programs will use them anyway,
> but I don't want to break existing programs.

My guess would be that you can alias the TLS variable that would
control that behavior to a simple global variable in the case of
absence of threads.

> What I was saying is that, in library code, you can't rely on this.
> The application may have installed a handler that causes the functions
> to just return an error, or the default implementation-defined handler
> might do so.

sure, but I don't see any problem in this. continuing execution is
one of the permitted path that a constraint handler may take. these
are user interfaces, not meant to be used internally by the C library,
I think.

> > > My feeling is that we should hold off on a decision about them to
> > > see if any applications actually start using them.
> > 
> > I think we have a hen and egg problem, here. Nobody will use them if
> > nobody provides an implementation.
> 
> You presume we would want people to use them. :) I don't. I think
> they're very poorly designed interfaces that were crammed into the
> standards process by their sponsor's clout rather than any technical
> merit of existing practice.

AFAIK they had some existing practice in the MS world

I think there are some of these interfaces that are not too bad, from
a user perspective these interfaces are relatively simple to use.

Especially qsort_s is nice and I also see advantages in being able to
inhibit certain dangerous printf or scanf formats.

> _FORTIFY_SOURCE solves pretty much the same problems these functions
> were intended to solve, but does a much better job since it doesn't
> rely on the application developer to provide truthful information
> about object sizes, and instead gets the compiler to do it.

In C, applications *have* to have a consistent view of the size of
their buffers. So I don't see much of a burden, here. I started to use
them (through my P99 implementation) and I didn't find anything
remarkably heavy.

Jens

-- 
:: INRIA Nancy Grand Est :: http://www.loria.fr/~gustedt/   ::
:: AlGorille ::::::::::::::: office Nancy : +33 383593090   ::
:: ICube :::::::::::::: office Strasbourg : +33 368854536   ::
:: ::::::::::::::::::::::::::: gsm France : +33 651400183   ::
:: :::::::::::::::::::: gsm international : +49 15737185122 ::




[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-07-04  7:11                                           ` Jens Gustedt
@ 2013-07-04  8:12                                             ` Rich Felker
  2013-07-04  8:45                                               ` Jens Gustedt
  2013-07-04 11:10                                               ` Szabolcs Nagy
  0 siblings, 2 replies; 32+ messages in thread
From: Rich Felker @ 2013-07-04  8:12 UTC (permalink / raw)
  To: musl

On Thu, Jul 04, 2013 at 09:11:29AM +0200, Jens Gustedt wrote:
> > qsort_s can store the comparison function and context in TLS, and then
> > pass to qsort a comparison function that grabs these from TLS and
> > calls the original comparison function with the context pointer. This
> > is valid assuming qsort does not run the comparisons in new threads.
> 
> sure, but for an execution of qsort_s this would have a lot of
> indirections and a call to TLS for every comparison. For performance
> sensible functions like this, this doesn't sound very attractive.

If it's inside musl, the TLS dereference is very cheap on most archs:
it's just a constant offset from the thread pointer. Same if the code
were static linked in the main program. Otherwise, if it's a dynamic
library, then yes it would be fairly costly, like you say.

> (In P99 I do that with inlining and gcc shows to be able to expand all
> comparisons in place and to optimize that smoothly.)

Nice. I'll have to take a look -- I've always wanted to see a fully
inlined qsort that could be compared to the C++ template-based sorts
to demonstrate that inline functions in C can do just as good or
better, inlining the comparison callback... :)

> > TLS is not guaranteed to exist when these functions are called;
> > programs not using any multi-threaded functionality are supposed to
> > "basically work" on Linux 2.4. I don't mind having the Annex K
> > functions depend on TLS, since only new programs will use them anyway,
> > but I don't want to break existing programs.
> 
> My guess would be that you can alias the TLS variable that would
> control that behavior to a simple global variable in the case of
> absence of threads.

Yes, that can be done, but I'd probably rather just use the FILE...

> > What I was saying is that, in library code, you can't rely on this.
> > The application may have installed a handler that causes the functions
> > to just return an error, or the default implementation-defined handler
> > might do so.
> 
> sure, but I don't see any problem in this. continuing execution is
> one of the permitted path that a constraint handler may take. these
> are user interfaces, not meant to be used internally by the C library,
> I think.

I was thinking of third-party libraries that aim to be proper library
code, not use in the standard library.

> I think there are some of these interfaces that are not too bad, from
> a user perspective these interfaces are relatively simple to use.

I find the str/mem functions rather confusing, with their redundant
size arguments and all.

> Especially qsort_s is nice

I agree. IMO it's a shame it was done as part of Annex K and not the
base standard. Unlike most of Annex K, it serves a real purpose.

> and I also see advantages in being able to
> inhibit certain dangerous printf or scanf formats.

For printf, there's nothing dangerous about %n. This is a
misconception based on knee-jerk reactions to format string bugs. The
only thing that's dangerous is passing non-format-strings as the
format-string argument to printf.

For scanf, having size limits on strings to be read is useful. I was
under the mistaken impression that exceeding the limit was a runtime
constraint violation, which would have made scanf_s useless, but it's
specified to be a matching failure. Still, the same can be achieved
with plain scanf and a field width specifier. And if you need the
width to vary at runtime, you can generate the format string with
snprintf... So scanf_s buys you a little bit of convenience, but not
much more.

Rich

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-07-04  8:12                                             ` Rich Felker
@ 2013-07-04  8:45                                               ` Jens Gustedt
  2013-07-04 15:24                                                 ` Rich Felker
  2013-07-04 11:10                                               ` Szabolcs Nagy
  1 sibling, 1 reply; 32+ messages in thread
From: Jens Gustedt @ 2013-07-04  8:45 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 4505 bytes --]

Am Donnerstag, den 04.07.2013, 04:12 -0400 schrieb Rich Felker:
> > (In P99 I do that with inlining and gcc shows to be able to expand all
> > comparisons in place and to optimize that smoothly.)
> 
> Nice. I'll have to take a look -- I've always wanted to see a fully
> inlined qsort that could be compared to the C++ template-based sorts
> to demonstrate that inline functions in C can do just as good or
> better, inlining the comparison callback... :)

just to continue on my shameless self advertisement, I have written
this up on my blog

http://gustedt.wordpress.com/2012/12/04/inline-functions-as-good-as-templates/

> > > TLS is not guaranteed to exist when these functions are called;
> > > programs not using any multi-threaded functionality are supposed to
> > > "basically work" on Linux 2.4. I don't mind having the Annex K
> > > functions depend on TLS, since only new programs will use them anyway,
> > > but I don't want to break existing programs.
> > 
> > My guess would be that you can alias the TLS variable that would
> > control that behavior to a simple global variable in the case of
> > absence of threads.
> 
> Yes, that can be done, but I'd probably rather just use the FILE...

well there are certainly different strategies possible, and only a
close investigation of the existing code and the consequence of adding
(or not) new fields or global variables, would show the best way to do
this

the global variable approach would be independent of FILE and just
work for sprintf and friends without headache.

> > > What I was saying is that, in library code, you can't rely on this.
> > > The application may have installed a handler that causes the functions
> > > to just return an error, or the default implementation-defined handler
> > > might do so.
> > 
> > sure, but I don't see any problem in this. continuing execution is
> > one of the permitted path that a constraint handler may take. these
> > are user interfaces, not meant to be used internally by the C library,
> > I think.
> 
> I was thinking of third-party libraries that aim to be proper library
> code, not use in the standard library.

ok, but they also should not expect any particular behavior from the
handler(s), just use the interfaces, then.

> > I think there are some of these interfaces that are not too bad, from
> > a user perspective these interfaces are relatively simple to use.
> 
> I find the str/mem functions rather confusing, with their redundant
> size arguments and all.
> 
> > Especially qsort_s is nice
> 
> I agree. IMO it's a shame it was done as part of Annex K and not the
> base standard. Unlike most of Annex K, it serves a real purpose.

so one possibility, for a start would be to provide the interesting
interfaces as extensions. the access to these could probably be
regulated by the __STDC_WANT_LIB_EXT1__ macro. I think nothing forbids
to provide EXT1 partially, even if __STDC_LIB_EXT1__ is not set.

footnote 382 seem to point out that if  __STDC_WANT_LIB_EXT1__ is set
to something different from 0, we'd have the right to use the symbols
that are defined in K.

> > and I also see advantages in being able to
> > inhibit certain dangerous printf or scanf formats.
> 
> For printf, there's nothing dangerous about %n. This is a
> misconception based on knee-jerk reactions to format string bugs. The
> only thing that's dangerous is passing non-format-strings as the
> format-string argument to printf.

yes, probably, and the fact that there are still compilers arround
that don't check for consistency of the printf arguments,

> For scanf, having size limits on strings to be read is useful. I was
> under the mistaken impression that exceeding the limit was a runtime
> constraint violation, which would have made scanf_s useless, but it's
> specified to be a matching failure. Still, the same can be achieved
> with plain scanf and a field width specifier. And if you need the
> width to vary at runtime, you can generate the format string with
> snprintf... So scanf_s buys you a little bit of convenience, but not
> much more.


Jens


-- 
:: INRIA Nancy Grand Est :: http://www.loria.fr/~gustedt/   ::
:: AlGorille ::::::::::::::: office Nancy : +33 383593090   ::
:: ICube :::::::::::::: office Strasbourg : +33 368854536   ::
:: ::::::::::::::::::::::::::: gsm France : +33 651400183   ::
:: :::::::::::::::::::: gsm international : +49 15737185122 ::



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-07-04  8:12                                             ` Rich Felker
  2013-07-04  8:45                                               ` Jens Gustedt
@ 2013-07-04 11:10                                               ` Szabolcs Nagy
  2013-07-04 11:58                                                 ` Jens Gustedt
  2013-07-04 15:26                                                 ` Rich Felker
  1 sibling, 2 replies; 32+ messages in thread
From: Szabolcs Nagy @ 2013-07-04 11:10 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@aerifal.cx> [2013-07-04 04:12:45 -0400]:
> On Thu, Jul 04, 2013 at 09:11:29AM +0200, Jens Gustedt wrote:
> > > qsort_s can store the comparison function and context in TLS, and then
> > > pass to qsort a comparison function that grabs these from TLS and
> > > calls the original comparison function with the context pointer. This
> > > is valid assuming qsort does not run the comparisons in new threads.
> > 
> > sure, but for an execution of qsort_s this would have a lot of
> > indirections and a call to TLS for every comparison. For performance
> > sensible functions like this, this doesn't sound very attractive.
> 
> If it's inside musl, the TLS dereference is very cheap on most archs:
> it's just a constant offset from the thread pointer. Same if the code
> were static linked in the main program. Otherwise, if it's a dynamic
> library, then yes it would be fairly costly, like you say.

it seems to me that if a qsort_s call sets the tls and then before
the callee reads that pointer a signal interrupts with a handler that
calls qsort_s again then the tls is overwritten by another pointer

so you lose signal-safety with the tls design

> > (In P99 I do that with inlining and gcc shows to be able to expand all
> > comparisons in place and to optimize that smoothly.)
> 
> Nice. I'll have to take a look -- I've always wanted to see a fully
> inlined qsort that could be compared to the C++ template-based sorts
> to demonstrate that inline functions in C can do just as good or
> better, inlining the comparison callback... :)

* Jens Gustedt <jens.gustedt@inria.fr> [2013-07-04 10:45:47 +0200]:
> http://gustedt.wordpress.com/2012/12/04/inline-functions-as-good-as-templates/

good to know that this works now..

it's not clear from the article how the compiler knew that the
last arg for qsort_s is supposed to be passed to the comparision
function: was it lto+static linking or was the internal of
qsort_s visible in the same translation unit?

a few years ago i did similar experiments but those failed back then:
https://groups.google.com/d/msg/comp.lang.c/sO2oYdZGdd8/bqFZ4i81P7AJ

(P.S. google seems to broke its usenet archives, if anyone
knows a good one that works without javascript enabled i'm
interested, meanwhile you can just replace /d/ with /forum/print/
in the url and get a reasonable rendering without js)


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-07-04 11:10                                               ` Szabolcs Nagy
@ 2013-07-04 11:58                                                 ` Jens Gustedt
  2013-07-04 15:26                                                 ` Rich Felker
  1 sibling, 0 replies; 32+ messages in thread
From: Jens Gustedt @ 2013-07-04 11:58 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2099 bytes --]

Am Donnerstag, den 04.07.2013, 13:10 +0200 schrieb Szabolcs Nagy:
> it seems to me that if a qsort_s call sets the tls and then before
> the callee reads that pointer a signal interrupts with a handler that
> calls qsort_s again then the tls is overwritten by another pointer
> 
> so you lose signal-safety with the tls design

this would in fact not be good

in any case I don't think that the tls design is the way to go for
qsort_s, the overhead would be too large

> > > (In P99 I do that with inlining and gcc shows to be able to expand all
> > > comparisons in place and to optimize that smoothly.)
> > 
> > Nice. I'll have to take a look -- I've always wanted to see a fully
> > inlined qsort that could be compared to the C++ template-based sorts
> > to demonstrate that inline functions in C can do just as good or
> > better, inlining the comparison callback... :)
> 
> * Jens Gustedt <jens.gustedt@inria.fr> [2013-07-04 10:45:47 +0200]:
> > http://gustedt.wordpress.com/2012/12/04/inline-functions-as-good-as-templates/
> 
> good to know that this works now..
> 
> it's not clear from the article how the compiler knew that the
> last arg for qsort_s is supposed to be passed to the comparision
> function: was it lto+static linking or was the internal of
> qsort_s visible in the same translation unit?

it is visible. P99 is all inline, there isn't even a precompiled
libp99 to link to.

> a few years ago i did similar experiments but those failed back then:
> https://groups.google.com/d/msg/comp.lang.c/sO2oYdZGdd8/bqFZ4i81P7AJ

I see. So yes, the situation has substantially improved since
then. gcc and clang are both capable of doing such optimizations
nowadays, if they are presented with clean code.

Jens

-- 
:: INRIA Nancy Grand Est :: http://www.loria.fr/~gustedt/   ::
:: AlGorille ::::::::::::::: office Nancy : +33 383593090   ::
:: ICube :::::::::::::: office Strasbourg : +33 368854536   ::
:: ::::::::::::::::::::::::::: gsm France : +33 651400183   ::
:: :::::::::::::::::::: gsm international : +49 15737185122 ::



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-07-04  8:45                                               ` Jens Gustedt
@ 2013-07-04 15:24                                                 ` Rich Felker
  0 siblings, 0 replies; 32+ messages in thread
From: Rich Felker @ 2013-07-04 15:24 UTC (permalink / raw)
  To: musl

On Thu, Jul 04, 2013 at 10:45:47AM +0200, Jens Gustedt wrote:
> Am Donnerstag, den 04.07.2013, 04:12 -0400 schrieb Rich Felker:
> > > (In P99 I do that with inlining and gcc shows to be able to expand all
> > > comparisons in place and to optimize that smoothly.)
> > 
> > Nice. I'll have to take a look -- I've always wanted to see a fully
> > inlined qsort that could be compared to the C++ template-based sorts
> > to demonstrate that inline functions in C can do just as good or
> > better, inlining the comparison callback... :)
> 
> just to continue on my shameless self advertisement, I have written
> this up on my blog
> 
> http://gustedt.wordpress.com/2012/12/04/inline-functions-as-good-as-templates/

No need to apologize. :) Thanks for the link.

> > > > TLS is not guaranteed to exist when these functions are called;
> > > > programs not using any multi-threaded functionality are supposed to
> > > > "basically work" on Linux 2.4. I don't mind having the Annex K
> > > > functions depend on TLS, since only new programs will use them anyway,
> > > > but I don't want to break existing programs.
> > > 
> > > My guess would be that you can alias the TLS variable that would
> > > control that behavior to a simple global variable in the case of
> > > absence of threads.
> > 
> > Yes, that can be done, but I'd probably rather just use the FILE...
> 
> well there are certainly different strategies possible, and only a
> close investigation of the existing code and the consequence of adding
> (or not) new fields or global variables, would show the best way to do
> this
> 
> the global variable approach would be independent of FILE and just
> work for sprintf and friends without headache.

It would also possibly break the async-signal-safety of our printf,
unless handled very carefully. Generally the style/policy in musl is
to avoid any use of global variables or other global state that's not
absolutely required to implement something, and that's what's led us
to a library where almost everything is AS-safe.

> > > Especially qsort_s is nice
> > 
> > I agree. IMO it's a shame it was done as part of Annex K and not the
> > base standard. Unlike most of Annex K, it serves a real purpose.
> 
> so one possibility, for a start would be to provide the interesting
> interfaces as extensions. the access to these could probably be
> regulated by the __STDC_WANT_LIB_EXT1__ macro. I think nothing forbids
> to provide EXT1 partially, even if __STDC_LIB_EXT1__ is not set.

I think they violate the namespace if it's not set, but it's not
entirely clear to me whether that's the intent of the standard or not.
If they didn't, the whole idea of having to use a feature test macro
to request them does not make sense. The implementation would just
unconditionally provide them and define a macro reflecting whether
they're available.

It looks like the glibc plan is to make them available under
_GNU_SOURCE or __STDC_WANT_LIB_EXT1__, but not by default. So unless
major BSDs make them available by default, I'd have a hard time
justifying a decision to do so. It seems it would just result in
gratuitous portability problems (programs working on musl but failing
when built on other systems).

> footnote 382 seem to point out that if  __STDC_WANT_LIB_EXT1__ is set
> to something different from 0, we'd have the right to use the symbols
> that are defined in K.

Conforming applications will never set it to anything other than 1, so
I'm not sure how this would end up getting picked up.

On the other hand, as long as we don't define __STDC_LIB_EXT1__, there
is no guarantee to applications that __STDC_WANT_LIB_EXT1__ works or
does anything at all. So it seems like it me be conforming to
provide some, but not all, of these interfaces, or interfaces with the
same names but weaker functionality, when __STDC_WANT_LIB_EXT1__ is
defined, as long as we don't define __STDC_LIB_EXT1__. By "weaker
functionality", what I had in mind is simply crashing (a_crash) on
runtime constraint violations, not doing anything to block %n, etc.
But I'm skeptical of whether this would be a good idea.

> > For printf, there's nothing dangerous about %n. This is a
> > misconception based on knee-jerk reactions to format string bugs. The
> > only thing that's dangerous is passing non-format-strings as the
> > format-string argument to printf.
> 
> yes, probably, and the fact that there are still compilers arround
> that don't check for consistency of the printf arguments,

Well I'm assuming you've done minimal coverage testing. Naturally
there could be dangerous bugs in calls where the format string
statically fails to match the arguments, but this would be detected
during any sanity testing of the code in question.

Rich

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Use of size_t and ssize_t in mseek
  2013-07-04 11:10                                               ` Szabolcs Nagy
  2013-07-04 11:58                                                 ` Jens Gustedt
@ 2013-07-04 15:26                                                 ` Rich Felker
  1 sibling, 0 replies; 32+ messages in thread
From: Rich Felker @ 2013-07-04 15:26 UTC (permalink / raw)
  To: musl

On Thu, Jul 04, 2013 at 01:10:54PM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@aerifal.cx> [2013-07-04 04:12:45 -0400]:
> > On Thu, Jul 04, 2013 at 09:11:29AM +0200, Jens Gustedt wrote:
> > > > qsort_s can store the comparison function and context in TLS, and then
> > > > pass to qsort a comparison function that grabs these from TLS and
> > > > calls the original comparison function with the context pointer. This
> > > > is valid assuming qsort does not run the comparisons in new threads.
> > > 
> > > sure, but for an execution of qsort_s this would have a lot of
> > > indirections and a call to TLS for every comparison. For performance
> > > sensible functions like this, this doesn't sound very attractive.
> > 
> > If it's inside musl, the TLS dereference is very cheap on most archs:
> > it's just a constant offset from the thread pointer. Same if the code
> > were static linked in the main program. Otherwise, if it's a dynamic
> > library, then yes it would be fairly costly, like you say.
> 
> it seems to me that if a qsort_s call sets the tls and then before
> the callee reads that pointer a signal interrupts with a handler that
> calls qsort_s again then the tls is overwritten by another pointer
> 
> so you lose signal-safety with the tls design

As long as qsort_s saves the old value from TLS on its stack and
restores it before returning, you don't lose AS-safety except in the
case of a longjmp out of qsort_s. But I agree this is a negative
aspect of the possible design.

Rich


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2013-07-04 15:26 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-27  3:52 Use of size_t and ssize_t in mseek Matthew Fernandez
2013-06-27  4:10 ` Rich Felker
2013-06-27  4:16   ` Matthew Fernandez
2013-06-27  4:23     ` Rich Felker
2013-06-27  4:31       ` Matthew Fernandez
2013-06-27 15:34         ` Rich Felker
2013-06-28  0:49           ` Matthew Fernandez
2013-06-28  1:22             ` Rich Felker
2013-06-28  1:34               ` Matthew Fernandez
2013-06-28  1:48                 ` Rich Felker
2013-06-28  1:56                   ` Matthew Fernandez
2013-06-29  4:13                     ` Rich Felker
2013-06-29 13:38                       ` Matthew Fernandez
2013-06-29 14:17                         ` Rich Felker
2013-06-29 14:56                           ` Jens Gustedt
2013-06-29 15:48                             ` Rich Felker
2013-06-29 16:01                               ` Jens Gustedt
2013-06-29 16:13                                 ` Rich Felker
2013-06-29 16:39                                   ` Jens Gustedt
2013-07-04  1:28                                     ` Rich Felker
2013-07-04  6:11                                       ` Jens Gustedt
2013-07-04  6:37                                         ` Rich Felker
2013-07-04  7:11                                           ` Jens Gustedt
2013-07-04  8:12                                             ` Rich Felker
2013-07-04  8:45                                               ` Jens Gustedt
2013-07-04 15:24                                                 ` Rich Felker
2013-07-04 11:10                                               ` Szabolcs Nagy
2013-07-04 11:58                                                 ` Jens Gustedt
2013-07-04 15:26                                                 ` Rich Felker
2013-06-27 10:35       ` Szabolcs Nagy
2013-06-27 15:05         ` Rich Felker
2013-06-27 16:47       ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).