mailing list of musl libc
* Bug in gets function?
@ 2019-02-12  2:55 Keyhan Vakil
  2019-02-12  3:48 ` Rich Felker
  0 siblings, 1 reply; 9+ messages in thread
From: Keyhan Vakil @ 2019-02-12  2:55 UTC
  To: musl

Hi. It seems that the gets function does not follow the C99 spec. In
particular, if the input line contains a null byte, the terminating
new-line character is stored in the array instead of being discarded.

For reference, here's the relevant part in the C99 standard
(7.19.7.7):

> The gets function reads characters from the input stream pointed to
> by stdin, into the array pointed to by s, until end-of-file is
> encountered or a new-line character is read. Any new-line character
> is discarded, and a null character is written immediately after the
> last character read into the array.

Here is an example:

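    #include <stdio.h>
    char s[8];  /* static storage, so it starts out zero-filled */
    int main(void) {
        gets(s);  /* unsafe by design; called only to show the bug */
        /* dump every byte, since s may hold an embedded null */
        for (size_t i = 0; i < sizeof s; i++) {
            printf("%02x ", (unsigned char)s[i]);
        }
        printf("\n");
        return 0;
    }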

When compiled against glibc:

    $ echo -e 'A\x00B' | ./a.out
    41 00 42 00 00 00 00 00

When compiled against musl:

    $ echo -e 'A\x00B' | ./a.out
    41 00 42 0a 00 00 00 00

Note the newline (0a) stored after the B, which contradicts the spec.

Thanks,
Keyhan
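
A hedged sketch of one way the output above can arise. This is an
assumption for illustration, not a statement about musl's actual source:
if a line is read with fgets and the trailing newline is then located
with strlen, an embedded null byte makes strlen stop early, so the
newline is never found. The helper name gets_via_strlen and its stream
and size parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets, then try to strip the
 * trailing newline by locating the end of the string with strlen. */
char *gets_via_strlen(FILE *f, char *s, int size)
{
	assert(f && s && size > 0);
	char *ret = fgets(s, size, f);
	if (ret) {
		/* strlen stops at the first null byte, which may be an
		 * embedded one rather than the terminator fgets wrote. */
		size_t len = strlen(s);
		if (len && s[len-1] == '\n') s[len-1] = 0;
	}
	return ret;
}
```

With the input bytes 'A', 0, 'B', '\n', strlen reports 1, the byte
checked is 'A', and the newline stays in the array after the 'B'.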



* Re: Bug in gets function?
  2019-02-12  2:55 Bug in gets function? Keyhan Vakil
@ 2019-02-12  3:48 ` Rich Felker
  2019-02-12  3:51   ` Rich Felker
  0 siblings, 1 reply; 9+ messages in thread
From: Rich Felker @ 2019-02-12  3:48 UTC
  To: musl

On Mon, Feb 11, 2019 at 06:55:24PM -0800, Keyhan Vakil wrote:
> Hi. It seems that the gets function does not follow the C99 spec. In
> particular, if the input contains a null byte in the middle of the
> input, then the new-line character is not discarded.
> 
> [...]

I think this bug report is correct; however the gets function is
awful, removed in C11, and should never be used. :-)

I will see what can be done to fix it though.

Rich
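
For readers who want the bounded alternative, here is a minimal sketch
built on fgets, which takes the array size and so cannot overrun it. The
helper name read_line and its parameters are hypothetical, and a line
longer than the buffer is simply returned in pieces by successive calls.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded replacement for gets: reads at most size-1 bytes
 * of a line from f into s and drops the trailing newline, if any. */
char *read_line(FILE *f, char *s, size_t size)
{
	assert(f && s && size > 0 && size <= INT_MAX);
	if (!fgets(s, (int)size, f)) return NULL;
	/* strcspn returns the index of the first '\n', or the string
	 * length if there is none, so this write is always in bounds. */
	s[strcspn(s, "\n")] = 0;
	return s;
}
```

Note that this sketch locates the newline with a string function, so it
still treats an embedded null byte as the end of the line.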



* Re: Bug in gets function?
  2019-02-12  3:48 ` Rich Felker
@ 2019-02-12  3:51   ` Rich Felker
  2019-02-12 14:41     ` James Larrowe
  0 siblings, 1 reply; 9+ messages in thread
From: Rich Felker @ 2019-02-12  3:51 UTC
  To: musl

On Mon, Feb 11, 2019 at 10:48:38PM -0500, Rich Felker wrote:
> I think this bug report is correct; however the gets function is
> awful, removed in C11, and should never be used. :-)
> 
> I will see what can be done to fix it though.

Is gets(s) equivalent to scanf("%[^\n]%*1[\n]",s)? If so that would be
an appropriately hideous way to implement it that avoids the current
bug? :-)

Rich



* Re: Bug in gets function?
  2019-02-12  3:51   ` Rich Felker
@ 2019-02-12 14:41     ` James Larrowe
  2019-02-12 14:55       ` Ponnuvel Palaniyappan
  0 siblings, 1 reply; 9+ messages in thread
From: James Larrowe @ 2019-02-12 14:41 UTC
  To: musl

I could probably try patching it. That C99 specification seems descriptive
enough.


* Re: Bug in gets function?
  2019-02-12 14:41     ` James Larrowe
@ 2019-02-12 14:55       ` Ponnuvel Palaniyappan
  2019-02-12 16:30         ` Rich Felker
  0 siblings, 1 reply; 9+ messages in thread
From: Ponnuvel Palaniyappan @ 2019-02-12 14:55 UTC
  To: musl

>   Is gets(s) equivalent to scanf("%[^\n]%*1[\n]",s)?

I think it has at least one minor issue: it doesn't null-terminate the
buffer on empty input, i.e., when the input is just a newline.

Regards,
Ponnuvel


* Re: Bug in gets function?
  2019-02-12 14:55       ` Ponnuvel Palaniyappan
@ 2019-02-12 16:30         ` Rich Felker
  2019-02-13 21:39           ` Alexey Izbyshev
  0 siblings, 1 reply; 9+ messages in thread
From: Rich Felker @ 2019-02-12 16:30 UTC
  To: musl

On Tue, Feb 12, 2019 at 02:55:19PM +0000, Ponnuvel Palaniyappan wrote:
> >   Is gets(s) equivalent to scanf("%[^\n]%*1[\n]",s)?
> 
> I think it has at least one minor issue: it doesn't null-terminate the
> buffer on empty input i.e., just a newline as input.

Indeed, I omitted the logic for handling the return value of scanf.
But the complete version also seems more complicated than we might like.
If input begins with a newline, it would also fail to consume the
newline without an additional call, and the additional call would make
the operation as a whole non-atomic with respect to the FILE lock,
which is what I was trying to avoid.
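
A rough sketch of what the scanf-based approach looks like once the
return value is handled, to make the two problems above concrete. The
helper name gets_via_scanf and its stream parameter are hypothetical,
and fscanf stands in for scanf so the code can be run on any stream.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: scanf-style line read with return-value handling. */
char *gets_via_scanf(FILE *f, char *s)
{
	assert(f && s);
	int r = fscanf(f, "%[^\n]%*1[\n]", s);
	if (r == EOF) return NULL;  /* EOF or read error before any input */
	if (r == 0) {
		/* The line was empty: %[^\n] matched nothing, so the
		 * buffer was not terminated and the newline is still
		 * pending in the stream. */
		s[0] = 0;
		/* A second, separate stdio call is needed to consume
		 * it; another thread could read from f in between. */
		if (getc(f) == EOF) return NULL;
	}
	return s;
}
```

The extra getc after a zero return is the additional call mentioned
above; each stdio call locks the stream on its own, so the pair is not
one atomic operation.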

Here's an alternate proposal via direct implementation:

char *gets(char *s)
{
	size_t i=0;
	int c;
	FLOCK(stdin);
	while ((c=getc_unlocked(stdin)) != EOF && c != '\n') s[i++] = c;
	s[i] = 0;
	if (c != '\n' && !feof(stdin)) s = 0;
	FUNLOCK(stdin);
	return s;
}

Does this look ok? Of course it's slow compared to a fgets-like
operation on the buffer, but gets is not a usable interface and I
don't see any reason to care whether it's fast.

Rich


* Re: Bug in gets function?
  2019-02-12 16:30         ` Rich Felker
@ 2019-02-13 21:39           ` Alexey Izbyshev
  2019-02-13 22:13             ` Rich Felker
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Izbyshev @ 2019-02-13 21:39 UTC
  To: musl; +Cc: Rich Felker, Rich Felker

On 2019-02-12 19:30, Rich Felker wrote:
> Here's an alternate proposal via direct implementation:
> 
> char *gets(char *s)
> {
> 	size_t i=0;
> 	int c;
> 	FLOCK(stdin);
> 	while ((c=getc_unlocked(stdin)) != EOF && c != '\n') s[i++] = c;
> 	s[i] = 0;
> 	if (c != '\n' && !feof(stdin)) s = 0;
> 	FUNLOCK(stdin);
> 	return s;
> }
> 
> Does this look ok? Of course it's slow compared to a fgets-like
> operation on the buffer, but gets is not a usable interface and I
> don't see any reason to care whether it's fast.
> 
gets() must also return NULL if EOF is reached and no bytes were read.

Alexey




* Re: Bug in gets function?
  2019-02-13 21:39           ` Alexey Izbyshev
@ 2019-02-13 22:13             ` Rich Felker
  2019-02-13 23:19               ` Alexey Izbyshev
  0 siblings, 1 reply; 9+ messages in thread
From: Rich Felker @ 2019-02-13 22:13 UTC
  To: musl

On Thu, Feb 14, 2019 at 12:39:07AM +0300, Alexey Izbyshev wrote:
> On 2019-02-12 19:30, Rich Felker wrote:
> >Here's an alternate proposal via direct implementation:
> >
> >char *gets(char *s)
> >{
> >	size_t i=0;
> >	int c;
> >	FLOCK(stdin);
> >	while ((c=getc_unlocked(stdin)) != EOF && c != '\n') s[i++] = c;
> >	s[i] = 0;
> >	if (c != '\n' && !feof(stdin)) s = 0;
> >	FUNLOCK(stdin);
> >	return s;
> >}
> >
> >Does this look ok? Of course it's slow compared to a fgets-like
> >operation on the buffer, but gets is not a usable interface and I
> >don't see any reason to care whether it's fast.
> >
> gets() must also return NULL if EOF is reached and no bytes were read.

So if (c != '\n' && (!feof(stdin) || !i)) ?

Rich



* Re: Bug in gets function?
  2019-02-13 22:13             ` Rich Felker
@ 2019-02-13 23:19               ` Alexey Izbyshev
  0 siblings, 0 replies; 9+ messages in thread
From: Alexey Izbyshev @ 2019-02-13 23:19 UTC
  To: musl; +Cc: Rich Felker

On 2019-02-14 01:13, Rich Felker wrote:
> On Thu, Feb 14, 2019 at 12:39:07AM +0300, Alexey Izbyshev wrote:
>> On 2019-02-12 19:30, Rich Felker wrote:
>> >Here's an alternate proposal via direct implementation:
>> >
>> >char *gets(char *s)
>> >{
>> >	size_t i=0;
>> >	int c;
>> >	FLOCK(stdin);
>> >	while ((c=getc_unlocked(stdin)) != EOF && c != '\n') s[i++] = c;
>> >	s[i] = 0;
>> >	if (c != '\n' && !feof(stdin)) s = 0;
>> >	FUNLOCK(stdin);
>> >	return s;
>> >}
>> >
>> >Does this look ok? Of course it's slow compared to a fgets-like
>> >operation on the buffer, but gets is not a usable interface and I
>> >don't see any reason to care whether it's fast.
>> >
>> gets() must also return NULL if EOF is reached and no bytes were read.
> 
> So if (c != '\n' && (!feof(stdin) || !i)) ?
> 
Yes, looks good.

Alexey
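
Putting the pieces of this thread together, the direct implementation
with the corrected return condition might look like the sketch below.
It is not the committed musl code: the name gets_sketch and the stream
parameter are hypothetical, added so the function can be tried on any
stream, and the POSIX flockfile/funlockfile calls are swapped in for
musl's internal FLOCK/FUNLOCK macros.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for gets, reading from f instead of stdin. */
char *gets_sketch(FILE *f, char *s)
{
	size_t i = 0;
	int c;

	assert(f && s);
	flockfile(f);  /* hold the stream lock for the whole line */
	/* Copy bytes, embedded nulls included, up to newline or EOF. */
	while ((c = getc_unlocked(f)) != EOF && c != '\n') s[i++] = c;
	s[i] = 0;  /* the newline, if any, was consumed but not stored */
	/* Fail on a read error, or on EOF with no bytes read. */
	if (c != '\n' && (!feof(f) || !i)) s = NULL;
	funlockfile(f);
	return s;
}
```

Like gets itself, this sketch has no way to know the size of s and
will overrun it on a long line.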




Thread overview: 9+ messages
2019-02-12  2:55 Bug in gets function? Keyhan Vakil
2019-02-12  3:48 ` Rich Felker
2019-02-12  3:51   ` Rich Felker
2019-02-12 14:41     ` James Larrowe
2019-02-12 14:55       ` Ponnuvel Palaniyappan
2019-02-12 16:30         ` Rich Felker
2019-02-13 21:39           ` Alexey Izbyshev
2019-02-13 22:13             ` Rich Felker
2019-02-13 23:19               ` Alexey Izbyshev
