* Bug in gets function?
From: Keyhan Vakil @ 2019-02-12 2:55 UTC
To: musl
Hi. It seems that the gets function does not follow the C99 spec. In
particular, if the input contains a null byte in the middle of the
input, then the new-line character is not discarded.
For reference, here's the relevant part in the C99 standard
(7.19.7.7):
> The gets function reads characters from the input stream pointed to
> by stdin, into the array pointed to by s, until end-of-file is
> encountered or a new-line character is read. Any new-line character
> is discarded, and a null character is written immediately after the
> last character read into the array.
Here is an example:
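    #include <stdio.h>
    char s[8];
    int main(void) {
        gets(s);  /* deliberately unsafe: this is the function under test */
        /* dump every byte of the buffer, including those after an embedded null */
        for (size_t i = 0; i < sizeof s; i++) {
            printf("%02x ", (unsigned char)s[i]);
        }
        printf("\n");
        return 0;
    }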
When compiled against gcc:
$ echo -e 'A\x00B' | ./a.out
41 00 42 00 00 00 00 00
When compiled against musl:
$ echo -e 'A\x00B' | ./a.out
41 00 42 0a 00 00 00 00
Note the terminating newline, which contradicts the spec.
Thanks,
Keyhan
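One plausible way to end up with this behaviour, offered purely as an assumption and not taken from musl's source: if gets is built on an fgets-style read and then locates the trailing newline with strlen, an embedded null byte makes strlen stop early, so the newline is never found. The helper name strip_newline_strlen below is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not musl code): strip a trailing newline by
 * inspecting the last byte before the first null byte. */
static void strip_newline_strlen(char *s)
{
    assert(s != NULL);
    size_t n = strlen(s);              /* stops at the FIRST null byte */
    if (n > 0 && s[n - 1] == '\n')
        s[n - 1] = '\0';
}
```

On a buffer holding the bytes 41 00 42 0a, strlen reports a length of 1, so the 0a is left in place. On a buffer holding 41 42 0a, the newline is removed as intended.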
* Re: Bug in gets function?
From: Rich Felker @ 2019-02-12 3:48 UTC
To: musl
On Mon, Feb 11, 2019 at 06:55:24PM -0800, Keyhan Vakil wrote:
> Note the terminating newline, which contradicts the spec.
I think this bug report is correct; however the gets function is awful, removed in C11, and should never be used. :-)
I will see what can be done to fix it though.
Rich
* Re: Bug in gets function?
From: Rich Felker @ 2019-02-12 3:51 UTC
To: musl
On Mon, Feb 11, 2019 at 10:48:38PM -0500, Rich Felker wrote:
> I will see what can be done to fix it though.
Is gets(s) equivalent to scanf("%[^\n]%*1[\n]",s)? If so that would be an appropriately hideous way to implement it that avoids the current bug? :-)
Rich
* Re: Bug in gets function?
From: James Larrowe @ 2019-02-12 14:41 UTC
To: musl
I could probably try patching it. That C99 specification seems descriptive enough.
* Re: Bug in gets function?
From: Ponnuvel Palaniyappan @ 2019-02-12 14:55 UTC
To: musl
> Is gets(s) equivalent to scanf("%[^\n]%*1[\n]",s)?
I think it has at least one minor issue: it doesn't null-terminate the buffer on empty input, i.e., just a newline as input.
Regards,
Ponnuvel
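To make the empty-input case concrete, here is a minimal sketch that runs the same format string through fscanf on an arbitrary stream. The wrapper name scan_line is hypothetical. When the next byte is a newline, %[^\n] cannot match anything, so the call returns 0 without storing into s, and the newline remains unread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: apply the proposed format to an arbitrary stream
 * so the behaviour can be observed without touching stdin. Like gets, it
 * places no bound on how much is written to s. */
static int scan_line(FILE *f, char *s)
{
    assert(f != NULL && s != NULL);
    /* %[^\n]  : one or more non-newline bytes, stored in s
     * %*1[\n] : then exactly one newline, read and discarded */
    return fscanf(f, "%[^\n]%*1[\n]", s);
}
```

A caller would have to check for a return value of 0, terminate the buffer itself, and consume the pending newline in a separate step.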
* Re: Bug in gets function?
From: Rich Felker @ 2019-02-12 16:30 UTC
To: musl
On Tue, Feb 12, 2019 at 02:55:19PM +0000, Ponnuvel Palaniyappan wrote:
> > Is gets(s) equivalent to scanf("%[^\n]%*1[\n]",s)?
>
> I think it has at least one minor issue: it doesn't null-terminate the
> buffer on empty input i.e., just a newline as input.
Indeed, I omitted what the logic for handling the return value of scanf would be. But it also seems more complicated than we might like. If input begins with a newline, it would also fail to consume the newline without an additional call, and the additional call would make the operation as a whole non-atomic with respect to the FILE lock, which is what I was trying to avoid.
Here's an alternate proposal via direct implementation:
char *gets(char *s)
{
	size_t i=0;
	int c;
	FLOCK(stdin);
	while ((c=getc_unlocked(stdin)) != EOF && c != '\n') s[i++] = c;
	s[i] = 0;
	if (c != '\n' && !feof(stdin)) s = 0;
	FUNLOCK(stdin);
	return s;
}
Does this look ok? Of course it's slow compared to a fgets-like operation on the buffer, but gets is not a usable interface and I don't see any reason to care whether it's fast.
Rich
* Re: Bug in gets function?
From: Alexey Izbyshev @ 2019-02-13 21:39 UTC
To: musl; +Cc: Rich Felker
On 2019-02-12 19:30, Rich Felker wrote:
> 	if (c != '\n' && !feof(stdin)) s = 0;
gets() must also return NULL if EOF is reached and no bytes were read.
Alexey
* Re: Bug in gets function?
From: Rich Felker @ 2019-02-13 22:13 UTC
To: musl
On Thu, Feb 14, 2019 at 12:39:07AM +0300, Alexey Izbyshev wrote:
> gets() must also return NULL if EOF is reached and no bytes were read.
So
	if (c != '\n' && (!feof(stdin) || !i))
?
Rich
* Re: Bug in gets function?
From: Alexey Izbyshev @ 2019-02-13 23:19 UTC
To: musl; +Cc: Rich Felker
On 2019-02-14 01:13, Rich Felker wrote:
> So if (c != '\n' && (!feof(stdin) || !i)) ?
Yes, looks good.
Alexey
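For anyone wanting to try the agreed condition outside of libc, here is a minimal sketch of the same loop with that check applied. It is an approximation under stated assumptions, not the code as committed: the name gets_sketch and the FILE * parameter are hypothetical (added so the function can be exercised on a temporary file), and the POSIX flockfile/funlockfile calls stand in for the FLOCK/FUNLOCK macros used in the proposal.

```c
#define _POSIX_C_SOURCE 200809L  /* for flockfile, getc_unlocked */
#include <assert.h>
#include <stdio.h>

/* Sketch of the loop discussed in the thread, reading from f instead of
 * stdin. Like gets, it cannot bound the write into s; the caller must
 * guarantee the buffer is large enough. */
static char *gets_sketch(char *s, FILE *f)
{
    assert(s != NULL && f != NULL);
    size_t i = 0;
    int c;
    flockfile(f);
    /* Copy bytes, embedded null bytes included, up to newline or EOF. */
    while ((c = getc_unlocked(f)) != EOF && c != '\n')
        s[i++] = (char)c;
    s[i] = '\0';
    /* Fail on a read error, or on end-of-file before any byte was read. */
    if (c != '\n' && (!feof(f) || i == 0))
        s = NULL;
    funlockfile(f);
    return s;
}
```

Fed the four bytes 41 00 42 0a, this stores 41 00 42 00 and returns s; called again at end-of-file, it returns NULL.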