Hello,

I don't know if you guys noticed, but some time ago we voted some of
the ..._r functions from <time.h> into the C standard, just to then
discover that POSIX has deprecated the whole set of functions and
proposes to replace them by `strftime`.

One of the arguments for keeping them was that `asctime_r` does not need
access to locale and has a fixed format, and so can be implemented
with a much smaller footprint.

Looking into musl I found that the current implementation is basically
doing verbatim what the C standard says, namely uses `snprintf` under
the hood to do the formatting. This has the obvious disadvantage
that it drags the whole infrastructure that is needed for `snprintf`
into the executable.

Making some tests, I found that coding `asctime_r` straightforwardly
with byte-copying shaves off about 10k from the final executable.

Would it be interesting for musl to change to such an implementation?

Shall I prepare a patch to do so?

Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::
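For reference, the snprintf-based approach described above can be sketched roughly as follows. This is a minimal sketch, not musl's actual source: the function name `asctime_r_snprintf` is hypothetical, the format string is the one ISO C uses to specify `asctime`, and calling `abort()` when the result would not fit in 26 bytes is an assumption about how overflow is handled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical snprintf-based asctime_r; buf must hold 26 bytes. */
char *asctime_r_snprintf(const struct tm *tm, char *buf)
{
    static const char wday[7][3] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat",
    };
    static const char mon[12][3] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    };

    /* The name tables may only be indexed with in-range values;
       negative values become huge after the unsigned conversion. */
    if ((unsigned)tm->tm_wday >= 7 || (unsigned)tm->tm_mon >= 12)
        abort();

    /* "%.3s" reads at most 3 bytes, so the tables need no null
       terminators. The year is widened to long long so that adding
       1900 cannot overflow an int. */
    int n = snprintf(buf, 26, "%.3s %.3s%3d %.2d:%.2d:%.2d %lld\n",
                     wday[tm->tm_wday], mon[tm->tm_mon],
                     tm->tm_mday, tm->tm_hour, tm->tm_min, tm->tm_sec,
                     1900LL + tm->tm_year);

    /* n is the length the complete result needs, excluding the null;
       anything that does not fit in 26 bytes is treated as fatal. */
    if (n < 0 || n >= 26)
        abort();
    return buf;
}
```

Linking a function like this pulls in the complete printf machinery, which is exactly the size cost under discussion.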
On Sun, 23 Aug 2020, Jens Gustedt wrote:
> Looking into musl I found that the current implementation is basically
> doing verbatim what the C standard says, namely uses `snprintf` under
> the hood to do the formatting. This has the obvious disadvantage
> that it drags the whole infrastructure that is needed for `snprintf`
> into the executable.
>
> Making some tests, I found that coding `asctime_r` straightforwardly
> with byte-copying shaves off about 10k from the final executable.

Do I understand correctly that this 10k figure is for an "application"
that does not use stdio at all otherwise? If so, I believe that is a
quite unrealistic test: why would an application use asctime_r but
then avoid use of stdio to do something useful with the result?

Alexander
Alexander,

for simplicity I attach what I have.

On Sun, 23 Aug 2020 12:33:30 +0300 (MSK) you (Alexander Monakov <amonakov@ispras.ru>) wrote:

> Do I understand correctly that this 10k figure is for an "application"
> that does not use stdio at all otherwise?

It just uses unformatted IO, namely `puts`.

> If so, I believe that is a
> quite unrealistic test: why would an application use asctime_r but
> then avoid use of stdio to do something useful with the result?

Just dumping a timestamp to a file, for example.

Jens

[-- Attachment #1.2: test-asctime.c --]

#define _POSIX_C_SOURCE 200809L  /* declares gmtime_r and asctime_r */
#include <time.h>
#include <string.h>
#include <stdio.h>

/* Write val as exactly `places` zero-padded decimal digits. Returns
   nonzero if val does not fit; in that case buf[0] is set to '\0' so
   that the result string ends before the truncated field. */
static unsigned print_decimal(size_t places, char buf[places], unsigned val) {
  for (size_t pos = places; pos > 0; pos--) {
    buf[pos-1] = (val % 10) + '0';
    val /= 10;
  }
  if (val) buf[0] = '\0';
  return val;
}

char *asctime_r(const struct tm *tm, char *buf) {
  static char const wday[7][3] = {
    "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat",
  };
  static char const mon[12][3] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
  };
  /* Template for "Www Mmm dd hh:mm:ss yyyy\n": 25 characters plus the
     terminating null. Note the two spaces at offsets 7 and 8 (the
     "%3d" day field). Fields that are never filled in stay '\0', so
     the result is always a null-terminated string. */
  memcpy(buf, "\0\0\0 \0\0\0  \0 \0\0:\0\0:\0\0 \0\0\0\0\n", 26);
  /* Negative values become huge after conversion to unsigned. */
  if (tm->tm_wday >= 7u) goto CLEANUP;
  memcpy(buf, wday[tm->tm_wday], 3);
  if (tm->tm_mon >= 12u) goto CLEANUP;
  memcpy(buf+4, mon[tm->tm_mon], 3);
  if (tm->tm_mday < 10u) {
    if (print_decimal(1, buf+9, tm->tm_mday)) goto CLEANUP;
  } else {
    if (print_decimal(2, buf+8, tm->tm_mday)) goto CLEANUP;
  }
  if (print_decimal(2, buf+11, tm->tm_hour)) goto CLEANUP;
  if (print_decimal(2, buf+14, tm->tm_min)) goto CLEANUP;
  if (print_decimal(2, buf+17, tm->tm_sec)) goto CLEANUP;
  if (1900u+tm->tm_year < 1000u || print_decimal(4, buf+20, 1900u+tm->tm_year)) goto CLEANUP;
 CLEANUP:
  return buf;
}

int main(int argc, char* argv[argc+1]) {
  char buf[26];
  struct tm T;
  time_t t = time(0);
  gmtime_r(&t, &T);
  if (argc == 2) T.tm_mon = 13;  /* month too large */
  if (argc == 3) T.tm_mon = -6;  /* negative month */
  asctime_r(&T, buf);
  puts(buf);
  //puts(asctime(&T));
}
On Sun, 23 Aug 2020, Jens Gustedt wrote:
> > Do I understand correctly that this 10k figure is for an "application"
> > that does not use stdio at all otherwise?
>
> It just uses unformatted IO, namely `puts`.
I see, thanks. Indeed printf-style formatting can pull in a bit above 10KB
on top of what 'puts' alone needs. On x86-64, the top 5 new symbols by size (in bytes) are:
357 T vfprintf
464 r states
1823 r errmsg
2384 t printf_core
2930 t fmt_fp
Of those, errmsg would also be pulled in by 'strerror'.
Alexander
On Sun, Aug 23, 2020 at 10:24:39AM +0200, Jens Gustedt wrote:
> Hello,
> I don't know if you guys noticed, but some time ago we voted some of
> the ..._r functions from <time.h> into the C standard, just to then
> discover that POSIX has deprecated the whole set of functions and
> proposes to replace them by `strftime`.
>
> One of the arguments for keeping them was that `asctime_r` does not need
> access to locale and has a fixed format, and so can be implemented
> with a much smaller footprint.
>
> Looking into musl I found that the current implementation is basically
> doing verbatim what the C standard says, namely uses `snprintf` under
> the hood to do the formatting. This has the obvious disadvantage
> that it drags the whole infrastructure that is needed for `snprintf`
> into the executable.
>
> Making some tests, I found that coding `asctime_r` straightforwardly
> with byte-copying shaves off about 10k from the final
> executable.
>
> Would it be interesting for musl to change to such an implementation?
>
> Shall I prepare a patch to do so?
I'm not *strongly* opposed to this, but my reasoning is fairly much in
line with the POSIX side, that these interfaces are legacy/deprecated,
and in general musl practice is to choose maximum simplicity over
size/performance optimality for deprecated/legacy or junk interfaces.
In particular, asctime[_r] formats dates in a legacy US format,
whereas modern applications should be using either ISO date format or
a locale-specific format.
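As an illustration of the alternatives mentioned above, `strftime` can produce an ISO 8601 style timestamp directly, and the legacy layout itself can be written as a `strftime` format. A minimal sketch; the helper names are hypothetical, and the claim that the second format matches `asctime` output holds only in the "C" locale (in effect at program startup unless `setlocale` is called) and for in-range field values.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: ISO 8601 style, e.g. "1973-09-16T01:03:52".
   Returns the number of characters written, or 0 if it did not fit. */
size_t format_iso8601(char *buf, size_t size, const struct tm *tm)
{
    return strftime(buf, size, "%Y-%m-%dT%H:%M:%S", tm);
}

/* Hypothetical helper: the asctime layout as a strftime format.
   %e pads a one-digit day of the month with a space, like the "%3d"
   that follows the month name in the asctime format string. */
size_t format_asctime_like(char *buf, size_t size, const struct tm *tm)
{
    return strftime(buf, size, "%a %b %e %H:%M:%S %Y\n", tm);
}
```

For a locale-specific representation, `%c` after `setlocale(LC_TIME, "")` is the usual choice.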
Note that ISO C specifies asctime in terms of a particular printf
format string, meaning the results are well-defined for any values
that don't overflow the specified buffer, even if they are somewhat
nonsensical. In particular, as specified, it's required to accept
negative hours etc. if the year is such that it requires fewer than 4
digits. Coding this directly and ensuring that all the cases are
covered seems nontrivial. One might argue that this is stupid and that
asctime is neither intended nor useful for such inputs, and I'd tend to
agree, but it's not something I'd want to argue with someone who had a
fairly legitimate claim that it's allowed as specified, when there's
no really compelling reason not to just implement it as specified with
snprintf.
Rich
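To make the point about negative hours concrete: with the standard format string, `%.2d` prints an hour of -1 as `-01`, one character longer than usual, and a three-digit year gives that character back. The sketch below only wraps `snprintf` with that format; the function name `iso_c_asctime` and the sample values are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper around the ISO C asctime format string.
   Returns what snprintf returns: the length of the complete result,
   excluding the null, even if the output was truncated to size bytes. */
int iso_c_asctime(char *buf, size_t size, const char *wday,
                  const char *mon, int mday, int hour, int min,
                  int sec, long year)
{
    return snprintf(buf, size, "%.3s %.3s%3d %.2d:%.2d:%.2d %ld\n",
                    wday, mon, mday, hour, min, sec, year);
}
```

With a 26-byte buffer, an hour of -1 in the year 999 fits exactly (`"Sun Sep 16 -01:03:52 999\n"`, 25 characters plus the null), while the same fields in the year 1999 need 26 characters plus the null and would overflow an unchecked `sprintf` into a 26-byte buffer.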
Rich,

on Mon, 24 Aug 2020 12:14:00 -0400 you (Rich Felker <dalias@libc.org>)
wrote:

> I'm not *strongly* opposed to this, but my reasoning is fairly much in
> line with the POSIX side, that these interfaces are legacy/deprecated,
> and in general musl practice is to choose maximum simplicity over
> size/performance optimality for deprecated/legacy or junk interfaces.
>
> In particular, asctime[_r] formats dates in a legacy US format,
> whereas modern applications should be using either ISO date format or
> a locale-specific format.

But it is also a format that the language itself uses (or refers to)
for `__TIME__` and similar macros.

> Note that ISO C specifies asctime in terms of a particular printf
> format string, meaning the results are well-defined for any values
> that don't overflow the specified buffer, even if they are somewhat
> nonsensical.

I don't think so. The general rules for valid arguments to C library
functions always apply, so according to 7.1.4, calls to the functions
with values that are outside the specified ranges for the type have
UB.

In the <time.h> header the only exception from this rule seems to be
`mktime`, which makes such exceptions explicit and says how the
argument is normalized if it is not in the ranges as specified.

The sample code that I posted does range checks with simple means; it
never results in unbounded UB and always returns a string that is null
terminated. I would think that this is reasonable behavior.

Jens
On Thu, Aug 27, 2020 at 11:27:59AM +0200, Jens Gustedt wrote:
> Rich,
>
> on Mon, 24 Aug 2020 12:14:00 -0400 you (Rich Felker <dalias@libc.org>)
> wrote:
>
> > I'm not *strongly* opposed to this, but my reasoning is fairly much in
> > line with the POSIX side, that these interfaces are legacy/deprecated,
> > and in general musl practice is to choose maximum simplicity over
> > size/performance optimality for deprecated/legacy or junk interfaces.
> >
> > In particular, asctime[_r] formats dates in a legacy US format,
> > whereas modern applications should be using either ISO date format or
> > a locale-specific format.
>
> But it is also a format that the language itself uses (or refers to)
> for `__TIME__` and similar macros.

Yes, that doesn't indicate that it should continue to be used, though.
And in theory you can use __TIME__ just to parse and convert to a more
reasonable form.

> > Note that ISO C specifies asctime in terms of a particular printf
> > format string, meaning the results are well-defined for any values
> > that don't overflow the specified buffer, even if they are somewhat
> > nonsensical.
>
> I don't think so. The general rules for valid arguments to C library
> functions always apply, so according to 7.1.4, calls to the functions
> with values that are outside the specified ranges for the type have
> UB.

The range of the type is [INT_MIN,INT_MAX]. For tm_wday and tm_mon, UB
of out-of-normal-range values would be established just by omission of
any spec for what they do. However, you missed the actual text in
support of your claim, 7.27.3.1 ¶3: "If any of the members of the
broken-down time contain values that are outside their normal
ranges,323) the behavior of the asctime function is undefined."

Normal ranges are defined in 7.27.1 ¶4. So this removes my main
potential objection, and the remaining question is just whether this
is a size optimization that makes sense.
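A range check along the lines discussed here might look like the sketch below. The ranges follow the normal ranges ISO C lists for the `struct tm` members (with `tm_sec` allowing 60 for a leap second); restricting the year to 1000..9999 so that it fills exactly four digits is an assumption of this sketch, and the function name `tm_fits_asctime` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical check: true if every field asctime prints is within its
   normal range and the year has exactly four digits. The year test is
   done on tm_year itself so that adding 1900 cannot overflow. */
bool tm_fits_asctime(const struct tm *tm)
{
    return tm->tm_sec  >= 0 && tm->tm_sec  <= 60
        && tm->tm_min  >= 0 && tm->tm_min  <= 59
        && tm->tm_hour >= 0 && tm->tm_hour <= 23
        && tm->tm_mday >= 1 && tm->tm_mday <= 31
        && tm->tm_mon  >= 0 && tm->tm_mon  <= 11
        && tm->tm_wday >= 0 && tm->tm_wday <= 6
        && tm->tm_year >= 1000 - 1900 && tm->tm_year <= 9999 - 1900;
}
```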
> In the <time.h> header the only exception from this rule seems to be
> `mktime`, which makes such exceptions explicit and says how the
> argument is normalized if it is not in the ranges as specified.
>
> The sample code that I posted does range checks with simple means; it
> never results in unbounded UB and always returns a string that is null
> terminated. I would think that this is reasonable behavior.

I think the behavior of crashing on inputs that are UB and that can't
safely be printed should probably be preserved, too; I'm not clear if
you had that in mind already. I'm rather indifferent on what happens
for inputs that are UB but that can faithfully be presented in the
allotted space.

Rich
on Thu, 27 Aug 2020 10:03:07 -0400 you (Rich Felker <dalias@libc.org>)
wrote:

> I think the behavior of crashing on inputs that are UB and that can't
> safely be printed should probably be preserved, too; I'm not clear if
> you had that in mind already. I'm rather indifferent on what happens
> for inputs that are UB but that can faithfully be presented in the
> allotted space.

Same for me.

In the sample implementation I have "goto CLEANUP" and an implicit
guarantee that the output is always null terminated. This is more in
the spirit of `snprintf`: not to do bad things as long as the output
buffer has at least 26 bytes. But we could also do `abort()`,
`do_crash()`, whatever fits into musl's general strategy for error
handling.

Jens
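A variant of the byte-copying approach that crashes instead of truncating could be sketched as follows. The function names `asctime_r_abort` and `put_digits` are hypothetical, `abort()` stands in for whatever crash primitive musl would actually use, and for simplicity the sketch also rejects years before 1000, even though the standard format could print them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Write val as exactly `places` zero-padded digits; abort() if it
   needs more digits than that. */
static void put_digits(char *p, size_t places, unsigned val)
{
    for (size_t i = places; i > 0; i--) {
        p[i - 1] = (char)('0' + val % 10);
        val /= 10;
    }
    if (val)
        abort();
}

/* Hypothetical byte-copying asctime_r; buf must hold 26 bytes. */
char *asctime_r_abort(const struct tm *tm, char *buf)
{
    static const char wday[7][3] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat",
    };
    static const char mon[12][3] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    };

    /* Negative values become huge when converted to unsigned, so one
       comparison rejects both ends of the range. */
    if ((unsigned)tm->tm_wday >= 7 || (unsigned)tm->tm_mon >= 12)
        abort();
    if (tm->tm_year < 1000 - 1900 || tm->tm_year > 9999 - 1900)
        abort();

    /* Fixed layout, 25 characters plus the null; every placeholder
       letter is overwritten below. */
    memcpy(buf, "Www Mmm dd hh:mm:ss yyyy\n", 26);
    memcpy(buf, wday[tm->tm_wday], 3);
    memcpy(buf + 4, mon[tm->tm_mon], 3);

    /* "%3d" after the month: a one-digit day is padded with a space. */
    unsigned mday = (unsigned)tm->tm_mday;
    if (mday < 10) {
        buf[8] = ' ';
        put_digits(buf + 9, 1, mday);
    } else {
        put_digits(buf + 8, 2, mday);
    }
    put_digits(buf + 11, 2, (unsigned)tm->tm_hour);
    put_digits(buf + 14, 2, (unsigned)tm->tm_min);
    put_digits(buf + 17, 2, (unsigned)tm->tm_sec);
    put_digits(buf + 20, 4, (unsigned)(tm->tm_year + 1900));
    return buf;
}
```

Unlike the `goto CLEANUP` version, a caller never sees a partially filled buffer; the trade-off is that input that cannot be printed in the fixed layout terminates the program.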