* [musl] direct coding of asctime_r
From: Jens Gustedt @ 2020-08-23  8:24 UTC
To: musl

Hello,
I don't know if you guys noticed, but some time ago we voted some of
the ..._r functions from <time.h> into the C standard, just to then
discover that POSIX has deprecated the whole set of functions and
proposes to replace them by `strftime`.

One of the arguments to keep them was that `asctime_r` does not need
access to locale and has a fixed format, and so can be implemented
with a much smaller footprint.

Looking into musl I found that the current implementation is basically
doing verbatim what the C standard says, namely it uses `snprintf` under
the hood to do the formatting. This obviously has the disadvantage
that it drags the whole infrastructure that is needed for `snprintf`
into the executable.

Making some tests, I found that coding `asctime_r` straightforwardly
with byte-copying shaves about 10k off the final executable.

Would it be interesting for musl to change to such an implementation?

Shall I prepare a patch to do so?

Jens

--
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::
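For readers who have not seen it, the `snprintf`-based approach described in the message above can be sketched roughly as follows. The format string is the one ISO C gives in its reference algorithm for `asctime`; the function name `asctime_r_snprintf` and the choice to return NULL when the output does not fit are assumptions made for this illustration, not musl's actual code.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical name. Formats tm into buf using the printf format that
 * ISO C uses to describe asctime. The caller must pass tm_wday and
 * tm_mon in their normal ranges, since they index the name tables. */
char *asctime_r_snprintf(const struct tm *tm, char buf[26])
{
    static const char wday_name[7][4] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    };
    static const char mon_name[12][4] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    int n = snprintf(buf, 26, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
                     wday_name[tm->tm_wday], mon_name[tm->tm_mon],
                     tm->tm_mday, tm->tm_hour, tm->tm_min, tm->tm_sec,
                     1900 + tm->tm_year);
    /* Assumption for this sketch: report output that would not fit
     * in 26 bytes by returning NULL. */
    return (n < 0 || n >= 26) ? NULL : buf;
}
```

Linking this pulls in the formatted-output machinery behind `snprintf`, which is the footprint cost the message is about.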
* Re: [musl] direct coding of asctime_r
From: Alexander Monakov @ 2020-08-23  9:33 UTC
To: musl

On Sun, 23 Aug 2020, Jens Gustedt wrote:

> Looking into musl I found that the current implementation is basically
> doing verbatim what the C standard says, namely it uses `snprintf` under
> the hood to do the formatting. This obviously has the disadvantage
> that it drags the whole infrastructure that is needed for `snprintf`
> into the executable.
>
> Making some tests, I found that coding `asctime_r` straightforwardly
> with byte-copying shaves about 10k off the final executable.

Do I understand correctly that this 10k figure is for an "application"
that does not use stdio at all otherwise? If so, I believe that is a
quite unrealistic test - why would an application use asctime_r but
then avoid use of stdio to do something useful with the result?

Alexander
* Re: [musl] direct coding of asctime_r
From: Jens Gustedt @ 2020-08-23  9:56 UTC
To: musl

Alexander,
for simplicity I attach what I have.

on Sun, 23 Aug 2020 12:33:30 +0300 (MSK) you (Alexander Monakov
<amonakov@ispras.ru>) wrote:

> Do I understand correctly that this 10k figure is for an "application"
> that does not use stdio at all otherwise?

It just uses unformatted IO, namely `puts`.

> If so, I believe that is a
> quite unrealistic test - why would an application use asctime_r but
> then avoid use of stdio to do something useful with the result?

Just dumping a time stamp to a file, e.g.

Jens

[-- Attachment: test-asctime.c --]

#include <time.h>
#include <string.h>
#include <stdio.h>

/* Write the low `places` decimal digits of val into buf; a nonzero
   return means val did not fit. */
static unsigned print_decimal(size_t places, char buf[places], unsigned val) {
  for (size_t pos = places; pos > 0; pos--) {
    buf[pos-1] = (val % 10) + '0';
    val /= 10;
  }
  return val;
}

char *asctime_r(const struct tm *tm, char *buf) {
  static char const wday[7][3] = {
    "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat",
  };
  static char const mon[12][3] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
  };
  /* Template with separators in place and null bytes where the fields go,
     so an early exit leaves a terminated (truncated) string. buf[7] and
     buf[8] are both spaces; buf[8] pads a one-digit day. */
  memcpy(buf, "\0\0\0 \0\0\0  \0 \0\0:\0\0:\0\0 \0\0\0\0\n", 26);
  /* Unsigned comparison also rejects negative values. */
  if (tm->tm_wday >= 7u) goto CLEANUP;
  memcpy(buf, wday[tm->tm_wday], 3);
  if (tm->tm_mon >= 12u) goto CLEANUP;
  memcpy(buf+4, mon[tm->tm_mon], 3);
  if (tm->tm_mday < 10u) {
    if (print_decimal(1, buf+9, tm->tm_mday)) goto CLEANUP;
  } else {
    if (print_decimal(2, buf+8, tm->tm_mday)) goto CLEANUP;
  }
  if (print_decimal(2, buf+11, tm->tm_hour)) goto CLEANUP;
  if (print_decimal(2, buf+14, tm->tm_min)) goto CLEANUP;
  if (print_decimal(2, buf+17, tm->tm_sec)) goto CLEANUP;
  if (1900u+tm->tm_year < 1000u
      || print_decimal(4, buf+20, 1900u+tm->tm_year)) goto CLEANUP;
 CLEANUP:
  return buf;
}

int main(int argc, char* argv[argc+1]) {
  char buf[26];
  struct tm T;
  time_t t = time(0);
  gmtime_r(&t, &T);
  /* Extra command-line arguments force an out-of-range month. */
  if (argc == 2) T.tm_mon = 13;
  if (argc == 3) T.tm_mon = -6;
  asctime_r(&T, buf);
  puts(buf);
  //puts(asctime(&T));
}
* Re: [musl] direct coding of asctime_r
From: Alexander Monakov @ 2020-08-23 11:08 UTC
To: musl

On Sun, 23 Aug 2020, Jens Gustedt wrote:

> > Do I understand correctly that this 10k figure is for an "application"
> > that does not use stdio at all otherwise?
>
> It just uses unformatted IO, namely `puts`.

I see, thanks. Indeed printf-style formatting can pull in a bit above 10KB
on top of what 'puts' alone needs. On x86-64, top 5 new symbols by size are:

     357 T vfprintf
     464 r states
    1823 r errmsg
    2384 t printf_core
    2930 t fmt_fp

Of those, errmsg would also be pulled in by 'strerror'.

Alexander
* Re: [musl] direct coding of asctime_r
From: Rich Felker @ 2020-08-24 16:14 UTC
To: musl

On Sun, Aug 23, 2020 at 10:24:39AM +0200, Jens Gustedt wrote:
> Hello,
> I don't know if you guys noticed, but some time ago we voted some of
> the ..._r functions from <time.h> into the C standard, just to then
> discover that POSIX has deprecated the whole set of functions and
> proposes to replace them by `strftime`.
>
> One of the arguments to keep them was that `asctime_r` does not need
> access to locale and has a fixed format, and so can be implemented
> with a much smaller footprint.
>
> Looking into musl I found that the current implementation is basically
> doing verbatim what the C standard says, namely it uses `snprintf` under
> the hood to do the formatting. This obviously has the disadvantage
> that it drags the whole infrastructure that is needed for `snprintf`
> into the executable.
>
> Making some tests, I found that coding `asctime_r` straightforwardly
> with byte-copying shaves about 10k off the final executable.
>
> Would it be interesting for musl to change to such an implementation?
>
> Shall I prepare a patch to do so?

I'm not *strongly* opposed to this, but my reasoning is fairly much in
line with the POSIX side, that these interfaces are legacy/deprecated,
and in general musl practice is to choose maximum simplicity over
size/performance optimality for deprecated/legacy or junk interfaces.

In particular, asctime[_r] formats dates in a legacy US format,
whereas modern applications should be using either ISO date format or
a locale-specific format.

Note that ISO C specifies asctime in terms of a particular printf
format string, meaning the results are well-defined for any values
that don't overflow the specified buffer, even if they are somewhat
nonsensical. In particular, as specified, it's required to accept
negative hours etc. if the year is such that it requires fewer than 4
digits. Direct coding this and ensuring all the cases are covered
seems nontrivial.

One might argue that this is stupid and asctime is not intended to be
used in this way or useful to use in this way, and I'd tend to agree,
but it's not something I'd want to argue with someone who had a fairly
legitimate claim that it's allowable as-specified when there's no real
compelling reason not to just implement it as-specified with snprintf.

Rich
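As an illustration of the "ISO date format" alternative mentioned above, a minimal sketch using `strftime` could look like this. The helper name `format_iso8601_utc` and the trailing `Z` (which assumes the broken-down time is in UTC, e.g. from `gmtime_r`) are choices made for this example only.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper. Formats a broken-down UTC time in the form
 * 1973-09-16T01:03:52Z using only numeric conversions. Returns the
 * number of bytes written, not counting the terminator, or 0 if the
 * result does not fit in `size` bytes. */
size_t format_iso8601_utc(const struct tm *tm, char *buf, size_t size)
{
    return strftime(buf, size, "%Y-%m-%dT%H:%M:%SZ", tm);
}
```

A caller would typically pass a buffer of at least 21 bytes for four-digit years and treat a return value of 0 as "did not fit".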
* Re: [musl] direct coding of asctime_r
From: Jens Gustedt @ 2020-08-27  9:27 UTC
To: musl

Rich,

on Mon, 24 Aug 2020 12:14:00 -0400 you (Rich Felker <dalias@libc.org>)
wrote:

> I'm not *strongly* opposed to this, but my reasoning is fairly much in
> line with the POSIX side, that these interfaces are legacy/deprecated,
> and in general musl practice is to choose maximum simplicity over
> size/performance optimality for deprecated/legacy or junk interfaces.
>
> In particular, asctime[_r] formats dates in a legacy US format,
> whereas modern applications should be using either ISO date format or
> a locale-specific format.

But which is also a format used by the language itself (or referred to)
by `__TIME__` and similar macros.

> Note that ISO C specifies asctime in terms of a particular printf
> format string, meaning the results are well-defined for any values
> that don't overflow the specified buffer, even if they are somewhat
> nonsensical.

I don't think so. The general rules for valid arguments to C library
functions always apply, so according to 7.1.4 calls to the functions
with values that are outside the specified ranges for the type have
UB.

In the <time.h> header the only exception from this rule seems to be
`mktime`, which makes such exceptions explicit and says how the
argument is normalized if it is not in the ranges as specified.

The sample code that I posted does range checks with simple means that
never result in unbounded UB and always returns a string that is null
terminated. I would think that this is reasonable behavior.

Jens
* Re: [musl] direct coding of asctime_r
From: Rich Felker @ 2020-08-27 14:03 UTC
To: musl

On Thu, Aug 27, 2020 at 11:27:59AM +0200, Jens Gustedt wrote:
> Rich,
>
> on Mon, 24 Aug 2020 12:14:00 -0400 you (Rich Felker <dalias@libc.org>)
> wrote:
>
> > I'm not *strongly* opposed to this, but my reasoning is fairly much in
> > line with the POSIX side, that these interfaces are legacy/deprecated,
> > and in general musl practice is to choose maximum simplicity over
> > size/performance optimality for deprecated/legacy or junk interfaces.
> >
> > In particular, asctime[_r] formats dates in a legacy US format,
> > whereas modern applications should be using either ISO date format or
> > a locale-specific format.
>
> But which is also a format used by the language itself (or referred to)
> by `__TIME__` and similar macros.

Yes, that doesn't indicate that it should continue to be used, though.
And in theory you can use __TIME__ just to parse and convert to a more
reasonable form.

> > Note that ISO C specifies asctime in terms of a particular printf
> > format string, meaning the results are well-defined for any values
> > that don't overflow the specified buffer, even if they are somewhat
> > nonsensical.
>
> I don't think so. The general rules for valid arguments to C library
> functions always apply, so according to 7.1.4 calls to the functions
> with values that are outside the specified ranges for the type have
> UB.

The range of the type is [INT_MIN,INT_MAX]. For tm_wday and tm_mon, UB
of out-of-normal-range values would be established just by omission of
any spec for what they do. However you missed the actual text in
support of your claim, 7.27.3.1 ¶3:

    "If any of the members of the broken-down time contain values
    that are outside their normal ranges (footnote 323), the behavior
    of the asctime function is undefined."

Normal ranges are defined in 7.27.1 ¶4.

So this removes my main potential objection and the remaining question
is just whether this is a size optimization that makes sense.

> In the <time.h> header the only exception from this rule seems to be
> `mktime`, which makes such exceptions explicit and says how the
> argument is normalized if it is not in the ranges as specified.
>
> The sample code that I posted does range checks with simple means that
> never result in unbounded UB and always returns a string that is null
> terminated. I would think that this is reasonable behavior.

I think the behavior of crashing on inputs that are UB and that can't
safely be printed should probably be preserved, too; I'm not clear if
you had that in mind already. I'm rather indifferent on what happens
for inputs that are UB but that can faithfully be presented in the
allotted space.

Rich
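To make the "normal ranges" argument above concrete, a check over the members that `asctime` reads might be sketched as below. The member ranges are the ones listed in the comments on `struct tm` in ISO C; the helper name `tm_printable_by_asctime` and the four-digit bound on the year are assumptions of this sketch (the lower bound mirrors the check in the sample code posted earlier in the thread).

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper. Returns true if every member that asctime reads
 * lies in its normal range and the year prints as exactly four digits,
 * so the result fits the 26-byte buffer. */
bool tm_printable_by_asctime(const struct tm *tm)
{
    return tm->tm_sec  >= 0 && tm->tm_sec  <= 60   /* allows a leap second */
        && tm->tm_min  >= 0 && tm->tm_min  <= 59
        && tm->tm_hour >= 0 && tm->tm_hour <= 23
        && tm->tm_mday >= 1 && tm->tm_mday <= 31
        && tm->tm_mon  >= 0 && tm->tm_mon  <= 11
        && tm->tm_wday >= 0 && tm->tm_wday <= 6
        /* Assumption: restrict to years 1000..9999 (tm_year counts from 1900). */
        && tm->tm_year >= 1000 - 1900 && tm->tm_year <= 9999 - 1900;
}
```

Whether a failed check should lead to a crash, a truncated string, or something else is exactly the policy question left open in the message.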
* Re: [musl] direct coding of asctime_r
From: Jens Gustedt @ 2020-08-27 14:24 UTC
To: musl

on Thu, 27 Aug 2020 10:03:07 -0400 you (Rich Felker <dalias@libc.org>)
wrote:

> I think the behavior of crashing on inputs that are UB and that can't
> safely be printed should probably be preserved, too; I'm not clear if
> you had that in mind already. I'm rather indifferent on what happens
> for inputs that are UB but that can faithfully be presented in the
> allotted space.

Same for me.

In the sample implementation I have "goto CLEANUP" and an implicit
guarantee that the output is always null terminated. This is more in
the spirit of `snprintf`, not to do bad things as long as the output
buffer has at least 26 bytes.

But we could also do `abort()`, `do_crash()`, whatever fits into
musl's general strategy for error handling.

Jens
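One possible shape of the `abort()` alternative mentioned above is a variant of the `print_decimal` helper from the sample code that terminates the process instead of returning the leftover value. This is only a sketch; the name `put_decimal_or_abort` is hypothetical, and musl's own choice of crash mechanism is not shown here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper. Writes val as exactly `places` decimal digits
 * into buf, without a terminator. The digits are written first; if any
 * part of val is left over, it did not fit and the process is aborted. */
void put_decimal_or_abort(size_t places, char *buf, unsigned val)
{
    for (size_t pos = places; pos > 0; pos--) {
        buf[pos - 1] = (char)('0' + val % 10);
        val /= 10;
    }
    if (val != 0)
        abort();
}
```

Because the conversion to `unsigned` turns negative inputs into very large values, negative fields would also end in `abort()` with this approach.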