mailing list of musl libc
* [musl] direct coding of asctime_r
@ 2020-08-23  8:24 Jens Gustedt
  2020-08-23  9:33 ` Alexander Monakov
  2020-08-24 16:14 ` Rich Felker
  0 siblings, 2 replies; 8+ messages in thread
From: Jens Gustedt @ 2020-08-23  8:24 UTC (permalink / raw)
  To: musl

Hello,
I don't know if you guys noticed, but some time ago we voted some of
the ..._r functions from <time.h> into the C standard, only to then
discover that POSIX has deprecated the whole set of functions and
proposes to replace them with `strftime`.

One of the arguments for keeping them was that `asctime_r` does not need
access to the locale and has a fixed format, and so can be implemented
with a much smaller footprint.

Looking into musl I found that the current implementation basically
does verbatim what the C standard says, namely it uses `snprintf` under
the hood to do the formatting. This has the obvious disadvantage
that it drags the whole infrastructure needed for `snprintf`
into the executable.
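
For reference, a minimal sketch of the snprintf-based approach described
above. The format string is the one the C standard uses in its description
of asctime; the name asctime_ref and the table layout are illustrative
assumptions, not musl's actual source, and the sketch assumes that all
members are within their normal ranges.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical name. Formats tm into buf (at least 26 bytes) using the
   format string from the C standard's description of asctime.
   Assumes tm_wday is in [0,6] and tm_mon is in [0,11]. */
char *asctime_ref(const struct tm *tm, char buf[26]) {
  static const char wday[7][4] = {
    "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
  };
  static const char mon[12][4] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
  };
  /* snprintf never writes more than 26 bytes, terminator included. */
  snprintf(buf, 26, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
           wday[tm->tm_wday], mon[tm->tm_mon], tm->tm_mday,
           tm->tm_hour, tm->tm_min, tm->tm_sec, 1900 + tm->tm_year);
  return buf;
}
```

Linking this pulls in the printf machinery, which is the footprint cost
in question.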

Making some tests, I found that coding `asctime_r` directly
with byte-copying shaves about 10k off the final
executable.

Would it be interesting for musl to change to such an implementation?

Shall I prepare a patch to do so?

Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::


* Re: [musl] direct coding of asctime_r
  2020-08-23  8:24 [musl] direct coding of asctime_r Jens Gustedt
@ 2020-08-23  9:33 ` Alexander Monakov
  2020-08-23  9:56   ` Jens Gustedt
  2020-08-24 16:14 ` Rich Felker
  1 sibling, 1 reply; 8+ messages in thread
From: Alexander Monakov @ 2020-08-23  9:33 UTC (permalink / raw)
  To: musl

On Sun, 23 Aug 2020, Jens Gustedt wrote:

> Looking into musl I found that the current implementation basically
> does verbatim what the C standard says, namely it uses `snprintf` under
> the hood to do the formatting. This has the obvious disadvantage
> that it drags the whole infrastructure needed for `snprintf`
> into the executable.
> 
> Making some tests, I found that coding `asctime_r` directly
> with byte-copying shaves about 10k off the final
> executable.

Do I understand correctly that this 10k figure is for an "application"
that does not use stdio at all otherwise? If so, I believe that is a
quite unrealistic test — why would an application use asctime_r but then
avoid use of stdio to do something useful with the result?

Alexander


* Re: [musl] direct coding of asctime_r
  2020-08-23  9:33 ` Alexander Monakov
@ 2020-08-23  9:56   ` Jens Gustedt
  2020-08-23 11:08     ` Alexander Monakov
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Gustedt @ 2020-08-23  9:56 UTC (permalink / raw)
  To: musl


Alexander,
for simplicity I attach what I have.

on Sun, 23 Aug 2020 12:33:30 +0300 (MSK) you (Alexander Monakov
<amonakov@ispras.ru>) wrote:

> Do I understand correctly that this 10k figure is for an "application"
> that does not use stdio at all otherwise?

It just uses unformatted IO, namely `puts`.

> If so, I believe that is a
> quite unrealistic test — why would an application use asctime_r but
> then avoid use of stdio to do something useful with the result?

For example, just dumping a time stamp to a file.

Jens

[-- Attachment #1.2: test-asctime.c --]
[-- Type: text/x-c++src, Size: 2114 bytes --]

#include <time.h>
#include <string.h>
#include <stdio.h>

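/* Write the `places` low-order decimal digits of val into buf, filling
   from right to left. Returns the leftover high-order part of val, so a
   nonzero result means that val did not fit. */
static unsigned print_decimal(size_t places, char buf[places], unsigned val) {
  for (size_t pos = places; pos > 0; pos--) {
    buf[pos-1] = (val % 10) + '0';
    val /= 10;
  }
  return val;
}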

char *asctime_r(const struct tm *tm, char *buf) {
  static char const wday[7][3] = {
                                "Sun",
                                "Mon",
                                "Tue",
                                "Wed",
                                "Thu",
                                "Fri",
                                "Sat",
  };
  static char const mon[12][3] = {
                                "Jan",
                                "Feb",
                                "Mar",
                                "Apr",
                                "May",
                                "Jun",
                                "Jul",
                                "Aug",
                                "Sep",
                                "Oct",
                                "Nov",
                                "Dec",
  };
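  /* Template with the separators in place and every field NUL-filled, so
     buf holds a terminated (possibly truncated) string wherever we stop. */
  memcpy(buf, "\0\0\0 \0\0\0  \0 \0\0:\0\0:\0\0 \0\0\0\0\n", 26);

  /* Comparing against 7u converts tm_wday to unsigned, so negative values
     are rejected as well. */
  if (tm->tm_wday >= 7u) goto CLEANUP;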
  memcpy(buf, wday[tm->tm_wday], 3);

  if (tm->tm_mon >= 12u) goto CLEANUP;
  memcpy(buf+4, mon[tm->tm_mon], 3);

  if (tm->tm_mday < 10u) {
    if (print_decimal(1, buf+9, tm->tm_mday)) goto CLEANUP;
  } else {
    if (print_decimal(2, buf+8, tm->tm_mday)) goto CLEANUP;
  }
  if (print_decimal(2, buf+11, tm->tm_hour)) goto CLEANUP;
  if (print_decimal(2, buf+14, tm->tm_min)) goto CLEANUP;
  if (print_decimal(2, buf+17, tm->tm_sec)) goto CLEANUP;
  if (1900u+tm->tm_year < 1000u || print_decimal(4, buf+20, 1900u+tm->tm_year)) goto CLEANUP;

 CLEANUP:
  return buf;
}


int main(int argc, char* argv[argc+1]) {
  char buf[26];
  struct tm T;
  time_t t = time(0);
  gmtime_r(&t, &T);
  if (argc == 2) T.tm_mon = 13;
  if (argc == 3) T.tm_mon = -6;
  asctime_r(&T, buf);
  puts(buf);
  //puts(asctime(&T));
}


* Re: [musl] direct coding of asctime_r
  2020-08-23  9:56   ` Jens Gustedt
@ 2020-08-23 11:08     ` Alexander Monakov
  0 siblings, 0 replies; 8+ messages in thread
From: Alexander Monakov @ 2020-08-23 11:08 UTC (permalink / raw)
  To: musl

On Sun, 23 Aug 2020, Jens Gustedt wrote:

> > Do I understand correctly that this 10k figure is for an "application"
> > that does not use stdio at all otherwise?
> 
> It just uses unformatted IO, namely `puts`.

I see, thanks. Indeed, printf-style formatting can pull in a bit above 10KB
on top of what 'puts' alone needs. On x86-64, the top 5 new symbols by size are:

357 T vfprintf
464 r states
1823 r errmsg
2384 t printf_core
2930 t fmt_fp

Of those, errmsg would also be pulled in by 'strerror'.

Alexander


* Re: [musl] direct coding of asctime_r
  2020-08-23  8:24 [musl] direct coding of asctime_r Jens Gustedt
  2020-08-23  9:33 ` Alexander Monakov
@ 2020-08-24 16:14 ` Rich Felker
  2020-08-27  9:27   ` Jens Gustedt
  1 sibling, 1 reply; 8+ messages in thread
From: Rich Felker @ 2020-08-24 16:14 UTC (permalink / raw)
  To: musl

On Sun, Aug 23, 2020 at 10:24:39AM +0200, Jens Gustedt wrote:
> Hello,
> I don't know if you guys noticed, but some time ago we voted some of
> the ..._r functions from <time.h> into the C standard, only to then
> discover that POSIX has deprecated the whole set of functions and
> proposes to replace them with `strftime`.
> 
> One of the arguments for keeping them was that `asctime_r` does not need
> access to the locale and has a fixed format, and so can be implemented
> with a much smaller footprint.
> 
> Looking into musl I found that the current implementation basically
> does verbatim what the C standard says, namely it uses `snprintf` under
> the hood to do the formatting. This has the obvious disadvantage
> that it drags the whole infrastructure needed for `snprintf`
> into the executable.
> 
> Making some tests, I found that coding `asctime_r` directly
> with byte-copying shaves about 10k off the final
> executable.
> 
> Would it be interesting for musl to change to such an implementation?
> 
> Shall I prepare a patch to do so?

I'm not *strongly* opposed to this, but my reasoning is fairly much in
line with the POSIX side, that these interfaces are legacy/deprecated,
and in general musl practice is to choose maximum simplicity over
size/performance optimality for deprecated/legacy or junk interfaces.

In particular, asctime[_r] formats dates in a legacy US format,
whereas modern applications should be using either ISO date format or
a locale-specific format.

Note that ISO C specifies asctime in terms of a particular printf
format string, meaning the results are well-defined for any values
that don't overflow the specified buffer, even if they are somewhat
nonsensical. In particular, as specified, it's required to accept
negative hours etc. if the year is such that it requires fewer than 4
digits. Coding this directly while ensuring all the cases are covered
seems nontrivial. One might argue that this is stupid and asctime is
not intended to be used in this way or useful to use in this way, and
I'd tend to agree but it's not something I'd want to argue with
someone who had a fairly legitimate claim that it's allowable
as-specified when there's no real compelling reason not to just
implement it as-specified with snprintf.
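
To make the buffer argument concrete, here is a small sketch that only
measures how many characters the standard's format string would produce.
The helper name asctime_len is hypothetical, and fixed placeholder names
stand in for the day and month, since those always contribute three
characters each.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: number of characters (excluding the terminating
   NUL) that the standard's asctime format yields for tm. With a NULL
   buffer and size 0, snprintf only computes the length. */
int asctime_len(const struct tm *tm) {
  return snprintf(NULL, 0, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
                  "Sun", "Jan", tm->tm_mday,
                  tm->tm_hour, tm->tm_min, tm->tm_sec,
                  1900 + tm->tm_year);
}
```

With tm_hour = -5 and tm_year = -1000 (year 900) this gives 25, which
still fits in 26 bytes including the terminator. With tm_hour = -5 and
tm_year = 120 (year 2020) it gives 26, which does not.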

Rich


* Re: [musl] direct coding of asctime_r
  2020-08-24 16:14 ` Rich Felker
@ 2020-08-27  9:27   ` Jens Gustedt
  2020-08-27 14:03     ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Gustedt @ 2020-08-27  9:27 UTC (permalink / raw)
  To: musl

Rich,

on Mon, 24 Aug 2020 12:14:00 -0400 you (Rich Felker <dalias@libc.org>)
wrote:

> I'm not *strongly* opposed to this, but my reasoning is fairly much in
> line with the POSIX side, that these interfaces are legacy/deprecated,
> and in general musl practice is to choose maximum simplicity over
> size/performance optimality for deprecated/legacy or junk interfaces.
> 
> In particular, asctime[_r] formats dates in a legacy US format,
> whereas modern applications should be using either ISO date format or
> a locale-specific format.

But it is also a format that the language itself uses (or refers to)
for `__TIME__` and similar macros.

> Note that ISO C specifies asctime in terms of a particular printf
> format string, meaning the results are well-defined for any values
> that don't overflow the specified buffer, even if they are somewhat
> nonsensical.

I don't think so. The general rules for valid arguments to C library
functions always apply, so according to 7.1.4 calls to the functions
with values that are outside the specified ranges for the type have
UB.

In the <time.h> header the only exception from this rule seems to be
`mktime`, which makes such exceptions explicit and says how the
argument is normalized if it is not in the ranges as specified.

The sample code that I posted does range checks by simple means that
never result in unbounded UB, and it always returns a null-terminated
string. I would think that this is reasonable behavior.

Jens


* Re: [musl] direct coding of asctime_r
  2020-08-27  9:27   ` Jens Gustedt
@ 2020-08-27 14:03     ` Rich Felker
  2020-08-27 14:24       ` Jens Gustedt
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2020-08-27 14:03 UTC (permalink / raw)
  To: musl

On Thu, Aug 27, 2020 at 11:27:59AM +0200, Jens Gustedt wrote:
> Rich,
> 
> on Mon, 24 Aug 2020 12:14:00 -0400 you (Rich Felker <dalias@libc.org>)
> wrote:
> 
> > I'm not *strongly* opposed to this, but my reasoning is fairly much in
> > line with the POSIX side, that these interfaces are legacy/deprecated,
> > and in general musl practice is to choose maximum simplicity over
> > size/performance optimality for deprecated/legacy or junk interfaces.
> > 
> > In particular, asctime[_r] formats dates in a legacy US format,
> > whereas modern applications should be using either ISO date format or
> > a locale-specific format.
> 
> But it is also a format that the language itself uses (or refers to)
> for `__TIME__` and similar macros.

Yes, that doesn't indicate that it should continue to be used, though.
And in theory you can use __TIME__ just to parse and convert to a more
reasonable form.
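
As an illustration of that kind of conversion, here is a minimal sketch
that turns strings in the `__DATE__` ("Mmm dd yyyy") and `__TIME__`
("hh:mm:ss") layouts into an ISO 8601 timestamp. The function name
iso_from_date_time and its return convention are assumptions made for
this example only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert a __DATE__-style string and a
   __TIME__-style string to "yyyy-mm-ddThh:mm:ss" in out (20 bytes).
   Returns 0 on success, -1 if the input does not parse. */
int iso_from_date_time(const char *date, const char *clock_str,
                       char out[20]) {
  static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
  char mon[4];
  int day, year, h, m, s;

  /* %d skips the extra space that pads a single-digit day. */
  if (sscanf(date, "%3s %d %d", mon, &day, &year) != 3) return -1;
  if (sscanf(clock_str, "%d:%d:%d", &h, &m, &s) != 3) return -1;

  /* Look the month up; only matches on a 3-byte boundary count. */
  const char *p = strlen(mon) == 3 ? strstr(months, mon) : NULL;
  if (!p || (p - months) % 3 != 0) return -1;
  int month = (int)(p - months) / 3 + 1;

  /* Reject anything that would not fit the fixed-width output. */
  if (year < 0 || year > 9999 || day < 1 || day > 31 ||
      h < 0 || h > 23 || m < 0 || m > 59 || s < 0 || s > 60)
    return -1;

  snprintf(out, 20, "%04d-%02d-%02dT%02d:%02d:%02d",
           year, month, day, h, m, s);
  return 0;
}
```

A caller would pass the macros directly, for example
iso_from_date_time(__DATE__, __TIME__, out).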

> > Note that ISO C specifies asctime in terms of a particular printf
> > format string, meaning the results are well-defined for any values
> > that don't overflow the specified buffer, even if they are somewhat
> > nonsensical.
> 
> I don't think so. The general rules for valid arguments to C library
> functions always apply, so according to 7.1.4 calls to the functions
> with values that are outside the specified ranges for the type have
> UB.

The range of the type is [INT_MIN,INT_MAX]. For tm_wday and tm_mon, UB
of out-of-normal-range values would be established just by omission of
any spec for what they do. However you missed the actual text in
support of your claim, 7.27.3.1 ¶3:

    "If any of the members of the broken-down time contain values that
    are outside their normal ranges,323) the behavior of the asctime
    function is undefined."

Normal ranges are defined in 7.27.1 ¶4.
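
For concreteness, the normal ranges listed there can be written as a
predicate. This is only a sketch; the name tm_in_normal_range is
hypothetical, and tm_year and tm_isdst are left unchecked because no
range is given for them.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: true if every ranged member of tm lies within
   the normal range that the C standard lists for struct tm. */
bool tm_in_normal_range(const struct tm *tm) {
  return tm->tm_sec  >= 0 && tm->tm_sec  <= 60   /* allows a leap second */
      && tm->tm_min  >= 0 && tm->tm_min  <= 59
      && tm->tm_hour >= 0 && tm->tm_hour <= 23
      && tm->tm_mday >= 1 && tm->tm_mday <= 31
      && tm->tm_mon  >= 0 && tm->tm_mon  <= 11
      && tm->tm_wday >= 0 && tm->tm_wday <= 6
      && tm->tm_yday >= 0 && tm->tm_yday <= 365;
}
```

Anything for which this returns false is input on which asctime has
undefined behavior according to the paragraph quoted above.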

So this removes my main potential objection and the remaining question
is just whether this is a size optimization that makes sense.

> In the <time.h> header the only exception from this rule seems to be
> `mktime`, which makes such exceptions explicit and says how the
> argument is normalized if it is not in the ranges as specified.
> 
> The sample code that I posted does range checks by simple means that
> never result in unbounded UB, and it always returns a null-terminated
> string. I would think that this is reasonable behavior.

I think the behavior of crashing on inputs that are UB and that can't
safely be printed should probably be preserved, too; I'm not clear if
you had that in mind already. I'm rather indifferent on what happens
for inputs that are UB but that can faithfully be presented in the
allotted space.

Rich


* Re: [musl] direct coding of asctime_r
  2020-08-27 14:03     ` Rich Felker
@ 2020-08-27 14:24       ` Jens Gustedt
  0 siblings, 0 replies; 8+ messages in thread
From: Jens Gustedt @ 2020-08-27 14:24 UTC (permalink / raw)
  To: musl


on Thu, 27 Aug 2020 10:03:07 -0400 you (Rich Felker <dalias@libc.org>)
wrote:

> I think the behavior of crashing on inputs that are UB and that can't
> safely be printed should probably be preserved, too; I'm not clear if
> you had that in mind already. I'm rather indifferent on what happens
> for inputs that are UB but that can faithfully be presented in the
> allotted space.

same for me

In the sample implementation I have "goto CLEANUP" and an implicit
guarantee that the output is always null terminated. This is more in
the spirit of `snprintf`: nothing bad happens as long as the output
buffer has at least 26 bytes.

But we could also do `abort()`, `do_crash()`, whatever fits into
musl's general strategy for error handling.
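
A sketch of what the crashing variant could look like for the digit
writer, using `abort()` purely as a stand-in. The name
put_decimal_or_abort is hypothetical, and the crash primitive musl would
actually want here is left open.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical variant of a fixed-width digit writer: it fills exactly
   `places` bytes of dst, then aborts if val needed more digits instead
   of reporting the overflow to the caller. A negative int converted to
   unsigned is a huge value, so it ends up in abort() as well. */
void put_decimal_or_abort(char *dst, size_t places, unsigned val) {
  for (size_t pos = places; pos > 0; pos--) {
    dst[pos - 1] = (char)('0' + val % 10);
    val /= 10;
  }
  if (val) abort();
}
```

The writes stay inside the field even on the failing path, so the only
observable difference from the "goto CLEANUP" version is the crash.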

Jens


Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
