mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] strftime %Z behavior with manually populated struct tm
@ 2020-08-10 13:31 Nikita Popov
  2020-08-10 16:53 ` Rich Felker
  2020-08-12 12:57 ` [musl] " Nikita Popov
  0 siblings, 2 replies; 5+ messages in thread
From: Nikita Popov @ 2020-08-10 13:31 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1084 bytes --]

Hi,

Currently, strftime() %Z will print an empty string if the provided tm_zone
does not originate from musl. It appears that this behavior is explicitly
implemented in __tm_to_tzname(). Would it be possible to instead print any
non-NULL tm_zone as provided?

For consumers manually populating the struct tm structure, it is
non-trivial to work around the current behavior. Python introduced a work
around in
https://github.com/python/cpython/commit/163eca34c48f1b25e1504e37f4656773fd0fdc78,
but it is not easy to generalize.

From what I gathered, the original concern here was that a consumer
manually initializing struct tm may not initialize the tm_zone field. As
struct tm is only specified to contain "at least" certain members, manual
initialization of this structure is already on shaky ground anyway and I
think it is more useful to assume the initialization is correct than try to
deal with garbage data. Because other libc implementation do not try to
validate the pointer either (beyond being non-NULL), incorrect
initialization is unlikely in practice.

Regards,
Nikita

[-- Attachment #2: Type: text/html, Size: 2154 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [musl] strftime %Z behavior with manually populated struct tm
  2020-08-10 13:31 [musl] strftime %Z behavior with manually populated struct tm Nikita Popov
@ 2020-08-10 16:53 ` Rich Felker
  2020-08-12 12:57 ` [musl] " Nikita Popov
  1 sibling, 0 replies; 5+ messages in thread
From: Rich Felker @ 2020-08-10 16:53 UTC (permalink / raw)
  To: musl

On Mon, Aug 10, 2020 at 03:31:23PM +0200, Nikita Popov wrote:
> Hi,
> 
> Currently, strftime() %Z will print an empty string if the provided tm_zone
> does not originate from musl. It appears that this behavior is explicitly
> implemented in __tm_to_tzname(). Would it be possible to instead print any
> non-NULL tm_zone as provided?

I don't think so. The problem is that %Z is specified, albeit
underspecified, as something portable applications can pass to
strftime, but tm_zone is not standard and may be uninitialized in a
structure the caller passes. Attempting to honor it would mean
crashing on inputs where it's not present, which is nonconforming and
unacceptable.

I haven't checked, but I believe most implementations just print the
zone name from the current timezone, using tm_isdst to decide whether
to print the standard or daylight version of the name. This is
insufficient with zoneinfo for zones where the name changed over time,
where it would print the wrong name for historical times. So instead
we support printing any one of the zone names from the current zone,
if the tm_zone member points to one of them, and blank otherwise.

> For consumers manually populating the struct tm structure, it is
> non-trivial to work around the current behavior. Python introduced a work
> around in
> https://github.com/python/cpython/commit/163eca34c48f1b25e1504e37f4656773fd0fdc78,
> but it is not easy to generalize.

I think you have the wrong commit link; that one does not look
related. If you have the real one I'd be interested in seeing what
they did.

> From what I gathered, the original concern here was that a consumer
> manually initializing struct tm may not initialize the tm_zone field. As
> struct tm is only specified to contain "at least" certain members, manual
> initialization of this structure is already on shaky ground anyway and I

This is not the case. There is nothing shaky about using the interface
in the standard-specified way. The specification already labels which
struct tm fields each format uses, and it's perfectly valid to call
strftime with junk in the rest; this is even fairly common usage (e.g.
not filling in the derived fields that you'd need mktime to get).

On the other hand, it's already dubious doing anything more than
ignoring tm_zone and just using tm_isdst (which is the field the spec
labels %Z as depending on). I only did this because it's impossible to
give culturally-correct results in all zones with just tm_isdst and
because the degree of underspecification made it seem defensible.

> think it is more useful to assume the initialization is correct than try to
> deal with garbage data. Because other libc implementation do not try to
> validate the pointer either (beyond being non-NULL), incorrect
> initialization is unlikely in practice.

I think if applications want to use zones other than the actual
configured zone with strftime, they need to just do something like
expand the %Z themselves with the string they want before calling
strftime (note: this requires quoting any % in the name). I looked
hard for a better solution that wouldn't crash valid applications, and
couldn't find one at the time.

There is a proposal for a new C time API that allows zone objects
(like locale_t objects for locales, but hopefully better-designed) and
functions that take the zone as an argument. This would be a really
good solution once it's established, but I don't know the status on
it.

Rich

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [musl] Re: strftime %Z behavior with manually populated struct tm
  2020-08-10 13:31 [musl] strftime %Z behavior with manually populated struct tm Nikita Popov
  2020-08-10 16:53 ` Rich Felker
@ 2020-08-12 12:57 ` Nikita Popov
  2020-08-12 14:05   ` Nikita Popov
  2020-08-12 15:17   ` Rich Felker
  1 sibling, 2 replies; 5+ messages in thread
From: Nikita Popov @ 2020-08-12 12:57 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1833 bytes --]

> I haven't checked, but I believe most implementations just print the
> zone name from the current timezone, using tm_isdst to decide whether
> to print the standard or daylight version of the name. This is
> insufficient with zoneinfo for zones where the name changed over time,
> where it would print the wrong name for historical times. So instead
> we support printing any one of the zone names from the current zone,
> if the tm_zone member points to one of them, and blank otherwise.

You are right. I was under the impression that glibc uses tm_zone, but
double checking the implementation right now, it doesn't. So the behavioral
discrepancy here comes from the fact that musl checks tm_zone at all, not
the other way around. Sorry for looking in the completely wrong direction
here.

> I think you have the wrong commit link; that one does not look
> related. If you have the real one I'd be interested in seeing what
> they did.

Indeed, I meant to link to
https://github.com/python/cpython/commit/5633c4f342d957df2ef0d67b9bfb472a0d28a76b.
What Python does is to retrieve musl's string for UTC using gmtime_r, as
well as trying to match against the current tzname.

> I think if applications want to use zones other than the actual
> configured zone with strftime, they need to just do something like
> expand the %Z themselves with the string they want before calling
> strftime (note: this requires quoting any % in the name).

With your explanation in mind, explicitly expanding %Z does seem like a
good approach. I'll consider doing that.

> I looked
> hard for a better solution that wouldn't crash valid applications, and
> couldn't find one at the time.

Possibly it would make sense to fall back to using the current tzname if
the pointer given in tm_zone is invalid, rather than an empty string?

Regards,
Nikita

[-- Attachment #2: Type: text/html, Size: 2268 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [musl] Re: strftime %Z behavior with manually populated struct tm
  2020-08-12 12:57 ` [musl] " Nikita Popov
@ 2020-08-12 14:05   ` Nikita Popov
  2020-08-12 15:17   ` Rich Felker
  1 sibling, 0 replies; 5+ messages in thread
From: Nikita Popov @ 2020-08-12 14:05 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1429 bytes --]

On Wed, Aug 12, 2020 at 2:57 PM Nikita Popov <nikita.ppv@gmail.com> wrote:

> > I haven't checked, but I believe most implementations just print the
> > zone name from the current timezone, using tm_isdst to decide whether
> > to print the standard or daylight version of the name. This is
> > insufficient with zoneinfo for zones where the name changed over time,
> > where it would print the wrong name for historical times. So instead
> > we support printing any one of the zone names from the current zone,
> > if the tm_zone member points to one of them, and blank otherwise.
>
> You are right. I was under the impression that glibc uses tm_zone, but
> double checking the implementation right now, it doesn't. So the behavioral
> discrepancy here comes from the fact that musl checks tm_zone at all, not
> the other way around. Sorry for looking in the completely wrong direction
> here.
>

Nevermind, I just confused myself further here. glibc does use tm_zone,
with the following comment:

    /* The POSIX test suite assumes that setting
    the environment variable TZ to a new value before calling strftime()
    will influence the result (the %Z format) even if the information in
    TP is computed with a totally different time zone.
    This is bogus: though POSIX allows bad behavior like this,
    POSIX does not require it. Do the right thing instead. */
    zone = (const char *) tp->tm_zone;

Regards,
Nikita

[-- Attachment #2: Type: text/html, Size: 1955 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [musl] Re: strftime %Z behavior with manually populated struct tm
  2020-08-12 12:57 ` [musl] " Nikita Popov
  2020-08-12 14:05   ` Nikita Popov
@ 2020-08-12 15:17   ` Rich Felker
  1 sibling, 0 replies; 5+ messages in thread
From: Rich Felker @ 2020-08-12 15:17 UTC (permalink / raw)
  To: musl

On Wed, Aug 12, 2020 at 02:57:08PM +0200, Nikita Popov wrote:
> > I haven't checked, but I believe most implementations just print the
> > zone name from the current timezone, using tm_isdst to decide whether
> > to print the standard or daylight version of the name. This is
> > insufficient with zoneinfo for zones where the name changed over time,
> > where it would print the wrong name for historical times. So instead
> > we support printing any one of the zone names from the current zone,
> > if the tm_zone member points to one of them, and blank otherwise.
> 
> You are right. I was under the impression that glibc uses tm_zone, but
> double checking the implementation right now, it doesn't. So the behavioral
> discrepancy here comes from the fact that musl checks tm_zone at all, not
> the other way around. Sorry for looking in the completely wrong direction
> here.
> 
> > I think you have the wrong commit link; that one does not look
> > related. If you have the real one I'd be interested in seeing what
> > they did.
> 
> Indeed, I meant to link to
> https://github.com/python/cpython/commit/5633c4f342d957df2ef0d67b9bfb472a0d28a76b.
> What Python does is to retrieve musl's string for UTC using gmtime_r, as
> well as trying to match against the current tzname.
> 
> > I think if applications want to use zones other than the actual
> > configured zone with strftime, they need to just do something like
> > expand the %Z themselves with the string they want before calling
> > strftime (note: this requires quoting any % in the name).
> 
> With your explanation in mind, explicitly expanding %Z does seem like a
> good approach. I'll consider doing that.
> 
> > I looked
> > hard for a better solution that wouldn't crash valid applications, and
> > couldn't find one at the time.
> 
> Possibly it would make sense to fall back to using the current tzname if
> the pointer given in tm_zone is invalid, rather than an empty string?

This might make sense. I was thinking from a standpoint of most
reasonable behavior for an uninitialized field, but but can always
give junk anyway (if you happen to have the exact wrong pointer value
that points to the current timezone's string area) so it's really not
useful to care about that case except for it not crashing. Using
tm_isdst and __timezone if the caller zeros the field would still be
useful potentially.

Rich

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-08-12 15:18 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-10 13:31 [musl] strftime %Z behavior with manually populated struct tm Nikita Popov
2020-08-10 16:53 ` Rich Felker
2020-08-12 12:57 ` [musl] " Nikita Popov
2020-08-12 14:05   ` Nikita Popov
2020-08-12 15:17   ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).