From: "Konstantin P."
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Draft proposed locale changes
Date: Mon, 5 Mar 2018 21:42:49 +0300
References: <20180305183950.GA17616@brightrain.aerifal.cx>
In-Reply-To: <20180305183950.GA17616@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

Can you publish an official po file for musl after the proposed changes?

On Mon, Mar 5, 2018 at 9:39 PM, Rich Felker <dalias@libc.org> wrote:

localeconv/LC_NUMERIC/LC_MONETARY

Each loaded locale needs an immutable lconv structure to represent
this data. It needs to be allocated with the locale (at locale loading
time) since localeconv() has no provision for failure, but we can wait
to populate it lazily, and we can put the code to populate it in
localeconv.c so that static-linked programs that don't use this
rarely-used interface don't have to pay for it. We could also omit
even allocating it (56/96 bytes) if localeconv.o is not linked, but
it's probably not worth the special-casing code to do that.

The localeconv structure should be part of struct __locale_map, not
struct __locale_struct, since it's a pure function of the data in the
memory-mapped locale file and not a function of how that data is
linked to a specific locale category. Putting it in __locale_struct
would just complicate setlocale and newlocale.
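
A minimal sketch of the layout described above, in C. The names
locale_map_sketch, lconv_sketch, lc_ready and lconv_for_map are
hypothetical stand-ins, not musl's actual definitions, and the reduced
structure carries only a few of the real lconv fields. The storage lives
inside the map object, so it exists as soon as the locale is loaded, and
it is filled in on first use by a function that cannot fail:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Reduced stand-in for struct lconv; the real structure has more fields. */
struct lconv_sketch {
	const char *decimal_point;
	const char *thousands_sep;
	char int_frac_digits;
	char frac_digits;
};

/* Hypothetical stand-in for struct __locale_map (field names assumed). */
struct locale_map_sketch {
	const void *map;         /* memory-mapped locale file */
	size_t map_size;
	struct lconv_sketch lc;  /* reserved when the locale is loaded */
	int lc_ready;            /* nonzero once lc has been populated */
};

/* Populate lc on first use and return it. There is no allocation here,
 * so there is no failure path. A real version would read the values
 * from m->map and would need synchronization between threads; this
 * sketch only installs C-locale defaults. */
struct lconv_sketch *lconv_for_map(struct locale_map_sketch *m)
{
	assert(m != NULL);
	if (!m->lc_ready) {
		m->lc.decimal_point = ".";
		m->lc.thousands_sep = "";
		m->lc.int_frac_digits = CHAR_MAX; /* "not available" */
		m->lc.frac_digits = CHAR_MAX;
		m->lc_ready = 1;
	}
	return &m->lc;
}
```

Because the result depends only on the map, every locale object that
references the same map would share one populated structure.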

The obvious (but not terribly efficient) form for the data in the
locale file is to have each lconv field as a mo-level key, as in:

        msgid "int_frac_digits"
        msgstr "2"

A more compact form could pack them all into one, but then the order
becomes a hidden locale-file interface boundary/ABI.
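
As a rough illustration of the one-key-per-field form, the following C
sketch turns a msgstr such as "2" into the char value an lconv field
expects. The small in-memory table and the lookup() helper are
hypothetical stand-ins for a lookup in the mo file; treating a missing
or malformed value as CHAR_MAX ("not available") is an assumption of
the sketch, not something the proposal specifies.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct kv { const char *msgid, *msgstr; };

/* Stand-in for the contents of a locale file. */
static const struct kv demo_catalog[] = {
	{ "frac_digits", "2" },
	{ "int_frac_digits", "2" },
};

/* Hypothetical lookup; a real one would search the mo file. */
static const char *lookup(const char *msgid)
{
	size_t n = sizeof demo_catalog / sizeof demo_catalog[0];
	for (size_t i = 0; i < n; i++)
		if (!strcmp(demo_catalog[i].msgid, msgid))
			return demo_catalog[i].msgstr;
	return NULL;
}

/* Convert the msgstr for a numeric lconv field to its char value.
 * Missing, empty, non-numeric or out-of-range values yield CHAR_MAX. */
char lconv_char_field(const char *name)
{
	assert(name != NULL);
	const char *s = lookup(name);
	char *end;
	long v;
	if (!s || !*s) return CHAR_MAX;
	v = strtol(s, &end, 10);
	if (*end || v < 0 || v >= CHAR_MAX) return CHAR_MAX;
	return (char)v;
}
```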

For the string fields it's necessary that they each be in-place
strings in the mo file. grouping and mon_grouping also have the
special constraint that they need to vary by whether the arch uses
signed or unsigned plain-char (since CHAR_MAX has special meaning) so
the mo file needs to store both versions. That's ugly but I don't see
any good way around it. We can probably punt on this for now just by
not supporting grouping (i.e. only supporting locale definitions that
don't do grouping), since it's not implemented anyway.
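
To make the plain-char problem concrete: in a grouping string the
element CHAR_MAX means "no further grouping", and CHAR_MAX is a
different byte value depending on whether plain char is signed. The C
sketch below shows one way a loader might pick between two stored
variants. The key names "grouping.schar" and "grouping.uchar" are
invented for illustration; the proposal does not name them.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical key names for the two stored variants. */
#define GROUPING_KEY_SIGNED   "grouping.schar"
#define GROUPING_KEY_UNSIGNED "grouping.uchar"

/* Pick the variant whose stop marker matches this arch's CHAR_MAX. */
const char *grouping_key(void)
{
	return CHAR_MAX == SCHAR_MAX ? GROUPING_KEY_SIGNED
	                             : GROUPING_KEY_UNSIGNED;
}

/* "One group of 3 digits, then no more grouping". The second byte is
 * SCHAR_MAX where plain char is signed and UCHAR_MAX where it is
 * unsigned, which is why one stored string cannot serve both. */
static const char grouping_3_then_stop[] = { 3, CHAR_MAX, 0 };

/* Return 1 if the grouping string contains the CHAR_MAX stop marker. */
int grouping_has_stop(const char *g)
{
	assert(g != NULL);
	for (; *g; g++)
		if (*g == CHAR_MAX) return 1;
	return 0;
}
```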

If we support decimal_point, it should not go through the localeconv
mechanism since it would always be needed by printf and strtod.
Instead __get_locale should probe it right away and set a 1-bit flag
in the __locale_map structure for these functions to consume (1-bit
based on previous research that [.,] are the only values).
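
A sketch of the 1-bit flag idea in C. The structure and function names
are hypothetical; the only assumption taken from the text is that "."
and "," are the only decimal points that need to be represented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment of the locale map: one bit for the radix. */
struct radix_map_sketch {
	unsigned comma_radix : 1; /* 0: '.', 1: ',' */
};

/* Called once when the locale is loaded. msgstr is the translation of
 * the decimal_point key, or NULL if the locale file has none. Anything
 * other than exactly "," leaves the default '.'. */
void probe_decimal_point(struct radix_map_sketch *m, const char *msgstr)
{
	assert(m != NULL);
	m->comma_radix = msgstr && msgstr[0] == ',' && !msgstr[1];
}

/* What printf/strtod-style code would consult; no localeconv needed.
 * A null map stands for the C locale here. */
char radix_char(const struct radix_map_sketch *m)
{
	return m && m->comma_radix ? ',' : '.';
}
```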



nl_langinfo/LC_TIME/etc.

Eliminate the currently-present wrong values for ERA* and related
LC_TIME stuff; that gets rid of all ambiguous translation keys except
"May". Bikeshed up some alternate key for May.

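The "May" collision can be observed with the standard nl_langinfo
interface: in the C locale the abbreviated and full names of the fifth
month are the same string, so a translation keyed on that string cannot
tell the two apart. The helper name below is made up for this sketch.

```c
#include <assert.h>
#include <langinfo.h>
#include <string.h>

/* Return nonzero if the abbreviated and full names of month n (1..12)
 * are identical strings in the current locale. */
int month_names_collide(int n)
{
	static const nl_item ab[] = {
		ABMON_1, ABMON_2, ABMON_3, ABMON_4, ABMON_5, ABMON_6,
		ABMON_7, ABMON_8, ABMON_9, ABMON_10, ABMON_11, ABMON_12,
	};
	static const nl_item full[] = {
		MON_1, MON_2, MON_3, MON_4, MON_5, MON_6,
		MON_7, MON_8, MON_9, MON_10, MON_11, MON_12,
	};
	assert(n >= 1 && n <= 12);
	return !strcmp(nl_langinfo(ab[n-1]), nl_langinfo(full[n-1]));
}
```

In a program that has not called setlocale, month_names_collide(5) is
nonzero while the other eleven months give zero.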


strerror/LC_MESSAGES

Not sure yet. One radical idea I kinda like is removing all the
English-phrase messages from libc core and just having strerror
produce strings like "ENOENT", "EPERM", etc. in the C locale. This
seems to be the only option that wouldn't either moderately increase
libc size or require translation files to match the exact current text
in the builtin English libc messages. Users who want the current
messages would then need an "en" locale with contents like:

        msgid "ENOENT"
        msgstr "No such file or directory"

If we don't want this, the possible solutions look like one of:

1. Prepending the error code and a null byte (e.g. "ENOENT\0") to all
the existing error strings, then skipping past it if the translation
was not found.

2. Putting a second version of strerror in locale_map.c with the E*
names in it, so it's only linked if you use locale. I strongly dislike
this approach because it greatly increases the marginal size cost of
doing the right thing (calling setlocale) and imposes the cost even if
you don't use strerror at all (only setlocale).

3. Accepting that translations need to match (and perpetually be
updated to match) error strings in musl __strerror.h. I don't like
this much either.
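
Option 1 might look roughly like the following C sketch. Each table
entry is the error name, a null byte, then the existing English text.
The name is used as the translation key, and when no translation is
found the code skips past the name to the English text. The translate
callback and demo_translate are hypothetical stand-ins for the locale
lookup, and the German string is only sample data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "NAME\0English text"; the literal's own terminator ends the text. */
static const char enoent_entry[] = "ENOENT\0" "No such file or directory";
static const char eperm_entry[]  = "EPERM\0" "Operation not permitted";

/* Hypothetical translation lookup: returns NULL if key is untranslated. */
typedef const char *(*translate_fn)(const char *key);

/* Demo catalog that knows only ENOENT. */
const char *demo_translate(const char *key)
{
	if (!strcmp(key, "ENOENT"))
		return "Datei oder Verzeichnis nicht gefunden";
	return NULL;
}

/* Look up the name; on a miss, skip the name and its null byte. */
const char *message_for(const char *entry, translate_fn tr)
{
	assert(entry != NULL);
	const char *t = tr ? tr(entry) : NULL;
	if (t) return t;
	return entry + strlen(entry) + 1;
}
```

The English text stays byte-for-byte what it is today; only the hidden
name prefix is new, which is where the size increase comes from.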

So I think it should be between options 1 and "zero" above. Option
zero decreases the size of libc by nearly 1k (removing messages) but
changes the behavior. Option 1 increases the size of libc by about 1k.
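
For comparison, option zero could be sketched in C as below: the
library carries only the E* names, and English text comes from an "en"
catalog like any other translation. The catalog structure, the function
names, and the "UNKNOWN" placeholder for codes outside the sketch are
assumptions; the proposal does not say what unknown codes should yield.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct kv { const char *msgid, *msgstr; };

/* Stand-in for an "en" locale file. */
static const struct kv en_catalog[] = {
	{ "ENOENT", "No such file or directory" },
};

/* Only two codes are listed to keep the sketch short. */
static const char *errname(int e)
{
	switch (e) {
	case ENOENT: return "ENOENT";
	case EPERM:  return "EPERM";
	default:     return "UNKNOWN"; /* placeholder, see lead-in */
	}
}

/* With no catalog (the C locale) the bare name is the message. */
const char *strerror_sketch(int e, const struct kv *cat, size_t n)
{
	const char *name = errname(e);
	assert(cat != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!strcmp(cat[i].msgid, name))
			return cat[i].msgstr;
	return name;
}
```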



LC_COLLATE

No specific proposal yet. We need a data structure to map characters
and sequences of characters to collating elements. Obviously the mo
file's lookups could be used directly (O(log n), improved avg case if
we ever add hash table support) but they might be heavier than we
want. The alternative would be having a gigantic string in the mo file
that's just "compiled" collation table data, but unless it's
well-designed that seems like an undesirable permanent interface
boundary.
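
To illustrate the first alternative only, here is a C sketch of an
O(log n) lookup that maps a character or a sequence of characters to a
collating element, trying the longest sequence first. The table, its
weights, and the two-byte sequence limit are made up for the example; a
real table would come from the locale file and would deal in multibyte
characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct coll_entry { const char *seq; unsigned weight; };

/* Sorted bytewise by seq, as binary search requires. "ch" is a single
 * collating element here, sorting between "c" and "d". */
static const struct coll_entry demo_table[] = {
	{ "a", 10 }, { "b", 20 }, { "c", 30 }, { "ch", 35 }, { "d", 40 },
};
#define DEMO_N (sizeof demo_table / sizeof demo_table[0])
#define MAX_SEQ 2

/* Binary search for the entry equal to the first len bytes of s. */
const struct coll_entry *coll_find(const struct coll_entry *t, size_t n,
                                   const char *s, size_t len)
{
	size_t lo = 0, hi = n;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int c = strncmp(s, t[mid].seq, len);
		if (c == 0 && t[mid].seq[len]) c = -1; /* key is a proper prefix */
		if (c == 0) return &t[mid];
		if (c < 0) hi = mid; else lo = mid + 1;
	}
	return NULL;
}

/* Weight of the next collating element of s; *used gets the number of
 * bytes consumed. Unknown bytes get weight 0 and consume one byte. */
unsigned next_weight(const char *s, size_t *used)
{
	size_t avail = 0;
	assert(s != NULL && used != NULL);
	while (avail < MAX_SEQ && s[avail]) avail++;
	for (size_t len = avail; len > 0; len--) {
		const struct coll_entry *e = coll_find(demo_table, DEMO_N, s, len);
		if (e) { *used = len; return e->weight; }
	}
	*used = avail ? 1 : 0;
	return 0;
}
```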

