mailing list of musl libc
From: "Konstantin P." <ria.freelander@gmail.com>
To: musl@lists.openwall.com
Subject: Re: Draft proposed locale changes
Date: Mon, 5 Mar 2018 21:42:49 +0300
Message-ID: <CAF1WSuzYsm81k=oCz9_O9+2_h5nLuQGzuZCEMb3Hg=dzb6tt5A@mail.gmail.com>
In-Reply-To: <20180305183950.GA17616@brightrain.aerifal.cx>

Can you publish an official po file for musl after the proposed changes?

On Mon, Mar 5, 2018 at 9:39 PM, Rich Felker <dalias@libc.org> wrote:

>
> localeconv/LC_NUMERIC/LC_MONETARY
>
> Each loaded locale needs an immutable lconv structure to represent
> this data. It needs to be allocated with the locale (at locale loading
> time) since localeconv() has no provision for failure, but we can wait
> to populate it lazily, and we can put the code to populate it in
> localeconv.c so that static-linked programs that don't use this
> rarely-used interface don't have to pay for it. We could also omit
> even allocating it (56/96 bytes) if localeconv.o is not linked, but
> it's probably not worth the special-casing code to do that.
>
> The localeconv structure should be part of struct __locale_map, not
> struct __locale_struct, since it's a pure function of the data in the
> memory-mapped locale file and not a function of how that data is
> linked to a specific locale category. Putting it in __locale_struct
> would just complicate setlocale and newlocale.
>
> The obvious (but not terribly efficient) form for the data in the
> locale file is to have each lconv field as a mo-level key, as in:
>
>         msgid "int_frac_digits"
>         msgstr "2"
>
> A more compact form could pack them all into one, but then the order
> becomes a hidden locale-file interface boundary/ABI.
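
To make the one-key-per-field form concrete, here is a minimal sketch of
turning such a key into a numeric lconv value. Everything named demo_* is
hypothetical: demo_catalog stands in for the mapped mo file, and
demo_lookup for whatever lookup routine the locale code would really use.
The only fixed point is that CHAR_MAX means "value not available" for the
char-typed lconv members.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a loaded mo file: key/value string pairs. */
struct demo_kv { const char *key, *val; };

static const struct demo_kv demo_catalog[] = {
    { "frac_digits", "2" },
    { "int_frac_digits", "2" },
};

/* Hypothetical lookup; real code would search the memory-mapped mo file. */
static const char *demo_lookup(const char *key)
{
    for (size_t i = 0; i < sizeof demo_catalog / sizeof *demo_catalog; i++)
        if (!strcmp(demo_catalog[i].key, key))
            return demo_catalog[i].val;
    return 0;
}

/* Convert one numeric lconv field. A missing or malformed entry yields
 * CHAR_MAX, the "not available" value for char-typed lconv members. */
static char demo_lconv_char(const char *key)
{
    const char *s = demo_lookup(key);
    char *end;
    long v;

    if (!s) return CHAR_MAX;
    v = strtol(s, &end, 10);
    if (end == s || *end || v < 0 || v >= CHAR_MAX) return CHAR_MAX;
    return (char)v;
}
```

With this shape, a key absent from the locale file simply falls back to
the C-locale value, so a locale only has to define the fields it changes.
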
>
> For the string fields it's necessary that they each be in-place
> strings in the mo file. grouping and mon_grouping also have the
> special constraint that they need to vary by whether the arch uses
> signed or unsigned plain-char (since CHAR_MAX has special meaning) so
> the mo file needs to store both versions. That's ugly but I don't see
> any good way around it. We can probably punt on this for now just by
> not supporting grouping (i.e. only supporting locale definitions that
> don't do grouping), since it's not implemented anyway.
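
A small sketch of why two versions would be needed, assuming a locale
that groups by three digits once and then stops. The array names and
demo_pick_grouping are hypothetical; the point is only that the
terminating element must equal CHAR_MAX, which is 127 with signed
plain char and 255 with unsigned plain char.

```c
#include <limits.h>

/* Two hypothetical stored encodings of "one group of 3, then no more
 * grouping". The second byte must equal CHAR_MAX on the target arch. */
static const char demo_grouping_signed[]   = "\003\177"; /* CHAR_MAX == 127 */
static const char demo_grouping_unsigned[] = "\003\377"; /* CHAR_MAX == 255 */

/* Pick the variant that matches the signedness of plain char. */
static const char *demo_pick_grouping(void)
{
#if CHAR_MIN < 0
    return demo_grouping_signed;
#else
    return demo_grouping_unsigned;
#endif
}
```
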
>
> If we support decimal_point, it should not go through the localeconv
> mechanism since it would always be needed by printf and strtod.
> Instead __get_locale should probe it right away and set a 1-bit flag
> in the __locale_map structure for these functions to consume (1-bit
> based on previous research that [.,] are the only values).
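
For illustration, the 1-bit flag could look roughly like the sketch
below. struct demo_locale_map is a hypothetical stand-in, not the real
__locale_map layout, and the probe function only imitates what
__get_locale might do after loading the file.

```c
#include <string.h>

/* Hypothetical stand-in for the relevant part of struct __locale_map. */
struct demo_locale_map {
    unsigned comma_radix : 1; /* 1 if decimal_point is ",", else "." */
};

/* Probe once at load time; decimal_point is the string found in the
 * locale file, or a null pointer if the locale does not define it. */
static void demo_probe_radix(struct demo_locale_map *m, const char *decimal_point)
{
    m->comma_radix = decimal_point && !strcmp(decimal_point, ",");
}

/* What printf/strtod-like consumers would read instead of localeconv. */
static char demo_radix_char(const struct demo_locale_map *m)
{
    return m->comma_radix ? ',' : '.';
}
```
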
>
>
>
> nl_langinfo/LC_TIME/etc.
>
> Eliminate the currently-present wrong values for ERA* and related
> LC_TIME stuff; that gets rid of all ambiguous translation keys except
> "May". Bikeshed up some alternate key for May.
>
>
>
> strerror/LC_MESSAGES
>
> Not sure yet. One radical idea I kinda like is removing all the
> English-phrase messages from libc core and just having strerror
> produce strings like "ENOENT", "EPERM", etc. in the C locale. This
> seems to be the only option that wouldn't either moderately increase
> libc size or require translation files to match the exact current text
> in the builtin English libc messages. Users who want the current
> messages would then need an "en" locale with contents like:
>
>         msgid "ENOENT"
>         msgstr "No such file or directory"
>
> If we don't want this, the possible solutions look like one of:
>
> 1. Prepending the error code and a null byte (e.g. "ENOENT\0") to all
> the existing error strings, then skipping past it if the translation
> was not found.
>
> 2. Putting a second version of strerror in locale_map.c with the E*
> names in it, so it's only linked if you use locale. I strongly dislike
> this approach because it greatly increases the marginal size cost of
> doing the right thing (calling setlocale) and imposes the cost even if
> you don't use strerror at all (only setlocale).
>
> 3. Accepting that translations need to match (and perpetually be
> updated to match) error strings in musl __strerror.h. I don't like
> this much either.
>
> So I think it should be between options 1 and "zero" above. Option
> zero decreases the size of libc by nearly 1k (removing messages) but
> changes the behavior. Option 1 increases the size of libc by about 1k.
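
As I understand option 1, it could be sketched like this. The demo_*
names are hypothetical and demo_translate is a stand-in for the real
translation lookup; here it pretends that only EPERM has a translation
loaded. Each builtin string carries the E* name as its lookup key,
followed by a null byte and the English text.

```c
#include <string.h>

/* Builtin strings in the proposed packed form: key, null byte, English text. */
static const char demo_eperm[]  = "EPERM\0Operation not permitted";
static const char demo_enoent[] = "ENOENT\0No such file or directory";

/* Hypothetical translation lookup; returns 0 when nothing is found. */
static const char *demo_translate(const char *key)
{
    if (!strcmp(key, "EPERM")) return "Keine Berechtigung";
    return 0;
}

/* Look up by the E* key; if there is no translation, skip past the key
 * and its null byte to reach the builtin English text. */
static const char *demo_message(const char *packed)
{
    const char *t = demo_translate(packed);
    if (t) return t;
    return packed + strlen(packed) + 1;
}
```

The translation files then depend only on the E* names, not on the exact
English wording.
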
>
>
>
> LC_COLLATE
>
> No specific proposal yet. We need a data structure to map characters
> and sequences of characters to collating elements. Obviously the mo
> file's lookups could be used directly (O(log n), improved avg case if
> we ever add hash table support) but they might be heavier than we
> want. The alternative would be having a gigantic string in the mo file
> that's just "compiled" collation table data, but unless it's
> well-designed that seems like an undesirable permanent interface
> boundary.
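
A rough sketch of the first alternative, an O(log n) lookup from a
character or character sequence to a collating element. The table, the
weights and all demo_* names are invented for illustration; a real
implementation would search the sorted strings of the mo file rather
than a static array.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical collation entry: a character sequence and its weight. */
struct demo_coll { const char *seq; int weight; };

/* Must stay sorted by seq in strcmp order for bsearch to work. */
static const struct demo_coll demo_coll_table[] = {
    { "a", 10 }, { "b", 20 }, { "c", 30 }, { "ch", 35 }, { "d", 40 },
};

static int demo_coll_cmp(const void *key, const void *elem)
{
    return strcmp(key, ((const struct demo_coll *)elem)->seq);
}

/* Binary search for the weight of seq, or -1 if it has no entry. */
static int demo_coll_weight(const char *seq)
{
    const struct demo_coll *e = bsearch(seq, demo_coll_table,
        sizeof demo_coll_table / sizeof *demo_coll_table,
        sizeof *demo_coll_table, demo_coll_cmp);
    return e ? e->weight : -1;
}
```
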
>
>
