mailing list of musl libc
From: Josiah Worcester <josiahw@gmail.com>
To: musl@lists.openwall.com
Subject: Re: Build option to disable locale [was: Byte-based C locale, draft 1]
Date: Sun, 7 Jun 2015 19:28:51 -0500
Message-ID: <CAMAJcuC62F8jp76XFac-z505HUnJDkxFBzSw1D+VKK+0ws3Fxw@mail.gmail.com>
In-Reply-To: <5574DAE7.8040101@gmx.de>

On Sun, Jun 7, 2015 at 6:59 PM, Harald Becker <ralda@gmx.de> wrote:
> On 07.06.2015 02:24, Rich Felker wrote:
>>
>> It's somewhat more clear what you're talking about, but I'm still not
>> sure what specific pieces of code you would want to omit from libc.so.
>> Which of the following would you want to remove or keep?
>
>
> I did not look into all the details ...
>

To start with, keep in mind that with static linking most of this code
is not pulled in at all unless it is actually needed. Static linking
might be more relevant to your needs.

> In general: Keep the API, but add stubs with minimal operation or fail for
> non-C locales (etc.).
>
>> - UTF-8 encoding and decoding
>
>
> May be worth keeping, at a bare minimum.

Seeing as the UTF-8 decoder is very small already, I'd be shocked if
you could make an argument for removing that.
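For a sense of scale, a complete UTF-8 decoder fits in a few dozen lines
of C. The following is an illustrative sketch, not musl's actual
implementation; the name utf8_decode and its interface are assumptions
made for this example. It rejects overlong forms, UTF-16 surrogates,
truncated sequences, and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch (hypothetical name, not musl's code).
 * Decode one UTF-8 sequence from s, reading at most n bytes.
 * Returns the number of bytes consumed (1-4) and stores the code point
 * in *out, or returns -1 for an invalid or truncated sequence. */
int utf8_decode(uint32_t *out, const unsigned char *s, size_t n)
{
    uint32_t c;
    int len, i;

    if (n == 0)
        return -1;
    if (s[0] < 0x80) {                      /* plain ASCII */
        *out = s[0];
        return 1;
    } else if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        c = s[0] & 0x1F; len = 2;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        c = s[0] & 0x0F; len = 3;
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        c = s[0] & 0x07; len = 4;
    } else {
        return -1;  /* stray continuation byte, overlong lead C0/C1, or F5..FF */
    }

    if ((size_t)len > n)
        return -1;                          /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;                      /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong 3/4-byte forms, surrogates, and values past U+10FFFF. */
    if ((len == 3 && c < 0x800) || (len == 4 && c < 0x10000) ||
        (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
        return -1;

    *out = c;
    return len;
}
```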

>> - Character properties
>
>> - Case mappings
>
> Keep ASCII, map all non-ASCII to a single value.

This would not be quite right. Also, the case mapping tables are quite
small: towctrans.lo, which contains the case mappings, is 1106 bytes.
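To illustrate what an ASCII-only approach loses, here is a sketch of a
hypothetical stub (ascii_only_towlower and ascii_only_wcscasecmp are
made-up names, not libc functions) plus a case-insensitive comparison
built on it. The sketch assumes non-ASCII characters pass through
unchanged; collapsing them all to a single value would be worse still,
since distinct characters would then compare equal.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical ASCII-only stand-in for towlower(): folds A-Z to a-z
 * and returns every other value unchanged. */
wint_t ascii_only_towlower(wint_t c)
{
    if (c >= L'A' && c <= L'Z')
        return c - L'A' + L'a';
    return c;
}

/* Case-insensitive comparison in the spirit of wcscasecmp(), but using
 * the stub above. Returns 0 when the strings compare equal. */
int ascii_only_wcscasecmp(const wchar_t *a, const wchar_t *b)
{
    for (; *a && ascii_only_towlower(*a) == ascii_only_towlower(*b); a++, b++)
        ;
    return (int)ascii_only_towlower(*a) - (int)ascii_only_towlower(*b);
}
```

With this stub, L"Hello" and L"hELLO" compare equal, but "Été" and
"été" (U+00C9 vs U+00E9 in the first position) do not, whereas full
Unicode case mapping folds U+00C9 to U+00E9.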

>> - Internal message translation (nl_langinfo strings, errors, etc.)
>
>> - Message translation API (gettext)
>
> No translation at all, keep the English messages (as short as possible).

musl does not have any translations in it at all. It only has a small
amount of logic able to load external translations. locale_map.lo and
__mo_lookup.lo, which together are responsible for this, total 1471
bytes.
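Since no catalogs ship with libc, a program that calls gettext() without
installing its own .mo files simply gets its original English strings
back. A minimal sketch using the standard bindtextdomain(), textdomain()
and gettext() calls; the domain name "myapp" and the catalog directory
are hypothetical.

```c
#include <libintl.h>

/* Hypothetical domain and catalog directory, for illustration only. */
#define APP_DOMAIN    "myapp"
#define APP_LOCALEDIR "/nonexistent/locale"

/* Common idiom: mark translatable strings with _(). */
#define _(s) gettext(s)

/* Select the message domain once, before any lookups. If no catalog is
 * found for the current locale, gettext() returns its argument as-is. */
void init_messages(void)
{
    bindtextdomain(APP_DOMAIN, APP_LOCALEDIR);
    textdomain(APP_DOMAIN);
}
```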

>> - Charset conversion (iconv)
>
>
> Copy ASCII / UTF-8, but fail for all others.

That is quite possible, though it's worth noting that musl's iconv is not
very large: iconv.lo is 128408 bytes, or about 125k.
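For reference, this is roughly how callers drive the standard POSIX
iconv_open()/iconv()/iconv_close() API. The helper convert_buf is a
hypothetical wrapper written for this example. Under the proposal above,
only ASCII/UTF-8 pairs would succeed, and a request such as "ISO-8859-1"
would presumably fail at iconv_open().

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert the n bytes at `in` from charset `from`
 * to charset `to`, writing at most cap bytes to `out`. Returns the
 * number of output bytes, or -1 if the charset pair is unsupported,
 * the input is invalid, or `out` is too small. */
long convert_buf(const char *to, const char *from,
                 const char *in, size_t n, char *out, size_t cap)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;                  /* unknown or unsupported charset */

    char *src = (char *)in;         /* iconv() takes char ** even for input */
    char *dst = out;
    size_t src_left = n, dst_left = cap;
    long result = -1;

    if (iconv(cd, &src, &src_left, &dst, &dst_left) != (size_t)-1 &&
        /* flush any pending shift state (no-op for stateless encodings) */
        iconv(cd, NULL, NULL, &dst, &dst_left) != (size_t)-1)
        result = (long)(cap - dst_left);

    iconv_close(cd);
    return result;
}
```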

>> - Non-ASCII characters in regex and fnmatch patterns/brackets
>
>
> Maybe the question is whether to allow UTF-8, but only UTF-8 and no other
> charsets (that should allow some optimization and avoid all the extended
> overhead).

This is already the case.

> fnmatch: Match non-ASCII just 1:1, no other special operation.

fnmatch.lo itself is 2227 bytes right now, and none of that is UTF-8
handling. The UTF-8 handling lives in mbtowc.lo and mbsrtowcs.lo, which
are 227 bytes and 636 bytes respectively.
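A sketch of how a glob-style matcher can lean on mbtowc() to step over
whole characters, so that '?' consumes a complete multibyte character
in a UTF-8 locale. tiny_match and next_char are hypothetical and handle
only '?' and literal characters (no '*', brackets, or escapes); this is
not musl's fnmatch.

```c
#include <stdlib.h>

/* Read one character from s (which must not be at its terminating NUL).
 * Returns the number of bytes it occupies and stores its value in *wc.
 * Bytes that do not decode in the current locale are taken one at a
 * time, so matching always makes progress. */
static int next_char(wchar_t *wc, const char *s)
{
    int k = mbtowc(wc, s, MB_CUR_MAX);
    if (k <= 0) {                   /* invalid or incomplete sequence */
        mbtowc(NULL, NULL, 0);      /* reset conversion state */
        *wc = (unsigned char)*s;
        return 1;
    }
    return k;
}

/* Hypothetical matcher: '?' matches any one character, everything else
 * must match literally. Returns 1 on a full match, 0 otherwise. */
int tiny_match(const char *pat, const char *str)
{
    wchar_t pc, sc;

    while (*pat) {
        if (!*str)
            return 0;               /* string ran out before the pattern */
        int sk = next_char(&sc, str);
        if (*pat == '?') {
            pat++;
        } else {
            int pk = next_char(&pc, pat);
            if (pc != sc)
                return 0;
            pat += pk;
        }
        str += sk;
    }
    return *str == '\0';
}
```

In a UTF-8 locale, tiny_match("caf?", "café") should match because
mbtowc() decodes the two-byte é as one character; in a byte-based C
locale, '?' consumes only the first byte of é and the match fails.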

> regex: I don't have experience with the internals of this topic. In
> general, allow 1:1 matching of non-ASCII characters, but otherwise behave
> as in the C locale (e.g. equivalence classes).
>

The regex equivalence classes are handled via the isw* functions which
(as mentioned above) are quite small.
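As a small illustration of deferring to the isw* family, a bracket
class such as [[:alpha:]] can be evaluated with the standard wctype()
and iswctype() functions. class_match is a hypothetical helper, not part
of musl's regex code.

```c
#include <wctype.h>

/* Hypothetical helper: test one wide character against a POSIX class
 * name such as "alpha" or "digit", as a regex engine might for
 * [[:alpha:]]. Returns -1 for an unknown class name, else 0 or 1. */
int class_match(const char *name, wint_t c)
{
    wctype_t t = wctype(name);
    if (t == 0)
        return -1;                  /* wctype() returns 0 for unknown names */
    return iswctype(c, t) != 0;
}
```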

In short, if we made these changes we might be able to trim out about
135k, and almost all of that would come from iconv. Though I appreciate
the desire for smaller code, this doesn't seem like the place to go
looking.



Thread overview: 21+ messages
2015-06-06 21:40 [PATCH] Byte-based C locale, draft 1 Rich Felker
2015-06-06 22:39 ` Harald Becker
2015-06-06 23:10   ` Rich Felker
2015-06-06 23:59     ` Harald Becker
2015-06-07  0:24       ` Rich Felker
2015-06-07 23:59         ` Build option to disable locale [was: Byte-based C locale, draft 1] Harald Becker
2015-06-08  0:28           ` Josiah Worcester [this message]
2015-06-08  1:57             ` Harald Becker
2015-06-08  2:36               ` Rich Felker
2015-06-08  3:35                 ` Harald Becker
2015-06-08  3:51                   ` Josiah Worcester
2015-06-08  0:33           ` Rich Felker
2015-06-08  2:46             ` Harald Becker
2015-06-08  4:06               ` Rich Felker
2015-06-09  3:20               ` Isaac Dunham
2015-06-09  4:27                 ` Rich Felker
2015-06-07  1:17 ` [PATCH] Byte-based C locale, draft 1 Rich Felker
2015-06-07  2:50 ` Rich Felker
2015-06-13  7:06   ` [PATCH] Byte-based C locale, draft 2 Rich Felker
2015-06-16  4:26     ` Rich Felker
2015-06-16  4:35       ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
