From mboxrd@z Thu Jan 1 00:00:00 1970
From: Josiah Worcester
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Build option to disable locale [was: Byte-based C locale, draft 1]
Date: Sun, 7 Jun 2015 19:28:51 -0500
References: <20150606214007.GA17398@brightrain.aerifal.cx>
 <55737684.7020803@gmx.de>
 <20150606231057.GZ17573@brightrain.aerifal.cx>
 <55738979.4030809@gmx.de>
 <20150607002459.GA17573@brightrain.aerifal.cx>
 <5574DAE7.8040101@gmx.de>
In-Reply-To: <5574DAE7.8040101@gmx.de>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8

On Sun, Jun 7, 2015 at 6:59 PM, Harald Becker wrote:
> On 07.06.2015 02:24, Rich Felker wrote:
>>
>> It's somewhat more clear what you're talking about, but I'm still not
>> sure what specific pieces of code you would want to omit from libc.so.
>> Which of the following would you want to remove or keep?
>
>
> I did not look into all the details ...
>

To start with: keep in mind that with static linking, most of this is
not pulled in at all except when strictly necessary. Static linking
might be more relevant to your needs.

> In general: Keep the API, but add stubs with minimal operation or fail for
> none C locale (etc.).
>
>> - UTF-8 encoding and decoding
>
>
> May be of use to keep, if on bare minimum.

Seeing as the UTF-8 decoder is very small already, I'd be shocked if
you could make an argument for removing that.

>> - Character properties
>
>> - Case mappings
>
> Keep ASCII, map all none ASCII to a single value.

This would be not-quite-right. Also, the case mapping tables are quite
small. towctrans.lo, which contains the case mappings, is 1106 bytes.

>> - Internal message translation (nl_langinfo strings, errors, etc.)
>
>> - Message translation API (gettext)
>
> No translation at all, keep the English messages (as short as possible).

musl does not have any translations in it at all. It only has a small
portion of logic able to load external translations. locale_map.lo and
__mo_lookup.lo, which together are responsible for this, total 1471
bytes.

>> - Charset conversion (iconv)
>
>
> Copy ASCII / UTF-8, but fail for all other.

Though quite possible, it's worth noting that musl iconv is not very
large. iconv.lo is 128408 bytes, or 125k.

>> - Non-ASCII characters in regex and fnmatch patterns/brackets
>
>
> May be the question to allow for UTF-8, but only those, no other charsets
> (should allow to do some optimization and avoid all the extended overhead).

This is already the case.

> fnmatch: Match None ASCII just 1:1, no other special operation.

fnmatch.lo itself is 2227 bytes right now, and none of that is UTF-8
handling. The UTF-8 handling itself is in mbtowc.lo and mbsrtowcs.lo,
which are 227 bytes and 636 bytes respectively.

> regex: Don't have the experience on the internals of this topic. In general
> allow for 1:1 matching of none ASCII characters, but otherwise behave as C
> locale (e.g. equivalence classes).
>

The regex equivalence classes are handled via the isw* functions which
(as mentioned above) are quite small.

In short, it seems like if we made these changes we'd maybe be able to
trim out 135k, and almost all of that would be in iconv. Though I
appreciate the desire for smaller code, this doesn't quite seem like
the place to go looking.