From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Build option to disable locale [was: Byte-based C locale, draft 1]
Date: Tue, 9 Jun 2015 00:27:30 -0400
Message-ID: <20150609042730.GH17573@brightrain.aerifal.cx>
In-Reply-To: <20150609032025.GA1605@localhost>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Mon, Jun 08, 2015 at 08:20:26PM -0700, Isaac Dunham wrote:
> On Mon, Jun 08, 2015 at 04:46:42AM +0200, Harald Becker wrote:
> > On 08.06.2015 02:33, Rich Felker wrote:
> > >So aside from iconv, the above seem to total around 19k, and at least
> > >6k of that is mandatory if you want to be able to claim to support
> > >UTF-8. So the topic at hand seems to be whether you can save <13k of
> > >libc.so size by hacking out character handling/locale related features
> > >that are non-essential to basic UTF-8 support...
> >
> > I would like to get a stripped-down version, which eliminates all the
> > unnecessary char set handling code used in dedicated systems, but
> > stripping that on every release is too much work to do.
> >
> > The benefit may be for:
> >
> > - embedded systems
> > - small initramfs based systems
> > - container systems
> > - minimal chroot environments
>
> Somehow it sounds like you may not have gotten what Rich was asking.
>
> IIRC, the goals of musl include full native support for UTF-8; keeping
> the time complexity to a minimum; and clean, correct code.
>
> Dropping out 'legacy' charsets doesn't really sacrifice those goals.
> But the other changes have a much bigger impact on them.
> So you're probably going to have to convince Rich that there *is* a
> major benefit ('is' != 'could be').
>
> For container systems or minimal chroot environments, you're dealing
> with something that doesn't have a hard size limit, and if a chroot
> or container runs ~6 MB ordinarily, you might be able to run 0.3% more
> on the same hardware. That's probably not enough of a case.
> For initramfs-based systems, you've got a similar situation but no
> chance to multiply the effect, unless you're using a VM or hypervisor.
>
> Now, since embedded systems have hard limits on size, you might be
> able to make a case there.
> But you will need to come up with something
> more specific, such as "I have a system where I could upgrade the kernel
> to 2.6.xx *if* musl were ~20k smaller than building with a minimal
> iconv" or "If we did this, there would be enough space to switch XYZ
> router firmware from telnetd to dropbear".

Yes, this is roughly what I was saying. Thank you for expressing it
better than I could.

And along those lines, if you really need to minimize libc.so for such
a special case, the solution is not manually maintaining extra knobs
and #ifdefs, but changing the way libc.so is generated. Instead of
linking all the object files directly, put them in a .a file first,
then link with something like:

    $CC -shared -o libc.so -Wl,-u,sym1 -Wl,-u,sym2 ... libc_so.a

where the list sym1, sym2, ... is generated from 'nm' output for all
the binaries you need to run, plus a few mandatory libc-internal
symbols that need to be linked. This will produce the minimal libc.so
needed for your exact set of programs.

In the specific case of UTF-8 and locale-related code, I believe that
if none of your programs call setlocale or use any of the wchar
functions, regex/fnmatch/glob, or iconv explicitly, the only code that
we discussed that would get linked into libc.so is mbtowc.c and
wcrtomb.c, for a total of about 550 bytes. Even these would be omitted
if you don't use printf or scanf (printf needs wcrtomb; scanf needs
mbtowc). Using fnmatch/glob/regex would pull in another ~9k for the
character class and case mapping functions.

Rich
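
The symbol list for the link command above could be generated roughly
as follows. This is only a sketch under stated assumptions: it assumes
GNU binutils nm (for the -D and --undefined-only options), the helper
names undefined_syms and u_flags are made up for illustration, and the
mandatory libc-internal symbols are not covered here and would have to
be added by hand.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of musl's build.

# Read nm output on stdin, keep undefined entries (type "U"), strip any
# "@version" suffix, and print each symbol once, sorted.
undefined_syms() {
    awk 'NF >= 2 && $(NF-1) == "U" { sub(/@.*/, "", $NF); print $NF }' | sort -u
}

# Turn a symbol list (one name per line on stdin) into a single line of
# -Wl,-u,<sym> linker flags separated by spaces.
u_flags() {
    awk '{ printf "%s-Wl,-u,%s", (NR > 1 ? " " : ""), $1 }
         END { if (NR) print "" }'
}

# Possible usage (assumes GNU nm; paths are placeholders):
#   flags=$(nm -D --undefined-only /path/to/bin1 /path/to/bin2 \
#           | undefined_syms | u_flags)
#   $CC -shared -o libc.so $flags libc_so.a
```

Keeping the two steps as separate filters makes it easy to inspect or
hand-edit the intermediate symbol list before it is turned into flags.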