From: Szabolcs Nagy
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: iconv Korean and Traditional Chinese research so far
Date: Mon, 5 Aug 2013 05:13:22 +0200
Message-ID: <20130805031322.GM25714@port70.net>
To: musl@lists.openwall.com

* Harald Becker [2013-08-05 03:24:52 +0200]:
> iconv then shall:
> - look for some fixed charsets like ASCII, Latin-1, UTF-8, etc.
> - search table of with libc linked charsets
> - search table of with the program linked charsets
> - search for charset on external search path

sounds like a lot of extra management cost
(for libc, application writer and user as well)

it would be nice if the compiler could figure out
at build time (eg with lto) which tables are used
but i guess charsets are often only known at runtime

> [Addendum after thinking a bit more: The byte code conversion
> files shall exist of a small statical header, followed by the
> byte code program. The header shall contain the charset name,
> version of required virtual machine and length of byte code. So
> you need only add all such conversion files to a big array of
> bytes and add a Null header to mark the end of table. Then you
> only need the start of the array and you are able to search
> through for a specific charset. The iconv function in libc
> contains a definition for an "unsigned char const
> *iconv_user_charsets = NULL;", which is linked in, when the user
> does not provide it's own definition. So iconv can search all
> linked in charset definitions, and need no code changes. Really
> simple configuration to select charsets to build in.]
>

yes that can work, but it's a musl specific hack
that the application programmer needs to take care of

> > if the format changes then dynamic linking is
> > problematic as well: you cannot update libc
> > in a single atomic operation
>
> The byte code shall be independent of dynamic linking. The
> conversion files are only streams of bytes, which shall also be
> architecture independent. So you do only need to update the
> conversion files if the virtual machine definition of iconv has
> been changed (shall not be done much). External files may be read
> into malloc-ed buffers or mmap-ed, not linked in by the
> dynamical linker.
>

that does not solve the format change problem

you cannot update libc without a race
(unless you first replace the .so with one which supports
the old format as well as the new one, but then libc has
to support all previous formats)

it's probably easy to design a fixed format to avoid this

it seems somewhat similar to the timezone problem,
except zoneinfo is maintained outside of libc so there
is not much choice, but there are the same issues:
updating it should be done carefully, setuid programs
must be handled specially etc
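
for illustration, the linked-in table search from the addendum could look
roughly like the sketch below. the header layout (16 byte NUL padded name,
2 byte vm version, 2 byte code length, both big endian) and the names
find_charset, rd16 and demo_table are assumptions made up for this example,
the thread does not fix any of them:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical header layout (an assumption, not from the thread):
   16 bytes charset name (NUL padded), 2 bytes vm version,
   2 bytes byte code length; integers are big endian so the
   table stays architecture independent */
enum { NAME_LEN = 16, HDR_LEN = 20 };

/* read a 16 bit big endian value */
static unsigned rd16(const unsigned char *p)
{
	return (unsigned)p[0] << 8 | p[1];
}

/* walk the concatenated entries until the null header (empty name);
   return the start of the matching entry, or NULL if the charset
   is not in the table (or there is no table at all) */
const unsigned char *find_charset(const unsigned char *tab, const char *name)
{
	assert(name != NULL);
	if (!tab || strlen(name) > NAME_LEN) return NULL;
	while (tab[0]) {
		if (!strncmp((const char *)tab, name, NAME_LEN)) return tab;
		/* skip this header plus its byte code */
		tab += HDR_LEN + rd16(tab + 18);
	}
	return NULL;
}

/* two dummy entries followed by the null header; the byte code
   bytes are placeholders, not a real conversion program */
const unsigned char demo_table[] = {
	'K','O','I','8','-','R',0,0,0,0,0,0,0,0,0,0, 0,1, 0,3, 0xaa,0xbb,0xcc,
	'E','U','C','-','K','R',0,0,0,0,0,0,0,0,0,0, 0,1, 0,2, 0xdd,0xee,
	0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0, 0,0,
};
```

with this layout find_charset(demo_table, "EUC-KR") returns a pointer
23 bytes into the table and an unknown name returns NULL. passing a NULL
table stands in for the default "iconv_user_charsets = NULL" case where
the application links in nothing of its own.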