From: Harald Becker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: iconv Korean and Traditional Chinese research so far
Date: Mon, 5 Aug 2013 09:53:32 +0200
Message-ID: <20130805095332.3b1be6e5@ralda.gmx.de>
In-Reply-To: <20130805033955.GC221@brightrain.aerifal.cx>
Cc: musl@lists.openwall.com, dalias@aerifal.cx

Hi Rich!
> iconv is not something that needs to be extensible. There is a
> finite set of legacy encodings that's relevant to the world,
> and their relevance is going to go down and down with time, not
> up.

Oh! So you decide that Japanese, Chinese, Korean, etc. are relevant for the programs sitting on my machines? How can you decide this? Why be so ignorant: write a standards-conforming library, then pick out a list of charsets of your own choosing that iconv may support, neglecting the wishes and needs of every other musl user?

... or in other words, if you really are this ignorant and insist on building those charsets into musl, musl is no longer for me :( ...

I don't need to bring any of my own code into musl. But I don't consider a lib usable for my needs if it includes several charset files in a static build and offers no way at all to load seldom-used charset definitions from external files.

> > > Do I want to give users who have large volumes of legacy
> > > text in their languages stored in these encodings the same
> > > respect and dignity as users of other legacy encodings we
> > > already support? Yes.
> >
> > Of course. I won't dictate others which conversions they want
> > to use. I only hat to have plenty of conversion tables on my
> > system when I really know I never use such kind of
> > conversions.
>
> And your table for just Chinese is as large as all our tables
> combined...

How can you tell? I don't think so. Such conversion code can be very compact. The size goes mainly into translation tables, which are needed when a charset's code points do not follow Unicode character order. You need that space for the translations either way. The rest won't be much.

> I agree you can make iconv smaller than musl's in the case
> where _no_ legacy DBCS are installed. But if you have just one,
> you'll be just as large or larger than musl with them all.

... musl with them all? I don't consider that smaller than an optimized byte code interpreter ...
not when you are going to build DBCS charsets into musl, at least not if you do all the required translations.

> compare the size of musl's tables to glibc's converters. I've
> worked hard to make them as small as reasonably possible
> without doing hideous hacks like decompression into an
> in-memory buffer, which would actually increase bloat.

Are you building a lib only for startup purposes and embedded systems, or are you trying to write a general-purpose library? Including all those definitions in a static build is definitely not something I will ever like. It may be done for some special situations and selected charsets, but not in a general-purpose library that aims for wide use.

> If you have root or want to setup nonstandard environment
> variables.

What about a charset search path that includes something like "~/.local/share/charset"? That would allow installing charset files in the user's own directory.

> > interpreter allows to statical link in the conversion byte
> > code programs.
>
> At several times the size of the current code/tables, and after
> the user searches through the documentation to figure out how
> to do it.

Are you really considering including all those code tables statically in musl? I wouldn't include much more than a few standard sets. Why don't you want to load the charset definitions as they are needed? On one hand you say "use dietlibc" if you need small static programs. On the other hand you want to include many charset definitions in a static build to avoid loading tables dynamically, although only embedded systems really need to avoid that. So what's the purpose of musl? I don't think you're right here.

> It's not just a matter of dropping in. You'd have path searches
> to modify or disable, build options to get the static tables
> turned on, and all of this stuff would have to be integrated
> with the build system for what you're dropping it into.

I don't see where that complexity would come from.
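The idea of selecting a few charsets at build time and loading the rest from a search path can be sketched in C. This is only an illustration under stated assumptions. It is not musl's iconv code or any existing API. The names `builtin_charset`, `builtins[]`, `search_path[]`, `find_builtin()`, `decode_byte()` and `open_charset_file()` are hypothetical. `~/.local/share/charset` is the example path from above, and `/usr/share/charset` is an assumed system location. The on-disk file format is deliberately left unspecified.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical descriptor for a charset linked in at build time. */
struct builtin_charset {
    const char *name;
    /* 256-entry byte-to-Unicode table, or NULL when every byte value
       already equals its Unicode code point (true for ISO-8859-1). */
    const unsigned short *to_ucs;
};

/* Hypothetical build-time selection: only these charsets are compiled
   in; anything else would be looked up on disk. */
static const struct builtin_charset builtins[] = {
    { "ISO-8859-1", NULL },
};

/* Hypothetical search path, tried in order; a leading '~' means $HOME. */
static const char *const search_path[] = {
    "~/.local/share/charset",
    "/usr/share/charset",
};

/* Case-insensitive lookup among the compiled-in charsets. */
static const struct builtin_charset *find_builtin(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++)
        if (strcasecmp(builtins[i].name, name) == 0)
            return &builtins[i];
    return NULL;
}

/* Decode one byte of a single-byte charset. Only charsets whose order
   differs from Unicode need a table at all. */
static unsigned decode_byte(const struct builtin_charset *cs, unsigned char c)
{
    return cs->to_ucs ? cs->to_ucs[c] : c;
}

/* Open the first definition file called NAME found on the search path.
   On success BUF holds the opened path and the caller must fclose() the
   result; returns NULL if no readable file is found. */
static FILE *open_charset_file(const char *name, char *buf, size_t size)
{
    if (strchr(name, '/') != NULL)
        return NULL;                /* keep lookups inside the search dirs */

    const char *home = getenv("HOME");
    for (size_t i = 0; i < sizeof search_path / sizeof search_path[0]; i++) {
        const char *dir = search_path[i];
        int n;
        if (dir[0] == '~') {
            if (home == NULL)
                continue;           /* cannot expand ~ without $HOME */
            n = snprintf(buf, size, "%s%s/%s", home, dir + 1, name);
        } else {
            n = snprintf(buf, size, "%s/%s", dir, name);
        }
        if (n < 0 || (size_t)n >= size)
            continue;               /* path did not fit: skip this entry */
        FILE *f = fopen(buf, "rb");
        if (f != NULL)
            return f;
    }
    return NULL;
}
```

In this sketch, a caller would try `find_builtin()` first and call `open_charset_file()` only for names that are not compiled in. A static binary would then carry just the tables selected at build time. The sketch does not measure how large such tables or files would be.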
In fact, I don't want a lib that includes several charset definitions in a static build. I would really like a directory with definition files for those charsets, and I don't see the complexity you proclaim. Inclusion in a static build is nothing more than selecting the charsets you want included statically. That selection is always required, or else you include all the files, which I definitely reject.

> Complexity is never the solution. Honestly, I would take a 1mb
> increase in binary size over this kind of complexity any day.
> Thankfully, we don't have to make such a tradeoff.

The only complexity we have here is the complexity of charset translation itself. The rest is relatively simple.

> Charsets are not added. The time of charsets is over. It should
> have been over in 1992, when Pike and Thompson made them
> obsolete, but it's really over now.

So why are you adding Japanese, Chinese and Korean charsets to iconv in musl? Why not just use UTF-8? Whenever you use iconv, you want the flexibility to do all the required charset conversions. That means you either link many charset definitions in statically or load what is required dynamically.

> Then dynamic link it. If you want an extensible binary, you use
> dynamic linking.

So the mail client is dynamically linked, OK, but where do the charset definition files go? Are they all packed into your libc.so? Isn't that a very big file? Why do I need Asian language definitions on my disk when I don't want them? It is your decision, but please state clearly what purpose you are building musl for. Here it looks like you are mixing things up and stepping in a direction I will never like.

--
Rich