From: Roy
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: iconv Korean and Traditional Chinese research so far
Date: Tue, 06 Aug 2013 14:14:33 +0800
To: musl@lists.openwall.com

Tue, 06 Aug 2013 03:12:47 +0800, Rich Felker wrote:

> On Mon, Aug 05, 2013 at 04:28:32PM +0800, Roy wrote:
>> Since I'm a Traditional Chinese and Japanese legacy encoding user, I
>> think I can say something here.
>> [...]
>> There is another Big5 extension called Big5-UAO, which is used by
>> the world's largest telnet-based BBS, "ptt.cc".
>>
>> It has two tables, one for Big5-UAO to Unicode and another for
>> Unicode to Big5-UAO:
>> http://moztw.org/docs/big5/table/uao250-b2u.txt
>> http://moztw.org/docs/big5/table/uao250-u2b.txt
>>
>> It extends the DBCS lead byte range down to 0x81.
>
> OK, I've been trying to do some research on this and I turned up:
>
> http://lists.w3.org/Archives/Public/public-html-ig-zh/2012Apr/0061.html
> http://lists.gnu.org/archive/html/bug-gnu-libiconv/2010-11/msg00007.html
>
> My impression (please correct me if I'm wrong) is that you can't use
> Big5-UAO as the system encoding on modern versions of Windows (just
> ancient ones where you install unmaintained third-party software that
> hacks the system charset tables)

It doesn't "hack" the NLS file. It replaces the stock CP950 NLS file
with a UAO-enabled one. The executable (setup program) is generated
with NSIS (Nullsoft Scriptable Install System). The NLS file format
hasn't changed from NT 3.1 in 1993 up to today's Windows 8.1 "Blue"
(NT 6.3). So the UAO-enabled CP950 NLS file will keep working on newer
versions of Windows unless Microsoft replaces the NLS file format with
something different.

> and that it's not supported in GNU
> libiconv. If this is the case, and especially if Big5-UAO's main use
> is on a telnet-based BBS where everybody is using special telnet
> clients that have their own Big5-UAO converters,

GNU libiconv doesn't even support IBM EBCDIC (neither the SBCS nor the
stateful SBCS+DBCS variants)! So does it really matter whether GNU
libiconv supports a given encoding? (glibc's iconv, i.e. its gconv
modules, does support both the IBM EBCDIC SBCS and the stateful
SBCS+DBCS encodings.)

> I'd find it really
> hard to justify trying to support this. But I'm open to hearing
> arguments on why we should, if you believe it's important.

I think it would be nice to have a build-time or link-time option for
those "unpopular" encodings.
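Here is a minimal C sketch of the lead-byte extension quoted above, showing what it changes for a decoder's byte-level validation. It rests on several assumptions. Classic Big5 lead bytes are taken as 0xA1-0xF9. Big5-UAO lead bytes start at 0x81, per the thread, and are assumed to run up to 0xFE. Both variants are assumed to share trail bytes 0x40-0x7E and 0xA1-0xFE. The function names are hypothetical. This is not musl's iconv code, and it only counts characters; mapping them to Unicode would still need the b2u table linked above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; byte ranges are assumptions. */

/* Classic Big5 lead byte (assumed range 0xA1-0xF9). */
bool big5_is_lead(unsigned char c)
{
    return c >= 0xA1 && c <= 0xF9;
}

/* Big5-UAO lead byte: extended down to 0x81 (upper bound 0xFE assumed). */
bool big5uao_is_lead(unsigned char c)
{
    return c >= 0x81 && c <= 0xFE;
}

/* Trail byte ranges assumed to be shared by both variants. */
bool big5_is_trail(unsigned char c)
{
    return (c >= 0x40 && c <= 0x7E) || (c >= 0xA1 && c <= 0xFE);
}

/* Count characters in buf[0..len) using the given lead-byte rule.
 * Returns (size_t)-1 on an invalid or truncated double-byte sequence. */
size_t count_chars(const unsigned char *buf, size_t len,
                   bool (*is_lead)(unsigned char))
{
    size_t n = 0, i = 0;
    while (i < len) {
        if (buf[i] < 0x80) {
            i += 1;                 /* ASCII: single byte */
        } else if (is_lead(buf[i]) && i + 1 < len &&
                   big5_is_trail(buf[i + 1])) {
            i += 2;                 /* lead + trail: one DBCS character */
        } else {
            return (size_t)-1;      /* bad lead byte, bad trail, or truncated */
        }
        n++;
    }
    return n;
}
```

For example, the bytes 0x81 0x40 form one character under the Big5-UAO rule (`count_chars(buf, 2, big5uao_is_lead)` returns 1). Classic Big5 rejects them (`big5_is_lead` returns `(size_t)-1`). Text that stays within the classic range decodes the same way under both rules.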
>> For static linking, can we have conditional linking like QT does?
>
> My feeling is that it's a tradeoff, and probably has more pros than
> cons. Unlike QT, musl's iconv is extremely small.

I would add "right now" here. As more encodings are added, the iconv
module will grow, and people will need a way to compile in only the
encodings they need (for both dynamic and static linking).

> Even with all the
> above, the size of iconv.o will be under 130k, maybe closer to 110k.
> If you actually use iconv in your program, this is a small price to
> pay for having it fully functional. On the other hand, if linking it
> is conditional, you have to consider who makes the decision, and when.
> If it's at link time for each application, that's probably too much of
> a musl-specific version.

Statically linking a libc's iconv is new territory, since other libcs
haven't addressed it much. So I think we could define a standard way
to statically link selected encoding tables at link time. (This is also
one reason why libc should provide a unique identifier as a
preprocessor define.)

> If it's at build time for musl, then is it
> your device vendor deciding for you what languages you need? One of
> the biggest headaches of uClibc-based systems is finding that the
> system libc was built with important options you need turned off and
> that you need to hack in a replacement to get something working...
>
> I think the cost of getting stuck with broken binaries where charsets
> were omitted is sufficiently greater than the cost of adding a few
> tens of kb to static binaries using iconv, that we should only
> consider a build time option if embedded users are actively reporting
> size problems.
>
> Rich
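Here is a minimal C sketch of the build-time option being debated, showing where the tradeoff Rich describes shows up. The macro `ICONV_WITH_BIG5_UAO`, the `struct charset_entry` type, the `find_charset` function and the placeholder tables are all hypothetical, invented for this illustration. They are not a real musl configure option or musl's actual charset table layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical charset registry; compile with -DICONV_WITH_BIG5_UAO to
 * include the optional (larger) Big5-UAO table. */

struct charset_entry {
    const char *name;              /* name matched by the lookup */
    const unsigned short *table;   /* mapping table, NULL if algorithmic */
};

/* Placeholder tables: real ones would hold thousands of entries. */
static const unsigned short big5_table[] = { 0x4E00 };
#ifdef ICONV_WITH_BIG5_UAO
static const unsigned short big5_uao_table[] = { 0x4E00 };
#endif

static const struct charset_entry charsets[] = {
    { "utf8", NULL },
    { "big5", big5_table },
#ifdef ICONV_WITH_BIG5_UAO
    { "big5uao", big5_uao_table },  /* present only in opted-in builds */
#endif
};

/* Return the entry for name, or NULL if this build does not contain it.
 * An iconv_open built on this would then fail with errno = EINVAL, which
 * POSIX specifies for an unsupported conversion. */
const struct charset_entry *find_charset(const char *name)
{
    for (size_t i = 0; i < sizeof charsets / sizeof charsets[0]; i++)
        if (strcmp(charsets[i].name, name) == 0)
            return &charsets[i];
    return NULL;
}
```

Without `-DICONV_WITH_BIG5_UAO`, `find_charset("big5uao")` returns NULL while `"big5"` and `"utf8"` are still found. This is the "charsets were omitted" failure mode from the thread: the application compiles and links cleanly against such a libc and only fails at runtime, when it asks for the missing encoding.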