From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: Re: Re: iconv Korean and Traditional Chinese research so far
Date: Tue, 6 Aug 2013 09:32:05 -0400
Message-ID: <20130806133205.GS221@brightrain.aerifal.cx>

On Tue, Aug 06, 2013 at 02:14:33PM +0800, Roy wrote:
> >My impression (please correct me if I'm wrong) is that you can't use
> >Big5-UAO as the system encoding on modern versions of Windows (just
> >ancient ones where you install unmaintained third-party software that
> >hacks the system charset tables)
>
> It doesn't "hack" the nls file but replaces with UAO-available CP950
> nls file.
> The executable(setup program) is generated with NSIS(Nullsoft
> Scriptable Install System).
> Since the nls file format doesn't change since NT 3.1 in 1993 till
> now NT 6.2(i.e. Win 8.1 "Blue"), the UAO-available CP950 nls will
> continue to work in newer versions of windows unless MS throw away
> nls file format with something different.

OK, thanks for clarifying. I'd still consider it a ways into the
"hack" domain if the OS vendor still is not supporting it directly,
but it does make a difference that it still works "cleanly". I was
under the impression that these sorts of things change between
Windows versions in ways that would preclude using old, unmaintained
patches like this.

I agree that just the fact that certain OS vendors do not support an
encoding is not in itself a reason not to support it.

> >and that it's not supported in GNU
> >libiconv. If this is the case, and especially if Big5-UAO's main use
> >is on a telnet-based BBS where everybody is using special telnet
> >clients that have their own Big5-UAO converters,
>
> GNU libiconv even not supports IBM EBCDIC(both SBCS and stateful
> SBCS+DBCS)!
>
> So does it matter if GNU libiconv is not support whatever encodings?
> (Yes glibc iconv(or say, gconv modules) does support both IBM EBCDIC
> SBCS and stateful SBCS+DBCS encodings)

I was under the impression that GNU libiconv was in sync with glibc's
iconv, but I have not checked this. I actually was more interested in
glibc's, which is in widespread use. glibc's inclusion or exclusion of
a feature is not in itself a reason to include or exclude it, but
supporting something that glibc supports does have the added
motivation that it will increase compatibility with what programs are
expecting.

> >I'd find it really
> >hard to justify trying to support this. But I'm open to hearing
> >arguments on why we should, if you believe it's important.
>
> I think it will be nice to have build/link time option for those
> "unpopular" encodings.
>
> >>For static linking, can we have conditional linking like QT does?
> >
> >My feeling is that it's a tradeoff, and probably has more pros than
> >cons. Unlike QT, musl's iconv is extremely small.
>
> I would add "right now" here. When we adds more encoding later,
> iconv module will be bigger than now, and people will need to find a
> way to conditionally compiling the encoding they need (for both
> dynamically or statically)

It's never been my intent to add more encodings later (aside from pure
non-table-based variants of existing ones, like the ISO-2022 versions)
once coverage is complete, at least not as built-in features. This can
be discussed if you think there are reasons it needs to change, but up
until now, the plan has been to support:

- ISO-8859 based 8-bit encodings
- Other 8-bit encodings with actual legacy usage (mainly Cyrillic)
- JIS 0208 based encodings
- KS X 1001 based encodings
- GB 2312 and supersets
- Big5 and supersets

All of those except Big5 and supersets are now supported, so short of
any change, my position is that right now we're discussing the "last"
significant addition to musl's iconv.

Some things that are definitely outside the scope of musl's iconv:

- Anything whose characters are not present in Unicode
- Anything PUA-based (really, same as above)
- Newly invented encodings with no historical encoded data

What's more borderline is where UAO falls: encodings that have neither
governmental or language-body-authority support nor any vendor support
from other software vendors, but for which there is at least one major
corpus of historical data and/or current usage for the encoding by
users of the language(s) whose characters are encoded.

However, based on the file at

http://moztw.org/docs/big5/table/uao250-b2u.txt

a number of the mappings UAO defines are into the private use area.
This would generally preclude support (as this is a font-specific
encoding, not a Unicode encoding) unless the affected characters have
since been added to Unicode and could be remapped to the correct
codepoints. Do you know the status on this? I'm also still unclear on
whether this is a superset of HKSCS (it's definitely not directly, but
maybe it is if the PUA mappings are corrected; I did not do any
detailed checks but just noted the lack of mappings to the non-BMP
codepoints HKSCS uses).

> >Even with all the
> >above, the size of iconv.o will be under 130k, maybe closer to 110k.
> >If you actually use iconv in your program, this is a small price to
> >pay for having it fully functional. On the other hand, if linking it
> >is conditional, you have to consider who makes the decision, and when.
> >If it's at link time for each application, that's probably too much of
> >a musl-specific version.
>
> Since statically linking libc-iconv is new area now (other libc
> doesn't touch this topic much), I think we can create standard for
> statically linking specified encoding table in link time.
> (This is also a reason of "why libc should provide an unique
> identifier with preprocessor define")

I don't see how "creating a standard" for doing this would make the
situation any better. Most software authors these days are at best
tolerant of the existence of static linking, and more often hostile to
it. They're not going to add specific build behavior for static
linking, and even if they do, they're likely to get it wrong, in which
case the user ends up stuck with binaries that can't process input in
their language.

Rich
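
For anyone who wants to repeat the private-use check mentioned above
on a mapping table such as uao250-b2u.txt, here is a minimal sketch in
C. It assumes the Unicode targets have already been parsed out of the
table into an array; the function names are hypothetical and are not
part of musl or of any existing tool. The ranges are the three
private-use ranges Unicode defines (the BMP Private Use Area and the
two supplementary private-use planes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if cp lies in a Unicode private-use range. */
bool is_private_use(uint32_t cp)
{
    return (cp >= 0xE000 && cp <= 0xF8FF)       /* BMP Private Use Area */
        || (cp >= 0xF0000 && cp <= 0xFFFFD)     /* Plane 15 private use */
        || (cp >= 0x100000 && cp <= 0x10FFFD);  /* Plane 16 private use */
}

/* Hypothetical helper: count the mapping targets that are private use.
 * targets holds the Unicode side of each mapping, n is its length. */
size_t count_pua_targets(const uint32_t *targets, size_t n)
{
    assert(targets != NULL || n == 0);

    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (is_private_use(targets[i]))
            count++;
    return count;
}
```

A nonzero count from count_pua_targets() only says that some mappings
land in private-use space; whether those characters have since been
added to Unicode and can be remapped still has to be checked by hand.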