From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/3841
Path: news.gmane.org!not-for-mail
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: Re: Re: iconv Korean and Traditional Chinese research so far
Date: Tue, 6 Aug 2013 12:22:15 -0400
Message-ID: <20130806162214.GX221@brightrain.aerifal.cx>
References: <20130804165152.GA32076@brightrain.aerifal.cx>
 <20130805191246.GM221@brightrain.aerifal.cx>
 <20130806133205.GS221@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1375806149 19964 80.91.229.3 (6 Aug 2013 16:22:29 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 6 Aug 2013 16:22:29 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-3845-gllmg-musl=m.gmane.org@lists.openwall.com Tue Aug 06 18:22:32 2013
Return-path:
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
 by plane.gmane.org with smtp (Exim 4.69) (envelope-from )
 id 1V6k1k-00058N-8a for gllmg-musl@plane.gmane.org;
 Tue, 06 Aug 2013 18:22:28 +0200
Original-Received: (qmail 32695 invoked by uid 550); 6 Aug 2013 16:22:27 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Original-Received: (qmail 32687 invoked from network); 6 Aug 2013 16:22:27 -0000
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
Xref: news.gmane.org gmane.linux.lib.musl.general:3841
Archived-At:

On Tue, Aug 06, 2013 at 11:11:23PM +0800, Roy wrote:
> >However, based on the file at
> >
> >http://moztw.org/docs/big5/table/uao250-b2u.txt
> >
> >a number of the mappings UAO defines are into the private use area.
> >This would generally preclude support (as this is a font-specific
> >encoding, not a Unicode encoding) unless the affected characters have
> >since been added to Unicode and could be remapped to the correct
> >codepoints. Do you know the status on this?
>
> Those are Big5-2003 compatibility code range. Big5-2003 is in
> CNS11643 appendix section, but it is rarely used since no
> OS/Application supports it.
> So skipping the PUA mappings are fine.

OK, a few more questions...

1. What, if anything, is the accepted charset name for Big5-UAO, i.e.
   how would it appear in MIME headers, etc.?

2. Can you give me an idea of the relationship between the Big5
   variants/extensions/supersets? I'm aware of Windows CP950, HKSCS,
   and now UAO. Is CP950 a common subset of them all, or is there a
   smaller base subset "plain Big5" that's the only shared part? What
   is ETEN and how does it fit in?

3. How should different MIME charset names be handled? In particular,
   what does plain "Big5" refer to? Should it be interpreted as CP950?

4. Is there anywhere to get clean semi-authoritative sources for the
   definitions of these charsets in plain text form? For HKSCS I found
   a government PDF file but it's useless because you can't extract
   the data in any meaningful way. Unicode has the CP950 file and
   "BIG5" file, but the latter refers to Unicode 1.1 in the comments
   and I've heard claims that it's completely wrong on many issues.
   Unihan.txt is also fairly useless because it only defines the
   mappings for ideographic characters, not the rest of the mappings
   in legacy CJK encodings. Short of anything better I may just have
   to use glibc output as a reference...

> >I'm also still unclear on whether this is a superset of HKSCS (it's
> >definitely not directly, but maybe it is if the PUA mappings are
> >corrected; I did not do any detailed checks but just noted the lack of
> >mappings to the non-BMP codepoints HKSCS uses).
>
> No it isn't. There is some code conflict between HKSCS(2001/2004) and UAO.

Some conflict or heavy conflict? From an implementation standpoint, I
want to know if this is something where they could use a common table
plus "if (type==BIG5UAO) { /* fixups here */ ... }" or if they need
completely separate tables.

Rich
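
For illustration only, the "common table plus fixups" option asked
about above might look roughly like the sketch below. Every name in it
(big5_decode, common_tab, uao_fixups, the enum values) is hypothetical,
and the table rows are placeholder numbers, not real Big5, HKSCS, or
UAO mappings. It assumes the variants disagree on only a small number
of codes; with heavy conflict, separate tables would likely be simpler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant selector; not taken from any real iconv code. */
enum big5_type { BIG5_BASE, BIG5_HKSCS, BIG5_UAO };

struct map_entry {
    uint16_t code;  /* two-byte legacy code */
    uint32_t ucs;   /* Unicode codepoint; 0 means "no mapping" */
};

/* Placeholder rows only, NOT real mapping data. Sorted by code. */
static const struct map_entry common_tab[] = {
    { 0x8140, 0x1000 },
    { 0x8141, 0x1001 },
    { 0x8142, 0x1002 },
};

/* Placeholder per-variant overrides. Sorted by code. */
static const struct map_entry uao_fixups[] = {
    { 0x8141, 0x2001 },  /* replaces the shared mapping */
    { 0x8150, 0x2002 },  /* exists only in this variant */
};

/* Binary search in a table sorted by code; returns 0 if absent. */
static uint32_t find(const struct map_entry *tab, size_t n, uint16_t code)
{
    size_t lo = 0, hi = n;
    assert(tab != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].code < code)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && tab[lo].code == code) ? tab[lo].ucs : 0;
}

/* Consult the variant's fixups first, then fall back to the shared table. */
uint32_t big5_decode(enum big5_type type, uint16_t code)
{
    if (type == BIG5_UAO) {
        uint32_t c = find(uao_fixups,
                          sizeof uao_fixups / sizeof *uao_fixups, code);
        if (c)
            return c;
    }
    return find(common_tab, sizeof common_tab / sizeof *common_tab, code);
}
```

One limitation of this sketch: because 0 doubles as the "absent"
marker, a fixup entry can replace or add a mapping but cannot remove
one that the shared table defines. Expressing removals would need a
separate sentinel value.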