From: Roy
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: Re: Re: iconv Korean and Traditional Chinese research so far
Date: Wed, 07 Aug 2013 15:20:25 +0800
References: <20130804165152.GA32076@brightrain.aerifal.cx> <20130805191246.GM221@brightrain.aerifal.cx> <20130806133205.GS221@brightrain.aerifal.cx> <20130806162214.GX221@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Wed, 07 Aug 2013 08:54:35 +0800, Roy wrote:

[snip]
>
> Big5-HKSCS 2004 map for reference:
> http://moztw.org/docs/big5/table/hkscs2004.txt
> Use sed and awk to create b2u.txt
> for comparing:
> $ sed -e '/^==/d' -e '1,2d' hkscs2004.txt | awk 'BEGIN{print "# big5 unicode"}{print "0x" $1 " 0x" $4}' > hkscs2004-b2u.txt
> In result:
> http://roy.dnsd.me/hkscs2004-b2u.txt
>
> And finally the diff:
> http://roy.dnsd.me/uao250-hkscs2004.diff
>
> The diff is huge so separated table is needed.

I forgot that the HKSCS table is missing the original CP950 entries, so I merged the two tables first:

$ cat cp950-b2u.txt hkscs2004-b2u.txt | sed -e '1d' | sort > hkscs2004-big5-b2u.txt

I also wrote a small utility in PHP that compares two tables by key (the first column):
http://roy.dnsd.me/tbldiff.phps

$ php tbldiff.php uao250-b2u.txt hkscs2004-big5-b2u.txt > uao250-vs-hkscs2004.txt
http://roy.dnsd.me/uao250-vs-hkscs2004.txt

Dropping the lines that contain "==" leaves only the differences:

$ sed -e '/==/d' uao250-vs-hkscs2004.txt > uao250-hkscs2004-diff.txt
http://roy.dnsd.me/uao250-hkscs2004-diff.txt

So 5965 mappings differ, including 1379 mappings that do not exist in HKSCS2004.
And since HKSCS2001 and HKSCS2004 are used interchangeably in both local files and Internet pages, the situation for HKSCS is even worse.

BTW, there is another NLS hack that patches MS-CP932 to support JIS X 0213:2004:
http://www.eonet.ne.jp/~kotobukispace/ddt/jisx0213/jisx0213.html
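
P.S. For readers who do not want to fetch tbldiff.phps, here is a rough Python sketch of what a key-based comparison of two b2u tables might look like. It is not a port of the PHP utility. The function names load_table and diff_tables are made up for illustration, and it assumes each data line is "0xBIG5 0xUNICODE" separated by whitespace, with "#" marking comment lines. Lowercasing the hex strings is also an assumption, made so that 0xA140 and 0xa140 compare equal.

```python
def load_table(lines):
    """Parse 'key value' lines into a dict, skipping blanks and '#' comments."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line: ignore rather than guess
        # Assumption: hex case is not significant, so normalize it.
        table[fields[0].lower()] = fields[1].lower()
    return table


def diff_tables(left, right):
    """Compare two tables by key, from the point of view of `left`.

    Returns (changed, missing):
      changed: {key: (left_value, right_value)} for keys present in both
               tables but mapped to different values
      missing: {key: left_value} for keys that `right` does not have
    """
    changed = {}
    missing = {}
    for key, value in left.items():
        if key not in right:
            missing[key] = value
        elif right[key] != value:
            changed[key] = (value, right[key])
    return changed, missing


# Made-up sample entries, NOT taken from the real UAO or HKSCS tables.
sample_left = load_table(["# big5 unicode", "0xA140 0x3000", "0x8740 0x43F0"])
sample_right = load_table(["# big5 unicode", "0xA140 0x3000"])
sample_changed, sample_missing = diff_tables(sample_left, sample_right)
print(len(sample_changed), "changed,", len(sample_missing), "missing")
```

With real data, one would pass open("uao250-b2u.txt") and open("hkscs2004-big5-b2u.txt") to load_table. If the counts above are read the same way, the 5965 figure would correspond to changed plus missing, and the 1379 figure to missing alone, but that reading is my assumption.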