From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/3835
Path: news.gmane.org!not-for-mail
From: Rich Felker <dalias@aerifal.cx>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: iconv Korean and Traditional Chinese research so far
Date: Mon, 5 Aug 2013 13:31:45 -0400
Message-ID: <20130805173144.GL221@brightrain.aerifal.cx>
References: <20130804165152.GA32076@brightrain.aerifal.cx>
 <op.w1b4hubbdyj81a@monster.itedn32a.localdomain>
 <20130805154344.GJ221@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1375723916 15576 80.91.229.3 (5 Aug 2013 17:31:56 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Mon, 5 Aug 2013 17:31:56 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-3839-gllmg-musl=m.gmane.org@lists.openwall.com Mon Aug 05 19:31:59 2013
Return-path: <musl-return-3839-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-3839-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1V6OdS-0005bi-IA
	for gllmg-musl@plane.gmane.org; Mon, 05 Aug 2013 19:31:58 +0200
Original-Received: (qmail 1409 invoked by uid 550); 5 Aug 2013 17:31:57 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 1388 invoked from network); 5 Aug 2013 17:31:57 -0000
Content-Disposition: inline
In-Reply-To: <20130805154344.GJ221@brightrain.aerifal.cx>
User-Agent: Mutt/1.5.21 (2010-09-15)
Xref: news.gmane.org gmane.linux.lib.musl.general:3835
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/3835>

On Mon, Aug 05, 2013 at 11:43:45AM -0400, Rich Felker wrote:
> > In EUC-KR (MS-CP949), there is Hanja characters (i.e. Kanji
> > characters in Japanese) and Japanese Katakana/Hiragana besides of
> > Hangul characters.
> 
> Yes, I'm aware of these. However, it looks to me like the only
> characters outside the standard 94x94 grid zone are Hangul syllables,
> and they appear in codepoint order. If so, even if there's not a good
> pattern to where they're located, merely knowing that the ones that
> are missing from the 94x94 grid are placed in order in the expanded
> space is sufficient to perform algorithmic (albeit inefficient)
> conversion. Does this sound correct?

I've verified that this is correct and committed an implementation of
Korean conversion based on this principle, which I basically copied
from my current implementation of GB18030's support for arbitrary
Unicode codepoints. It has not been heavily tested, but I did test it
casually with all the important boundary values, and it seems correct.
Tests should probably be added to the test suite.
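
For anyone following along, here is a rough sketch of the principle,
not the committed code. It assumes the extended slots have already been
flattened into a 0-based index n (the byte-to-index step is omitted),
and it uses a tiny stand-in table in place of the real list of grid
syllables. The names ext_to_ucs, ucs_to_ext and grid_hangul are made up
for illustration.

```c
#include <stddef.h>

#define HANGUL_FIRST 0xAC00u
#define HANGUL_LAST  0xD7A3u

/* Stand-in for the sorted list of Hangul syllables that already live in
 * the 94x94 grid. Illustrative only; the real table is much larger. */
static const unsigned short grid_hangul[] = {
	0xAC00, 0xAC01, 0xAC04, 0xAC07, 0xAC08
};
#define NGRID (sizeof grid_hangul / sizeof grid_hangul[0])

/* Map the n-th (0-based) extended slot to the n-th syllable, in
 * codepoint order, that is missing from the grid. Returns 0 if n is
 * out of range. Linear in the table size, hence inefficient. */
unsigned ext_to_ucs(unsigned n)
{
	unsigned c;
	size_t i;
	if (n > HANGUL_LAST - HANGUL_FIRST) return 0;
	c = HANGUL_FIRST + n;
	/* Each grid syllable at or below c pushes the result up by one. */
	for (i = 0; i < NGRID && grid_hangul[i] <= c; i++) c++;
	return c <= HANGUL_LAST ? c : 0;
}

/* Inverse mapping: return the extended slot index for c, or -1 if c is
 * not a Hangul syllable or is one that the grid already encodes. */
long ucs_to_ext(unsigned c)
{
	size_t i;
	if (c < HANGUL_FIRST || c > HANGUL_LAST) return -1;
	/* Count the grid syllables that sort before c. */
	for (i = 0; i < NGRID && grid_hangul[i] < c; i++);
	if (i < NGRID && grid_hangul[i] == c) return -1;
	return (long)(c - HANGUL_FIRST) - (long)i;
}
```

With the stand-in table above, slot 0 maps to U+AC02 and U+AC04 is
rejected by ucs_to_ext because the grid already encodes it. A binary
search over the table would speed up the inverse direction, but the
linear scan keeps the sketch short.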

Rich