From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: CP850 & IBM850 codepages
Date: Tue, 25 Feb 2014 17:39:39 -0500
Message-ID: <20140225223939.GI184@brightrain.aerifal.cx>
In-Reply-To: <530D19D2.4020800@fairlite.co.uk>
To: musl@lists.openwall.com

On Tue, Feb 25, 2014 at 10:31:46PM +0000, Alan Hourihane wrote:
> >Adding cp850 and other DOS codepages should not be hard and should not
> >take up much additional size in iconv, but it's also nontrivial to do
> >without my tools to generate the tables, which are not published.
> >Publishing them is something I should really get around to doing,
> >since their absence affects the ability of others to modify the code
> >in meaningful ways; I need to apologize for not doing so already.
> >
>
> O.k. that makes sense as I couldn't understand the format. :-)

The format is basically this: legacy_chars is a table of all
codepoints that ever appear in a supported legacy codepage, with a
limit of 1024 total codepoints. The individual codepage tables are 10
bits per entry and map into this table, and they omit the initial
subrange that's identical to latin1 (and thus a one-to-one mapping to
unicode). I have tools that automatically generate these from the
unicode txt files containing the mappings.

Rich
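
The layout described above can be illustrated with a small sketch. This
is not musl's actual code or table data: the names LEGACY_CHARS,
pack_indices, unpack_index and decode_byte are hypothetical, the bit
order of the packed entries is an assumption, and the shared table here
holds only a handful of code points for demonstration. It only shows
the general idea of a shared code point table, 10-bit indices into it,
and a skipped Latin-1-identical prefix.

```python
# Hypothetical shared table: every code point used by any supported
# legacy codepage. A real table would be capped at 1024 entries so that
# an index always fits in 10 bits. These six values are toy data.
LEGACY_CHARS = [0x00C7, 0x00FC, 0x00E9, 0x00E2, 0x2591, 0x2592]

BITS = 10  # width of one packed index


def pack_indices(indices):
    """Pack 10-bit indices into bytes, low bits first (assumed order)."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently in acc
    out = bytearray()
    for idx in indices:
        if not 0 <= idx < (1 << BITS):
            raise ValueError("index does not fit in 10 bits")
        acc |= idx << nbits
        nbits += BITS
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # flush the final partial byte
    return bytes(out)


def unpack_index(packed, n):
    """Return the n-th 10-bit index from a packed table."""
    byte, shift = divmod(n * BITS, 8)
    # Three bytes always cover a 10-bit field at any bit offset;
    # slicing past the end of the table is harmless in Python.
    window = int.from_bytes(packed[byte:byte + 3], "little")
    return (window >> shift) & ((1 << BITS) - 1)


def decode_byte(b, first, packed):
    """Map one byte of a legacy codepage to a Unicode code point.

    Bytes below `first` are the omitted prefix that is identical to
    Latin-1, so the byte value is already the code point.
    """
    if b < first:
        return b
    return LEGACY_CHARS[unpack_index(packed, b - first)]


# Toy codepage: bytes 0x80..0x83 map to the first four table entries.
TOY_FIRST = 0x80
TOY_TABLE = pack_indices([0, 1, 2, 3])
```

With this toy table, decode_byte(0x41, TOY_FIRST, TOY_TABLE) returns
0x41 unchanged, while decode_byte(0x80, TOY_FIRST, TOY_TABLE) looks up
index 0 in LEGACY_CHARS. Four entries occupy five bytes instead of the
eight that 16-bit entries would need, which is where the size saving
comes from.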