From: Alan Hourihane
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: CP850 & IBM850 codepages
Date: Wed, 26 Feb 2014 11:58:49 +0000
Message-ID: <530DD6F9.3040301@fairlite.co.uk>
In-Reply-To: <20140225223939.GI184@brightrain.aerifal.cx>
To: musl@lists.openwall.com

On 02/25/14 22:39, Rich Felker wrote:
> On Tue, Feb 25, 2014 at 10:31:46PM +0000, Alan Hourihane wrote:
>>> Adding cp850 and other DOS codepages should not be hard and should not
>>> take up much additional size in iconv, but it's also nontrivial to do
>>> without my tools to generate the tables, which are not published.
>>> Publishing them is something I should really get around to doing,
>>> since their absence affects the ability of others to modify the code
>>> in meaningful ways; I need to apologize for not doing so already.
>>>
>> O.k. that makes sense as I couldn't understand the format. :-)
> The format is basically this: legacy_chars is a table of all
> codepoints that ever appear in a supported legacy codepage, with a
> limit of 1024 total codepoints. The individual codepage tables are 10
> bits per entry and map into this table, and they omit the initial
> subrange that's identical to latin1 (and thus a one-to-one mapping to
> unicode). I have tools that automatically generate these from the
> unicode txt files containing the mappings.
>

Thanks Rich. I'll keep an eye out for the cp850/ibm850 table to land
when you've had a chance with your tools.

Alan.
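
For readers following the thread, the table layout Rich describes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not musl's actual code or Rich's unpublished generator: the function names (pack10, unpack10, build_tables, decode_byte), the LSB-first bit order, and the use of Python's built-in cp850/cp437 codecs as the mapping source (instead of the unicode txt files) are all hypothetical choices made for this example.

```python
# Hypothetical sketch of the layout described in the quoted text.
# Names, bit order and data source are assumptions, not musl's code.

def pack10(values):
    """Pack integers in range(1024) into bytes, 10 bits each, LSB first."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for v in values:
        if not 0 <= v < 1024:
            raise ValueError("index does not fit in 10 bits")
        acc |= v << nbits
        nbits += 10
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)
    return bytes(out)


def unpack10(data, i):
    """Return the i-th 10-bit entry from bytes produced by pack10."""
    bit = 10 * i
    byte, shift = divmod(bit, 8)
    # shift is always 0, 2, 4 or 6, so two bytes cover the whole entry.
    word = data[byte] | (data[byte + 1] << 8)
    return (word >> shift) & 0x3FF


def unicode_map(codec):
    """256-entry byte -> code point list, taken from a Python codec."""
    return [ord(bytes([b]).decode(codec)) for b in range(256)]


def build_tables(codepages):
    """codepages: dict of name -> list of 256 Unicode code points."""
    starts = {}
    for name, mapping in codepages.items():
        # Skip the initial subrange where byte value == code point.
        start = 0
        while start < 256 and mapping[start] == start:
            start += 1
        starts[name] = start

    # One shared table of every code point used by any stored entry.
    legacy_chars = sorted(
        {cp for name, m in codepages.items() for cp in m[starts[name]:]}
    )
    if len(legacy_chars) > 1024:
        raise ValueError("more than 1024 distinct code points")
    index = {cp: i for i, cp in enumerate(legacy_chars)}

    tables = {
        name: (starts[name], pack10(index[cp] for cp in m[starts[name]:]))
        for name, m in codepages.items()
    }
    return legacy_chars, tables


def decode_byte(legacy_chars, table, b):
    """Map one input byte to a Unicode code point."""
    start, packed = table
    if b < start:
        return b  # identity range, not stored in the table
    return legacy_chars[unpack10(packed, b - start)]


legacy_chars, tables = build_tables(
    {"cp850": unicode_map("cp850"), "cp437": unicode_map("cp437")}
)
```

With this layout each codepage costs 10 bits per stored entry rather than 16 or more, because entries index the shared legacy_chars table instead of holding a full code point, and the leading identity range costs nothing at all.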