From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: cp437 issue with bad mapping at least for one char
Date: Tue, 21 Nov 2017 22:25:24 -0500
Message-ID: <20171122032524.GO1627@brightrain.aerifal.cx>
References: <85e-5a14e600-3-6a898d80@69999707>
Reply-To: musl@lists.openwall.com
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.5.21 (2010-09-15)
To: musl@lists.openwall.com
In-Reply-To: <85e-5a14e600-3-6a898d80@69999707>

On Wed, Nov 22, 2017 at 03:50:48AM +0100, Jacob Thrane Lund wrote:
>
> Hi musl devs,
>
> I experienced a test failing when building the latest version of gammu for Alpine Linux.
>
> After reporting the issue to the gammu developer the reached conclusion was the issue is with musl -
> https://github.com/gammu/gammu/issues/303#issuecomment-345258460
>
> I have checked the log for
> https://git.musl-libc.org/cgit/musl/commit/src/locale/codepages.h
> and Rich Felker pushed a commit 8 days ago. As of yet I have not had
> the chance to verify if this also resolves this issue. Dealing with
> charsets at this level is for me totally new territory..
>
> I was hoping you could confirm/deny if Rich’s commit indeed also resolves my issue?

It does. Here is how CP437 decodes, before:

Çüéâäàåç êëèïîìÄÅ
ÉæÆôöòûù ÿÖÜ¢£¥₧ƒ
 ¡¢£¤¥¦§ ¨©ª«¬­®¯
░▒▓│┤╡╢╖ ╕╣║╗╝╜╛┐
└┴┬├─┼╞╟ ╚╔╩╦╠═╬╧
╨╤╥╙╘╒╓╫ ╪┘┌█▄▌▐▀
αáΓπΣσæτ ΦΘΩδìφεï
ðñ≥≤⌠⌡÷≈ °∙·√ü²■ 

and after:

Çüéâäàåç êëèïîìÄÅ
ÉæÆôöòûù ÿÖÜ¢£¥₧ƒ
áíóúñÑªº ¿⌐¬½¼¡«»
░▒▓│┤╡╢╖ ╕╣║╗╝╜╛┐
└┴┬├─┼╞╟ ╚╔╩╦╠═╬╧
╨╤╥╙╘╒╓╫ ╪┘┌█▄▌▐▀
αßΓπΣσµτ ΦΘΩδ∞φε∩
≡±≥≤⌠⌡÷≈ °∙·√ⁿ²■ 

The problem (silently fixed) was that the table generation code for
legacychars.h ignored entries in the Unicode charmap files that used
lowercase a-f in the hex, _and_ omitted characters that appeared in
the same slot as their Unicode codepoint (in all the ISO-8859
encodings containing í, it appears in "its own" slot), since these
previously got a special encoding. If not for the latter, this
character would have been included in the legacychars.h map already
due to being in Latin-1, where the charmap file used uppercase.
Somehow when the character was missing in legacychars.h, the mapping
tables ended up containing nonsense.

Rich
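
Illustration only: the actual table generation code is not shown in this message, so the following is a minimal Python sketch of the first failure mode described above, not the real generator. It assumes charmap lines of the form "byte, codepoint, comment" and a hypothetical parser whose pattern accepts only uppercase hex digits. The sample lines, the names STRICT, LENIENT and parse, and the regular expressions are all assumptions made up for this sketch. The second failure mode (omitting characters that sit in their own slot) is not modeled.

```python
import re

# Hypothetical sample lines in the style of a Unicode charmap file:
# byte value, Unicode codepoint, comment. Made up for this sketch.
SAMPLE = [
    "0xa1\t0x00ed\t#LATIN SMALL LETTER I WITH ACUTE",  # lowercase hex
    "0xA2\t0x00F3\t#LATIN SMALL LETTER O WITH ACUTE",  # uppercase hex
]

# Assumed buggy pattern: only uppercase A-F is accepted.
STRICT = re.compile(r"0x([0-9A-F]+)\s+0x([0-9A-F]+)")
# Assumed fixed pattern: both cases are accepted.
LENIENT = re.compile(r"0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)")


def parse(lines, pattern):
    """Build a byte -> character table, skipping lines that do not match."""
    table = {}
    for line in lines:
        m = pattern.match(line)
        if m is None:
            continue  # the entry is dropped silently, with no error
        table[int(m.group(1), 16)] = chr(int(m.group(2), 16))
    return table


strict = parse(SAMPLE, STRICT)
lenient = parse(SAMPLE, LENIENT)

print(sorted(hex(b) for b in strict))    # the lowercase entry is missing
print(sorted(hex(b) for b in lenient))   # both entries are present

# Cross-check against Python's own cp437 codec.
print(bytes([0xA1]).decode("cp437") == lenient[0xA1])
```

With the strict pattern the 0xa1 entry never reaches the table, which is the kind of silent omission described above. What the real generator then emitted for the missing slot is not something this sketch tries to reproduce.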