Date: Tue, 3 Mar 2020 15:45:00 -0500
From: Rich Felker
To: musl@lists.openwall.com
Message-ID: <20200303204500.GO11469@brightrain.aerifal.cx>
Subject: Re: [musl] [PATCH] Decode 0x80 Euro for GBK

On Tue, Mar 03, 2020 at 04:09:54PM +0800, Mingye Wang wrote:
> Hi,
>
> Sorry for the inconvenience, but please check the attachment.
> --
> Mingye Wang (Artoria2e5)

> From 0451fe959a55cf19d17ca131d68825922e1357a4 Mon Sep 17 00:00:00 2001
> From: Mingye Wang
> Date: Tue, 3 Mar 2020 15:56:15 +0800
> Subject: [PATCH] Decode 0x80 Euro for GBK
>
> Microsoft's cp936 has a Euro sign in its complete form, and it is the
> official IANA "GBK". Add it.
>
> Ref: https://encoding.spec.whatwg.org/#gbk-flag
> Ref: https://www.iana.org/assignments/charset-reg/GBK
> ---
>  src/locale/iconv.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/src/locale/iconv.c b/src/locale/iconv.c
> index 3047c27b..d01342a2 100644
> --- a/src/locale/iconv.c
> +++ b/src/locale/iconv.c
> @@ -403,6 +403,11 @@ size_t iconv(iconv_t cd, char **restrict in, size_t *restrict inb, char **restri
>  			if (c < 128) break;
>  			if (c < 0xa1) goto ilseq;
>  		case GBK:
> +			// CP936 Euro. WHATWG tolerates it in GB18030, should we too?
> +			if (c == 128) {
> +				c = 0x20AC;
> +				break;
> +			}
>  		case GB18030:
>  			if (c < 128) break;
>  			c -= 0x81;

Does this mean GBK encodes the euro sign twice? Or is the normal
encoding of it only present in GB18030, not legacy GBK?

Rich
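
For reference, the single-byte step that the quoted patch adds can be sketched on its own. This is a minimal illustration, not musl's iconv code: the helper name gbk_first_byte and the two return codes are hypothetical. It assumes only what the patch and the usual GBK layout imply, namely that bytes below 0x80 are ASCII, 0x80 is the CP936 euro sign, and 0x81..0xFE are lead bytes of two-byte sequences. The two-byte table lookup is left out.

```c
#include <stdint.h>

/* Hypothetical return codes, for illustration only. */
#define GBK_NEED_TRAIL (-2)  /* lead byte of a two-byte sequence */
#define GBK_ILSEQ      (-1)  /* byte cannot start a sequence */

/* Classify the first byte of a GBK sequence.
 * Returns a Unicode code point for single-byte characters,
 * or one of the codes above. Not musl's actual code. */
static int32_t gbk_first_byte(unsigned char c)
{
	if (c < 0x80) return c;               /* ASCII passes through */
	if (c == 0x80) return 0x20AC;         /* CP936 euro sign, as in the patch */
	if (c <= 0xFE) return GBK_NEED_TRAIL; /* 0x81..0xFE: table lookup not shown */
	return GBK_ILSEQ;                     /* 0xFF never starts a sequence */
}
```

In this sketch, dropping the 0x80 branch would need an extra range check so that 0x80 is rejected rather than treated as a lead byte, since it lies outside 0x81..0xFE.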