mailing list of musl libc
* [musl] [PATCH] Decode 0x80 Euro for GBK
@ 2020-03-03  8:09 Mingye Wang
  2020-03-03 20:45 ` Rich Felker
  0 siblings, 1 reply; 2+ messages in thread
From: Mingye Wang @ 2020-03-03  8:09 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 95 bytes --]

Hi,

Sorry for the inconvenience, but please check the attachment.
--
Mingye Wang (Artoria2e5)

[-- Attachment #2: euro.patch --]
[-- Type: application/octet-stream, Size: 945 bytes --]

From 0451fe959a55cf19d17ca131d68825922e1357a4 Mon Sep 17 00:00:00 2001
From: Mingye Wang <arthur200126@gmail.com>
Date: Tue, 3 Mar 2020 15:56:15 +0800
Subject: [PATCH] Decode 0x80 Euro for GBK

Microsoft's cp936 has a Euro sign in its complete form, and it is the
official IANA "GBK". Add it.

Ref: https://encoding.spec.whatwg.org/#gbk-flag
Ref: https://www.iana.org/assignments/charset-reg/GBK
---
 src/locale/iconv.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/locale/iconv.c b/src/locale/iconv.c
index 3047c27b..d01342a2 100644
--- a/src/locale/iconv.c
+++ b/src/locale/iconv.c
@@ -403,6 +403,11 @@ size_t iconv(iconv_t cd, char **restrict in, size_t *restrict inb, char **restri
 			if (c < 128) break;
 			if (c < 0xa1) goto ilseq;
 		case GBK:
+			// CP936 Euro. WHATWG tolerates it in GB18030, should we too?
+			if (c == 128) {
+				c = 0x20AC;
+				break;
+			}
 		case GB18030:
 			if (c < 128) break;
 			c -= 0x81;
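
The effect of the hunk above can be sketched in isolation. This is a minimal sketch, not musl's code: the enum, the function name `sketch_decode_lead` and the return codes are hypothetical, and it assumes that lead bytes outside 0x81..0xFE are rejected right after the `c -= 0x81` step. It models only the first byte. Because the GB2312 case rejects everything in 0x80..0xA0 before it falls through, only the GBK case ever reaches the new Euro check.

```c
/* Hypothetical stand-ins for the charset tags used in iconv.c. */
enum sketch_charset { SK_GB2312, SK_GBK, SK_GB18030 };

/*
 * Classify the first input byte c.
 * Returns the decoded code point for a complete one-byte sequence,
 * -1 for an illegal sequence, or -2 when c is a lead byte and more
 * input is required (multibyte decoding is not modelled here).
 */
long sketch_decode_lead(enum sketch_charset cs, unsigned c)
{
	switch (cs) {
	case SK_GB2312:
		if (c < 128) return c;
		if (c < 0xa1) return -1;   /* 0x80 is rejected here */
		/* fall through */
	case SK_GBK:
		if (c == 128) return 0x20AC; /* the CP936 Euro added by the patch */
		/* fall through */
	case SK_GB18030:
		if (c < 128) return c;
		c -= 0x81;
		if (c >= 126) return -1;   /* assumed range check; 0x80 wraps around */
		return -2;
	}
	return -1;
}
```

Under these assumptions, 0x80 decodes to U+20AC only when the source charset is GBK; for GB2312 and GB18030 it stays an illegal sequence.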


* Re: [musl] [PATCH] Decode 0x80 Euro for GBK
  2020-03-03  8:09 [musl] [PATCH] Decode 0x80 Euro for GBK Mingye Wang
@ 2020-03-03 20:45 ` Rich Felker
  0 siblings, 0 replies; 2+ messages in thread
From: Rich Felker @ 2020-03-03 20:45 UTC (permalink / raw)
  To: musl

On Tue, Mar 03, 2020 at 04:09:54PM +0800, Mingye Wang wrote:
> Hi,
> 
> Sorry for the inconvenience, but please check the attachment.
> --
> Mingye Wang (Artoria2e5)

> From 0451fe959a55cf19d17ca131d68825922e1357a4 Mon Sep 17 00:00:00 2001
> From: Mingye Wang <arthur200126@gmail.com>
> Date: Tue, 3 Mar 2020 15:56:15 +0800
> Subject: [PATCH] Decode 0x80 Euro for GBK
> 
> Microsoft's cp936 has a Euro sign in its complete form, and it is the
> official IANA "GBK". Add it.
> 
> Ref: https://encoding.spec.whatwg.org/#gbk-flag
> Ref: https://www.iana.org/assignments/charset-reg/GBK
> ---
>  src/locale/iconv.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/src/locale/iconv.c b/src/locale/iconv.c
> index 3047c27b..d01342a2 100644
> --- a/src/locale/iconv.c
> +++ b/src/locale/iconv.c
> @@ -403,6 +403,11 @@ size_t iconv(iconv_t cd, char **restrict in, size_t *restrict inb, char **restri
>  			if (c < 128) break;
>  			if (c < 0xa1) goto ilseq;
>  		case GBK:
> +			// CP936 Euro. WHATWG tolerates it in GB18030, should we too?
> +			if (c == 128) {
> +				c = 0x20AC;
> +				break;
> +			}
>  		case GB18030:
>  			if (c < 128) break;
>  			c -= 0x81;

Does this mean GBK encodes the euro sign twice? Or is the normal
encoding of it only present in GB18030, not legacy GBK?

Rich
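
As background to the question above, and not an answer taken from the thread: the two byte forms of the Euro sign can be put side by side. In Microsoft's CP936, and in the WHATWG gbk encoder, U+20AC is written as the single byte 0x80, whereas GB18030 assigns it the two-byte sequence 0xA2 0xE3. The sketch below only illustrates that difference; the enum and the function name `sketch_encode_euro` are hypothetical and do not appear in musl.

```c
#include <stddef.h>

/* Hypothetical selector for the two conventions being compared. */
enum euro_target { EURO_CP936, EURO_GB18030 };

/*
 * Write the encoding of U+20AC for the chosen convention into out
 * and return the number of bytes written.
 */
size_t sketch_encode_euro(enum euro_target t, unsigned char out[2])
{
	if (t == EURO_CP936) {
		out[0] = 0x80;          /* single-byte CP936 extension */
		return 1;
	}
	out[0] = 0xA2;                  /* two-byte GB18030 assignment */
	out[1] = 0xE3;
	return 2;
}
```

Whether a decoder labelled GBK should accept both forms is the open point raised in the reply.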

