mailing list of musl libc
* [PATCH] iconv: fix to=utf32 to behave like utf32be (not... ascii?)
@ 2018-05-03 19:45 Will Dietz
  2018-05-07 18:06 ` Will Dietz
  0 siblings, 1 reply; 4+ messages in thread
From: Will Dietz @ 2018-05-03 19:45 UTC
  To: musl

[-- Attachment #1: Type: text/plain, Size: 169 bytes --]

Attached, I think it's just a case of a missing case statement.

This is needed or the result can't be read back as utf32 which seems
like an important property.

~Will

[-- Attachment #2: utf32.patch --]
[-- Type: text/x-patch, Size: 878 bytes --]

From f49ee6afa69d0736ddad1ace0adfb4597075a6ac Mon Sep 17 00:00:00 2001
From: Will Dietz <w@wdtz.org>
Date: Thu, 3 May 2018 13:44:53 -0500
Subject: [PATCH] iconv: fix conversion to utf32, treat like utf32be

I'm not sure how best to describe current behavior,
we treat to=utf32 somewhat like to=ascii
and the result is not valid UTF32.

This change treats to=utf32 like to=utf32be,
similar to what's done with utf16.
---
 src/locale/iconv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/locale/iconv.c b/src/locale/iconv.c
index 3c1f4dd2..3a34395c 100644
--- a/src/locale/iconv.c
+++ b/src/locale/iconv.c
@@ -646,6 +646,8 @@ size_t iconv(iconv_t cd, char **restrict in, size_t *restrict inb, char **restri
 			*out += 4;
 			*outb -= 4;
 			break;
+		case UTF_32:
+			totype = UTF_32BE;
 		case UTF_32BE:
 		case UTF_32LE:
 			if (*outb < 4) goto toobig;
-- 
2.17.0
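
The round-trip property mentioned above (convert to "UTF-32", then read
the result back as "UTF-32") can be checked with a small test along these
lines. This is only a sketch and not part of the patch: the helper names
convert() and utf32_roundtrips() are made up for illustration, the buffers
are fixed-size, and only the standard iconv_open/iconv/iconv_close calls
are assumed.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert inlen bytes of `in` from encoding `from`
 * to encoding `to`. Returns the number of bytes written to `out`, or
 * (size_t)-1 on any error or if input is left unconverted. */
size_t convert(const char *to, const char *from,
               const char *in, size_t inlen,
               char *out, size_t outcap)
{
	iconv_t cd = iconv_open(to, from);
	if (cd == (iconv_t)-1) return (size_t)-1;

	char *inp = (char *)in;   /* iconv wants char **; input bytes are not modified */
	char *outp = out;
	size_t inleft = inlen, outleft = outcap;

	size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
	iconv_close(cd);
	if (r == (size_t)-1 || inleft != 0) return (size_t)-1;
	return outcap - outleft;
}

/* Hypothetical helper: UTF-8 -> UTF-32 -> UTF-8.
 * Returns 1 if the text comes back byte-for-byte identical, else 0. */
int utf32_roundtrips(const char *utf8)
{
	char wide[256], back[256];
	size_t len = strlen(utf8);
	assert(len < 60);         /* keeps 4 bytes per char plus a BOM within wide[] */

	size_t n = convert("UTF-32", "UTF-8", utf8, len, wide, sizeof wide);
	if (n == (size_t)-1) return 0;

	size_t m = convert("UTF-8", "UTF-32", wide, n, back, sizeof back);
	if (m == (size_t)-1) return 0;

	return m == len && memcmp(back, utf8, len) == 0;
}
```

Calling utf32_roundtrips("h\xc3\xa9llo") from a main() and checking for a
nonzero result is enough to exercise it. The check deliberately does not
look at byte order or at whether a BOM is written, since implementations
may differ there; it only tests that whatever to=UTF-32 produces can be
consumed again as from=UTF-32.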



* Re: [PATCH] iconv: fix to=utf32 to behave like utf32be (not... ascii?)
  2018-05-03 19:45 [PATCH] iconv: fix to=utf32 to behave like utf32be (not... ascii?) Will Dietz
@ 2018-05-07 18:06 ` Will Dietz
  2018-05-07 19:25   ` Rich Felker
  0 siblings, 1 reply; 4+ messages in thread
From: Will Dietz @ 2018-05-07 18:06 UTC
  To: musl

Hmm this is more complicated than I originally thought.
I'm not sure I understand the current behavior,
but am less convinced this is a clear improvement.

Thoughts/comments appreciated :).

~Will

PS: Did we discuss this years ago? I thought so, but can't find it anywhere...


On Thu, May 3, 2018 at 2:45 PM, Will Dietz <w@wdtz.org> wrote:
> Attached, I think it's just a case of a missing case statement.
>
> This is needed or the result can't be read back as utf32 which seems
> like an important property.
>
> ~Will



* Re: Re: [PATCH] iconv: fix to=utf32 to behave like utf32be (not... ascii?)
  2018-05-07 18:06 ` Will Dietz
@ 2018-05-07 19:25   ` Rich Felker
  2018-05-07 20:52     ` Will Dietz
  0 siblings, 1 reply; 4+ messages in thread
From: Rich Felker @ 2018-05-07 19:25 UTC
  To: musl

On Mon, May 07, 2018 at 01:06:57PM -0500, Will Dietz wrote:
> Hmm this is more complicated than I originally thought.
> I'm not sure I understand the current behavior,
> but am less convinced this is a clear improvement.

Can you explain what you're confused about? It seems ok.

> Thoughts/comments appreciated :).
> 
> ~Will
> 
> PS: Did we discuss this years ago? I thought so, but can't find it anywhere...

I don't think so. UTF-32 did not exist as a different case from
UTF-32BE until this year.

Rich



* Re: Re: [PATCH] iconv: fix to=utf32 to behave like utf32be (not... ascii?)
  2018-05-07 19:25   ` Rich Felker
@ 2018-05-07 20:52     ` Will Dietz
  0 siblings, 0 replies; 4+ messages in thread
From: Will Dietz @ 2018-05-07 20:52 UTC
  To: musl

On Mon, May 7, 2018 at 2:25 PM, Rich Felker <dalias@libc.org> wrote:
> On Mon, May 07, 2018 at 01:06:57PM -0500, Will Dietz wrote:
>> Hmm this is more complicated than I originally thought.
>> I'm not sure I understand the current behavior,
>> but am less convinced this is a clear improvement.
>
> Can you explain what you're confused about? It seems ok.
>

Nothing specific, and depending on the perspective this change is
relatively straightforward.
If it seems that way to you and doesn't raise any alarm bells then
it's probably perfectly fine :).

Mostly I couldn't shake the sense I'd gone down this path before and
someone explained there was a reason to do things this way; this
feeling was an itch I couldn't scratch and so I wanted to
conservatively pass along my doubts until I could convince myself
they were unfounded : ).

If it doesn't ring any bells with you then I probably am remembering
incorrectly or from a different project, or a combination of both of
these :).
The fragment I couldn't shake was that this would break or
significantly bloat some uses that compulsively converted everything
to utf32 and expected some particular behavior with stdio.
I want to say it was somehow win32 related but that doesn't make any
sense for musl anyway O:).

Combined with a bit of BOM iconv SNAFU when testing
UTF-32/UTF-32BE/UTF-32LE/etc.,
I didn't want to misrepresent my confidence in this change :).

Especially compared to the other patch, which IMO is both more urgent
and "obviously" an improvement.

I don't know of a specific reason this change is wrong, however, and
in fact AFAICT it is only more correct.  Sorry for unspecified doubts,
it's more that I couldn't vouch for it 100% O:).

~Will

>> Thoughts/comments appreciated :).
>>
>> ~Will
>>
>> PS: Did we discuss this years ago? I thought so, but can't find it anywhere...
>
> I don't think so. UTF-32 did not exist as a different case from
> UTF-32BE until this year.

Hmm, indeed! Well I don't know what I'm thinking of, then.  Thanks for
taking a look and pointing this out.

>
> Rich


Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/