* [PATCH] iconv: fix to=utf32 to behave like utf32be (not... ascii?)
@ 2018-05-03 19:45 Will Dietz
2018-05-07 18:06 ` Will Dietz
0 siblings, 1 reply; 4+ messages in thread
From: Will Dietz @ 2018-05-03 19:45 UTC
To: musl
[-- Attachment #1: Type: text/plain, Size: 169 bytes --]
Attached, I think it's just a case of a missing case statement.
This is needed or the result can't be read back as utf32 which seems
like an important property.
~Will
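For readers who want to check the "read back" property on their own libc, here is a minimal sketch of a round-trip test (UTF-8 -> UTF-32 -> UTF-8). The helper names convert() and utf32_roundtrips() are made up for illustration and are not part of musl or the patch. It uses only the standard iconv_open/iconv/iconv_close calls, assumes short inputs that fit the fixed 256-byte buffers, and does not assume any particular byte order or BOM behavior for plain "UTF-32".

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert inlen bytes of `in` from encoding `from`
 * to encoding `to`. Returns the number of bytes written to `out`,
 * or (size_t)-1 on any failure (unknown charset, bad input, no space). */
static size_t convert(const char *to, const char *from,
                      const char *in, size_t inlen,
                      char *out, size_t outcap)
{
	iconv_t cd = iconv_open(to, from);
	if (cd == (iconv_t)-1) return (size_t)-1;

	char *ip = (char *)in;  /* iconv() takes a non-const input pointer */
	char *op = out;
	size_t il = inlen, ol = outcap;
	size_t r = iconv(cd, &ip, &il, &op, &ol);
	iconv_close(cd);

	/* Treat leftover input as a failure too. */
	if (r == (size_t)-1 || il != 0) return (size_t)-1;
	return outcap - ol;
}

/* Hypothetical check: returns 1 if the UTF-8 string s survives
 * UTF-8 -> UTF-32 -> UTF-8 unchanged, 0 otherwise. */
static int utf32_roundtrips(const char *s)
{
	char wide[256], back[256];
	size_t n = strlen(s);

	size_t w = convert("UTF-32", "UTF-8", s, n, wide, sizeof wide);
	if (w == (size_t)-1) return 0;

	size_t b = convert("UTF-8", "UTF-32", wide, w, back, sizeof back);
	if (b == (size_t)-1) return 0;

	return b == n && memcmp(s, back, n) == 0;
}
```

Calling utf32_roundtrips("A") from a small main() should return 1 on a libc whose to=UTF-32 output is valid UTF-32. On a build with the behavior described in this message, I would expect it to return 0, though I have not verified that.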
[-- Attachment #2: utf32.patch --]
[-- Type: text/x-patch, Size: 878 bytes --]
From f49ee6afa69d0736ddad1ace0adfb4597075a6ac Mon Sep 17 00:00:00 2001
From: Will Dietz <w@wdtz.org>
Date: Thu, 3 May 2018 13:44:53 -0500
Subject: [PATCH] iconv: fix conversion to utf32, treat like utf32be
I'm not sure how best to describe current behavior,
we treat to=utf32 somewhat like to=ascii
and the result is not valid UTF32.
This change treats to=utf32 like to=utf32be,
similar to what's done with utf16.
---
src/locale/iconv.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/locale/iconv.c b/src/locale/iconv.c
index 3c1f4dd2..3a34395c 100644
--- a/src/locale/iconv.c
+++ b/src/locale/iconv.c
@@ -646,6 +646,8 @@ size_t iconv(iconv_t cd, char **restrict in, size_t *restrict inb, char **restri
 			*out += 4;
 			*outb -= 4;
 			break;
+		case UTF_32:
+			totype = UTF_32BE;
 		case UTF_32BE:
 		case UTF_32LE:
 			if (*outb < 4) goto toobig;
--
2.17.0
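To illustrate what the two added lines do, here is a standalone sketch of the same fall-through pattern: an unqualified UTF-32 target is rewritten to big-endian and then shares the existing 4-byte output path. This is not musl's code. The enum values, put_32_sketch() and encode_utf32() are hypothetical stand-ins, and the error handling is simplified to a return value instead of the `goto toobig` used in iconv.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the internal charset tags. */
enum { UTF_32BE, UTF_32LE, UTF_32 };

/* Store code point c as four bytes in the byte order given by type. */
static void put_32_sketch(unsigned char *s, uint32_t c, int type)
{
	if (type == UTF_32LE) {
		s[0] = c & 0xff;
		s[1] = (c >> 8) & 0xff;
		s[2] = (c >> 16) & 0xff;
		s[3] = (c >> 24) & 0xff;
	} else {
		s[0] = (c >> 24) & 0xff;
		s[1] = (c >> 16) & 0xff;
		s[2] = (c >> 8) & 0xff;
		s[3] = c & 0xff;
	}
}

/* Write one code point to out. Returns the number of bytes written (4),
 * or 0 if the buffer is too small or totype is not a UTF-32 variant. */
static size_t encode_utf32(unsigned char *out, size_t outb,
                           uint32_t c, int totype)
{
	switch (totype) {
	case UTF_32:
		/* No explicit byte order requested: treat as big-endian. */
		totype = UTF_32BE;
		/* fall through */
	case UTF_32BE:
	case UTF_32LE:
		if (outb < 4) return 0;
		put_32_sketch(out, c, totype);
		return 4;
	}
	return 0;
}
```

In this sketch, encode_utf32(buf, 4, 0x1F600, UTF_32) and encode_utf32(buf, 4, 0x1F600, UTF_32BE) both store the bytes 00 01 F6 00, while UTF_32LE stores 00 F6 01 00.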
* Re: [PATCH] iconv: fix to=utf32 to behave like utf32be (not... ascii?)
2018-05-03 19:45 [PATCH] iconv: fix to=utf32 to behave like utf32be (not... ascii?) Will Dietz
@ 2018-05-07 18:06 ` Will Dietz
2018-05-07 19:25 ` Rich Felker
0 siblings, 1 reply; 4+ messages in thread
From: Will Dietz @ 2018-05-07 18:06 UTC
To: musl
Hmm this is more complicated than I originally thought.
I'm not sure I understand the current behavior,
but am less convinced this is a clear improvement.
Thoughts/comments appreciated :).
~Will
PS: Did we discuss this years ago? I thought so, but can't find it anywhere...
On Thu, May 3, 2018 at 2:45 PM, Will Dietz <w@wdtz.org> wrote:
> Attached, I think it's just a case of a missing case statement.
>
> This is needed or the result can't be read back as utf32 which seems
> like an important property.
>
> ~Will
* Re: Re: [PATCH] iconv: fix to=utf32 to behave like utf32be (not... ascii?)
2018-05-07 18:06 ` Will Dietz
@ 2018-05-07 19:25 ` Rich Felker
2018-05-07 20:52 ` Will Dietz
0 siblings, 1 reply; 4+ messages in thread
From: Rich Felker @ 2018-05-07 19:25 UTC
To: musl
On Mon, May 07, 2018 at 01:06:57PM -0500, Will Dietz wrote:
> Hmm this is more complicated than I originally thought.
> I'm not sure I understand the current behavior,
> but am less convinced this is a clear improvement.
Can you explain what you're confused about? It seems ok.
> Thoughts/comments appreciated :).
>
> ~Will
>
> PS: Did we discuss this years ago? I thought so, but can't find it anywhere...
I don't think so. UTF-32 did not exist as a different case from
UTF-32BE until this year.
Rich
* Re: Re: [PATCH] iconv: fix to=utf32 to behave like utf32be (not... ascii?)
2018-05-07 19:25 ` Rich Felker
@ 2018-05-07 20:52 ` Will Dietz
0 siblings, 0 replies; 4+ messages in thread
From: Will Dietz @ 2018-05-07 20:52 UTC
To: musl
On Mon, May 7, 2018 at 2:25 PM, Rich Felker <dalias@libc.org> wrote:
> On Mon, May 07, 2018 at 01:06:57PM -0500, Will Dietz wrote:
>> Hmm this is more complicated than I originally thought.
>> I'm not sure I understand the current behavior,
>> but am less convinced this is a clear improvement.
>
> Can you explain what you're confused about? It seems ok.
>
Nothing specific, and depending on the perspective this change is
relatively straightforward.
If it seems that way to you and doesn't raise any alarm bells then
it's probably perfectly fine :).
Mostly I couldn't shake the sense I'd gone down this path before and
someone explained there
was a reason to do things this way; this feeling was an itch I
couldn't scratch and so I wanted
to conservatively pass along my doubts until I could convince myself
they were unfounded : ).
If it doesn't ring any bells with you then I probably am remembering
incorrectly or from a different project,
or a combination of both of these :).
The fragment I couldn't shake was that this would break or
significantly bloat some uses that compulsively
converted everything to utf32 and expected some particular behavior with stdio.
I want to say it was somehow win32 related but that doesn't make any
sense for musl anyway O:).
Combined with a bit of BOM iconv SNAFU when testing
UTF-32/UTF-32BE/UTF-32LE/etc.,
I didn't want to misrepresent my confidence in this change :).
Especially compared to the other patch, which IMO is both more urgent
and "obviously" an improvement.
I don't know of a specific reason this change is wrong, however, and
in fact AFAICT
it is only more correct. Sorry for unspecified doubts, it's more that
I couldn't vouch for it 100% O:).
~Will
>> Thoughts/comments appreciated :).
>>
>> ~Will
>>
>> PS: Did we discuss this years ago? I thought so, but can't find it anywhere...
>
> I don't think so. UTF-32 did not exist as a different case from
> UTF-32BE until this year.
Hmm, indeed! Well I don't know what I'm thinking of, then. Thanks for
taking a look and pointing this out.
>
> Rich
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/musl/