mailing list of musl libc
* iconv UTF-8 <--> CP1255 roundtrip possible bug?
@ 2018-05-16 17:22 Will Dietz
  2018-05-16 23:04 ` Rich Felker
  0 siblings, 1 reply; 5+ messages in thread
From: Will Dietz @ 2018-05-16 17:22 UTC (permalink / raw)
  To: musl

I admit to being a bit unsure, but the behavior shown below doesn't
seem obviously right. LMK if I'm missing something :).

The input file is attached so it can be inspected without relying on
it getting through byte-identical to what I have; indeed, I'm not sure
copy+paste into this mail is working correctly (the characters look
different in my terminal :)).  Anyway:

$ cat cp1255-snippet.xxd
00000000: efac b3d6 b8d7 9d0a                      ........
$ xxd -r cp1255-snippet.xxd
דָּם

Attempt to round-trip this from UTF-8 to CP1255 and back,
first with glibc's iconv (2.26):

$ xxd -r cp1255-snippet.xxd | iconv -f UTF-8 -t CP1255 \
    | iconv -f CP1255 -t UTF-8 | xxd
00000000: efac b3d6 b8d7 9d0a

Looks good, same as what was sent in.

Using musl-based iconv utility (1.1.19):
$ xxd -r cp1255-snippet.xxd | $ICONV -f UTF-8 -t CP1255 \
    | $ICONV -f CP1255 -t UTF-8 | xxd
00000000: 2ad6 b8d7 9d0a                           *.....

The result differs from the input: the leading three bytes (ef ac b3)
have been replaced by a single 2a ('*'), while the rest is unchanged:

*ָם

(again apologies if that doesn't survive mailing and such)
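
Since the glyphs may not survive mailing, the two hex dumps above can also be compared as code points. The following is a minimal Python sketch that only decodes the bytes already shown; the helper name `describe` is made up for illustration, and nothing here exercises musl's or glibc's iconv.

```python
import unicodedata

def describe(hex_bytes):
    """Decode a hex string as UTF-8 and return (code point, name) pairs."""
    text = bytes.fromhex(hex_bytes).decode("utf-8")
    # Control characters such as '\n' have no Unicode name, so pass a default.
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<control>")) for ch in text]

# Bytes copied from the xxd output in this message.
original = describe("efacb3d6b8d79d0a")   # input, and the glibc round-trip result
musl_result = describe("2ad6b8d79d0a")    # result of the musl round trip

for label, rows in (("original", original), ("musl", musl_result)):
    print(label)
    for code_point, name in rows:
        print(f"  {code_point}  {name}")
```

Only the first code point differs: U+FB33 (HEBREW LETTER DALET WITH DAGESH) in the input comes back as U+002A (ASTERISK), while the following point and final mem are preserved.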

This input was taken from GNU libiconv's test suite, in particular the
first line of tests/CP1255-snippet.UTF-8.  Since it's 2 characters,
and test data, I hope there's no problem re: licensing O:).

I've reproduced the same behavior using iconv() directly; I can share
that code if that would be preferable. It's the same code from earlier
iconv threads on the ML.

--------------

Hopefully this is useful!

On the subject, a question or two if it's not too much trouble:

* Is the above what's meant by "round-trip" as discussed in [1]?
* What sorts of "round-trip" conversions are expected to work? And
over what inputs should round-trip conversions work: for any "valid"
UTF-8 or so?
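
One way to pin down what "round-trip" could mean as a testable property is sketched below in Python. This is only an illustration under loud assumptions: `round_trips` is a hypothetical helper, and it uses Python's built-in cp1255 codec, which is independent of both musl and glibc, so its results say nothing about what either libc does or should do.

```python
import unicodedata

def round_trips(text, encoding="cp1255"):
    """Return True if text survives text -> encoding -> text unchanged.

    Assumption: "round-trip" means exact code point equality. A looser
    definition (equality up to canonical equivalence) is also plausible.
    """
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # The codec has no mapping for some character in text.
        return False

# The test input from this message, minus the trailing newline.
sample = bytes.fromhex("efacb3d6b8d79d").decode("utf-8")

# Canonically decomposed form: the precomposed U+FB33 is split into a
# base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", sample)

print("as sent:   ", round_trips(sample))
print("decomposed:", round_trips(decomposed))
```

With Python's codec the precomposed form has no CP1255 mapping, while the decomposed form converts in both directions. Whether a converter should decompose on the way out and recompose on the way back is exactly the kind of policy the questions above are asking about.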

Armed with some insights regarding these questions, I'm hoping to
scope out something that can be tested or (no promises!) perhaps
pushed through some formal verification goodness.  But also I'm just
curious :).

Thanks!

~Will

[1] http://www.openwall.com/lists/musl/2018/02/27/2


Thread overview: 5+ messages
2018-05-16 17:22 iconv UTF-8 <--> CP1255 roundtrip possible bug? Will Dietz
2018-05-16 23:04 ` Rich Felker
2018-05-17  1:48   ` Will Dietz
2018-06-03  2:26     ` Rich Felker
2018-06-14 19:37       ` Will Dietz
