mailing list of musl libc
* Iconv and old codepages
@ 2013-06-26 18:15 orc
  2013-06-26 18:34 ` Rich Felker
  2013-06-26 18:39 ` LM
  0 siblings, 2 replies; 7+ messages in thread
From: orc @ 2013-06-26 18:15 UTC
  To: musl list

Hi,

How many codepages does musl's built-in iconv support?
Currently I'm trying to convert from "utf8" to "cp1251", and iconv()
only gives me a run of "*" characters corresponding to the UTF-8 input.
Is this correct behavior, i.e. does iconv() currently not support non-UTF
legacy codepages? Even so, I still see many of them in
src/locale/codepages.h. The (dirty) test program is attached.

I also noticed the alternative libs thread and the corresponding wiki page.
Does anyone know of a lightweight iconv replacement to use as a temporary
measure (other than libiconv, for example)?

Regards.

[-- Attachment #2: ticonv.c --]
[-- Type: application/octet-stream, Size: 384 bytes --]

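#include <stdio.h>
#include <iconv.h>
#include <string.h>

int main(void)
{
	/* UTF-8 bytes of the Russian word "тест" ("test") */
	char c[] = "\xd1\x82\xd0\xb5\xd1\x81\xd1\x82", *tc = c;
	char to[512] = {0}, *out = to;
	size_t f, t;
	iconv_t cd;

	cd = iconv_open("cp1251", "utf8");
	if (cd == (iconv_t)(-1)) return 1;
	f = strlen(c);
	t = sizeof(to) - 1;	/* keep room for the terminating NUL */
	if (iconv(cd, &tc, &f, &out, &t) == (size_t)(-1)) {
		perror("iconv");
		iconv_close(cd);
		return 1;
	}
	printf("%s\n", to);	/* prints the raw cp1251 bytes */

	iconv_close(cd);

	return 0;
}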

* Re: Iconv and old codepages
  2013-06-26 18:15 Iconv and old codepages orc
@ 2013-06-26 18:34 ` Rich Felker
  2013-06-26 18:56   ` orc
  2013-06-26 18:39 ` LM
  1 sibling, 1 reply; 7+ messages in thread
From: Rich Felker @ 2013-06-26 18:34 UTC
  To: musl

On Thu, Jun 27, 2013 at 02:15:39AM +0800, orc wrote:
> Hi,
> 
> How many codepages does in-musl iconv supports?
> Currently I'm trying converting from "utf8" to "cp1251" and iconv()
> only gives me a number of "*"'s matching the utf8 input. Is this
> correct behavior and iconv() currently does not support non-UTF legacy
> codepages? Even so, I still see many of them in src/locale/codepages.h
> The (dirty) test program attached.
> 
> I also noticed alternative libs thread and corresponding wiki page.
> Does someone know lightweight iconv replacement as a temporary measure
> (other than libiconv for example)?

Should be fixed in git. In general, the state of musl's iconv is that
the following charsets are supported:

utf8
wchart
ucs2
ucs2be
ucs2le
utf16
utf16be
utf16le
ucs4
ucs4be
utf32
utf32be
ucs4le
utf32le
ascii
usascii
iso646
iso646us
eucjp
shiftjis
sjis
gb18030
gbk
gb2312
iso88591
latin1
iso88592
iso88593
iso88594
iso88595
iso88596
iso88597
iso88598
iso88599
iso885910
iso885911
tis620
iso885913
iso885914
iso885915
latin9
iso885916
cp1250
windows1250
cp1251
windows1251
cp1252
windows1252
cp1253
windows1253
cp1254
windows1254
cp1255
windows1255
cp1256
windows1256
cp1257
windows1257
cp1258
windows1258
koi8r
koi8u

Non-alphanumeric characters are ignored in matching charset names, so
all combinations of hyphens and underscores are also supported with
these.

One caveat which should not affect your usage is that the following
charsets are only supported as the "from" charset, not the "to"
charset:

eucjp
shiftjis
sjis
gb18030
gbk
gb2312

Until the latest commit, the legacy 8-bit codepages were also broken as
the "to" charset, but that breakage was unintentional.


Rich


* Re: Iconv and old codepages
  2013-06-26 18:15 Iconv and old codepages orc
  2013-06-26 18:34 ` Rich Felker
@ 2013-06-26 18:39 ` LM
  2013-06-26 18:47   ` Rich Felker
  2013-06-27  0:37   ` Isaac
  1 sibling, 2 replies; 7+ messages in thread
From: LM @ 2013-06-26 18:39 UTC
  To: musl

On Wed, Jun 26, 2013 at 2:15 PM, orc <orc@sibserver.ru> wrote:

> I also noticed alternative libs thread and corresponding wiki page.
> Does someone know lightweight iconv replacement as a temporary measure
> (other than libiconv for example)?
>

Thought I had the Apache Portable Runtime project listed on the
alternatives page.  Will update that.
APR has a version of iconv.

BSD systems have their own implementations of iconv.  Haven't found a
standalone version.  There may be some code in obase (listed on wiki).
There's mention of a BSD licensed libiconv (
https://wiki.freebsd.org/G%C3%A1borSoC2009 ) as part of Citrus (which was
also supposed to have a gettext alternative).  The web page on Citrus is at
http://citrus.bsdclub.org/ but I haven't found the source code for the
project.  ICU ( http://site.icu-project.org/) provides uconv instead of
iconv.  On Windows, they typically use GNU libiconv since iconv isn't part
of the C runtime library.

If anyone thinks these (or possibly some other alternatives) are useful, I
can add the links for them to the wiki.  Please let me know if any of
them don't appear too bloated and would be worth adding.

Thanks.
Laura
http://www.distasis.com/cpp

* Re: Iconv and old codepages
  2013-06-26 18:39 ` LM
@ 2013-06-26 18:47   ` Rich Felker
  2013-06-27  0:37   ` Isaac
  1 sibling, 0 replies; 7+ messages in thread
From: Rich Felker @ 2013-06-26 18:47 UTC
  To: musl

On Wed, Jun 26, 2013 at 02:39:59PM -0400, LM wrote:
> On Wed, Jun 26, 2013 at 2:15 PM, orc <orc@sibserver.ru> wrote:
> 
> > I also noticed alternative libs thread and corresponding wiki page.
> > Does someone know lightweight iconv replacement as a temporary measure
> > (other than libiconv for example)?
> >
> 
> Thought I had the Apache Portable Runtime project listed on the
> alternatives page.  Will update that.
> APR has a version of iconv.
> 
> BSD systems have their own implementations of iconv.  Haven't found a
> standalone version.  There may be some code in obase (listed on wiki).
> There's mention of a BSD licensed libiconv (
> https://wiki.freebsd.org/G%C3%A1borSoC2009 ) as part of Citrus (which was
> also supposed to have a gettext alternative).  The web page on Citrus is at
> http://citrus.bsdclub.org/ but I haven't found the source code for the
> project.  ICU ( http://site.icu-project.org/) provides uconv instead of
> iconv.  On Windows, they typically use GNU libiconv since iconv isn't part
> of the C runtime library.
> 
> If anyone thinks these (or possibly some other alternatives) are useful, I
> can add the links for them to the wiki.  Please let me know if they any of
> them don't appear too bloated and would be worth adding.

You're welcome to add them, but it would be good to note that musl has
iconv, and it's definitely possible to consider deficiencies in musl's
iconv that are affecting real-world usage as bugs to be fixed. The
current priorities of musl's iconv are size, simplicity, and charset
coverage, in that order, and the use cases I've prioritized are
conversion from legacy to Unicode-based encodings rather than going in
the other direction.

So as of now, the main reasons someone might want a third-party iconv
when using musl are:

- Support for obscure charsets or the few not-yet-supported East Asian
  charsets (mainly Korean and Taiwanese).

- High performance bulk conversions. 

- Avoiding extremely-slow or missing reverse conversions (legacy
  destinations). In musl, these are very slow and a few (the East
  Asian legacy ones) are not even supported in reverse yet.

Rich


* Re: Iconv and old codepages
  2013-06-26 18:34 ` Rich Felker
@ 2013-06-26 18:56   ` orc
  0 siblings, 0 replies; 7+ messages in thread
From: orc @ 2013-06-26 18:56 UTC
  To: musl

Thanks Rich for your quick answer!

On Wed, 26 Jun 2013 14:34:32 -0400
Rich Felker <dalias@aerifal.cx> wrote:

> On Thu, Jun 27, 2013 at 02:15:39AM +0800, orc wrote:
> > Hi,
> > 
> > How many codepages does in-musl iconv supports?
> > Currently I'm trying converting from "utf8" to "cp1251" and iconv()
> > only gives me a number of "*"'s matching the utf8 input. Is this
> > correct behavior and iconv() currently does not support non-UTF
> > legacy codepages? Even so, I still see many of them in
> > src/locale/codepages.h The (dirty) test program attached.
> > 
> > I also noticed alternative libs thread and corresponding wiki page.
> > Does someone know lightweight iconv replacement as a temporary
> > measure (other than libiconv for example)?
> 
> Should be fixed in git. In general, the state of musl's iconv is that
> the following charsets are supported:
> 
> utf8
> wchart
> ucs2
> ucs2be
> ucs2le
> utf16
> utf16be
> utf16le
> ucs4
> ucs4be
> utf32
> utf32be
> ucs4le
> utf32le
> ascii
> usascii
> iso646
> iso646us
> eucjp
> shiftjis
> sjis
> gb18030
> gbk
> gb2312
> iso88591
> latin1
> iso88592
> iso88593
> iso88594
> iso88595
> iso88596
> iso88597
> iso88598
> iso88599
> iso885910
> iso885911
> tis620
> iso885913
> iso885914
> iso885915
> latin9
> iso885916
> cp1250
> windows1250
> cp1251
> windows1251
> cp1252
> windows1252
> cp1253
> windows1253
> cp1254
> windows1254
> cp1255
> windows1255
> cp1256
> windows1256
> cp1257
> windows1257
> cp1258
> windows1258
> koi8r
> koi8u

So "most major encodings", yep.
Thanks, it is fixed and works now.

> 
> Non-alphanumeric characters are ignored in matching charset names, so
> all combinations of hyphens and underscores are also supported with
> these.
> 
> One caveat which should not affect your usage is that the following
> charsets are only supported as the "from" charset, not the "to"
> charset:
> 
> eucjp
> shiftjis
> sjis
> gb18030
> gbk
> gb2312
> 
> Until the latest commit, the legacy 8bit codepages were also broken as
> the "to" charset, but this breakage was unintentional.

While digging through the code, I did not notice that either.

> 
> 
> Rich


* Re: Iconv and old codepages
  2013-06-26 18:39 ` LM
  2013-06-26 18:47   ` Rich Felker
@ 2013-06-27  0:37   ` Isaac
  2013-06-27  1:25     ` Luca Barbato
  1 sibling, 1 reply; 7+ messages in thread
From: Isaac @ 2013-06-27  0:37 UTC
  To: musl

On Wed, Jun 26, 2013 at 02:39:59PM -0400, LM wrote:
> On Wed, Jun 26, 2013 at 2:15 PM, orc <orc@sibserver.ru> wrote:
> 
> > I also noticed alternative libs thread and corresponding wiki page.
> > Does someone know lightweight iconv replacement as a temporary measure
> > (other than libiconv for example)?
> >
> 
> Thought I had the Apache Portable Runtime project listed on the
> alternatives page.  Will update that.
> APR has a version of iconv.
> 
> BSD systems have their own implementations of iconv.  Haven't found a
> standalone version.  There may be some code in obase (listed on wiki).
> There's mention of a BSD licensed libiconv (
> https://wiki.freebsd.org/G%C3%A1borSoC2009 ) as part of Citrus (which was
> also supposed to have a gettext alternative).  The web page on Citrus is at
> http://citrus.bsdclub.org/ but I haven't found the source code for the
> project.  ICU ( http://site.icu-project.org/) provides uconv instead of
> iconv.  On Windows, they typically use GNU libiconv since iconv isn't part
> of the C runtime library.

AFAICT, Citrus has been developed inside NetBSD or FreeBSD (I forget
which) for several years now.

-100 on ICU; it's bloated to a degree that GNU has yet to match.
C++/needs libstdc++, 31.9 M for -dev, 22.4 M for libs, 29.7 M for docs.
(numbers from icu 4.8 on Ubuntu Lucid)

> 
> If anyone thinks these (or possibly some other alternatives) are useful, I
> can add the links for them to the wiki.  Please let me know if they any of
> them don't appear too bloated and would be worth adding.
> 
> Thanks.
> Laura
> http://www.distasis.com/cpp

HTH,
Isaac Dunham



* Re: Iconv and old codepages
  2013-06-27  0:37   ` Isaac
@ 2013-06-27  1:25     ` Luca Barbato
  0 siblings, 0 replies; 7+ messages in thread
From: Luca Barbato @ 2013-06-27  1:25 UTC
  To: musl

On 06/27/2013 02:37 AM, Isaac wrote:
> -100 on ICU; it's bloated to a degree that GNU has yet to match.
> C++/needs libstdc++, 31.9 M for -dev, 22.4 M for libs, 29.7 M for docs.
> (numbers from icu 4.8 on Ubuntu Lucid)

ICU is huge and does much more than iconv, or so they told me.

What's certain is that it is managed by people who don't really know
what API and ABI compatibility mean...

(Still looks like it is really well documented...)

lu

