mailing list of musl libc
From: Rich Felker <dalias@aerifal.cx>
To: musl@lists.openwall.com
Subject: Re: iconv Korean and Traditional Chinese research so far
Date: Sun, 4 Aug 2013 20:49:15 -0400
Message-ID: <20130805004915.GA221@brightrain.aerifal.cx>
In-Reply-To: <20130805003943.050fc58e@ralda.gmx.de>

On Mon, Aug 05, 2013 at 12:39:43AM +0200, Harald Becker wrote:
> Hi Rich !
> 
> > Worst-case, adding Korean and Traditional Chinese tables will
> > roughly double the size of iconv.o to around 150k. This will
> > noticeably enlarge libc.so, but will make no difference to
> > static-linked programs except those using iconv. I'm hoping we
> > can make these additions less expensive, but I don't see a good
> > way yet.
> 
> Oh nooo, do you really want to add this statically to the iconv
> version?

Do I want to add that size? No, of course not, and that's why I'm
hoping (but not optimistic) that there may be a way to elide a good
part of the table based on patterns in the Hangul syllables or the
possibility that the giant extensions are unimportant.

Do I want to give users who have large volumes of legacy text in their
languages stored in these encodings the same respect and dignity as
users of other legacy encodings we already support? Yes.
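
For reference, the kind of pattern in the Hangul syllables mentioned
above can be seen on the Unicode side, where the 11172 precomposed
syllables are laid out arithmetically by leading consonant, vowel, and
trailing consonant. The sketch below only illustrates that layout,
using the constants from the Unicode Standard; whether a legacy Korean
table can be shrunk by exploiting it is an open question here, and the
function names are made up for illustration.

```c
/* Constants from the Unicode Standard's Hangul syllable composition
 * rules. They describe Unicode code points, not any legacy encoding. */
enum {
	S_BASE  = 0xAC00,              /* first precomposed syllable */
	L_COUNT = 19,                  /* leading consonants */
	V_COUNT = 21,                  /* vowels */
	T_COUNT = 28,                  /* trailing consonants, index 0 = none */
	N_COUNT = V_COUNT * T_COUNT,   /* 588 syllables per leading consonant */
	S_COUNT = L_COUNT * N_COUNT    /* 11172 syllables in total */
};

/* Compose a syllable from jamo indices (hypothetical helper).
 * Returns 0 if any index is out of range. */
unsigned hangul_compose(unsigned l, unsigned v, unsigned t)
{
	if (l >= L_COUNT || v >= V_COUNT || t >= T_COUNT) return 0;
	return S_BASE + (l * V_COUNT + v) * T_COUNT + t;
}

/* Split a syllable into jamo indices (hypothetical helper).
 * Returns 0 on success, -1 if c is not a precomposed syllable. */
int hangul_decompose(unsigned c, unsigned *l, unsigned *v, unsigned *t)
{
	if (c < S_BASE || c >= (unsigned)S_BASE + S_COUNT) return -1;
	unsigned s = c - S_BASE;
	*l = s / N_COUNT;
	*v = s % N_COUNT / T_COUNT;
	*t = s % T_COUNT;
	return 0;
}
```

Because the Unicode side is pure arithmetic, any table that lists
syllables in this order needs no per-character Unicode values, only a
rule for which syllables are present.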

> Why can't we have all these character conversions on a state-driven
> machine which loads its information from an external configuration
> file? This way we can have any kind of conversion someone likes,
> by just adding the configuration file for the required Unicode to
> X and X to Unicode conversions.

This issue was discussed a long time ago, and the consensus among users
of static linking was that static linking is most valuable when it
makes the binary completely "portable" to arbitrary Linux systems of
the same CPU architecture, without depending on files in particular
locations on the system beyond the minimum required by POSIX (things
like /dev/null), the standard Linux /proc mountpoint, and universal
config files like /etc/resolv.conf (even that is not necessary, BTW,
if you have a nameserver on localhost). Having iconv not work without
external character tables is essentially a form of dynamic linking,
and it carries the same issues: where the files are to be found (you
can override that with an environment variable, but that can't be
permitted for setuid binaries), what happens if the format needs to
change and the format on the target machine is not compatible with
the libc version your binary was built with, and so on. This is also
the main reason musl does not support something like NSS.

Another side benefit of the current implementation is that it's fully
self-contained and independent of any system facilities. It's pure C
and can be taken out of musl and dropped into any program on any C
implementation, including freestanding (non-hosted) implementations.
If it depended on the filesystem, adapting it for such usage would be
a lot more work.
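
To make the "self-contained" point concrete, here is a minimal sketch
of a table-driven decoder that lives entirely in the binary. It is not
the code in iconv.c; the names are hypothetical, and the table is only
a four-entry excerpt of Windows-1252 (bytes 0x80 to 0x83) chosen to
keep the example short. It needs nothing beyond <stddef.h>, which is
available even on freestanding implementations.

```c
#include <stddef.h>

/* Excerpt of Windows-1252 covering bytes 0x80..0x83 only; 0 marks an
 * unassigned byte. A real table would cover the whole high range. */
static const unsigned short high_tab[] = { 0x20AC, 0, 0x201A, 0x0192 };

/* Decode one byte to a Unicode code point (hypothetical helper).
 * Returns -1 if the byte is unassigned or outside the excerpt. */
long decode_byte(unsigned char b)
{
	if (b < 0x80) return b;   /* ASCII maps to itself */
	size_t i = b - 0x80u;
	if (i >= sizeof high_tab / sizeof *high_tab || !high_tab[i]) return -1;
	return high_tab[i];
}

/* Encode a code point below 0x10000 as UTF-8 (hypothetical helper).
 * Returns the number of bytes written, or 0 if c is out of range.
 * Surrogates are not rejected in this sketch. */
size_t put_utf8(unsigned c, unsigned char *out)
{
	if (c < 0x80) {
		out[0] = c;
		return 1;
	}
	if (c < 0x800) {
		out[0] = 0xC0 | (c >> 6);
		out[1] = 0x80 | (c & 0x3F);
		return 2;
	}
	if (c < 0x10000) {
		out[0] = 0xE0 | (c >> 12);
		out[1] = 0x80 | ((c >> 6) & 0x3F);
		out[2] = 0x80 | (c & 0x3F);
		return 3;
	}
	return 0;
}
```

Since the table is a static const array, it ends up in the read-only
data of whatever links it in, and the decoder works the same with or
without a filesystem.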

> State driven fsm interpreters are really small and fast and may
> read its complete configuration from a file ... architecture
> independent file, so we may have same character conversion files
> for all architectures.

An FSM implementation would be several times larger than the
implementations in iconv.c. It's possible that we could, at some time
in the future, support loading of user-defined character conversion
files as an added feature, but this should only be for really
special-purpose things like custom encodings used for games or
obsolete systems (old Mac, console games, IBM mainframes, etc.).

In terms of the criteria for what to include in musl itself, my idea
is that if you have a mail client or web browser based on iconv for
its character set handling, you should be able to read the bulk of
content in any language.
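
As a rough illustration of what such a client does, the sketch below
converts a buffer in some legacy charset to UTF-8 through the POSIX
iconv interface (iconv_open, iconv, iconv_close). The wrapper name
to_utf8 is hypothetical, and the sketch assumes a stateless source
encoding and an output buffer large enough for the whole result.

```c
#include <iconv.h>
#include <stddef.h>

/* Convert inlen bytes in the charset named `from` to UTF-8
 * (hypothetical wrapper). Returns the number of bytes written to out,
 * or -1 on failure: unsupported charset, invalid or truncated input,
 * or an output buffer that is too small. */
long to_utf8(const char *from, const char *in, size_t inlen,
             char *out, size_t outlen)
{
	iconv_t cd = iconv_open("UTF-8", from);
	if (cd == (iconv_t)-1) return -1;

	/* iconv() takes char ** for its input; the cast drops const only
	 * to satisfy that prototype. */
	char *ip = (char *)in;
	char *op = out;
	size_t left = outlen;
	size_t r = iconv(cd, &ip, &inlen, &op, &left);
	iconv_close(cd);

	return r == (size_t)-1 ? -1 : (long)(outlen - left);
}
```

With an iconv that supports EUC-KR, passing the four bytes
C7 D1 B1 DB with from set to "EUC-KR" should yield the six UTF-8 bytes
ED 95 9C EA B8 80; where the charset is missing, iconv_open fails and
the wrapper returns -1, which is exactly the situation a mail client
cannot recover from on its own.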

Rich

