mailing list of musl libc
From: Szabolcs Nagy <nsz@port70.net>
To: musl@lists.openwall.com
Subject: Re: iconv Korean and Traditional Chinese research so far
Date: Mon, 5 Aug 2013 05:13:22 +0200
Message-ID: <20130805031322.GM25714@port70.net>
In-Reply-To: <20130805032452.280127fd@ralda.gmx.de>

* Harald Becker <ralda@gmx.de> [2013-08-05 03:24:52 +0200]:
> iconv then shall:
> - look for some fixed charsets like ASCII, Latin-1, UTF-8, etc.
> - search the table of charsets linked with libc
> - search the table of charsets linked with the program
> - search for charset on external search path

sounds like a lot of extra management cost
(for libc, application writer and user as well)
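
The lookup order in the quoted list could be sketched roughly as
follows. This is only an illustration of the proposed chain, not musl
code: every identifier (locate_charset, builtin_names, libc_linked,
program_linked, external_probe) is hypothetical, and names are matched
with an exact strcmp for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Where a charset was found; all names here are hypothetical. */
enum charset_source {
    SRC_NONE, SRC_BUILTIN, SRC_LIBC_TABLE, SRC_PROGRAM_TABLE, SRC_EXTERNAL
};

/* Return 1 if name occurs in a NULL-terminated list of strings. */
static int in_list(const char *const *list, const char *name)
{
    if (!list) return 0;
    for (; *list; list++)
        if (strcmp(*list, name) == 0) return 1;
    return 0;
}

/* Stage 1: charsets handled by fixed code paths. */
static const char *const builtin_names[] = { "ASCII", "Latin-1", "UTF-8", NULL };

/* Stage 2: charsets linked with libc (contents are made up). */
static const char *const libc_linked[] = { "KOI8-R", NULL };

/* Stage 3: the program may point this at its own list. */
const char *const *program_linked = NULL;

/* Stage 4: optional hook that would search an external path. */
int (*external_probe)(const char *name) = NULL;

/* Walk the four stages in order and report the first hit. */
enum charset_source locate_charset(const char *name)
{
    if (in_list(builtin_names, name)) return SRC_BUILTIN;
    if (in_list(libc_linked, name)) return SRC_LIBC_TABLE;
    if (in_list(program_linked, name)) return SRC_PROGRAM_TABLE;
    if (external_probe && external_probe(name)) return SRC_EXTERNAL;
    return SRC_NONE;
}
```

Each stage is a separate place where a charset can come from, which is
the management cost in question: libc, the program and the installed
files all have to agree on what is available.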

it would be nice if the compiler could figure out
at build time (eg with lto) which tables are used
but i guess charsets are often only known at runtime

> [Addendum after thinking a bit more: The byte code conversion
> files shall consist of a small static header, followed by the
> byte code program. The header shall contain the charset name,
> version of the required virtual machine and length of the byte
> code. So you only need to add all such conversion files to a big
> array of bytes and add a Null header to mark the end of the table.
> Then you only need the start of the array and you are able to
> search through it for a specific charset. The iconv function in
> libc contains a definition for an "unsigned char const
> *iconv_user_charsets = NULL;", which is linked in when the user
> does not provide its own definition. So iconv can search all
> linked-in charset definitions, and needs no code changes. Really
> simple configuration to select charsets to build in.]
> 

yes that can work, but it's a musl specific hack
that the application programmer needs to take care of
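
For reference, a table of the kind described in the quoted addendum
could be searched roughly like this. It is a minimal sketch under
assumed details: the thread does not fix field widths or byte order, so
the 16-byte name field, the 1-byte VM version, the 2-byte little-endian
length and the names find_charset and struct charset_ref are all
hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Assumed header layout (not specified in the thread):
 *   bytes 0..15   charset name, NUL-padded
 *   byte  16      required virtual machine version
 *   bytes 17..18  byte code length, little-endian
 * The byte code follows the header directly. An entry whose name
 * starts with a NUL byte (the "Null header") ends the table. */
enum { NAME_LEN = 16, HDR_LEN = 19 };

struct charset_ref {
    const unsigned char *code;  /* start of the byte code program */
    size_t len;                 /* length of the byte code */
    unsigned vm_version;        /* VM version the entry requires */
};

/* Scan the concatenated entries for name; return 1 and fill *out
 * on a match, 0 if the table is absent or has no such entry. */
int find_charset(const unsigned char *table, const char *name,
                 struct charset_ref *out)
{
    size_t n = strlen(name);

    if (!table || n == 0 || n >= NAME_LEN) return 0;
    for (const unsigned char *p = table; p[0] != 0; ) {
        size_t len = (size_t)p[17] | ((size_t)p[18] << 8);
        /* n + 1 also compares the NUL, so prefixes do not match */
        if (memcmp(p, name, n + 1) == 0) {
            out->code = p + HDR_LEN;
            out->len = len;
            out->vm_version = p[16];
            return 1;
        }
        p += HDR_LEN + len;  /* skip header and byte code */
    }
    return 0;
}
```

The caller would pass the start of the array, for example the
iconv_user_charsets pointer from the quote, and gets 0 back when that
pointer is still NULL.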

> > if the format changes then dynamic linking is
> > problematic as well: you cannot update libc
> > in a single atomic operation
> 
> The byte code shall be independent of dynamic linking. The
> conversion files are only streams of bytes, which shall also be
> architecture independent. So you only need to update the
> conversion files if the virtual machine definition of iconv has
> been changed (which should rarely happen). External files may be
> read into malloc-ed buffers or mmap-ed, not linked in by the
> dynamic linker.
> 

that does not solve the format change problem:
you cannot update libc and the conversion files
without a race (unless you first install a .so
that supports both the old and the new format,
but then libc has to support all previous formats)

it's probably easy to design a fixed format to
avoid this

it seems somewhat similar to the timezone problem
except zoneinfo is maintained outside of libc so
there is not much choice, but there are the same
issues: updating it should be done carefully,
setuid programs must be handled specially etc

