mailing list of musl libc
From: Harald Becker <ralda@gmx.de>
Cc: musl@lists.openwall.com, nsz@port70.net
Subject: Re: iconv Korean and Traditional Chinese research so far
Date: Mon, 5 Aug 2013 03:24:52 +0200
Message-ID: <20130805032452.280127fd@ralda.gmx.de>
In-Reply-To: <20130805004420.GL25714@port70.net>

Hi !

05-08-2013 02:44 Szabolcs Nagy <nsz@port70.net>:

> * Harald Becker <ralda@gmx.de> [2013-08-05 00:39:43 +0200]:
> > Why can't we have all these character conversions on a state
> > driven machine which loads its information from an external
> > configuration file? This way we can have any kind of
> > conversion someone likes, by just adding the configuration
> > file for the required Unicode to X and X to Unicode
> > conversions.
> 
> external files provided by libc can work but they
> should be possible to embed into the binary

As far as I know, glibc builds small dynamically linked objects
and loads them when required. These are architecture specific,
so you always need conversion modules that match your C library.

My intention is to write the conversions as machine-independent
byte code, which may be copied between machines of different
architectures. If you need a charset conversion, just add that
charset's byte code to the conversion directory, which may be
configurable (directory name taken from an environment variable,
with a default fallback). It could even be a search path, so that
conversion files may be installed in different locations.
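A minimal sketch of the search-path lookup described above, assuming one file per charset named after the charset. The environment variable `ICONV_CHARSET_PATH` and the default directory `/usr/share/iconv-bytecode` are hypothetical names chosen for illustration; nothing in musl defines them. The function returns the first readable match.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names, for illustration only. */
#define CHARSET_PATH_ENV     "ICONV_CHARSET_PATH"
#define CHARSET_PATH_DEFAULT "/usr/share/iconv-bytecode"

/* Search a colon-separated list of directories for a file named after
 * the charset. On success, writes the full path into buf and returns 0;
 * returns -1 if nothing readable was found or the path did not fit. */
int find_charset_file(const char *charset, char *buf, size_t size)
{
    const char *path = getenv(CHARSET_PATH_ENV);
    if (!path || !*path)
        path = CHARSET_PATH_DEFAULT;   /* default fallback */

    /* Refuse names that could escape the search directories. */
    if (!*charset || strchr(charset, '/'))
        return -1;

    while (*path) {
        size_t len = strcspn(path, ":");   /* length of this component */
        if (len > 0) {
            int n = snprintf(buf, size, "%.*s/%s", (int)len, path, charset);
            if (n >= 0 && (size_t)n < size && access(buf, R_OK) == 0)
                return 0;
        }
        path += len;
        if (*path == ':')
            path++;                        /* skip the separator */
    }
    return -1;
}
```

Rejecting names that contain '/' keeps a charset name taken from untrusted input (for example from a MIME header) inside the configured directories.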

> otherwise a static binary is not self-contained
> and you have to move parts of the libc around
> along with the binary and if they are loaded
> from fixed path then it does not work at all
> (permissions, conflicting versions etc)

Ok, I see the static linking issue, but this is no problem with
byte code conversion programs. It can easily be handled: just
concatenate all the conversion byte code programs into a single
big array, preceded by a table of names and offsets, then link it
into your program.
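One way to picture the linked-in variant: all programs concatenated into one byte array, preceded by a table of names and offsets. The struct, the array contents and `lookup_linked_charset` are hypothetical; the bytes are placeholders, not real conversion programs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of contents: name, offset and length of each
 * program inside one big byte array. */
struct charset_entry {
    const char *name;
    size_t offset;   /* start of the program within charset_blob */
    size_t length;   /* size of the program in bytes */
};

/* Placeholder bytes; real programs would be generated at build time. */
static const unsigned char charset_blob[] = {
    0x01, 0x02, 0x03,   /* "KOI8-R" program (dummy bytes) */
    0x10, 0x11,         /* "CP1251" program (dummy bytes) */
};

static const struct charset_entry charset_table[] = {
    { "KOI8-R", 0, 3 },
    { "CP1251", 3, 2 },
    { NULL, 0, 0 },     /* terminator */
};

/* Return the program for `name` and store its length, or NULL if the
 * charset was not linked in. Names are matched exactly here, for
 * simplicity; a real iconv would also need to handle aliases. */
const unsigned char *lookup_linked_charset(const char *name, size_t *len)
{
    for (const struct charset_entry *e = charset_table; e->name; e++) {
        if (strcmp(e->name, name) == 0) {
            *len = e->length;
            return charset_blob + e->offset;
        }
    }
    return NULL;
}
```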

This may be done in two steps:

1) Create a selection file for the musl build, and include the
specified charsets in libc.a/.so

2) Select the required charset files and create an .o file to
link into your program.


iconv shall then:
- look for some fixed charsets like ASCII, Latin-1, UTF-8, etc.
- search the table of charsets linked into libc
- search the table of charsets linked into the program
- search for the charset on the external search path

... or search in the opposite order, and use the first charset
conversion found.
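The lookup order above can be expressed as a list of sources tried in turn, where the first hit wins; reversing the array gives the opposite order. The source functions are hypothetical stubs standing in for the built-in charsets, the two linked tables and the file-system search.

```c
#include <stddef.h>
#include <string.h>

/* A source returns the byte code for a charset, or NULL if it lacks it. */
typedef const unsigned char *(*charset_source)(const char *name, size_t *len);

static const unsigned char utf8_prog[] = { 0x00 };  /* dummy bytes */
static const unsigned char big5_prog[] = { 0x42 };  /* dummy bytes */

static const unsigned char *from_builtin(const char *name, size_t *len)
{
    if (strcmp(name, "UTF-8") == 0) { *len = sizeof utf8_prog; return utf8_prog; }
    return NULL;
}

static const unsigned char *from_libc_table(const char *name, size_t *len)
{
    (void)name; (void)len;
    return NULL;   /* stub: charsets selected at musl build time */
}

static const unsigned char *from_user_table(const char *name, size_t *len)
{
    if (strcmp(name, "BIG5") == 0) { *len = sizeof big5_prog; return big5_prog; }
    return NULL;   /* stub: charsets linked into the program */
}

static const unsigned char *from_search_path(const char *name, size_t *len)
{
    (void)name; (void)len;
    return NULL;   /* stub: would locate and read or mmap a file */
}

/* Cheapest sources first; reverse the array for the opposite order. */
static const charset_source lookup_order[] = {
    from_builtin, from_libc_table, from_user_table, from_search_path,
};

const unsigned char *find_charset(const char *name, size_t *len)
{
    for (size_t i = 0; i < sizeof lookup_order / sizeof lookup_order[0]; i++) {
        const unsigned char *p = lookup_order[i](name, len);
        if (p)
            return p;   /* first conversion found wins */
    }
    return NULL;
}
```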

This lookup is small and cheap, except for the file system
search, so it should not add much overhead / bloat.

[Addendum after thinking a bit more: The byte code conversion
files shall consist of a small static header, followed by the
byte code program. The header shall contain the charset name,
the version of the required virtual machine and the length of the
byte code. So you only need to concatenate all such conversion
files into a big array of bytes and add a null header to mark the
end of the table. Then the start of the array is all you need to
search through it for a specific charset. The iconv code in libc
contains a definition "unsigned char const *iconv_user_charsets =
NULL;", which is linked in when the user does not provide their
own definition. So iconv can search all linked-in charset
definitions and needs no code changes. A really simple
configuration to select which charsets to build in.]
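A sketch of walking such a concatenated array. The mail does not fix the header layout, so the one used here (32-byte NUL-padded name, 2-byte VM version, 4-byte length, both big-endian so the data stays architecture independent) is an assumption, as are `VM_VERSION`, `find_in_blob` and `find_user_charset`. For an overridable default, the sketch marks `iconv_user_charsets` as a weak symbol (a GCC/Clang extension): a plain definition in the same object file as iconv would clash with a user's definition when linking statically; putting it in its own archive member is another option.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed record layout (not specified in the mail):
 *   bytes 0..31  charset name, NUL-padded
 *   bytes 32..33 required VM version, big-endian
 *   bytes 34..37 byte code length, big-endian
 * followed by `length` bytes of byte code. A header whose name starts
 * with NUL ends the table. */
#define HDR_NAME_LEN 32
#define HDR_SIZE     38
#define VM_VERSION   1   /* hypothetical version this interpreter runs */

/* Weak default: a strong definition in the program overrides it. */
__attribute__((weak)) const unsigned char *iconv_user_charsets = NULL;

static uint32_t be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | p[3];
}

/* Walk header+program records; return the program for `name`. */
const unsigned char *find_in_blob(const unsigned char *blob,
                                  const char *name, size_t *len)
{
    if (!blob || strlen(name) >= HDR_NAME_LEN)
        return NULL;
    while (blob[0] != 0) {                      /* null header ends table */
        unsigned version = (unsigned)blob[32] << 8 | blob[33];
        uint32_t plen = be32(blob + 34);
        /* Skip programs needing a newer VM than this one. */
        if (version <= VM_VERSION &&
            strncmp((const char *)blob, name, HDR_NAME_LEN) == 0) {
            *len = plen;
            return blob + HDR_SIZE;
        }
        blob += HDR_SIZE + plen;                /* next record */
    }
    return NULL;
}

const unsigned char *find_user_charset(const char *name, size_t *len)
{
    return find_in_blob(iconv_user_charsets, name, len);
}
```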

> if the format changes then dynamic linking is
> problematic as well: you cannot update libc
> in a single atomic operation

The byte code shall be independent of dynamic linking. The
conversion files are just streams of bytes, which shall also be
architecture independent. So you only need to update the
conversion files if the virtual machine definition of iconv has
changed (which should happen rarely). External files may be read
into malloc-ed buffers or mmap-ed; they are not loaded by the
dynamic linker.
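A sketch of the mmap option mentioned above, using only POSIX calls. `map_charset_file` and `unmap_charset_file` are hypothetical helpers; reading the file into a malloc-ed buffer would work just as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a conversion file read-only. The byte code is plain data, so it
 * can be used directly from the mapping; the dynamic linker is never
 * involved. Returns the mapping and stores its size, or NULL. */
const unsigned char *map_charset_file(const char *path, size_t *size)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after closing the descriptor */
    if (p == MAP_FAILED)
        return NULL;

    *size = (size_t)st.st_size;
    return p;
}

void unmap_charset_file(const unsigned char *p, size_t size)
{
    munmap((void *)p, size);
}
```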

--
Harald
