caml-list - the Caml user's mailing list
From: John Max Skaller <skaller@ozemail.com.au>
To: Gerd Stolpmann <info@gerd-stolpmann.de>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Announcement: PXP 1.1.92 (development version)
Date: Sun, 01 Sep 2002 18:52:20 +1000
Message-ID: <3D71D544.4010509@ozemail.com.au>
In-Reply-To: <20020901014544.GC820@ice.gerd-stolpmann.de>

Gerd Stolpmann wrote:


> previous versions of PXP, the internal representation of the XML trees was 
> restricted to either UTF-8 or ISO-8859-1. Now, a number of additional 
> encodings are supported, including the whole ISO-8859 series. 


I have ALL the code sets specified at Unicode.org in
programmatic form. It would be easy to generate OCaml
versions of the tables.

However, how about developing a standard I18n library
with an eye to future inclusion in the standard
distribution?

The main question is: what form should the
encode/decode functions take?

My functions are in Python, and take the form:

	decode: string -> (int * string)
	encode: int -> string

where string is an 8-bit byte stream,
and int is a Unicode (or other) code point.
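
To make the shape of these two functions concrete, here is a minimal
Python sketch for one single-byte code set, ISO-8859-15. It is not the
actual code referred to above. It assumes that the pair returned by
decode is the decoded code point plus the unconsumed rest of the input;
the table names and the decode_all helper are hypothetical.

```python
from typing import Dict, List, Tuple

# ISO-8859-15 matches ISO-8859-1 (byte value == code point) except at
# these eight byte values.
_LATIN9_OVERRIDES: Dict[int, int] = {
    0xA4: 0x20AC, 0xA6: 0x0160, 0xA8: 0x0161, 0xB4: 0x017D,
    0xB8: 0x017E, 0xBC: 0x0152, 0xBD: 0x0153, 0xBE: 0x0178,
}

# Byte -> code point table, and its inverse for encoding.
_DECODE: List[int] = [_LATIN9_OVERRIDES.get(b, b) for b in range(256)]
_ENCODE: Dict[int, int] = {cp: b for b, cp in enumerate(_DECODE)}


def decode(data: bytes) -> Tuple[int, bytes]:
    """Decode one character from the front of data.

    Returns (code point, remaining bytes). The meaning of the pair is an
    assumption made for this sketch.
    """
    if not data:
        raise ValueError("cannot decode from empty input")
    return _DECODE[data[0]], data[1:]


def encode(code_point: int) -> bytes:
    """Encode one code point as bytes in this code set."""
    try:
        return bytes([_ENCODE[code_point]])
    except KeyError:
        raise ValueError(f"U+{code_point:04X} is not in this code set") from None


def decode_all(data: bytes) -> List[int]:
    """Hypothetical helper: call decode repeatedly until input is used up."""
    out = []
    while data:
        code_point, data = decode(data)
        out.append(code_point)
    return out
```

Returning the rest of the input lets the same signature serve
multi-byte code sets, where the caller cannot know in advance how many
bytes one character occupies.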

The actual Python functions use dynamically loaded
data tables, but each character set has a fixed
format for the tables that knows about the raw
structure of the character set (e.g. what ranges of
high and low bytes are allowed in two-byte encodings
such as Shift-JIS, KSC, etc.). For OCaml, we'd probably
want to bind the encodings at compile time
(since there is no well-defined way to find
the data tables at run time) :(
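
As a rough illustration of a table that "knows about the raw structure"
of a two-byte code set, the Python sketch below stores the allowed high
(lead) and low (trail) byte ranges next to the mapping itself. The
TwoByteTable layout and all names are hypothetical, not the real table
format. The ranges follow the usual Shift-JIS layout, the mapping holds
only two sample entries, and bytes below 0x80 are treated as plain
ASCII for simplicity.

```python
from typing import Dict, List, NamedTuple, Tuple

ByteRange = Tuple[int, int]  # inclusive (low, high) bounds


class TwoByteTable(NamedTuple):
    """Hypothetical table layout: structure first, then the mapping."""
    lead: List[ByteRange]     # allowed values of the first (high) byte
    trail: List[ByteRange]    # allowed values of the second (low) byte
    mapping: Dict[int, int]   # (lead << 8 | trail) -> code point


def in_ranges(byte: int, ranges: List[ByteRange]) -> bool:
    return any(lo <= byte <= hi for lo, hi in ranges)


def decode(table: TwoByteTable, data: bytes) -> Tuple[int, bytes]:
    """Decode one character; return (code point, remaining bytes)."""
    if not data:
        raise ValueError("cannot decode from empty input")
    first = data[0]
    if first < 0x80:
        # Simplification for this sketch: single bytes are ASCII.
        return first, data[1:]
    if not in_ranges(first, table.lead):
        raise ValueError(f"invalid lead byte 0x{first:02X}")
    if len(data) < 2 or not in_ranges(data[1], table.trail):
        raise ValueError("missing or invalid trail byte")
    key = (first << 8) | data[1]
    if key not in table.mapping:
        raise ValueError(f"unmapped sequence 0x{key:04X}")
    return table.mapping[key], data[2:]


# Shift-JIS-like structure with two sample mapping entries only.
SJIS_LIKE = TwoByteTable(
    lead=[(0x81, 0x9F), (0xE0, 0xEF)],
    trail=[(0x40, 0x7E), (0x80, 0xFC)],
    mapping={0x8140: 0x3000, 0x82A0: 0x3042},
)
```

Keeping the ranges separate from the mapping means malformed input can
be rejected without consulting the (much larger) mapping, and the
mapping can be stored compactly for the valid byte pairs only.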

The tables are very compact, but there are quite
a few encodings, so there is some overhead if they're
all in the one module.


-- 
John Max Skaller, mailto:skaller@ozemail.com.au
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850



