caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: skaller <skaller@users.sourceforge.net>
To: Julien Moutinho <julien.moutinho@gmail.com>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Re: Warning on home-made functions dealing with UTF-8.
Date: Wed, 17 Oct 2007 12:23:33 +1000	[thread overview]
Message-ID: <1192587813.31367.77.camel@rosella.wigram> (raw)
In-Reply-To: <20071016184621.GA12628@localhost>


On Tue, 2007-10-16 at 20:46 +0200, Julien Moutinho wrote:
> Here, I have reused some old code of mine to secure and extend J.Skaller's:
>   unicode_of_utf8 ~ parse_utf8
>   utf8_of_unicode ~ utf8_of_int
> May it help, and may it not be too bugged.

The UTF-8 to UCS4 string conversion probably needs an option NOT
to throw exceptions. Almost all uses are not serviced well by
throwing exceptions. 

* Exceptions are a very bad idea in the first place :)

* An alternative strategy using a replacement code may be
worth considering, possibly using an argument, and possibly
by another technique such as a wrapper function which
continues on after inserting the replacement.

I think you will find that covering all the real uses isn't
so easy .. one reason for a whole framework like Camomile.

In particular, most Standards will say things like "behaviour
is undefined if such and such" for example an invalid code.

This does NOT mean using such a code is an error, it means
the Standard leaves the behaviour open in such cases.
In particular, open to vendor extensions.

It's perfectly legitimate, for example, for an application
to encode colour and font information in the 31-21=10 remaining
bits**, but your codec will throw an exception here, instead
of just translating the codes. What the application is doing
isn't portable -- but that doesn't make it incorrect if the
application has control over the context.

In particular, the application can even define an extension
to the Standard and send such codes over file systems and
networks. Other applications may decide to support the
extensions. 

In fact this is how Standards are made: the ISO mantra is that
standards should encode existing practice, and that specifically
implies ALLOWING practice outside the existing Standards.

So my routine wasn't quite so 'home grown' .. I've actually
been a participant in ISO Standardisation processes with a
special National Body interest in I18n issues (since Australia
is highly multi-cultural and has people speaking many languages).
I18n is a real quagmire of complexity... for example there are
no known commercial text rendering routines that actually
comply with the Standard -- bidirectional rendering is
extremely difficult to do efficiently and it isn't clear that
the Standard requirements are all that useful if you happen
to be mixing English, Arabic, and Chinese in the same document.

** there is a real use of the extra bits by some Egyptologists
encoding hieroglyphics.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


  parent reply	other threads:[~2007-10-17  2:56 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-10-08 15:08 Correct way of programming a CGI script Tom
2007-10-08 15:32 ` [Caml-list] " Dario Teixeira
2007-10-08 16:04 ` Gerd Stolpmann
2007-10-08 21:37   ` skaller
2007-10-08 22:21     ` Erik de Castro Lopo
2007-10-08 23:05       ` skaller
2007-10-08 23:19         ` skaller
2007-10-08 23:23           ` Arnaud Spiwack
2007-10-08 23:47             ` skaller
2007-10-09  5:49         ` David Teller
2007-10-09 10:15         ` Christophe TROESTLER
2007-10-09 15:29           ` skaller
2007-10-09 15:49             ` Vincent Hanquez
2007-10-09 16:00               ` Jon Harrop
2007-10-09 14:02         ` William D. Neumann
2007-10-09 15:25           ` skaller
2007-10-09 15:33             ` William D. Neumann
2007-10-09 15:48             ` Jon Harrop
2007-10-08 23:37       ` skaller
2007-10-09 10:20         ` Christophe TROESTLER
2007-10-09 13:40           ` Rope is the new string Jon Harrop
2007-10-09 15:57             ` [Caml-list] " Vincent Hanquez
2007-10-09 16:42               ` Loup Vaillant
2007-10-09 16:55                 ` Vincent Hanquez
2007-10-09 17:32                   ` Loup Vaillant
2007-10-09 19:51                     ` Vincent Hanquez
2007-10-09 21:06                       ` Loup Vaillant
2007-10-10  7:35                         ` Vincent Hanquez
2007-10-10  8:05                           ` Loup Vaillant
2007-10-11 13:23                             ` Vincent Hanquez
2007-10-09 22:04                       ` Chris King
2007-10-11 13:03                         ` Vincent Hanquez
2007-10-11 13:54                           ` skaller
2007-10-11 14:21                             ` Vincent Hanquez
2007-10-11 14:27                               ` Benjamin Monate
2007-10-11 14:48                               ` skaller
2007-10-11 21:16                                 ` Alain Frisch
2007-10-15 20:35                                 ` Warning on home-made functions dealing with UTF-8 Julien Moutinho
2007-10-15 23:51                                   ` [Caml-list] " skaller
2007-10-16  2:21                                     ` Julien Moutinho
2007-10-16 18:46                                   ` Julien Moutinho
2007-10-16 18:51                                     ` Julien Moutinho
2007-10-17  2:23                                     ` skaller [this message]
2007-10-09 10:26     ` [Caml-list] Correct way of programming a CGI script Gerd Stolpmann
2007-10-09 15:16       ` skaller
2007-10-09 15:31         ` William D. Neumann
2007-10-09 12:52     ` Brian Hurt
2007-10-09 13:56   ` Jon Harrop
2007-10-09 15:18     ` William D. Neumann
2007-10-08 16:11 ` Loup Vaillant
2007-10-08 19:07   ` Christophe TROESTLER

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1192587813.31367.77.camel@rosella.wigram \
    --to=skaller@users.sourceforge.net \
    --cc=caml-list@inria.fr \
    --cc=julien.moutinho@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).