RE: Request for Ideas: i18n issues

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: "Frank A. Christoph" <christo@nextsolution.co.jp>
To: "John Prevost" <prevost@maya.com>, "skaller" <skaller@maxtal.com.au>
Cc: <caml-list@inria.fr>
Subject: RE: Request for Ideas: i18n issues
Date: Tue, 26 Oct 1999 22:36:39 +0900	[thread overview]
Message-ID: <000601bf1fb7$21a4fc00$0150ebca@nextsolution.co.jp> (raw)
In-Reply-To: <m3aep6vnxj.fsf@isil.maya.com>

John Prevost wrote in response to John Skaller:
> > 	Probably not. There is a distinction, for example, between
> > concatenating two arrays of code points, and producing a new array
> > of code points corresponding to the concatenated script.  This is
> > 'most true' in Arabic, but it is also true in plain old English: the
> > script
>   {...}
>
> I don't believe that this is a job for the basic string manipulation
> stuff to do.  There do need to be methods for manipulating strings as
> sequences.  As such, I'm not going to worry about it at this level.

Maybe he is arguing for a higher-level approach to text representation. Text
is, after all, an important enough application field that it deserves some
treatment in a standard library. Unfortunately (as we are all bearing
witness to in this discussion) there is no standard way to do it, even for
one (natural) language, much less all of them.

> The big difficulty here is that not everybody wants to eat Unicode.  I
> think it's appropriate, but not everyone does.  And there are still
> characters in iso-2022, for example, which have no Unicode code point.
>
> I think that as I look at the problem more, though, I'm inclined to
> say "one definitive set of characters" is a better idea.  Especially
> since that set is needed for reasonable interoperation between the
> others.

This is getting a little afield from the topic of the list, but I have been
wondering if maybe Unicode (and other related standards) might not end up
being most valuable in the long run, not for being able to represent text
from existing languages but, for shaping the form of languages to come. I
don't mean completely new languages of course (I think I heard Klingon and
Tengwar were actually submitted for inclusion!), but rather that as
electronic information becomes more and more central to communication, the
need to encode it will change the way people speak and write in other media
as well. I think this is happening already, sort of: for example, I've seen
email emoticons used in published materials, and on billboards here in
Japan; and we all know a hundred words from computer jargon that have made
it into mainstream languages.

> Something I've noted looking at O'Caml these last few days: the
> "string" type is really more an efficient byte array type.  And the
> char type is really a byte type.  There's no real way to do "input
> bytes from a stream" except inputting them as characters and then
> interpreting those characters as bytes.

That's exactly the way I think of and use O'Caml characters and strings. It
is sort of unfortunate that O'Caml inherited this terminology from C... I
would have preferred "byte" or "octet" for characters, at least.

--FAC

next prev parent reply	other threads:[~1999-10-28 17:07 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
1999-10-22  0:25 John Prevost
1999-10-25 18:55 ` skaller
1999-10-26  2:46   ` John Prevost
1999-10-26 13:36     ` Frank A. Christoph [this message]
1999-10-26 22:02       ` skaller
1999-10-26 20:16     ` skaller
1999-10-27  0:37       ` John Prevost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='000601bf1fb7$21a4fc00$0150ebca@nextsolution.co.jp' \
    --to=christo@nextsolution.co.jp \
    --cc=caml-list@inria.fr \
    --cc=prevost@maya.com \
    --cc=skaller@maxtal.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).