From: François Pinard
Newsgroups: gmane.emacs.gnus.general
Subject: Re: "Coding system"? Eh?
Date: 11 Sep 1998 10:16:20 -0400
To: davidk@lysator.liu.se (David Kågedal)
Cc: ding@gnus.org

> > Does Unicode now define UTF-7?
> > It originated from the IETF, and UTF-7 is specifically for MIME
> > contexts, which Unicode does not address.
>
> I might be wrong about the origin of UTF-7.  But it's still ugly.

We are all helping each other here; it is not that important whether we are wrong or right, as long as we improve.  If UTF-7 has been adopted by Unicode, I would surely have liked to know, because the `recode' documentation would then need to be adjusted.

About the ugliness of UTF-7, I agree to a certain extent.  For ding readers who do not know, UTF-7 is a kind of quoted-printable for characters using more than 8 bits, suited for transmission over 7-bit channels.  Very roughly, instead of `=' it uses `+', and instead of hexadecimal values it uses in-lined Base64.  I found it a bit painful to write a UTF-7 encoder and decoder, but now that it is done, the algorithmic ugliness (which is another kind of ugliness) is all hidden in black boxes, and we might consider that it is not in the way anymore.  UTF-8 has its elegances, but it is still slightly painful to write _efficient_ encoders/decoders.

For transmission of Unicode or ISO 10646 message bodies, it looks to me as if we have a choice between UTF-8 and UTF-7.  UCS-2 and UCS-4 are internal formats not well suited for transmission, UTF-1 is obsolete, and UTF-16 is not much better than UCS-2 for transmission, at least until all machines replace 8-bit bytes with 16-bit bytes, and this will not happen in this century :-).

In fact, we have to look at things with a cold eye here.  If you do not have a decoder integrated into Gnus or your other mail readers, I am not sure which of UTF-8 or UTF-7 looks uglier.  UTF-8 would look like a mix of ASCII and binary dump, while UTF-7 would look like ASCII with fragments of Base64 in it.  I might prefer UTF-7, after all, maybe.  And if you have a well-integrated decoder, you do not see the ugliness at all: the algorithmic ugliness is hidden once and for all in black boxes anyway, and then it does not really matter.
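To make the comparison above concrete, here is a small sketch using Python's built-in codecs (an illustration added for this archive, not the `recode' implementation the message refers to; the sample string "Hé" is arbitrary).  It shows how the same text looks in UTF-7 versus UTF-8:

```python
# Compare UTF-7 and UTF-8 renderings of the same text, using
# Python's standard codecs.  "Hé" is a hypothetical sample:
# one plain ASCII character plus one accented character.

text = "Hé"

utf7 = text.encode("utf-7")   # non-ASCII run becomes +<Base64 of UTF-16 units>-
utf8 = text.encode("utf-8")   # non-ASCII character becomes raw 8-bit bytes

print(utf7)   # b'H+AOk-'   : pure 7-bit ASCII, safe on 7-bit channels
print(utf8)   # b'H\xc3\xa9': needs an 8-bit clean channel

# Both round-trip back to the original text:
assert utf7.decode("utf-7") == text
assert utf8.decode("utf-8") == text
```

The `+AOk-` fragment is exactly the "in-lined Base64" the message describes: `+` opens a Base64 run (much as `=` opens an escape in quoted-printable), and `-` closes it.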
> The difference between UTF-16 and UCS-2 is that it can encode some of
> the characters outside the Unicode range (BMP).  So I guess Unicode
> has no need for UTF-16.

Unicode needs it, because people are beginning to see that 65,000 characters are not as sufficient as was once thought.  I mean that a few years ago, it was believed that 65,000 characters would satisfy all our needs for many years, but relatively soon people began to see that it is not enough, and that we need a way to get more characters.  ISO 10646 had much higher goals to start with, so it did not have that problem.

UTF-16 extends the Unicode set to around 1,000,000 characters, still much less than ISO 10646, but much more comfortable than 65,000 -- and ISO 10646 later made room in its BMP so that the UTF-16 technique could be implemented more simply.  I do not think ISO 10646 ever needed UTF-16, but it wanted Unicode compatibility.

-- 
François Pinard   mailto:pinard@iro.umontreal.ca
Join the free Translation Project!   http://www.iro.umontreal.ca/~pinard
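[As a note for archive readers: the "room in the BMP" mentioned above is the surrogate range, and the mechanism can be sketched in a few lines of Python.  This is an added illustration, not part of the original exchange; the function name `to_surrogates` is hypothetical.]

```python
# Sketch of the UTF-16 surrogate-pair mechanism: a code point above
# the BMP is split into two 16-bit units drawn from ranges that the
# BMP reserves for this purpose (D800-DBFF high, DC00-DFFF low).

def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into
    the high/low surrogate pair UTF-16 uses to encode it."""
    assert 0x10000 <= cp <= 0x10FFFF
    cp -= 0x10000                  # 20 bits remain
    high = 0xD800 + (cp >> 10)     # top 10 bits
    low = 0xDC00 + (cp & 0x3FF)    # bottom 10 bits
    return high, low

# U+1F600 lies outside the BMP, so UTF-16 needs two code units:
print([hex(u) for u in to_surrogates(0x1F600)])  # ['0xd83d', '0xde00']

# Python's own codec produces the same pair, serialized big-endian:
print("\U0001F600".encode("utf-16-be"))
```

Two 10-bit halves give 2**20 = 1,048,576 extra code points beyond the BMP, which is the "around 1,000,000 characters" figure in the message.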