From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/53734
Path: main.gmane.org!not-for-mail
From: Oliver Scholz <alkibiades@gmx.de>
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Gnus: UTF-8 and compatibility with other MUAs
Date: Sat, 16 Aug 2003 17:36:45 +0200
Sender: ding-owner@lists.math.uh.edu
Message-ID: <uoeyp62cy.fsf@ID-87814.user.dfncis.de>
References: <plop87brus6y07.fsf@gnu-rox.org> <m3oeyrkfnc.fsf@defun.localdomain>
 <ubruruj1i.fsf@ID-87814.user.dfncis.de> <m3isoy50kt.fsf@defun.localdomain>
 <uhe4in64v.fsf@ID-87814.user.dfncis.de> <m3u18i30xi.fsf@defun.localdomain>
 <u8ypulyqs.fsf@ID-87814.user.dfncis.de> <m3ada9vjrl.fsf@defun.localdomain>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Trace: sea.gmane.org 1061048821 32049 80.91.224.253 (16 Aug 2003 15:47:01 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Sat, 16 Aug 2003 15:47:01 +0000 (UTC)
Original-X-From: ding-owner+M2278@lists.math.uh.edu Sat Aug 16 17:46:59 2003
Return-path: <ding-owner+M2278@lists.math.uh.edu>
Original-Received: from malifon.math.uh.edu ([129.7.128.13])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 19o3Gd-0003iq-00
	for <ding-account@gmane.org>; Sat, 16 Aug 2003 17:46:59 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.math.uh.edu)
	by malifon.math.uh.edu with smtp (Exim 3.20 #1)
	id 19o3FG-0004SH-00; Sat, 16 Aug 2003 10:45:34 -0500
Original-Received: from sclp3.sclp.com ([64.157.176.121])
	by malifon.math.uh.edu with smtp (Exim 3.20 #1)
	id 19o3F9-0004SB-00
	for ding@lists.math.uh.edu; Sat, 16 Aug 2003 10:45:27 -0500
Original-Received: (qmail 59315 invoked by alias); 16 Aug 2003 15:45:26 -0000
Original-Received: (qmail 59310 invoked from network); 16 Aug 2003 15:45:26 -0000
Original-Received: from main.gmane.org (80.91.224.249)
  by sclp3.sclp.com with SMTP; 16 Aug 2003 15:45:26 -0000
Original-Received: from list by main.gmane.org with local (Exim 3.35 #1 (Debian))
	id 19o3GC-0001ps-00
	for <ding@gnus.org>; Sat, 16 Aug 2003 17:46:32 +0200
X-Injected-Via-Gmane: http://gmane.org/
Original-To: ding@gnus.org
Original-Received: from sea.gmane.org ([80.91.224.252])
	by main.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 19o3GB-0001pk-00
	for <gmane-emacs-gnus-general@m.gmane.org>; Sat, 16 Aug 2003 17:46:31 +0200
Original-Received: from news by sea.gmane.org with local (Exim 3.35 #1 (Debian))
	id 19o3F4-0008Iu-00
	for <gmane-emacs-gnus-general@m.gmane.org>; Sat, 16 Aug 2003 17:45:22 +0200
Original-Lines: 93
Original-X-Complaints-To: usenet@sea.gmane.org
X-Attribution: os
X-Face: "HgH2sgK|bfH$;PiOJI6|qUCf.ve<51_Od(%ynHr?=>znn#~#oS>",F%B8&\vus),2AsPYb -n>PgddtGEn}s7kH?7kH{P_~vu?]OvVN^qD(L)>G^gDCl(U9n{:d>'DkilN!_K"eNzjrtI4Ya6;Td%
 IZGMbJ{lawG+'J>QXPZD&TwWU@^~A}f^zAb[Ru;CT(UA]c&
User-Agent: Gnus/5.1002 (Gnus v5.10.2) Emacs/21.3.50 (windows-nt)
Cancel-Lock: sha1:6b3bHGzY2zbO2wXqzTZFBM4cIHA=
Precedence: bulk
Xref: main.gmane.org gmane.emacs.gnus.general:53734
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:53734

Jesper Harder <harder@myrealbox.com> writes:

> Oliver Scholz <alkibiades@gmx.de> writes:
>
>> If you are satisfied with a _fair_ chance to be flawlessly readable
>> at the other end, you may use UTF-8.
>
> But the purpose of email is to _communicate_.  Why lower your chance of
> communicating if there is no compelling technical reason to do so?

First of all: I am not talking about UTF-16 or UTF-7, and I am not
talking about Greek, Hebrew or Arabic. I am talking about UTF-8 for
Latin-based scripts. Even if there is no UTF-8 support at all at the
other end, communication won't fail. As things stand I would not yet
recommend UTF-8 to a Greek user, for example. Now and then I notice
on German Usenet that a few people who post replies to my articles
cannot deal with UTF-8: when they quote the text I wrote, I
see funny characters instead of umlauts. This is not a big impediment
to communication. I doubt that anybody would put me into his or her
killfile because I use UTF-8.
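
To illustrate the "funny characters instead of umlauts" effect, here is
a minimal Python sketch. It assumes the receiving software simply treats
the UTF-8 bytes as Latin-1; the sample sentence and the function names
are made up for illustration.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were Latin-1.

    Decoding as Latin-1 never fails, because every byte value maps to
    some Latin-1 character; the result is merely wrong for non-ASCII.
    """
    return text.encode("utf-8").decode("latin-1")


def ascii_only(text: str) -> str:
    """Keep only the ASCII characters of text."""
    return "".join(ch for ch in text if ord(ch) < 128)


sample = "Käse für später"  # hypothetical German sample
garbled = misread_as_latin1(sample)

# Each umlaut turns into two odd characters; the rest is untouched.
print(garbled)

# The ASCII letters survive the misreading unchanged.
print(ascii_only(garbled) == ascii_only(sample))
```

Each umlaut becomes two stray characters, while every ASCII letter
arrives intact, so a German sentence stays readable.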

And, yes, there is a technical reason that Unicode should become the
default text encoding in the future. The fact that we have a myriad of
different encodings to choose from causes a lot of trouble; just
consider how many questions there are in the various Emacs newsgroups
about coding system issues; and this is just the tip of the
iceberg. Sure, Unicode sometimes causes trouble, too. But at least one
could say that these are problems of transition. If we don't move to
Unicode in the future then coding system problems will go on forever
and ever.

If we stick to 256-character encodings forever, then Latin-9 won't be
the last invention that we will have seen. There may be a need for a
new character in three, five, seven years. Who knows? Latin-10 is
already in its final state. What will save us from Latin-11, Latin-12,
... Latin-N, if not a single unified encoding that is designed to
meet any need now and in the future?

My guess -- by the way -- is that Unicode will become increasingly
important in Europe, especially for the members of the EU. We'd need
at least Latin-1/Latin-9, Latin-2 and Greek (ISO 8859-7). And I am not
sure if that already covers Latvian, Romanian and others. There will
be a growing need for an encoding that covers all of these languages.
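
A rough way to see the coverage problem is to ask which of the charsets
named above can encode a given piece of text. The following Python
sketch does that; the sample words and the helper name are made up for
illustration, and the list of charsets is only the handful mentioned
here, not a complete survey.

```python
# Python codec names for Latin-1, Latin-9, Latin-2 and ISO 8859-7 (Greek).
CHARSETS = ["iso8859-1", "iso8859-15", "iso8859-2", "iso8859-7"]


def covering_charsets(text, charsets=CHARSETS):
    """Return the charsets from the list that can encode all of text."""
    covering = []
    for name in charsets:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this charset
        covering.append(name)
    return covering


samples = {
    "German": "Grüße",
    "Polish": "łódź",
    "Greek": "καλημέρα",
    "mixed": "Grüße, łódź, καλημέρα",
}

for label, text in samples.items():
    print(label, covering_charsets(text))

# UTF-8 can encode the mixed text; encode() would raise otherwise.
samples["mixed"].encode("utf-8")
```

No single charset in the list covers the mixed line, which is the case a
multilingual mailing list would hit.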

Then, if you want to be absolutely sure that everything works as
expected, your only option is ASCII. Maybe Latin-1 is also
o.k. for a Western European. But every encoding that contains a Euro
sign is a big no-no.
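
One reason the Euro sign is awkward can be shown in a few lines of
Python. This is only a sketch of the Latin-1 versus Latin-9 case, under
the assumption that the reader's software falls back to Latin-1; the
variable names are made up for illustration.

```python
import unicodedata

EURO = "\u20ac"  # the Euro sign

# Latin-1 has no Euro sign at all, so encoding fails.
try:
    EURO.encode("latin-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False

# Latin-9 (ISO 8859-15) stores the Euro sign in a single byte ...
latin9_bytes = EURO.encode("iso8859-15")

# ... but the same byte means something else when read as Latin-1.
seen_by_latin1_reader = latin9_bytes.decode("latin-1")

print(latin1_has_euro)
print(latin9_bytes)
print(unicodedata.name(seen_by_latin1_reader))
```

A reader whose software assumes Latin-1 sees the generic currency sign
where the sender typed a Euro sign.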

I really hope for a future (however remote it may be), where I can be
sure that every text file I find on a computer is either ASCII, UTF-8
or UTF-16. When we look back then, we will regard this whole ISO
8859-soup as something as strange and weird as EBCDIC.

>> How long it will take for Unicode to become as widespread in western
>> Europe as Latin-1 is now -- I don't know. But so far it has spread
>> very rapidly.
>
> 1. Application support isn't that great.  Emacs, (La)TeX and Texinfo
>    don't support Unicode fully (those are some of the most important
>    applications as far as I'm concerned).

The Unicode support for Emacs is quite good; there may be issues with
CJK in the current released version of Emacs, but the rest works
fine. But yes, LaTeX and Texinfo (especially Texinfo) need
fixing. Even I, Unicode-Jacobite that I am, use Latin-1 for my LaTeX
stuff. But AFAIK there is some work going on, fortunately. The babel
encoding (sic!) for classical Greek (to take an example that is
important for me) is a nuisance. It is about time for LaTeX to support
Unicode.

> 2. Unicode support itself doesn't really buy me a lot if most people
>    don't have fairly complete Unicode fonts (which they don't).
[...]

So the worst thing that could happen is that they see a hollow box now
and then. And yet some characters are more frequent than others. You
can probably rely on the fact that western Europeans have fonts that
contain the Latin-1 repertoire. Box drawing characters or symbols may
not be that frequent, but there is a good chance to get the additional
punctuation characters.

In the future, when UTF-8 is the default in Mail and News, this
shouldn't be a problem anymore. People who read mailing lists about
classical Greek will make sure that they have a font containing
“Greek Extended”; the regulars of alt.fan.tolkien (whatever) will make
sure that they can display Tengwar, Star Trek fans will use fonts
including Klingon etc. etc.

    Oliver
-- 
29 Thermidor an 211 de la Révolution
Liberté, Egalité, Fraternité!