From: Oliver Scholz
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Gnus: UTF-8 and compatibility with other MUAs
Date: Sat, 16 Aug 2003 17:36:45 +0200
Content-Type: text/plain; charset=utf-8

Jesper Harder writes:

> Oliver Scholz writes:
>
>> If you are satisfied with a _fair_ chance to be flawlessly readable
>> at the other end, you may use UTF-8.
>
> But the purpose of email is to _communicate_.  Why lower your chance of
> communicating if there is no compelling technical reason to do so?

First of all: I am not talking about UTF-16 or UTF-7, and I am not
talking about Greek, Hebrew or Arabic.  I am talking about UTF-8 for
Latin-based scripts.  Even if there is no UTF-8 support at all at the
other end, communication won't fail.  As things stand, I would not yet
recommend UTF-8 to a Greek user, for example.

Now and then I notice on German Usenet that a few people who post
replies to my articles cannot deal with UTF-8: when they quote the
text I wrote, I see funny characters instead of umlauts.  This is not
a big impediment to communication.  I doubt that anybody would put me
into his or her killfile because I use UTF-8.

And, yes, there is a technical reason why Unicode should become the
default text encoding in the future.  The fact that we have a myriad
of different encodings to choose from causes a lot of trouble; just
consider how many questions there are in the various Emacs newsgroups
about coding system issues, and this is just the tip of the iceberg.
Sure, Unicode sometimes causes trouble, too.  But at least one could
say that these are problems of transition.
If we don't move to Unicode in the future, then coding system problems
will go on forever and ever.

If we stick to 256-character encodings forever, then Latin-9 won't be
the last invention that we will have seen.  There may be a need for a
new character in three, five, seven years.  Who knows?  Latin-10 is
already in its final state.  What will save us from Latin-11,
Latin-12 ... Latin-N, if not a single unified encoding that is
designed to match any need now and in the future?

My guess -- by the way -- is that Unicode will become increasingly
important in Europe, especially for the members of the EU.  We'd need
at least Latin-1/Latin-9, Latin-2 and Greek (ISO 8859-7).  And I am
not sure whether that already covers Latvian, Romanian and others.
There will be a growing need for an encoding that covers all of these
languages.

Then, if you want to be absolutely sure that everything works as
expected, your only option is ASCII.  Maybe Latin-1 is also o.k. for
a Western European.  But every encoding that contains a Euro sign is
a big no-no.

I really hope for a future (however remote it may be) where I can be
sure that every text file I find on a computer is either ASCII, UTF-8
or UTF-16.  When we look back then, we will regard this whole
ISO 8859 soup as something as strange and weird as EBCDIC.

>> How long it will take for Unicode to become as widespread in western
>> Europe as Latin-1 is now -- I don't know.  But so far it has spread
>> very rapidly.
>
> 1. Application support isn't that great.  Emacs, (La)TeX and Texinfo
>    don't support Unicode fully (those are some of the most important
>    applications as far as I'm concerned).

The Unicode support in Emacs is quite good; there may be issues with
CJK in the currently released version of Emacs, but the rest works
fine.

But yes, LaTeX and Texinfo (especially Texinfo) need fixing.  Even I,
Unicode-Jacobite that I am, use Latin-1 for my LaTeX stuff.  But AFAIK
there is some work going on, fortunately.  The babel encoding (sic!)
for classical Greek (to take an example that is important for me) is
a nuisance.  It is about time for LaTeX to support Unicode.

> 2. Unicode support itself doesn't really buy me a lot if most people
>    don't have fairly complete Unicode fonts (which they don't).
[...]

So the worst thing that could happen is that they see a hollow box now
and then.

And yet some characters are more commonly covered by fonts than
others.  You can probably rely on the fact that western Europeans have
fonts that contain the Latin-1 repertoire.  Box drawing characters or
symbols may not be covered that often, but there is a good chance that
the additional punctuation characters are available.

In the future, when UTF-8 is the default in Mail and News, this
shouldn't be a problem anymore.  People who read mailing lists about
classical Greek will make sure that they have a font containing
“Greek Extended”; the regulars of alt.fan.tolkien (whatever) will
make sure that they can display Tengwar, Star Trek fans will use fonts
including Klingon, etc. etc.

    Oliver
-- 
29 Thermidor an 211 de la Révolution
Liberté, Egalité, Fraternité!
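
A minimal sketch of the "funny characters instead of umlauts" effect mentioned in the message, assuming the receiving client simply decodes the UTF-8 bytes as Latin-1. The helper name `misread_as_latin1` is hypothetical and not part of Gnus or any other MUA; the sketch only illustrates why Latin-script text stays readable, since the ASCII portion passes through unchanged and only the non-ASCII letters are garbled.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as if they were Latin-1.

    This mimics a reader without UTF-8 support (an assumption for
    illustration, not the behaviour of any specific program).
    """
    return text.encode("utf-8").decode("latin-1")


# ASCII characters have the same single-byte encoding in both charsets,
# so they survive; each umlaut becomes two Latin-1 characters.
print(misread_as_latin1("Grundsätzlich für Umlaute"))
# prints: GrundsÃ¤tzlich fÃ¼r Umlaute
```

For a mostly-ASCII German sentence the result is ugly but still legible, which is the sense in which communication does not fail. A Greek text, where nearly every letter is non-ASCII, would come out almost entirely garbled under the same assumption.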