* Gnus: UTF-8 and compatibility with other MUAs
@ 2003-08-14 15:48 Xavier Maillard
  2003-08-14 22:39 ` Frank Schmitt
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Xavier Maillard @ 2003-08-14 15:48 UTC

Hi,

I know Emacs can handle the utf-8 encoding, and so can Gnus.

My question is more about compatibility with other MUAs. Would you
recommend that users adopt utf-8 as their default encoding? AFAIK, not
many MUAs are aware of it and, worse, almost nobody is using utf-8,
which was presented as the future. So what is the problem with UTF in
general that keeps users from using it by default?

Regards,
zeDek
-- 
alt.mcdonalds
Can I get fries with that?
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Frank Schmitt @ 2003-08-14 22:39 UTC

Xavier Maillard <zedek@gnu-rox.org> writes:

> My question is more about compatibility with other MUAs. Would you
> recommend that users adopt utf-8 as their default encoding? AFAIK, not
> many MUAs are aware of it and, worse, almost nobody is using utf-8,
> which was presented as the future. So what is the problem with UTF in
> general that keeps users from using it by default?

Well, it's the chicken-and-egg problem. People don't use UTF-8 because
quite a few MUAs don't support it, and some MUA authors don't add
support because few people use it.

Nevertheless, I've got the impression that today most MUAs handle
Unicode quite well. I send UTF-8 in both mail and news, and only a few
people have told me they couldn't read my messages properly.

-- 
Did you ever realize how much text fits in eighty columns? If you now
consider that a signature usually consists of up to four lines, this
gives you enough space to spread a tremendous amount of information with
your messages. So seize this opportunity and don't waste your signature
with bullshit nobody will read.
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Xavier Maillard @ 2003-08-15 18:22 UTC

Frank Schmitt <usereplyto@frank-schmitt.net> writes:

> Xavier Maillard <zedek@gnu-rox.org> writes:
>
> > My question is more about compatibility with other MUAs. Would you
> > recommend that users adopt utf-8 as their default encoding? AFAIK,
> > not many MUAs are aware of it and, worse, almost nobody is using
> > utf-8, which was presented as the future. So what is the problem
> > with UTF in general that keeps users from using it by default?
>
> Well, it's the chicken-and-egg problem. People don't use UTF-8 because
> quite a few MUAs don't support it, and some MUA authors don't add
> support because few people use it.

Yep, I have the same impression right now.

> Nevertheless, I've got the impression that today most MUAs handle
> Unicode quite well. I send UTF-8 in both mail and news, and only a few
> people have told me they couldn't read my messages properly.

They stay readable, but it seems to be a pain to read a mail containing
accented characters.

zeDek
-- 
"Die Geteilten selbst sind jedoch nie Feindbild der Vereiner, denn dies
sind immer nur die Teiler."
Norbert Harry Marzahn <70oKlG6LbXB@nm01.vision.IN-BRB.DE>
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Jesper Harder @ 2003-08-14 23:01 UTC

Xavier Maillard <zedek@gnu-rox.org> writes:

> I know Emacs can handle the utf-8 encoding, and so can Gnus.
>
> My question is more about compatibility with other MUAs. Would you
> recommend that users adopt utf-8 as their default encoding?

No, because there's no reason to use UTF-8 if a more widely supported
charset is sufficient.

To use UTF-8 by default would also be against RFC 2046:

,----[ RFC 2046, Section 4.1.2. ]
|
|   In general, composition software should always use the "lowest common
|   denominator" character set possible.  For example, if a body contains
|   only US-ASCII characters, it SHOULD be marked as being in the US-
|   ASCII character set, not ISO-8859-1, which, like all the ISO-8859
|   family of character sets, is a superset of US-ASCII.  More generally,
|   if a widely-used character set is a subset of another character set,
|   and a body contains only characters in the widely-used subset, it
|   should be labelled as being in that subset.  This will increase the
|   chances that the recipient will be able to view the resulting entity
|   correctly.
`----

But if the message contains characters (or a combination of characters)
where a _single_ iso-8859-x charset can't be used, then by all means
use UTF-8. This is far better than sending a multipart message (which
Gnus does if UTF-8 isn't available).
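The "lowest common denominator" rule quoted from RFC 2046 can be read as a search over an ordered list of charsets: take the first one that can represent every character in the body. The following is only a minimal Python sketch of that reading, not a description of how Gnus chooses a charset; the function name `pick_charset` and the candidate order are assumptions made for the example.

```python
# Hypothetical helper: the name and the default candidate order are
# assumptions for illustration, not taken from Gnus or from RFC 2046.
def pick_charset(text, candidates=("us-ascii", "iso-8859-1",
                                   "iso-8859-15", "utf-8")):
    """Return the first charset in `candidates` that can encode `text`."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            # At least one character is outside this charset; try the next.
            continue
        return charset
    raise ValueError("no candidate charset can encode the text")


if __name__ == "__main__":
    for sample in ("plain text", "caf\u00e9", "10 \u20ac", "\u03b1 and \u00e9"):
        print(repr(sample), "->", pick_charset(sample))
```

With this ordering, a body is labelled UTF-8 only when no single earlier charset covers it, which is the case described at the end of the message above.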
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Oliver Scholz @ 2003-08-15 13:50 UTC

Jesper Harder <harder@myrealbox.com> writes:
[...]
> To use UTF-8 by default would also be against RFC 2046:
>
> ,----[ RFC 2046, Section 4.1.2. ]
> |
> |   In general, composition software should always use the "lowest common
> |   denominator" character set possible.  For example, if a body contains
> |   only US-ASCII characters, it SHOULD be marked as being in the US-
> |   ASCII character set, not ISO-8859-1, which, like all the ISO-8859
> |   family of character sets, is a superset of US-ASCII.  More generally,
> |   if a widely-used character set is a subset of another character set,
> |   and a body contains only characters in the widely-used subset, it
> |   should be labelled as being in that subset.  This will increase the
> |   chances that the recipient will be able to view the resulting entity
> |   correctly.
> `----
[...]

That's not how I read the section you quoted. In my reading this means
that you should not declare the message to be in UTF-8, when it
contains only ASCII characters. For characters from the right hand
part of ISO 8859-1 this is not so simple: Latin-1 (as a coded
character set) may be a subset of UCS. But Latin-1 (as a character
encoding scheme) is _not_ a subset of UTF-8.

The lowest common denominator for most German text is ISO 646-DE. For
most Danish text (I presume) ISO 646-DK. Virtually nobody uses those
coding systems anymore, and IMNSHO nobody should use them. (I have
implemented ISO 646-DE for GNU Emacs in a way that it could be easily
extended to other national variants of ISO 646, in case you are
interested ...)

Sure, one could say that the national variants of ISO 646 are excluded
by the phrase “widely-used character sets”, but that is a bit too
fuzzy for my taste.

Taken literally nobody should use ISO 8859-15 then, unless the message
really contains an € (or one of the other 7 characters). Maybe this is
what this section wants to say, but then I dare say that it doesn't
make much sense as a technical rule and I am glad that it is not
stated in a way that makes it mandatory.

    Oliver
-- 
28 Thermidor an 211 de la Révolution
Liberté, Egalité, Fraternité!
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Jesper Harder @ 2003-08-15 16:48 UTC

Oliver Scholz <alkibiades@gmx.de> writes:

> The lowest common denominator for most German text is ISO
> 646-DE. For most Danish text (I presume) ISO 646-DK. Virtually
> nobody uses those coding systems anymore, and IMNSHO nobody should
> use them.

The RFC does say that ISO-8859 is preferred over ISO 646:

   Note that the ISO 646 character sets have deliberately been omitted
   in favor of their 8859 replacements, which are the designated
   character sets for Internet mail.

> Taken literally nobody should use ISO 8859-15 then, unless the
> message really contains an € (or one of the other 7
> characters).

I agree with that. I don't see _any_ reason to use latin-9 if you
don't need it. Some MUAs don't support latin-9 (including older
versions of Gnus) -- why break those clients for no good reason?
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Oliver Scholz @ 2003-08-15 18:10 UTC

Jesper Harder <harder@myrealbox.com> writes:

> Oliver Scholz <alkibiades@gmx.de> writes:
>
>> The lowest common denominator for most German text is ISO
>> 646-DE. For most Danish text (I presume) ISO 646-DK. Virtually
>> nobody uses those coding systems anymore, and IMNSHO nobody should
>> use them.
>
> The RFC does say that ISO-8859 is preferred over ISO 646:
>
>    Note that the ISO 646 character sets have deliberately been omitted
>    in favor of their 8859 replacements, which are the designated
>    character sets for Internet mail.
>

Hmm. I guess it's time for me to finally read RFC 2046 ...

>> Taken literally nobody should use ISO 8859-15 then, unless the
>> message really contains an € (or one of the other 7
>> characters).
>
> I agree with that. I don't see _any_ reason to use latin-9 if you
> don't need it. Some MUAs don't support latin-9 (including older
> versions of Gnus) -- why break those clients for no good reason?

Well, I think, if you want to maximize the chance that your message is
flawlessly readable at the other end, this makes sense as a pragmatic
rule. As a technical rule, however, which is important for the
question whether a message is fully RFC compliant or not, it does not
make sense.

BTW, if the rule were that we should use the smallest, most widely
used coded character set which covers all the necessary characters in
a message, then western European users should use neither Latin-1 nor
Latin-9, but windows-1252.

However, from the section you quoted alone it is not entirely clear
whether it refers to abstract characters, code points in a coded
character set or octets in a character encoding scheme.

The term “character set” may seem to indicate that they are talking
about coded character sets, but RFC 2046 refers to RFC 2045 for the
definition of the term “character set”. There it reads:

   NOTE: The term "character set" was originally to describe such
   straightforward schemes as US-ASCII and ISO-8859-1 which have a
   simple one-to-one mapping from single octets to single characters.
   Multi-octet coded character sets and switching techniques make the
   situation more complex.  For example, some communities use the term
   "character encoding" for what MIME calls a "character set", while
   using the phrase "coded character set" to denote an abstract mapping
   from integers (not octets) to characters.

So I'd say “character set” refers to the character encoding scheme.
And in this sense the rule makes sense: if a message contains only
characters from the ASCII repertoire it should be declared as
US-ASCII, not as UTF-8. But that does not extend to ISO
8859-[[:digit:]]+, since UTF-8 and Latin-1 are not compatible.

    Oliver
-- 
28 Thermidor an 211 de la Révolution
Liberté, Egalité, Fraternité!
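The distinction drawn above, between Latin-1 as a coded character set (code points) and Latin-1 as a character encoding scheme (octets), can be checked directly. The following is a small Python sketch; the helper name `same_octets` is made up for the example. It shows that ASCII text produces identical octets in US-ASCII and UTF-8, while a Latin-1 character keeps its code point in UCS but is serialized differently.

```python
# Hypothetical helper for illustration only.
def same_octets(text, charset_a, charset_b):
    """True if `text` encodes to the same octet sequence in both charsets."""
    return text.encode(charset_a) == text.encode(charset_b)


if __name__ == "__main__":
    # ASCII repertoire: US-ASCII octets are also valid UTF-8 octets.
    print(same_octets("plain text", "us-ascii", "utf-8"))

    # U+00FC (u with diaeresis): same code point in Latin-1 and UCS ...
    print(hex(ord("\u00fc")))
    # ... but one octet in ISO 8859-1 and two octets in UTF-8.
    print("\u00fc".encode("iso-8859-1"), "\u00fc".encode("utf-8"))

    # The Latin-1 octet on its own is not even well-formed UTF-8.
    try:
        b"\xfc".decode("utf-8")
    except UnicodeDecodeError as err:
        print("not valid UTF-8:", err.reason)
```

So the subset relation that holds for US-ASCII at the octet level holds for Latin-1 only at the level of code points.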
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Jesper Harder @ 2003-08-16 0:23 UTC

Oliver Scholz <alkibiades@gmx.de> writes:

> Well, I think, if you want to maximize the chance that your message
> is flawlessly readable at the other end

That _is_ the raison d'être for MIME after all.

> As a technical rule, however, which is important for the question
> whether a message is fully RFC compliant or not, it does not make
> sense.

To be fair, the RFC does recognize that they weren't able to specify
exact rules at the time:

   The character sets specified above are the ones that were relatively
   uncontroversial during the drafting of MIME.  This document does not
   endorse the use of any particular character set other than US-ASCII,
   and recognizes that the future evolution of world character sets
   remains unclear.

> BTW, if the rule were that we should use the smallest, most widely
> used coded character set which covers all the necessary characters in
> a message, then western European users should use neither Latin-1 nor
> Latin-9, but windows-1252.

No, because Windows-1252 isn't a standard, i.e. endorsed by IETF, ISO
or another reputable standards body. (IANA registration doesn't make
it a standard -- anyone can in principle register any old homebrewed
charset with IANA).
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Oliver Scholz @ 2003-08-16 9:48 UTC

Jesper Harder <harder@myrealbox.com> writes:

> Oliver Scholz <alkibiades@gmx.de> writes:
>
>> Well, I think, if you want to maximize the chance that your message
>> is flawlessly readable at the other end
>
> That _is_ the raison d'être for MIME after all.

Yes, but I think we'd both agree that the chance is rather small, if I
started to use the MIME compliant coding system ISO 2022 in western
Europe.

IMO Unicode offers the chance to escape the current tower-of-babel
situation as far as character encodings are concerned. I'd like to
compare the current state of affairs with western Europe in
pre-Latin-1 time.

I'd like to put it this way:

If you are satisfied with a _fair_ chance to be flawlessly readable at
the other end, you may use UTF-8.

If you want to _maximize_ the chance that you are flawlessly readable
at the other end, but don't want to sacrifice important national
characters, you should follow the rules which Simon pointed out.

If you want to be _sure_ that you are flawlessly readable at the other
end, you should use US-ASCII. In Germany there are at least two
conventions to express umlauts in plain ASCII. I'd guess that similar
conventions exist for other languages.

How long it will take for Unicode to become as widespread in western
Europe as Latin-1 is now -- I don't know. But so far it has spread
very rapidly.

[...]
>> BTW, if the rule were that we should use the smallest, most widely
>> used coded character set which covers all the necessary characters in
>> a message, then western European users should use neither Latin-1 nor
>> Latin-9, but windows-1252.
>
> No, because Windows-1252 isn't a standard, i.e. endorsed by IETF, ISO
> or another reputable standards body.
> (IANA registration doesn't make
> it a standard -- anyone can in principle register any old homebrewed
> charset with IANA).

[Aside: Hmm, maybe it could be funny to register emacs-mule ...]

I also prefer standards developed by official standards bodies,
especially such like ISO, CEN (Europe) and DIN (Germany), because they
are at least indirectly under democratic control. However, there are
also things that are de facto standards, because of their widespread
use. Windows-1252, Postscript and the English language, for example.

I am pretty sure that there are more people around whose MUAs/NUAs can
deal with windows-1252 than with Latin-9.

    Oliver
-- 
29 Thermidor an 211 de la Révolution
Liberté, Egalité, Fraternité!
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Jesper Harder @ 2003-08-16 13:01 UTC

Oliver Scholz <alkibiades@gmx.de> writes:

> If you are satisfied with a _fair_ chance to be flawlessly readable
> at the other end, you may use UTF-8.

But the purpose of email is to _communicate_. Why lower your chance of
communicating if there is no compelling technical reason to do so?

> How long it will take for Unicode to become as widespread in western
> Europe as Latin-1 is now -- I don't know. But so far it has spread
> very rapidly.

1. Application support isn't that great. Emacs, (La)TeX and Texinfo
   don't support Unicode fully (those are some of the most important
   applications as far as I'm concerned).

2. Unicode support itself doesn't really buy me a lot if most people
   don't have fairly complete Unicode fonts (which they don't).
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Oliver Scholz @ 2003-08-16 15:36 UTC

Jesper Harder <harder@myrealbox.com> writes:

> Oliver Scholz <alkibiades@gmx.de> writes:
>
>> If you are satisfied with a _fair_ chance to be flawlessly readable
>> at the other end, you may use UTF-8.
>
> But the purpose of email is to _communicate_. Why lower your chance of
> communicating if there is no compelling technical reason to do so?

First of all: I am not talking about UTF-16 or UTF-7, and I am not
talking about Greek, Hebrew or Arabic. I am talking about UTF-8 for
Latin-based scripts. Even if there is no UTF-8 support at all at the
other end, communication won't fail. As things stand I would not yet
recommend UTF-8 to a Greek user, for example.

Now and then I notice on German Usenet that a few people who post
replies to my articles cannot deal with UTF-8, because when they quote
the text I wrote, I see funny characters instead of umlauts. This is
not a big impediment to communication. I doubt that anybody would put
me into his or her killfile because I use UTF-8.

And, yes, there is a technical reason that Unicode should become the
default text encoding in the future. The fact that we have a myriad of
different encodings to choose from causes a lot of trouble; just
consider how many questions there are in the various Emacs newsgroups
about coding system issues; and this is just the tip of the iceberg.
Sure, Unicode sometimes causes trouble, too. But at least one could
say that these are problems of transition. If we don't move to Unicode
in the future then coding system problems will go on forever and ever.

If we stick to 256-character encodings forever, then Latin-9 won't be
the last invention that we will have seen.
There may be a need for a new character in three, five, seven years.
Who knows? Latin-10 is already in final state. What should save us
from Latin-11, Latin-12 .... Latin-N, if not a single unified encoding
that is designed to match any need now and in the future?

My guess -- by the way -- is that Unicode will become increasingly
important in Europe, especially for the members of the EU. We'd need
at least Latin-1/Latin-9, Latin-2 and Greek (ISO 8859-7). And I am not
sure if that already covers Latvian, Romanian and others. There will
be a growing need for an encoding that covers all of these languages.

Then, if you want to be absolutely sure that everything works as
expected, your only option is ASCII. Maybe Latin-1 is also o.k. for a
Western European. But every encoding that contains a Euro sign is a
big no-no.

I really hope for a future (however remote it may be), where I can be
sure that every text file I find on a computer is either ASCII, UTF-8
or UTF-16. When we look back then, we will regard this whole ISO
8859-soup as something as strange and weird as EBCDIC.

>> How long it will take for Unicode to become as widespread in western
>> Europe as Latin-1 is now -- I don't know. But so far it has spread
>> very rapidly.
>
> 1. Application support isn't that great. Emacs, (La)TeX and Texinfo
>    don't support Unicode fully (those are some of the most important
>    applications as far as I'm concerned).

The Unicode support for Emacs is quite good; there may be issues with
CJK in the current released version of Emacs, but the rest works fine.

But yes, LaTeX and Texinfo (especially Texinfo) need fixing. Even I,
Unicode-Jacobite that I am, use Latin-1 for my LaTeX stuff. But AFAIK
there is some work going on, fortunately. The babel encoding (sic!)
for classical Greek (to take an example that is important for me) is a
nuisance. It is about time for LaTeX to support Unicode.

> 2. Unicode support itself doesn't really buy me a lot if most people
>    don't have fairly complete Unicode fonts (which they don't).
[...]

So the worst thing that could happen is that they see a hollow box now
and then. And yet some characters are more frequent than others. You
can probably rely on the fact that western Europeans have fonts that
contain the Latin-1 repertoire. Box drawing characters or symbols may
not be that frequent, but there is a good chance to get the additional
punctuation characters.

In the future, when UTF-8 will be the default in Mail and News, this
shouldn't be a problem anymore. People who read mailing lists about
classical Greek, will make sure that they have a font containing
“Greek Extended”; the regulars of alt.fan.tolkien (whatever) will make
sure that they can display Tengwar, Star Trek fans will use fonts
including Klingon etc. etc.

    Oliver
-- 
29 Thermidor an 211 de la Révolution
Liberté, Egalité, Fraternité!
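The "funny characters instead of umlauts" effect described in this message is what results when UTF-8 octets are interpreted as Latin-1. A minimal Python sketch, assuming the receiving client simply treats every octet as ISO 8859-1 (the function name is made up for the example):

```python
# Hypothetical helper: simulates a reader with no UTF-8 support that
# interprets every incoming octet as ISO 8859-1.
def as_seen_by_latin1_reader(text):
    """Encode `text` as UTF-8, then misread the octets as Latin-1."""
    return text.encode("utf-8").decode("iso-8859-1")


if __name__ == "__main__":
    # The ASCII letters survive; only the umlaut turns into two characters.
    print(as_seen_by_latin1_reader("f\u00fcr"))
    print(as_seen_by_latin1_reader("plain ASCII is unaffected"))
```

For text in a Latin-based script most of the octets are ASCII, so the message stays legible even though each accented letter is replaced by a two-character sequence. For Greek, where nearly every letter is outside ASCII, the same misreading would leave little that is readable.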
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Reiner Steib @ 2003-08-16 17:14 UTC

On Sat, Aug 16 2003, Oliver Scholz wrote:

> The Unicode support for Emacs is quite good; there may be issues
> with CJK in the current released version of Emacs, but the rest
> works fine.

Not only in the released versions, see this thread on emacs-devel:
<URL:http://article.gmane.org/gmane.emacs.devel/13487>.

> But yes, LaTeX and Texinfo (especially Texinfo) need fixing.

Texinfo until recently (I didn't find time to check 4.5 and 4.6)
didn't even support Latin-1 @documentencoding. See
<URL:http://search.gmane.org/search.php?query=documentencoding&group=gmane.comp.tex.texinfo.general>

> Even I, Unicode-Jacobite that I am, use Latin-1 for my LaTeX
> stuff. But AFAIK there is some work going on, fortunately. [...] It
> is about time for LaTeX to support Unicode.

Maybe interesting for you (I didn't test it. BTW: Did you ever try
Omega?):

,----
| From: Frank Mittelbach <frank.mittelbach@latex-project.org>
| Newsgroups: de.comp.text.tex
| Subject: ankuendigung: utf8 support fuer inputenc
| Date: Mon, 26 May 2003 21:06:17 +0200
| Message-ID: <batoro$5cj$2@online.de>
`----

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  PGP key available via WWW   http://rsteib.home.pages.de/
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Oliver Scholz @ 2003-08-16 19:29 UTC

Reiner Steib <4.uce.03.r.s@nurfuerspam.de> writes:
[...]
> Maybe interesting for you (I didn't test it. BTW: Did you ever try
> Omega?):
>
> ,----
> | From: Frank Mittelbach <frank.mittelbach@latex-project.org>
> | Newsgroups: de.comp.text.tex
> | Subject: ankuendigung: utf8 support fuer inputenc
> | Date: Mon, 26 May 2003 21:06:17 +0200
> | Message-ID: <batoro$5cj$2@online.de>
> `----
[...]

Thanks. It is good to know that this is going to be part of the main
distribution. I was not yet able to make it work, but I will try again
as soon as I have my GNU/Linux up and running again. I only hope that
it lets me write Greek text without a special switching command, which
was required by the previous effort from Dominique Unruh.

No, I have not yet tried Omega. I didn't understand the documentation,
or I didn't find the right documentation. Could anybody point me to an
introduction to Omega for LaTeX-dummies?

    Oliver
-- 
29 Thermidor an 211 de la Révolution
Liberté, Egalité, Fraternité!
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Miles Bader @ 2003-08-19 14:54 UTC

Reiner Steib <4.uce.03.r.s@nurfuerspam.de> writes:
> > The Unicode support for Emacs is quite good; there may be issues
> > with CJK in the current released version of Emacs, but the rest
> > works fine.
>
> Not only in the released versions, see this thread on emacs-devel:
> <URL:http://article.gmane.org/gmane.emacs.devel/13487>.

Did you try turning on `utf-translate-cjk-mode' (in CVS emacs)?

It enables UTF-8 CJK support.

-Miles
-- 
We live, as we dream -- alone....
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Reiner Steib @ 2003-08-20 15:24 UTC

On Tue, Aug 19 2003, Miles Bader wrote:

> Reiner Steib <4.uce.03.r.s@nurfuerspam.de> writes:
>> Not only in the released versions, see this thread on emacs-devel:
>> <URL:http://article.gmane.org/gmane.emacs.devel/13487>.
>
> Did you try turning on `utf-translate-cjk-mode' (in CVS emacs)?

No, since I don't need CJK myself (and usually use Emacs 21.3).

> It enables UTF-8 CJK support.

But UTF support in CVS (HEAD) is not complete yet (as described in the
above-mentioned thread), is it?

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  PGP key available via WWW   http://rsteib.home.pages.de/
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Miles Bader @ 2003-08-21 0:20 UTC

Reiner Steib <4.uce.03.r.s@nurfuerspam.de> writes:
> > It enables UTF-8 CJK support.
>
> But UTF support in CVS (HEAD) is not complete yet (as described in the
> above-mentioned thread), is it?

I don't know what you mean by `complete'*, but as far as I know the
above-mentioned CJK support was the main big omission. It's not turned
on by default because it loads some big lisp files to do the mappings.

* I suppose there will always be small differences, until the real
  emacs unicode branch becomes official (which should be reasonably
  soon, I think).

-miles
-- 
We have met the enemy... and he is us. -- Pogo
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Simon Josefsson @ 2003-08-16 17:23 UTC
Cc: ding

Oliver Scholz <alkibiades@gmx.de> writes:

> In the future, when UTF-8 will be the default in Mail and News, this
> shouldn't be a problem anymore. People who read mailing lists about
> classical Greek, will make sure that they have a font containing
> “Greek Extended”; the regulars of alt.fan.tolkien (whatever) will make
> sure that they can display Tengwar, Star Trek fans will use fonts
> including Klingon etc. etc.

Wasn't the Klingon proposal for Unicode rejected? Tengwar has been a
proposal for ten years, or so, and nothing has happened, as far as I
know.

> I really hope for a future (however remote it may be), where I can be
> sure that every text file I find on a computer is either ASCII, UTF-8
> or UTF-16.

UTF-16? It's not even a well-defined encoding scheme: two files may
contain the exact same Unicode code points, but may differ in a binary
comparison, due to byte ordering.

And concatenating two UTF-16 strings from different sources requires
knowledge about the encoding. And surrogate pairs complicate matters
as well.

> When we look back then, we will regard this whole ISO 8859-soup
> as something as strange and weird as EBCDIC.

I wish I could be that optimistic.
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Oliver Scholz @ 2003-08-16 19:18 UTC

Simon Josefsson <jas@extundo.com> writes:

> Oliver Scholz <alkibiades@gmx.de> writes:

[Klingon and Tengwar in Unicode]
> Wasn't the Klingon proposal for Unicode rejected? Tengwar has been a
> proposal for ten years, or so, and nothing has happened, as far as I
> know.

I have no idea. I was just looking for exotic examples and these two
were the second and third ones that came to my mind.

[...]
> UTF-16? It's not even a well-defined encoding scheme: two files may
> contain the exact same Unicode code points, but may differ in a binary
> comparison, due to byte ordering.

That's what the byte order mark is for.

> And concatenating two UTF-16 strings from different sources requires
> knowledge about the encoding. And surrogate pairs complicate matters
> as well.

Why do you think that surrogate pairs complicate matters? There can't
be any confusion whether an arbitrary 16 bit value is part of a
surrogate pair or not; and if it is, whether it is the higher
surrogate or the lower one.

As for concatenating I'd say this depends on whether the tools are
able to deal with it.

But I do have to admit that I have zero experience with UTF-16. I
don't know how good it is in daily use. I use only UTF-8. I mentioned
UTF-16 only because I am told that it is important in some areas
(Java, MS Windows, XML ...).

    Oliver
-- 
29 Thermidor an 211 de la Révolution
Liberté, Egalité, Fraternité!
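The point about surrogate pairs being unambiguous rests on the fact that high and low surrogates occupy two disjoint ranges of 16-bit values that are used for nothing else. A short Python sketch of that classification and of the pair arithmetic follows; the function names are made up for the example, and the musical G clef U+1D11E is just one arbitrary code point outside the BMP.

```python
# Hypothetical helpers for illustration; the ranges are those reserved
# for UTF-16 surrogates.
def classify_code_unit(unit):
    """Classify one 16-bit UTF-16 code unit by value alone."""
    if 0xD800 <= unit <= 0xDBFF:
        return "high surrogate"
    if 0xDC00 <= unit <= 0xDFFF:
        return "low surrogate"
    return "BMP character"


def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1D11E)
    print(hex(high), classify_code_unit(high))
    print(hex(low), classify_code_unit(low))
    # Cross-check against Python's own UTF-16 encoder.
    print("\U0001D11E".encode("utf-16-be").hex())
```

Classification needs no context, which supports the argument above; what the sketch does not settle is the separate complaint that code that assumes one 16-bit unit per character breaks on such pairs.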
* Re: Gnus: UTF-8 and compatibility with other MUAs
From: Simon Josefsson @ 2003-08-16 22:24 UTC
Cc: ding

Oliver Scholz <alkibiades@gmx.de> writes:

> [...]
>> UTF-16? It's not even a well-defined encoding scheme: two files may
>> contain the exact same Unicode code points, but may differ in a binary
>> comparison, due to byte ordering.
>
> That's what the byte order mark is for.

But it doesn't solve the problem. 'cmp' still says the files are
different. UTF-8 had a similar problem (overlong encodings) but that
has been fixed, UTF-16 and UTF-32 can't be.

>> And concatenating two UTF-16 strings from different sources requires
>> knowledge about the encoding. And surrogate pairs complicate matters
>> as well.
>
> Why do you think that surrogate pairs complicate matters? There can't
> be any confusion whether an arbitrary 16 bit value is part of a
> surrogate pair or not; and if it is, whether it is the higher
> surrogate or the lower one.

One way to realize it is to compare UTF-16 with either UTF-8 or
UTF-32. The surrogate pair construction makes UTF-16 combine the
disadvantages of both UTF-8 and UTF-32 with none of their advantages.
The disadvantage with UTF-8 is that you don't know where a code value
ends within the encoded data without knowledge of UTF-8, and the
disadvantage with UTF-32 is that it wastes space since most data fit
in 16 bits or less. If normal computers were 16-bit, I could
understand the trade-off, but with 32-bit (or more) machines you can
remove one of the disadvantages by choosing either UTF-8 or UTF-32
instead of UTF-16.

> As for concatenating I'd say this depends on whether the tools are
> able to deal with it.
Right, and many tools assume that if you receive two binary blobs A and B which are said to contain text, you can form the concatenation of the text by concatenating the binary blobs as A||B. This is a reasonable assumption, and it works for most encoding schemes, including UTF-8. It doesn't work for UTF-16 or UTF-32. My preference is to use UTF-8 when data is stored or transfered, and only use UTF-32 internally because applications may need to compare data against Unicode code points. If I must use Unicode at all, that is. ^ permalink raw reply [flat|nested] 37+ messages in thread
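Simon's three points (identical text with different bytes, A||B failing for UTF-16, and overlong UTF-8 being outlawed) can be reproduced with a few lines of Python. This is a sketch using only the standard codecs; the variable names are illustrative and the sample characters are arbitrary.

```python
import codecs

text = "\u00e5"                      # LATIN SMALL LETTER A WITH RING ABOVE

le = text.encode("utf-16-le")        # little-endian, no byte order mark
be = text.encode("utf-16-be")        # big-endian, no byte order mark
bom_le = codecs.BOM_UTF16_LE + le
bom_be = codecs.BOM_UTF16_BE + be

# Same code points, different bytes: this is the difference 'cmp' reports.
print(le != be, bom_le != bom_be)

# With a byte order mark, each blob on its own decodes to the same text.
print(bom_le.decode("utf-16") == bom_be.decode("utf-16") == text)

# Blind concatenation A||B: the decoder trusts the first mark and then
# misreads the second blob, so the result is not the text repeated twice.
mixed = (bom_le + bom_be).decode("utf-16")
print(mixed == text + text)

# UTF-8 has only one byte order, so A||B simply works.
a = "\u00e5".encode("utf-8")
b = "\u20ac".encode("utf-8")
joined = (a + b).decode("utf-8")
print(joined == "\u00e5\u20ac")

# An overlong form (0xC0 0xAF for "/") is rejected by a strict decoder.
try:
    b"\xc0\xaf".decode("utf-8")
    overlong_rejected = False
except UnicodeDecodeError:
    overlong_rejected = True
print(overlong_rejected)
```

A tool that knows it is handling UTF-16 can of course strip the second mark and swap bytes before joining; the point is only that a generic byte-level concatenation cannot.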
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-16 22:24 ` Simon Josefsson @ 2003-08-17 12:30 ` Benjamin Riefenstahl 2003-08-17 16:40 ` Oliver Scholz 2003-08-18 2:16 ` James H. Cloos Jr. 1 sibling, 1 reply; 37+ messages in thread From: Benjamin Riefenstahl @ 2003-08-17 12:30 UTC (permalink / raw) Hi Simon, Just two additional thoughts, I agree with most of what you said otherwise. Simon Josefsson <jas@extundo.com> writes: > But it doesn't solve the problem. 'cmp' still says the files are > different. UTF-8 had a similar problem (overlong encodings) but > that has been fixed, UTF-16 and UTF-32 can't be. Actually UTF-8 still has that problem with composed vs. decomposed characters. There is no perfect system AFAIK. > If normal computers were 16 bit, I could understand the trade-off, Depends on what you call "normal computers." MS Windows and Apple's Mac OS X both use UTF-16 for APIs and internal implementation. benny ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-17 12:30 ` Benjamin Riefenstahl @ 2003-08-17 16:40 ` Oliver Scholz 2003-08-18 2:20 ` James H. Cloos Jr. 2003-08-18 15:58 ` Benjamin Riefenstahl 0 siblings, 2 replies; 37+ messages in thread From: Oliver Scholz @ 2003-08-17 16:40 UTC (permalink / raw) Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes: [...] > Simon Josefsson <jas@extundo.com> writes: >> But it doesn't solve the problem. 'cmp' still says the files are >> different. UTF-8 had a similar problem (overlong encodings) but >> that has been fixed, UTF-16 and UTF-32 can't be. > > Actually UTF-8 still has that problem with composed vs. decomposed > characters. There is no perfect system AFAIK. Just to be sure that I understand you correctly: Do you refer to the fact here that a character like, say, U+00E9 (LATIN SMALL LETTER E WITH ACUTE) is equivalent to U+0065 followed by U+0301 (LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT)? >> If normal computers were 16 bit, I could understand the trade-off, > > Depends on what you call "normal computers." MS Windows and Apple's > Mac OS X both use UTF-16 for APIs and internal implementation. [...] I am not sure, but I think that the characters that need to be accessed via surrogate pairs are meant to be rare, since they are outside of the BMP. So AFAIK UTF-16 is meant as a space-efficient format for East Asian text. But as I said: this is outside the scope of things that I normally have to deal with. Oliver -- 30 Thermidor an 211 de la Révolution Liberté, Egalité, Fraternité! ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-17 16:40 ` Oliver Scholz @ 2003-08-18 2:20 ` James H. Cloos Jr. 2003-08-18 15:58 ` Benjamin Riefenstahl 1 sibling, 0 replies; 37+ messages in thread From: James H. Cloos Jr. @ 2003-08-18 2:20 UTC (permalink / raw) >>>>> "os" == Oliver Scholz <alkibiades@gmx.de> writes: os> So AFAIK UTF-16 is meant as a space-efficient os> format for East Asian text. Actually UTF-16 is meant to be backwards compatible with the earlier adopters of Unicode -- back when it was a 16 bit standard -- who were using UCS-2. That it also tends to use fewer bits for most of the CJK characters is a later issue, AIUI. -JimC ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-17 16:40 ` Oliver Scholz 2003-08-18 2:20 ` James H. Cloos Jr. @ 2003-08-18 15:58 ` Benjamin Riefenstahl 1 sibling, 0 replies; 37+ messages in thread From: Benjamin Riefenstahl @ 2003-08-18 15:58 UTC (permalink / raw) Hi Oliver, >> Simon Josefsson <jas@extundo.com> writes: >>> 'cmp' still says the files are different. > Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes: >> Actually UTF-8 still has that problem with composed vs. decomposed >> characters. There is no perfect system AFAIK. Oliver Scholz <alkibiades@gmx.de> writes: > Do you refer to the fact here that a character like, say, U+00E9 > (LATIN SMALL LETTER E WITH ACUTE) is equivalent to U+0065 followed > by U+0301 (LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT)? Yes. > So AFAIK UTF-16 is meant as a space-efficient format for East Asian > text. That and compatibility. The first Unicode versions talked much about the 16-bit representation and the most wide-spread users (Windows NT, COM, VFAT, HFS+) implemented it like that. benny ^ permalink raw reply [flat|nested] 37+ messages in thread
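The composed vs. decomposed case that Benjamin confirms here is easy to reproduce. The sketch below uses Python's standard `unicodedata` module; NFC normalization is brought in as one possible remedy and is not something the thread itself proposes.

```python
import unicodedata

composed = "\u00e9"       # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# Both render as the same letter, yet the UTF-8 bytes differ,
# so a binary comparison such as 'cmp' sees two different files.
print(composed_bytes, decomposed_bytes)
print(composed_bytes == decomposed_bytes)

# Normalizing both sides to the same form (here NFC) makes them compare equal.
same_after_nfc = (unicodedata.normalize("NFC", composed)
                  == unicodedata.normalize("NFC", decomposed))
print(same_after_nfc)
```

Normalization only helps when both parties agree to apply it; nothing in the encoding forces them to.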
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-16 22:24 ` Simon Josefsson 2003-08-17 12:30 ` Benjamin Riefenstahl @ 2003-08-18 2:16 ` James H. Cloos Jr. 1 sibling, 0 replies; 37+ messages in thread From: James H. Cloos Jr. @ 2003-08-18 2:16 UTC (permalink / raw) >>>>> "Simon" == Simon Josefsson <jas@extundo.com> writes: Simon> The disadvantage with UTF-8 is that you don't know where a code Simon> value ends within the encoded data without knowledge of UTF-8, [ed's note: this should be taken as an extension of Simon's point, not a counter-argument. It seemed ambiguous w/o a disclaimer.... -JimC] That isn't really a disadvantage, since you need knowledge of Unicode itself anyway: not every unit fits in a single code point. Combining characters, variation selectors, et al all mean that even with UTF-32 there is no guarantee that you can split at any given int32, hence the fact that UTF-8 cannot be split at any given int8 is irrelevant. -JimC ^ permalink raw reply [flat|nested] 37+ messages in thread
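JimC's argument has two halves that a short Python sketch can show side by side: code point boundaries in UTF-8 are easy to find (every byte that is not of the form 10xxxxxx starts one), but a cut at such a boundary can still separate a base letter from its combining mark, and the same happens in UTF-32. The helper name `codepoint_starts` and the sample string are made up for this illustration.

```python
def codepoint_starts(data):
    """Return the offsets at which a UTF-8 code point begins.

    Continuation bytes have the bit pattern 10xxxxxx; every other byte
    starts a new code point.
    """
    return [i for i, byte in enumerate(data) if byte & 0xC0 != 0x80]

text = "e\u0301\u20ac"            # e + COMBINING ACUTE ACCENT, then EURO SIGN
utf8 = text.encode("utf-8")
starts = codepoint_starts(utf8)
print(starts)

# Offset 1 is a valid code point boundary, so both halves decode cleanly,
# but the accent has been torn away from the letter it belongs to.
left, right = utf8[:1].decode("utf-8"), utf8[1:].decode("utf-8")
print(repr(left), repr(right))

# UTF-32 has the same problem: cutting after the first 4-byte unit is
# always "safe" at the encoding level and still splits the accented letter.
utf32 = text.encode("utf-32-be")
left32 = utf32[:4].decode("utf-32-be")
print(repr(left32))
```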
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-16 17:23 ` Simon Josefsson 2003-08-16 19:18 ` Oliver Scholz @ 2003-08-18 2:09 ` James H. Cloos Jr. 2003-08-28 13:38 ` Jens Müller 2003-08-28 13:35 ` Jens Müller 2 siblings, 1 reply; 37+ messages in thread From: James H. Cloos Jr. @ 2003-08-18 2:09 UTC (permalink / raw) >>>>> "Simon" == Simon Josefsson <jas@extundo.com> writes: Simon> Wasn't the Klingon proposal for Unicode rejected? Tengwar has Simon> been a proposal for ten years, or so, and nothing has happened, Simon> as far as I know. Klingon was rejected because it is a made-up script/language. Tengwar is unlikely to be accepted for the same reason. That said, Tengwar has been singled out by some as a great script to use as the basis for a document describing how to properly support complex scripts. (It is probably about as complex to render as Arabic, Urdu, etc. At least based on recent comments on the Unicode list.) -JimC ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-18 2:09 ` James H. Cloos Jr. @ 2003-08-28 13:38 ` Jens Müller 0 siblings, 0 replies; 37+ messages in thread From: Jens Müller @ 2003-08-28 13:38 UTC (permalink / raw) "James H. Cloos Jr." <cloos@jhcloos.com> writes: > Klingon was rejected because it is a made-up script/language. > Tengwar is unlikely to be accepted for the same reason. And why does the roadmap then talk about scripts for artificial languages? No, that was probably not the reason. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-16 17:23 ` Simon Josefsson 2003-08-16 19:18 ` Oliver Scholz 2003-08-18 2:09 ` James H. Cloos Jr. @ 2003-08-28 13:35 ` Jens Müller 2 siblings, 0 replies; 37+ messages in thread From: Jens Müller @ 2003-08-28 13:35 UTC (permalink / raw) Simon Josefsson <jas@extundo.com> writes: > Wasn't the Klingon proposal for Unicode rejected? Yepp. Not suitable for encoding. The current Klingon characters are just other presentation forms for letters from the Latin alphabet. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-16 15:36 ` Oliver Scholz 2003-08-16 17:14 ` Reiner Steib 2003-08-16 17:23 ` Simon Josefsson @ 2003-08-17 0:57 ` Jesper Harder 2003-08-17 17:24 ` Oliver Scholz 2 siblings, 1 reply; 37+ messages in thread From: Jesper Harder @ 2003-08-17 0:57 UTC (permalink / raw) Oliver Scholz <alkibiades@gmx.de> writes: > Jesper Harder <harder@myrealbox.com> writes: > >> But the purpose of email is to _communicate_. Why lower your chance >> of communicating if there is no compelling technical reason to do >> so? > > Now and then I realize in German Usenet, that a few people who post > replies to my articles can not deal with UTF-8, because when they > quote the text I wrote, I see funny characters instead of umlauts. > This is not a big impediment to communication. It is a big impediment, believe me. A long time ago I used to read Usenet by TELNETTing from a Norsk Data terminal to an overloaded Ultrix box. Needless to say this setup could not display any 8bit characters (the eighth bit was stripped). Reading Danish was so annoying that I didn't use dk.* for many years. Also remember that not everyone can say "Okay, I'll just upgrade to something Unicode-capable". If you're using a shared system you probably don't have the power to decide that. > If we don't move to Unicode in the future then coding system > problems will go on forever and ever. It would be foolish not to use Unicode for any _new_ protocols or formats. But for legacy systems like email and Usenet backward compatibility is really, really important. If you look at how e.g. MIME or format=flowed was designed, you'll see that a lot of effort and thought was spent on minimizing negative effects for existing clients. You need an especially good excuse to break existing stuff. The fact that Unicode is a technically more pleasing solution just isn't a good enough reason to break things unnecessarily, IMHO.
But if you're doing something that wasn't possible before, say, using German and Thai in the same message, that's a valid reason to use Unicode. > My guess -- by the way -- is that Unicode will become increasingly > important in Europe, especially for the members of the EU. We'd need > at least Latin-1/Latin-9, Latin-2 and Greek (ISO 8859-7). And I am not > sure if that already covers Latvian, Romanian and others. There will > be a growing need for an encoding that covers all of these languages. I think most Western European users don't care about and don't know how to access any glyph that isn't printed on the keyboard. >> 2. Unicode support itself doesn't really buy me a lot if most people >> don't have fairly complete Unicode fonts (which they don't). > > So the worst thing that could happen is that they see a hollow box now > and then. An empty box can be bad enough. If you're writing an equation it can be really important what that empty box happens to be ☺ I experienced that problem recently when I used ℏ in a message. > And yet some characters are more frequent than others. You can > probably rely on the fact that western Europeans have fonts that > contain the Latin-1 repertoire. Box drawing characters or symbols > may not be that frequent, but there is a good chance to get the > additional punctuation characters. In practice the only thing you can reasonably expect are the 650 glyphs in WGL4.¹ ¹ http://partners.adobe.com/asn/tech/type/opentype/appendices/wgl4.jsp ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-17 0:57 ` Jesper Harder @ 2003-08-17 17:24 ` Oliver Scholz 2003-08-17 18:21 ` Matthias Andree 0 siblings, 1 reply; 37+ messages in thread From: Oliver Scholz @ 2003-08-17 17:24 UTC (permalink / raw) Jesper Harder <harder@myrealbox.com> writes: [...] > It would be foolish not to use Unicode for any _new_ protocols or > formats. But for legacy systems like email and Usenet backward > compatibility is really, really important. If you look at how > e.g. MIME or format=flowed was designed, you'll see that a lot of > effort and thought was spent on minimizing negative effects for > existing clients. > > You need an especially good excuse to break existing stuff. The fact > that Unicode is a technically more pleasing solution just isn't a good > enough reason to break things unnecessarily, IMHO. [...] I have to admit that this is a very strong argument. It could probably convince me, if the situation in Usenet were not already such a mess. I agree that it is sometimes a good thing to preserve a current working state in order to maximize compatibility. But sometimes it is a good thing to dare a reform. Which is the case for Usenet is probably a matter of estimation. I think I have stated most of my arguments. At least I shouldn't smile upon people anymore who use plain ASCII in the de.* hierarchy. One could probably rather convince me to use ASCII than to use, say, Latin-9. > > My guess -- by the way -- is that Unicode will become increasingly > > important in Europe, especially for the members of the EU. We'd need > > at least Latin-1/Latin-9, Latin-2 and Greek (ISO 8859-7). And I am not > > sure if that already covers Latvian, Romanian and others. There will > > be a growing need for an encoding that covers all of these languages. > I think most Western European users don't care about and don't know > how to access any glyph that isn't printed on the keyboard.
My guess is that the usage of UTF-8 in Europe will start in business e-mail and spread from there. But maybe this is not my actual point. It's rather that I want it to be easy to mix different languages freely. Why shouldn't a Pole or a Chinese posting a German message to the de.* hierarchy sign with his or her Chinese or Polish name? Why not a Greek verse in the signature? Or an Arabian proverb? Many people wouldn't do it, unless they can be sure that it wouldn't garble their umlauts for some people, however small or great their number may be. This is decent, but I also find it suboptimal. It won't change, until UTF-8 becomes the default. Oliver -- 30 Thermidor an 211 de la Révolution Liberté, Egalité, Fraternité! ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-17 17:24 ` Oliver Scholz @ 2003-08-17 18:21 ` Matthias Andree 0 siblings, 0 replies; 37+ messages in thread From: Matthias Andree @ 2003-08-17 18:21 UTC (permalink / raw) Oliver Scholz <alkibiades@gmx.de> writes: > I have to admit that this is a very strong argument. It could > probably convince me, if the situation in Usenet were not already > such a mess. I agree that it is sometimes a good thing to preserve a > current working state in order to maximize compatibility. But > sometimes it is a good thing to dare a reform. Which is the case for > Usenet is probably a matter of estimation. I think I have stated most > of my arguments. RFC violations in Usenet are commonplace in Northern Europe and Germany anyways. dk.* and no.* users complained when leafnode didn't accept their unencoded 8-bit headers. (At least, there's no newsgroup such as no.østfold - which Arnt Gulbrandsen, original author of leafnode, was concerned about.) -- Matthias Andree ^ permalink raw reply [flat|nested] 37+ messages in thread
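For readers wondering what the conforming alternative to unencoded 8-bit headers looks like: non-ASCII header text is normally wrapped in RFC 2047 encoded words, which stay pure ASCII on the wire. The sketch below uses Python's standard `email.header` module; the sample subject text is invented for this illustration and does not come from the thread.

```python
from email.header import Header, decode_header, make_header

raw = "Nytt fra \u00d8stfold"      # made-up subject containing one 8-bit letter

# Encode as an RFC 2047 encoded word labelled iso-8859-1.
encoded = Header(raw, charset="iso-8859-1").encode()
print(encoded)

# The wire form is plain ASCII, so 7-bit-only software passes it through.
is_ascii = all(ord(ch) < 128 for ch in encoded)

# A receiving client reverses the process to recover the original text.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

This covers header fields only. Newsgroup names themselves, the case Matthias mentions, are a separate matter that this sketch does not address.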
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-14 23:01 ` Jesper Harder 2003-08-15 13:50 ` Oliver Scholz @ 2003-08-15 18:24 ` Xavier Maillard 2003-08-16 0:35 ` Jesper Harder 1 sibling, 1 reply; 37+ messages in thread From: Xavier Maillard @ 2003-08-15 18:24 UTC (permalink / raw) [-- Attachment #1: Type: text/plain, Size: 1706 bytes --] Jesper Harder <harder@myrealbox.com> writes: > Xavier Maillard <zedek@gnu-rox.org> writes: > > > I know Emacs is able to use utf-8 encoding so Gnus is. > > > > My question is more a question of compliance with other MUAs. > > Would you recommend your users to use utf-8 as a default encoding > > system ? > > No, because there's no reason to use UTF-8 if a more widely supported > charset is sufficient. Ok for that. So what would be the default charset to recommend to people ? Why the hell was utf-8 invented so ? > To use UTF-8 by default would also be against RFC 2046: > > ,----[ RFC 2046, Section 4.1.2. ] > | > | In general, composition software should always use the "lowest > | common denominator" character set possible. For example, if a > | body contains only US-ASCII characters, it SHOULD be marked as > | being in the US- ASCII character set, not ISO-8859-1, which, > | like all the ISO-8859 family of character sets, is a superset of > | US-ASCII. More generally, if a widely-used character set is a > | subset of another character set, and a body contains only > | characters in the widely-used subset, it should be labelled as > | being in that subset. This will increase the chances that the > | recipient will be able to view the resulting entity correctly. > `---- > But if the message contains characters (or combination of characters) > where a _single_ iso-8859-x charset can't be used, then by all means > use UTF-8. This is far better than sending a multipart message > (which Gnus does if UTF-8 isn't available). Thanx for the hint. zeDek -- "Just did it." 
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-15 18:24 ` Xavier Maillard @ 2003-08-16 0:35 ` Jesper Harder 0 siblings, 0 replies; 37+ messages in thread From: Jesper Harder @ 2003-08-16 0:35 UTC (permalink / raw) Xavier Maillard <zedek@gnu-rox.org> writes: > Jesper Harder <harder@myrealbox.com> writes: > >> Xavier Maillard <zedek@gnu-rox.org> writes: >> >> > Would you recommend your users to use utf-8 as a default >> > encoding system ? >> >> No, because there's no reason to use UTF-8 if a more widely supported >> charset is sufficient. > > Ok for that. So what would be the default charset to recommend to > people ? I would just leave the default setting in Gnus as it is. > Why the hell was utf-8 invented so ? To extend the repertoire of available glyphs and allow people to mix glyphs from different scripts. UTF-8 is excellent if you need it. But most people usually don't need to mix Vietnamese and Thai words, write hieroglyphics or runes and so on. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-14 15:48 Gnus: UTF-8 and compatibility with other MUAs Xavier Maillard 2003-08-14 22:39 ` Frank Schmitt 2003-08-14 23:01 ` Jesper Harder @ 2003-08-14 23:05 ` Simon Josefsson 2003-08-15 17:00 ` Oliver Scholz 2 siblings, 1 reply; 37+ messages in thread From: Simon Josefsson @ 2003-08-14 23:05 UTC (permalink / raw) Cc: ding Xavier Maillard <zedek@gnu-rox.org> writes: > Hi, > > I know Emacs is able to use utf-8 encoding so Gnus is. > > My question is more a question of compliance with other MUAs. > Would you recommend your users to use utf-8 as a default encoding > system ? AFAIK, I can't see many MUAs aware of it and worst almost > nobody is using utf-8 which was presented as the future. So what is the > problem with utf in general that prevent users in general to use it > defaultly ? IMHO: Users should use the oldest charset widely deployed, or preferred, in their own geographic region that is able to encode what they write. This means if a user writes only ASCII, it is tagged as ASCII (or rather not tagged at all). And if a (northern?) European user writes å they should use iso-8859-1. And if a European user writes Ελληνικά they should use iso-8859-7. And if a European user writes € they should use iso-8859-15. (One could argue that iso-8859-15 is too recent and that it may make sense to go directly to UTF-8, but my experience, as a northern European user, is that iso-8859-15 is more appropriate, since the almost-compatibility with iso-8859-1 is friendlier for people with old software.) And if a European user writes € and ά they should use UTF-8. (I'm assuming no 8859-* can encode both € and ά.) This also means that it is wrong for European users to use ISO-2022-JP-2, even though it technically may be able to encode some strings that contain characters from several 8859-* sets and so aren't available in any single 8859-*. Instead they should go to UTF-8.
I think this is how Gnus works though, unless you are in a UTF-8 locale and use an old Emacs (then I think it will skip the 8859-* step, but I might be wrong). This logic might be flawed if the receiver is in another geographic region, or if a user mostly communicates internationally. Still, I'd probably use the above logic even if I sent something to a Japanese user, and expect them to use ISO-2022-JP-2 (or whatever) in return. Perhaps some day we can try ASCII first, then fall back to UTF-8. But that will take a long time. Even moving to ISO-8859-1 in northern Europe took a long time, and still isn't finished. I still use IBMPC2 (CP437?) in some regional communication channels. ^ permalink raw reply [flat|nested] 37+ messages in thread
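The selection rule Simon describes (try the oldest, most widely deployed charset first and fall back to UTF-8 only when nothing narrower fits) can be written down as a simple priority search. This is a sketch in Python and not how Gnus implements it; the function name `pick_charset` and the candidate order are assumptions chosen to mirror the examples in the message.

```python
# Candidate charsets, ordered from "oldest and most widely supported" to UTF-8.
# The order is an assumption for a (northern) European sender.
DEFAULT_CANDIDATES = ("us-ascii", "iso-8859-1", "iso-8859-15",
                      "iso-8859-7", "utf-8")

def pick_charset(text, candidates=DEFAULT_CANDIDATES):
    """Return the first charset in CANDIDATES that can encode TEXT."""
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue          # this charset lacks some character; try the next
        return name
    raise ValueError("no candidate charset can encode the text")

for sample in ("plain text", "\u00e5", "\u20ac",
               "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac",
               "\u00e5 \u03ac"):
    print(repr(sample), "->", pick_charset(sample))
```

Which charsets a given codec library considers able to encode a character can differ from what deployed mail clients of the time supported, so the candidate list would need to reflect the actual audience.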
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-14 23:05 ` Simon Josefsson @ 2003-08-15 17:00 ` Oliver Scholz 2003-08-16 7:43 ` Ivan Boldyrev 2003-08-18 6:01 ` Steinar Bang 0 siblings, 2 replies; 37+ messages in thread From: Oliver Scholz @ 2003-08-15 17:00 UTC (permalink / raw) Simon Josefsson <jas@extundo.com> writes: > Xavier Maillard <zedek@gnu-rox.org> writes: [...] >> My question is more a question of compliance with other MUAs. >> Would you recommend your users to use utf-8 as a default encoding >> system ? AFAIK, I can't see many MUAs aware of it and worst almost >> nobody is using utf-8 which was presented as the future. So what is the >> problem with utf in general that prevent users in general to use it >> defaultly ? I have been using UTF-8 as a default in Mails&News for over a year now. It is sometimes problematic, but even if the MUA on the other end does not cope with UTF-8 it never makes my (western European) text entirely unreadable. Sure, that is still not nice. But like Frank I see it as a chicken-and-egg problem: I decided once that I was going to promote UTF-8 by using it. I realized then that virtually none of my non-technical-oriented friends had any problems with UTF-8, since they use programs like Outlook, Mozilla or some obscure Macintosh MUA, whose name I have forgotten. The only major group of people who have problems with UTF-8 are computer-literates. This seems weird to me. I wouldn't use UTF-8 if it were the other way around. I don't think that this sort of UTF-8 radicalism is the right thing for everyone. Simon's suggestions demonstrate nicely the tower-of-babel situation resulting from the current flood of coding systems, but I have to admit that they also indicate the most sensible way to deal with those things, if you want to maximize the chance that your text is flawlessly readable at the other end. But I do think that *some* people should start to use UTF-8 as a default. [...]
> And if a european user write € it should use iso-8859-15. (One could > argue that iso-8859-15 is too recent and that it may make sense to go > directly to UTF-8, but my experience, as a northern european user, is > that iso-8859-15 is more appropriate, since the almost-compatibility > with iso-8859-1 is friendlier for people with old software.) This seems to make sense. But how well does it work in your experience? It seems possible to me that a non-Latin-9-aware MUA/NUA could try to display a message with an iso8859-15 charset header as ASCII, so that not even Latin-1-compatible chars would be displayed correctly. Does that happen with some MUAs? However, I tend to think that it's better to write EUR instead of €, if you want to avoid UTF-8. [...] > Perhaps some day we can try ASCII first, then fall back to UTF-8. But > that will take a long time. Even moving to ISO-8859-1 in northern > Europe took a long time, and still isn't finished. I still use IBMPC2 > (CP437?) in some regional communication channels. [...] I think it is in general a good idea to choose the encoding according to the audience. Fortunately this is not hard with Gnus. There are some people to whom I send my mail in Latin-1. Oliver -- 28 Thermidor an 211 de la Révolution Liberté, Egalité, Fraternité! ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-15 17:00 ` Oliver Scholz @ 2003-08-16 7:43 ` Ivan Boldyrev 2003-08-17 17:27 ` Oliver Scholz 2003-08-18 6:01 ` Steinar Bang 1 sibling, 1 reply; 37+ messages in thread From: Ivan Boldyrev @ 2003-08-16 7:43 UTC (permalink / raw) On 8472 day of my life Oliver Scholz wrote: >> Perhaps some day we can try ASCII first, then fall back to UTF-8. But >> that will take a long time. Even moving to ISO-8859-1 in northern >> Europe took a long time, and still isn't finished. I still use IBMPC2 >> (CP437?) in some regional communication channels. > [...] > > I think it is in general a good idea to choose the encoding according > to the audience. Fortunately this is not hard with Gnus. There are > some people to which I send my mail in Latin-1. Do you use special group for them or do something more tricky? -- Ivan Boldyrev Onions has layers. Unix has layers too. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-16 7:43 ` Ivan Boldyrev @ 2003-08-17 17:27 ` Oliver Scholz 0 siblings, 0 replies; 37+ messages in thread From: Oliver Scholz @ 2003-08-17 17:27 UTC (permalink / raw) Ivan Boldyrev <boldyrev+nospam@cgitftp.uiggm.nsc.ru> writes: > On 8472 day of my life Oliver Scholz wrote: >>> Perhaps some day we can try ASCII first, then fall back to UTF-8. But >>> that will take a long time. Even moving to ISO-8859-1 in northern >>> Europe took a long time, and still isn't finished. I still use IBMPC2 >>> (CP437?) in some regional communication channels. >> [...] >> >> I think it is in general a good idea to choose the encoding according >> to the audience. Fortunately this is not hard with Gnus. There are >> some people to which I send my mail in Latin-1. > > Do you use special group for them or do something more tricky?

I started to use the BBDB recently and I added a special property for people that should receive mail in an encoding other than UTF-8, like:

    egoge-encoding: latin-1

Before that I kept their Email-addresses in a list, the mechanics are similar then. But I am not sure whether a defadvice around `message-send-and-exit' is the best way to do this.

(defadvice message-send-and-exit (around egoge-latin-1-friendly activate)
  "Query the BBDB for a preferred encoding for this message."
  (let* ((address (message-fetch-field "to"))
         (encoding (and address
                        (egoge-bbdb-get-prop
                         (cadr (gnus-extract-address-components address))
                         'egoge-encoding))))
    (if (not encoding)
        ad-do-it
      (let ((mm-coding-system-priorities
             (cons (intern encoding) mm-coding-system-priorities)))
        ad-do-it))))

(defun egoge-bbdb-get-prop (address property)
  (let ((record (car (egoge-bbdb-find-address address))))
    (and record
         (bbdb-record-getprop record property))))

(defun egoge-bbdb-find-address (address)
  "Return BBDB records which contain ADDRESS as net-address.
Return nil if there is no such record."
  (bbdb-search (bbdb-records t) nil nil address))

Oliver -- 30 Thermidor an 211 de la Révolution Liberté, Egalité, Fraternité! ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Gnus: UTF-8 and compatibility with other MUAs 2003-08-15 17:00 ` Oliver Scholz 2003-08-16 7:43 ` Ivan Boldyrev @ 2003-08-18 6:01 ` Steinar Bang 1 sibling, 0 replies; 37+ messages in thread From: Steinar Bang @ 2003-08-18 6:01 UTC (permalink / raw) >>>>> Oliver Scholz <alkibiades@gmx.de>: [snip!] > But I do think that *some* people should start to use UTF-8 as a > default. Power to you then. But I suspect you will get many similar responses to those I got ten years ago, when I started using quoted-unreadable in email (hey! it was the standard...! :-) ) ^ permalink raw reply [flat|nested] 37+ messages in thread