* quoted-printable no-go in news?
@ 2003-04-07 19:43 Matthias Andree
2003-04-07 22:29 ` Randal L. Schwartz
2003-04-08 11:48 ` Jesper Harder
0 siblings, 2 replies; 9+ messages in thread
From: Matthias Andree @ 2003-04-07 19:43 UTC (permalink / raw)
Hi,
Gnus displays wide hollow boxes instead of the umlauts in Message-ID:
<etvq8vonajs2p1llpmfpnl8d0rehmsdu5u@4ax.com> (de.rec.fotografie) -- it
is declared iso-8859-1 and quoted-printable; but Gnus nicely and
correctly displays "\200" for a b0rken =80. Later, when following up to
the post, Gnus complains about unprintable characters and offers to
replace, remove, ignore, ... them. It works in binary-encoded articles.
What's wrong with that article? I tried CVS Gnus with Emacs 21.2 and
21.3. Admittedly, I have ignored some warnings issued by Reiner Steib
because everything else seemed to work just fine.
--
Matthias Andree
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: quoted-printable no-go in news?
From: Randal L. Schwartz @ 2003-04-07 22:29 UTC (permalink / raw)
Cc: ding
>>>>> "Matthias" == Matthias Andree <ma@dt.e-technik.uni-dortmund.de> writes:
Matthias> Gnus displays wide hollow boxes instead of the umlauts in
Matthias> Message-ID: <etvq8vonajs2p1llpmfpnl8d0rehmsdu5u@4ax.com>
Matthias> (de.rec.fotografie) -- it is declared iso-8859-1 and
Matthias> quoted-printable; but Gnus nicely and correctly displays
Matthias> "\200" for a b0rken =80. Later, when following up to the
Matthias> post, Gnus complains about unprintable characters and offers
Matthias> to replace, remove, ignore, ... them. It works in
Matthias> binary-encoded articles.
The only acceptable "encoding" for Usenet at large in the text
newsgroups that I'm aware of is plain 7-bit ASCII, although I imagine
iso-8859-1 is probably also acceptable. Quoted-printable, definitely
not.
However, maybe de.* has a different rule.
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
* Re: quoted-printable no-go in news?
From: Graham Murray @ 2003-04-08 5:59 UTC (permalink / raw)
merlyn@stonehenge.com (Randal L. Schwartz) writes:
> The only acceptable "encoding" for Usenet at large in the text
> newsgroups that I'm aware of is plain 7-bit ASCII, although I imagine
> iso-8859-1 is probably also acceptable. Quoted-printable, definitely
> not.
Though has USEFOR not suggested/recommended that UTF-8 be the default
encoding for text newsgroups? That way current 7-bit ASCII remains
unchanged, while also allowing for accents, Greek, Cyrillic, Kanji, etc.,
all within the one encoding.
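The property described here (ASCII text is byte-for-byte unchanged under
UTF-8, while other scripts fit into the same encoding) can be checked with
a short Python sketch. The sample strings are made up for illustration and
are not taken from any article in this thread.

```python
# Pure ASCII text produces identical bytes under ASCII and UTF-8.
ascii_text = "plain 7-bit ASCII"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Accented, Greek, Cyrillic and Kanji characters all fit in one encoding.
mixed = "Umlaut \u00e4, Greek \u03b1, Cyrillic \u044f, Kanji \u6f22"
utf8_bytes = mixed.encode("utf-8")

# Each non-ASCII character becomes a multi-byte sequence whose bytes all
# have the high bit set, so they cannot be mistaken for ASCII bytes.
high_bytes = [b for b in utf8_bytes if b >= 0x80]

print(same_bytes, len(mixed), len(utf8_bytes), len(high_bytes))
```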
* Re: quoted-printable no-go in news?
From: Jesper Harder @ 2003-04-08 11:48 UTC (permalink / raw)
Matthias Andree <ma@dt.e-technik.uni-dortmund.de> writes:
> Gnus displays wide hollow boxes instead of the umlauts in Message-ID:
> <etvq8vonajs2p1llpmfpnl8d0rehmsdu5u@4ax.com> (de.rec.fotografie) -- it
> is declared iso-8859-1 and quoted-printable; but Gnus nicely and
> correctly displays "\200" for a b0rken =80.
It's not related to QP -- the same would happen if it wasn't QP-encoded.
The real problem is that the article is not in iso-8859-1 as declared,
but in Windows-1252.
FWIW, the umlauts are displayed correctly for me. That's because I'm
using a Latin-1 locale, while you're probably using UTF-8.
Here's what happens:
* Gnus detects that the message can't possibly be encoded in Latin-1 as
advertised in the header (because \200 is not a valid character in
Latin-1).
* It then uses Emacs' charset detection functions to determine the
charset. By default, Emacs doesn't know about windows-1252, so it
isn't detected correctly, and the text is decoded as something else.
* If you're using a Latin-1 locale, then this happens to be displayed
correctly.
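To illustrate why the byte \200 (0x80) is the giveaway, here is a minimal
Python sketch. The article fragment is hypothetical, and Python's codecs
only stand in for Emacs' decoders; they are not what Gnus uses. Python's
"latin-1" codec maps 0x80 to the C1 control character U+0080 rather than
rejecting it, which is a property of that codec and not of Gnus.

```python
import quopri
import unicodedata

# Hypothetical quoted-printable fragment: "=80" is the suspicious byte,
# "=FC" and "=DF" are ordinary Latin-1 umlaut/sharp-s bytes.
qp_body = b"Preis: 5 =80, gr=FC=DFe"

# Undoing the transfer encoding only yields raw bytes; it says nothing
# about which character set those bytes are in.
raw = quopri.decodestring(qp_body)

# Decoded as declared (iso-8859-1), 0x80 is an unprintable control code.
as_latin1 = raw.decode("latin-1")
control_category = unicodedata.category(as_latin1[as_latin1.index("\x80")])

# Decoded as windows-1252, the same byte is the euro sign.
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1), control_category)
print(as_cp1252)
```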
The real way to solve it is to teach Emacs about windows-1252. That's
possible with `code-pages.el' in CVS Emacs. With this package loaded
and configured, the €'s are displayed correctly and converted to a
proper charset when you reply.
If you're not using code-pages.el, then it's probably better to use the
declared charset unconditionally, since the most common case of wrong
charset declaration is presumably windows-125x vs. iso-8859-x.
I've changed the code to skip the auto detection if a charset was
declared in the headers and `code-pages' hasn't been loaded.
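The decision described above can be sketched roughly as follows. This is
not the actual Gnus code (which is Emacs Lisp); the function names, the
`code_pages_loaded` flag and the restriction to iso-8859-1 are assumptions
made purely for illustration.

```python
from typing import Optional

# Bytes 0x80-0x9F are not printable characters in iso-8859-x, but carry
# characters such as the euro sign in windows-125x.
C1_RANGE = range(0x80, 0xA0)


def has_c1_bytes(body: bytes) -> bool:
    """Return True if the body contains bytes that contradict iso-8859-x."""
    return any(b in C1_RANGE for b in body)


def choose_charset(declared: Optional[str], body: bytes,
                   code_pages_loaded: bool) -> Optional[str]:
    """Pick a charset for decoding; None means 'fall back to auto-detection'."""
    if declared is None:
        return None
    if not code_pages_loaded:
        # Without windows-1252 support, guessing tends to make things
        # worse, so trust the header unconditionally.
        return declared
    if declared.lower() == "iso-8859-1" and has_c1_bytes(body):
        # Hypothetical shortcut for the most common mislabelling.
        return "windows-1252"
    return declared


print(choose_charset("iso-8859-1", b"5 \x80", code_pages_loaded=False))
print(choose_charset("iso-8859-1", b"5 \x80", code_pages_loaded=True))
```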
* Re: quoted-printable no-go in news?
From: Matthias Andree @ 2003-04-10 17:44 UTC (permalink / raw)
Cc: Matthias Andree, ding
merlyn@stonehenge.com (Randal L. Schwartz) writes:
>>>>>> "Matthias" == Matthias Andree <ma@dt.e-technik.uni-dortmund.de> writes:
>
> Matthias> Gnus displays wide hollow boxes instead of the umlauts in
> Matthias> Message-ID: <etvq8vonajs2p1llpmfpnl8d0rehmsdu5u@4ax.com>
> Matthias> (de.rec.fotografie) -- it is declared iso-8859-1 and
> Matthias> quoted-printable; but Gnus nicely and correctly displays
> Matthias> "\200" for a b0rken =80. Later, when following up to the
> Matthias> post, Gnus complains about unprintable characters and offers
> Matthias> to replace, remove, ignore, ... them. It works in
> Matthias> binary-encoded articles.
>
> The only acceptable "encoding" for Usenet at large in the text
> newsgroups that I'm aware of is plain 7-bit ASCII, although I imagine
> iso-8859-1 is probably also acceptable. Quoted-printable, definitely
> not.
You're confusing encoding and character set.
de.* is fine with iso-8859-1, -2, -15 or windows-1252, and few will
complain about utf-8, some more about utf-7. People do complain if you
don't declare your character set properly (most common mistake is using
the € -- EUR -- symbol at position \200 and declaring iso-something when
it should've been windows-1252).
quoted-printable is usually silently accepted in de.* -- unless someone
with Outlook (Express)? sends a follow-up: Outlook* junk catapults goof
up the quoting and don't indent properly.
Having said that, why is quoted-printable + iso-8859-1 different than
8bit + iso-8859-1 -- or, in other words: why am I getting those 4
character wide boxes although the posting is technically correct?
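The distinction between transfer encoding and character set drawn above can
be sketched in Python. The sample word is made up, and Python's quopri
module is used only as a stand-in for what a newsreader does. The point is
that quoted-printable and 8bit carry the same bytes, so the charset
question is the same in both cases.

```python
import quopri

text = "Gr\u00fc\u00dfe"

# Step 1, character set: characters -> bytes.
eight_bit = text.encode("iso-8859-1")

# Step 2, transfer encoding: bytes -> 7-bit-safe bytes (or left as 8bit).
qp = quopri.encodestring(eight_bit)

# A reader reverses step 2 first and only then applies the charset.
recovered = quopri.decodestring(qp)
print(qp, recovered == eight_bit, recovered.decode("iso-8859-1"))
```

If the two forms of an article display differently, the cause is therefore
more likely to lie in how the charset is chosen than in quoted-printable
itself.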
--
Matthias Andree
* Re: quoted-printable no-go in news?
From: Matthias Andree @ 2003-04-10 19:06 UTC (permalink / raw)
Jesper Harder <harder@myrealbox.com> writes:
> FWIW, the umlauts are displayed correctly for me. That's because I'm
> using a Latin-1 locale, while you're probably using UTF-8.
My locale is
LANG=de_DE@euro
Which looks pretty much like Latin-9 (iso-8859-15).
> Here's what happens:
>
> * Gnus detects that the message can't possibly be encoded in Latin-1 as
> advertised in the header (because \200 is not a valid character in
> Latin-1).
This would also apply to Latin-9.
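A quick Python check of where the euro sign lives in each character set,
using Python's codec tables as a stand-in (an assumption; Emacs has its own
tables):

```python
euro = "\u20ac"

# Latin-9 (iso-8859-15) has the euro sign, but at 0xA4, not at 0x80.
latin9_euro = euro.encode("iso8859_15")

# windows-1252 puts it at 0x80.
cp1252_euro = euro.encode("cp1252")

# Latin-1 has no euro sign at all.
try:
    euro.encode("latin-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False

# In Python's Latin-9 table, byte 0x80 is still just a C1 control code.
latin9_0x80 = b"\x80".decode("iso8859_15")

print(latin9_euro, cp1252_euro, latin1_has_euro, repr(latin9_0x80))
```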
> * It then uses Emacs' charset detection functions to determine the
> charset. By default, Emacs doesn't know about windows-1252, so it
> isn't detected correctly, and the text is decoded as something else.
It must have guessed some other charset that doesn't map \200 then. How
do I ask it which character set it thinks it is again?
> The real way to solve it is to teach Emacs about windows-1252. That's
> possible with `code-pages.el' in CVS Emacs. With this package loaded
> and configured, the €'s are displayed correctly and converted to a
> proper charset when you reply.
Can I just steal that from CVS and stuff it into my 21.3 emacs? Or
should I go for CVS? (BTW, is CVS emacs fast enough? 21.2/21.3 are _way_
slower than 21.1 was).
> If you're not using code-pages.el, then it's probably better to use the
> declared charset unconditionally, since the most common case of wrong
> charset declaration is presumably windows-125x vs. iso-8859-x.
>
> I've changed the code to skip the auto detection if a charset was
> declared in the headers and `code-pages' hasn't been loaded.
Thanks.
--
Matthias Andree
* Re: quoted-printable no-go in news?
From: Matthias Andree @ 2003-04-10 19:39 UTC (permalink / raw)
Cc: ding
Following up to myself (2nd quote level is Jesper Harder):
>> The real way to solve it is to teach Emacs about windows-1252. That's
>> possible with `code-pages.el' in CVS Emacs. With this package loaded
>> and configured, the €'s are displayed correctly and converted to a
>> proper charset when you reply.
>
> can I just steal that from CVS and stuff it into my 21.3 emacs? Or
> should I go for CVS? (BTW, is CVS emacs fast enough? 21.2/21.3 are _way_
> slower than 21.1 was).
Hum, stealing from CVS doesn't work out, it complains about the feature
mule-diag not being provided.
>> If you're not using code-pages.el, then it's probably better to use the
>> declared charset unconditionally, since the most common case of wrong
>> charset declaration is presumably windows-125x vs. iso-8859-x.
>>
>> I've changed the code to skip the auto detection if a charset was
>> declared in the headers and `code-pages' hasn't been loaded.
WorksForMe[tm]. Thanks a bunch!
--
Matthias Andree
* Re: quoted-printable no-go in news?
From: Jesper Harder @ 2003-04-10 20:26 UTC (permalink / raw)
Matthias Andree <ma@dt.e-technik.uni-dortmund.de> writes:
> Jesper Harder <harder@myrealbox.com> writes:
>
>> * It then uses Emacs' charset detection functions to determine the
>> charset. By default, Emacs doesn't know about windows-1252, so it
>> isn't detected correctly, and the text is decoded as something else.
>
> It must have guessed some other charset that doesn't map \200 then. How
> do I ask it which character set it thinks it is again?
It's indicated in the mode line of the article buffer. For me it
displayed "t", i.e. raw-text-unix.
> can I just steal that from CVS and stuff it into my 21.3 emacs?
I don't know.
> (BTW, is CVS emacs fast enough? 21.2/21.3 are _way_ slower than 21.1
> was).
I don't see a speed difference between CVS and 21.2 that's statistically
significant. These are timings for fetching 10.000 articles from a
local NNTP server 4 times:
Sum: 132.005927 Avg: 33.001482 Var: 2.129394 Thu Apr 10 22:14:52 2003 - Gnus/5.090018 (Oort Gnus v0.18) Emacs/21.2 (gnu/linux)
Sum: 134.704826 Avg: 33.676206 Var: 2.241309 Thu Apr 10 22:18:33 2003 - Gnus/5.090018 (Oort Gnus v0.18) Emacs/21.3.50 (gnu/linux)
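As a rough plausibility check of "not statistically significant", the two
result lines can be fed into Welch's t statistic. This is only a sketch: it
assumes that "Var" is the per-run sample variance over the 4 runs, which
the output above does not state, and it is not necessarily the test that
was actually used.

```python
import math

runs = 4  # each line summarises 4 fetches of the same articles

# Figures copied from the two result lines above.
emacs_21_2 = {"sum": 132.005927, "avg": 33.001482, "var": 2.129394}
emacs_cvs = {"sum": 134.704826, "avg": 33.676206, "var": 2.241309}

# Difference of the means and its standard error (Welch's formula).
diff = emacs_cvs["avg"] - emacs_21_2["avg"]
std_err = math.sqrt(emacs_21_2["var"] / runs + emacs_cvs["var"] / runs)
t_stat = diff / std_err

# Relative slowdown of CVS Emacs compared with 21.2.
relative = diff / emacs_21_2["avg"]

print(f"diff={diff:.3f}s  t={t_stat:.2f}  relative={relative:.1%}")
```

Under these assumptions the difference of about 0.67 seconds is smaller
than its own standard error, which is consistent with the conclusion drawn
above.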
* Re: quoted-printable no-go in news?
From: Reiner Steib @ 2003-04-10 23:25 UTC (permalink / raw)
On Thu, Apr 10 2003, Matthias Andree wrote:
>> can I just steal that from CVS and stuff it into my 21.3 emacs? Or
>> should I go for CVS? (BTW, is CVS emacs fast enough? 21.2/21.3 are _way_
>> slower than 21.1 was).
>
> Hum, stealing from CVS doesn't work out, it complains about the feature
> mule-diag not being provided.
;; X-URL: http://theotp1.physik.uni-ulm.de/~ste/comp/emacs/gnus/rs-windows-1252.el
;;; Commentary:
;;
;; This file contains two (mutually exclusive) options to get windows-1252
;; coding for Emacs 21.[1-3].
;; Option 1: (rs-use-windows-1252-sk)
;;
;; Slightly modified version of `sk-ucs-coding-system.el' from Simon Krahnke
;; <krahnke@gmx.de>.
;; Option 2: (rs-use-windows-1252-code-pages)
;;
;; Some bits from Emacs 21.4 (`mule.el' and `code-pages.el') to get
;; windows-1252 coding.
[...]
;; Warning: To make this work with Emacs < 21.4 some functions had to be
;; redefined. I'm not sure about possible side-effects.
Together with this, I can use `12 g' to see such articles correctly:
,----[ C-h v gnus-summary-show-article-charset-alist RET ]
| gnus-summary-show-article-charset-alist's value is
| ((12 . windows-1252)
| (0 . iso-8859-15)
| (8 . utf-8))
`----
Bye, Reiner.
--
,,,
(o o)
---ooO-(_)-Ooo--- PGP key available via WWW http://rsteib.home.pages.de/