Gnus development mailing list
* quoted-printable no-go in news?
@ 2003-04-07 19:43 Matthias Andree
  2003-04-07 22:29 ` Randal L. Schwartz
  2003-04-08 11:48 ` Jesper Harder
  0 siblings, 2 replies; 9+ messages in thread
From: Matthias Andree @ 2003-04-07 19:43 UTC


Hi,

Gnus displays wide hollow boxes instead of the umlauts in Message-ID:
<etvq8vonajs2p1llpmfpnl8d0rehmsdu5u@4ax.com> (de.rec.fotografie) -- it
is declared iso-8859-1 and quoted-printable; but Gnus nicely and
correctly displays "\200" for a b0rken =80. Later, when following up to
the post, Gnus complains about unprintable characters and offers to
replace, remove, ignore, ... them. It works in binary-encoded articles.

What's wrong with that article? I tried CVS Gnus with Emacs 21.2 and
21.3. Admittedly, I have ignored some warnings issued by Reiner Steib
because everything else seemed to work just fine.

-- 
Matthias Andree




* Re: quoted-printable no-go in news?
  2003-04-07 19:43 quoted-printable no-go in news? Matthias Andree
@ 2003-04-07 22:29 ` Randal L. Schwartz
  2003-04-08  5:59   ` Graham Murray
  2003-04-10 17:44   ` Matthias Andree
  2003-04-08 11:48 ` Jesper Harder
  1 sibling, 2 replies; 9+ messages in thread
From: Randal L. Schwartz @ 2003-04-07 22:29 UTC
  Cc: ding

>>>>> "Matthias" == Matthias Andree <ma@dt.e-technik.uni-dortmund.de> writes:

Matthias> Gnus displays wide hollow boxes instead of the umlauts in
Matthias> Message-ID: <etvq8vonajs2p1llpmfpnl8d0rehmsdu5u@4ax.com>
Matthias> (de.rec.fotografie) -- it is declared iso-8859-1 and
Matthias> quoted-printable; but Gnus nicely and correctly displays
Matthias> "\200" for a b0rken =80. Later, when following up to the
Matthias> post, Gnus complains about unprintable characters and offers
Matthias> to replace, remove, ignore, ... them. It works in
Matthias> binary-encoded articles.

The only acceptable "encoding" for Usenet at large in the text
newsgroups that I'm aware of is plain 7-bit ASCII, although I imagine
iso-8859-1 is probably also acceptable.  Quoted-printable, definitely
not.  

However, maybe de.* has a different rule.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!




* Re: quoted-printable no-go in news?
  2003-04-07 22:29 ` Randal L. Schwartz
@ 2003-04-08  5:59   ` Graham Murray
  2003-04-10 17:44   ` Matthias Andree
  1 sibling, 0 replies; 9+ messages in thread
From: Graham Murray @ 2003-04-08  5:59 UTC


merlyn@stonehenge.com (Randal L. Schwartz) writes:

> The only acceptable "encoding" for Usenet at large in the text
> newsgroups that I'm aware of is plain 7-bit ASCII, although I imagine
> iso-8859-1 is probably also acceptable.  Quoted-printable, definitely
> not.  

Though has USEFOR not suggested/recommended that UTF-8 be the default
encoding for text newsgroups?  That way current 7-bit ASCII remains
unchanged, but accents, Greek, Cyrillic, Kanji, etc. are all allowed
within the one encoding.




* Re: quoted-printable no-go in news?
  2003-04-07 19:43 quoted-printable no-go in news? Matthias Andree
  2003-04-07 22:29 ` Randal L. Schwartz
@ 2003-04-08 11:48 ` Jesper Harder
  2003-04-10 19:06   ` Matthias Andree
  1 sibling, 1 reply; 9+ messages in thread
From: Jesper Harder @ 2003-04-08 11:48 UTC


Matthias Andree <ma@dt.e-technik.uni-dortmund.de> writes:

> Gnus displays wide hollow boxes instead of the umlauts in Message-ID:
> <etvq8vonajs2p1llpmfpnl8d0rehmsdu5u@4ax.com> (de.rec.fotografie) -- it
> is declared iso-8859-1 and quoted-printable; but Gnus nicely and
> correctly displays "\200" for a b0rken =80. 

It's not related to QP -- the same would happen if it wasn't QP-encoded.
The real problem is that the article is not in iso-8859-1 as declared,
but in Windows-1252.
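
A minimal Python sketch of this point (an illustration only, not Gnus or
Emacs code; the sample body is made up).  Quoted-printable merely
transports bytes; what the byte 0x80 means depends on the charset used
afterwards:

```python
import quopri

# Hypothetical article body: "für 5 €" as a Windows-1252 sender would emit
# it, then quoted-printable encoded for transport.
qp_body = b"f=FCr 5 =80"

# Step 1: undo the transfer encoding.  The result is raw bytes; no charset
# has been applied yet.
raw = quopri.decodestring(qp_body)

# Step 2: interpret the same bytes under two different charsets.
as_latin1 = raw.decode("iso-8859-1")    # 0x80 -> C1 control char U+0080
as_cp1252 = raw.decode("windows-1252")  # 0x80 -> euro sign U+20AC

print(repr(raw))
print(repr(as_latin1))
print(repr(as_cp1252))
```

The umlaut (0xFC) comes out the same either way; only the byte in the
0x80-0x9F range differs, and an 8bit article with the same bytes would
behave identically.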

FWIW, the umlauts are displayed correctly for me.  That's because I'm
using a Latin-1 locale, while you're probably using UTF-8.

Here's what happens:

* Gnus detects that the message can't possibly be encoded in Latin-1 as
  advertised in the header (because \200 is not a valid character in
  Latin-1).

* It then uses Emacs' charset detection functions to determine the
  charset.  By default, Emacs doesn't know about windows-1252, so it
  isn't detected correctly, and the text is decoded as something else.

* If you're using a Latin-1 locale, then this happens to be displayed
  correctly.
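
The steps above can be sketched roughly as follows.  This is a Python
illustration under stated assumptions, not the actual Gnus logic:
`c1_bytes`, `decode_article` and the `known_fallbacks` parameter are
hypothetical names, and the octal rendering only stands in for an
undecoded raw-text display.

```python
def c1_bytes(data):
    """Return byte values in 0x80-0x9F, which Latin-1 uses only for controls."""
    return sorted({b for b in data if 0x80 <= b <= 0x9F})


def decode_article(data, declared, known_fallbacks=("windows-1252",)):
    """Decode with the declared charset unless the bytes contradict it.

    Returns (text, charset_used).  known_fallbacks models the charsets the
    detector knows about (an assumption of this sketch).
    """
    if declared.lower() in ("iso-8859-1", "latin-1") and c1_bytes(data):
        for candidate in known_fallbacks:
            try:
                return data.decode(candidate), candidate
            except UnicodeDecodeError:
                continue
        # Nothing suitable is known: leave 8-bit bytes visible as octal
        # escapes, similar in spirit to the "\200" display.
        text = "".join(chr(b) if b < 0x80 else "\\%03o" % b for b in data)
        return text, "raw-text"
    return data.decode(declared), declared


body = b"f\xfcr 5 \x80"
print(decode_article(body, "iso-8859-1"))                      # detector knows windows-1252
print(decode_article(body, "iso-8859-1", known_fallbacks=()))  # detector knows nothing useful
```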

The real way to solve it is to teach Emacs about windows-1252.  That's
possible with `code-pages.el' in CVS Emacs.  With this package loaded
and configured, the €'s are displayed correctly and converted to a
proper charset when you reply.

If you're not using code-pages.el, then it's probably better to use the
declared charset unconditionally, since the most common case of wrong
charset declaration is presumably windows-125x vs. iso-8859-x.

I've changed the code to skip the auto detection if a charset was
declared in the headers and `code-pages' hasn't been loaded.
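
As a decision rule, the change amounts to something like the following
Python sketch.  It is a paraphrase of the description above, not the
committed code; the function and parameter names are invented for
illustration.

```python
def charset_to_use(declared, contradicts_declared, code_pages_loaded, detect):
    """Pick the charset for decoding an article.

    declared: charset named in the headers, or None if there is none.
    contradicts_declared: True if the bytes cannot be valid in `declared`.
    code_pages_loaded: True if the extra code pages (windows-125x) are known.
    detect: zero-argument callable running auto-detection (hypothetical).
    """
    if declared is not None:
        if not contradicts_declared:
            return declared
        if not code_pages_loaded:
            # Detection cannot recognise windows-125x here, so guessing
            # would do more harm than trusting the header.
            return declared
    return detect()


# Mislabelled article, code pages not loaded: keep the declared charset.
print(charset_to_use("iso-8859-1", True, False, lambda: "raw-text"))
# Same article with code pages loaded: let detection decide.
print(charset_to_use("iso-8859-1", True, True, lambda: "windows-1252"))
```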




* Re: quoted-printable no-go in news?
  2003-04-07 22:29 ` Randal L. Schwartz
  2003-04-08  5:59   ` Graham Murray
@ 2003-04-10 17:44   ` Matthias Andree
  1 sibling, 0 replies; 9+ messages in thread
From: Matthias Andree @ 2003-04-10 17:44 UTC
  Cc: Matthias Andree, ding

merlyn@stonehenge.com (Randal L. Schwartz) writes:

>>>>>> "Matthias" == Matthias Andree <ma@dt.e-technik.uni-dortmund.de> writes:
>
> Matthias> Gnus displays wide hollow boxes instead of the umlauts in
> Matthias> Message-ID: <etvq8vonajs2p1llpmfpnl8d0rehmsdu5u@4ax.com>
> Matthias> (de.rec.fotografie) -- it is declared iso-8859-1 and
> Matthias> quoted-printable; but Gnus nicely and correctly displays
> Matthias> "\200" for a b0rken =80. Later, when following up to the
> Matthias> post, Gnus complains about unprintable characters and offers
> Matthias> to replace, remove, ignore, ... them. It works in
> Matthias> binary-encoded articles.
>
> The only acceptable "encoding" for Usenet at large in the text
> newsgroups that I'm aware of is plain 7-bit ASCII, although I imagine
> iso-8859-1 is probably also acceptable.  Quoted-printable, definitely
> not.  

You're confusing encoding and character set.

de.* is fine with iso-8859-1, -2, -15 or windows-1252, and few will
complain about utf-8, some more about utf-7. People do complain if you
don't declare your character set properly (most common mistake is using
the € -- EUR -- symbol at position \200 and declaring iso-something when
it should've been windows-1252).
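
To make the euro-sign mistake concrete, here is a small Python sketch (an
illustration, not part of any newsreader; `euro_bytes` is a made-up
helper) listing the bytes each charset uses for the euro sign:

```python
def euro_bytes(charset):
    """Return the bytes encoding the euro sign in charset, or None if it has none."""
    try:
        return "\u20ac".encode(charset)
    except UnicodeEncodeError:
        return None


table = {
    cs: euro_bytes(cs)
    for cs in ("windows-1252", "iso-8859-15", "utf-8", "iso-8859-1")
}
for cs, encoded in table.items():
    print(cs, encoded)
```

Only windows-1252 puts the euro at 0x80 (octal \200).  iso-8859-15 has it
at 0xA4, and iso-8859-1 has no euro sign at all, so an article containing
the byte 0x80 as a euro cannot honestly be labelled iso-8859-anything.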

quoted-printable is usually silently accepted in de.* -- unless someone
with Outlook (Express)? sends a follow-up: Outlook* junk catapults goof
up the quoting and don't indent properly.

Having said that, why is quoted-printable + iso-8859-1 different than
8bit + iso-8859-1 -- or, in other words: why am I getting those 4
character wide boxes although the posting is technically correct?

-- 
Matthias Andree




* Re: quoted-printable no-go in news?
  2003-04-08 11:48 ` Jesper Harder
@ 2003-04-10 19:06   ` Matthias Andree
  2003-04-10 19:39     ` Matthias Andree
  2003-04-10 20:26     ` Jesper Harder
  0 siblings, 2 replies; 9+ messages in thread
From: Matthias Andree @ 2003-04-10 19:06 UTC


Jesper Harder <harder@myrealbox.com> writes:

> FWIW, the umlauts are displayed correctly for me.  That's because I'm
> using a Latin-1 locale, while you're probably using UTF-8.

My locale is

LANG=de_DE@euro

Which looks pretty much like Latin-9 (iso-8859-15).

> Here's what happens:
>
> * Gnus detects that the message can't possibly be encoded in Latin-1 as
>   advertised in the header (because \200 is not a valid character in
>   Latin-1).

This would also apply to Latin-9.
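
A quick way to check that claim, sketched in Python (illustration only;
`is_control_in` is a hypothetical helper): byte 0x80 decodes to a control
character in both Latin-1 and Latin-9, and to a printable character only
in windows-1252.

```python
import unicodedata


def is_control_in(charset, byte_value):
    """True if charset maps byte_value to a Unicode control char (category Cc)."""
    char = bytes([byte_value]).decode(charset)
    return unicodedata.category(char) == "Cc"


for cs in ("iso-8859-1", "iso-8859-15", "windows-1252"):
    print(cs, is_control_in(cs, 0x80))
```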

> * It then uses Emacs' charset detection functions to determine the
>   charset.  By default, Emacs doesn't know about windows-1252, so it
>   isn't detected correctly, and the text is decoded as something else.

It must have guessed some other charset that doesn't map \200 then. How
do I ask it which character set it thinks it is again?

> The real way to solve it is to teach Emacs about windows-1252.  That's
> possible with `code-pages.el' in CVS Emacs.  With this package loaded
> and configured, the €'s are displayed correctly and converted to a
> proper charset when you reply.

can I just steal that from CVS and stuff it into my 21.3 emacs? Or
should I go for CVS? (BTW, is CVS emacs fast enough? 21.2/21.3 are _way_
slower than 21.1 was).

> If you're not using code-pages.el, then it's probably better to use the
> declared charset unconditionally, since the most common case of wrong
> charset declaration is presumably windows-125x vs. iso-8859-x.
>
> I've changed the code to skip the auto detection if a charset was
> declared in the headers and `code-pages' hasn't been loaded.

Thanks.

-- 
Matthias Andree




* Re: quoted-printable no-go in news?
  2003-04-10 19:06   ` Matthias Andree
@ 2003-04-10 19:39     ` Matthias Andree
  2003-04-10 23:25       ` Reiner Steib
  2003-04-10 20:26     ` Jesper Harder
  1 sibling, 1 reply; 9+ messages in thread
From: Matthias Andree @ 2003-04-10 19:39 UTC
  Cc: ding

Following up to myself (2nd quote level is Jesper Harder):

>> The real way to solve it is to teach Emacs about windows-1252.  That's
>> possible with `code-pages.el' in CVS Emacs.  With this package loaded
>> and configured, the €'s are displayed correctly and converted to a
>> proper charset when you reply.
>
> can I just steal that from CVS and stuff it into my 21.3 emacs? Or
> should I go for CVS? (BTW, is CVS emacs fast enough? 21.2/21.3 are _way_
> slower than 21.1 was).

Hum, stealing from CVS doesn't work out, it complains about the feature
mule-diag not being provided.

>> If you're not using code-pages.el, then it's probably better to use the
>> declared charset unconditionally, since the most common case of wrong
>> charset declaration is presumably windows-125x vs. iso-8859-x.
>>
>> I've changed the code to skip the auto detection if a charset was
>> declared in the headers and `code-pages' hasn't been loaded.

WorksForMe[tm]. Thanks a bunch!

-- 
Matthias Andree




* Re: quoted-printable no-go in news?
  2003-04-10 19:06   ` Matthias Andree
  2003-04-10 19:39     ` Matthias Andree
@ 2003-04-10 20:26     ` Jesper Harder
  1 sibling, 0 replies; 9+ messages in thread
From: Jesper Harder @ 2003-04-10 20:26 UTC


Matthias Andree <ma@dt.e-technik.uni-dortmund.de> writes:

> Jesper Harder <harder@myrealbox.com> writes:
>
>> * It then uses Emacs' charset detection functions to determine the
>>   charset.  By default, Emacs doesn't know about windows-1252, so it
>>   isn't detected correctly, and the text is decoded as something else.
>
> It must have guessed some other charset that doesn't map \200 then. How
> do I ask it which character set it thinks it is again?

It's indicated in the mode line of the article buffer.  For me it
displayed "t", i.e. raw-text-unix.

> can I just steal that from CVS and stuff it into my 21.3 emacs?

I don't know.

> (BTW, is CVS emacs fast enough? 21.2/21.3 are _way_ slower than 21.1
> was).

I don't see a speed difference between CVS and 21.2 that's statistically
significant.  These are timings for fetching 10,000 articles from a
local NNTP server 4 times:

Sum: 132.005927 Avg: 33.001482 Var: 2.129394 Thu Apr 10 22:14:52 2003 - Gnus/5.090018 (Oort Gnus v0.18) Emacs/21.2 (gnu/linux)
Sum: 134.704826 Avg: 33.676206 Var: 2.241309 Thu Apr 10 22:18:33 2003 - Gnus/5.090018 (Oort Gnus v0.18) Emacs/21.3.50 (gnu/linux)
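
A rough check of "not statistically significant", sketched in Python.  It
assumes that "Var" is the sample variance in seconds squared and that the
four runs per Emacs version are independent; neither is stated above, and
`welch_t` is a helper written for this sketch.

```python
import math

RUNS = 4  # four timed runs per Emacs version

# Figures copied from the two timing rows above.
emacs_21_2 = {"sum": 132.005927, "avg": 33.001482, "var": 2.129394}
emacs_cvs = {"sum": 134.704826, "avg": 33.676206, "var": 2.241309}


def welch_t(a, b, n=RUNS):
    """Welch's t statistic for two samples of size n, from means and variances."""
    standard_error = math.sqrt(a["var"] / n + b["var"] / n)
    return (b["avg"] - a["avg"]) / standard_error


t = welch_t(emacs_21_2, emacs_cvs)
print(f"difference of means: {emacs_cvs['avg'] - emacs_21_2['avg']:.3f} s")
print(f"Welch t statistic:   {t:.2f}")
```

Under those assumptions t comes out at about 0.65.  With roughly six
degrees of freedom the two-sided 5% threshold is near 2.4, so a
difference of about 0.7 s on 33 s is well within the noise.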




* Re: quoted-printable no-go in news?
  2003-04-10 19:39     ` Matthias Andree
@ 2003-04-10 23:25       ` Reiner Steib
  0 siblings, 0 replies; 9+ messages in thread
From: Reiner Steib @ 2003-04-10 23:25 UTC


On Thu, Apr 10 2003, Matthias Andree wrote:

>> can I just steal that from CVS and stuff it into my 21.3 emacs? Or
>> should I go for CVS? (BTW, is CVS emacs fast enough? 21.2/21.3 are _way_
>> slower than 21.1 was).
>
> Hum, stealing from CVS doesn't work out, it complains about the feature
> mule-diag not being provided.

;; X-URL: http://theotp1.physik.uni-ulm.de/~ste/comp/emacs/gnus/rs-windows-1252.el

;;; Commentary:
;;
;; This file contains two (mutually exclusive) options to get windows-1252
;; coding for Emacs 21.[1-3].

;; Option 1: (rs-use-windows-1252-sk)
;;
;; Slightly modified version of `sk-ucs-coding-system.el' from Simon Krahnke
;; <krahnke@gmx.de>.

;; Option 2: (rs-use-windows-1252-code-pages)
;;
;; Some bits from Emacs 21.4 (`mule.el' and `code-pages.el') to get
;; windows-1252 coding.
[...]
;; Warning: To make this work with Emacs < 21.4 some functions had to be
;; redefined.  I'm not sure about possible side-effects.

Together with this, I can use `12 g' to see such articles correctly:

,----[ C-h v gnus-summary-show-article-charset-alist RET ]
| gnus-summary-show-article-charset-alist's value is 
| ((12 . windows-1252)
|  (0 . iso-8859-15)
|  (8 . utf-8))
`----

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo--- PGP key available via WWW   http://rsteib.home.pages.de/





Thread overview: 9+ messages
2003-04-07 19:43 quoted-printable no-go in news? Matthias Andree
2003-04-07 22:29 ` Randal L. Schwartz
2003-04-08  5:59   ` Graham Murray
2003-04-10 17:44   ` Matthias Andree
2003-04-08 11:48 ` Jesper Harder
2003-04-10 19:06   ` Matthias Andree
2003-04-10 19:39     ` Matthias Andree
2003-04-10 23:25       ` Reiner Steib
2003-04-10 20:26     ` Jesper Harder
