Gnus development mailing list
* Latin 1 in non-MIME news postings?
@ 1998-09-02  7:33 Kai Grossjohann
  1998-09-02  9:49 ` Jost Krieger
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Kai Grossjohann @ 1998-09-02  7:33 UTC


I just read a news article with Latin 1 characters in it with pGnus
0.13 (Emacs 20.3).  Latin 1 characters were displayed as \888 octal
escapes.  The news article didn't have any MIME headers.

Is this the correct behavior?  I have (set-language-environment
"Latin-1") in my init files.
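For readers who have not seen the symptom, here is a minimal Python sketch of a display that leaves 8-bit bytes undecoded. It is an illustration only, not how Emacs or Gnus actually renders text. Each byte at or above 0x80 is shown as a three-digit octal escape, so a Latin 1 "é" (byte 0xE9) comes out as \351. The function name `show_undecoded` is hypothetical.

```python
def show_undecoded(raw: bytes) -> str:
    """Render bytes the way an undecoding display might: 7-bit bytes pass
    through unchanged, bytes >= 0x80 become backslash-octal escapes."""
    out = []
    for b in raw:
        if b < 0x80:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)  # e.g. 0xE9 -> \351
    return "".join(out)


raw = "café".encode("latin-1")  # b'caf\xe9'
print(show_undecoded(raw))  # caf\351
print(raw.decode("latin-1"))  # café
```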

kai
-- 
OOP: object oriented programming;  OOPS: object oriented mistakes



* Re: Latin 1 in non-MIME news postings?
  1998-09-02  7:33 Latin 1 in non-MIME news postings? Kai Grossjohann
@ 1998-09-02  9:49 ` Jost Krieger
  1998-09-02 11:24   ` Kai Grossjohann
  1998-09-03 10:27   ` Hrvoje Niksic
  1998-09-02 10:42 ` jean-luc cassel
  1998-09-02 12:25 ` Lars Magne Ingebrigtsen
  2 siblings, 2 replies; 12+ messages in thread
From: Jost Krieger @ 1998-09-02  9:49 UTC


>>>>> "Kai" == Kai Grossjohann <grossjohann@amaunet.cs.uni-dortmund.de> writes:

 > I just read a news article with Latin 1 characters in it with pGnus
 > 0.13 (Emacs 20.3).  Latin 1 characters were displayed as \888 octal
 > escapes.  The news article didn't have any MIME headers.

So how should Gnus know they are Latin 1 characters?

 > Is this the correct behavior?  I have (set-language-environment
 > "Latin-1") in my init files.

That might be a hint. On the other hand, those just-send-8-bit people
should stand out like a sore thumb.
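The observation that unlabelled 8-bit articles are easy to spot can be sketched with Python's standard `email` package. This assumes a single-part article and treats a missing MIME-Version header as "no charset declared"; the function name `is_unlabelled_8bit` is hypothetical, and this is not what Gnus does internally.

```python
from email import message_from_bytes


def is_unlabelled_8bit(article: bytes) -> bool:
    """True if a single-part article body contains 8-bit bytes while the
    headers carry no MIME-Version field, so no charset is declared."""
    msg = message_from_bytes(article)
    # decode=True returns the body as bytes (None for multipart messages).
    body = msg.get_payload(decode=True) or b""
    has_8bit = any(b >= 0x80 for b in body)
    return has_8bit and msg["MIME-Version"] is None


plain = b"From: someone@example.org\nSubject: test\n\ncaf\xe9\n"
print(is_unlabelled_8bit(plain))
```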

Jost

-- 
| Jost.Krieger@ruhr-uni-bochum.de      Please help stamp out spam! |
| Postmaster, JAPH, resident answer machine          am RZ der RUB |



* Re: Latin 1 in non-MIME news postings?
  1998-09-02  7:33 Latin 1 in non-MIME news postings? Kai Grossjohann
  1998-09-02  9:49 ` Jost Krieger
@ 1998-09-02 10:42 ` jean-luc cassel
  1998-09-02 12:25 ` Lars Magne Ingebrigtsen
  2 siblings, 0 replies; 12+ messages in thread
From: jean-luc cassel @ 1998-09-02 10:42 UTC


/ Kai Grossjohann <grossjohann@amaunet.cs.uni-dortmund.de> :

> I just read a news article with Latin 1 characters in it with pGnus
> 0.13 (Emacs 20.3).  Latin 1 characters were displayed as \888 octal
> escapes.  The news article didn't have any MIME headers.
> 
> Is this the correct behavior?  I have (set-language-environment
> "Latin-1") in my init files.

[I'm French.] With pGnus 0.7 and Emacs 20.2 I have no problem, even
without MIME headers, with only this in my .emacs:
(standard-display-european 1) [and (setq gnus-strict-mime t)]




* Re: Latin 1 in non-MIME news postings?
  1998-09-02  9:49 ` Jost Krieger
@ 1998-09-02 11:24   ` Kai Grossjohann
  1998-09-03 10:15     ` Russ Allbery
  1998-09-03 10:27   ` Hrvoje Niksic
  1 sibling, 1 reply; 12+ messages in thread
From: Kai Grossjohann @ 1998-09-02 11:24 UTC
  Cc: ding

>>>>> On 02 Sep 1998, Jost Krieger said:

  Jost> So how should gnus know they are Latin1 characters ?

Hm.  Maybe I confused HTML with news here.  The HTML standard says
documents are Latin 1 by default.  I thought the news RFC also
specified Latin 1 as the default charset?  Who knows more?

kai
-- 
OOP: object oriented programming;  OOPS: object oriented mistakes



* Re: Latin 1 in non-MIME news postings?
  1998-09-02  7:33 Latin 1 in non-MIME news postings? Kai Grossjohann
  1998-09-02  9:49 ` Jost Krieger
  1998-09-02 10:42 ` jean-luc cassel
@ 1998-09-02 12:25 ` Lars Magne Ingebrigtsen
  1998-09-07 20:15   ` Kai Grossjohann
  2 siblings, 1 reply; 12+ messages in thread
From: Lars Magne Ingebrigtsen @ 1998-09-02 12:25 UTC


Kai Grossjohann <grossjohann@amaunet.cs.uni-dortmund.de> writes:

> I just read a news article with Latin 1 characters in it with pGnus
> 0.13 (Emacs 20.3).  Latin 1 characters were displayed as \888 octal
> escapes.  The news article didn't have any MIME headers.

Do you get that even if you `C-u g' the article to avoid any decoding
on Gnus' part?

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen



* Re: Latin 1 in non-MIME news postings?
  1998-09-02 11:24   ` Kai Grossjohann
@ 1998-09-03 10:15     ` Russ Allbery
       [not found]       ` <x7soi9yx8s.fsf@peorth.gweep.net>
  0 siblings, 1 reply; 12+ messages in thread
From: Russ Allbery @ 1998-09-03 10:15 UTC


Kai Grossjohann <grossjohann@amaunet.cs.uni-dortmund.de> writes:
>>>>>> On 02 Sep 1998, Jost Krieger said:

>   Jost> So how should gnus know they are Latin1 characters ?

> Hm.  Maybe I confused HTML with news, here.  In the HTML standard, it
> says the document is Latin 1.  I thought the news RFC also specified
> Latin 1 as the default charset?  Who knows more?

The news standard says the article body format follows RFC 822, which
specifies 7bit ASCII.  Technically, MIME isn't even legal in news.

In practice, most people use MIME or just send 8bit.

It looks likely that the new news RFC will specify UTF-8 as a default but
strongly encourage use of MIME charset tagging.
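One way to read "encourage MIME charset tagging" on the reader side, sketched in Python under stated assumptions: honour the charset parameter of Content-Type when present, otherwise fall back to a configurable default. The name `body_text` and the ISO 8859-1 default are assumptions for illustration, not something any RFC or Gnus release prescribes.

```python
from email import message_from_bytes


def body_text(article: bytes, default: str = "iso-8859-1") -> str:
    """Decode a single-part body with its declared MIME charset, falling
    back to `default` when no charset is declared (assumed policy)."""
    msg = message_from_bytes(article)
    raw = msg.get_payload(decode=True) or b""
    charset = msg.get_content_charset(default)  # lower-cased, or `default`
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong label: degrade gracefully instead of failing.
        return raw.decode(default, errors="replace")
```

The fallback is deliberately a parameter: which default a reader should assume for untagged articles is exactly the policy question this thread is arguing about.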

-- 
Russ Allbery (rra@stanford.edu)         <URL:http://www.eyrie.org/~eagle/>



* Re: Latin 1 in non-MIME news postings?
  1998-09-02  9:49 ` Jost Krieger
  1998-09-02 11:24   ` Kai Grossjohann
@ 1998-09-03 10:27   ` Hrvoje Niksic
  1 sibling, 0 replies; 12+ messages in thread
From: Hrvoje Niksic @ 1998-09-03 10:27 UTC


Jost Krieger <Jost.Krieger@ruhr-uni-bochum.de> writes:

> >>>>> "Kai" == Kai Grossjohann <grossjohann@amaunet.cs.uni-dortmund.de> writes:
> 
>  > I just read a news article with Latin 1 characters in it with pGnus
>  > 0.13 (Emacs 20.3).  Latin 1 characters were displayed as \888 octal
>  > escapes.  The news article didn't have any MIME headers.
> 
> So how should gnus know they are Latin1 characters ?

It's a good idea to assume Latin1 when nothing is specified.  The
assumption does no harm and gets things right in most cases.
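The "does no harm" part of this argument rests on a property of ISO 8859-1 that a few lines of Python can check: every byte value maps to exactly one character, so decoding never fails and the original bytes can always be recovered. The helper name `decode_assuming_latin1` is hypothetical. Note what the sketch does not show: if an article is really in some other 8-bit charset, such as KOI8-R, the characters will still display wrongly, even though no bytes are lost.

```python
def decode_assuming_latin1(raw: bytes) -> str:
    # ISO 8859-1 maps each byte value 0-255 to the code point with the same
    # number, so this decode cannot raise and loses no information.
    return raw.decode("latin-1")


every_byte = bytes(range(256))
text = decode_assuming_latin1(every_byte)
assert len(text) == 256
assert text.encode("latin-1") == every_byte  # lossless round trip
```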

-- 
Hrvoje Niksic <hniksic@srce.hr> | Student at FER Zagreb, Croatia
--------------------------------+--------------------------------
Those who like sausages, laws, and standards are well advised not to
learn how they are made.



* Re: Latin 1 in non-MIME news postings?
       [not found]       ` <x7soi9yx8s.fsf@peorth.gweep.net>
@ 1998-09-03 16:47         ` Russ Allbery
       [not found]           ` <x7k93lw2lm.fsf@peorth.gweep.net>
  1998-09-03 17:35         ` Karl Kleinpaste
  1 sibling, 1 reply; 12+ messages in thread
From: Russ Allbery @ 1998-09-03 16:47 UTC


Stainless Steel Rat <ratinox@peorth.gweep.net> writes:
> "RA" == Russ Allbery <rra@stanford.edu> writes:

> RA> News specifies article body format follows RFC 822, which specifies
> RA> 7bit ASCII.  Technically, MIME isn't even legal in news.

> Wait.

> MIME in and of itself sits on top of RFC 822.  MIME specifies that 8-bit
> data be encoded into a 7-bit format, usually base64.

> Recent incarnations of SMTP allow for 8-bit data over 8-bit clean
> networks between 8-bit clean MTAs, (ab)using aspects of MIME to
> accomplish this.

> Please do not confuse the two.

How am I confusing the two?

News specifies, by proxy, 7bit ASCII in article bodies.  MIME, whether
7bit or 8bit or what have you, technically does not apply to news and
means nothing in news, since the news standards (although somewhat obscure
on this point) seem to indicate that they do not adopt 822 extensions.

Therefore there is technically no standards-compliant way to send 8bit
data of any sort, even ISO 8859-1 characters, across Usenet, even in
encoded form.  As far as news is concerned, base64 is just a stream of
characters like any other 7bit ASCII body, and you cannot reliably apply
any particular interpretation to the headers that claim otherwise.
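The point here is that a base64 body is, byte for byte, ordinary 7-bit ASCII, so only the headers give it meaning. A short Python sketch makes that concrete; the sample text is an arbitrary assumption.

```python
import base64

body = "Grüße aus Bochum".encode("latin-1")
encoded = base64.b64encode(body)

# The encoded body is ordinary 7-bit ASCII, so a 7bit-only transport carries it...
assert all(b < 0x80 for b in encoded)
# ...but nothing in these bytes says "base64" or "ISO 8859-1"; only the MIME
# headers do, and a reader that ignores them just sees a line of letters.
print(encoded.decode("ascii"))

# Undoing both layers requires trusting those headers.
assert base64.b64decode(encoded).decode("latin-1") == "Grüße aus Bochum"
```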

Obviously this is widely ignored in practice.

-- 
Russ Allbery (rra@stanford.edu)         <URL:http://www.eyrie.org/~eagle/>



* Re: Latin 1 in non-MIME news postings?
       [not found]           ` <x7k93lw2lm.fsf@peorth.gweep.net>
@ 1998-09-03 17:22             ` Russ Allbery
  1998-09-03 20:48             ` Richard Coleman
  1 sibling, 0 replies; 12+ messages in thread
From: Russ Allbery @ 1998-09-03 17:22 UTC


Stainless Steel Rat <ratinox@peorth.gweep.net> writes:
> "RA" == Russ Allbery <rra@stanford.edu> writes:

> RA> How am I confusing the two?

> Vanilla MIME is 100% compliant with RFC 822.  8-bit data is *NOT*
> allowed in a MIME message; it must be encoded into a 7-bit format.  MIME
> is completely legal in news.

Yes, but legal is not what I'm talking about.  It's completely legal to
send MIME-encoded data in news.  However, the headers that tell you that
it's MIME-encoded data don't mean anything in news, and therefore it's
technically not legal to make assumptions based on their content.

Like, say, decoding articles.

It's a minor pedantic point, yes, but it's one of the things that annoys
me about RFC 1036.  Probably one of the more minor ones.

-- 
Russ Allbery (rra@stanford.edu)         <URL:http://www.eyrie.org/~eagle/>



* Re: Latin 1 in non-MIME news postings?
       [not found]       ` <x7soi9yx8s.fsf@peorth.gweep.net>
  1998-09-03 16:47         ` Russ Allbery
@ 1998-09-03 17:35         ` Karl Kleinpaste
  1 sibling, 0 replies; 12+ messages in thread
From: Karl Kleinpaste @ 1998-09-03 17:35 UTC


Stainless Steel Rat <ratinox@peorth.gweep.net> writes:
> MIME in and of itself sits on top of RFC 822.  MIME specifies that 8-bit
> data be encoded into a 7-bit format, usually base64.

> Recent incarnations of SMTP allow for 8-bit data over 8-bit clean networks
> between 8-bit clean MTAs, (ab)using aspects of MIME to accomplish this.

And later writes:
> Vanilla MIME is 100% compliant with RFC 822.  8-bit data is *NOT* allowed
> in a MIME message; it must be encoded into a 7-bit format.

Nonsense, as even a minimal review of the RFCs shows.

MIME specifies no such thing as a 7bit encoding requirement.  "Vanilla
MIME" is explicitly an extension of RFC822 beyond its original
intended domain.  MIME specifies that 8bit data has the identity
transformation when "Content-Transfer-Encoding: 8bit" is present.
MIME sits comfortably atop both RFC822 format and RFC821 transport, as
modified by later RFCs for 8bit data passage (e.g., RFC1652, RFC2045).
Those RFCs specifically state that such formats and transports have
been redefined and extended, so it is by no means "(ab)using aspects
of MIME" to do so.

For the pedantic, relevant RFC citations follow.

--karl

RFC 2045, _MIME Part 1_, Format of Internet Message Bodies, page 1,
Abstract:

   ...This set of
   documents, collectively called the Multipurpose Internet Mail
   Extensions, or MIME, redefines the format of messages to allow for
                        ^^^^^^^^^
    (1)   textual message bodies in character sets other than
          US-ASCII...

   ...Because RFC 822 said
   so little about message bodies, these documents are largely
   orthogonal to (rather than a revision of) RFC 822.

Page 3, Introduction:

   One of the notable limitations of RFC 821/822 based mail systems is
   the fact that they limit the contents of electronic mail messages to
   relatively short lines (e.g. 1000 characters or less [RFC-821]) of
   7bit US-ASCII.  This forces users to convert any non-textual data
   that they may wish to send into seven-bit bytes representable as
   printable US-ASCII characters...

Page 4:

   This document describes several mechanisms that combine to solve most
   of these problems...

    (3)   A Content-Transfer-Encoding header field, which can be
          used to specify both the encoding transformation that
          was applied to the body and the domain of the result.
          Encoding transformations other than the identity
          transformation are usually applied to data in order to
          allow it to pass through mail transport mechanisms
          which may have data or character set limitations.

Page 14, Content-Transfer-Encoding Header Field:

   Many media types which could be usefully transported via email are
   represented, in their "natural" format, as 8bit character or binary
   data.  Such data cannot be transmitted over some transfer protocols...

   ...Proper labelling
   of unencoded material in less restrictive formats for direct use over
   less restrictive transports is also desireable.  This document
   specifies that such encodings will be indicated by a new "Content-
   Transfer-Encoding" header field.  This field has not been defined by
   any previous standard.

Pages 15-16, Content-Transfer-Encoding Semantics:

   Three transformations are currently defined: identity, the "quoted-
   printable" encoding, and the "base64" encoding.  The domains are
   "binary", "8bit" and "7bit".

   The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all
   mean that the identity (i.e. NO) encoding transformation has been
   performed.  As such, they serve simply as indicators of the domain of
   the body data...

   ...[E]stablishing
   only a single transformation into the "7bit" domain does not seem
   possible.

8bit data is perfectly legal, as-is, in a MIME context
(i.e. identified as such), when riding through an RFC1652 8bit
transport.
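The reading of RFC 2045 above can be summarised as a small decoder, sketched here in Python under the assumption of a single-part body whose Content-Transfer-Encoding value has already been extracted. The three identity values leave the bytes untouched; only quoted-printable and base64 are actual transformations. A missing header is treated as "7bit", which is RFC 2045's default. The function name `undo_transfer_encoding` is hypothetical.

```python
import base64
import quopri

# Per RFC 2045: "7bit", "8bit" and "binary" label the domain of the data and
# imply the identity transformation; only quoted-printable and base64
# actually transform the body.
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}


def undo_transfer_encoding(body: bytes, cte: str = "7bit") -> bytes:
    cte = cte.strip().lower()
    if cte in IDENTITY_ENCODINGS:
        return body  # nothing to undo: the bytes are already the data
    if cte == "quoted-printable":
        return quopri.decodestring(body)
    if cte == "base64":
        return base64.b64decode(body)
    raise ValueError(f"unknown Content-Transfer-Encoding: {cte!r}")
```

The design point is that "8bit" is a label, not an encoding: a transport that is 8-bit clean (for example SMTP with the RFC 1652 extension) can carry such a body as-is.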



* Re: Latin 1 in non-MIME news postings?
  1998-09-02 12:25 ` Lars Magne Ingebrigtsen
@ 1998-09-07 20:15   ` Kai Grossjohann
  0 siblings, 0 replies; 12+ messages in thread
From: Kai Grossjohann @ 1998-09-07 20:15 UTC


>>>>> Kai Grossjohann <grossjohann@amaunet.cs.uni-dortmund.de> writes:

  Kai> I just read a news article with Latin 1 characters in it with
  Kai> pGnus 0.13 (Emacs 20.3).  Latin 1 characters were displayed as
  Kai> \888 octal escapes.  The news article didn't have any MIME
  Kai> headers.

>>>>> On 02 Sep 1998, Lars Magne Ingebrigtsen said:

  Lars> Do you get that even if you `C-u g' the article to avoid any
  Lars> decoding on Gnus' part?

Dunno.  The problem seems to have disappeared between 0.13 and 0.17.
I was traveling the past few days, so the point is moot now, I guess.

kai
-- 
OOP: object oriented programming;  OOPS: object oriented mistakes



* Re: Latin 1 in non-MIME news postings?
       [not found]           ` <x7k93lw2lm.fsf@peorth.gweep.net>
  1998-09-03 17:22             ` Russ Allbery
@ 1998-09-03 20:48             ` Richard Coleman
  1 sibling, 0 replies; 12+ messages in thread
From: Richard Coleman @ 1998-09-03 20:48 UTC


> Vanilla MIME is 100% compliant with RFC 822.  8-bit data is *NOT* allowed
> in a MIME message; it must be encoded into a 7-bit format.  MIME is
> completely legal in news.
> 
> SMTP added the '8bit' transfer type.  This is not a MIME standard type...
> that is, it is a standard MIME type for SMTP, not for MIME.  That being the
> case, 8bit is valid *ONLY* for SMTP traffic.

This is not correct.  RFC2045 clearly defines the type "8bit" as a valid
Content-Transfer-Encoding.  Here is one of the relevant paragraphs from
RFC2045:

   The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all
   mean that the identity (i.e. NO) encoding transformation has been
   performed.  As such, they serve simply as indicators of the domain of
   the body data, and provide useful information about the sort of
   encoding that might be needed for transmission in a given transport
   system.  The terms "7bit data", "8bit data", and "binary data" are
   all defined in Section 2.

The MIME standard does not force 7bit transport.

--
Richard Coleman
coleman@math.gatech.edu



Thread overview: 12+ messages
1998-09-02  7:33 Latin 1 in non-MIME news postings? Kai Grossjohann
1998-09-02  9:49 ` Jost Krieger
1998-09-02 11:24   ` Kai Grossjohann
1998-09-03 10:15     ` Russ Allbery
     [not found]       ` <x7soi9yx8s.fsf@peorth.gweep.net>
1998-09-03 16:47         ` Russ Allbery
     [not found]           ` <x7k93lw2lm.fsf@peorth.gweep.net>
1998-09-03 17:22             ` Russ Allbery
1998-09-03 20:48             ` Richard Coleman
1998-09-03 17:35         ` Karl Kleinpaste
1998-09-03 10:27   ` Hrvoje Niksic
1998-09-02 10:42 ` jean-luc cassel
1998-09-02 12:25 ` Lars Magne Ingebrigtsen
1998-09-07 20:15   ` Kai Grossjohann
