Gnus development mailing list
* nntp servers with multibyte group names?
@ 2018-11-27 19:55 Eric Abrahamsen
  2018-11-27 20:18 ` Adam Sjøgren
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Abrahamsen @ 2018-11-27 19:55 UTC
  To: ding

Hi all,

I'm testing a patch to Gnus to stop coercing group names to unibyte --
ie, to leave group names decoded as much as possible.

I'd like to test this with the nntp backend, but I don't actually know
if nntp is allowed to have multibyte group names. I've gone browsing
through some publicly-accessible nntp servers and haven't found anything
but ascii-compatible group names, even for servers at, eg, Taiwanese
universities.

Are multibyte group names even legal?

Thanks,
Eric





* Re: nntp servers with multibyte group names?
  2018-11-27 19:55 nntp servers with multibyte group names? Eric Abrahamsen
@ 2018-11-27 20:18 ` Adam Sjøgren
  2018-11-27 20:20   ` Eric Abrahamsen
  2018-11-28  0:44   ` Russ Allbery
  0 siblings, 2 replies; 9+ messages in thread
From: Adam Sjøgren @ 2018-11-27 20:18 UTC
  To: ding

Eric writes:

> I'd like to test this with the nntp backend, but I don't actually know
> if nntp is allowed to have multibyte group names. I've gone browsing
> through some publicly-accessible nntp servers and haven't found anything
> but ascii-compatible group names, even for servers at, eg, Taiwanese
> universities.
>
> Are multibyte group names even legal?

According to RFC 3977 (Network News Transfer Protocol (NNTP)):

  "o  Although this specification allows UTF-8 for newsgroup names, they
      SHOULD be restricted to US-ASCII until a successor to RFC 1036
      [RFC1036] standardises another approach. 8-bit encodings SHOULD
      NOT be used because they are likely to cause interoperability
      problems."

     - https://tools.ietf.org/html/rfc3977#section-10


  Best regards,

    Adam

-- 
 "Gav                                                         Adam Sjøgren
  Strik"                                                 asjo@koldfront.dk





* Re: nntp servers with multibyte group names?
  2018-11-27 20:18 ` Adam Sjøgren
@ 2018-11-27 20:20   ` Eric Abrahamsen
  2018-11-27 20:34     ` Adam Sjøgren
  2018-11-28  0:44   ` Russ Allbery
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Abrahamsen @ 2018-11-27 20:20 UTC
  To: ding

Adam Sjøgren <asjo@koldfront.dk> writes:

> Eric writes:
>
>> I'd like to test this with the nntp backend, but I don't actually know
>> if nntp is allowed to have multibyte group names. I've gone browsing
>> through some publicly-accessible nntp servers and haven't found anything
>> but ascii-compatible group names, even for servers at, eg, Taiwanese
>> universities.
>>
>> Are multibyte group names even legal?
>
> According to RFC 3977 (Network News Transfer Protocol (NNTP)):
>
>   "o  Although this specification allows UTF-8 for newsgroup names, they
>       SHOULD be restricted to US-ASCII until a successor to RFC 1036
>       [RFC1036] standardises another approach. 8-bit encodings SHOULD
>       NOT be used because they are likely to cause interoperability
>       problems."
>
>      - https://tools.ietf.org/html/rfc3977#section-10

Interesting! Thanks for that (I guess I could have looked that up
myself). I will assume for now that group names are coming in as ascii,
but maybe leave a note in the code.

Eric





* Re: nntp servers with multibyte group names?
  2018-11-27 20:20   ` Eric Abrahamsen
@ 2018-11-27 20:34     ` Adam Sjøgren
  2018-11-27 20:54       ` Eric Abrahamsen
  0 siblings, 1 reply; 9+ messages in thread
From: Adam Sjøgren @ 2018-11-27 20:34 UTC
  To: ding

Eric writes:

>>   "o  Although this specification allows UTF-8 for newsgroup names, they
>>       SHOULD be restricted to US-ASCII until a successor to RFC 1036
>>       [RFC1036] standardises another approach. 8-bit encodings SHOULD
>>       NOT be used because they are likely to cause interoperability
>>       problems."
>>
>>      - https://tools.ietf.org/html/rfc3977#section-10
>
> Interesting! Thanks for that (I guess I could have looked that up
> myself).

I was curious to find the answer, as I apparently have a hobby of
implementing an nntp-interface on average every half a decade :-)

> I will assume for now that group names are coming in as ascii, but
> maybe leave a note in the code.

Why not assume utf-8 (of which ascii is a subset, I believe)?
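The subset relationship can be checked directly. This is a minimal Python sketch, not Gnus code; the group names and the helper `is_ascii_only` are made up for illustration:

```python
# Hypothetical group names, used only for illustration.
ascii_name = b"comp.lang.lisp"
utf8_name = "dk.test.smørrebrød".encode("utf-8")


def is_ascii_only(raw: bytes) -> bool:
    """Return True if every byte is in the 7-bit US-ASCII range."""
    return all(byte < 0x80 for byte in raw)


# Pure-ASCII bytes decode to the same text under either codec, so a
# decoder that assumes UTF-8 also handles ASCII-only group names.
assert ascii_name.decode("ascii") == ascii_name.decode("utf-8")

print(is_ascii_only(ascii_name))  # True
print(is_ascii_only(utf8_name))   # False
```

So assuming UTF-8 should cost nothing for the ASCII-only names that servers send today.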


  Best regards,

    Adam

-- 
 "Sadly, these days, if you know the difference               Adam Sjøgren
  between a phillips- and a flat head screwdriver,       asjo@koldfront.dk
  you're a renaissance man."





* Re: nntp servers with multibyte group names?
  2018-11-27 20:34     ` Adam Sjøgren
@ 2018-11-27 20:54       ` Eric Abrahamsen
  2018-11-27 21:02         ` Adam Sjøgren
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Abrahamsen @ 2018-11-27 20:54 UTC
  To: ding

Adam Sjøgren <asjo@koldfront.dk> writes:

> Eric writes:
>
>>>   "o  Although this specification allows UTF-8 for newsgroup names, they
>>>       SHOULD be restricted to US-ASCII until a successor to RFC 1036
>>>       [RFC1036] standardises another approach. 8-bit encodings SHOULD
>>>       NOT be used because they are likely to cause interoperability
>>>       problems."
>>>
>>>      - https://tools.ietf.org/html/rfc3977#section-10
>>
>> Interesting! Thanks for that (I guess I could have looked that up
>> myself).
>
> I was curious to find the answer, as I apparently have a hobby of
> implementing an nntp-interface on average every half a decade :-)

Sounds like... fun. But it also sounds like you might be able to answer
some of my questions!

>> I will assume for now that group names are coming in as ascii, but
>> maybe leave a note in the code.
>
> Why not assume utf-8 (of which ascii is a subset, I believe)?

I am miserable at encoding-related programming, it's something I've
never really had to deal with, and I don't grok it.

For instance:

`nntp-make-process-buffer' disables multibyte, then
`nntp-open-connection' sets both `coding-system-for-read' and
`coding-system-for-write' to 'binary. This makes sense: everything is
unibyte.

If I leave multibyte enabled, then set both coding-systems to 'utf-8, will
that just magically work? Doesn't it depend to some extent on what the
remote server is sending over the wire? Do I need to negotiate with the
server somehow, or will Emacs have already taken care of that for me, in
`open-network-stream'?

(This is also something I should be asking on emacs.help, I guess.)

Thanks for any light you're willing to shed!

Eric





* Re: nntp servers with multibyte group names?
  2018-11-27 20:54       ` Eric Abrahamsen
@ 2018-11-27 21:02         ` Adam Sjøgren
  2018-11-27 21:07           ` Eric Abrahamsen
  0 siblings, 1 reply; 9+ messages in thread
From: Adam Sjøgren @ 2018-11-27 21:02 UTC
  To: ding

Eric writes:

>> I was curious to find the answer, as I apparently have a hobby of
>> implementing an nntp-interface on average every half a decade :-)
>
> Sounds like... fun. But it also sounds like you might be able to answer
> some of my questions!

It is (I like using nntp for e.g. RSS/Atom feeds, and the nntp-model
fits a surprising number of things)! But I have just been either ignoring
the problem or assuming that everything is utf-8.

>> Why not assume utf-8 (of which ascii is a subset, I believe)?
>
> I am miserable at encoding-related programming, it's something I've
> never really had to deal with, and I don't grok it.

I don't understand how encoding works in Emacs at all, so I will have to
defer to someone/where who does.

Having an ø in my last name used to be a challenge. These days things
are converging on "utf-8 or broken" as far as I can tell.

> Thanks for any light you're willing to shed!

Sorry I can't be of any help...


  Best regards,

    Adam

-- 
 "Though this doesn't mean changing my sense of               Adam Sjøgren
  beauty. I.e., I don't like it."                        asjo@koldfront.dk





* Re: nntp servers with multibyte group names?
  2018-11-27 21:02         ` Adam Sjøgren
@ 2018-11-27 21:07           ` Eric Abrahamsen
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Abrahamsen @ 2018-11-27 21:07 UTC
  To: ding

Adam Sjøgren <asjo@koldfront.dk> writes:

> Eric writes:
>
>>> I was curious to find the answer, as I apparently have a hobby of
>>> implementing an nntp-interface on average every half a decade :-)
>>
>> Sounds like... fun. But it also sounds like you might be able to answer
>> some of my questions!
>
> It is (I like using nntp for e.g. RSS/Atom feeds, and the nntp-model
> fits a surprising number of things)! But I have just been either ignoring
> the problem or assuming that everything is utf-8.
>
>>> Why not assume utf-8 (of which ascii is a subset, I believe)?
>>
>> I am miserable at encoding-related programming, it's something I've
>> never really had to deal with, and I don't grok it.
>
> I don't understand how encoding works in Emacs at all, so I will have to
> defer to someone/where who does.
>
> Having an ø in my last name used to be a challenge. These days things
> are converging on "utf-8 or broken" as far as I can tell.
>
>> Thanks for any light you're willing to shed!
>
> Sorry I can't be of any help...

No worries, I really should just float this on emacs.help...

Thanks,
Eric





* Re: nntp servers with multibyte group names?
  2018-11-27 20:18 ` Adam Sjøgren
  2018-11-27 20:20   ` Eric Abrahamsen
@ 2018-11-28  0:44   ` Russ Allbery
  2018-11-28  1:03     ` Eric Abrahamsen
  1 sibling, 1 reply; 9+ messages in thread
From: Russ Allbery @ 2018-11-28  0:44 UTC
  To: ding

Adam Sjøgren <asjo@koldfront.dk> writes:

> According to RFC 3977 (Network News Transfer Protocol (NNTP)):

>   "o  Although this specification allows UTF-8 for newsgroup names, they
>       SHOULD be restricted to US-ASCII until a successor to RFC 1036
>       [RFC1036] standardises another approach. 8-bit encodings SHOULD
>       NOT be used because they are likely to cause interoperability
>       problems."

>      - https://tools.ietf.org/html/rfc3977#section-10

For a bit of background here, non-ASCII newsgroup names mostly work, and
are even used in some areas, but we saw a few instances of strange
behavior in some experiments.  However, putting raw UTF-8 directly into
the Newsgroups header breaks compatibility with RFC 5322 (email), which
prohibits non-ASCII characters in headers.

Email would say that you should MIME-encode those names, but that will
definitely break all Usenet software, which assumes that Newsgroups are
byte strings that don't require any further interpretation.  (And some of
the encoding characters are invalid in newsgroup names, I believe.)
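To make the mismatch concrete, here is a small Python sketch of what MIME (RFC 2047) encoding does to a non-ASCII name. The group name and the helper `mime_encode_group_name` are hypothetical; only the standard `email.header` module is used:

```python
from email.header import Header, decode_header


def mime_encode_group_name(name: str) -> str:
    """Return the RFC 2047 encoded-word form of NAME, as email would use
    for a non-ASCII header value."""
    return Header(name, charset="utf-8").encode()


# Hypothetical non-ASCII group name, for illustration only.
name = "dk.test.smørrebrød"
encoded = mime_encode_group_name(name)
print(encoded)  # an ASCII encoded-word delimited by "=?" and "?="

# Software that treats group names as opaque byte strings sees the
# encoded form as a different name from the raw UTF-8 bytes.
assert encoded.encode("ascii") != name.encode("utf-8")

# Only software that knows to MIME-decode can get the name back.
raw, charset = decode_header(encoded)[0]
assert raw.decode(charset) == name
```

The "=" and "?" delimiters in the encoded form are the sort of characters meant above.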

We weren't able to find a good reconciliation of that conflict before the
IETF working group ran out of steam.

So you can probably just use raw UTF-8 directly in newsgroup names with a
local server, but expect some strangeness with some clients, and you are
(for whatever it's worth) breaking compatibility with the email standards
by doing so.

-- 
Russ Allbery (eagle@eyrie.org)              <http://www.eyrie.org/~eagle/>




* Re: nntp servers with multibyte group names?
  2018-11-28  0:44   ` Russ Allbery
@ 2018-11-28  1:03     ` Eric Abrahamsen
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Abrahamsen @ 2018-11-28  1:03 UTC
  To: ding

Russ Allbery <eagle@eyrie.org> writes:

> Adam Sjøgren <asjo@koldfront.dk> writes:
>
>> According to RFC 3977 (Network News Transfer Protocol (NNTP)):
>
>>   "o  Although this specification allows UTF-8 for newsgroup names, they
>>       SHOULD be restricted to US-ASCII until a successor to RFC 1036
>>       [RFC1036] standardises another approach. 8-bit encodings SHOULD
>>       NOT be used because they are likely to cause interoperability
>>       problems."
>
>>      - https://tools.ietf.org/html/rfc3977#section-10
>
> For a bit of background here, non-ASCII newsgroup names mostly work, and
> are even used in some areas, but we saw a few instances of strange
> behavior in some experiments.  However, putting raw UTF-8 directly into
> the Newsgroups header breaks compatibility with RFC 5322 (email), which
> prohibits non-ASCII characters in headers.
>
> Email would say that you should MIME-encode those names, but that will
> definitely break all Usenet software, which assumes that Newsgroups are
> byte strings that don't require any further interpretation.  (And some of
> the encoding characters are invalid in newsgroup names, I believe.)
>
> We weren't able to find a good reconciliation of that conflict before the
> IETF working group ran out of steam.
>
> So you can probably just use raw UTF-8 directly in newsgroup names with a
> local server, but expect some strangeness with some clients, and you are
> (for whatever it's worth) breaking compatibility with the email standards
> by doing so.

Thanks very much for this, Russ -- this is good background.

Since Gnus is only a client, we're not in a position to decide about the
encoding of NNTP group names (thankfully); we only need to decide how to
accept and handle such names as we receive from a server. As Stefan
Monnier pointed out in answer to a separate question on emacs.help, a
single NNTP connection will likely carry several different text
encodings, so Gnus should still be running the network connection in
binary mode. I'm going to leave the majority of the code as-is, and make
the smallest change to group-name decoding I can.
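A minimal sketch of that idea, in Python rather than Emacs Lisp and not taken from Gnus: the connection hands over raw bytes, and only the group-name field gets decoded. The names `ActiveEntry`, `decode_group_name` and `parse_active_line` are hypothetical, the input is assumed to look like a "name high low status" line from a LIST ACTIVE response, and the Latin-1 fallback is an assumption made for the example, not something any standard requires:

```python
from typing import NamedTuple


class ActiveEntry(NamedTuple):
    name: str    # decoded group name
    high: int    # reported high water mark
    low: int     # reported low water mark
    status: str  # posting status field


def decode_group_name(raw: bytes) -> str:
    """Decode a group name, preferring UTF-8 (which covers ASCII)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: Latin-1 maps every byte to a character,
        # so the original bytes can still be recovered later.
        return raw.decode("latin-1")


def parse_active_line(line: bytes) -> ActiveEntry:
    """Parse one raw line; everything stays bytes until it is split."""
    name, high, low, status = line.split()
    return ActiveEntry(decode_group_name(name), int(high), int(low),
                       status.decode("ascii"))


# Hypothetical server line, as it would arrive on a binary connection.
line = "dk.test.smørrebrød 0000000042 0000000001 y\r\n".encode("utf-8")
print(parse_active_line(line).name)  # dk.test.smørrebrød
```

Keeping the decode step in one small function is what makes this a small change: the transport and the rest of the parsing never see anything but bytes.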

Thanks again,
Eric



