Gnus development mailing list
From: Katsumi Yamaoka <yamaoka@jpl.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: bugs@gnus.org, ding@gnus.org, emacs-devel@gnu.org
Subject: Re: utf-7 encoding in imap.el is applied to already encoded byte sequences
Date: Thu, 13 Dec 2007 11:15:42 +0900
Message-ID: <b4m63z3l4q9.fsf@jpl.org>
In-Reply-To: <jwvy7bzlosj.fsf-monnier+emacs@gnu.org>

>>>>> Stefan Monnier wrote:

> It seems that the utf-encode call in imap.el is often (always?) applied
> to unibyte data (i.e. streams of bytes, a.k.a already encoded text).

> The reason this is so, is because when reading newsrc.eld, Gnus calls
> mm-string-as-unibyte (lisp/gnus/gnus-start.el:2420).

Gnus saves a newsgroup name in the ~/.newsrc.eld file as a string
like this:

(prin1-to-string (encode-coding-string "NAME" 'CODING-SYSTEM))

If this is an nntp group, it is the news server that actually
encodes the name.  For instance, news.newsfan.net uses gb2312 (or
possibly gbk).  Gnus reads the name over the network and uses it
as-is internally.  The newsgroup name is decoded only when it is
displayed to the user, according to
`gnus-group-name-charset-method-alist' or
`gnus-group-name-charset-group-alist'.  Gnus does the same for groups
based on the other back ends, too.  But please note that Gnus
trunk supports non-ASCII newsgroup names[1] only for the nntp, nnml
(including nnagent), and nnrss back ends.  In those cases, the
encoding of newsgroup names is done by Gnus itself.
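
As a rough analogy for the handling described above (this is not Gnus
code), the following Python sketch keeps a group name as the raw bytes
the server sent and decodes it only for display.  The group name, the
`CHARSET_BY_SERVER` table, and the `display_name` helper are all
hypothetical; the table stands in loosely for
`gnus-group-name-charset-method-alist'.

```python
# Hypothetical stand-in for `gnus-group-name-charset-method-alist':
# maps a server to the charset used to decode its group names for display.
CHARSET_BY_SERVER = {"news.newsfan.net": "gb2312"}


def display_name(server: str, raw_name: bytes) -> str:
    """Decode a raw (still encoded) group name, for display only."""
    charset = CHARSET_BY_SERVER.get(server, "ascii")
    return raw_name.decode(charset, errors="replace")


# Bytes as they might arrive from the server (an invented group name).
raw = "计算机.软件".encode("gb2312")

# Internally the name stays as bytes, so lookups keyed on the raw
# bytes (e.g. in the active data) keep working without any decoding.
active = {raw: "some group data"}
assert isinstance(raw, bytes)
assert raw in active

# Decoding happens only at display time.
print(display_name("news.newsfan.net", raw))
```

Python's `bytes` and `str` play the roles of unibyte and multibyte
strings here; the correspondence is only approximate.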

When reading such an encoded newsgroup name from the ~/.newsrc.eld
file, Gnus uses `read' and `eval' (gnus-start.el:2391).  At one time,
both Emacs trunk and Emacs Unicode-2 read it as a multibyte string,
so it didn't match the one in the active data.  That is why I added
`mm-string-as-unibyte'.  Though it seems to be unnecessary nowadays,
it behaves as a no-op for a unibyte string, doesn't it?

> It's also because Gnus pre-encodes the names when they're read
> from the keyboard in gnus-read-move-group-name (lisp/gnus/gnus-sum.el:11785).

That is because a non-ASCII group name should be an encoded unibyte
string for internal use.  But there should be no non-ASCII names in
the nnimap groups, and pre-encoding doesn't affect those newsgroup
names (am I wrong?).

> I see 3 problems here:
> 1 - The use of mm-string-as-unibyte (I consider any use of
>     string-as-unibyte to be wrong, unless it is accompanied by a comment
>     that explains why it is right).
> 2 - Inconsistent encoding: gnus-sum.el apparently uses utf-8 (at least
>     that's what (gnus-group-name-charset to-method to-newsgroup) returned
>     in my tests, tho maybe it's because of my locale), whereas
>     gnus-start.el uses emacs-mule (implicitly, via mm-string-as-unibyte).
> 3 - imap.el tries to re-encode in utf7 folder names that have already
>     been encoded (with emacs-mule or utf-8).

I'm sorry, I'm ignorant of IMAP and don't know what encoding in
utf-7 is for.  But re-encoding ASCII newsgroup names makes no
difference, does it?  Although I'm not capable of improving
nnimap.el, it might have to decode encoded newsgroup names
according to `gnus-group-name-charset-method-alist' or
`gnus-group-name-charset-group-alist' before re-encoding them in utf-7.
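
To illustrate why decoding first could matter, here is a minimal
Python sketch (again an analogy, not imap.el).  It assumes the
pre-encoded bytes get treated as one character per byte when they
reach the UTF-7 encoder, which is one way the double encoding could
arise.  Python's built-in "utf-7" codec is plain UTF-7, not the
modified variant IMAP uses for mailbox names, and the folder name is
invented.

```python
def encode_folder(name: str) -> bytes:
    """Encode a decoded (text) folder name to UTF-7."""
    return name.encode("utf-7")


name = "日本語"                      # invented non-ASCII folder name
pre_encoded = name.encode("utf-8")   # what Gnus would hold internally

# Correct: encode the text itself.
correct = encode_folder(name)

# Double encoding: each already-encoded byte is taken as a character
# (modelled here with latin-1) and encoded a second time.
double = encode_folder(pre_encoded.decode("latin-1"))

# Suggested fix: decode with the charset the name was encoded in,
# then encode to UTF-7.
fixed = encode_folder(pre_encoded.decode("utf-8"))

assert double != correct
assert fixed == correct

# ASCII names come out the same either way, so they hide the problem.
ascii_name = "INBOX.lists"
assert encode_folder(ascii_name.encode("utf-8").decode("latin-1")) == \
    encode_folder(ascii_name)
```

The last assertion matches the observation that re-encoding ASCII
names makes no difference; only non-ASCII names expose the mismatch.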

[1] (info "(gnus)Non-ASCII Group Names")
