Gnus development mailing list
From: Kenichi Handa <handa@ni.aist.go.jp>
To: Katsumi Yamaoka <yamaoka@jpl.org>
Cc: ding@gnus.org, emacs-devel@gnu.org
Subject: Re: [Unicode-2] `read' always returns multibyte symbol
Date: Tue, 13 Nov 2007 21:55:44 +0900
Message-ID: <E1IrvIu-0000ku-CD@etlken.m17n.org>
In-Reply-To: <b4moddycwjv.fsf@jpl.org> (message from Katsumi Yamaoka on Tue, 13 Nov 2007 18:41:08 +0900)

In article <b4moddycwjv.fsf@jpl.org>, Katsumi Yamaoka <yamaoka@jpl.org> writes:

> The following Lisp snippet emulates what Gnus does when reading
> active data for the local.テスト newsgroup.  The buffer contains
> data which have been retrieved from the nntp server.  Note that
> the newsgroup name contains non-ASCII characters, which has been
> encoded by utf-8 in the server.

> --8<---------------cut here---------------start------------->8---
> (let ((string (encode-coding-string "local.テスト" 'utf-8)))
>   (with-temp-buffer
>     (set-buffer-multibyte t)
>     (insert (string-to-multibyte string))
>     (goto-char (point-min))
>     (multibyte-string-p (symbol-name (read (current-buffer))))))
> --8<---------------cut here---------------end--------------->8---

> While Emacs trunk returns nil for this, Emacs Unicode-2 returns t.

That is because `read' decides whether the name is unibyte or
multibyte by checking whether the name is a valid multibyte
sequence.  In the trunk, a utf-8 byte sequence is not a valid
multibyte sequence, but in emacs-unicode-2, it is.
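
To make the distinction concrete, here is a small sketch of the three
string forms involved: the unibyte bytes from the server, the same
bytes converted with `string-to-multibyte' (as in the quoted snippet),
and the decoded text.  The function name `my-name-representations' is
made up for illustration and is not part of Emacs or Gnus.

```elisp
;; Illustrative only; `my-name-representations' is a hypothetical name.
(defun my-name-representations ()
  "Return multibyte flags and lengths for three forms of a group name."
  (let* (;; Unibyte string holding the raw utf-8 bytes.
         (bytes (encode-coding-string "local.テスト" 'utf-8))
         ;; Multibyte string; each non-ASCII byte is kept as a raw 8-bit char.
         (raw (string-to-multibyte bytes))
         ;; Multibyte string holding the decoded characters.
         (text (decode-coding-string bytes 'utf-8)))
    (list (multibyte-string-p bytes)
          (multibyte-string-p raw)
          (multibyte-string-p text)
          (length bytes)
          (length text))))
```

Only the first form is unibyte; the other two are multibyte strings
even though they stand for different things.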

> If it is not intentional, I hope `read' behaves just like it does
> in Emacs trunk.

The relevant code for `read' is very complicated and I want
to avoid touching it if there's another way.

In addition, I think it is right for the above code to
return t; i.e. any symbol created by reading a multibyte
buffer should have a multibyte string name.  The bug to fix
is that the following code, which reads from a unibyte
buffer, also returns t in emacs-unicode-2.

< --8<---------------cut here---------------start------------->8---
< (let ((string (encode-coding-string "local.テスト" 'utf-8)))
<   (with-temp-buffer
<     (set-buffer-multibyte nil)
<     (insert string)
<     (goto-char (point-min))
<     (multibyte-string-p (symbol-name (read (current-buffer))))))
< --8<---------------cut here---------------end--------------->8---

> Otherwise, is there a way to make `read' return a unibyte
> symbol (without slowing down)?

The replacement for the above code is as simple as this:

(multibyte-string-p
 (symbol-name (intern (encode-coding-string "local.テスト" 'utf-8))))

But, hmmm, it seems that we can't use such code in gnus...

> In the inside of Gnus, non-ASCII group names are all treated as
> unibyte strings, that are the ones that the server has encoded
> with certain coding systems.  Because of the present behavior of
> `read' in Emacs Unicode-2, Gnus doesn't work with such newsgroups
> perfectly.  You can find the actual code in gnus-start.el as
> follows:

> --8<---------------cut here---------------start------------->8---
> ;; Read an active file and place the results in `gnus-active-hashtb'.
> (defun gnus-active-to-gnus-format (&optional method hashtb ignore-errors
> 					     real-active)
> [...]
> 	      ;; group gets set to a symbol interned in the hash table
> 	      ;; (what a hack!!) - jwz
> 	      (setq group (let ((obarray hashtb)) (read cur)))
> --8<---------------cut here---------------end--------------->8---

How about this?

(setq group
       (let ((obarray hashtb) pos)
	 (skip-syntax-forward "^w_")
	 (setq pos (point))
	 (skip-syntax-forward "w_")
	 (intern (buffer-substring pos (point)))))

I think the overhead is just a few more function calls.  The
actual work (searching for a run of symbol constituents,
making a string from them, and interning it) is almost the same.
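
As a self-contained way to try the idea outside Gnus, here is a rough
sketch.  It is not the code above: it swaps the syntax-class scan for
an explicit whitespace scan with `skip-chars-forward', so that the
result does not depend on the buffer's syntax table, and it passes the
obarray to `intern' directly instead of binding `obarray'.  The names
`my-read-group-name' and `my-demo-group-name' and the sample active
line are assumptions made for the example.

```elisp
;; Sketch only: the function names and the sample line are hypothetical,
;; not Gnus code.
(require 'obarray)

(defun my-read-group-name (hashtb)
  "Intern the text from point to the next whitespace into HASHTB.
Return the interned symbol."
  (let ((start (point)))
    ;; Assumption: a group name ends at the first space, tab, or newline.
    (skip-chars-forward "^ \t\n")
    ;; In a unibyte buffer `buffer-substring' returns a unibyte string,
    ;; so the symbol name keeps the bytes as they are in the buffer.
    (intern (buffer-substring start (point)) hashtb)))

(defun my-demo-group-name ()
  "Return the name of a symbol read from a sample unibyte active line."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert (encode-coding-string "local.テスト" 'utf-8) " 10 1 y\n")
    (goto-char (point-min))
    (symbol-name (my-read-group-name (obarray-make)))))
```

Evaluating (multibyte-string-p (my-demo-group-name)) should give nil,
since the name is built from a unibyte buffer.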

---
Kenichi Handa
handa@ni.aist.go.jp

Thread overview: 23+ messages
2007-11-13  9:41 Katsumi Yamaoka
2007-11-13 12:55 ` Kenichi Handa [this message]
2007-11-13 15:10   ` Stefan Monnier
2007-11-14  4:53     ` Kenichi Handa
2007-11-14  3:56   ` Katsumi Yamaoka
2007-11-14 11:39     ` Katsumi Yamaoka
2007-11-14 14:52       ` Stefan Monnier
2007-11-14 23:52         ` Katsumi Yamaoka
2007-11-15  1:15           ` Stefan Monnier
2007-11-15  3:01             ` Katsumi Yamaoka
2007-11-15  3:39               ` Stefan Monnier
2007-11-15 10:20       ` Katsumi Yamaoka
2007-11-15 11:08         ` Kenichi Handa
2007-11-15 11:41           ` Katsumi Yamaoka
2007-11-15 14:41             ` Kenichi Handa
2007-11-15 23:31               ` Katsumi Yamaoka
2007-11-16  0:51                 ` Kenichi Handa
2007-11-16  1:24                   ` Katsumi Yamaoka
2007-11-16  2:51                     ` Stefan Monnier
2007-11-15 15:22           ` Stefan Monnier
2007-11-16  0:29             ` Kenichi Handa
2007-11-16 10:50             ` Eli Zaretskii
2007-11-13 15:07 ` Stefan Monnier