From: Kenichi Handa <handa@ni.aist.go.jp>
To: Katsumi Yamaoka <yamaoka@jpl.org>
Cc: ding@gnus.org, emacs-devel@gnu.org
Subject: Re: [Unicode-2] `read' always returns multibyte symbol
Date: Tue, 13 Nov 2007 21:55:44 +0900
Message-ID: <E1IrvIu-0000ku-CD@etlken.m17n.org>
In-Reply-To: <b4moddycwjv.fsf@jpl.org> (message from Katsumi Yamaoka on Tue, 13 Nov 2007 18:41:08 +0900)
In article <b4moddycwjv.fsf@jpl.org>, Katsumi Yamaoka <yamaoka@jpl.org> writes:
> The following Lisp snippet emulates what Gnus does when reading
> active data for the local.テスト newsgroup. The buffer contains
> data which have been retrieved from the nntp server. Note that
> the newsgroup name contains non-ASCII characters, which has been
> encoded by utf-8 in the server.
> --8<---------------cut here---------------start------------->8---
> (let ((string (encode-coding-string "local.テスト" 'utf-8)))
>   (with-temp-buffer
>     (set-buffer-multibyte t)
>     (insert (string-to-multibyte string))
>     (goto-char (point-min))
>     (multibyte-string-p (symbol-name (read (current-buffer))))))
> --8<---------------cut here---------------end--------------->8---
> While Emacs trunk returns nil for this, Emacs Unicode-2 returns t.
That is because `read' decides whether the name is unibyte or
multibyte by checking whether it is a valid multibyte
sequence. In the trunk, a utf-8 byte sequence is not a valid
multibyte sequence, but in emacs-unicode-2 it is.
> If it is not intentional, I hope `read' behaves just like it does
> in Emacs trunk.
The relevant code for `read' is very complicated and I want
to avoid touching it if there's another way.
In addition, I think it is right for the above code to
return t; i.e., any symbol created by reading a multibyte
buffer should have a multibyte string name. The bug to fix
is that the following code, which reads from a unibyte
buffer, also returns t in emacs-unicode-2.
< --8<---------------cut here---------------start------------->8---
< (let ((string (encode-coding-string "local.テスト" 'utf-8)))
<   (with-temp-buffer
<     (set-buffer-multibyte nil)
<     (insert string)
<     (goto-char (point-min))
<     (multibyte-string-p (symbol-name (read (current-buffer))))))
< --8<---------------cut here---------------end--------------->8---
> Otherwise, is there a way to make `read' return a unibyte
> symbol (without slowing down)?
A replacement for the above code is as simple as this:
(multibyte-string-p (symbol-name (intern (encode-coding-string "local.テスト" 'utf-8))))
But, hmmm, it seems that we can't use such code in Gnus...
> In the inside of Gnus, non-ASCII group names are all treated as
> unibyte strings, that are the ones that the server has encoded
> with certain coding systems. Because of the present behavior of
> `read' in Emacs Unicode-2, Gnus doesn't work with such newsgroups
> perfectly. You can find the actual code in gnus-start.el as
> follows:
> --8<---------------cut here---------------start------------->8---
> ;; Read an active file and place the results in `gnus-active-hashtb'.
> (defun gnus-active-to-gnus-format (&optional method hashtb ignore-errors
> real-active)
> [...]
> ;; group gets set to a symbol interned in the hash table
> ;; (what a hack!!) - jwz
> (setq group (let ((obarray hashtb)) (read cur)))
> --8<---------------cut here---------------end--------------->8---
How about this?
(setq group
      (let ((obarray hashtb) pos)
        (skip-syntax-forward "^w_")
        (setq pos (point))
        (skip-syntax-forward "w_")
        (intern (buffer-substring pos (point)))))
I think the overhead is just a few more function calls. The
actual task (searching for a range of symbol constituents,
making a string from them, and interning it) is almost the same.
---
Kenichi Handa
handa@ni.aist.go.jp