In article , Katsumi Yamaoka writes:

> The following Lisp snippet emulates what Gnus does when reading
> active data for the local.テスト newsgroup.  The buffer contains
> data which have been retrieved from the nntp server.  Note that
> the newsgroup name contains non-ASCII characters, which has been
> encoded by utf-8 in the server.

> --8<---------------cut here---------------start------------->8---
> (let ((string (encode-coding-string "local.テスト" 'utf-8)))
>   (with-temp-buffer
>     (set-buffer-multibyte t)
>     (insert (string-to-multibyte string))
>     (goto-char (point-min))
>     (multibyte-string-p (symbol-name (read (current-buffer))))))
> --8<---------------cut here---------------end--------------->8---

> While Emacs trunk returns nil for this, Emacs Unicode-2 returns t.

That is because `read' decides whether the name is unibyte or
multibyte according to whether the name is a valid multibyte
sequence.  In the trunk, a utf-8 byte sequence is not a valid
multibyte sequence, but in emacs-unicode-2 it is valid.

> If it is not intentional, I hope `read' behaves just like it does
> in Emacs trunk.

The relevant code for `read' is very complicated and I want to
avoid touching it if there's another way.

In addition, I think it is the right thing that the above code
returns t; i.e. any symbol created by reading a multibyte buffer
should have a multibyte string name.

The bug to fix is that the following code also returns t in
emacs-unicode-2.

< --8<---------------cut here---------------start------------->8---
< (let ((string (encode-coding-string "local.テスト" 'utf-8)))
<   (with-temp-buffer
<     (set-buffer-multibyte nil)
<     (insert string)
<     (goto-char (point-min))
<     (multibyte-string-p (symbol-name (read (current-buffer))))))
< --8<---------------cut here---------------end--------------->8---

> Otherwise, is there a way to make `read' return a unibyte
> symbol (without slowing down)?

A replacement for the above code is as simple as this:

(multibyte-string-p
 (symbol-name (intern (encode-coding-string "local.テスト" 'utf-8))))

But, hmmm, it seems that we can't use such code in gnus...

> In the inside of Gnus, non-ASCII group names are all treated as
> unibyte strings, that are the ones that the server has encoded
> with certain coding systems.  Because of the present behavior of
> `read' in Emacs Unicode-2, Gnus doesn't work with such newsgroups
> perfectly.  You can find the actual code in gnus-start.el as
> follows:

> --8<---------------cut here---------------start------------->8---
> ;; Read an active file and place the results in `gnus-active-hashtb'.
> (defun gnus-active-to-gnus-format (&optional method hashtb ignore-errors
>                                              real-active)
> [...]
>             ;; group gets set to a symbol interned in the hash table
>             ;; (what a hack!!) - jwz
>             (setq group (let ((obarray hashtb)) (read cur)))
> --8<---------------cut here---------------end--------------->8---

How about this?

(setq group (let ((obarray hashtb) pos)
              (skip-syntax-forward "^w_")
              (setq pos (point))
              (skip-syntax-forward "w_")
              (intern (buffer-substring pos (point)))))

I think the overhead is just several more function calls.  The
actual task (searching for a range of symbol constituents, making
a string from them, and interning it) is almost the same.

---
Kenichi Handa
handa@ni.aist.go.jp
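
To try the `intern'-based idea outside Gnus, here is a minimal
sketch, not the actual Gnus code.  It rests on a few assumptions:
the helper name `my-intern-group-at-point' is hypothetical, the
sample line "local.テスト 10 1 y" is made-up active data, and the
obarray is passed to `intern' as its second argument instead of
let-binding `obarray'.  It also swaps the syntax-based scan for
`skip-chars-forward' up to the next whitespace, because what counts
as a symbol constituent depends on the buffer's syntax table (in the
standard syntax table "." is punctuation), so the syntax-based scan
may stop early in some buffers.

```elisp
;; Hypothetical helper for illustration; the name is not from Gnus.
(defun my-intern-group-at-point (hashtb)
  "Intern the run of non-whitespace text at point into HASHTB.
Return the interned symbol and leave point just after the name."
  (skip-chars-forward " \t")
  (let ((start (point)))
    ;; Scan to the next whitespace instead of relying on syntax classes.
    (skip-chars-forward "^ \t\n")
    (intern (buffer-substring start (point)) hashtb)))

;; Usage: emulate one line of active data in a unibyte buffer.
(let ((hashtb (obarray-make))           ; stand-in for the Gnus hashtb
      (name (encode-coding-string "local.テスト" 'utf-8)))
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert name " 10 1 y\n")           ; made-up sample data
    (goto-char (point-min))
    (let ((group (my-intern-group-at-point hashtb)))
      ;; Report whether the symbol name came out multibyte.
      (multibyte-string-p (symbol-name group)))))
```

Since the string handed to `intern' comes straight from a unibyte
buffer, `read' is not involved at all, so the result should not
depend on how `read' classifies byte sequences.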