From: Katsumi Yamaoka <yamaoka@jpl.org>
Cc: Juri Linkov <juri@jurta.org>
Subject: Re: [BUG]What does this mean:"Mention that multibyte characters
Date: Wed, 20 Oct 2004 14:36:48 +0900
Message-ID: <b9y3c0a2bnz.fsf@jpl.org>
In-Reply-To: <b9yacuji7t6.fsf@jpl.org>
>>>>> In <b9yacuji7t6.fsf@jpl.org> Katsumi Yamaoka wrote:
> I'll try translating Kenichi Handa's advice next time.
This is an exercise in English composition on my part. Whatever
mistakes it contains are my responsibility.
>>>>> Katsumi Yamaoka wrote:
> With the following form, Emacs 21.3.50 returns non-nil, and 22.0.0
> returns nil. Could you let me know for reference what occurs there?
> (with-temp-buffer
>   (set-buffer-multibyte t)
>   (insert (string-as-unibyte "\200"))
>   (goto-char (point-min))
>   (search-forward (string-as-multibyte "\200") nil t))
;; Annotation by K.Y.:
;; At that time, I did not yet know that `(insert ?\200)' and
;; `(insert (format "%c" ?\200))' are also possible insertion forms.
>>>>> Kenichi Handa wrote:
Even in Emacs 21.3.50, the above form returns nil in certain language
environments (e.g., Vietnamese).
First, you have to understand that a unibyte string is converted into a
multibyte string by `string-make-multibyte' when it is inserted into a
                            ^^^^
multibyte buffer. Therefore,
the `(insert (string-as-unibyte "\200"))' form is identical to the
`(insert (string-make-multibyte (string-as-unibyte "\200")))' form.
How `string-make-multibyte' converts the string depends on the language
environment. In Emacs 21, in the Latin-1 language environment, for
example, the string "\200" is converted into the character that
corresponds to \200 in the eight-bit-control charset, since the
primary charset latin-iso8859-1 does not contain \200.
Second, `string-as-multibyte' converts STRING into a multibyte
string, preserving its byte sequence as far as possible. ``As far as
possible'' means that the bytes are sometimes changed. For example,
the string "\200" is converted into the byte sequence "\236\240",
which is a character in the eight-bit-control charset. This is the
same character that the above program inserted into the buffer.
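To see how the two points above play out in a particular Emacs, the conversions can be inspected directly. The following is only a minimal sketch: `my-describe-conversions' is a hypothetical helper name, not part of Emacs, and it assumes that all three conversion functions are still available in the Emacs being tested. The charsets it reports depend on the Emacs version and the language environment, as described above.

```elisp
;; Sketch only: `my-describe-conversions' is a hypothetical helper.
;; It reports the charset of the character that each conversion
;; function produces from the single byte \200.
(defun my-describe-conversions ()
  "Return an alist of (FUNCTION . CHARSET) for the byte \\200."
  (let ((unibyte (string-as-unibyte "\200")))
    (mapcar (lambda (fn)
              ;; Convert, take the first character, and ask for its charset.
              (cons fn (char-charset (aref (funcall fn unibyte) 0))))
            '(string-make-multibyte
              string-as-multibyte
              string-to-multibyte))))

(my-describe-conversions)
```

If the entries for `string-make-multibyte' and `string-as-multibyte' name different charsets, the search in the original form cannot match.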
Consequently, in the Latin-1 language environment, for example, the
above program returned non-nil in Emacs 21.
In Emacs 22, on the other hand, iso-8859-1, the primary charset for
Latin-1, contains \200, so the form
  (insert (string-as-unibyte "\200"))
inserts the character U+0080 rather than a character that belongs to
eight-bit-control. However, `string-as-multibyte' always converts
\200 into the eight-bit-control character. That is why the program
returns nil.
If you need to search for \200 after inserting it into a buffer, the
following approach, for example, works in both Emacs 21 and 22:
(with-temp-buffer
  (set-buffer-multibyte t)
  (insert (string-to-multibyte "\200"))
  (goto-char (point-min))
  (search-forward (string-to-multibyte "\200") nil t))
;; Annotation by K.Y.:
;; I did not do it that way in the `gnus-update-summary-mark-positions'
;; function (which see).
`string-to-multibyte' always converts non-ASCII bytes such as \200 into
characters that belong to eight-bit-control or eight-bit-graphic, so
the characters it produces never match ordinary text characters.
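The two search forms in this message can be compared side by side with a small wrapper. This is a sketch under assumptions: `my-search-raw-200' is a hypothetical helper name, and the result of the second call depends on the Emacs version and language environment, so no result is claimed for it here.

```elisp
;; Sketch only: `my-search-raw-200' is a hypothetical helper.
;; CONVERT-INSERTED is applied to "\200" before insertion, and
;; CONVERT-PATTERN is applied to "\200" before searching.
(defun my-search-raw-200 (convert-inserted convert-pattern)
  "Insert a converted \\200 into a multibyte buffer and search for it."
  (with-temp-buffer
    (set-buffer-multibyte t)
    (insert (funcall convert-inserted "\200"))
    (goto-char (point-min))
    (search-forward (funcall convert-pattern "\200") nil t)))

;; The form recommended above: the same conversion on both sides.
(my-search-raw-200 #'string-to-multibyte #'string-to-multibyte)

;; The original form, whose result depends on version and environment.
(my-search-raw-200 #'string-as-unibyte #'string-as-multibyte)
```

Using the same conversion for the inserted text and for the search pattern is what makes the first call independent of the language environment.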
P.S.
In Emacs 21, the form
`(insert (string-to-multibyte "\200"))'
does the same thing as the form
`(insert ?\200)'
because a multibyte buffer has no character corresponding to 128, so
it is treated as the raw byte that belongs to eight-bit-control.
In Emacs 22, however, the two differ: the character corresponding to
128 exists as U+0080, so `(insert ?\200)' inserts that character.
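The difference described in this P.S. can be checked by looking at the character code that each form leaves in the buffer. The following is a minimal sketch: `my-inserted-char' is a hypothetical helper name, and whether the two codes are equal depends on the Emacs version, as explained above.

```elisp
;; Sketch only: `my-inserted-char' is a hypothetical helper.
;; INSERTER is a function of no arguments that inserts text into the
;; current buffer; the code of the first inserted character is returned.
(defun my-inserted-char (inserter)
  "Call INSERTER in a multibyte temp buffer and return the first char."
  (with-temp-buffer
    (set-buffer-multibyte t)
    (funcall inserter)
    (char-after (point-min))))

;; Character codes left by the two forms, in that order.
(list (my-inserted-char (lambda () (insert ?\200)))
      (my-inserted-char (lambda () (insert (string-to-multibyte "\200")))))
```

Passing either code to `char-charset' shows which charset the inserted character belongs to.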
;; Annotation by K.Y.:
;; I am deeply grateful to Kenichi Handa. This was all the knowledge
;; I needed.