Gnus development mailing list
From: Katsumi Yamaoka <yamaoka@jpl.org>
Cc: Juri Linkov <juri@jurta.org>
Subject: Re: [BUG]What does this mean:"Mention that multibyte characters
Date: Wed, 20 Oct 2004 14:36:48 +0900
Message-ID: <b9y3c0a2bnz.fsf@jpl.org>
In-Reply-To: <b9yacuji7t6.fsf@jpl.org>

>>>>> In <b9yacuji7t6.fsf@jpl.org> Katsumi Yamaoka wrote:

> I'll try translation of Kenichi Handa's advice next time.

This is practice for my English composition.  Whatever mistakes there
may be, the responsibility is mine.

>>>>> Katsumi Yamaoka wrote:

> With the following form, Emacs 21.3.50 returns non-nil, and 22.0.0
> returns nil.  Could you let me know for reference what occurs there?

> (with-temp-buffer
>   (set-buffer-multibyte t)
>   (insert (string-as-unibyte "\200"))
>   (goto-char (point-min))
>   (search-forward (string-as-multibyte "\200") nil t))

;; Annotation by K.Y.:
;; At that time, I didn't yet know that the possible insertion forms
;; are `(insert ?\200)' and `(insert (format "%c" ?\200))'.

>>>>> Kenichi Handa wrote:

Even with Emacs 21.3.50, the above form returns nil in certain
language environments (e.g., Vietnamese).

You have to understand first that a unibyte string is converted into a
multibyte string by `string-make-multibyte' when inserting a unibyte
                            ^^^^
string in a multibyte buffer.  Therefore, the form

`(insert (string-as-unibyte "\200"))'

is identical to the form

`(insert (string-make-multibyte (string-as-unibyte "\200")))'

How `string-make-multibyte' converts the string depends on the
language environment.  In Emacs 21 under the Latin-1 language
environment, for example, the string "\200" is converted into the
character that corresponds to \200 in the eight-bit-control charset,
since the primary charset latin-iso8859-1 doesn't contain \200.

Second, `string-as-multibyte' converts STRING into a multibyte
string, keeping its byte sequence as far as possible.  It is only
``as far as possible'', so the bytes sometimes change.  For example,
the string "\200" is converted into the byte sequence "\236\240",
which is a character in the eight-bit-control charset.  That is the
same character that the above program inserted in the buffer.

Consequently, in Emacs 21 under the Latin-1 language environment, for
example, the above program returned non-nil.

In Emacs 22, on the other hand, iso-8859-1, which is the primary
charset for Latin-1, contains \200, so the form

(insert (string-as-unibyte "\200"))

inserts the character U+0080 rather than a character that belongs to
eight-bit-control.  However, `string-as-multibyte' always converts
\200 into the eight-bit-control character.  The buffer and the search
string therefore hold different characters, which is why the program
returns nil.

If you need to search for \200 after inserting it in a buffer, the
following way, for example, works in both Emacs 21 and 22:

(with-temp-buffer
  (set-buffer-multibyte t)
  (insert (string-to-multibyte "\200"))
  (goto-char (point-min))
  (search-forward (string-to-multibyte "\200") nil t))

;; Annotation by K.Y.:
;; I didn't use that way in the `gnus-update-summary-mark-positions'
;; function (which see).

`string-to-multibyte' always converts the non-ASCII bytes of a string
into characters that belong to eight-bit-control or eight-bit-graphic,
so those characters in the string it makes will never match the
characters of an ordinary string.
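
The ``never match'' point can be checked with a small sketch.  It
assumes a current Emacs (23 or later), whose internals differ from the
Emacs 21 and 22 builds discussed in this message, so it illustrates
the idea rather than reproducing those builds.  The helper name
`my-find-in-temp-buffer' is hypothetical and is not part of Emacs or
Gnus.

```elisp
;; Sketch only: assumes Emacs 23 or later.
;; `my-find-in-temp-buffer' is a hypothetical helper name.
(defun my-find-in-temp-buffer (text needle)
  "Insert TEXT into a temporary multibyte buffer and search for NEEDLE.
Return the position after the match, or nil if NEEDLE is not found."
  (with-temp-buffer
    (set-buffer-multibyte t)
    (insert text)
    (goto-char (point-min))
    (search-forward needle nil t)))

;; Raw byte \200 on both sides: the search succeeds.
(my-find-in-temp-buffer (string-to-multibyte "\200")
                        (string-to-multibyte "\200"))

;; Raw byte \200 in the buffer, the character U+0080 as the needle:
;; these are different characters, so the search fails.
(my-find-in-temp-buffer (string-to-multibyte "\200") "\u0080")
```

Converting both the inserted text and the search string with the same
function keeps the two sides in the same representation, which is the
reason the recommended form above works.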

P.S.
In Emacs 21, the form

`(insert (string-to-multibyte "\200"))'

does the same as the form

`(insert ?\200)'

does.  This is because no character corresponds to 128 in a multibyte
buffer, so it is treated as a raw byte that belongs to
eight-bit-control.

However, this differs in Emacs 22.  Since the character corresponding
to 128 exists as U+0080, `(insert ?\200)' inserts that character.
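
The difference between the two forms can be inspected by looking at
the character that actually lands in the buffer.  This is a sketch
that assumes a current Emacs (23 or later), not the Emacs 21 or 22
builds themselves.  The helper name `my-char-after-insert' is
hypothetical.

```elisp
;; Sketch only: assumes Emacs 23 or later.
;; `my-char-after-insert' is a hypothetical helper name.
(defun my-char-after-insert (thing)
  "Insert THING (a string or a character) in a temporary multibyte buffer.
Return the code of the first character that ends up in the buffer."
  (with-temp-buffer
    (set-buffer-multibyte t)
    (insert thing)
    (char-after (point-min))))

(my-char-after-insert ?\200)                         ; #x80, i.e. U+0080
(my-char-after-insert (string-to-multibyte "\200"))  ; #x3FFF80, a raw byte
```

Under that assumption the two forms insert different characters, as
described for Emacs 22 above.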

;; Annotation by K.Y.:
;; I am deeply grateful to Kenichi Handa.  This was all the knowledge
;; that I needed.


