naive charset question

Gnus development mailing list

* naive charset question
@ 2002-07-21  3:02 Ken Raeburn
  2002-07-21  4:47 ` Ken Raeburn
  0 siblings, 1 reply; 13+ messages in thread
From: Ken Raeburn @ 2002-07-21  3:02 UTC


I'm trying to experiment a little with sending non-ASCII characters in
email.  As far as I'm aware, I've done nothing special to set up the
charset handling for my Emacs or Gnus configurations.

I rather naively assume that if Gnus and Emacs are doing their job
well, I should be able to take a buffer with non-ASCII characters
displayed, stick it in an email message, and at most be prompted for
which of the possible charsets should be used for certain non-ASCII
characters, based on the characters actually used in the message.

No, let me change that statement: I would submit that a user-friendly
multilingual MUA should behave that way by default.  If I want to
mention Kai by his full name and discuss money, I should probably be
thinking "ess-tset" and "euro", not "latin-, uh, wait, let me look it
up again, are they both in the same charset?"  If I want to reply to
and quote a message written in Japanese, the charset selection should
Just Work.
</soapbox>
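
The "are they both in the same charset?" question can be checked mechanically. A minimal sketch, assuming Python's standard codecs stand in for the MUA's charset tables; the function name and the candidate list are illustrative and are not anything Gnus provides:

```python
def charsets_covering(text, candidates=("us-ascii", "iso-8859-1",
                                        "iso-8859-15", "utf-8")):
    """Return the candidate charsets that can encode every character of text."""
    covering = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this charset
        covering.append(name)
    return covering


# Ess-tset alone fits Latin-1; adding the euro sign rules Latin-1 out.
print(charsets_covering("Großjohann"))       # ['iso-8859-1', 'iso-8859-15', 'utf-8']
print(charsets_covering("Großjohann, 5 €"))  # ['iso-8859-15', 'utf-8']
```

A mailer behaving as described above would only need to prompt when more than one candidate survives, and only among the survivors.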

To make it a bit challenging (perhaps too much so?), I tried inserting
the HELLO buffer (C-h h) into a mail message (in Message mode); the
buffer showed ASCII, Cyrillic, Hebrew and other characters, which no
single 8-bit character set would encompass.  I tried to send the
message; it asked me for a charset, and when I hit "?" I got a bunch
of options, including many that I'm sure would not support some of the
characters in the buffer.

Assuming (again naively) that the default should do something
reasonable, I hit return, and the message was sent.  The message that
arrived in my mailbox says charset=us-ascii.  I tried the same test a
second time, and picked a charset from the list, "arabic-1-column";
the message was immediately sent with charset=arabic-1-column.  Both
messages look wrong in the *Article* buffer, of course.

I didn't find much in the documentation to indicate why my naive
assumption might actually be wrong.  In ognus-0.06, message.texi says
very little about charsets and encoding except "here's how you specify
your choice", and gnus.texi appears to only talk about them in the
context of viewing messages; it seems to be assumed that the reader
already knows how charsets are used.  If message.texi isn't going to
go into it, a pointer to some introductory material elsewhere might be
of use; after all, even some of us ignorant, self-centered Americans
need to know how to communicate with the outside world.

So, how should I, naive in such issues, transmit such a HELLO message
(not an attachment with a byte for byte copy of the HELLO file) so
that a recipient (and preferably one not necessarily using Emacs)
might view it as intended?  Would this have actually worked if I were
sending something that could be expressed using a single charset?  Is
there something about the HELLO file encoding that makes it a bad test
case?

Ken

P.S.  I'm using Emacs 21.3.50 built from the CVS repository within the
last day or so, and, as I said, ognus-0.06.




* Re: naive charset question
  2002-07-21  3:02 naive charset question Ken Raeburn
@ 2002-07-21  4:47 ` Ken Raeburn
  2002-07-21  8:57   ` Kai Großjohann
  0 siblings, 1 reply; 13+ messages in thread
From: Ken Raeburn @ 2002-07-21  4:47 UTC

I wrote:
> I didn't find much in the documentation to indicate why my naive
> assumption might actually be wrong.  In ognus-0.06, message.texi says
> very little about charsets and encoding except "here's how you specify
> your choice",

Oops.  That's not quite true.

I overlooked the reference to emacs-mime.texi in the description of
message-default-charset.  It sits in a parenthetical remark, which
suggests that using message-default-charset is the normal case and the
MULE bit is not (and once upon a time non-MULE Emacs *was* the normal
case), so I unconsciously dismissed it.  My bad.

Even so, the charset descriptions in emacs-mime are somewhat more
encouraging, but they seem to suggest that Emacs should take the
"HELLO" data, and if it can't encode it all as UTF-8, pull it apart
into separate MIME parts that can be encoded with available charsets.
Which sounds like it's probably exactly the right thing.  But it
didn't happen.
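
The splitting behaviour described above can be sketched roughly as follows. This is only an illustration of the idea using Python's standard `email` package, not the Gnus/MML implementation; the preference list, the function names and the paragraph-level granularity are all assumptions of the sketch:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical preference order: narrowest charsets first, UTF-8 as fallback.
PREFERRED = ("us-ascii", "iso-8859-1", "iso-8859-2", "big5", "utf-8")


def pick_charset(text, preferred=PREFERRED):
    """Return the first preferred charset that can encode all of text."""
    for name in preferred:
        try:
            text.encode(name)
            return name
        except UnicodeEncodeError:
            pass
    raise ValueError("no candidate charset covers this text")


def build_message(paragraphs):
    """One text/plain message if a single charset suffices, else one part per paragraph."""
    charsets = [pick_charset(p) for p in paragraphs]
    if len(set(charsets)) == 1:
        return MIMEText("\n\n".join(paragraphs), "plain", charsets[0])
    msg = MIMEMultipart("mixed")
    for paragraph, charset in zip(paragraphs, charsets):
        msg.attach(MIMEText(paragraph, "plain", charset))
    return msg


msg = build_message(["你好.", "Danke schön."])
for part in msg.get_payload():
    print(part.get_content_charset())
```

No prompt is needed in this scheme: each part is labelled with a charset derived from the characters it actually contains.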

I even tried breaking it into separate parts myself, adding a bunch of
"#part" MML tags to the buffer (type="text/plain",
disposition=inline), with "#multipart" around the whole thing, and all
I got for my trouble was lots of prompts for the charset instead of
just one.  (Is it safe to use those magic #-sequences in <> brackets
in an MML Message buffer when they're supposed to be literal and not
MML tags?)  I assume each prompt was for a different part, but I got
no indication which part a given prompt might have been related to.

Is there some special library code I should load in order for Emacs
to recognize and process the non-ASCII characters in HELLO for
transmission as email?

Ken


* Re: naive charset question
  2002-07-21  4:47 ` Ken Raeburn
@ 2002-07-21  8:57   ` Kai Großjohann
  2002-07-21  9:17     ` Kai Großjohann
  2002-07-21 21:41     ` Ken Raeburn
  0 siblings, 2 replies; 13+ messages in thread
From: Kai Großjohann @ 2002-07-21  8:57 UTC
  Cc: ding

[-- Attachment #1: Type: text/plain, Size: 557 bytes --]

Ken Raeburn <raeburn@raeburn.org> writes:

> Even so, the charset descriptions in emacs-mime are somewhat more
> encouraging, but they seem to suggest that Emacs should take the
> "HELLO" data, and if it can't encode it all as UTF-8, pull it apart
> into separate MIME parts that can be encoded with available charsets.
> Which sounds like it's probably exactly the right thing.  But it
> didn't happen.

I don't usually try to create such messages, but not so long ago the
splitting still worked.  Hm.  Let me try it with this message:

你好.

[-- Attachment #2: Type: text/plain, Size: 14 bytes --]

Danke schön.

[-- Attachment #3: Type: text/plain, Size: 131 bytes --]

Pavel Janík

I wonder what will happen...

kai
-- 
A large number of young women don't trust men with beards.  (BFBS Radio)


* Re: naive charset question
  2002-07-21  8:57   ` Kai Großjohann
@ 2002-07-21  9:17     ` Kai Großjohann
  2002-07-21 11:42       ` Henrik Enberg
  2002-07-21 21:41     ` Ken Raeburn
  1 sibling, 1 reply; 13+ messages in thread
From: Kai Großjohann @ 2002-07-21  9:17 UTC
  Cc: ding

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> I wonder what will happen...

It was split.  Though it's a little strange that there were two
consecutive *-15 parts :-)  I think that's because of
unify-8859-on-encoding-mode being on in my Emacs.  The characters in
the Emacs buffer were iso-8859-2 for Pavel's name and iso-8859-15 for
the German wording.

Does anyone know how to solve this?

kai
-- 
A large number of young women don't trust men with beards.  (BFBS Radio)


* Re: naive charset question
  2002-07-21  9:17     ` Kai Großjohann
@ 2002-07-21 11:42       ` Henrik Enberg
  2002-07-21 13:38         ` Kai Großjohann
  0 siblings, 1 reply; 13+ messages in thread
From: Henrik Enberg @ 2002-07-21 11:42 UTC
  Cc: ding

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> The characters in the Emacs buffer were iso-8859-2 for Pavel's name
> and iso-8859-15 for the German wording.

Have you set something to prefer iso-8859-15?  Shouldn't "schön" default
to iso-8859-1?

-- 
Booting... /vmemacs.el




* Re: naive charset question
  2002-07-21 11:42       ` Henrik Enberg
@ 2002-07-21 13:38         ` Kai Großjohann
  0 siblings, 0 replies; 13+ messages in thread
From: Kai Großjohann @ 2002-07-21 13:38 UTC
  Cc: Ken Raeburn, ding

Henrik Enberg <henrik@enberg.org> writes:

> Have you set something to prefer iso-8859-15?  Shouldn't "schön" default
> to iso-8859-1?

Well, I set LC_CTYPE=de_DE@euro which means that Emacs defaults to an
iso-8859-15 language environment.  But the matter is a bit more
complicated than that, as I use the german-prefix input method to
type ö, and that inserts iso-8859-1 characters...
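
Part of why the label is ambiguous: ö has the same byte value in both charsets, and only characters such as the euro sign tell them apart. A small sketch using Python's codecs (an assumption of this illustration; Emacs 21 tracks the two charsets separately internally), with a hypothetical helper name:

```python
def encodes_identically(text, charset_a, charset_b):
    """True if text encodes to the same bytes under both charsets."""
    return text.encode(charset_a) == text.encode(charset_b)


# "schön" is byte-for-byte identical in Latin-1 and Latin-9.
print(encodes_identically("schön", "iso-8859-1", "iso-8859-15"))

# The euro sign exists only in Latin-9, so it forces the choice.
try:
    "€".encode("iso-8859-1")
except UnicodeEncodeError:
    print("no euro sign in iso-8859-1")
print("€".encode("iso-8859-15"))
```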

kai
-- 
A large number of young women don't trust men with beards.  (BFBS Radio)


* Re: naive charset question
  2002-07-21  8:57   ` Kai Großjohann
  2002-07-21  9:17     ` Kai Großjohann
@ 2002-07-21 21:41     ` Ken Raeburn
  2002-07-21 22:31       ` Ken Raeburn
  1 sibling, 1 reply; 13+ messages in thread
From: Ken Raeburn @ 2002-07-21 21:41 UTC

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai writes:
> I don't usually try to create such messages, but not so long ago the
> splitting still worked.  Hm.  Let me try it with this message:

That came through fine for me, although if I try to quote the
non-ASCII characters in a reply, again I get asked for a charset and
the default behavior is to send a one-part us-ascii message.  Does
quoting in a reply work for you, or sending the HELLO text, as opposed
to actually entering the characters as input?

It's also interesting that your last name and the closing paren were
dropped from the attribution above, and my buffer name displays as
"*wide reply to Kai Gro\337johann*".  In the summary buffer that
character displays as an open block, and only the article buffer
actually gets it right.  I also get the \337 if I use M-w and C-y to
copy your name into a new buffer.
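
The \337 is an octal escape for the raw byte 0xDF, which is what remains when the text is treated as bytes without a charset attached. A short sketch in Python (used here only for illustration; the helper name is hypothetical) of how that byte turns back into ß once a charset is supplied:

```python
# The buffer name as displayed, written as a byte string with the octal escape.
RAW_NAME = b"Gro\337johann"


def decode_or_none(raw, charset):
    """Decode raw bytes under charset, or return None if they are not valid in it."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None


print(hex(RAW_NAME[3]))                        # the escaped byte, 0xdf
print(decode_or_none(RAW_NAME, "iso-8859-1"))  # ess-tset restored
print(decode_or_none(RAW_NAME, "us-ascii"))    # not representable in ASCII
```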


* Re: naive charset question
  2002-07-21 21:41     ` Ken Raeburn
@ 2002-07-21 22:31       ` Ken Raeburn
  2002-07-21 22:42         ` Henrik Enberg
  2002-07-26 19:26         ` Simon Josefsson
  0 siblings, 2 replies; 13+ messages in thread
From: Ken Raeburn @ 2002-07-21 22:31 UTC

I wrote:
> That came through fine for me, although if I try to quote the
> non-ASCII characters in a reply, again I get asked for a charset and
> the default behavior is to send a one-part us-ascii message.  Does
> quoting in a reply work for you, or sending the HELLO text, as opposed
> to actually entering the characters as input?

I found it!  An old part of my .emacs code invoked

  (standard-display-european 1)

which is described as

    Semi-obsolete way to toggle display of ISO 8859 European characters...

but in fact appears to be actively detrimental in some cases.  The
docs say that when it's called non-interactively (as in my case), it
selects unibyte mode for all buffers, and selects the Latin-1 language
environment.  But MML didn't even try to send out the message as
Latin-1, it just prompted and defaulted to ASCII.  Even so, if I got
BIG5 characters into my message buffer and displaying properly, MML
ought to be able to figure out what they are.

Removing that call does seem to fix my problem.  I can even send much
of the HELLO message, although it complains about devanagari being an
invalid coding system, and offhand I don't know the right way to find
out what text it's complaining about.  (I worked it out by M-x
apropos, noting that it reported "indian-glyph-code-offset" in the
plist, and deleting any text from that general part of the world, and
a few others.  But that shouldn't be how I have to work it out.)

Ken


* Re: naive charset question
  2002-07-21 22:31       ` Ken Raeburn
@ 2002-07-21 22:42         ` Henrik Enberg
  2002-07-26 19:26         ` Simon Josefsson
  1 sibling, 0 replies; 13+ messages in thread
From: Henrik Enberg @ 2002-07-21 22:42 UTC
  Cc: ding

Ken Raeburn <raeburn@raeburn.org> writes:

> Removing that call does seem to fix my problem.  I can even send much
> of the HELLO message, although it complains about devanagari being an
> invalid coding system, and

Ditto when I try it.

> offhand I don't know the right way to find out what text it's
> complaining about.

I think that feature broke sometime between 20.7 and 21.x.  There is an
item in the Emacs TODO about it.

-- 
Booting... /vmemacs.el




* Re: naive charset question
  2002-07-21 22:31       ` Ken Raeburn
  2002-07-21 22:42         ` Henrik Enberg
@ 2002-07-26 19:26         ` Simon Josefsson
  2002-07-27 15:42           ` Ken Raeburn
  1 sibling, 1 reply; 13+ messages in thread
From: Simon Josefsson @ 2002-07-26 19:26 UTC
  Cc: ding

Ken Raeburn <raeburn@raeburn.org> writes:

> I found it!  An old part of my .emacs code invoked
>
>   (standard-display-european 1)
>
> which is described as
>
>     Semi-obsolete way to toggle display of ISO 8859 European characters...
>
> but in fact appears to be actively detrimental in some cases.

Yup.  Should Gnus make an effort to work in this case?  Ideally Gnus
should make sure all buffers it uses have the correct unibyte status
needed for its operation, or operate correctly regardless of unibyte
status.  Perhaps Gnus should require non-unibyte Emacs?  But you sort
of get what you asked for if you disable multibyte -- you don't get
any multibytes, even in Gnus.  Perhaps Gnus could fail more gracefully
though.  Hm.

I reported the devanagari problem, though; sending the HELLO file using
Gnus is a good regression test and should always work (it does in
Emacs 21.2).  And users should never have to configure anything to get
proper non-ASCII support IMHO.


* Re: naive charset question
  2002-07-26 19:26         ` Simon Josefsson
@ 2002-07-27 15:42           ` Ken Raeburn
  2002-07-27 20:50             ` Simon Josefsson
  2002-07-27 22:07             ` Simon Josefsson
  0 siblings, 2 replies; 13+ messages in thread
From: Ken Raeburn @ 2002-07-27 15:42 UTC


Simon Josefsson <jas@extundo.com> writes:
>>   (standard-display-european 1)
> Yup.  Should Gnus make an effort to work in this case?

> status.  Perhaps Gnus should require non-unibyte Emacs?  But you sort
> of get what you asked for if you disable multibyte -- you don't get
> any multibytes, even in Gnus.  Perhaps Gnus could fail more gracefully
> though.  Hm.

While standard-display-european may make unibyte the default, it
doesn't seem to actually disable multibyte support.  I can run

        emacs -q --unibyte --eval '(standard-display-european 1)'

and the HELLO file still displays with some Asian, Cyrillic, Hebrew
and other characters, and enable-multibyte-characters still winds up
set to t in the help buffer.

Ken




* Re: naive charset question
  2002-07-27 15:42           ` Ken Raeburn
@ 2002-07-27 20:50             ` Simon Josefsson
  2002-07-27 22:07             ` Simon Josefsson
  1 sibling, 0 replies; 13+ messages in thread
From: Simon Josefsson @ 2002-07-27 20:50 UTC
  Cc: ding

Ken Raeburn <raeburn@raeburn.org> writes:

> Simon Josefsson <jas@extundo.com> writes:
>>>   (standard-display-european 1)
>> Yup.  Should Gnus make an effort to work in this case?
>
>> status.  Perhaps Gnus should require non-unibyte Emacs?  But you sort
>> of get what you asked for if you disable multibyte -- you don't get
>> any multibytes, even in Gnus.  Perhaps Gnus could fail more gracefully
>> though.  Hm.
>
> While standard-display-european may make unibyte the default, it
> doesn't seem to actually disable multibyte support.  I can run
>
>         emacs -q --unibyte --eval '(standard-display-european 1)'
>
> and the HELLO file still displays with some Asian, Cyrillic, Hebrew
> and other characters, and enable-multibyte-characters still winds up
> set to t in the help buffer.

So Gnus should re-enable multibyte in the buffer where it needs it.
This is done in a lot of places, but apparently not enough.  I have a
suspicion it might mean work to fix it, but I'll see if it is
something simple.





* Re: naive charset question
  2002-07-27 15:42           ` Ken Raeburn
  2002-07-27 20:50             ` Simon Josefsson
@ 2002-07-27 22:07             ` Simon Josefsson
  1 sibling, 0 replies; 13+ messages in thread
From: Simon Josefsson @ 2002-07-27 22:07 UTC
  Cc: ding

Ken Raeburn <raeburn@raeburn.org> writes:

> Simon Josefsson <jas@extundo.com> writes:
>>>   (standard-display-european 1)
>> Yup.  Should Gnus make an effort to work in this case?
>
>> status.  Perhaps Gnus should require non-unibyte Emacs?  But you sort
>> of get what you asked for if you disable multibyte -- you don't get
>> any multibytes, even in Gnus.  Perhaps Gnus could fail more gracefully
>> though.  Hm.
>
> While standard-display-european may make unibyte the default, it
> doesn't seem to actually disable multibyte support.  I can run
>
>         emacs -q --unibyte --eval '(standard-display-european 1)'
>
> and the HELLO file still displays with some Asian, Cyrillic, Hebrew
> and other characters, and enable-multibyte-characters still winds up
> set to t in the help buffer.

C-h h overrides the default multibyte setting.  Message in Gnus does
not.  Judging by the code, this looks quite intentional (see `mm-emacs-mule',
`mm-enable-multibyte').  So if you disable multibyte, Gnus becomes
unibyte and always uses the preferred coding system.  This seems like
a good You Get What You Asked For approach, but perhaps a warning is
in order?  Opinions?

If you want to re-enable multibyte for message, it is possible to frob
default-enable-multibyte-characters when sending the message and it
will work.  Unfortunately it is not enough to re-enable multibyte in
the message buffer alone, the code uses the global default value.

The warning could thus offer to re-enable multibyte temporarily when
sending the message.

