Gnus development mailing list
* utf-8 -> latin-X if possible
From: Simon Josefsson @ 2002-05-04 11:36 UTC


Sorry if this has been beaten to death, but I still don't understand
it, so:

I've started to use

  (prefer-coding-system 'utf-8)

since that gives me better utf-8 support in Emacs, but it seems Gnus
then starts to send messages in utf-8.

How do I get Gnus to send it using latin-X, if possible, instead?

I tried loading ucs-tables.el, and invoked unify-8859* and ucs-unify*
in the message buffer, but it did nothing.

Wouldn't it be nice if there were a section explaining this stuff in
the manual?  I'll write it when I get this to work the way I want.

(I used to call `(prefer-coding-system 'latin-1)' after the statement
above to get latin-1 preferred over utf-8 again, but I'd rather not do
that.  It seems I want to prefer utf-8 when reading but latin-1 when
writing; I'm not sure that's possible.)





* Re: utf-8 -> latin-X if possible
From: Simon Josefsson @ 2002-05-04 14:08 UTC


(Talking to myself...)

Simon Josefsson <jas@extundo.com> writes:

> How do I get Gnus to send it using latin-X, if possible, instead?

Emacs config to make it Unicode friendly:

  (prefer-coding-system 'utf-8)

Gnus config to make it revert to iso-8859-1 when possible:

  (custom-set-variables
   '(mm-coding-system-priorities (quote (iso-latin-1))))

> Wouldn't it be nice if there were a section explaining this stuff in
> the manual?  I'll write it when I get this to work the way I want.

Yup.  I have turned the Emacs MIME manual into a user's manual; the
relevant sections are included below.  If someone who knows English
and/or MULE reads them, please fix them.

Encoding Customization
======================

`mm-body-charset-encoding-alist'
     Mapping from MIME charset to encoding to use.  This mapping is
     normally used, except when other requirements force a specific
     encoding (e.g., digitally signed messages require 7bit encodings).
     The default is `((iso-2022-jp . 7bit) (iso-2022-jp-2 . 7bit))'.
     As an example, if you do not want to have ISO-8859-1 characters
     quoted-printable encoded, you may add `(iso-8859-1 . 8bit)' to
     this variable.  You can override this setting on a per-message
     basis by using the `encoding' MML tag (*note MML Definition::).

`mm-coding-system-priorities'
     Prioritize coding systems to use for outgoing messages.  The
     default is nil, which means to use the defaults in Emacs.  It is a
     list of coding system symbols (aliases of coding systems do not
     work; use `M-x describe-coding-system' to make sure you are not
     specifying an alias in this variable).  For example, if you have
     configured Emacs to prefer UTF-8, but want outgoing messages to
     be sent in ISO-8859-1 if possible, you can set this variable to
     `(iso-latin-1)'.

`mm-content-transfer-encoding-defaults'
     Mapping from MIME types to encoding to use.  This mapping is
     normally used, except when other requirements force a safer
     encoding (e.g., digitally signed messages require 7bit encoding).
     Besides the normal MIME encodings, `qp-or-base64' may be used to
     indicate that, for each part, the more efficient of
     quoted-printable and base64 should be used.  You can override this
     setting on a per-message basis by using the `encoding' MML tag
     (*note MML Definition::).

`mm-use-ultra-safe-encoding'
     When this is non-nil, textual parts are encoded as
     quoted-printable if they contain lines longer than 76 characters
     or lines starting with "From ".  Non-7bit encodings (8bit,
     binary) are generally disallowed.  This reduces the probability
     that an MTA or MDA that is not 8-bit clean alters the message.
     This should never be set directly, but bound by other functions
     when necessary (e.g., when encoding messages that are to be
     digitally signed).
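
   A minimal sketch of an init-file setup using the first two
variables above, assuming Gnus is installed and you want Latin-1
preferred for outgoing mail and ISO-8859-1 bodies sent as 8bit.  The
function `my-call-with-ultra-safe-encoding' is hypothetical; it only
illustrates binding `mm-use-ultra-safe-encoding' rather than setting
it:

     ;; Keep the default 7bit entries and send ISO-8859-1 bodies as 8bit.
     (setq mm-body-charset-encoding-alist
           '((iso-2022-jp . 7bit)
             (iso-2022-jp-2 . 7bit)
             (iso-8859-1 . 8bit)))

     ;; Prefer UTF-8 inside Emacs, but send ISO-8859-1 when it suffices.
     ;; Use the real coding system name here, not an alias.
     (prefer-coding-system 'utf-8)
     (setq mm-coding-system-priorities '(iso-latin-1))

     ;; Hypothetical helper: bind the variable dynamically around a
     ;; call instead of setting it globally.
     (defvar mm-use-ultra-safe-encoding)
     (defun my-call-with-ultra-safe-encoding (function)
       "Call FUNCTION with `mm-use-ultra-safe-encoding' bound to t."
       (let ((mm-use-ultra-safe-encoding t))
         (funcall function)))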

Charset Translation
===================

   During translation from MML to MIME, for each MIME part which has
been composed inside Emacs, an appropriate charset has to be chosen.

   If you are running a non-MULE Emacs, this process is simple: If the
part contains any non-ASCII (8-bit) characters, the MIME charset given
by `mail-parse-charset' (a symbol) is used.  (Never set this variable
directly, though.  If you want to change the default charset, please
consult the documentation of the package which you use to process MIME
messages.  *Note Various Message Variables: (message)Various Message
Variables, for example.)  If there are only ASCII characters, the MIME
charset US-ASCII is used, of course.
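
   The non-MULE rule can be sketched as a small function.  The name
`my-non-mule-part-charset' is hypothetical and this is an illustration
of the rule as described, not the code Gnus actually runs:

     (defvar mail-parse-charset)   ; defined elsewhere in Gnus

     (defun my-non-mule-part-charset (start end)
       "Return the MIME charset chosen for the region START..END."
       (save-excursion
         (goto-char start)
         ;; Any non-ASCII (8-bit) character selects `mail-parse-charset';
         ;; a pure-ASCII part is labelled US-ASCII.
         (if (re-search-forward "[^\000-\177]" end t)
             mail-parse-charset
           'us-ascii)))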

   Things are slightly more complicated when running Emacs with MULE
support.  In this case, a list of the MULE charsets used in the part is
obtained, and the MULE charsets are translated to MIME charsets by
consulting the variable `mm-mime-mule-charset-alist'.  If this results
in a single MIME charset, this is used to encode the part.  But if the
resulting list of MIME charsets contains more than one element, two
things can happen: If it is possible to encode the part via UTF-8, this
charset is used.  (For this, Emacs must support the `utf-8' coding
system, and the part must consist entirely of characters which have
Unicode counterparts.)  If UTF-8 is not available for some reason, the
part is split into several ones, so that each one can be encoded with a
single MIME charset.  The part can only be split at line boundaries,
though--if more than one MIME charset is required to encode a single
line, it is not possible to encode the part.
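
   The MULE decision can be sketched the same way.  The function below
is hypothetical; it assumes the caller has already mapped the part's
MULE charsets to MIME charsets via `mm-mime-mule-charset-alist' and
knows whether every character has a Unicode counterpart:

     (defun my-mule-part-charset (mime-charsets all-unicode-p)
       "Return the MIME charset for a part, or nil if it must be split."
       ;; MIME-CHARSETS is assumed to be non-empty.
       (cond
        ;; Exactly one MIME charset: use it.
        ((null (cdr mime-charsets))
         (car mime-charsets))
        ;; Several charsets: use UTF-8 if Emacs supports it and every
        ;; character can be represented in Unicode.
        ((and all-unicode-p (coding-system-p 'utf-8))
         'utf-8)
        ;; Otherwise the caller must split the part at line boundaries.
        (t nil)))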

   When running Emacs with MULE support, the preference for which
coding system to use is inherited from Emacs itself.  This means that
if Emacs is set up to prefer UTF-8, UTF-8 will be used when encoding
messages.  You can change this by setting the
`mm-coding-system-priorities' variable (*note Encoding
Customization::).

   The charset to be used can be overridden by setting the `charset' MML
tag (*note MML Definition::) when composing the message.

   The content transfer encoding (quoted-printable, 8bit, etc.) is
orthogonal to the discussion here, and is controlled by the variables
`mm-body-charset-encoding-alist' and
`mm-content-transfer-encoding-defaults' (*note Encoding
Customization::).





* Re: utf-8 -> latin-X if possible
From: Torsten Hilbrich @ 2002-05-04 14:45 UTC


Simon Josefsson <jas@extundo.com> writes:

[...]

> How do I get Gnus to send it using latin-X, if possible, instead?
>
> I tried loading ucs-tables.el, and invoked unify-8859* and ucs-unify*
> in the message buffer, but it did nothing.

I use the following lines to do this:

(when (fboundp 'unify-8859-on-encoding-mode)
  ;; Unify ISO 8859 characters on both encoding and decoding.
  (unify-8859-on-encoding-mode t)
  (unify-8859-on-decoding-mode t))

        Torsten





* Re: utf-8 -> latin-X if possible
From: Kai Großjohann @ 2002-05-05 12:06 UTC
  Cc: ding

Simon Josefsson <jas@extundo.com> writes:

> I've started to use
>
>   (prefer-coding-system 'utf-8)
>
> since that gives me better utf-8 support in emacs, but it seems Gnus
> start to send messages in utf-8 then.
>
> How do I get Gnus to send it using latin-X, if possible, instead?

Maybe it works if you frob coding-category-list such that UTF occurs
near the beginning, but after the Latin-1 stuff.

prefer-coding-system seems to do more stuff.
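
A minimal sketch of that reordering, assuming the MULE API of the
time: in Emacs 21, `set-coding-priority' takes a list of coding
categories and moves them to the front of coding-category-list, while
later Emacsen provide `set-coding-system-priority', which takes coding
system names instead.

  (if (fboundp 'set-coding-system-priority)
      ;; Later Emacsen: coding systems, highest priority first.
      (set-coding-system-priority 'iso-latin-1 'utf-8)
    ;; Emacs 21: coding categories; Latin-1 first, UTF-8 right after.
    (set-coding-priority '(coding-category-iso-8-1 coding-category-utf-8)))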

kai
-- 
Silence is foo!




* Re: utf-8 -> latin-X if possible
From: Simon Josefsson @ 2002-05-05 15:06 UTC
  Cc: ding

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> Simon Josefsson <jas@extundo.com> writes:
>
>> I've started to use
>>
>>   (prefer-coding-system 'utf-8)
>>
>> since that gives me better utf-8 support in emacs, but it seems Gnus
>> start to send messages in utf-8 then.
>>
>> How do I get Gnus to send it using latin-X, if possible, instead?
>
> Maybe it works if you frob coding-category-list such that UTF occurs
> near the beginning, but after the Latin-1 stuff.

I used to do this, and it worked, but I felt it was kind of gross.

> prefer-coding-system seems to do more stuff.

Yes.  Later I came up with the following, which seems a bit cleaner.

  (prefer-coding-system 'utf-8)
  (set-default-coding-systems 'latin-1)

I wish all of this stuff were customizable.





* Re: utf-8 -> latin-X if possible
From: Kai Großjohann @ 2002-05-05 15:24 UTC
  Cc: ding

Simon Josefsson <jas@extundo.com> writes:

> Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
>
>> Maybe it works if you frob coding-category-list such that UTF occurs
>> near the beginning, but after the Latin-1 stuff.
>
> I used to do this, and it worked, but I felt it was kind of gross.

I don't understand.  It seems to be just the right variable for the
purpose, according to the documentation.  Do you also think
load-path is gross?  It uses a quite similar mechanism, just for a
different purpose.  Maybe I'm missing something?

kai
-- 
Silence is foo!




* Re: utf-8 -> latin-X if possible
From: Simon Josefsson @ 2002-05-05 15:49 UTC
  Cc: ding

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> Simon Josefsson <jas@extundo.com> writes:
>
>> Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
>>
>>> Maybe it works if you frob coding-category-list such that UTF occurs
>>> near the beginning, but after the Latin-1 stuff.
>>
>> I used to do this, and it worked, but I felt it was kind of gross.
>
> I don't understand.  It seems to be just the right variable for the
> purpose, according to the documentation.  Do you also think
> load-path is gross?  It uses a quite similar mechanism, just for a
> different purpose.  Maybe I'm missing something?

The grossness was because the variable isn't documented in the manual
nor is it customizable nor does it have a * as the first character in
the docstring.  That leaves me with the impression that users
shouldn't touch it.  Maybe I'm just too sensitive. ;-)





* Re: utf-8 -> latin-X if possible
From: Kai Großjohann @ 2002-05-05 16:01 UTC
  Cc: ding

Simon Josefsson <jas@extundo.com> writes:

> The grossness was because the variable isn't documented in the manual
> nor is it customizable nor does it have a * as the first character in
> the docstring.  That leaves me with the impression that users
> shouldn't touch it.  Maybe I'm just too sensitive. ;-)

The UTF-8 support of Emacs 21.2 isn't deemed sufficient yet, I
gather.  (But coding-category-list in the CVS code also mentions
utf-8 right at the end.  Hm.  What was the problem again that
prompted you to frob the coding system stuff in the first place?)

kai
-- 
Silence is foo!




* Re: utf-8 -> latin-X if possible
From: Simon Josefsson @ 2002-05-05 16:06 UTC
  Cc: ding

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> Simon Josefsson <jas@extundo.com> writes:
>
>> The grossness was because the variable isn't documented in the manual
>> nor is it customizable nor does it have a * as the first character in
>> the docstring.  That leaves me with the impression that users
>> shouldn't touch it.  Maybe I'm just too sensitive. ;-)
>
> The UTF-8 support of Emacs 21.2 isn't deemed sufficient yet, I
> gather.  (But coding-category-list in the CVS code also mentions
> utf-8 right at the end.  Hm.  What was the problem again that
> prompted you to frob the coding system stuff in the first place?)

raw-text and binary come before utf-8 in coding-category-list by
default, so I don't think utf-8 will ever be autodetected unless you
do something about it.
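
A small sketch for checking that ordering on an Emacs of that era
(it assumes coding-category-list exists, as in Emacs 21; the category
names are the Emacs 21 ones): it returns each category's 0-based
position in the list.

  ;; A larger number means lower detection priority.  A category
  ;; missing from the list gets the list's length.
  (mapcar (lambda (category)
            (cons category
                  (- (length coding-category-list)
                     (length (memq category coding-category-list)))))
          '(coding-category-raw-text
            coding-category-binary
            coding-category-utf-8))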

The problem was that I want utf-8 support in Emacs, but I don't want
Gnus to send everything in utf-8 by default.  The solution I posted in
another message (frobbing mm-coding-system-priorities) works, so I
think this is solved.





* Re: utf-8 -> latin-X if possible
From: Kai Großjohann @ 2002-05-05 16:06 UTC
  Cc: ding

Kai.Grossjohann@cs.uni-dortmund.de (Kai Großjohann) writes:

> The UTF-8 support of Emacs 21.2 isn't deemed sufficient yet, I
> gather.  (But coding-category-list in the CVS code also mentions
> utf-8 right at the end.  Hm.  What was the problem again that
> prompted you to frob the coding system stuff in the first place?)

CVS Emacs with CVS Gnus automatically generates utf-8 when sending a
message with both ¤ and € in it.  Maybe you wanted to avoid multipart
in these cases?

kai
-- 
Silence is foo!




* Re: utf-8 -> latin-X if possible
From: Kai Großjohann @ 2002-05-06 10:01 UTC
  Cc: ding

Kai.Grossjohann@cs.uni-dortmund.de (Kai Großjohann) writes:

> CVS Emacs with CVS Gnus automatically generates utf-8 when sending a
> message with both <EUR> and <EUR> in it.  Maybe you wanted to avoid
> multipart in these cases?

What's this?  I typed "both <CURRENCY> and <EUR> in it", but out came
two <EUR> characters!!

There seems to be a bug.

kai
-- 
Silence is foo!




* Re: utf-8 -> latin-X if possible
From: Bjørn Mork @ 2002-05-06 10:09 UTC


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
> Kai.Grossjohann@cs.uni-dortmund.de (Kai Großjohann) writes:
>
>> CVS Emacs with CVS Gnus automatically generates utf-8 when sending a
>> message with both <EUR> and <EUR> in it.  Maybe you wanted to avoid
>> multipart in these cases?
>
> What's this?  I typed "both <CURRENCY> and <EUR> in it", but out came
> two <EUR> characters!!
>
> There seems to be a bug.

Must be a decode/display/font bug on your side. Looks good to me.


Bjørn
-- 
You make me sick.  




* Re: utf-8 -> latin-X if possible
From: Kai Großjohann @ 2002-05-06 11:37 UTC
  Cc: ding

Bjørn Mork <bmork@dod.no> writes:

> Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
>>
>> What's this?  I typed "both <CURRENCY> and <EUR> in it", but out came
>> two <EUR> characters!!
>>
>> There seems to be a bug.
>
> Must be a decode/display/font bug on your side. Looks good to me.

Yes, C-u C-x = on the first EUR shows me that Emacs thinks it's a
Latin-1 character.  But it displays the EUR glyph.
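
The same information can be queried from Lisp; a minimal sketch using
the standard char-charset function, which returns the charset Emacs
assigns to the character at point:

  ;; Evaluate with M-: while point is on the character in question.
  (let ((char (char-after)))
    (list char (char-charset char)))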

I'm confused.

kai
-- 
Silence is foo!




* Re: utf-8 -> latin-X if possible
From: frank paulsen @ 2002-05-06 13:18 UTC


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> Bjørn Mork <bmork@dod.no> writes:
>
> Yes, C-u C-x = on the first EUR shows me that Emacs thinks it's a
> Latin-1 character.  But it displays the EUR glyph.
>
> I'm confused.

IIRC this seems to happen after W-d (article-treat-dumbquotes).

-- 
frobnicate foo




* Re: utf-8 -> latin-X if possible
From: Kai Großjohann @ 2002-05-06 13:28 UTC
  Cc: ding

Kai.Grossjohann@cs.uni-dortmund.de (Kai Großjohann) writes:

> Bjørn Mork <bmork@dod.no> writes:
>
>> Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
>>>
>>> What's this?  I typed "both <CURRENCY> and <EUR> in it", but out came
>>> two <EUR> characters!!
>>>
>>> There seems to be a bug.
>>
>> Must be a decode/display/font bug on your side. Looks good to me.
>
> Yes, C-u C-x = on the first EUR shows me that Emacs thinks it's a
> Latin-1 character.  But it displays the EUR glyph.
>
> I'm confused.

It turns out I was telling Emacs to use a *-iso8859-15 font, which is
a bad idea.

kai
-- 
Silence is foo!


