Gnus development mailing list
* Guidance concerning something I want to do with charsets
@ 2008-11-19  2:27 Lloyd Zusman
  2008-11-19 18:02 ` Reiner Steib
  0 siblings, 1 reply; 3+ messages in thread
From: Lloyd Zusman @ 2008-11-19  2:27 UTC
  To: ding

I've been using gnus since it first came out, but during all this time,
I never did a whole lot with charsets. I used to have it always post in
7-bit ASCII, and later I configured it to handle 8-bit text in the
ISO-8859-1 charset for when I'm composing messages in Spanish.

However, now I'm thinking that I want to enter the 21st century with
regard to charsets, and I really don't know much about the topic, at
least with regard to emacs and gnus. I have a specific setup that I'd
like to implement in gnus, and I'm wondering if anyone here would be
willing to give me some initial guidance as to how to proceed.

I'm not looking for someone to solve my problem for me or write my code;
rather, I'm just hoping that I could be given a very general overview of
what I need to do, with some pointers to the appropriate docs. I'll do
the rest.

I'm using emacs 23.0.60.1 and gnus version "No Gnus v0.11", recently
obtained via CVS.

OK, here's what I want to accomplish:

1. For each group that I'm viewing in gnus, I'd like to associate a
   default charset. Initially, this will either be UTF-8 or ISO-8859-1,
   with the default being ISO-8859-1. I believe that I make this
   group-specific setting via the `charset' attribute of the Group
   Parameters, correct?  But I'm not sure where to set the default
   charset that will be used if I don't set this group parameter.

2. For each group that I'm viewing, I want to set my own variable which
   specifies whether or not the logic in Item 3 (see below) will be
   performed for that group. This variable would either be set to `t' or
   `nil' for each group. Should I set this user-specific variable in the
   Group Parameters, or is there some other mechanism that I should use?

3. This is the part that I know the least about. If the variable
   described in Item 2 is `nil', then the logic described here is not
   performed. However, if this variable is set to `t' for a given group,
   then I want gnus to look at the Content-Type header of all incoming
   messages in this group, and to use it as the charset for displaying
   each of these messages. However, I want to make use of the following
   specific logic (pseudo-code):

     if Content-Type specifies UTF-8
       use UTF-8 as the charset
     else
       use ISO-8859-1 as the charset

4. In all cases, the charset selected via Items 1, 2, and 3 should be
   used both for decoding the message for display *and* for encoding my
   replies and follow-ups. I don't know much about how to do this,
   either.
   
Is this a common set of tasks which are easy to perform in gnus, or am I
trying to do something that is as idiosyncratic as many of the other
tasks that I tend to want to perform?

Thanks for any guidance and pointers that any of you might be able to
offer me.


-- 
 Lloyd Zusman
 ljz@asfast.com
 God bless you.





* Re: Guidance concerning something I want to do with charsets
  2008-11-19  2:27 Guidance concerning something I want to do with charsets Lloyd Zusman
@ 2008-11-19 18:02 ` Reiner Steib
  2008-11-19 20:28   ` Lloyd Zusman
  0 siblings, 1 reply; 3+ messages in thread
From: Reiner Steib @ 2008-11-19 18:02 UTC
  To: ding

On Wed, Nov 19 2008, Lloyd Zusman wrote:

> I'm not looking for someone to solve my problem for me or write my code;
> rather, I'm just hoping that I could be given a very general overview of
> what I need to do, with some pointers to the appropriate docs. I'll do
> the rest.

I have problems understanding what your problem is in the first place.

> 3. This is the part that I know the least about. If the variable
>    described in Item 2 is `nil', then the logic described here is not
>    performed. However, if this variable is set to `t' for a given group,
>    then I want gnus to look at the Content-Type header of all incoming
>    messages in this group, and to use it as the charset for displaying
>    each of these messages. However, I want to make use of the following
>    specific logic (pseudo-code):
>
>      if Content-Type specifies UTF-8
>        use UTF-8 as the charset
>      else
>        use ISO-8859-1 as the charset

That is the normal behavior of Gnus (and any other MIME-aware MUA).
If you have problems with that, there's either a bug in your config
(e.g. using Emacs in unibyte mode) or in Gnus.
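
A quick way to check for the unibyte case mentioned above is to ask
Emacs what its default is for new buffers.  This is only a sketch, to
be evaluated in *scratch*; the message strings are illustrative.  In
Emacs of this era, unibyte mode is typically the result of starting
Emacs with `--unibyte', setting the EMACS_UNIBYTE environment variable,
or setting `default-enable-multibyte-characters' to nil in an init
file.

```elisp
;; Report whether new buffers default to multibyte or unibyte.
;; `enable-multibyte-characters' is buffer-local, so look at its
;; default value rather than the value in the current buffer.
(if (default-value 'enable-multibyte-characters)
    (message "Emacs defaults to multibyte buffers")
  (message "Emacs defaults to unibyte buffers"))
```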

I can't think of a common use case for the `nil' case besides incorrect
charset labelling of articles (e.g. declaring iso-8859-1 when it is
utf-8).  For this, you can use
`gnus-summary-show-article-charset-alist' and the numerical prefix for
`g':

,----[ (info "(gnus)Paging the Article") ]
| `A g'
| `g'
|      (Re)fetch the current article (`gnus-summary-show-article').  If
|      given a prefix, fetch the current article, but don't run any of
|      the article treatment functions.  This will give you a "raw"
|      article, just the way it came from the server.
| 
|      If given a numerical prefix, you can do semi-manual charset
|      stuff.  `C-u 0 g cn-gb-2312 RET' will decode the message as if it
|      were encoded in the `cn-gb-2312' charset.  If you have
| 
|           (setq gnus-summary-show-article-charset-alist
|                 '((1 . cn-gb-2312)
|                   (2 . big5)))
| 
|      then you can say `C-u 1 g' to get the same effect.
`----
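
Applied to the two charsets discussed in this thread, the same setting
might look like the sketch below.  The choice of prefix numbers 1 and 2
is arbitrary and only an assumption for illustration.

```elisp
;; Map numeric prefixes for `g' in the summary buffer to the two
;; charsets of interest here.  The numbers are arbitrary.
(setq gnus-summary-show-article-charset-alist
      '((1 . utf-8)
        (2 . iso-8859-1)))
```

With this in place, `C-u 1 g' redisplays the current article as if it
were UTF-8, and `C-u 2 g' as if it were ISO-8859-1.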

> 4. In all cases, the charset selected via Items 1, 2, and 3 should be
>    used both for decoding the message for display *and* for encoding my
>    replies and follow-ups.

Why do you think that this is useful?

> Is this a common set of tasks which are easy to perform in gnus, or am I
> trying to do something that is as idiosyncratic as many of the other
> tasks that I tend to want to perform?

Gnus already does The Right Thing by default.  You can specify which
charsets to prefer via `mm-coding-system-priorities':

,----[ <f1> v mm-coding-system-priorities RET ]
| mm-coding-system-priorities is a variable defined in `mm-util.el'.
| Its value is 
| (iso-8859-1 iso-8859-15 utf-8)
| 
| Documentation:
| Preferred coding systems for encoding outgoing messages.
| 
| More than one suitable coding system may be found for some text.
| By default, the coding system with the highest priority is used
| to encode outgoing messages (see `sort-coding-systems').  If this
| variable is set, it overrides the default priority.
| 
| You can customize this variable.
`----

For Spanish, the value above is sensible.  If you don't care about
20th century clients, you may simply use '(utf-8).
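
As a sketch, either choice is a one-line setting in an init file such
as ~/.gnus.el (the file name is an assumption; any init file loaded
before composing mail should do).  Only one of the two forms should be
active.

```elisp
;; Prefer Latin-1 when the text fits in it, then Latin-9, then UTF-8.
;; Text containing characters outside the earlier charsets falls
;; through to the next entry that can encode it.
(setq mm-coding-system-priorities '(iso-8859-1 iso-8859-15 utf-8))

;; Alternative, if older clients are not a concern:
;; (setq mm-coding-system-priorities '(utf-8))
```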

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/





* Re: Guidance concerning something I want to do with charsets
  2008-11-19 18:02 ` Reiner Steib
@ 2008-11-19 20:28   ` Lloyd Zusman
  0 siblings, 0 replies; 3+ messages in thread
From: Lloyd Zusman @ 2008-11-19 20:28 UTC
  To: ding

Reiner Steib <reinersteib+gmane@imap.cc> writes:

> On Wed, Nov 19 2008, Lloyd Zusman wrote:
>
>> [ ... ]
>>
>>
>>      if Content-Type specifies UTF-8
>>        use UTF-8 as the charset
>>      else
>>        use ISO-8859-1 as the charset
>
> That is the normal behavior of Gnus (and any other MIME-aware MUA).
> If you have problems with that, there's either a bug in your config
> (e.g. using Emacs in unibyte mode) or in Gnus.

Well, I have indeed been using Emacs in unibyte mode. Ages ago (well,
some time in the 1990s, I think), I started using that setting. Based
on what you're saying here, I guess that's obsolete.

I send emails to people who seem to be limited to unibyte ISO-8859-1
messages and not UTF-8. Perhaps their mailers are primitive and ignore
the Content-Type header ... I'm not sure why.  This is the reason for my
having set this default, originally.

I'm now trying to catch up on my knowledge of charsets, so please
forgive my ignorance about this topic. I can see from what you wrote
below that there's a way to run Emacs in multibyte mode by default but
to use ISO-8859-1 for dealing with messages.


> I can't think of a common use case for the `nil' case beside incorrect
> charset labelling of articles (e.g. declaring iso-8859-1 when it is
> utf-8).  For this, you can use
> `gnus-summary-show-article-charset-alist' and the numerical prefix for
> `g':

Yes, I do see incorrect charset labeling (or missing labeling
altogether) in some articles and email messages.

Is there a way to do this without the numerical prefix? I.e., some sort
of hook that I can use to match the sender or newsgroup against a
pattern and then force the buffer to be encoded via unibyte ISO-8859-1
if the pattern matches?
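
One possible starting point for the per-group part of this question is
the pair of group parameters `charset' and `ignored-charsets' described
in the Gnus manual, set for a whole family of groups through
`gnus-parameters'.  The following is a sketch only, not a tested
configuration: the group regexp is hypothetical, and the list of
charsets to ignore is an assumption about which labels are wrong in
practice.  It covers matching on the group name; matching on the
sender is not addressed here.

```elisp
;; In ~/.gnus.el.  The regexp "^private\\.example\\." is a made-up
;; group name pattern; replace it with the real one.
(add-to-list 'gnus-parameters
             '("^private\\.example\\."
               ;; Charset assumed for articles that declare none.
               (charset . iso-8859-1)
               ;; Declared charsets to distrust in these groups;
               ;; articles carrying them are decoded with the
               ;; default charset above instead.
               (ignored-charsets x-unknown unknown-8bit)))
```

This affects only how articles are decoded for display.  The charset of
outgoing messages is chosen separately, via
`mm-coding-system-priorities'.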


> ,----[ (info "(gnus)Paging the Article") ]
> | `A g'
> | `g'
> |      [ ... etc. ... ]
> `----
>
>> 4. In all cases, the charset selected via Items 1, 2, and 3 should be
>>    used both for decoding the message for display *and* for encoding my
>>    replies and follow-ups.
>
> Why do you think that this is useful?

Well, maybe the only reason I think this way is due to ignorance. Some
of the recipients of my email messages and readers of my newsgroup posts
(in a private, very small news server ... maybe 20 people) do not seem
to be able to read UTF-8 encodings. This is why I want to force
ISO-8859-1 when writing to these people or posting in that private
newsgroup, whether these are replies or whether I'm initiating the
message.


>> Is this a common set of tasks which are easy to perform in gnus, or am I
>> trying to do something that is as idiosyncratic as many of the other
>> tasks that I tend to want to perform?
>
> Gnus already does The Right Thing by default.  You can specify which
> charsets to prefer via `mm-coding-system-priorities':


OK. I think I understand. I'll dig into this further, and I'll see if I
can fix my emacs so as not to use unibyte by default any longer, and
then to make use of the features you have described.

All that remains is for me to figure out how to write the hook that I
mentioned above.

Thank you very much.


-- 
 Lloyd Zusman
 ljz@asfast.com
 God bless you.




