coding-system difficulties

Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader

* coding-system difficulties
From: Cyprian Laskowski @ 2002-11-13 15:21 UTC


Hi,

I've never really totally understood coding-system issues.  Here's a
recurring problem:

Sometimes I get email in non-English characters: sometimes Gnus shows
a readable version, sometimes not (even though both cases sometimes
arise with the same people).  But the following little sequence lets
me read:

C-x h
M-w
C-x C-f ~/tmp/weird_mail.txt RET
C-y
C-x C-s
C-x C-v RET


I'm not sure why this behaves differently or how to proceed properly,
but I'm sure the answer isn't to bind this keyboard macro to `C-f'. :)
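A hedged sketch of doing the same thing without the temporary file. It assumes that the unreadable article text is still undecoded. One example is Greek sent as ISO-2022 escape sequences, which consist only of ASCII bytes and would be consistent with the `ascii`-only charset report further down. The command name `my-article-redecode` is hypothetical and is not a Gnus feature. It is only sensible when the buffer holds undecoded bytes. Running it on text that is already decoded can garble it.

```elisp
;; Hypothetical helper, not part of Gnus: re-decode text that is still
;; undecoded (for example, Greek sent as ISO-2022 escape sequences,
;; which look like plain ASCII to `find-charset-region').
(defun my-article-redecode (&optional coding-system)
  "Decode the current buffer's text in place.
Use CODING-SYSTEM if given (prompted for with a prefix argument);
otherwise use the coding system Emacs detects for the text."
  (interactive
   (list (and current-prefix-arg
              (read-coding-system "Decode with coding system: "))))
  (let ((inhibit-read-only t)   ; *Article* buffers are read-only
        (cs (or coding-system
                ;; Non-nil third argument: return only the best guess.
                (detect-coding-region (point-min) (point-max) t))))
    (decode-coding-region (point-min) (point-max) cs)
    (message "Decoded with %s" cs)))
```

Saving and revisiting the file lets Emacs's own detection choose a coding system. The sketch calls that detection directly on the buffer.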


cyp



* Re: coding-system difficulties
From: Cyprian Laskowski @ 2002-11-14  5:33 UTC

kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:

> Sometimes, the charset header is bogus.  You can type `1 g' then
> enter the charset to force Gnus to use a different charset.

Wow, I didn't know about that feature: neat.  However, for some
reason, it doesn't work for me in this case, or maybe I'm missing
something.  Here are some more details:

I wrote the following function to facilitate matters a bit (I wonder
if functions/interfaces for these sorts of investigations already
exist in splendid form?):

(defun list-buffer-coding-systems-and-charsets ()
  "Show list of charsets and possible coding systems for the buffer."
  (interactive)
  ;; Collect the charsets used by the current buffer's text before
  ;; switching to the report buffer.
  (let ((charsets (find-charset-region (point-min) (point-max))))
    (pop-to-buffer "*Coding Systems*")
    (erase-buffer)
    ;; For each charset, list the coding systems able to encode it.
    (mapc (lambda (charset)
            (insert "Charset: "
                    (symbol-name charset)
                    "\n"
                    "Coding systems: "
                    (prin1-to-string
                     (find-coding-systems-for-charsets
                      (list charset)))
                    "\n\n"))
          charsets)))

Now, to be concrete, if I run this function on a certain *Article*
buffer with Greek text, it yields this:

,----
| Charset: ascii
| Coding systems: (undecided)
`----

But if I save the article in a file with the command sequence
mentioned before, and rerun the function from that buffer, I get:

,----
| Charset: ascii
| Coding systems: (undecided)
| 
| Charset: greek-iso8859-7
| Coding systems: (iso-2022-jp-2 greek-iso-8bit tibetan-iso-8bit-with-esc thai-tis620-with-esc lao-with-esc korean-iso-8bit-with-esc japanese-iso-8bit-with-esc hebrew-iso-8bit-with-esc greek-iso-8bit-with-esc iso-latin-9-with-esc iso-latin-8-with-esc iso-latin-5-with-esc iso-latin-4-with-esc iso-latin-3-with-esc iso-latin-2-with-esc iso-latin-1-with-esc in-is13194-devanagari-with-esc cyrillic-iso-8bit-with-esc chinese-iso-8bit-with-esc compound-text iso-2022-8bit-ss2 iso-2022-7bit-lock iso-2022-7bit-ss2 iso-2022-7bit raw-text emacs-mule no-conversion)
`----

Now when I try `1 g greek-iso8859-7 RET' I still get unreadable text.
I even tried running exhaustively through all the charsets associated
with the greek-iso8859-7-compatible coding systems listed above, but
none of them worked.  I don't understand what's happening here, since
Emacs is obviously able to render the text correctly (as in the file).

Incidentally, here are the content-type-related header lines in the
message, as far as I can tell:

,----
| MIME-Version: 1.0
| Content-Type: multipart/alternative; boundary="0-1503940496-1037189069=:69315"
| Content-Transfer-Encoding: 8bit
| X-Content-Length: 4518
`----

As an aside, I wonder: is there a way (perhaps in BBDB) to associate a
person with a charset, coding system, or something similar?  Then,
when you read a message from that person or write a message to them,
the correct setup would be used, and you'd be totally in business.

Thanks to everyone for all the responses so far, as well as any more
to come. :)

cyp


* Re: coding-system difficulties
From: Kai Großjohann @ 2002-11-14 14:18 UTC


Cyprian Laskowski <swagbelly@yahoo.com> writes:

> But if I save the article in a file with the command sequence
> mentioned before, and rerun the function from that buffer, I get:

What does M-x describe-coding-system RET RET say after doing C-x C-f
on the file?
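A related check, offered as a hedged sketch. The helper name `my-detect-file-coding-system` is hypothetical. It asks Emacs which coding system it would detect for the file's raw bytes, without visiting the file, and so shows what the C-x C-v round trip relies on.

```elisp
;; Hypothetical helper: report the coding system Emacs would detect
;; for FILE's raw bytes, without visiting it.
(defun my-detect-file-coding-system (file)
  "Return Emacs's best guess at the coding system of FILE."
  (with-temp-buffer
    (set-buffer-multibyte nil)              ; keep the bytes undecoded
    (insert-file-contents-literally file)
    ;; Non-nil third argument: return only the highest-priority guess.
    (detect-coding-region (point-min) (point-max) t)))
```

For example, evaluate `(my-detect-file-coding-system "~/tmp/weird_mail.txt")` with M-: and compare the result with what describe-coding-system reports after visiting the file.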

kai
-- 
~/.signature is: umop ap!sdn    (Frank Nobis)



* Re: coding-system difficulties
From: Fredrik Staxeng @ 2002-11-15  5:49 UTC

kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:

>Cyprian Laskowski <swagbelly@yahoo.com> writes:
>
>> I have tried `1 g iso-2022-7bit RET' on the article, but still to no
>> avail.
>
>That's what I would have suggested.  Darn.  Anyone?
>
>kai

Changing the autodetection stuff to remove all iso-2022 variants?
Surely this is governed by some hook.

I was under the impression that most of the encodings on the list
posted earlier are used mainly for Japanese.  It would help to know
what language the article is written in, and whether there are any
headers that might indicate which charset is used.

-- 
Fredrik Stax\"ang | rot13: sfgk@hcqngr.hh.fr


* Re: coding-system difficulties
From: Fredrik Staxeng @ 2002-11-14 12:23 UTC

Cyprian Laskowski <swagbelly@yahoo.com> writes:

>Reiner Steib <4uce.02.r.steib@gmx.net> writes:
>
>> If Cyprian uses Emacs with X, the solution is to install proper
>> Latin-9 fonts. Or you may want to use ucs-tables, see
>> <URL:http://my.gnus.org/Members/rsteib/howto_unify/>.
>
>But since Emacs clearly is able to render this stuff, why would I need
>extra fonts?  

Because it's the right thing. :-) 

>Maybe this is precisely the kind of thing that I don't understand ...
>Can someone suggest a good reference (thorough, but not too
>intimidating) for someone who has had the mixed blessing of usually
>dealing exclusively with ascii/English, but who now wants to become
>comfortable with dealing with these kinds of issues, instead of
>running into a corner every time or pestering Emacs gurus?

I have found some useful information on czyborra.com.  For the most
part, its author seems to describe existing practice in a fairly
objective way.

When you want to compare specific character sets, you can use the
files at http://www.unicode.org/Public/MAPPINGS.  If you download the
files and run diff on them, you will see which characters differ.

Some countries seem to have one dominant way of encoding their language
on computers.  Some countries have a few mostly compatible encodings.
Some countries have incompatible encodings, so software tends to
acquire auto-detection capabilities.

Language is a highly politicized issue in some countries.  Some people
want to push Unicode/UTF-8 as the solution.  Microsoft has applied
embrace and extend even in this arena, and has of course met some
resistance from the other systems.  Some people resent that English
has (almost) fulfilled the Esperanto dream of becoming the world's
standard second language.  This issue also holds some mystical
attraction for people who like to take small problems and build big,
overly complex solutions.  (I'm thinking of a certain Dane, not anybody
present here.)

So there is plenty to fight about.  The politics are complex, but when
looking closer I have always found the technical issues to be quite
simple.  This is not surprising, after all.  When you take a computer
system and make it able to represent your language, you take the
easiest way.  You also preserve compatibility with English, so using
English on a Thai computer is not a problem.  Using Greek on a Thai
computer probably is.

(Except for bidirectional scripts.  I don't see any reasonable way to
handle those.  But some UTF-8 supporters seem to have come to the
conclusion that the only feasible technical solution is to change the
script, so I am not alone.)

-- 
Fredrik Stax\"ang | rot13: sfgk@hcqngr.hh.fr


* Re: coding-system difficulties
From: Hugh Baker @ 2002-11-14 13:01 UTC

Cyprian Laskowski <swagbelly@yahoo.com> writes:

> But since Emacs clearly is able to render this stuff, why would I need
> extra fonts?  

You might try installing the intl-fonts package.  I was doing some
linguistics homework using the International Phonetic Alphabet (IPA)
and I found it handy.  You should be able to find it somewhere in:

  http://www.gnu.org/software/

You will need to set your input-method as well as your coding-system.
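As a hedged illustration of this point, assume Greek is the language you want to read and type. Emacs's language environments set the coding-system preferences and the default input method together. `set-language-environment` and the "greek" input method are standard Emacs features. Whether this particular choice suits the original poster's mail is an assumption.

```elisp
;; Assumption: Greek is the main language needed besides English.
;; The "Greek" language environment puts greek-iso-8bit first in the
;; coding-system priorities and makes "greek" the default input method.
(set-language-environment "Greek")

;; Afterwards, C-\ (toggle-input-method) switches Greek input on and off.
```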

-- 
 Hugh Baker <hugh.baker@toronto.edu> University of Toronto Undergraduate
  CSC148 Course Notes: <http://individual.utoronto.ca/hughbaker/csc148/>
   public static int gcd(int a,int b){int r=a%b; return(r!=0)?gcd(b,r):b;}
				~ ~ ~
