* coding-system difficulties
From: Cyprian Laskowski @ 2002-11-13 15:21 UTC

Hi,

I've never fully understood coding-system issues. Here's a recurring
problem:

Sometimes I get email containing non-English characters. Sometimes Gnus
shows a readable version and sometimes not (both cases can arise with
mail from the same person). But the following little sequence lets me
read the text:

  C-x h M-w C-x C-f ~/tmp/weird_mail.txt RET C-y C-x C-s C-x C-v RET

I'm not sure why this behaves differently, or how to proceed properly,
but I'm sure that the answer is not to bind this keyboard macro to
`C-f'. :)

cyp
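One plausible reading of why the save-and-revisit sequence helps (an assumption; nothing in the thread confirms it) is that the article buffer holds undecoded bytes, and visiting the saved file gives Emacs a chance to detect a coding system and decode them. The same round trip can be sketched without a temporary file. `my-redecode-bytes` is a hypothetical helper name, and the sketch assumes a reasonably recent Emacs and text that consists of ASCII plus raw 8-bit bytes.

```elisp
;; Sketch only: `my-redecode-bytes' is a hypothetical helper, not part
;; of Emacs or Gnus.
(defun my-redecode-bytes (text coding-system)
  "Return TEXT re-decoded as CODING-SYSTEM.
TEXT is assumed to hold undecoded bytes (ASCII plus raw 8-bit bytes)."
  ;; Step 1: turn the text back into a plain byte string.
  ;; Step 2: decode those bytes with the coding system we believe in.
  (decode-coding-string (encode-coding-string text 'raw-text)
                        coding-system))

;; Bytes E1 E2 E3 are alpha, beta, gamma in ISO-8859-7.
(my-redecode-bytes (unibyte-string #xe1 #xe2 #xe3) 'iso-8859-7)
```

If the text has already been decoded with the wrong coding system, rather than left as raw bytes, this sketch is not enough; the text would first have to be encoded with that wrong coding system to recover the original bytes.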
* Re: coding-system difficulties
From: Cyprian Laskowski @ 2002-11-14 5:33 UTC

kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:

> Sometimes, the charset header is bogus. You can type `1 g' then
> enter the charset to force Gnus to use a different charset.

Wow, I didn't know about that feature: neat. However, for some reason
it doesn't work for me in this case, or maybe I'm missing something.
Here are some more details.

I wrote the following function to make investigating a bit easier (I
wonder whether functions or interfaces for this sort of investigation
already exist in some splendid form?):

(defun list-buffer-coding-systems-and-charsets ()
  "Show list of charsets and possible coding systems for the buffer."
  (interactive)
  ;; Collect the charsets before switching away from the current buffer.
  (let ((charsets (find-charset-region (point-min) (point-max))))
    (pop-to-buffer "*Coding Systems*")
    (erase-buffer)
    ;; For each charset, list the coding systems that can encode it.
    (mapc (lambda (charset)
            (insert "Charset: " (symbol-name charset) "\n"
                    "Coding systems: "
                    (prin1-to-string
                     (find-coding-systems-for-charsets (list charset)))
                    "\n\n"))
          charsets)))

Now, to be concrete, if I run this function on a certain *Article*
buffer with Greek text, it yields this:

,----
| Charset: ascii
| Coding systems: (undecided)
`----

But if I save the article in a file with the command sequence
mentioned before, and rerun the function from that buffer, I get:

,----
| Charset: ascii
| Coding systems: (undecided)
|
| Charset: greek-iso8859-7
| Coding systems: (iso-2022-jp-2 greek-iso-8bit
| tibetan-iso-8bit-with-esc thai-tis620-with-esc lao-with-esc
| korean-iso-8bit-with-esc japanese-iso-8bit-with-esc
| hebrew-iso-8bit-with-esc greek-iso-8bit-with-esc
| iso-latin-9-with-esc iso-latin-8-with-esc iso-latin-5-with-esc
| iso-latin-4-with-esc iso-latin-3-with-esc iso-latin-2-with-esc
| iso-latin-1-with-esc in-is13194-devanagari-with-esc
| cyrillic-iso-8bit-with-esc chinese-iso-8bit-with-esc compound-text
| iso-2022-8bit-ss2 iso-2022-7bit-lock iso-2022-7bit-ss2 iso-2022-7bit
| raw-text emacs-mule no-conversion)
`----

Now when I try `1 g greek-iso8859-7 RET' I still get unreadable text.
I even tried exhaustively running through all the charsets associated
with the greek-iso8859-7-compatible coding systems listed above, but
none of them was successful. I don't understand what's happening here,
since Emacs obviously is able to render the text correctly (as in the
file).

Incidentally, here are the content-type-related header lines in the
message, as far as I can tell:

,----
| MIME-Version: 1.0
| Content-Type: multipart/alternative; boundary="0-1503940496-1037189069=:69315"
| Content-Transfer-Encoding: 8bit
| X-Content-Length: 4518
`----

As an aside, I wonder: is there a way (perhaps in BBDB) to associate a
person with a charset, or a coding system, or something? Then when you
read a message from that person, or write a message to them, the
correct setup would be used, and you'd be totally in business.

Thanks to everyone for all the responses so far, as well as any more
to come. :)

cyp
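The exhaustive trial described above can be sketched as a small loop. This is only an illustration under stated assumptions: `my-try-decodings` is a hypothetical helper, the sample bytes are made up, and the sketch works on a byte string rather than on a Gnus article. Note that `greek-iso8859-7` in the function's output is a charset name, whereas `decode-coding-string` expects a coding system such as `greek-iso-8bit` or `iso-8859-7`.

```elisp
;; Sketch only: `my-try-decodings' is a hypothetical helper, not part
;; of Emacs or Gnus.
(defun my-try-decodings (bytes candidates)
  "Decode the unibyte string BYTES with each coding system in CANDIDATES.
Return an alist of (CODING-SYSTEM . DECODED-STRING).  Names that are
not valid coding systems in this Emacs are skipped."
  (let (results)
    (dolist (cs candidates (nreverse results))
      (when (coding-system-p cs)
        (push (cons cs (decode-coding-string bytes cs)) results)))))

;; Made-up sample: the same three bytes read as Greek and as Latin-1.
(my-try-decodings (unibyte-string #xe1 #xe2 #xe3)
                  '(iso-8859-7 iso-8859-1 no-such-coding))
```

Comparing the candidate decodings side by side makes it easier to spot which one yields readable text than retyping `1 g` for each guess.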
* Re: coding-system difficulties
From: Kai Großjohann @ 2002-11-14 14:18 UTC

Cyprian Laskowski <swagbelly@yahoo.com> writes:

> But if I save the article in a file with the command sequence
> mentioned before, and rerun the function from that buffer, I get:

What does M-x describe-coding-system RET RET say after doing C-x C-f
on the file?

kai
--
~/.signature is: umop ap!sdn (Frank Nobis)
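The same information can be read programmatically from the buffer visiting the file. A minimal sketch, assuming the hypothetical helper name `my-describe-buffer-coding`; it relies only on the standard variable `buffer-file-coding-system` and on `find-charset-region`, which the function earlier in the thread also uses.

```elisp
;; Sketch only: `my-describe-buffer-coding' is a hypothetical helper.
(defun my-describe-buffer-coding ()
  "Return a one-line summary of the current buffer's coding system.
The summary names the coding system Emacs associates with the buffer's
file and the charsets found in the buffer text."
  (format "coding system: %s; charsets: %S"
          buffer-file-coding-system
          (find-charset-region (point-min) (point-max))))
```

Evaluating `(my-describe-buffer-coding)` with `M-:` in the buffer visiting `~/tmp/weird_mail.txt` would show which coding system Emacs chose when it read the file.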
* Re: coding-system difficulties
From: Fredrik Staxeng @ 2002-11-15 5:49 UTC

kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:

>Cyprian Laskowski <swagbelly@yahoo.com> writes:
>
>> I have tried `1 g iso-2002-7bit RET' on the article, but still to no
>> avail.
>
>That's what I would have suggested. Darn. Anyone?
>
>kai

How about changing the autodetection stuff to remove all iso-2022
variants? Surely this is governed by some hook.

I was under the impression that most of the encodings on the list
posted earlier are used mostly for Japanese.

It would help to know what language the article is written in. Also,
are there any headers that might indicate what charset is used?

--
Fredrik Stax\"ang | rot13: sfgk@hcqngr.hh.fr
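The idea of dropping the iso-2022 variants from autodetection can be illustrated on the list of candidates that `detect-coding-string` returns. This is a sketch under assumptions: `my-detect-without-iso-2022` is a hypothetical helper, and it only filters the detection result for a byte string; it does not change what Gnus or Emacs do when reading an article. Emacs also offers `prefer-coding-system` for adjusting detection priority, though the thread does not establish whether that helps in this case.

```elisp
;; Sketch only: `my-detect-without-iso-2022' is a hypothetical helper.
(defun my-detect-without-iso-2022 (bytes)
  "Return the coding systems detected for BYTES, minus ISO-2022 variants.
BYTES should be a unibyte string holding the undecoded text."
  (let (kept)
    (dolist (cs (detect-coding-string bytes) (nreverse kept))
      ;; Keep a candidate unless its name mentions iso-2022.
      (unless (string-match-p "iso-2022" (symbol-name cs))
        (push cs kept)))))

;; Made-up sample bytes; the result depends on the Emacs configuration.
(my-detect-without-iso-2022 (unibyte-string #xe1 #xe2 #xe3))
```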
* Re: coding-system difficulties
From: Fredrik Staxeng @ 2002-11-14 12:23 UTC

Cyprian Laskowski <swagbelly@yahoo.com> writes:

>Reiner Steib <4uce.02.r.steib@gmx.net> writes:
>
>> If Cyprian uses Emacs with X, the solution is to install proper
>> Latin-9 fonts. Or you may want to use ucs-tables, see
>> <URL:http://my.gnus.org/Members/rsteib/howto_unify/>.
>
>But since Emacs clearly is able to render this stuff, why would I need
>extra fonts?

Because it's the right thing. :-)

>Maybe this is precisely the kind of thing that I don't understand ...
>Can someone suggest a good reference (thorough, but not too
>intimidating) for someone who has had the mixed blessing of usually
>dealing exclusively with ascii/English, but who now wants to become
>comfortable with dealing with these kinds of issues, instead of
>running into a corner every time or pestering Emacs gurus?

I have found some useful information on czyborra.com. For the most
part he seems to describe existing practice in a fairly objective way.

When you want to compare specific character sets, you can use the
files at http://www.unicode.org/Public/MAPPINGS. If you download the
files and run diff on them, you will see which characters differ.

Some countries seem to have one dominant way of coding their language
in computers. Some countries have a few mostly compatible encodings.
Some countries have incompatible encodings, so software tends to
acquire auto-detection capabilities.

Language is a highly politicized issue in some countries. Some people
want to push Unicode/UTF-8 as the solution. Microsoft have used
embrace and extend even in this arena, and are of course met with some
resistance from the other systems. Some people resent that English has
(almost) fulfilled the Esperanto dream of becoming the world's
standard second language.

This issue also holds some mystical attraction for people who like to
take small problems and build big, overcomplex solutions. (I'm thinking
of a certain Dane, not anybody present here.)

So there is plenty to fight about. The politics are complex, but when
looking closer I have always found the technical issues to be quite
simple. This is not surprising, after all. When you take a computer
system and make it able to represent your language, you take the
easiest way. You also preserve compatibility with English, so using
English on a Thai computer is not a problem. Using Greek on a Thai
computer probably is.

(Except for bidirectional scripts. I don't see any reasonable way to
handle that. But some UTF-8 supporters seem to have come to the
conclusion that the only feasible technical solution is to change the
script, so I am not alone.)

--
Fredrik Stax\"ang | rot13: sfgk@hcqngr.hh.fr
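The comparison suggested above (diffing two mapping files) can also be approximated inside Emacs by decoding each byte under two coding systems and noting where the results differ. A minimal sketch, assuming the hypothetical helper name `my-coding-diff` and restricting the comparison to bytes A0 through FF, the upper half where the ISO 8859 parts actually differ:

```elisp
;; Sketch only: `my-coding-diff' is a hypothetical helper.
(defun my-coding-diff (cs-a cs-b)
  "Return the byte values in A0..FF that CS-A and CS-B decode differently.
CS-A and CS-B are assumed to be single-byte coding systems."
  (let (diffs)
    (dolist (byte (number-sequence #xa0 #xff) (nreverse diffs))
      (let ((s (unibyte-string byte)))
        (unless (string= (decode-coding-string s cs-a)
                         (decode-coding-string s cs-b))
          (push byte diffs))))))

;; Latin-1 versus Latin-9: the two differ in eight positions.
(my-coding-diff 'iso-8859-1 'iso-8859-15)
;; => (164 166 168 180 184 188 189 190)
```

Those eight positions are where Latin-9 replaced rarely used Latin-1 symbols with the euro sign and a few accented letters, which is the kind of small, concrete difference the mapping files make visible.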
* Re: coding-system difficulties
From: Hugh Baker @ 2002-11-14 13:01 UTC

Cyprian Laskowski <swagbelly@yahoo.com> writes:

> But since Emacs clearly is able to render this stuff, why would I need
> extra fonts?

You might try installing the intl-fonts package. I found it handy when
I was doing some linguistics homework using the International Phonetic
Alphabet (IPA). You should be able to find it somewhere under:

  http://www.gnu.org/software/

You will need to set your input method as well as your coding system.

--
Hugh Baker <hugh.baker@toronto.edu>
University of Toronto Undergraduate
CSC148 Course Notes: <http://individual.utoronto.ca/hughbaker/csc148/>
public static int gcd(int a,int b){int r=a%b; return(r!=0)?gcd(b,r):b;}
~
~
~