Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
From: chr@nybo.no (Christian Nybø)
Subject: Re: check for a particular header before calling a washing function?
Date: 07 Dec 2002 23:59:18 +0100
Message-ID: <87ptsdmo55.fsf@nybo.no>
In-Reply-To: <847kelpkhj.fsf@lucy.cs.uni-dortmund.de>

kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:

> chr@nybo.no (Christian Nybø) writes:
> 
> > kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:
> >
> >> chr@nybo.no (Christian Nybø) writes:
> >> 
> >> > If the header "Content-Type: text/plain; charset=utf-8" is present, I
> >> > want to call a washing function that translates from utf-8 to latin-1.
> >> > The function is ready, but where do I hook the test in?
> >> 
> >> Why do you want to do that?  Maybe there is another way to achieve
> >> what you want.
> >
> > Some pos(t)ers in the no.* hierarchy use UTF-8, even though ISO-8859-1
> > would do the job.  Untranslated UTF-8 is hard to read.  My setup does
> > not translate UTF-8.  I want to teach it to translate it to Latin-1.
> 
> I guess you want this for articles you write?

No.  I write my articles in ISO-8859-1.  

I want articles /encoded/ in UTF-8 to look nice, like for example
http://groups.google.com/groups?selm=lbzptse22mx.fsf%40aqualene.uio.no&oe=UTF-8&output=gplain

> Well, the UTF-8 article you forwarded was auto-converted by Gnus from
> UTF-8 to iso-8859-1, so it seems to work already :-)

I think you misunderstand me.  

> Oh, now I get it.  You can't see UTF-8 in your Emacs?  

Correct.  My Emacs is an XEmacs, btw.  I see two-character
combinations.  I've put together a few functions that convert from
UTF-8 to ISO-8859-1, but they're kinda broken as they assume that all
the characters will fit in ISO-8859-1, in other words, no character
codes above 255.  But they'll do, as I so far have only encountered
UTF-8-articles with characters in the range 0 to 255.
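
To make the two-byte case concrete, here is a minimal sketch of the
arithmetic involved.  The helper name `my-utf-8-decode-pair' is
hypothetical, made up for this illustration.  It assumes a lead byte of
the form 110xxxxx followed by exactly one continuation byte 10yyyyyy,
and does no validation.

```elisp
;; Hypothetical helper, for illustration only: combine a two-byte UTF-8
;; sequence (110xxxxx 10yyyyyy) into the code point xxxxxyyyyyy.
;; `ash' is used for the shift; for non-negative values it behaves
;; like `lsh'.
(defun my-utf-8-decode-pair (lead cont)
  "Return the code point encoded by the bytes LEAD and CONT."
  (logior (ash (logand lead #x1F) 6)   ; low 5 bits of the lead byte
          (logand cont #x3F)))         ; low 6 bits of the continuation

;; "ø" travels as the bytes #xC3 #xB8 in UTF-8:
(my-utf-8-decode-pair #xC3 #xB8)       ; => 248, i.e. #xF8, "ø" in Latin-1
```

Any result below 256 maps directly onto ISO-8859-1, which is why the
0-to-255 restriction above is enough for Norwegian text.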

(defun utf-8-decode-region (start end)
  "Decode UTF-8 between START and END, assuming code points below 256.
Sequences of three or more bytes are silently dropped."
  (interactive "r")
  (let ((work-buffer (generate-new-buffer " *utf-8-work*")))
    (unwind-protect
        (save-excursion
          (buffer-disable-undo work-buffer)
          ;; Use a marker so END stays valid when text is inserted below.
          (or (markerp end) (setq end (set-marker (make-marker) end)))
          (goto-char start)
          ;; Stop at END rather than at the end of the buffer.
          (while (< (point) end)
            (cond
             ((zerop (logand (following-char) #x80)) ; high bit is not set
              (insert-char (following-char) 1 t work-buffer))
             ((= (logand #xE0 (following-char)) #xC0) ; two-byte lead, 110xxxxx
              (insert-char (logior (lsh (logand (following-char) #b00011111) 6)
                                   (progn (forward-char)
                                          (logand (following-char) #b00111111)))
                           1 t work-buffer)))
            (forward-char))
          (goto-char start)
          (insert-buffer-substring work-buffer)
          (delete-region (point) end))
      (and work-buffer (kill-buffer work-buffer)))))

(defun gnus-article-de-utf-8 ()
  "Convert utf-8 to latin-1"
  (interactive)
  (save-excursion
    (set-buffer gnus-article-buffer)
    (let ((buffer-read-only nil))
      (widen)
      (goto-char (point-min))
      (search-forward "\n\n" nil t)
      (utf-8-decode-region (point) (point-max)))))
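
The header test asked about at the start of the thread could be
sketched along these lines.  Both function names are hypothetical and
not part of Gnus.  The predicate only looks at the header block (up to
the first blank line) of the current buffer.  Where the wrapper should
be called from is left open: a hook such as
`gnus-article-prepare-hook' is one candidate, but that is an untested
assumption, and whether the Content-Type header is still in the article
buffer at that point depends on the Gnus setup.

```elisp
;; Hypothetical helper, not part of Gnus: return non-nil when the
;; header block of the current buffer declares charset=utf-8.
(defun my-article-utf-8-p ()
  "Non-nil if the headers in the current buffer declare UTF-8 content."
  (save-excursion
    (goto-char (point-min))
    (let ((case-fold-search t)
          ;; Headers end at the first blank line, or at end of buffer.
          (header-end (if (search-forward "\n\n" nil t)
                          (point)
                        (point-max))))
      (goto-char (point-min))
      ;; Allow folded headers: a newline followed by a space or tab
      ;; continues the Content-Type field.
      (and (re-search-forward
            "^Content-Type:\\([^\n]\\|\n[ \t]\\)*charset=\"?utf-8"
            header-end t)
           t))))

;; Hypothetical wrapper: wash only when the header is present.
(defun my-gnus-article-maybe-de-utf-8 ()
  "Call `gnus-article-de-utf-8' only for articles declared as UTF-8."
  (when (with-current-buffer gnus-article-buffer
          (my-article-utf-8-p))
    (gnus-article-de-utf-8)))
```

The predicate is kept separate from the wrapper so that it can be tried
by hand in any buffer holding a raw article.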



> But this should work.  Type C-h h to view the HELLO file.  Does it
> say something about UTF-8 near the bottom?  What do you see?
> 

It starts like this:  

You need many fonts to read all.
Please correct this incomplete list and add more!

---------------------------------------------------------
Amharic	(^[$(3"c!<!N"^^[(B)	^[$(3!A!,!>^[(B
Arabic			^[[2]^[(38R^[(47d^[(3T!JSa^[(4W^[(3W^[[0]^[(B
Croatian (Hrvatski)	Bog (Bok), Dobar dan
Czech (^[.B^[Nhesky)		Dobr^[N} den

and does not mention anything about UTF-8.  File is
"/usr/local/src/xemacs-21.2.36/etc/HELLO"

> Oh, in Emacs 21.2, the HELLO file does not contain UTF-8.  Hm.
> 
> Search the web for the file UTF-8-demo.txt, download it, then use C-x
> RET c utf-8 RET C-x C-f to open the file in Emacs.  What do you see?

C-x RET is undefined.  I run XEmacs, and it's probably
compiled without support for setting a coding system.

Opening it as a plain file:

UTF-8 encoded sample plain-text file

‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

Markus Kuhn [ˈmaʳkʊs kuːn] <mkuhn@acm.org> — 2002-07-25


-- 
chr


