Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
* Re: check for a particular header before calling a washing function?
From: Christian Nybø @ 2002-12-07 22:59 UTC


kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:

> chr@nybo.no (Christian Nybø) writes:
> 
> > kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:
> >
> >> chr@nybo.no (Christian Nybø) writes:
> >> 
> >> > If the header "Content-Type: text/plain; charset=utf-8" is present, I
> >> > want to call a washing function that translates from utf-8 to latin-1.
> >> > The function is ready, but where do I hook the test in?
> >> 
> >> Why do you want to do that?  Maybe there is another way to achieve
> >> what you want.
> >
> > Some pos(t)ers in the no.* hierarchy use UTF-8, even though ISO-8859-1
> > would do the job.  Untranslated UTF-8 is hard to read.  My setup does
> > not translate UTF-8.  I want to teach it to translate it to Latin-1.
> 
> I guess you want this for articles you write?

No.  I write my articles in ISO-8859-1.  

I want articles /encoded/ in UTF-8 to look nice, like this one, for example:
http://groups.google.com/groups?selm=lbzptse22mx.fsf%40aqualene.uio.no&oe=UTF-8&output=gplain

> Well, the UTF-8 article you forwarded was auto-converted by Gnus from
> UTF-8 to iso-8859-1, so it seems to work already :-)

I think you misunderstand me.  

> Oh, now I get it.  You can't see UTF-8 in your Emacs?  

Correct.  My Emacs is an XEmacs, btw.  I see two-character
combinations.  I've put together a few functions that convert from
UTF-8 to ISO-8859-1, but they're kinda broken: they assume that every
character fits in ISO-8859-1, in other words, that no character code
is above 255.  They'll do for now, though, since so far I have only
encountered UTF-8 articles with characters in the range 0 to 255.

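(defun utf-8-decode-region (start end)
  "Decode UTF-8 text between START and END into Latin-1.
Written for a non-Mule XEmacs, where each byte reads as one character.
Assumes every encoded character is in the range 0 to 255."
  (interactive "r")
  (let ((work-buffer (generate-new-buffer " *utf-8-work*")))
    (unwind-protect
        (save-excursion
          (buffer-disable-undo work-buffer)
          ;; Make END a marker so it stays valid while the region is rewritten.
          (or (markerp end) (setq end (set-marker (make-marker) end)))
          (goto-char start)
          ;; Stop at END, not at the end of the buffer.
          (while (< (point) end)
            (cond
             ;; High bit not set: plain ASCII, copy it unchanged.
             ((zerop (logand (following-char) #x80))
              (insert-char (following-char) 1 t work-buffer))
             ;; 110xxxxx 10yyyyyy: two-byte sequence, code point xxxxxyyyyyy.
             ((= (logand #xE0 (following-char)) #xC0)
              (insert-char (logior (lsh (logand (following-char) #b00011111) 6)
                                   (progn (forward-char)
                                          (logand (following-char) #b00111111)))
                           1 t work-buffer))
             ;; Otherwise (stray continuation byte, longer sequence): drop it.
             )
            ;; Advance past the byte just handled.
            (forward-char))
          ;; Replace the original region with the decoded text.
          (goto-char start)
          (insert-buffer-substring work-buffer)
          (delete-region (point) end))
      (kill-buffer work-buffer))))

(defun gnus-article-de-utf-8 ()
  "Convert the body of the current article from UTF-8 to Latin-1."
  (interactive)
  (save-excursion
    (set-buffer gnus-article-buffer)
    (let ((buffer-read-only nil))
      (widen)
      (goto-char (point-min))
      ;; Skip the headers: the body starts after the first empty line.
      (search-forward "\n\n" nil t)
      (utf-8-decode-region (point) (point-max)))))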
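For a concrete case of the two-byte branch, take ø (U+00F8, decimal 248), which UTF-8 encodes as the bytes C3 B8. A non-Mule XEmacs shows those two bytes as the Latin-1 characters Ã and ¸, which is the kind of two-character combination described above. The masks and the shift in `utf-8-decode-region` recover the original code point:

```latex
\[
\bigl((\mathtt{0xC3} \mathbin{\&} \mathtt{0x1F}) \ll 6\bigr) \mathbin{|} (\mathtt{0xB8} \mathbin{\&} \mathtt{0x3F})
= (3 \ll 6) \mathbin{|} 56
= 192 \mathbin{|} 56
= 248 = \mathtt{0xF8}
\]
```

Lead bytes C4 to DF also take the two-byte branch, but they yield code points from 256 to 2047, which have no Latin-1 equivalent; three- and four-byte sequences match neither branch and are dropped.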



> But this should work.  Type C-h h to view the HELLO file.  Does it
> say something about UTF-8 near the bottom?  What do you see?
> 

It starts like this:  

You need many fonts to read all.
Please correct this incomplete list and add more!

---------------------------------------------------------
Amharic	(^[$(3"c!<!N"^^[(B)	^[$(3!A!,!>^[(B
Arabic			^[[2]^[(38R^[(47d^[(3T!JSa^[(4W^[(3W^[[0]^[(B
Croatian (Hrvatski)	Bog (Bok), Dobar dan
Czech (^[.B^[Nhesky)		Dobr^[N} den

and it does not mention UTF-8 anywhere.  The file is
"/usr/local/src/xemacs-21.2.36/etc/HELLO"

> Oh, in Emacs 21.2, the HELLO file does not contain UTF-8.  Hm.
> 
> Search the web for the file UTF-8-demo.txt, download it, then use C-x
> RET c utf-8 RET C-x C-f to open the file in Emacs.  What do you see?

C-x RET is undefined.  I run XEmacs, and it's probably
compiled without support for setting a coding system.

Opening it as a plain file:

UTF-8 encoded sample plain-text file

‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

Markus Kuhn [ˈmaʳkʊs kuːn] <mkuhn@acm.org> — 2002-07-25


-- 
chr



* Re: check for a particular header before calling a washing function?
From: Christian Nybø @ 2002-12-09 22:54 UTC


kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:

> chr@nybo.no (Christian Nybø) writes:
> 
> > Correct.  My Emacs is an XEmacs, btw.  I see two-character
> > combinations.
> 
> Aha.  Okay, you have a non-mule XEmacs.  The first step to fix this
> is to install a Mule XEmacs.
> 
> Then, get the Mule-UCS package and the Latin-Unity package and enable
> both.
> 
> I think that should solve your problems.
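Kai's suggestion, sketched as init-file code. This assumes an XEmacs built with Mule and with the mule-ucs and latin-unity packages installed; the entry points used here (`un-define` for Mule-UCS, `latin-unity-install` for Latin-Unity) should be checked against each package's own documentation.

```elisp
;; Only meaningful in an XEmacs built with Mule support.
(when (featurep 'mule)
  ;; Mule-UCS: provides Unicode coding systems, including UTF-8.
  ;; Assumption: `un-define' is the library that sets these up.
  (require 'un-define)
  ;; Latin-Unity: lets text mixing several Latin-N character sets be
  ;; encoded in a single ISO-8859 coding system.
  (require 'latin-unity)
  (latin-unity-install))
```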

I think that it's overkill for my needs.  I added this to my .gnus:


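(defun utf-8-decode-region (start end)
  "Decode UTF-8 text between START and END into Latin-1.
Written for a non-Mule XEmacs, where each byte reads as one character.
Assumes every encoded character is in the range 0 to 255."
  (interactive "r")
  (let ((work-buffer (generate-new-buffer " *utf-8-work*")))
    (unwind-protect
        (save-excursion
          (buffer-disable-undo work-buffer)
          ;; Make END a marker so it stays valid while the region is rewritten.
          (or (markerp end) (setq end (set-marker (make-marker) end)))
          (goto-char start)
          ;; Stop at END, not at the end of the buffer.
          (while (< (point) end)
            (cond
             ;; High bit not set: plain ASCII, copy it unchanged.
             ((zerop (logand (following-char) #x80))
              (insert-char (following-char) 1 t work-buffer))
             ;; 110xxxxx 10yyyyyy: two-byte sequence, code point xxxxxyyyyyy.
             ((= (logand #xE0 (following-char)) #xC0)
              (insert-char (logior (lsh (logand (following-char) #b00011111) 6)
                                   (progn (forward-char)
                                          (logand (following-char) #b00111111)))
                           1 t work-buffer))
             ;; Otherwise (stray continuation byte, longer sequence): drop it.
             )
            ;; Advance past the byte just handled.
            (forward-char))
          ;; Replace the original region with the decoded text.
          (goto-char start)
          (insert-buffer-substring work-buffer)
          (delete-region (point) end))
      (kill-buffer work-buffer))))

(defun gnus-article-de-utf-8 ()
  "Convert the body of the current article from UTF-8 to Latin-1."
  (interactive)
  (save-excursion
    (set-buffer gnus-article-buffer)
    (let ((buffer-read-only nil))
      (widen)
      (goto-char (point-min))
      ;; Skip the headers: the body starts after the first empty line.
      (search-forward "\n\n" nil t)
      (utf-8-decode-region (point) (point-max)))))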

(define-key gnus-summary-mode-map "Wu" 'gnus-article-de-utf-8)

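(defun gnus-article-maybe-de-utf-8 ()
  "Call `gnus-article-de-utf-8' if the article's charset is UTF-8."
  (let* ((ct (message-fetch-field "Content-Type" t))
         (ctlist (and ct (mail-header-parse-content-type ct)))
         ;; CTLIST looks like ("text/plain" (charset . "utf-8") ...).
         (charset (cdr (assoc 'charset (cdr ctlist)))))
    ;; Charset names are case-insensitive, so accept "UTF-8" as well.
    (when (and charset (string= (downcase charset) "utf-8"))
      (gnus-article-de-utf-8))))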
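The code above does not show where `gnus-article-maybe-de-utf-8` gets called, which was the original question. A minimal sketch, assuming `gnus-article-prepare-hook` runs after the article has been inserted and with the article buffer current, so that `message-fetch-field` sees the article's own headers (worth verifying against the Gnus version in use):

```elisp
;; Assumption: this hook runs in the article buffer once the article
;; has been prepared, so the Content-Type header is readable there.
(add-hook 'gnus-article-prepare-hook 'gnus-article-maybe-de-utf-8)
```

Note that pressing `W u` on an article the hook has already converted would decode it a second time.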

Thus articles encoded in UTF-8 that contain only characters in the
range 0 to 255 are displayed nicely.

Thanks for your time.
-- 
chr



* Re: check for a particular header before calling a washing function?
From: Bijan Soleymani @ 2002-12-07  8:01 UTC


chr@nybo.no (Christian Nybø) writes:

> If the header "Content-Type: text/plain; charset=utf-8" is present, I
> want to call a washing function that translates from utf-8 to latin-1.
> The function is ready, but where do I hook the test in?
> -- 
> chr

The Gnus Info manual describes how to access header information in
its "Headers" section.  According to the documentation, there is an
accessor function for each field.

However, only 9 fields are defined, the last one being `extra'.  My
guess is that the commonly used information is split off into the
first 8 slots and everything else goes into `extra'.  The header you
are looking for would probably end up there.
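To illustrate the `extra` slot described above, a sketch under two assumptions: Gnus only stores headers in `extra` when they are listed in `gnus-extra-headers`, and only when the backend's overview (NOV) data actually carries them, which many news servers' overview data will not do for Content-Type. The helper name `my-summary-content-type` is hypothetical.

```elisp
;; Assumption: the group's overview data includes Content-Type.
;; Keep any other headers you already list here.
(setq gnus-extra-headers '(To Newsgroups Content-Type))

;; Hypothetical helper: return the Content-Type stored in the `extra'
;; slot of the current summary line's header, or nil if it is absent.
(defun my-summary-content-type ()
  (cdr (assq 'Content-Type
             (mail-header-extra (gnus-summary-article-header)))))
```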

Bijan

