Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
* HTML in mail
@ 2002-08-28 16:26 Ichimusai
  2002-08-28 16:29 ` Ichimusai
  2002-08-28 17:00 ` Vasily Korytov
  0 siblings, 2 replies; 8+ messages in thread
From: Ichimusai @ 2002-08-28 16:26 UTC (permalink / raw)



At the company where I work, several people send mail in RTF[1]. The
mailserver, however, converts it to HTML before it is dumped in my
mailbox, and I installed the w3 package to cope with that.

It works well with mail from clients that produce proper HTML, for
example Eudora; the w3 package renders it well and it works
satisfactorily.

But the RTF mail from Outlook that has been converted to HTML on the
mailserver does not render in w3. I have not investigated it further,
but it says "Drawing../" etc. in the status line for a second, and then
it says that it gives up and shows the message as plain text, HTML
tags and all. There is something in the "HTML" that Exchange produces
that makes w3 give up on it, and when I press W h to wash the mail, all
the text disappears and all I get is a blank message with only the
important headers showing.

For fun I saved a mail, stripped everything but the HTML and ran it
through validator.w3.org and it came up with 53 errors (when I
selected DTD manually) and this was a small mail with perhaps 10 lines
of text.

To the question: I know there are many very skilled people in here
who know Emacs and Gnus much better than I do. What I would like is a
function that deletes everything between < and > so that I can at least
get rid of the ugly HTML tags when I read or reply to a mail. I am sure
this is easily done in a couple of lines of elisp, but unfortunately I
am not skilled enough to write it myself. I would be very happy if
someone would lend me a couple of minutes of his or her time and help me
with this, and with how to hook that function into the wash keymap of
Gnus so that I can easily wash away the non-standard "HTML" of the
Exchange server.

[1] Rich Text Format (yes they use MS Outlook).

-- 
  // AA#769 ICQ: 1645566 http://www.ichimusai.org/
\X/  ASCII ribbon campaign - No HTML, RTF or MS Word in mail
Fools seldom defer
    -- Guy King, uk.rec.sheds



* Re: HTML in mail
  2002-08-28 16:26 HTML in mail Ichimusai
@ 2002-08-28 16:29 ` Ichimusai
  2002-08-28 17:00 ` Vasily Korytov
  1 sibling, 0 replies; 8+ messages in thread
From: Ichimusai @ 2002-08-28 16:29 UTC (permalink / raw)


Ichimusai <ichi@ichimusai.org> writes:

> At the company where I work, several people send mail in RTF[1]. The
> mailserver, however, converts it to HTML before it is dumped in my
> mailbox, and I installed the w3 package to cope with that.

Sorry for answering myself. I just remembered that I forgot to post
the version of Emacs and Gnus that I am using which might be crucial
:)

 * GNU Emacs 21.2.1 (i386-redhat-linux-gnu, X toolkit, Xaw3d scroll
   bars)

 * Gnus v5.9.0

-- 
  // AA#769 ICQ: 1645566 http://www.ichimusai.org/
\X/  ASCII ribbon campaign - No HTML, RTF or MS Word in mail
Cat rule #1:
  "When in doubht, wash!"



* Re: HTML in mail
  2002-08-28 16:26 HTML in mail Ichimusai
  2002-08-28 16:29 ` Ichimusai
@ 2002-08-28 17:00 ` Vasily Korytov
       [not found]   ` <m3it1ueqkj.fsf_-_@ichimusai.org>
  1 sibling, 1 reply; 8+ messages in thread
From: Vasily Korytov @ 2002-08-28 17:00 UTC (permalink / raw)


>>>>> "i" == ichi  writes:

 i> To the question: I know there are many very skilled people in here
 i> who know Emacs and Gnus much better than I do. What I would like is a
 i> function that deletes everything between < and > so that I can at least
 i> get rid of the ugly HTML tags when I read or reply to a mail. I am sure
 i> this is easily done in a couple of lines of elisp, but unfortunately I
 i> am not skilled enough to write it myself. I would be very happy if
 i> someone would lend me a couple of minutes of his or her time and help me
 i> with this, and with how to hook that function into the wash keymap of
 i> Gnus so that I can easily wash away the non-standard "HTML" of the
 i> Exchange server.

Well, you can use another HTML renderer (though I'm not sure whether this
possibility appeared in Oort, the current alpha of Gnus, or earlier; for
example, I use lynx). There is also html2text.el in Oort, which may be
the filter you want.

-- 
                     With respect, Vasily Korytov

PGP key fingerprint: A4FE 4665 A720 687F 4ECC 1474 7C16 C498 BAAB C999



* Re: HTML in mail (or perhaps replace-regexp might work?)
       [not found]     ` <871y8ig4cy.fsf@unix.home>
@ 2002-08-28 19:54       ` Vasily Korytov
  2002-08-28 20:15       ` Ichimusai
       [not found]       ` <5llm6qwvio.fsf@rum.cs.yale.edu>
  2 siblings, 0 replies; 8+ messages in thread
From: Vasily Korytov @ 2002-08-28 19:54 UTC (permalink / raw)


 VK> Besides, I don't recommend binding it globally, esp. to C-backspace.

Sorry, I mean M-backspace.

-- 
                     With respect, Vasily Korytov

PGP key fingerprint: A4FE 4665 A720 687F 4ECC 1474 7C16 C498 BAAB C999



* Re: HTML in mail (or perhaps replace-regexp might work?)
       [not found]     ` <871y8ig4cy.fsf@unix.home>
  2002-08-28 19:54       ` HTML in mail (or perhaps replace-regexp might work?) Vasily Korytov
@ 2002-08-28 20:15       ` Ichimusai
  2002-08-28 20:31         ` Ichimusai
       [not found]       ` <5llm6qwvio.fsf@rum.cs.yale.edu>
  2 siblings, 1 reply; 8+ messages in thread
From: Ichimusai @ 2002-08-28 20:15 UTC (permalink / raw)


Vasily Korytov <moderator@faqteam.org> writes:

> Try this function:
> 
> (defun vk-remove-tags ()
>   "Remove everything, that fits the  regexp in current buffer."
>   (interactive)
>   (save-excursion
>     (beginning-of-buffer)
>     (while (re-search-forward "" nil t)
>       (replace-match "" nil nil))))
> 
> Besides, I don't recommend binding it globally, esp. to C-backspace.

Brilliant!

I was close to the solution myself, I had it this way:

(defun wash-ugly-html ()
    (while (re-search-forward "\\(<.*>\\)\\|\\(&.*;\\)" nil t)
      (replace-match "" nil nil)))

(global-set-key "\C-\M-H" 'wash-ugly-html)

But of course that did not work very well. I replaced the regexp in your
function and it works pretty well as far as I have tested it, with one
exception: is it possible to position point just after the headers, and
not at the top of the buffer, before replacing?

Otherwise the References: line is caught by the <.*> part of the
regexp.

Oh, and one more thing, if you can help me: a message buffer is
usually read-only. Is it possible to have this operate on it anyway,
so that I don't have to reply to the message and then wash it?
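
A minimal sketch of both requests (start after the headers, and work in
a read-only buffer). It is not the function posted in this thread: the
name my-strip-tags-in-body is made up for illustration, the regexp is
the "<[^>\n]*>" form suggested later in the thread, and the headers are
assumed to end at the first blank line.

```elisp
;; Hypothetical helper, not part of Gnus: strip tag-like text from the
;; body of the current buffer and leave the headers untouched.
(defun my-strip-tags-in-body ()
  "Delete HTML-like tags after the first blank line of the current buffer."
  (interactive)
  (save-excursion
    ;; Binding `inhibit-read-only' allows edits in a read-only buffer
    ;; for the duration of this `let' only.
    (let ((inhibit-read-only t))
      (goto-char (point-min))
      ;; Assumption: the headers end at the first blank line.  If there
      ;; is none, point stays at the top and the whole buffer is washed.
      (search-forward "\n\n" nil t)
      (while (re-search-forward "<[^>\n]*>" nil t)
        (replace-match "" t t)))))

;; Assumption: `gnus-summary-wash-map' is the keymap behind the W prefix
;; and `gnus-article-buffer' names the article buffer.  The key "z" is an
;; arbitrary choice; check that it is unused before enabling this.
;; (define-key gnus-summary-wash-map "z"
;;   (lambda () (interactive)
;;     (with-current-buffer gnus-article-buffer
;;       (my-strip-tags-in-body))))
```

Because the search starts after the first blank line, the angle brackets
in References: and Message-ID: headers are never touched, whatever
regexp is used for the body.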

-- 
  // AA#769 ICQ: 1645566 http://www.ichimusai.org/
\X/  ASCII ribbon campaign - No HTML, RTF or MS Word in mail
6 curses equals 1 hexahex.



* Re: HTML in mail (or perhaps replace-regexp might work?)
  2002-08-28 20:15       ` Ichimusai
@ 2002-08-28 20:31         ` Ichimusai
  0 siblings, 0 replies; 8+ messages in thread
From: Ichimusai @ 2002-08-28 20:31 UTC (permalink / raw)


Ichimusai <ichi@ichimusai.org> writes:

> Vasily Korytov <moderator@faqteam.org> writes:
> 
> > Try this function:
> > 
> > (defun vk-remove-tags ()
> >   "Remove everything, that fits the  regexp in current buffer."
> >   (interactive)
> >   (save-excursion
> >     (beginning-of-buffer)
> >     (while (re-search-forward "" nil t)
> >       (replace-match "" nil nil))))
> > 
> > Besides, I don't recommend binding it globally, esp. to C-backspace.
> 
> Brilliant!
> 
> I was close to the solution myself, I had it this way:
> 
> (defun wash-ugly-html ()
>     (while (re-search-forward "\\(<.*>\\)\\|\\(&.*;\\)" nil t)
>       (replace-match "" nil nil)))
> 
> (global-set-key "\C-\M-H" 'wash-ugly-html)
> 
> But of course that did not work very well. I replaced the regexp in your
> function and it works pretty well as far as I have tested it, with one
> exception: is it possible to position point just after the headers, and
> not at the top of the buffer, before replacing?
> 
> Otherwise the References: line is caught by the <.*> part of the
> regexp.

I solved the References: line problem by narrowing my regexp a bit; it
now looks like "\\(<[^<>.@]*>\\)\\|\\(&.*;\\)" instead.

Thus, in something like

<html>
> You said that you are using HTML?<BR>
</html>

the appropriate tags are removed but the ">" quote mark is left alone.
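
A quick way to see what the narrowed regexp does and does not catch,
sketched on plain strings. The helper my-wash-string and the
single-entity regexp are assumptions added for illustration; only
my-narrowed-regexp is taken from the message above.

```elisp
(defconst my-narrowed-regexp "\\(<[^<>.@]*>\\)\\|\\(&.*;\\)"
  "The narrowed regexp quoted in the message above.")

(defconst my-entity-regexp "&#?[A-Za-z0-9]+;"
  "Assumed alternative: match one entity at a time, e.g. &amp; or &#160;.")

(defun my-wash-string (regexp string)
  "Return STRING with every match of REGEXP deleted."
  (replace-regexp-in-string regexp "" string t t))

;; Tags containing "." or "@" are skipped, so most links survive.
(my-wash-string my-narrowed-regexp "<b>Hi</b> <a href=\"x.html\">link</a>")
;; => "Hi <a href=\"x.html\">link"

;; "&.*;" is greedy: it runs from the first "&" to the last ";" on the line.
(my-wash-string my-narrowed-regexp "Tom &amp; Jerry; friends")
;; => "Tom  friends"

;; Matching a single entity leaves the rest of the line alone.
(my-wash-string my-entity-regexp "Tom &amp; Jerry; friends")
;; => "Tom  Jerry; friends"
```

Excluding "." and "@" protects message IDs in the headers, at the cost
of leaving any tag with a URL or file name in it.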

Sorry guys if I am talking to myself, I'm rather excited though.

-- 
  // AA#769 ICQ: 1645566 http://www.ichimusai.org/
\X/  ASCII ribbon campaign - No HTML, RTF or MS Word in mail
Noone is perfect, but some parts of you are rather exquisite!
    -- Ichimusai (7 beers later 3 seconds before the slap)



* Re: HTML in mail (or perhaps replace-regexp might work?)
       [not found]         ` <wupr9gda.fsf@ID-23066.news.dfncis.de>
@ 2002-09-13 15:26           ` Stefan Monnier <foo@acm.com>
  2002-09-14 15:15             ` Clemens Fischer
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Monnier <foo@acm.com> @ 2002-09-13 15:26 UTC (permalink / raw)


>>>>> "Clemens" == Clemens Fischer <ino@despammed.com> writes:
> i thought you could make REs non-greedy by appending a question mark
> to any closure, like "<.*?>" versus "<.*>"?  or is this not so?

Indeed, but "<[^>\n]*>" is better/faster and doesn't care about greed.
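
For reference, the three forms discussed here can be compared on one
line of text. The helper name my-first-tag-match is made up for this
sketch; the regexps are the ones quoted above.

```elisp
(defun my-first-tag-match (regexp line)
  "Return the first substring of LINE matching REGEXP, or nil."
  (when (string-match regexp line)
    (match-string 0 line)))

(let ((line "<b>bold</b> and <i>italic</i>"))
  (list (my-first-tag-match "<.*>" line)        ; greedy
        (my-first-tag-match "<.*?>" line)       ; non-greedy
        (my-first-tag-match "<[^>\n]*>" line))) ; negated character class
;; => ("<b>bold</b> and <i>italic</i>" "<b>" "<b>")
```

The greedy form swallows everything up to the last ">" on the line,
while the other two stop at the first one.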


        Stefan



* Re: HTML in mail (or perhaps replace-regexp might work?)
  2002-09-13 15:26           ` Stefan Monnier <foo@acm.com>
@ 2002-09-14 15:15             ` Clemens Fischer
  0 siblings, 0 replies; 8+ messages in thread
From: Clemens Fischer @ 2002-09-14 15:15 UTC (permalink / raw)


"Stefan Monnier" <monnier+gnu.emacs.gnus/news/@flint.cs.yale.edu> writes:

>> to any closure, like "<.*?>" versus "<.*>"?  or is this not so?
>
> Indeed, but "<[^>\n]*>" is better/faster and doesn't care about greed.

how can you say that?  did you profile this?  closures after character
classes are inherently slow with the implementations _i_ know.
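
One way to settle this is to measure it in the Emacs at hand. The sketch
below uses benchmark-run from benchmark.el; the helper name
my-time-regexp, the sample text and the repetition count are arbitrary
assumptions, and no timings are claimed here.

```elisp
(require 'benchmark)

(defun my-time-regexp (regexp text)
  "Return seconds spent finding every REGEXP match in TEXT, 20 times over."
  (car (benchmark-run 20
         (let ((start 0))
           (while (string-match regexp text start)
             ;; Continue after the match; these regexps never match
             ;; the empty string, so the loop always advances.
             (setq start (match-end 0)))))))

;; Assumed sample input: 2000 short lines of tagged text.
(let ((text (apply #'concat
                   (make-list 2000 "<p class=\"x\">some text</p>\n"))))
  (list (my-time-regexp "<.*?>" text)
        (my-time-regexp "<[^>\n]*>" text)))
```

The result is a list of two elapsed times in seconds, non-greedy form
first. Results will vary with the input and the Emacs version.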

-- 
clemens

