From: Lars Magne Ingebrigtsen <larsi@gnus.org>
Subject: Re: Announce: nnwarchive
Date: 10 Nov 1999 16:32:19 +0100
Message-ID: <m3aeom7470.fsf@quimbies.gnus.org>
In-Reply-To: <wziiu3bk0x7.fsf@mail.dotcom.fr>
Eric Marsden <emarsden@mail.dotcom.fr> writes:
> They all operate basically the same way: download the page, extract
> the information, return it formatted. They face the same challenges:
> conveniently and securely providing updates to users relatively often,
> as web sites undergo facelifts.
The layout parsing is one thing, but w3 can deal with that better than
the current regexp-based things in nnweb, I think.
> So I am wondering if there is scope for a generic "wash.el" which
> would take as an argument an URL (which is dynamically generated in
> certain cases), a parser function which returns matches, operating on
> the raw HTML, and provides automatic or semi-automatic update services
> which connect to some trusted web site where washing-authors can put
> updates.
It's an interesting idea. It doesn't even have to be code on the
trusted web site -- it could just be something that would say how the
w3 parse tree should be interpreted. Like -- "this node in the tree
is the author name, and this node is the body of the message".
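
To make the idea concrete, here is a minimal sketch in Python rather than Emacs Lisp and w3. Everything in it is an assumption made for illustration: the `Node` and `TreeBuilder` classes, the `ARTICLE_SPEC` table, the path format, and the sample page are all hypothetical, and the tree builder assumes well-formed, properly nested HTML with no void elements. The point is only that the trusted site could ship a small table of data saying which node means what, rather than code.

```python
from html.parser import HTMLParser


class Node:
    """One element of a very small parse tree (hypothetical structure)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # Node objects and str fragments, in document order


class TreeBuilder(HTMLParser):
    """Builds the tree; assumes every start tag has a matching end tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def select(root, path):
    """Follow (tag, index) steps; index counts among children with that tag."""
    node = root
    for tag, index in path:
        matches = [c for c in node.children
                   if isinstance(c, Node) and c.tag == tag]
        node = matches[index]
    return node


def text_of(node):
    """Concatenate all text below a node, in document order."""
    return "".join(c if isinstance(c, str) else text_of(c)
                   for c in node.children).strip()


# The declarative part: data, not code, so it could be fetched as an update.
ARTICLE_SPEC = {
    "author": [("html", 0), ("body", 0), ("h2", 0)],
    "body": [("html", 0), ("body", 0), ("pre", 0)],
}


def extract(html, spec):
    root = parse(html)
    return {field: text_of(select(root, path)) for field, path in spec.items()}


# Made-up sample page, standing in for an archive article.
PAGE = ("<html><body><h1>Archive</h1><h2>A. Author</h2>"
        "<pre>Hello, world.</pre></body></html>")

print(extract(PAGE, ARTICLE_SPEC))
# {'author': 'A. Author', 'body': 'Hello, world.'}
```

When a site undergoes a facelift, only the paths in the spec would need updating. Positional paths like these are fragile, though, which is part of why a robust version is hard.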
It sounds like an interesting project. "How to extract information
from HTML pages." But it'll be difficult.
--
(domestic pets only, the antidote for overdose, milk.)
larsi@gnus.org * Lars Magne Ingebrigtsen