From: Lars Magne Ingebrigtsen
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Announce: nnwarchive
Date: 10 Nov 1999 16:32:19 +0100
Organization: Programmerer Ingebrigtsen
References: <5biu3bd2dh.fsf@giga.cs.rochester.edu>
User-Agent: Gnus/5.070099 (Pterodactyl Gnus v0.99) XEmacs/21.2 (Sumida)

> They all operate basically the same way: download the page, extract
> the information, return it formatted.  They face the same challenges:
> conveniently and securely providing updates to users relatively often,
> as web sites undergo facelifts.

The layout parsing is one thing, but w3 can deal with that better than
the current regexp-based things in nnweb, I think.

> So I am wondering if there is scope for a generic "wash.el" which
> would take as an argument an URL (which is dynamically generated in
> certain cases), a parser function which returns matches, operating on
> the raw HTML, and provides automatic or semi-automatic update services
> which connect to some trusted web site where washing-authors can put
> updates.

It's an interesting idea.  It doesn't even have to be code on the
trusted web site -- it could just be something that would say how the
w3 parse tree should be interpreted.  Like -- "this node in the tree
is the author name, and this node is the body of the message".

It sounds like an interesting project.  "How to extract information
from HTML pages."  But it'll be difficult.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@gnus.org * Lars Magne Ingebrigtsen
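
The suggestion above, that an update could be a description of how to
read the parse tree rather than new code, might look roughly like the
sketch below.  It is only an illustration: it is written in Python and
uses the standard html.parser module as a stand-in for the w3 parse
tree, and every name in it (Node, TreeBuilder, follow, wash, SPEC, PAGE)
is hypothetical.  It also assumes well-nested markup with no unclosed
elements such as a bare <br>.

```python
from html.parser import HTMLParser


class Node:
    """One element of a very small parse tree (hypothetical helper)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # child Nodes and text strings, in document order

    def all_text(self):
        """Join the text of this node and all of its descendants."""
        return "".join(
            c.all_text() if isinstance(c, Node) else c for c in self.children
        )


class TreeBuilder(HTMLParser):
    """Build a Node tree; assumes every opened element is closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.children.append(data)


def follow(node, path):
    """Walk a path of (tag, index) steps down from node."""
    for tag, index in path:
        matches = [
            c for c in node.children if isinstance(c, Node) and c.tag == tag
        ]
        node = matches[index]
    return node


def wash(html, spec):
    """Apply a declarative spec (field name -> path) to a page."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return {
        field: follow(builder.root, path).all_text().strip()
        for field, path in spec.items()
    }


# The "update" a trusted site would publish is just this data, not code.
SPEC = {
    "author": [("html", 0), ("body", 0), ("h2", 0)],
    "body": [("html", 0), ("body", 0), ("div", 1)],
}

PAGE = """<html><body>
<h2>Jane Doe</h2>
<div>navigation</div>
<div>Hello, list.</div>
</body></html>"""

print(wash(PAGE, SPEC))  # {'author': 'Jane Doe', 'body': 'Hello, list.'}
```

When a site undergoes a facelift, only SPEC would need replacing in this
scheme; the tree-walking code stays the same.  Fixed positional paths
like these are brittle, which is part of why the general problem is
hard.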