From: Eric Marsden
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Announce: nnwarchive
Date: 10 Nov 1999 17:12:06 +0100
Organization: LAAS-CNRS http://www.laas.fr/
Original-To: ding@gnus.org
References: <5biu3bd2dh.fsf@giga.cs.rochester.edu>
In-Reply-To: Lars Magne Ingebrigtsen's message of "10 Nov 1999 16:32:19 +0100"
X-Mailer: Gnus v5.7/Emacs 20.4

>>>>> "lmi" == Lars Magne Ingebrigtsen writes:

lmi> It's an interesting idea. It doesn't even have to be code on the
lmi> trusted web site -- it could just be something that would say how the
lmi> w3 parse tree should be interpreted. Like -- "this node in the tree
lmi> is the author name, and this node is the body of the message".
lmi>
lmi> It sounds like an interesting project. "How to extract information
lmi> from HTML pages." But it'll be difficult.

hmm, that's not where I was headed, but it's interesting. Current
research in the area includes WebL [1], which talks about a "markup
algebra [which] extracts structured and unstructured values from pages
for computation, and is based on algebraic operations on sets of
markup elements", and sgrep [2], a tool for searching structured text,
which "implements an algebra of unrestricted text fragments called
regions. The algebra allows the retrieval of document components,
represented as regions, based on conditions on their relative
containment and ordering."

[1]
[2]

-- 
Eric Marsden
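
To make the "regions" idea a bit more concrete, here is a toy sketch in Python of selecting text fragments by containment. It is not sgrep's query syntax, nor WebL's, and it is not how either tool is implemented; the names `Region`, `regions` and `contained_in`, and the sample page, are all made up for illustration. It only shows the general shape of "give me the bold text that sits inside a message block".

```python
import re
from typing import List, NamedTuple


class Region(NamedTuple):
    start: int  # offset of the first character of the fragment
    end: int    # offset one past the last character


def regions(text: str, open_tag: str, close_tag: str) -> List[Region]:
    """Return the spans between each open_tag and the next close_tag.

    Hypothetical helper: a plain non-greedy regex match, so nested
    occurrences of the same tag are not handled.
    """
    pattern = re.compile(
        re.escape(open_tag) + r"(.*?)" + re.escape(close_tag), re.DOTALL
    )
    return [Region(m.start(1), m.end(1)) for m in pattern.finditer(text)]


def contained_in(inner: List[Region], outer: List[Region]) -> List[Region]:
    """Keep the regions of `inner` that lie entirely inside some region of `outer`."""
    return [
        r for r in inner
        if any(o.start <= r.start and r.end <= o.end for o in outer)
    ]


# Made-up archive page: one message block plus some unrelated bold text.
page = (
    "<h1>Archive</h1>"
    "<div class=msg><b>Eric</b><p>hmm, interesting</p></div>"
    "<b>footer</b>"
)

msgs = regions(page, "<div class=msg>", "</div>")
bolds = regions(page, "<b>", "</b>")

# "The author name is the bold text contained in a message block."
authors = contained_in(bolds, msgs)
print([page[r.start:r.end] for r in authors])
```

The point of working with offsets rather than a parse tree is that the same containment test applies to any pair of delimiters, whether or not the page is well-formed HTML.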