9fans - fans of the OS Plan 9 from Bell Labs
From: Dimitry Golubovsky <dimitry@golubovsky.org>
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] webscript
Date: Fri,  7 Oct 2005 07:42:06 -0400
Message-ID: <43465F0E.6080104@golubovsky.org>
In-Reply-To: <ee9e417a0510070001r4a2eb88fx263c748a2bdcfb94@mail.gmail.com>

Russ Cox wrote:
> I would like to be able to write scripts like this:
> 
> 	load "http://apc-reset/outlets.htm"
> 	find "yoshimi"
> 	nearest option, set "Immediate Reboot"
> 	submit
> 
> or like this:
> 
> 	load "http://www.fedex.com/Tracking"
> 	find form
> 	enter "792544024753"
> 	submit
> 	
> 	if (find "No information") {
> 	   select enclosing td
> 	   print
> 	} else if (find "Ship date") {
> 	   select enclosing table
> 	   select enclosing table
> 	   print
> 	} else {
> 	   print ">>> Unexpected Results\n"
> 	   print
> 	}
> 
> Does anyone know of programs/languages that let you
> script web sessions like that?  Searching around finds lots
> of mentions of web scraping but no actual programs.
> 

Well, Haskell has several HTML/XML parser packages[0] that parse a 
sequence of tags into a tree-like structure. Various queries can then be 
made over that tree: extracting tags with given properties, building new 
document trees, or extracting and transforming subtrees. There are also 
facilities to connect to web servers and retrieve HTTP responses, and I 
believe to submit forms too (although I have never tried the latter in 
practice; I have only worked with the GET method).
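
To make the "tree plus queries" idea concrete, here is a minimal sketch using only the standard Prelude. The Node type and the helpers select, hasTag, attr and textOf are hypothetical names made up for illustration; they are not the API of any particular parser package, and the sample tree is built by hand rather than parsed.

```haskell
-- Hypothetical, minimal document tree; real parser packages define richer types.
data Node
  = Elem String [(String, String)] [Node]  -- tag name, attributes, children
  | Text String
  deriving (Show, Eq)

children :: Node -> [Node]
children (Elem _ _ cs) = cs
children (Text _)      = []

-- All nodes (in document order, root included) that satisfy a predicate.
select :: (Node -> Bool) -> Node -> [Node]
select p n = [n | p n] ++ concatMap (select p) (children n)

-- True for an element with the given tag name.
hasTag :: String -> Node -> Bool
hasTag t (Elem name _ _) = name == t
hasTag _ _               = False

-- Look up an attribute value on an element.
attr :: String -> Node -> Maybe String
attr k (Elem _ as _) = lookup k as
attr _ _             = Nothing

-- Concatenated text content of a subtree.
textOf :: Node -> String
textOf (Text s)      = s
textOf (Elem _ _ cs) = concatMap textOf cs

-- A hand-built sample document (assumption: stands in for parser output).
sample :: Node
sample =
  Elem "html" []
    [ Elem "body" []
        [ Elem "a" [("href", "/one")] [Text "first"]
        , Elem "p" [] [Text "see ", Elem "a" [("href", "/two")] [Text "second"]]
        ]
    ]

main :: IO ()
main = do
  let links = select (hasTag "a") sample
  -- Pair each link target with its link text.
  print [ (h, textOf a) | a <- links, Just h <- [attr "href" a] ]
```

Running main on the sample tree prints [("/one","first"),("/two","second")].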

Then, with Haskell you can create your own domain-specific language 
(DSL) to perform your tasks. In fact, your example suggests that a DSL 
for analyzing web pages is exactly what you want.
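
As a rough illustration of what such an embedded DSL might look like, the sketch below encodes the commands of the quoted FedEx script as a Haskell data type. The Step constructors and the describe function are hypothetical; the "interpreter" only prints each step and never touches the network.

```haskell
-- Hypothetical command set, modelled on the quoted script.
data Step
  = Load String   -- fetch a page by URL
  | Find String   -- locate text or an element
  | Enter String  -- type text into the current form field
  | Submit        -- submit the current form
  deriving (Show, Eq)

type Script = [Step]

-- The first half of the quoted FedEx example, written as a value.
fedex :: Script
fedex =
  [ Load "http://www.fedex.com/Tracking"
  , Find "form"
  , Enter "792544024753"
  , Submit
  ]

-- A stand-in interpreter: it only describes each step as text.
describe :: Step -> String
describe (Load url)  = "load " ++ show url
describe (Find what) = "find " ++ what
describe (Enter txt) = "enter " ++ show txt
describe Submit      = "submit"

main :: IO ()
main = mapM_ (putStrLn . describe) fedex
```

Because a script is an ordinary list value, a real interpreter could replace describe with a function that performs HTTP requests and tree queries, without changing how scripts are written.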

A very simple example of this may be found in my "cabalfind"[1] program, 
which parses a search engine (e.g. Google) response in order to find 
links with required properties (pointing to files with a ".cabal"[2] 
suffix), although cabalfind does not introduce any DSL.
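
The link-filtering step can be sketched in a few lines. This is not the actual cabalfind source; cabalLinks is a hypothetical helper, the example.org URLs are made up, and the links are assumed to have been extracted from the response already.

```haskell
import Data.List (isSuffixOf)

-- Keep only the links that end with the ".cabal" suffix.
cabalLinks :: [String] -> [String]
cabalLinks = filter (".cabal" `isSuffixOf`)

main :: IO ()
main = mapM_ putStrLn (cabalLinks
  [ "http://example.org/foo/foo.cabal"
  , "http://example.org/foo/README"
  , "http://example.org/bar.cabal?rev=1"
  ])
```

A plain suffix test keeps only the first URL here; the third one is dropped because of its query string, so a real tool would probably want to look at the path component only.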

However, I see two issues here: someone (you?) has to learn Haskell, 
and, as far as Plan 9 is concerned, there is no Haskell implementation 
except an old Hugs, which would be too slow for this task (I still have 
some plans to port GHC or NHC, but cannot yet find the resources even to 
start porting).

-------
[0] But there must be a lot of those for Java; why don't you want to use 
them?

[1] http://www.golubovsky.org/repos/cabalfind,
     http://www.haskell.org/hawiki/CabalFind

[2] Cabal is a Haskell software package management system.


