9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] webscript
@ 2005-10-07  7:01 Russ Cox
  2005-10-07 11:31 ` Eric Van Hensbergen
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Russ Cox @ 2005-10-07  7:01 UTC
  To: 9fans

I would like to be able to write scripts like this:

	load "http://apc-reset/outlets.htm"
	find "yoshimi"
	nearest option, set "Immediate Reboot"
	submit

or like this:

	load "http://www.fedex.com/Tracking"
	find form
	enter "792544024753"
	submit
	
	if (find "No information") {
	   select enclosing td
	   print
	} else if (find "Ship date") {
	   select enclosing table
	   select enclosing table
	   print
	} else {
	   print ">>> Unexpected Results\n"
	   print
	}

Does anyone know of programs/languages that let you
script web sessions like that?  Searching around finds lots
of mentions of web scraping but no actual programs.

I have a rough idea of the general structure of the language
and grammar, and I think that libhtml does most of the
heavy lifting already.
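
To make the intended semantics of "find" and "select enclosing" concrete,
here is a minimal sketch in Python rather than on top of libhtml.  It is an
illustration under assumptions, not a design for the language: the names
Node, load, find and enclosing are made up for this example, load takes an
HTML string instead of fetching a URL, and "find" is taken to mean "the
deepest element whose text contains the string".

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (a partial list, for this sketch).
VOID = {"br", "hr", "img", "input", "meta", "link"}

class Node:
    """One element of the parse tree, with a pointer back to its parent."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []          # Node objects and plain strings

    def text(self):
        return "".join(c if isinstance(c, str) else c.text()
                       for c in self.children)

class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID:
            self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray tags.
        n = self.cur
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        self.cur.children.append(data)

def load(html):
    """Hypothetical stand-in for 'load': parses a string, fetches nothing."""
    b = TreeBuilder()
    b.feed(html)
    b.close()
    return b.root

def find(node, needle):
    """Return the deepest node whose text contains needle, or None."""
    if needle not in node.text():
        return None
    for c in node.children:
        if not isinstance(c, str):
            hit = find(c, needle)
            if hit is not None:
                return hit
    return node

def enclosing(node, tag):
    """Walk up from node to the nearest element named tag, or None."""
    while node is not None and node.tag != tag:
        node = node.parent
    return node
```

With this, "find X; select enclosing td; print" becomes
print(enclosing(find(page, X), "td").text()), and two "select enclosing
table" steps are two calls to enclosing starting from the parent of the
first result.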

Anyone interested in working on this?

Russ



* Re: [9fans] webscript
  2005-10-07  7:01 [9fans] webscript Russ Cox
@ 2005-10-07 11:31 ` Eric Van Hensbergen
  2005-10-07 19:09   ` lucio
  2005-10-07 11:42 ` Dimitry Golubovsky
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Eric Van Hensbergen @ 2005-10-07 11:31 UTC
  To: Fans of the OS Plan 9 from Bell Labs

Seems like I recall seeing an Expect derivative that worked with
websites - can't seem to google it now.  Not sure it worked as easily
as you are describing, but it might be a step in the right direction.

         -eric

On 10/7/05, Russ Cox <rsc@swtch.com> wrote:
> I would like to be able to write scripts like this:
>
>         load "http://apc-reset/outlets.htm"
>         find "yoshimi"
>         nearest option, set "Immediate Reboot"
>         submit
>
> or like this:
>
>         load "http://www.fedex.com/Tracking"
>         find form
>         enter "792544024753"
>         submit
>
>         if (find "No information") {
>            select enclosing td
>            print
>         } else if (find "Ship date") {
>            select enclosing table
>            select enclosing table
>            print
>         } else {
>            print ">>> Unexpected Results\n"
>            print
>         }
>
> Does anyone know of programs/languages that let you
> script web sessions like that?  Searching around finds lots
> of mentions of web scraping but no actual programs.
>
> I have a rough idea of the general structure of the language
> and grammar, and I think that libhtml does most of the
> heavy lifting already.
>
> Anyone interested in working on this?
>
> Russ
>



* Re: [9fans] webscript
  2005-10-07  7:01 [9fans] webscript Russ Cox
  2005-10-07 11:31 ` Eric Van Hensbergen
@ 2005-10-07 11:42 ` Dimitry Golubovsky
  2005-10-08  2:03   ` Jack Johnson
  2005-10-07 17:06 ` Bakul Shah
  2005-12-17 21:26 ` Caerwyn Jones
  3 siblings, 1 reply; 9+ messages in thread
From: Dimitry Golubovsky @ 2005-10-07 11:42 UTC
  To: Fans of the OS Plan 9 from Bell Labs

Russ Cox wrote:
> I would like to be able to write scripts like this:
> 
> 	load "http://apc-reset/outlets.htm"
> 	find "yoshimi"
> 	nearest option, set "Immediate Reboot"
> 	submit
> 
> or like this:
> 
> 	load "http://www.fedex.com/Tracking"
> 	find form
> 	enter "792544024753"
> 	submit
> 	
> 	if (find "No information") {
> 	   select enclosing td
> 	   print
> 	} else if (find "Ship date") {
> 	   select enclosing table
> 	   select enclosing table
> 	   print
> 	} else {
> 	   print ">>> Unexpected Results\n"
> 	   print
> 	}
> 
> Does anyone know of programs/languages that let you
> script web sessions like that?  Searching around finds lots
> of mentions of web scraping but no actual programs.
> 

Well, Haskell has several HTML/XML parser packages[0] that parse a 
sequence of tags and return some tree-like structures, and then various 
queries may be made over that tree, like extracting tags with given 
properties, or building new document trees/extracting/transforming 
subtrees. There are some facilities to connect to web servers and 
retrieve HTTP responses, and I believe to submit forms, too (although I 
never tried the latter practically, I only worked with the GET method).

Then, with Haskell, you can build your own domain-specific
language (DSL) for such tasks. In fact, your example looks like
exactly that: a DSL for analyzing web pages.

A very simple example of this can be found in my "cabalfind"[1] program,
which parses a search engine's response (e.g. Google's) to find links
with required properties (here, pointing to files with a ".cabal"[2]
suffix), although cabalfind does not introduce any DSL.
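
To illustrate the kind of query described here (pull the links with a given
property out of a parsed page), a rough sketch follows.  It is written in
Python, not Haskell, and is not the cabalfind code; LinkFinder and
find_links are names invented for this example, and the input is assumed to
be an HTML string that has already been retrieved.

```python
from html.parser import HTMLParser

class LinkFinder(HTMLParser):
    """Collects href values of <a> tags that end with a given suffix."""
    def __init__(self, suffix):
        super().__init__()
        self.suffix = suffix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes written without a value
            if name == "href" and value and value.endswith(self.suffix):
                self.links.append(value)

def find_links(html, suffix):
    """Return all link targets in html that end with suffix, in page order."""
    finder = LinkFinder(suffix)
    finder.feed(html)
    finder.close()
    return finder.links
```

Calling find_links(page, ".cabal") on a search-result page would then give
the candidate package descriptions, which is the filtering step only;
fetching the page and following the links are left out.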

However, I see two issues here: someone (you?) has to learn Haskell, and
on Plan 9 there is no Haskell implementation except an old
Hugs, which would be too slow for this task (I still have some plans to
port GHC or NHC, but cannot yet find the resources even to start
porting).

-------
[0] but there must be a lot of that for Java; why don't you want to use 
that?

[1] http://www.golubovsky.org/repos/cabalfind,
     http://www.haskell.org/hawiki/CabalFind

[2] Cabal is Haskell's software package management system.




* Re: [9fans] webscript
  2005-10-07  7:01 [9fans] webscript Russ Cox
  2005-10-07 11:31 ` Eric Van Hensbergen
  2005-10-07 11:42 ` Dimitry Golubovsky
@ 2005-10-07 17:06 ` Bakul Shah
  2005-10-07 18:11   ` Skip Tavakkolian
  2005-12-17 21:26 ` Caerwyn Jones
  3 siblings, 1 reply; 9+ messages in thread
From: Bakul Shah @ 2005-10-07 17:06 UTC
  To: Fans of the OS Plan 9 from Bell Labs

> I would like to be able to write scripts like this:
> 
> 	load "http://apc-reset/outlets.htm"
> 	find "yoshimi"
> 	nearest option, set "Immediate Reboot"
> 	submit
> 
> or like this:
> 
> 	load "http://www.fedex.com/Tracking"
> 	find form
> 	enter "792544024753"
> 	submit
> 	
> 	if (find "No information") {
> 	   select enclosing td
> 	   print
> 	} else if (find "Ship date") {
> 	   select enclosing table
> 	   select enclosing table
> 	   print
> 	} else {
> 	   print ">>> Unexpected Results\n"
> 	   print
> 	}
> 
> Does anyone know of programs/languages that let you
> script web sessions like that?  Searching around finds lots
> of mentions of web scraping but no actual programs.
> 
> I have a rough idea of the general structure of the language
> and grammar, and I think that libhtml does most of the
> heavy lifting already.

There are lots of html parsers but the interesting bit here
is that the parse tree seems to be operated on as a whole --
at least that is how I envision operators like find and
select-enclosing working.  This is useful for all sorts of
things: represent some data as a tree, stick probes in it,
walk around the tree, transform it, reuse parts of it in
other trees etc.  Then you can use it for munging any
structured document (email, source code, rcs files, excel,
xml, ...).  You'd need a parser to map a document's structure
into an s-expr and then you can do all the interesting stuff
in this awk-for-s-expr language.
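
As a rough picture of the "parser maps the document into an s-expr, then a
generic tool works on the tree" idea, here is a small Python sketch.  The
representation (nested lists of the form [tag, child, ...], with text as
plain strings) and the names to_sexpr, show and select are assumptions made
for this example.  It also assumes well-nested input without unclosed tags
such as a bare <br>.

```python
from html.parser import HTMLParser

class SexprBuilder(HTMLParser):
    """Builds nested lists [tag, child, ...]; text nodes are strings."""
    def __init__(self):
        super().__init__()
        self.stack = [["#doc"]]

    def handle_starttag(self, tag, attrs):
        node = [tag]
        self.stack[-1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Assumes well-nested input: only the innermost open tag may close.
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():                 # drop whitespace-only text
            self.stack[-1].append(data.strip())

def to_sexpr(html):
    b = SexprBuilder()
    b.feed(html)
    b.close()
    return b.stack[0]

def show(node):
    """Render the tree in s-expression notation."""
    if isinstance(node, str):
        return '"' + node + '"'
    return "(" + " ".join([node[0]] + [show(c) for c in node[1:]]) + ")"

def select(node, pred):
    """Yield every subtree for which pred is true, in document order."""
    if isinstance(node, list):
        if pred(node):
            yield node
        for child in node[1:]:
            yield from select(child, pred)
```

Nothing in show or select knows about HTML; a parser for mail, source code
or any other structured format that emits the same nested lists could reuse
them unchanged, which is the point of the pattern/action split.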

Regular-tree expressions by Shivers & Bagrak may be of
some interest to you.  See
    http://www.cc.gatech.edu/fac/Olin.Shivers/papers/trx.pdf



* Re: [9fans] webscript
  2005-10-07 17:06 ` Bakul Shah
@ 2005-10-07 18:11   ` Skip Tavakkolian
  2005-10-07 20:11     ` erik quanstrom
  0 siblings, 1 reply; 9+ messages in thread
From: Skip Tavakkolian @ 2005-10-07 18:11 UTC
  To: 9fans

> You'd need a parser to map a document's structure
> into an s-expr and then you can do all the interesting stuff
> in this awk-for-s-expr language.

sexpr and scheme are made for each other.

iirc, presotto was contemplating a solution a couple of years ago.

we had to use an old version of expat xml parsing library (it's open
source) for a project.  it has had many updates by now.  you supply it
helper functions for the node types you're interested in.
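
for illustration, the same callback style is available through python's
binding to expat (xml.parsers.expat).  this is only a sketch of the
pattern, not the code from that project; the collect function and its
wanted argument are invented for the example, and the input must be
well-formed xml, since expat rejects ordinary tag-soup html.

```python
import xml.parsers.expat

def collect(xml_text, wanted):
    """Return (tag, text) pairs for every element whose name is in wanted."""
    results = []
    stack = []                      # open wanted elements: (tag, [text pieces])

    def start(name, attrs):
        if name in wanted:
            stack.append((name, []))

    def chars(data):
        # Text goes to the innermost open element we care about, if any.
        if stack:
            stack[-1][1].append(data)

    def end(name):
        if stack and stack[-1][0] == name:
            tag, parts = stack.pop()
            results.append((tag, "".join(parts)))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)    # True: this is the final chunk of input
    return results
```

the parser never builds a tree; it only calls the handlers you install, so
anything like "select enclosing table" has to be tracked by hand in the
handlers, as the stack does here.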




* Re: [9fans] webscript
  2005-10-07 11:31 ` Eric Van Hensbergen
@ 2005-10-07 19:09   ` lucio
  0 siblings, 0 replies; 9+ messages in thread
From: lucio @ 2005-10-07 19:09 UTC
  To: ericvh, 9fans

> Seems like I recall seeing an Expect derivative that worked with
> websites - can't seem to google it now.  Not sure it worked as easily
> as you are describing, but it might be a step in the right direction.

I'm in the wrong academic league, but the nearest thing I can think of
along these lines is tclhttpd, by Brent Welch; it has a lot going
for it.

++L




* Re: [9fans] webscript
  2005-10-07 18:11   ` Skip Tavakkolian
@ 2005-10-07 20:11     ` erik quanstrom
  0 siblings, 0 replies; 9+ messages in thread
From: erik quanstrom @ 2005-10-07 20:11 UTC
  To: 9fans, Skip Tavakkolian

why not write httpfs analogous to olefs?

diverting from the topic at hand, awk would be very interesting 
if extended so that records could be defined by structured regular 
expressions other than '^.*\n'.
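
a rough sketch of what that could look like, in python since no such awk
exists: records are whatever one regular expression matches, fields are
whatever a second one matches inside each record, and an action runs per
record.  the function names and the two-regexp interface are assumptions
made up for the example.

```python
import re

def records(text, record_re):
    """Yield each non-overlapping match of record_re as one record."""
    for m in re.finditer(record_re, text, re.DOTALL):
        yield m.group(0)

def awk(text, record_re, field_re, action):
    """Apply action(record, fields) to every record found in text.

    With one capturing group in field_re, fields holds the captured
    parts; records may span lines because '.' also matches newline.
    """
    for rec in records(text, record_re):
        fields = re.findall(field_re, rec, re.DOTALL)
        yield action(rec, fields)

# Example: treat each table row as a record and each cell as a field.
rows = "<tr><td>a</td><td>1</td></tr>\n<tr><td>b</td>\n<td>2</td></tr>"
second_column = list(awk(rows, r"<tr>.*?</tr>", r"<td>(.*?)</td>",
                         lambda rec, fields: fields[1]))
```

the second row spans a newline, which a line-oriented awk would split into
two records; here it stays one record because the record pattern decides
the boundaries.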

Skip Tavakkolian <9nut@9netics.com> writes

| 
| > You'd need a parser to map a document's structure
| > into an s-expr and then you can do all the interesting stuff
| > in this awk-for-s-expr language.
| 
| sexpr and scheme are made for each other.
| 
| iirc, presotto was contemplating a solution a couple of years ago.
| 
| we had to use an old version of expat xml parsing library (it's open
| source) for a project.  it has had many updates by now.  you supply it
| helper functions for the node types you're interested in.



* Re: [9fans] webscript
  2005-10-07 11:42 ` Dimitry Golubovsky
@ 2005-10-08  2:03   ` Jack Johnson
  0 siblings, 0 replies; 9+ messages in thread
From: Jack Johnson @ 2005-10-08  2:03 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On 10/7/05, Dimitry Golubovsky <dimitry@golubovsky.org> wrote:
> Russ Cox wrote:
> > I would like to be able to write scripts like this:
> >
> >       load "http://www.fedex.com/Tracking"
> >       find form
> >       enter "792544024753"
> >       submit

Looks a whole lot like AppleScript:

using terms from application "Address Book"
	on action property
		return "address"
	end action property
	
	on action title for thePerson with theAddress
		return "Im Stadtplandienst zeigen"
	end action title
	
	on should enable action for thePerson with theAddress
		return true
	end should enable action
	
	on perform action for thePerson with theAddress
		tell application "Address Book"
			set z to zip of theAddress
			set c to city of theAddress
			set s to street of theAddress
		end tell
		tell application "Safari"
			set browser to make new document
			tell browser
				set URL to "http://www.stadtplandienst.de/"
			end tell
			delay 2 -- give Safari a little time to load the page
			do JavaScript "document.forms[0].elements[\"plz\"].value = \"" & z & "\";" in document 1
			do JavaScript "document.forms[0].elements[\"city\"].value = \"" & c & "\";" in document 1
			do JavaScript "document.forms[0].elements[\"str\"].value = \"" & s & "\";" in document 1
			do JavaScript "document.forms[0].submit()" in document 1
		end tell
		return true
	end perform action
end using terms from

(courtesy http://rsvp.atsites.de/discuss/msgReader$236 )

-Jack



* Re: [9fans] webscript
  2005-10-07  7:01 [9fans] webscript Russ Cox
                   ` (2 preceding siblings ...)
  2005-10-07 17:06 ` Bakul Shah
@ 2005-12-17 21:26 ` Caerwyn Jones
  3 siblings, 0 replies; 9+ messages in thread
From: Caerwyn Jones @ 2005-12-17 21:26 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On 10/7/05, Russ Cox <rsc@swtch.com> wrote:
> I would like to be able to write scripts like this:
>
>         load "http://apc-reset/outlets.htm"
>         find "yoshimi"
>         nearest option, set "Immediate Reboot"
>         submit
>
...

> Does anyone know of programs/languages that let you
> script web sessions like that?

This seems very close to what you describe.
http://groups.csail.mit.edu/uid/chickenfoot/index.html



end of thread

Thread overview: 9+ messages
2005-10-07  7:01 [9fans] webscript Russ Cox
2005-10-07 11:31 ` Eric Van Hensbergen
2005-10-07 19:09   ` lucio
2005-10-07 11:42 ` Dimitry Golubovsky
2005-10-08  2:03   ` Jack Johnson
2005-10-07 17:06 ` Bakul Shah
2005-10-07 18:11   ` Skip Tavakkolian
2005-10-07 20:11     ` erik quanstrom
2005-12-17 21:26 ` Caerwyn Jones
