* [9fans] webscript
@ 2005-10-07 7:01 Russ Cox
2005-10-07 11:31 ` Eric Van Hensbergen
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Russ Cox @ 2005-10-07 7:01 UTC (permalink / raw)
To: 9fans
I would like to be able to write scripts like this:
load "http://apc-reset/outlets.htm"
find "yoshimi"
nearest option, set "Immediate Reboot"
submit
or like this:
load "http://www.fedex.com/Tracking"
find form
enter "792544024753"
submit
if (find "No information") {
select enclosing td
print
} else if (find "Ship date") {
select enclosing table
select enclosing table
print
} else {
print ">>> Unexpected Results\n"
print
}
Does anyone know of programs/languages that let you
script web sessions like that? Searching around finds lots
of mentions of web scraping but no actual programs.
I have a rough idea of the general structure of the language
and grammar, and I think that libhtml does most of the
heavy lifting already.
Anyone interested in working on this?
Russ
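[Editorial sketch, not part of the original message.] The "find" and "select enclosing" operations in the scripts above can be approximated on a parsed tree that keeps parent pointers. The following is a minimal sketch in Python using the standard html.parser module rather than Plan 9's libhtml; the names Node, parse, find and enclosing are hypothetical, and the sample page and its date are made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID = {"br", "hr", "img", "input", "meta", "link"}

class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag          # element name, or None for a text node
        self.parent = parent
        self.children = []
        self.text = ""          # only used by text nodes

    def all_text(self):
        """Concatenate the text below this node, separated by spaces."""
        if self.tag is None:
            return self.text
        return " ".join(c.all_text() for c in self.children)

class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID:
            self.cur = node

    def handle_endtag(self, tag):
        # Walk up to the matching open element; ignore stray close tags.
        n = self.cur
        while n is not None and n.tag != tag:
            n = n.parent
        if n is not None and n.parent is not None:
            self.cur = n.parent

    def handle_data(self, data):
        t = Node(None, self.cur)
        t.text = data
        self.cur.children.append(t)

def parse(html):
    b = TreeBuilder()
    b.feed(html)
    b.close()
    return b.root

def find(node, needle):
    """Return the first text node (document order) containing needle, else None."""
    if node.tag is None:
        return node if needle in node.text else None
    for c in node.children:
        hit = find(c, needle)
        if hit is not None:
            return hit
    return None

def enclosing(node, tag):
    """Return the nearest strict ancestor of node with the given tag, else None."""
    n = node.parent
    while n is not None and n.tag != tag:
        n = n.parent
    return n

# Made-up stand-in for a tracking result page.
page = """<table><tr><td>
<table><tr><td>Ship date</td><td>Oct 7, 2005</td></tr></table>
</td></tr></table>"""

hit = find(parse(page), "Ship date")
inner = enclosing(hit, "table")      # first "select enclosing table"
outer = enclosing(inner, "table")    # second "select enclosing table"
print(" ".join(inner.all_text().split()))
```

Keeping a parent pointer on every node is what makes "select enclosing" cheap; a real implementation would also need form handling and HTTP, which this sketch leaves out.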
* Re: [9fans] webscript
2005-10-07 7:01 [9fans] webscript Russ Cox
@ 2005-10-07 11:31 ` Eric Van Hensbergen
2005-10-07 19:09 ` lucio
2005-10-07 11:42 ` Dimitry Golubovsky
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Eric Van Hensbergen @ 2005-10-07 11:31 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
Seems like I recall seeing an Expect derivative that worked with
websites - can't seem to google it now. Not sure it worked as easily
as you are describing, but it might be a step in the right direction.
-eric
On 10/7/05, Russ Cox <rsc@swtch.com> wrote:
> I would like to be able to write scripts like this:
>
> load "http://apc-reset/outlets.htm"
> find "yoshimi"
> nearest option, set "Immediate Reboot"
> submit
>
> or like this:
>
> load "http://www.fedex.com/Tracking"
> find form
> enter "792544024753"
> submit
>
> if (find "No information") {
> select enclosing td
> print
> } else if (find "Ship date") {
> select enclosing table
> select enclosing table
> print
> } else {
> print ">>> Unexpected Results\n"
> print
> }
>
> Does anyone know of programs/languages that let you
> script web sessions like that? Searching around finds lots
> of mentions of web scraping but no actual programs.
>
> I have a rough idea of the general structure of the language
> and grammar, and I think that libhtml does most of the
> heavy lifting already.
>
> Anyone interested in working on this?
>
> Russ
>
* Re: [9fans] webscript
2005-10-07 7:01 [9fans] webscript Russ Cox
2005-10-07 11:31 ` Eric Van Hensbergen
@ 2005-10-07 11:42 ` Dimitry Golubovsky
2005-10-08 2:03 ` Jack Johnson
2005-10-07 17:06 ` Bakul Shah
2005-12-17 21:26 ` Caerwyn Jones
3 siblings, 1 reply; 9+ messages in thread
From: Dimitry Golubovsky @ 2005-10-07 11:42 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
Russ Cox wrote:
> I would like to be able to write scripts like this:
>
> load "http://apc-reset/outlets.htm"
> find "yoshimi"
> nearest option, set "Immediate Reboot"
> submit
>
> or like this:
>
> load "http://www.fedex.com/Tracking"
> find form
> enter "792544024753"
> submit
>
> if (find "No information") {
> select enclosing td
> print
> } else if (find "Ship date") {
> select enclosing table
> select enclosing table
> print
> } else {
> print ">>> Unexpected Results\n"
> print
> }
>
> Does anyone know of programs/languages that let you
> script web sessions like that? Searching around finds lots
> of mentions of web scraping but no actual programs.
>
Well, Haskell has several HTML/XML parser packages[0] that parse a
sequence of tags and return some tree-like structures, and then various
queries may be made over that tree, like extracting tags with given
properties, or building new document trees/extracting/transforming
subtrees. There are some facilities to connect to web servers and
retrieve HTTP responses, and I believe to submit forms, too (although I
never tried the latter practically, I only worked with the GET method).
Then, with Haskell you may create sort of your own domain specific
language (DSL) to perform your tasks. In fact, your example seems like
you indeed want some DSL to analyze web pages.
A very simple example of this may be found in my "cabalfind"[1] program
which parses a search engine (e.g. Google) response in order to find links
with required properties (pointing to files with the ".cabal" [2] suffix),
although cabalfind does not introduce any DSL.
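[Editorial sketch, not part of the original message.] The link-filtering step described above can be illustrated as follows. This is not cabalfind's code (which is written in Haskell); it is a small Python sketch using only the standard library, and the names LinkCollector and links_with_suffix as well as the example.org URLs are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> element seen."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def links_with_suffix(html, suffix):
    """Return links whose URL path ends with suffix, in document order."""
    p = LinkCollector()
    p.feed(html)
    p.close()
    # Compare against the path only, so a query string does not hide the suffix.
    return [u for u in p.links if urlparse(u).path.endswith(suffix)]

# Made-up stand-in for a search engine response.
sample = """
<a href="http://example.org/pkg/foo.cabal">foo</a>
<a href="http://example.org/pkg/foo.tar.gz">tarball</a>
<a href="http://example.org/bar.cabal?rev=3">bar</a>
"""
cabal_links = links_with_suffix(sample, ".cabal")
print(cabal_links)
```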
However I see two issues here: someone (you?) has to learn Haskell, and,
if thinking of Plan9, there is no Haskell implementation except an old
Hugs which would be too slow for this task (I still have some plans to
port GHC or NHC, but cannot yet find my own resources even to start
porting).
-------
[0] but there must be a lot of that for Java; why don't you want to use
that?
[1] http://www.golubovsky.org/repos/cabalfind,
http://www.haskell.org/hawiki/CabalFind
[2] Cabal is Haskell's software package management system.
* Re: [9fans] webscript
2005-10-07 7:01 [9fans] webscript Russ Cox
2005-10-07 11:31 ` Eric Van Hensbergen
2005-10-07 11:42 ` Dimitry Golubovsky
@ 2005-10-07 17:06 ` Bakul Shah
2005-10-07 18:11 ` Skip Tavakkolian
2005-12-17 21:26 ` Caerwyn Jones
3 siblings, 1 reply; 9+ messages in thread
From: Bakul Shah @ 2005-10-07 17:06 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
> I would like to be able to write scripts like this:
>
> load "http://apc-reset/outlets.htm"
> find "yoshimi"
> nearest option, set "Immediate Reboot"
> submit
>
> or like this:
>
> load "http://www.fedex.com/Tracking"
> find form
> enter "792544024753"
> submit
>
> if (find "No information") {
> select enclosing td
> print
> } else if (find "Ship date") {
> select enclosing table
> select enclosing table
> print
> } else {
> print ">>> Unexpected Results\n"
> print
> }
>
> Does anyone know of programs/languages that let you
> script web sessions like that? Searching around finds lots
> of mentions of web scraping but no actual programs.
>
> I have a rough idea of the general structure of the language
> and grammar, and I think that libhtml does most of the
> heavy lifting already.
There are lots of html parsers but the interesting bit here
is that the parse tree seems to be operated on as a whole --
at least that is how I envision operators like find and
select-enclosing working. This is useful for all sorts of
things: represent some data as a tree, stick probes in it,
walk around the tree, transform it, reuse parts of it in
other trees etc. Then you can use it for munging any
structured document (email, source code, rcs files, excel,
xml, ...). You'd need a parser to map a document's structure
into an s-expr and then you can do all the interesting stuff
in this awk-for-s-expr language.
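[Editorial sketch, not part of the original message.] The "parser that maps a document's structure into an s-expr" can be illustrated for HTML as below. This is a minimal Python sketch, assuming nested lists of the form [tag, child, ...] with plain strings for text; SexprBuilder, to_sexpr, show and subtrees are hypothetical names, and attributes are dropped for brevity.

```python
from html.parser import HTMLParser

class SexprBuilder(HTMLParser):
    """Build nested lists [tag, child, ...]; text children are plain strings."""
    def __init__(self):
        super().__init__()
        self.stack = [["document"]]

    def handle_starttag(self, tag, attrs):
        # Note: void elements written without a slash (e.g. <br>) are not
        # special-cased here and would swallow the siblings that follow.
        node = [tag]
        self.stack[-1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray close tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1].append(text)

def to_sexpr(html):
    b = SexprBuilder()
    b.feed(html)
    b.close()
    return b.stack[0]

def show(node):
    """Render a tree in parenthesized s-expression notation."""
    if isinstance(node, str):
        return '"' + node.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return "(" + " ".join([node[0]] + [show(c) for c in node[1:]]) + ")"

def subtrees(node, tag):
    """Yield every subtree whose head is tag, in document order."""
    if isinstance(node, str):
        return
    if node[0] == tag:
        yield node
    for child in node[1:]:
        yield from subtrees(child, tag)

tree = to_sexpr("<ul><li>plan 9</li><li>inferno</li></ul>")
print(show(tree))
print([show(n) for n in subtrees(tree, "li")])
```

Once every document type has such a front end, the same walking and matching functions apply to all of them, which is the point of the awk-for-s-expr idea.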
Regular-tree expressions by Shivers & Bagrak may be of
some interest to you. See
http://www.cc.gatech.edu/fac/Olin.Shivers/papers/trx.pdf
* Re: [9fans] webscript
2005-10-07 17:06 ` Bakul Shah
@ 2005-10-07 18:11 ` Skip Tavakkolian
2005-10-07 20:11 ` erik quanstrom
0 siblings, 1 reply; 9+ messages in thread
From: Skip Tavakkolian @ 2005-10-07 18:11 UTC (permalink / raw)
To: 9fans
> You'd need a parser to map a document's structure
> into an s-expr and then you can do all the interesting stuff
> in this awk-for-s-expr language.
sexpr and scheme are made for each other.
iirc, presotto was contemplating a solution a couple of years ago.
we had to use an old version of expat xml parsing library (it's open
source) for a project. it has had many updates by now. you supply it
helper functions for the node types you're interested in.
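[Editorial sketch, not part of the original message.] The callback style described above, where the parser is handed helper functions for the node types of interest, looks roughly like this through Python's standard binding to expat (xml.parsers.expat). The original project presumably used the C library directly; the collect function, the element names and the sample document are assumptions made for illustration.

```python
import xml.parsers.expat

def collect(xml_text, handlers):
    """Parse xml_text, calling handlers[name](attrs) for each matching start tag."""
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        # Dispatch only for the element types the caller registered.
        fn = handlers.get(name)
        if fn is not None:
            fn(attrs)

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)   # True marks this as the final chunk

# Made-up document; only <outlet> elements have a helper registered.
doc = ('<outlets><outlet name="yoshimi" n="3"/>'
       '<note>ignored</note>'
       '<outlet name="kiki" n="4"/></outlets>')

outlets = []
collect(doc, {"outlet": lambda attrs: outlets.append((attrs["name"], attrs["n"]))})
print(outlets)
```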
* Re: [9fans] webscript
2005-10-07 11:31 ` Eric Van Hensbergen
@ 2005-10-07 19:09 ` lucio
0 siblings, 0 replies; 9+ messages in thread
From: lucio @ 2005-10-07 19:09 UTC (permalink / raw)
To: ericvh, 9fans
> Seems like I recall seeing an Expect derivative that worked with
> websites - can't seem to google it now. Not sure it worked as easily
> as you are describing, but it might be a step in the right direction.
I'm in the wrong academic league, but the nearest thing I can think of
that does something along these lines is tclhttpd, by Brent Welch. It
has a lot going for it.
++L
* Re: [9fans] webscript
2005-10-07 18:11 ` Skip Tavakkolian
@ 2005-10-07 20:11 ` erik quanstrom
0 siblings, 0 replies; 9+ messages in thread
From: erik quanstrom @ 2005-10-07 20:11 UTC (permalink / raw)
To: 9fans, Skip Tavakkolian
why not write httpfs analogous to olefs?
diverting from the topic at hand, awk would be very interesting
if extended so that records could be defined by structured regular
expressions other than '^.*\n'.
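[Editorial sketch, not part of the original message.] The idea of awk with regex-defined records can be illustrated as below. This is a simplification in Python: a single ordinary regular expression delimits each record, which is only a small part of what structured regular expressions offer. The names records and awk, the two patterns and the sample text are all assumptions made for illustration.

```python
import re

# awk's default: one record per line.
LINE = re.compile(r"^.*\n", re.MULTILINE)
# Alternative: one record per blank-line-separated block.
BLOCK = re.compile(r".+?(?:\n\n|\Z)", re.DOTALL)

def records(text, pattern):
    """Yield each non-overlapping match of the compiled pattern as one record."""
    for m in pattern.finditer(text):
        yield m.group(0)

def awk(text, pattern, field_re, action):
    """Split each record into fields and apply action; collect non-None results."""
    out = []
    for rec in records(text, pattern):
        fields = re.split(field_re, rec.strip())
        result = action(fields)
        if result is not None:
            out.append(result)
    return out

# Made-up input: two header blocks separated by a blank line.
text = "From: russ\nSubject: webscript\n\nFrom: eric\nSubject: Re: webscript\n"

# With block records and newline-separated fields, field 0 is the From line.
senders = awk(text, BLOCK, r"\n", lambda f: f[0])
print(senders)
print(len(list(records(text, LINE))))
```

Changing only the record pattern switches the unit of work from lines to blocks while the action stays the same.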
Skip Tavakkolian <9nut@9netics.com> writes
|
| > You'd need a parser to map a document's structure
| > into an s-expr and then you can do all the interesting stuff
| > in this awk-for-s-expr language.
|
| sexpr and scheme are made for each other.
|
| iirc, presotto was contemplating a solution a couple of years ago.
|
| we had to use an old version of expat xml parsing library (it's open
| source) for a project. it has had many updates by now. you supply it
| helper functions for the node types you're interested in.
* Re: [9fans] webscript
2005-10-07 11:42 ` Dimitry Golubovsky
@ 2005-10-08 2:03 ` Jack Johnson
0 siblings, 0 replies; 9+ messages in thread
From: Jack Johnson @ 2005-10-08 2:03 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On 10/7/05, Dimitry Golubovsky <dimitry@golubovsky.org> wrote:
> Russ Cox wrote:
> > I would like to be able to write scripts like this:
> >
> > load "http://www.fedex.com/Tracking"
> > find form
> > enter "792544024753"
> > submit
Looks a whole lot like AppleScript:
using terms from application "Address Book"
on action property
return "address"
end action property
on action title for thePerson with theAddress
return "Im Stadtplandienst zeigen"
end action title
on should enable action for thePerson with theAddress
return true
end should enable action
on perform action for thePerson with theAddress
tell application "Address Book"
set z to zip of theAddress
set c to city of theAddress
set s to street of theAddress
end tell
tell application "Safari"
set browser to make new document
tell browser
set URL to "http://www.stadtplandienst.de/"
end tell
delay 2 -- give Safari a little time to load the page
do JavaScript "document.forms[0].elements[\"plz\"].value = \"" & z & "\";" in document 1
do JavaScript "document.forms[0].elements[\"city\"].value = \"" & c & "\";" in document 1
do JavaScript "document.forms[0].elements[\"str\"].value = \"" & s & "\";" in document 1
do JavaScript "document.forms[0].submit()" in document 1
end tell
return true
end perform action
end using terms from
(courtesy http://rsvp.atsites.de/discuss/msgReader$236 )
-Jack
* Re: [9fans] webscript
2005-10-07 7:01 [9fans] webscript Russ Cox
` (2 preceding siblings ...)
2005-10-07 17:06 ` Bakul Shah
@ 2005-12-17 21:26 ` Caerwyn Jones
3 siblings, 0 replies; 9+ messages in thread
From: Caerwyn Jones @ 2005-12-17 21:26 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On 10/7/05, Russ Cox <rsc@swtch.com> wrote:
> I would like to be able to write scripts like this:
>
> load "http://apc-reset/outlets.htm"
> find "yoshimi"
> nearest option, set "Immediate Reboot"
> submit
>
...
> Does anyone know of programs/languages that let you
> script web sessions like that?
This seems very close to what you describe.
http://groups.csail.mit.edu/uid/chickenfoot/index.html