From mboxrd@z Thu Jan 1 00:00:00 1970
Message-Id: <200510071706.j97H6pqi031134@gate.bitblocks.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] webscript
In-reply-to: Your message of "Fri, 07 Oct 2005 03:01:01 EDT."
From: Bakul Shah
Date: Fri, 7 Oct 2005 10:06:51 -0700
Topicbox-Message-UUID: 9684c82a-ead0-11e9-9d60-3106f5b1d025

> I would like to be able to write scripts like this:
>
>     load "http://apc-reset/outlets.htm"
>     find "yoshimi"
>     nearest option, set "Immediate Reboot"
>     submit
>
> or like this:
>
>     load "http://www.fedex.com/Tracking"
>     find form
>     enter "792544024753"
>     submit
>
>     if (find "No information") {
>         select enclosing td
>         print
>     } else if (find "Ship date") {
>         select enclosing table
>         select enclosing table
>         print
>     } else {
>         print ">>> Unexpected Results\n"
>         print
>     }
>
> Does anyone know of programs/languages that let you
> script web sessions like that?  Searching around finds lots
> of mentions of web scraping but no actual programs.
>
> I have a rough idea of the general structure of the language
> and grammar, and I think that libhtml does most of the
> heavy lifting already.

There are lots of html parsers, but the interesting bit here is
that the parse tree seems to be operated on as a whole -- at
least that is how I envision operators like find and
select-enclosing working.

This is useful for all sorts of things: represent some data as
a tree, stick probes in it, walk around the tree, transform it,
reuse parts of it in other trees, etc.  Then you can use it for
munging any structured document (email, source code, rcs files,
excel, xml, ...).  You'd need a parser to map a document's
structure into an s-expr, and then you can do all the
interesting stuff in this awk-for-s-expr language.

Regular-tree expressions by Shivers & Bagrak may be of some
interest to you.  See

    http://www.cc.gatech.edu/fac/Olin.Shivers/papers/trx.pdf
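
To make the "operate on the parse tree as a whole" idea a bit more
concrete, here is a rough sketch in Python (chosen purely for
illustration).  It is not libhtml and not the trx notation from the
paper: the Node and TreeBuilder classes and the find, enclosing, and
sexpr methods are names made up for this example, the HTML snippet is
invented sample data, and the parser assumes tidy, properly nested
markup.

```python
from html.parser import HTMLParser


class Node:
    """A tree node that remembers its parent, so we can walk upward."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # mix of Node objects and plain strings
        if parent is not None:
            parent.children.append(self)

    def text(self):
        """All text under this node, concatenated in document order."""
        return "".join(
            c if isinstance(c, str) else c.text() for c in self.children
        )

    def find(self, needle):
        """Innermost node (first in document order) whose text has needle."""
        if needle not in self.text():
            return None
        for c in self.children:
            if isinstance(c, Node):
                hit = c.find(needle)
                if hit is not None:
                    return hit
        return self

    def enclosing(self, tag):
        """Nearest strict ancestor with the given tag, or None."""
        n = self.parent
        while n is not None and n.tag != tag:
            n = n.parent
        return n

    def sexpr(self):
        """Render the subtree as an s-expression string."""
        parts = [self.tag]
        for c in self.children:
            parts.append('"' + c + '"' if isinstance(c, str) else c.sexpr())
        return "(" + " ".join(parts) + ")"


class TreeBuilder(HTMLParser):
    """Toy builder: assumes every start tag has a matching end tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        self.cur = Node(tag, self.cur)

    def handle_endtag(self, tag):
        if self.cur.parent is not None:
            self.cur = self.cur.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.children.append(data.strip())


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


if __name__ == "__main__":
    # Invented sample page, loosely modelled on the tracking example.
    page = "<table><tr><td>Ship date</td><td>Oct 7</td></tr></table>"
    hit = parse(page).find("Ship date")      # like: find "Ship date"
    print(hit.enclosing("table").sexpr())    # like: select enclosing table; print
```

The point of keeping parent pointers is that "select enclosing" becomes
a plain upward walk from wherever the last find left the probe, and the
same two operations would work on any document once some parser has
turned it into such a tree.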