From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <43465F0E.6080104@golubovsky.org>
Date: Fri, 7 Oct 2005 07:42:06 -0400
From: Dimitry Golubovsky
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020605
MIME-Version: 1.0
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] webscript
References:
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Topicbox-Message-UUID: 9612725c-ead0-11e9-9d60-3106f5b1d025

Russ Cox wrote:
> I would like to be able to write scripts like this:
>
> 	load "http://apc-reset/outlets.htm"
> 	find "yoshimi"
> 	nearest option, set "Immediate Reboot"
> 	submit
>
> or like this:
>
> 	load "http://www.fedex.com/Tracking"
> 	find form
> 	enter "792544024753"
> 	submit
>
> 	if (find "No information") {
> 		select enclosing td
> 		print
> 	} else if (find "Ship date") {
> 		select enclosing table
> 		select enclosing table
> 		print
> 	} else {
> 		print ">>> Unexpected Results\n"
> 		print
> 	}
>
> Does anyone know of programs/languages that let you
> script web sessions like that?  Searching around finds lots
> of mentions of web scraping but no actual programs.
>

Well, Haskell has several HTML/XML parser packages[0] that parse a
sequence of tags and return some tree-like structures, and then various
queries may be made over that tree, like extracting tags with given
properties, or building new document trees/extracting/transforming
subtrees. There are some facilities to connect to web servers and
retrieve HTTP responses, and I believe to submit forms, too (although I
never tried the latter practically, I only worked with the GET method).

Then, with Haskell you may create sort of your own domain specific
language (DSL) to perform your tasks. In fact, your example seems like
you indeed want some DSL to analyze web pages.
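
To make the idea a bit more concrete, here is a minimal sketch of how
the "find" and "select enclosing" steps from the quoted script might
look as plain Haskell functions over a document tree. Everything in it
is an assumption made for illustration: the Node type, findText,
enclosing, report and the sample document are hypothetical names, not
taken from any existing parser package, and no HTML parsing or HTTP
access is shown. It uses only the base library.

```haskell
import Data.List (isInfixOf)
import Data.Maybe (listToMaybe)

-- A deliberately tiny document tree (hypothetical; real parser
-- packages use richer types with attributes and so on).
data Node
  = Element String [Node]   -- tag name and children
  | Text String
  deriving (Eq, Show)

-- All text beneath a node, concatenated in document order.
textOf :: Node -> String
textOf (Text s)       = s
textOf (Element _ cs) = concatMap textOf cs

-- Every node in the tree paired with its ancestors, nearest first.
withAncestors :: Node -> [(Node, [Node])]
withAncestors = go []
  where
    go anc n@(Element _ cs) = (n, anc) : concatMap (go (n : anc)) cs
    go anc n                = [(n, anc)]

-- "find": the first text node containing the needle, with its ancestors.
findText :: String -> Node -> Maybe (Node, [Node])
findText needle doc =
  listToMaybe [hit | hit@(Text s, _) <- withAncestors doc, needle `isInfixOf` s]

-- "select enclosing <tag>": the nearest ancestor with the given tag,
-- together with that ancestor's own ancestors so calls can be chained.
enclosing :: String -> (Node, [Node]) -> Maybe (Node, [Node])
enclosing tag (_, anc) =
  case dropWhile (not . isTag) anc of
    (e : rest) -> Just (e, rest)
    []         -> Nothing
  where
    isTag (Element t _) = t == tag
    isTag _             = False

-- The if/else part of the second quoted script, returning the text
-- that the script would print.
report :: Node -> String
report doc
  | Just (td, _) <- findText "No information" doc >>= enclosing "td" =
      textOf td
  | Just (tbl, _) <- findText "Ship date" doc
                       >>= enclosing "table" >>= enclosing "table" =
      textOf tbl
  | otherwise = ">>> Unexpected Results\n" ++ textOf doc

-- A hand-built stand-in for a parsed page.
sample :: Node
sample =
  Element "html"
    [ Element "table"
        [ Element "tr"
            [ Element "td" [Text "No information for this number"] ] ] ]
```

Loading this into an interpreter and evaluating "report sample" takes
the first branch, because the matching text sits inside a td. The Maybe
monad does the work of the script's implicit "current selection": each
step either narrows the selection or fails, and a failure falls through
to the next alternative.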

A very simple example of this may be found in my "cabalfind"[1] program,
which parses a search engine (e.g. Google) response in order to find
links with required properties (pointing to files with the ".cabal"[2]
suffix), although cabalfind does not introduce any DSLs.

However, I see two issues here: someone (you?) has to learn Haskell,
and, if thinking of Plan 9, there is no Haskell implementation except an
old Hugs, which would be too slow for this task (I still have some plans
to port GHC or NHC, but cannot yet find my own resources even to start
porting).

-------
[0] But there must be a lot of that for Java; why don't you want to use
that?
[1] http://www.golubovsky.org/repos/cabalfind,
http://www.haskell.org/hawiki/CabalFind
[2] Cabal is Haskell's software package management system.