From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <775b8d190602081150l63d18e54i620042153e8e8db@mail.gmail.com>
Date: Thu, 9 Feb 2006 06:50:28 +1100
From: Bruce Ellis
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] More 'Sam I am'
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
References: <20060208173110.GJ1620@augusta.math.psu.edu>
Topicbox-Message-UUID: f80cab30-ead0-11e9-9d60-3106f5b1d025

I heard an anecdote many years ago about Bill Joy claiming he was
"gonna write a fortran compiler tonight" and nothing was ever heard
of it again. Is this story scrambled, or can anyone confirm or deny it?

brucee

On 2/9/06, uriel@cat-v.org wrote:
> http://homepages.inf.ed.ac.uk/wadler/language.pdf
>
> I think sam is a much safer bet than some hideous lib that pretends to
> be capable of parsing (pseudo)HTML.
>
> Years ago some people tried to write a web browser in python... some
> years later they gave up, all they had produced was a spec for an XML
> format to store bookmarks. Quoting boyd: "hysterical."
>
> uriel
>
> > On Tue, Feb 07, 2006 at 10:50:22PM -0800, Lyndon Nerenberg wrote:
> >> So I thought, but something's not right. I can't demonstrate more
> >> until I get to work in the morning.
> >
> > Hmm. I'm going to make an unpopular but pragmatic suggestion: Don't use
> > sed or sam, but instead, use a language with an HTML parser available.
> > There are some jobs for which regular expressions aren't the best tool;
> > I personally think this is one of them. Here's a script I posted to
> > USENET years ago to extract data from a table.
> >
> > #!/usr/local/bin/python
> >
> > import sys
> > import htmllib
> > import formatter
> >
> > class MyParser(htmllib.HTMLParser):
> >     def __init__(self, format):
> >         htmllib.HTMLParser.__init__(self, format)
> >         self.state = 0
> >
> >     def do_tr(self, data):
> >         if self.state:
> >             print htmllib.HTMLParser.save_end(self)
> >         self.state = 0
> >
> >     def do_td(self, data):
> >         if self.state:
> >             print "%s, " % htmllib.HTMLParser.save_end(self),
> >         self.state = 1
> >         htmllib.HTMLParser.save_bgn(self)
> >
> > parse = MyParser(formatter.NullFormatter())
> > for file in sys.argv[1:]:
> >     parse.feed(open(file, "r").read())
> > parse.close()
> >
> > I wonder if this even still works.....
> >
> > - Dan C.
> >
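
On the "does this still work" question: htmllib was removed in Python 3, so the quoted script runs only under Python 2. Below is a rough Python 3 sketch of the same idea, assuming the standard library's html.parser module. The names TableRows and extract_rows are made up for illustration, and treating th cells like td cells is an assumption, not something the quoted script does.

```python
import sys
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each table cell, grouped by row (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def _end_cell(self):
        # Close the open cell, if any, and store its text in the current row.
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _end_row(self):
        # Close the open row, if any; empty rows are dropped.
        self._end_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        # Sloppy HTML often omits </td> and </tr>, so a new start tag
        # implicitly closes whatever cell or row was still open.
        if tag == "tr":
            self._end_row()
            self._row = []
        elif tag in ("td", "th"):
            if self._row is None:
                self._row = []
            self._end_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._end_row()


def extract_rows(html_text):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    # Print one comma-separated line per table row, for each file named.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for row in extract_rows(f.read()):
                print(", ".join(row))
```

Run it as a script with one or more HTML files as arguments. Unlike the quoted script, it uses a fresh parser per file and tolerates missing end tags, but it makes no attempt to handle nested tables.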