From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <775b8d190602100459o19b15ec6pb70618983d037783@mail.gmail.com>
Date: Fri, 10 Feb 2006 23:59:32 +1100
From: Bruce Ellis
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] More 'Sam I am'
In-Reply-To: <200602100947.k1A9lrbK018110@skeeve.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
References: <20060208212850.GK1620@augusta.math.psu.edu> <200602100947.k1A9lrbK018110@skeeve.com>
Topicbox-Message-UUID: f93dc318-ead0-11e9-9d60-3106f5b1d025

yuk

On 2/10/06, Aharon Robbins wrote:
> In article <20060208212850.GK1620@augusta.math.psu.edu> you write:
> >On Wed, Feb 08, 2006 at 10:14:53AM -0800, Lyndon Nerenberg wrote:
> >> The problem with this is the data I want is interspersed with data that
> >> I don't want. And the bits I don't want are variable length
> >> inconsistent multi-line text that is a bitch to filter out of the
> >> rendered output stream. It turns out that sam (against the raw HTML)
> >> was the only tool that was able to do the job. I just wish I could wrap
> >> it in a shell script that I could throw at the directory containing all
> >> the .html files.
> >
> >I'm not talking about rendering, just parsing. Well, ultimately,
> >what's important is that you get what you need out of the solution, I
> >guess. Still, regular expressions alone give you part of the story,
> >but not the whole thing. I submit that the power to actually parse
> >the tokens in the data as opposed to just matching them (even if the
> >regular expression language you're using is powerful enough to match
> >the structure of the document) is more powerful. But hey, if sam
> >floats your boat, fish on that river!
> >
> > - Dan C.
>
> Possibly of interest is the xmlgawk project:
>
> http://www.sourceforge.net/projects/xmlgawk
>
> This is an extended version of GNU Awk with an XML parser module add-on.
> The idea is that instead of reading lines, you get XML tokens (tags, fields
> in the tags, and marked-up data). I am not directly involved in it, but
> it looks like a rather promising alternative for people who would like
> to process XML type data in the more traditional Unixy fashion.
>
> Arnold
> --
> Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. arnold AT skeeve DOT com
> P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 206 350 8765
> Nof Ayalon Cell Phone: +972 50 729-7545
> D.N. Shimshon 99785 ISRAEL
>
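
A rough sketch of the token-oriented approach discussed in the quoted thread: read markup as tokens (start tags, attributes, data) rather than as lines, then select on the tokens. This is not xmlgawk and does not use its interface. It assumes Python's standard html.parser module as a stand-in, and the names TokenCollector, tokenize, text_inside and the sample markup are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collect (kind, value, attrs) tokens instead of reading lines."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tokens.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag, {}))

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            self.tokens.append(("data", text, {}))


def tokenize(markup):
    """Return the list of tokens found in a markup string."""
    parser = TokenCollector()
    parser.feed(markup)
    parser.close()
    return parser.tokens


def text_inside(markup, wanted_tag):
    """Return the text found inside every <wanted_tag> element."""
    depth = 0
    found = []
    for kind, value, _attrs in tokenize(markup):
        if kind == "start" and value == wanted_tag:
            depth += 1
        elif kind == "end" and value == wanted_tag:
            depth = max(depth - 1, 0)
        elif kind == "data" and depth > 0:
            found.append(value)
    return found


# Hypothetical input: wanted multi-line text surrounded by unwanted text.
sample = '<p>noise</p><td class="x">keep\nthis</td><p>more noise</p>'
wanted = text_inside(sample, "td")
```

Because selection happens on tokens, the wanted text can span several lines and the unwanted text can be of any length without changing the filter. The same function could be run over each .html file in a directory from a small script.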