9fans - fans of the OS Plan 9 from Bell Labs
From: Dan Cross <cross@math.psu.edu>
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] More 'Sam I am'
Date: Wed,  8 Feb 2006 16:28:50 -0500
Message-ID: <20060208212850.GK1620@augusta.math.psu.edu>
In-Reply-To: <43EA351D.7080605@orthanc.ca>

On Wed, Feb 08, 2006 at 10:14:53AM -0800, Lyndon Nerenberg wrote:
> The problem with this is the data I want is interspersed with data that 
> I don't want.  And the bits I don't want are variable length 
> inconsistent multi-line text that is a bitch to filter out of the 
> rendered output stream.  It turns out that sam (against the raw HTML) 
> was the only tool that was able to do the job.  I just wish I could wrap 
> it in a shell script that I could throw at the directory containing all 
> the .html files.

I'm not talking about rendering, just parsing.  Ultimately, what
matters is that you get what you need out of the solution.  Still,
regular expressions alone give you only part of the story.  I submit
that actually parsing the tokens in the data, as opposed to just
matching them, is the more powerful approach, even if the regular
expression language you're using is powerful enough to match the
structure of the document.  But hey, if sam floats your boat, fish on
that river!
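
To make the parsing-versus-matching distinction concrete, here is a
minimal sketch in Python; it is an illustration, not anything taken
from the thread.  It assumes, purely for the example, that the wanted
data sits in <td class="keep"> cells.  The class name "keep" and the
names CellExtractor and extract_cells are hypothetical.  The standard
library's html.parser tokenizes the document, so the unwanted text can
span any number of lines without affecting the result.

```python
from html.parser import HTMLParser


class CellExtractor(HTMLParser):
    """Collect the text of every <td class="keep"> cell (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.in_cell = False  # True while inside a wanted cell
        self.parts = []       # text fragments of the current cell
        self.cells = []       # finished cells, whitespace-normalized

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs supplied by the tokenizer.
        if tag == "td" and dict(attrs).get("class") == "keep":
            self.in_cell = True
            self.parts = []

    def handle_endtag(self, tag):
        if tag == "td" and self.in_cell:
            self.in_cell = False
            # Collapse runs of whitespace, including newlines, to one space.
            self.cells.append(" ".join("".join(self.parts).split()))

    def handle_data(self, data):
        # Text outside a wanted cell is simply never recorded.
        if self.in_cell:
            self.parts.append(data)


def extract_cells(html):
    """Return the text of each wanted cell in document order."""
    parser = CellExtractor()
    parser.feed(html)
    parser.close()
    return parser.cells


sample = (
    '<table><tr><td class="junk">noise\nmore\nnoise</td>'
    '<td class="keep">wanted\n <b>value</b></td></tr></table>'
)
print(extract_cells(sample))  # ['wanted value']
```

The state kept between tokens (in_cell, parts) is the point here: the
extractor knows where it is in the document, which is what a bare
pattern match does not give you.  This sketch does not handle nested
tables; that would need a depth counter instead of a flag.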

	- Dan C.


