9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] Structural Expressions
@ 2011-04-06 19:14 Taylan Ulrich B.
  0 siblings, 0 replies; only message in thread
From: Taylan Ulrich B. @ 2011-04-06 19:14 UTC (permalink / raw)
  To: 9fans

I've read Rob Pike's paper on structural expressions.  Maybe I'm missing
something, but it seems that, the command language of sam (especially
the text-selection mechanism of it; x, y, g chains) could be merged with
expressions in the style of awk actions, and more importantly, added a
functionality we might call "subroutines", to create a more elegant and
expressive text processing language.

The idea initially sparked up after looking at the refer database
example given in Pike's paper:

  ... Consider a refer database, which has multi-line records separated
  by blank lines.  Each line of a record begins with a percent sign and
  a character indicating the type of information on the line: A for
  author, T for title, etc.  Staying with sam notation, the command to
  search a refer database for all papers written by Bimmler is:

      x/(.+\n)+/ g/%A.*Bimmler/p

  -- break the file into non-empty sequences of non-empty lines and
  print any set of lines containing 'Bimmler' on a line after '%A'.
  (To be compatible with other tools, a '.' does not match a newline.)
  ...

What if the structure of the input stream was complex enough to disable
us from relying on a regex feature like the non-newline-matching dot?
It already goes against the whole idea of structural expressions, which
is to rely on the structure of the stream.
We have a structure made of newline-terminated records holding
information.  We want to search for a specific piece of information that
could be in any of the records, and when said information is found, act
on the original, whole structure.  This means that we have to go back to
a previous selection, in some way.  This is probably most cleanly
implemented with the idea of subroutines.
The following expression, written in a hypothetical sam alike syntax
allowing subroutines inside brackets, does the same thing as the sam
expression given in Pike's example:

  x/(.+\n)+/ [ x/.*\n/ g/%A.* Bimmler/ R ] p

(R to return true.)


The way in which awk alike syntax would be added to our hypothetical
language is open for discussion.  One possibility would be to allow code
blocks that are a subset of awk action blocks, perhaps limited to
variable usage and some comparison operators; guess what this code is
supposed to do:

  x/(.+\n)+/ [ x/.*\n/ x/^Age: (.*)/ { $1 > 17 } ] x/Name: (.*)\n/ y/Name: / p

An alternative would be to make the language awk-based instead of
sam-based, but using sam's text selection commands to populate $0, $1,
etc. (like mentioned by Pike).
Two versions of which one is stream and scripting oriented and leans
more towards awk, and the other buffer and interactive-usage oriented
which leans more towards sam, are perhaps the most likely possibilities.



---

Blergh, now i'm sick of formalism. Disclaimer: I'm 17.
Is this subroutines idea perhaps already implemented in some way?


Taylan Ulrich Bayırlı



^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2011-04-06 19:14 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-06 19:14 [9fans] Structural Expressions Taylan Ulrich B.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).