Help parsing a file from one regex to another

zsh-users

* Help parsing a file from one regex to another
From: Travis Spencer @ 2005-07-08  6:36 UTC (permalink / raw)
  To: zsh-users

Hey,

I am trying to parse a range of data out of a file from one regular
expression up to another.  The input starts with `^@main::FLAGS' and
goes up to the next `)'.  I am using awk right now, but it is *ugly*
and slow.  Does anyone have any suggestions on ways to speed it up and
beautify it?

TIA.

-- 

Regards,

Travis Spencer
Portland, OR USA

#!/bin/zsh

local flags tmp_file=${TEMP:-/tmp}/test$$

# 'EOF' is quoted so the shell does not expand $main in the lines below
cat <<'EOF' > $tmp_file
# Where is the cache
$main::CACHEHOST        = "zok.cat.pdx.edu";
$main::CACHEPORT        = "5001";

# Name of flags
@main::FLAGS            = ( "ADMIN", "ARG", "CRACK", "DB", "DESKCATS",
                    "DROID", "ESHED", "HISS", "IMP", "KEYS",
                    "LABRES", "LINUX", "LOST", "LRP", "MAIL",
                    "MSDNAA", "NETWORK", "OIT", "PREFILTER",
                    "PRINTER", "PRINTING", "RESTORE",
                    "SECURITY", "SOFT", "SPAM", "TIER3",
                    "TUTOR", "UNIX", "WEB", "WINTEL");

# Name of priorities
@main::PRIORITIES       = ("GENERAL", "HOLD", "HOT", "STICKY", "WAIT");
EOF

function getflags {
    local config_file line_num

    config_file=${1-$tmp_file}
    # grep -n prints "NUM:text"; keep only NUM
    line_num=$(grep -n '^@main::FLAGS' $config_file | cut -d: -f1)
    # gensub() is a GNU awk (gawk) extension
    flags=( $(
        awk '{
            if (NR >= line_num) {
                if (/main::FLAGS/) {
                    # skip "@main::FLAGS", "=" and "(" on the first line
                    for (i = 4; i <= NF; i++) {
                        print gensub(/"([^"]*)"(,|\);)/, "\\1", 1, $i);
                    }
                } else {
                    for (i = 1; i <= NF; i++) {
                        print gensub(/"([^"]*)"(,|\);)/, "\\1", 1, $i);
                    }
                }

                # stop after the line that closes the list
                if (/\);[[:blank:]]*$/) {
                    exit
                }
            }
        }' line_num=$line_num $config_file
    ) )
}

getflags
printf '>>%s<<\n' "${flags[@]}"
/bin/rm $tmp_file
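
For comparison, here is a minimal sketch (not from the thread) of the same awk job written with a flag instead of the `grep -n` line number, and with POSIX `match()`/`substr()` instead of the gawk-only `gensub()`. It assumes every flag is a double-quoted word and that the list ends on the first line containing `)`; the sample file is a trimmed, hypothetical copy of the one above.

```shell
#!/bin/zsh
# Sketch only: the sample data is a hypothetical, cut-down copy.
tmp_file=${TEMP:-/tmp}/flags-sketch$$
cat <<'EOF' > $tmp_file
# Name of flags
@main::FLAGS            = ( "ADMIN", "ARG", "CRACK",
                    "TIER3", "WINTEL");

# Name of priorities
@main::PRIORITIES       = ("GENERAL", "HOLD");
EOF

# Turn printing on at the @main::FLAGS line, print every double-quoted
# word without its quotes, and exit after the first line containing ")".
flags=( $(awk '
    /^@main::FLAGS/ { in_block = 1 }
    in_block {
        line = $0
        while (match(line, /"[^"]*"/)) {
            print substr(line, RSTART + 1, RLENGTH - 2)
            line = substr(line, RSTART + RLENGTH)
        }
        if ($0 ~ /\)/) exit
    }' $tmp_file) )

rm -f $tmp_file
printf '>>%s<<\n' "${flags[@]}"
```

Because the exit test runs on the start line too, this also copes with a list that opens and closes on the same line.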



* Re: Help parsing a file from one regex to another
From: Doug Kearns @ 2005-07-08  7:30 UTC (permalink / raw)
  To: zsh-users

On Thu, Jul 07, 2005 at 11:36:12PM -0700, Travis Spencer wrote:
> Hey,
> 
> I am trying to parse a range of data out of a file from one regular
> expression up to another.  The input starts with `^@main::FLAGS' and
> goes up to the next `)'.  I am using awk right now, but it is *ugly*
> and slow.  Does anyone have any suggestions on ways to speed it up and
> beautify it?

This is pretty raw...

flags=( ${=${${${(f)"$(<$tmp_file)"}[(r)@main::FLAGS*,(r)\);]}#*\"}//[^[:upper:][:blank:]]/} )

Regards,
Doug



* Re: Help parsing a file from one regex to another
From: Bart Schaefer @ 2005-07-08 17:16 UTC (permalink / raw)
  To: zsh-users

On Jul 8,  5:30pm, Doug Kearns wrote:
} Subject: Re: Help parsing a file from one regex to another
}
} > I am trying to parse a range of data out of a file from one regular
} > expression up to another.  The input starts with `^@main::FLAGS' and
} > goes up to the next `)'.  I am using awk right now, but it is *ugly*
} 
} flags=( ${=${${${(f)"$(<$tmp_file)"}[(r)@main::FLAGS*,(r)\);]}#*\"}//[^[:upper:][:blank:]]/} )

That only works if the closing paren is on a line by itself, I think.
You need (r)*\); in the subscript expression, maybe even (r)*\);* if there
may be other stuff following the close-paren.

Also, that expression won't work if you replace FLAGS with PRIORITIES,
because (r)*\); always matches the end of the FLAGS assignment, which is
before the beginning of the PRIORITIES assignment.

To extract just the line(s) of interest from the file with sed:

sed -n -e '/^@main::FLAGS.*)/{p;q;}' -e '/^@main::FLAGS/,/)/p'

If you know that the close paren is never on the same line as
@main::FLAGS, then you can eliminate the first -e expression.

So to put the whole thing together:

flags=( ${${$(sed -n -e '/^@main::FLAGS.*)/{p;q;}' -e '/^@main::FLAGS/,/)/p' <$tmp_file)}//[^[:upper:]]/} )
shift flags	# Throw away the word "FLAGS" from @main::FLAGS


* Re: Help parsing a file from one regex to another
From: Travis Spencer @ 2005-07-08 19:24 UTC (permalink / raw)
  To: zsh-users

On 7/8/05, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Jul 8,  5:30pm, Doug Kearns wrote:
> > flags=( ${=${${${(f)"$(<$tmp_file)"}[(r)@main::FLAGS*,(r)\);]}#*\"}//[^[:upper:][:blank:]]/} )
> So to put the whole thing together:
> 
> flags=( ${${$(sed -n -e '/^@main::FLAGS.*)/{p;q;}' -e '/^@main::FLAGS/,/)/p' <$tmp_file)}//[^[:upper:]]/} )
> shift flags     # Throw away the word "FLAGS" from @main::FLAGS
> 

Thanks, Bart and Doug, for your replies. They were both really helpful.

-- 

Regards,

Travis Spencer
Portland, OR USA



* Re: Help parsing a file from one regex to another
From: Doug Kearns @ 2005-07-09  6:41 UTC (permalink / raw)
  To: zsh-users

On Fri, Jul 08, 2005 at 05:16:27PM +0000, Bart Schaefer wrote:
> On Jul 8,  5:30pm, Doug Kearns wrote:
> } Subject: Re: Help parsing a file from one regex to another
> }
> } > I am trying to parse a range of data out of a file from one regular
> } > expression up to another.  The input starts with `^@main::FLAGS' and
> } > goes up to the next `)'.  I am using awk right now, but it is *ugly*
> } 
> } flags=( ${=${${${(f)"$(<$tmp_file)"}[(r)@main::FLAGS*,(r)\);]}#*\"}//[^[:upper:][:blank:]]/} )
> 
> That only works if the closing paren is on a line by itself, I think.
> You need (r)*\); in the subscript expression, maybe even (r)*\);* if there
> may be other stuff following the close-paren.

Thanks for catching that. It 'worked' by accident and I didn't notice
that the PRIORITIES had been pulled in too...

I always forget that a non-matching pattern in the second expression
causes the rest of the array to be selected. Is this documented
somewhere that I'm missing?

<snip>

Regards,
Doug



* Re: Help parsing a file from one regex to another
From: Bart Schaefer @ 2005-07-09 17:16 UTC (permalink / raw)
  To: zsh-users

On Jul 9,  4:41pm, Doug Kearns wrote:
} Subject: Re: Help parsing a file from one regex to another
}
} I always forget that a non-matching pattern in the second expression
} causes the rest of the array to be selected. Is this documented
} somewhere that I'm missing?

Hmm, no, I guess not.

The way the (i) and (I) subscript flags work (on an ordinary array, not
on an associative one) is that either they resolve to a matching index,
or they "fall off" the array in the direction that they're searching and
thus resolve to an index that has the correct magnitude but that can be
distinguished from any successful match.

So, given an array in which no element has the value "missing":

  $array[(I)missing] == 0
  $array[(i)missing] == $(($#array+1))

This is so you can get sensible results from tests like

  if [[ $array[(i)forward] -ge $array[(I)backward] ]]; then : ...
  fi

Despite the documentation explaining (i) in terms of (r), the actual
implementation of (r) is more like $array[$array[(i)missing]], so the
side-effect of $array[0] being the same as $array[1] comes into play.

  ${(k)array[(R)missing]} == 0
  ${(v)array[(R)missing]} == $array[1]
  ${(k)array[(r)missing]} == $(($#array+1))
  ${(v)array[(r)missing]} == ""

and

  $array[(R)missing,(r)missing] == $array[*]

As a final note ... this behavior of (i) differs in really old versions
of zsh, where (as I recall) a missing element was always resolved to 0,
which meant that $array[$array[(i)missing]] != $array[(r)missing], which
seemed like a bad thing, so it was changed.


Thread overview:
2005-07-08  6:36 Help parsing a file from one regex to another Travis Spencer
2005-07-08  7:30 ` Doug Kearns
2005-07-08 17:16   ` Bart Schaefer
2005-07-08 19:24     ` Travis Spencer
2005-07-09  6:41     ` Doug Kearns
2005-07-09 17:16       ` Bart Schaefer
