9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] "Intervalic
@ 2006-02-22 18:33 yard-ape
  2006-02-22 18:51 ` Russ Cox
  2006-02-22 19:50 ` Bakul Shah
  0 siblings, 2 replies; 11+ messages in thread
From: yard-ape @ 2006-02-22 18:33 UTC (permalink / raw)
  To: 9fans

I'm using awk on Plan9 to restructure a 70,000-cell table containing no proper delimiters---it's just visually-formatted with spaces.  (Records split over multiple lines, erratically justified columns, etc. etc.  A good time.)

For such a case in unix, I'd make heavy use of what I've seen referred to as "intervalic" regular expressions (numeric ranges expressed in braces: "\{n,n\}" in simple and basic unix regular expressions, "{n,n}" in extended posix regular expressions).  But regexp(6) doesn't mention these, and I get errors from sam, awk, ed, et al. when I try them.

Am I misunderstanding the REP operators?  If not, how do you folks like to handle such problems as one might use intervalic expressions on?  Do you just use an explicit regular expression?  If so, I'm curious about the reasoning behind the design decision to leave intervalic expressions out.  

Contrived Example.  To match the character before the second occurrence of "Unit" in the line:

Item Number     Unit    Unit/Lot  Date               Unit       Operating

Simple and Basic REs:
.\{24\}

Extended:
.{24}

Plan9:
'Item Number     Unit    '
(or):
'Item Number     Unit  +'
(or the more general):
........................


And that last expression I suppose I would create with something like

seq 24 | sed 's/.*/./g' | tr -d '\
'

Thanks in advance,

-Derek


* Re: [9fans] "Intervalic
  2006-02-22 23:36       ` yard-ape
@ 2006-02-22 18:50         ` Russ Cox
  2006-02-23  0:05           ` yard-ape
  2006-02-22 23:42         ` andrey mirtchovski
  1 sibling, 1 reply; 11+ messages in thread
From: Russ Cox @ 2006-02-22 18:50 UTC (permalink / raw)
  To: 9fans

Here are the real rules:

	- a{0,n} can be replaced by a?a{0,n-1}
	- a{m,n} can be replaced by aa{m-1,n-1}
	- a{0,0} can be replaced by the empty string
	- REPEAT until all the { } are gone

This is just a complicated way of saying you
can replace a{n,m} with n copies of "a" followed
by m-n copies of "a?".

Russ
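
A minimal sketch of the shortcut described above (n required copies followed by m-n optional copies), assuming a POSIX shell and an awk that accepts -v.  The helper name expand_interval is made up for illustration, and the atom is assumed to be a single character, a bracket class, or a parenthesized group.

```shell
# expand_interval ATOM MIN MAX
# Print ATOM repeated MIN times, followed by MAX-MIN copies of "ATOM?".
# Note: awk -v interprets backslash escapes in the assigned value.
expand_interval() {
	awk -v atom="$1" -v min="$2" -v max="$3" 'BEGIN {
		re = ""
		for (i = 0; i < min; i++)      # required copies
			re = re atom
		for (i = min; i < max; i++)    # optional copies
			re = re atom "?"
		print re
	}'
}

expand_interval '[lque]' 2 3    # prints [lque][lque][lque]?
expand_interval . 24 24         # prints 24 dots
```

The printed string contains no braces, so it can be pasted into a tool that lacks interval support.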




* Re: [9fans] "Intervalic
  2006-02-22 18:33 [9fans] "Intervalic yard-ape
@ 2006-02-22 18:51 ` Russ Cox
  2006-02-22 19:20   ` yard-ape
  2006-02-22 19:50 ` Bakul Shah
  1 sibling, 1 reply; 11+ messages in thread
From: Russ Cox @ 2006-02-22 18:51 UTC (permalink / raw)
  To: 9fans

> Am I misunderstanding the REP operators?  If not, how do you folks
> like to handle such problems as one might use intervalic expressions on?
> Do you just use an explicit regular expression?  If so, I'm curious
> about the reasoning behind the design decision to leave intervalic
> expressions out.

They're a recent addition to the regular expression world and
no one has cared enough to add them.  

If you're using awk and want the first 24 characters, I'd use substr.

Russ




* Re: [9fans] "Intervalic
  2006-02-22 18:51 ` Russ Cox
@ 2006-02-22 19:20   ` yard-ape
  2006-02-23  2:24     ` geoff
  0 siblings, 1 reply; 11+ messages in thread
From: yard-ape @ 2006-02-22 19:20 UTC (permalink / raw)
  To: 9fans

"Russ Cox" <rsc@swtch.com> wrote:

> If you're using awk and want the first 24 characters, I'd use substr.

Yup, substr is in there; but it needs to be fed the position dynamically with the regex.  Thanks again!
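
One possible way to feed substr a position found by a regex is awk's match(), which sets RSTART and RLENGTH.  The sketch below assumes a POSIX shell; col_before_second is a hypothetical name, and the sample header line is shortened from the one in the original question.

```shell
# Print the column just before the second match of regex $1 on each line.
# Lines with fewer than two matches print nothing.
col_before_second() {
	awk -v re="$1" '{
		if (!match($0, re)) next            # no first match
		skip = RSTART + RLENGTH - 1         # last column of the first match
		if (!match(substr($0, skip + 1), re)) next
		print skip + RSTART - 1             # column just before the second match
	}'
}

echo 'Item Number     Unit    Unit/Lot  Date' | col_before_second Unit   # prints 24
```

The printed column could then be used as the length argument in substr($0, 1, col) to take everything up to that point.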

-Derek



* Re: [9fans] "Intervalic
  2006-02-22 18:33 [9fans] "Intervalic yard-ape
  2006-02-22 18:51 ` Russ Cox
@ 2006-02-22 19:50 ` Bakul Shah
  2006-02-22 20:58   ` yard-ape
  1 sibling, 1 reply; 11+ messages in thread
From: Bakul Shah @ 2006-02-22 19:50 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> For such a case in unix, I'd make heavy use of what I've seen referred to as
> "intervalic" regular expressions (numeric ranges expressed in braces:
> "\{n,n\}" in simple and basic unix regular expressions, "{n,n}" in extended
> posix regular expressions).  But regexp(6) doesn't mention these, and I get
> errors from sam, awk, ed, et al. when I try them.

Not as convenient but can't you transform your extended RE
into basic REs?

    RE{0,n} == RE? RE{0,n-1}
    RE{m,n} == RE RE{m-1,n-1}



* Re: [9fans] "Intervalic
  2006-02-22 19:50 ` Bakul Shah
@ 2006-02-22 20:58   ` yard-ape
  2006-02-22 21:47     ` Russ Cox
  0 siblings, 1 reply; 11+ messages in thread
From: yard-ape @ 2006-02-22 20:58 UTC (permalink / raw)
  To: 9fans

Bakul Shah <bakul+plan9@BitBlocks.com> wrote:

> Not as convenient but can't you transform your extended RE
> into basic REs?
>
>     RE{0,n} == RE? RE{0,n-1}
>     RE{m,n} == RE RE{m-1,n-1}

Sorry Bakul, you've lost me here.  These look like awk idioms with unix extended regular expressions forms, but I don't really understand them.  In any case, even basic REs aren't available in Plan9 awk---right?

-Derek



* Re: [9fans] "Intervalic
  2006-02-22 20:58   ` yard-ape
@ 2006-02-22 21:47     ` Russ Cox
  2006-02-22 23:36       ` yard-ape
  0 siblings, 1 reply; 11+ messages in thread
From: Russ Cox @ 2006-02-22 21:47 UTC (permalink / raw)
  To: 9fans

> Sorry Bakul, you've lost me here.  These look like awk idioms with unix
> extended regular expressions forms, but I don't really understand them.
> In any case, even basic REs aren't available in Plan9 awk---right?

What he meant is that if you have a regular expression of the form
a{0,n} for any a, then you can replace that with a?a{0,n-1}, and 
similarly a{m,n} can be replaced with aa{m-1,n-1}.  This gives you
an algorithm to convert a so-called intervalic regular expression
into a standard Plan 9 regular expression.

And awk does have standard Plan 9 regular expressions.

Russ




* Re: [9fans] "Intervalic
  2006-02-22 21:47     ` Russ Cox
@ 2006-02-22 23:36       ` yard-ape
  2006-02-22 18:50         ` Russ Cox
  2006-02-22 23:42         ` andrey mirtchovski
  0 siblings, 2 replies; 11+ messages in thread
From: yard-ape @ 2006-02-22 23:36 UTC (permalink / raw)
  To: 9fans

"Russ Cox" <rsc@swtch.com> wrote:

> What he meant is that if you have a regular expression of the form
> a{0,n} for any a, then you can replace that with a?a{0,n-1}, and 
> similarly a{m,n} can be replaced with aa{m-1,n-1}.  This gives you
> an algorithm to convert a so-called intervalic regular expression
> into a standard Plan 9 regular expression.
>
> And awk does have standard Plan 9 regular expressions.
>
> Russ

Huh?

bash$ cat <<EOF | gawk -W re-interval '{gsub( "[lque]{2,3}", "*");print}'
> Some random text
> to hopefully clarify the question
> EOF
Some random text
to hopefu*y clarify the *stion
bash$

...Applying the suggested algorithm as I understand it:

term% cat <<EOF | awk '{gsub( "[lque]?[lque]{1,2}", "*");print}'
	Some random text
	to hopefully clarify the question
	EOF
Some random text
to hopefully clarify the question
term%

-Derek
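
For comparison, a sketch of the same substitution with the interval expanded all the way, so that no braces remain: [lque]{2,3} becomes two required copies of the class plus one optional copy.  It assumes a POSIX shell, and squash_runs is a made-up name.

```shell
# Replace every run of two or three characters from [lque] with "*".
# The pattern is written without braces, using only "?".
squash_runs() {
	awk '{ gsub(/[lque][lque][lque]?/, "*"); print }'
}

printf '%s\n' 'Some random text' 'to hopefully clarify the question' | squash_runs
```

Stopping after a single rewriting step, as in the attempt above, still leaves a {1,2} in the pattern; the rewrite has to be repeated until every brace is gone.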



* Re: [9fans] "Intervalic
  2006-02-22 23:36       ` yard-ape
  2006-02-22 18:50         ` Russ Cox
@ 2006-02-22 23:42         ` andrey mirtchovski
  1 sibling, 0 replies; 11+ messages in thread
From: andrey mirtchovski @ 2006-02-22 23:42 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> Huh?

http://pages.cpsc.ucalgary.ca/~mirtchov/p9/canthave.png



* Re: [9fans] "Intervalic
  2006-02-22 18:50         ` Russ Cox
@ 2006-02-23  0:05           ` yard-ape
  0 siblings, 0 replies; 11+ messages in thread
From: yard-ape @ 2006-02-23  0:05 UTC (permalink / raw)
  To: 9fans

"Russ Cox" <rsc@swtch.com> wrote:

> Here are the real rules:
>
> 	- a{0,n} can be replaced by a?a{0,n-1}
> 	- a{m,n} can be replaced by aa{m-1,n-1}
> 	- a{0,0} can be replaced by the empty string
> 	- REPEAT until all the { } are gone
>
> This is just a complicated way of saying you
> can replace a{n,m} with n copies of "a" followed
> by m-n copies of "a?".
>
> Russ

Got it.  Thanks for your patience.  I'll be quiet now.

-Derek



* Re: [9fans] "Intervalic
  2006-02-22 19:20   ` yard-ape
@ 2006-02-23  2:24     ` geoff
  0 siblings, 0 replies; 11+ messages in thread
From: geoff @ 2006-02-23  2:24 UTC (permalink / raw)
  To: 9fans

You may want to generate some or all of the awk program on-the-fly, or
use awk -v to set awk variables from the command line, or use the
ENVIRON array to read environment variables.
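
The three options might look roughly like this.  This is a sketch in POSIX sh syntax (rc on Plan 9 sets and exports variables differently); the function names and the WIDTH variable are invented for illustration.

```shell
# 1. awk -v: pass the width in as an awk variable.
first_cols() {
	awk -v w="$1" '{ print substr($0, 1, w) }'
}

# 2. ENVIRON: read the width from an exported environment variable.
first_cols_env() {
	awk '{ print substr($0, 1, ENVIRON["WIDTH"]) }'
}

# 3. Generated program: the shell pastes the width into the awk source.
first_cols_gen() {
	awk "{ print substr(\$0, 1, $1) }"
}

echo 'abcdefgh' | first_cols 3        # prints abc
WIDTH=3; export WIDTH
echo 'abcdefgh' | first_cols_env      # prints abc
echo 'abcdefgh' | first_cols_gen 3    # prints abc
```

The same approach would work for passing in a generated regular expression instead of a width.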

