* [9fans] "Intervalic
@ 2006-02-22 18:33 yard-ape
2006-02-22 18:51 ` Russ Cox
2006-02-22 19:50 ` Bakul Shah
0 siblings, 2 replies; 11+ messages in thread
From: yard-ape @ 2006-02-22 18:33 UTC (permalink / raw)
To: 9fans
[-- Attachment #1: p9res --]
[-- Type: text/plain, Size: 1378 bytes --]
I'm using awk on Plan9 to restructure a 70,000-cell table containing no proper delimiters---it's just visually formatted with spaces. (Records split over multiple lines, erratically justified columns, etc. etc. A good time.)
For such a case in unix, I'd make heavy use of what I've seen referred to as "intervalic" regular expressions (numeric ranges expressed in braces: "\{n,n\}" in simple and basic unix regular expressions, "{n,n}" in extended posix regular expressions). But regexp(6) doesn't mention these, and I get errors from sam, awk, ed, et al. when I try them.
Am I misunderstanding the REP operators? If not, how do you folks like to handle such problems as one might use intervalic expressions on? Do you just use an explicit regular expression? If so, I'm curious about the reasoning behind the design decision to leave intervalic expressions out.
Contrived Example. To match the character before the second occurrence of "Unit" in the line:
Item Number Unit Unit/Lot Date Unit Operating
Simple and Basic REs:
.\{24\}
Extended:
.{24}
Plan9:
'Item Number Unit '
(or):
'Item Number Unit +'
(or the more general):
........................
And that last expression I suppose I would create with something like
seq 24 | sed 's/.*/./g' | tr -d '\
'
Thanks in advance,
-Derek
* Re: [9fans] "Intervalic
2006-02-22 18:33 [9fans] "Intervalic yard-ape
@ 2006-02-22 18:51 ` Russ Cox
2006-02-22 19:20 ` yard-ape
2006-02-22 19:50 ` Bakul Shah
1 sibling, 1 reply; 11+ messages in thread
From: Russ Cox @ 2006-02-22 18:51 UTC (permalink / raw)
To: 9fans
> Am I misunderstanding the REP operators? If not, how do you folks
> like to handle such problems as one might use intervalic expressions on?
> Do you just use an explicit regular expression? If so, I'm curious
> about the reasoning behind the design decision to leave intervalic
> expressions out.
They're a recent addition to the regular expression world and
no one has cared enough to add them.
If you're using awk and want the first 24 characters, I'd use substr.
Russ
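A minimal sketch of the substr approach, assuming a POSIX shell and any awk. The function names `first24` and `char24` are hypothetical and do not come from the thread. Slicing by position avoids needing a repetition count in the regular expression at all.

```shell
# Hypothetical helper: print the first 24 characters of every input line.
first24() {
    awk '{ print substr($0, 1, 24) }'
}

# Hypothetical helper: print only the 24th character of every input line,
# i.e. the character that ".{24}" would end on when anchored at the start.
char24() {
    awk '{ print substr($0, 24, 1) }'
}

printf '%s\n' 'abcdefghijklmnopqrstuvwxyz0123' | first24   # prints abcdefghijklmnopqrstuvwx
printf '%s\n' 'abcdefghijklmnopqrstuvwxyz0123' | char24    # prints x
```

substr counts characters from 1, so `substr($0, 25)` would give the remainder of the line after the fixed-width prefix.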
* Re: [9fans] "Intervalic
2006-02-22 18:33 [9fans] "Intervalic yard-ape
2006-02-22 18:51 ` Russ Cox
@ 2006-02-22 19:50 ` Bakul Shah
2006-02-22 20:58 ` yard-ape
1 sibling, 1 reply; 11+ messages in thread
From: Bakul Shah @ 2006-02-22 19:50 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
> For such a case in unix, I'd make heavy use of what I've seen referred to as
> "intervalic" regular expressions (numeric ranges expressed in braces:
> "\{n,n\}" in simple and basic unix regular expressions, "{n,n}" in extended
> posix regular expressions). But regexp(6) doesn't mention these, and I get
> errors from sam, awk, ed, et al. when I try them.
Not as convenient but can't you transform your extended RE
into basic REs?
RE{0,n} == RE? RE{0,n-1}
RE{m,n} == RE RE{m-1,n-1}
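The two identities above can be applied mechanically until no braces remain. The following is a sketch only, assuming a POSIX shell; the function name `rewrite_interval` is hypothetical, and the stopping case RE{0,0} (which expands to nothing) is an added assumption needed to make the recursion terminate.

```shell
# rewrite_interval RE M N
# Hypothetical helper: rewrite RE{M,N} without braces by recursion.
#   RE{0,0} -> (empty)             assumed base case
#   RE{0,n} -> RE? RE{0,n-1}
#   RE{m,n} -> RE  RE{m-1,n-1}
# RE must be a single atom (one character, a class, or a parenthesized group).
rewrite_interval() {
    if [ "$3" -eq 0 ]; then
        return 0
    elif [ "$2" -eq 0 ]; then
        printf '%s?' "$1"
        rewrite_interval "$1" 0 $(($3 - 1))
    else
        printf '%s' "$1"
        rewrite_interval "$1" $(($2 - 1)) $(($3 - 1))
    fi
}

rewrite_interval a 2 4; echo    # prints aaa?a?
```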
* Re: [9fans] "Intervalic
2006-02-22 19:50 ` Bakul Shah
@ 2006-02-22 20:58 ` yard-ape
2006-02-22 21:47 ` Russ Cox
0 siblings, 1 reply; 11+ messages in thread
From: yard-ape @ 2006-02-22 20:58 UTC (permalink / raw)
To: 9fans
Bakul Shah <bakul+plan9@BitBlocks.com> wrote:
> Not as convenient but can't you transform your extended RE
> into basic REs?
>
> RE{0,n} == RE? RE{0,n-1}
> RE{m,n} == RE RE{m-1,n-1}
Sorry Bakul, you've lost me here. These look like awk idioms with unix extended regular expression forms, but I don't really understand them. In any case, even basic REs aren't available in Plan9 awk---right?
-Derek
* Re: [9fans] "Intervalic
2006-02-22 20:58 ` yard-ape
@ 2006-02-22 21:47 ` Russ Cox
2006-02-22 23:36 ` yard-ape
0 siblings, 1 reply; 11+ messages in thread
From: Russ Cox @ 2006-02-22 21:47 UTC (permalink / raw)
To: 9fans
> Sorry Bakul, you've lost me here. These look like awk idioms with unix
> extended regular expression forms, but I don't really understand them.
> In any case, even basic REs aren't available in Plan9 awk---right?
What he meant is that if you have a regular expression of the form
a{0,n} for any a, then you can replace that with a?a{0,n-1}, and
similarly a{m,n} can be replaced with a{m-1,n-1}. This gives you
an algorithm to convert a so-called intervalic regular expression
into a standard Plan 9 regular expression.
And awk does have standard Plan 9 regular expressions.
Russ
* Re: [9fans] "Intervalic
2006-02-22 21:47 ` Russ Cox
@ 2006-02-22 23:36 ` yard-ape
2006-02-22 18:50 ` Russ Cox
2006-02-22 23:42 ` andrey mirtchovski
0 siblings, 2 replies; 11+ messages in thread
From: yard-ape @ 2006-02-22 23:36 UTC (permalink / raw)
To: 9fans
"Russ Cox" <rsc@swtch.com> wrote:
> What he meant is that if you have a regular expression of the form
> a{0,n} for any a, then you can replace that with a?a{0,n-1}, and
> similarly a{m,n} can be replaced with a{m-1,n-1}. This gives you
> an algorithm to convert a so-called intervalic regular expression
> into a standard Plan 9 regular expression.
>
> And awk does have standard Plan 9 regular expressions.
>
> Russ
Huh?
bash$ cat <<EOF | gawk -W re-interval '{gsub( "[lque]{2,3}", "*");print}'
> Some random text
> to hopefully clarify the question
> EOF
Some random text
to hopefu*y clarify the *stion
bash$
...Applying the suggested algorithm as I understand it:
term% cat <<EOF | awk '{gsub( "[lque]?[lque]{1,2}", "*");print}'
Some random text
to hopefully clarify the question
EOF
Some random text
to hopefully clarify the question
term%
-Derek
* Re: [9fans] "Intervalic
2006-02-22 23:36 ` yard-ape
@ 2006-02-22 18:50 ` Russ Cox
2006-02-23 0:05 ` yard-ape
2006-02-22 23:42 ` andrey mirtchovski
1 sibling, 1 reply; 11+ messages in thread
From: Russ Cox @ 2006-02-22 18:50 UTC (permalink / raw)
To: 9fans
Here are the real rules:
- a{0,n} can be replaced by a?a{0,n-1}
- a{m,n} can be replaced by aa{m-1,n-1}
- a{0,0} can be replaced by the empty string
- REPEAT until all the { } are gone
This is just a complicated way of saying you
can replace a{n,m} with n copies of "a" followed
by m-n copies of "a?".
Russ
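The closed form described above (n copies of "a" followed by m-n copies of "a?") can be generated rather than typed by hand. This is a sketch under stated assumptions: a POSIX shell and any awk, with the hypothetical function name `expand_interval`. Because the atom is passed with `awk -v`, backslashes in it would be treated as escape sequences, so the sketch is only meant for atoms without backslashes.

```shell
# expand_interval ATOM MIN MAX
# Hypothetical helper: print ATOM{MIN,MAX} written without braces,
# as MIN copies of ATOM followed by MAX-MIN copies of ATOM?.
expand_interval() {
    awk -v atom="$1" -v min="$2" -v max="$3" 'BEGIN {
        out = ""
        for (i = 0; i < min; i++)      # the mandatory copies
            out = out atom
        for (i = min; i < max; i++)    # the optional copies
            out = out atom "?"
        print out
    }'
}

expand_interval '[lque]' 2 3    # prints [lque][lque][lque]?

# Use the generated expression as a dynamic regular expression in gsub.
re=$(expand_interval '[lque]' 2 3)
printf '%s\n' 'to hopefully clarify the question' |
    awk -v re="$re" '{ gsub(re, "*"); print }'
```

The same helper covers the fixed-width case from the start of the thread: `expand_interval . 24 24` prints 24 dots.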
* Re: [9fans] "Intervalic
2006-02-22 18:50 ` Russ Cox
@ 2006-02-23 0:05 ` yard-ape
0 siblings, 0 replies; 11+ messages in thread
From: yard-ape @ 2006-02-23 0:05 UTC (permalink / raw)
To: 9fans
"Russ Cox" <rsc@swtch.com> wrote:
> Here are the real rules:
>
> - a{0,n} can be replaced by a?a{0,n-1}
> - a{m,n} can be replaced by aa{m-1,n-1}
> - a{0,0} can be replaced by the empty string
> - REPEAT until all the { } are gone
>
> This is just a complicated way of saying you
> can replace a{n,m} with n copies of "a" followed
> by m-n copies of "a?".
>
> Russ
Got it. Thanks for your patience. I'll be quiet now.
-Derek
* Re: [9fans] "Intervalic
2006-02-22 23:36 ` yard-ape
2006-02-22 18:50 ` Russ Cox
@ 2006-02-22 23:42 ` andrey mirtchovski
1 sibling, 0 replies; 11+ messages in thread
From: andrey mirtchovski @ 2006-02-22 23:42 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
> Huh?
http://pages.cpsc.ucalgary.ca/~mirtchov/p9/canthave.png
Thread overview: 11+ messages
2006-02-22 18:33 [9fans] "Intervalic yard-ape
2006-02-22 18:51 ` Russ Cox
2006-02-22 19:20 ` yard-ape
2006-02-23 2:24 ` geoff
2006-02-22 19:50 ` Bakul Shah
2006-02-22 20:58 ` yard-ape
2006-02-22 21:47 ` Russ Cox
2006-02-22 23:36 ` yard-ape
2006-02-22 18:50 ` Russ Cox
2006-02-23 0:05 ` yard-ape
2006-02-22 23:42 ` andrey mirtchovski