Computer Old Farts Forum
 help / color / mirror / Atom feed
* [COFF] Grep has gone overboard
@ 2022-05-30  4:12 Dave Horsfall
  2022-05-30  4:47 ` [COFF] " Bakul Shah
  2022-05-30  7:38 ` Ralph Corderoy
  0 siblings, 2 replies; 11+ messages in thread
From: Dave Horsfall @ 2022-05-30  4:12 UTC (permalink / raw)
  To: Computer Old Farts Followers

I remember when the documentation was easy to read, but now...

I'm trying to find a regex i.e. as used by GREP to find all words in 
"badugly" with each letter used only once and all are 7-letter words i.e. 
they are all anagrams; perhaps it's my age (I hit 70 soon, after decades 
in Unix) but I cannot seem to find a way to restrict the count i.e. each 
letter used only once, but in any order.

I'll guess that it involves some obscure use of "{}*\" etc, which I've 
never really grokked...

Ta muchly.

-- Dave

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [COFF] Re: Grep has gone overboard
  2022-05-30  4:12 [COFF] Grep has gone overboard Dave Horsfall
@ 2022-05-30  4:47 ` Bakul Shah
  2022-05-30  6:35   ` Michael Kjörling
  2022-05-30  8:16   ` Ralph Corderoy
  2022-05-30  7:38 ` Ralph Corderoy
  1 sibling, 2 replies; 11+ messages in thread
From: Bakul Shah @ 2022-05-30  4:47 UTC (permalink / raw)
  To: Computer Old Farts Followers

You can write a program to generate all permutations and use that as your regexp.
For example abc maps to abc|acb|bac|bca|cab|cba. You can rearrange it as
a(bc|cb)|b(ac|ca)|c(ab|ba) to see how an n letter permutation is computed from
permutations of n-1 letters. I don’t think you can do better.

> On May 29, 2022, at 9:12 PM, Dave Horsfall <dave@horsfall.org> wrote:
> 
> I remember when the documentation was easy to read, but now...
> 
> I'm trying to find a regex i.e. as used by GREP to find all words in 
> "badugly" with each letter used only once and all are 7-letter words i.e. 
> they are all anagrams; perhaps it's my age (I hit 70 soon, after decades 
> in Unix) but I cannot seem to find a way to restrict the count i.e. each 
> letter used only once, but in any order.
> 
> I'll guess that it involves some obscure use of "{}*\" etc, which I've 
> never really grokked...
> 
> Ta muchly.
> 
> -- Dave

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [COFF] Re: Grep has gone overboard
  2022-05-30  4:47 ` [COFF] " Bakul Shah
@ 2022-05-30  6:35   ` Michael Kjörling
  2022-05-30  6:51     ` Bakul Shah
  2022-05-30  8:01     ` Ralph Corderoy
  2022-05-30  8:16   ` Ralph Corderoy
  1 sibling, 2 replies; 11+ messages in thread
From: Michael Kjörling @ 2022-05-30  6:35 UTC (permalink / raw)
  To: coff

On 29 May 2022 21:47 -0700, from bakul@iitbombay.org (Bakul Shah):
> You can write a program to generate all permutations and use that as your regexp.
> For example abc maps to abc|acb|bac|bca|cab|cba. You can rearrange it as
> a(bc|cb)|b(ac|ca)|c(ab|ba) to see how an n letter permutation is computed from
> permutations of n-1 letters. I don’t think you can do better.

I was trying to do something very similar using procmail regexes some
time ago, only with strings instead of individual characters (for
parsing and matching on arbitrarily-ordered, comma-separated values
within a header). I never did come up with a good solution.

-- 
Michael Kjörling • https://michael.kjorling.se • michael@kjorling.se
 “Remember when, on the Internet, nobody cared that you were a dog?”


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [COFF] Re: Grep has gone overboard
  2022-05-30  6:35   ` Michael Kjörling
@ 2022-05-30  6:51     ` Bakul Shah
  2022-05-30  8:26       ` Ralph Corderoy
  2022-05-30  8:01     ` Ralph Corderoy
  1 sibling, 1 reply; 11+ messages in thread
From: Bakul Shah @ 2022-05-30  6:51 UTC (permalink / raw)
  To: Michael Kjörling; +Cc: coff

[-- Attachment #1: Type: text/plain, Size: 1181 bytes --]

Forgot to add, you can solve this another way  (taking the abc example):
  grep ‘^...$’ | grep ‘[^a]*a[^a]*’ | grep ‘[^b]*b[^b]*’ | grep ‘[^c]*c[^c]*’
This will scale better for large N and easier to grok and probably what
Dave had in mind!

> On May 29, 2022, at 11:35 PM, Michael Kjörling <michael@kjorling.se> wrote:
> 
> On 29 May 2022 21:47 -0700, from bakul@iitbombay.org (Bakul Shah):
>> You can write a program to generate all permutations and use that as your regexp.
>> For example abc maps to abc|acb|bac|bca|cab|cba. You can rearrange it as
>> a(bc|cb)|b(ac|ca)|c(ab|ba) to see how an n letter permutation is computed from
>> permutations of n-1 letters. I don’t think you can do better.
> 
> I was trying to do something very similar using procmail regexes some
> time ago, only with strings instead of individual characters (for
> parsing and matching on arbitrarily-ordered, comma-separated values
> within a header). I never did come up with a good solution.
> 
> -- 
> Michael Kjörling • https://michael.kjorling.se • michael@kjorling.se
> “Remember when, on the Internet, nobody cared that you were a dog?”
> 

[-- Attachment #2: Type: text/html, Size: 2163 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [COFF] Re: Grep has gone overboard
  2022-05-30  4:12 [COFF] Grep has gone overboard Dave Horsfall
  2022-05-30  4:47 ` [COFF] " Bakul Shah
@ 2022-05-30  7:38 ` Ralph Corderoy
  2022-05-30  7:44   ` Ralph Corderoy
  1 sibling, 1 reply; 11+ messages in thread
From: Ralph Corderoy @ 2022-05-30  7:38 UTC (permalink / raw)
  To: Computer Old Farts Followers

Hi Dave,

> I'm trying to find a regex i.e. as used by GREP to find all words in 
> "badugly" with each letter used only once and all are 7-letter words i.e. 
> they are all anagrams

I tend to use egrep(1), as that's what my ~/bin/g runs, but sticking
with grep and its Basic Regular Expressions, as asked, I'd use

    $ grep -v '[^badugly]' /usr/share/dict/words |
    > grep '^.\{7\}$' |
    > grep -v '\(.\).*\1'
    ladybug
    $

Note that uses my locale.

Section 9 of POSIX has the detail, https://pubs.opengroup.org/onlinepubs/9699919799/.

> I'll guess that it involves some obscure use of "{}*\" etc, which I've
> never really grokked...

If you want it explained on the list, let me know.

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [COFF] Re: Grep has gone overboard
  2022-05-30  7:38 ` Ralph Corderoy
@ 2022-05-30  7:44   ` Ralph Corderoy
  0 siblings, 0 replies; 11+ messages in thread
From: Ralph Corderoy @ 2022-05-30  7:44 UTC (permalink / raw)
  To: Computer Old Farts Followers

Hi again Dave,

I wrote:
> Section 9 of POSIX has the detail,
> https://pubs.opengroup.org/onlinepubs/9699919799/.

Sorry, they use frames.  Still.  From that URL you need ‘Base
Definitions’ in the top left, which I think is the default, and then ‘9.
Regular Expressions’ in the bottom left.  (This is as bad as reading one
GUI uesr explain to the other where to click, click, click.)

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [COFF] Re: Grep has gone overboard
  2022-05-30  6:35   ` Michael Kjörling
  2022-05-30  6:51     ` Bakul Shah
@ 2022-05-30  8:01     ` Ralph Corderoy
  1 sibling, 0 replies; 11+ messages in thread
From: Ralph Corderoy @ 2022-05-30  8:01 UTC (permalink / raw)
  To: coff

Hi Michael,

> I was trying to do something very similar using procmail regexes some
> time ago, only with strings instead of individual characters (for
> parsing and matching on arbitrarily-ordered, comma-separated values
> within a header).  I never did come up with a good solution.

Assuming procmail has BRE, this does one interpretation of what you
said.

    $ grep '^ *field *: *\(foo\|bar\|xyzzy\)\( *, *\(foo\|bar\|xyzzy\)\)* *$' <<\E
    > otherfield: foo
    > field:
    > field: ,
    > field:foo
    > field : foo
    > field:xyzzy
    > field: xyzzy,bar, foo , xyzzy
    > field: xyzzy,bar, foo , xyzzy ,
    > E
    field:foo
    field : foo
    field:xyzzy
    field: xyzzy,bar, foo , xyzzy
    $

If I was doing it in sed, I'd s/$/,/ before and s/,$// after so the list
of alternatives need only be given once with each alternative having to
be followed by a comma, i.e. the comma switches from separator to
terminator.

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [COFF] Re: Grep has gone overboard
  2022-05-30  4:47 ` [COFF] " Bakul Shah
  2022-05-30  6:35   ` Michael Kjörling
@ 2022-05-30  8:16   ` Ralph Corderoy
  2022-05-30  8:43     ` Bakul Shah
  1 sibling, 1 reply; 11+ messages in thread
From: Ralph Corderoy @ 2022-05-30  8:16 UTC (permalink / raw)
  To: Computer Old Farts Followers

Hi Bakul,

> You can write a program to generate all permutations and use that as
> your regexp.  For example abc maps to abc|acb|bac|bca|cab|cba.

Being lazy, I tend to use a shell's brace expansion and then whittle it
down to generate permutations.  I keep thinking it's surprising there's
no Bell Labs program to produce them given there's things like
factor(1).

The brace expansion is wasteful as it's going from n**l down to n! and
it can result in the old bash here not responding to SIGINT for a while
if it's producing gazillions.

    $ printf '%s\n' {b,a,d,u,g,l,y}{b,a,d,u,g,l,y}{b,a,d,u,g,l,y}\
    > {b,a,d,u,g,l,y}{b,a,d,u,g,l,y}{b,a,d,u,g,l,y}{b,a,d,u,g,l,y} |
    > egrep -v '(.).*\1' |
    > fgrep -x -f - /usr/share/dict/words
    ladybug
    $

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [COFF] Re: Grep has gone overboard
  2022-05-30  6:51     ` Bakul Shah
@ 2022-05-30  8:26       ` Ralph Corderoy
  0 siblings, 0 replies; 11+ messages in thread
From: Ralph Corderoy @ 2022-05-30  8:26 UTC (permalink / raw)
  To: coff

Hi Bakul,

> Forgot to add, you can solve this another way  (taking the abc
> example):
>   grep ‘^...$’ | grep ‘[^a]*a[^a]*’ | grep ‘[^b]*b[^b]*’ | grep ‘[^c]*c[^c]*’

I know you know this and the error is just from typing an email, but in
case other listeners are following along, all the greps need start and
end-of-line anchors.

Without them,

    grep '[^a]*a[^a]*'

still matches ‘aa’ because there are zero non-a before the first a, then
the first a matches, and there are zero non-a after it.  Here's ‘aa’
with indexes between the characters.

     a a  
    0 1 2

    [^a]*  matches  0-0
    a      matches  0-1
    [^a]*  matches  1-1

And it's easier to invert the test into

    grep -v 'a.*a'

I'll stop waffling on about regexps now.  Though I do have a SNOBOL4
question for TUHS which I should get around to writing.  :-)

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [COFF] Re: Grep has gone overboard
  2022-05-30  8:16   ` Ralph Corderoy
@ 2022-05-30  8:43     ` Bakul Shah
  2022-05-30  9:19       ` Ralph Corderoy
  0 siblings, 1 reply; 11+ messages in thread
From: Bakul Shah @ 2022-05-30  8:43 UTC (permalink / raw)
  To: Ralph Corderoy; +Cc: Computer Old Farts Followers



> On May 30, 2022, at 1:16 AM, Ralph Corderoy <ralph@inputplus.co.uk> wrote:
> 
> Hi Bakul,
> 
>> You can write a program to generate all permutations and use that as
>> your regexp.  For example abc maps to abc|acb|bac|bca|cab|cba.
> 
> Being lazy, I tend to use a shell's brace expansion and then whittle it
> down to generate permutations.  I keep thinking it's surprising there's
> no Bell Labs program to produce them given there's things like
> factor(1).

For permutations I just use this k program:

$ cat perm.k
p:{:[1<x;,/(>:'(x,x)#1,x#0)[;0,'1+_f x-1];,!x]}
perm:{x@p[#x]}
$ k perm.k
  perm "abc"
("abc"
 "acb"
 "bac"
 "bca"
 "cab"
 "cba")
I won’t bother explaining but p n returns all permutations of
indices 0..n-1. Then perm is easy to construct. Similarly easy to
insert | between strings but your solution is much more in the spirit
of regexp and faster than my brute force solution!

Also, thanks for catching the anchoring error!

> 
> The brace expansion is wasteful as it's going from n**l down to n! and
> it can result in the old bash here not responding to SIGINT for a while
> if it's producing gazillions.
> 
>    $ printf '%s\n' {b,a,d,u,g,l,y}{b,a,d,u,g,l,y}{b,a,d,u,g,l,y}\
>> {b,a,d,u,g,l,y}{b,a,d,u,g,l,y}{b,a,d,u,g,l,y}{b,a,d,u,g,l,y} |
>> egrep -v '(.).*\1' |
>> fgrep -x -f - /usr/share/dict/words
>    ladybug
>    $
> 
> -- 
> Cheers, Ralph.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [COFF] Re: Grep has gone overboard
  2022-05-30  8:43     ` Bakul Shah
@ 2022-05-30  9:19       ` Ralph Corderoy
  0 siblings, 0 replies; 11+ messages in thread
From: Ralph Corderoy @ 2022-05-30  9:19 UTC (permalink / raw)
  To: Computer Old Farts Followers

Hi Bakul,

> For permutations I just use this k program:
>
> $ cat perm.k
> p:{:[1<x;,/(>:'(x,x)#1,x#0)[;0,'1+_f x-1];,!x]}
> perm:{x@p[#x]}

I was thinking it looks like APL and it turns out to be a descendant.
https://en.wikipedia.org/wiki/K_(programming_language)

That pushed me into writing an awk-permutation generator.  It's
inefficient as it keeps altering $0 which cascades into re-working $1,
etc., even if that's lazily done.  Still, it sufficies for the small
cases I typically need.

    $ cat perm
    #! /bin/awk -f

    function perm(s,   f, l, t) {
        if (!NF) {
            print substr(s, 2)
            return
        }
        l = $0
        for (f = 1; f <= NF; f++) {
            t = $f
            $f = ""; $0 = $0
            perm(s " " t)
            $0 = l
        }
    }

    { perm("") }
    $ 
    $ ./perm <<<''

    $ ./perm <<<1
    1
    $ ./perm <<<'1 2'
    1 2
    2 1
    $ ./perm <<<'1 2 3'
    1 2 3
    1 3 2
    2 1 3
    2 3 1
    3 1 2
    3 2 1
    $ 
    $ time ./perm <<<'1 2 3 4 5 6 7 8 9' | wc -l
    362880

    real    0m5.421s
    user    0m5.406s
    sys     0m0.082s
    $
    $ dc <<<'9 8*7*6*5*4*3*2*p'
    362880
    $

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-05-30  9:19 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-30  4:12 [COFF] Grep has gone overboard Dave Horsfall
2022-05-30  4:47 ` [COFF] " Bakul Shah
2022-05-30  6:35   ` Michael Kjörling
2022-05-30  6:51     ` Bakul Shah
2022-05-30  8:26       ` Ralph Corderoy
2022-05-30  8:01     ` Ralph Corderoy
2022-05-30  8:16   ` Ralph Corderoy
2022-05-30  8:43     ` Bakul Shah
2022-05-30  9:19       ` Ralph Corderoy
2022-05-30  7:38 ` Ralph Corderoy
2022-05-30  7:44   ` Ralph Corderoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).