* [COFF] Grep has gone overboard @ 2022-05-30 4:12 Dave Horsfall 2022-05-30 4:47 ` [COFF] " Bakul Shah 2022-05-30 7:38 ` Ralph Corderoy 0 siblings, 2 replies; 11+ messages in thread From: Dave Horsfall @ 2022-05-30 4:12 UTC (permalink / raw) To: Computer Old Farts Followers I remember when the documentation was easy to read, but now... I'm trying to find a regex i.e. as used by GREP to find all words in "badugly" with each letter used only once and all are 7-letter words i.e. they are all anagrams; perhaps it's my age (I hit 70 soon, after decades in Unix) but I cannot seem to find a way to restrict the count i.e. each letter used only once, but in any order. I'll guess that it involves some obscure use of "{}*\" etc, which I've never really grokked... Ta muchly. -- Dave ^ permalink raw reply [flat|nested] 11+ messages in thread
* [COFF] Re: Grep has gone overboard 2022-05-30 4:12 [COFF] Grep has gone overboard Dave Horsfall @ 2022-05-30 4:47 ` Bakul Shah 2022-05-30 6:35 ` Michael Kjörling 2022-05-30 8:16 ` Ralph Corderoy 2022-05-30 7:38 ` Ralph Corderoy 1 sibling, 2 replies; 11+ messages in thread From: Bakul Shah @ 2022-05-30 4:47 UTC (permalink / raw) To: Computer Old Farts Followers You can write a program to generate all permutations and use that as your regexp. For example abc maps to abc|acb|bac|bca|cab|cba. You can rearrange it as a(bc|cb)|b(ac|ca)|c(ab|ba) to see how an n letter permutation is computed from permutations of n-1 letters. I don’t think you can do better. > On May 29, 2022, at 9:12 PM, Dave Horsfall <dave@horsfall.org> wrote: > > I remember when the documentation was easy to read, but now... > > I'm trying to find a regex i.e. as used by GREP to find all words in > "badugly" with each letter used only once and all are 7-letter words i.e. > they are all anagrams; perhaps it's my age (I hit 70 soon, after decades > in Unix) but I cannot seem to find a way to restrict the count i.e. each > letter used only once, but in any order. > > I'll guess that it involves some obscure use of "{}*\" etc, which I've > never really grokked... > > Ta muchly. > > -- Dave ^ permalink raw reply [flat|nested] 11+ messages in thread
* [COFF] Re: Grep has gone overboard 2022-05-30 4:47 ` [COFF] " Bakul Shah @ 2022-05-30 6:35 ` Michael Kjörling 2022-05-30 6:51 ` Bakul Shah 2022-05-30 8:01 ` Ralph Corderoy 2022-05-30 8:16 ` Ralph Corderoy 1 sibling, 2 replies; 11+ messages in thread From: Michael Kjörling @ 2022-05-30 6:35 UTC (permalink / raw) To: coff On 29 May 2022 21:47 -0700, from bakul@iitbombay.org (Bakul Shah): > You can write a program to generate all permutations and use that as your regexp. > For example abc maps to abc|acb|bac|bca|cab|cba. You can rearrange it as > a(bc|cb)|b(ac|ca)|c(ab|ba) to see how an n letter permutation is computed from > permutations of n-1 letters. I don’t think you can do better. I was trying to do something very similar using procmail regexes some time ago, only with strings instead of individual characters (for parsing and matching on arbitrarily-ordered, comma-separated values within a header). I never did come up with a good solution. -- Michael Kjörling • https://michael.kjorling.se • michael@kjorling.se “Remember when, on the Internet, nobody cared that you were a dog?” ^ permalink raw reply [flat|nested] 11+ messages in thread
* [COFF] Re: Grep has gone overboard 2022-05-30 6:35 ` Michael Kjörling @ 2022-05-30 6:51 ` Bakul Shah 2022-05-30 8:26 ` Ralph Corderoy 2022-05-30 8:01 ` Ralph Corderoy 1 sibling, 1 reply; 11+ messages in thread From: Bakul Shah @ 2022-05-30 6:51 UTC (permalink / raw) To: Michael Kjörling; +Cc: coff [-- Attachment #1: Type: text/plain, Size: 1181 bytes --] Forgot to add, you can solve this another way (taking the abc example): grep ‘^...$’ | grep ‘[^a]*a[^a]*’ | grep ‘[^b]*b[^b]*’ | grep ‘[^c]*c[^c]*’ This will scale better for large N and easier to grok and probably what Dave had in mind! > On May 29, 2022, at 11:35 PM, Michael Kjörling <michael@kjorling.se> wrote: > > On 29 May 2022 21:47 -0700, from bakul@iitbombay.org (Bakul Shah): >> You can write a program to generate all permutations and use that as your regexp. >> For example abc maps to abc|acb|bac|bca|cab|cba. You can rearrange it as >> a(bc|cb)|b(ac|ca)|c(ab|ba) to see how an n letter permutation is computed from >> permutations of n-1 letters. I don’t think you can do better. > > I was trying to do something very similar using procmail regexes some > time ago, only with strings instead of individual characters (for > parsing and matching on arbitrarily-ordered, comma-separated values > within a header). I never did come up with a good solution. > > -- > Michael Kjörling • https://michael.kjorling.se • michael@kjorling.se > “Remember when, on the Internet, nobody cared that you were a dog?” > [-- Attachment #2: Type: text/html, Size: 2163 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* [COFF] Re: Grep has gone overboard 2022-05-30 6:51 ` Bakul Shah @ 2022-05-30 8:26 ` Ralph Corderoy 0 siblings, 0 replies; 11+ messages in thread From: Ralph Corderoy @ 2022-05-30 8:26 UTC (permalink / raw) To: coff Hi Bakul, > Forgot to add, you can solve this another way (taking the abc > example): > grep ‘^...$’ | grep ‘[^a]*a[^a]*’ | grep ‘[^b]*b[^b]*’ | grep ‘[^c]*c[^c]*’ I know you know this and the error is just from typing an email, but in case other listeners are following along, all the greps need start and end-of-line anchors. Without them, grep '[^a]*a[^a]*' still matches ‘aa’ because there are zero non-a before the first a, then the first a matches, and there are zero non-a after it. Here's ‘aa’ with indexes between the characters. a a 0 1 2 [^a]* matches 0-0 a matches 0-1 [^a]* matches 1-1 And it's easier to invert the test into grep -v 'a.*a' I'll stop waffling on about regexps now. Though I do have a SNOBOL4 question for TUHS which I should get around to writing. :-) -- Cheers, Ralph. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [COFF] Re: Grep has gone overboard 2022-05-30 6:35 ` Michael Kjörling 2022-05-30 6:51 ` Bakul Shah @ 2022-05-30 8:01 ` Ralph Corderoy 1 sibling, 0 replies; 11+ messages in thread From: Ralph Corderoy @ 2022-05-30 8:01 UTC (permalink / raw) To: coff Hi Michael, > I was trying to do something very similar using procmail regexes some > time ago, only with strings instead of individual characters (for > parsing and matching on arbitrarily-ordered, comma-separated values > within a header). I never did come up with a good solution. Assuming procmail has BRE, this does one interpretation of what you said. $ grep '^ *field *: *\(foo\|bar\|xyzzy\)\( *, *\(foo\|bar\|xyzzy\)\)* *$' <<\E > otherfield: foo > field: > field: , > field:foo > field : foo > field:xyzzy > field: xyzzy,bar, foo , xyzzy > field: xyzzy,bar, foo , xyzzy , > E field:foo field : foo field:xyzzy field: xyzzy,bar, foo , xyzzy $ If I was doing it in sed, I'd s/$/,/ before and s/,$// after so the list of alternatives need only be given once with each alternative having to be followed by a comma, i.e. the comma switches from separator to terminator. -- Cheers, Ralph. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [COFF] Re: Grep has gone overboard 2022-05-30 4:47 ` [COFF] " Bakul Shah 2022-05-30 6:35 ` Michael Kjörling @ 2022-05-30 8:16 ` Ralph Corderoy 2022-05-30 8:43 ` Bakul Shah 1 sibling, 1 reply; 11+ messages in thread From: Ralph Corderoy @ 2022-05-30 8:16 UTC (permalink / raw) To: Computer Old Farts Followers Hi Bakul, > You can write a program to generate all permutations and use that as > your regexp. For example abc maps to abc|acb|bac|bca|cab|cba. Being lazy, I tend to use a shell's brace expansion and then whittle it down to generate permutations. I keep thinking it's surprising there's no Bell Labs program to produce them given there's things like factor(1). The brace expansion is wasteful as it's going from n**l down to n! and it can result in the old bash here not responding to SIGINT for a while if it's producing gazillions. $ printf '%s\n' {b,a,d,u,g,l,y}{b,a,d,u,g,l,y}{b,a,d,u,g,l,y}\ > {b,a,d,u,g,l,y}{b,a,d,u,g,l,y}{b,a,d,u,g,l,y}{b,a,d,u,g,l,y} | > egrep -v '(.).*\1' | > fgrep -x -f - /usr/share/dict/words ladybug $ -- Cheers, Ralph. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [COFF] Re: Grep has gone overboard 2022-05-30 8:16 ` Ralph Corderoy @ 2022-05-30 8:43 ` Bakul Shah 2022-05-30 9:19 ` Ralph Corderoy 0 siblings, 1 reply; 11+ messages in thread From: Bakul Shah @ 2022-05-30 8:43 UTC (permalink / raw) To: Ralph Corderoy; +Cc: Computer Old Farts Followers > On May 30, 2022, at 1:16 AM, Ralph Corderoy <ralph@inputplus.co.uk> wrote: > > Hi Bakul, > >> You can write a program to generate all permutations and use that as >> your regexp. For example abc maps to abc|acb|bac|bca|cab|cba. > > Being lazy, I tend to use a shell's brace expansion and then whittle it > down to generate permutations. I keep thinking it's surprising there's > no Bell Labs program to produce them given there's things like > factor(1). For permutations I just use this k program: $ cat perm.k p:{:[1<x;,/(>:'(x,x)#1,x#0)[;0,'1+_f x-1];,!x]} perm:{x@p[#x]} $ k perm.k perm "abc" ("abc" "acb" "bac" "bca" "cab" "cba") I won’t bother explaining but p n returns all permutations of indices 0..n-1. Then perm is easy to construct. Similarly easy to insert | between strings but your solution is much more in the spirit of regexp and faster than my brute force solution! Also, thanks for catching the anchoring error! > > The brace expansion is wasteful as it's going from n**l down to n! and > it can result in the old bash here not responding to SIGINT for a while > if it's producing gazillions. > > $ printf '%s\n' {b,a,d,u,g,l,y}{b,a,d,u,g,l,y}{b,a,d,u,g,l,y}\ >> {b,a,d,u,g,l,y}{b,a,d,u,g,l,y}{b,a,d,u,g,l,y}{b,a,d,u,g,l,y} | >> egrep -v '(.).*\1' | >> fgrep -x -f - /usr/share/dict/words > ladybug > $ > > -- > Cheers, Ralph. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [COFF] Re: Grep has gone overboard 2022-05-30 8:43 ` Bakul Shah @ 2022-05-30 9:19 ` Ralph Corderoy 0 siblings, 0 replies; 11+ messages in thread From: Ralph Corderoy @ 2022-05-30 9:19 UTC (permalink / raw) To: Computer Old Farts Followers Hi Bakul, > For permutations I just use this k program: > > $ cat perm.k > p:{:[1<x;,/(>:'(x,x)#1,x#0)[;0,'1+_f x-1];,!x]} > perm:{x@p[#x]} I was thinking it looks like APL and it turns out to be a descendant. https://en.wikipedia.org/wiki/K_(programming_language) That pushed me into writing an awk-permutation generator. It's inefficient as it keeps altering $0 which cascades into re-working $1, etc., even if that's lazily done. Still, it sufficies for the small cases I typically need. $ cat perm #! /bin/awk -f function perm(s, f, l, t) { if (!NF) { print substr(s, 2) return } l = $0 for (f = 1; f <= NF; f++) { t = $f $f = ""; $0 = $0 perm(s " " t) $0 = l } } { perm("") } $ $ ./perm <<<'' $ ./perm <<<1 1 $ ./perm <<<'1 2' 1 2 2 1 $ ./perm <<<'1 2 3' 1 2 3 1 3 2 2 1 3 2 3 1 3 1 2 3 2 1 $ $ time ./perm <<<'1 2 3 4 5 6 7 8 9' | wc -l 362880 real 0m5.421s user 0m5.406s sys 0m0.082s $ $ dc <<<'9 8*7*6*5*4*3*2*p' 362880 $ -- Cheers, Ralph. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [COFF] Re: Grep has gone overboard 2022-05-30 4:12 [COFF] Grep has gone overboard Dave Horsfall 2022-05-30 4:47 ` [COFF] " Bakul Shah @ 2022-05-30 7:38 ` Ralph Corderoy 2022-05-30 7:44 ` Ralph Corderoy 1 sibling, 1 reply; 11+ messages in thread From: Ralph Corderoy @ 2022-05-30 7:38 UTC (permalink / raw) To: Computer Old Farts Followers Hi Dave, > I'm trying to find a regex i.e. as used by GREP to find all words in > "badugly" with each letter used only once and all are 7-letter words i.e. > they are all anagrams I tend to use egrep(1), as that's what my ~/bin/g runs, but sticking with grep and its Basic Regular Expressions, as asked, I'd use $ grep -v '[^badugly]' /usr/share/dict/words | > grep '^.\{7\}$' | > grep -v '\(.\).*\1' ladybug $ Note that uses my locale. Section 9 of POSIX has the detail, https://pubs.opengroup.org/onlinepubs/9699919799/. > I'll guess that it involves some obscure use of "{}*\" etc, which I've > never really grokked... If you want it explained on the list, let me know. -- Cheers, Ralph. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [COFF] Re: Grep has gone overboard 2022-05-30 7:38 ` Ralph Corderoy @ 2022-05-30 7:44 ` Ralph Corderoy 0 siblings, 0 replies; 11+ messages in thread From: Ralph Corderoy @ 2022-05-30 7:44 UTC (permalink / raw) To: Computer Old Farts Followers Hi again Dave, I wrote: > Section 9 of POSIX has the detail, > https://pubs.opengroup.org/onlinepubs/9699919799/. Sorry, they use frames. Still. From that URL you need ‘Base Definitions’ in the top left, which I think is the default, and then ‘9. Regular Expressions’ in the bottom left. (This is as bad as reading one GUI uesr explain to the other where to click, click, click.) -- Cheers, Ralph. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2022-05-30 9:19 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-05-30 4:12 [COFF] Grep has gone overboard Dave Horsfall 2022-05-30 4:47 ` [COFF] " Bakul Shah 2022-05-30 6:35 ` Michael Kjörling 2022-05-30 6:51 ` Bakul Shah 2022-05-30 8:26 ` Ralph Corderoy 2022-05-30 8:01 ` Ralph Corderoy 2022-05-30 8:16 ` Ralph Corderoy 2022-05-30 8:43 ` Bakul Shah 2022-05-30 9:19 ` Ralph Corderoy 2022-05-30 7:38 ` Ralph Corderoy 2022-05-30 7:44 ` Ralph Corderoy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).