9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] empty *
@ 2012-06-14  8:28 Gorka Guardiola
  2012-06-14  9:32 ` Peter A. Cejchan
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Gorka Guardiola @ 2012-06-14  8:28 UTC
  To: Fans of the OS Plan 9 from Bell Labs

While playing with grep, I was surprised by grep '*\.c' not giving
an error (* is missing an operand). Arguably * applied to empty
can match empty, but surprisingly enough, Acme's Edit behaves
differently. And even grep is not consistent (grep '*' is different from
grep '', whereas either both should be an empty pattern or the first one
should be an error). Another funny one is that Edit gives back
an error complaining of a missing operand to * when the regexp is
empty.

Greps from other systems accept an empty pattern
(and are thus consistent, but they would not have
caught the error that started all this).


cpu% echo hola | grep '*a'
hola
cpu%  echo hola | grep '*'
grep: *: syntax error
cpu%  echo hola | grep ''
grep: empty pattern

Edit , s/*//
regexp: missing operand for *
Edit: bad regexp in s command

Edit , s/*c//
regexp: missing operand for *
Edit: bad regexp in s command

Edit , s///

regexp: missing operand for *
Edit: bad regexp in s command

G.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [9fans] empty *
  2012-06-14  8:28 [9fans] empty * Gorka Guardiola
@ 2012-06-14  9:32 ` Peter A. Cejchan
  2012-06-14  9:54   ` Gorka Guardiola
  2012-06-14  9:42 ` Anthony Martin
       [not found] ` <CAM6ozu51kzDLxB0KoT96udomrKyHWuHvwqSum6=aXmonpr8kXQ@mail.gmail.c>
  2 siblings, 1 reply; 17+ messages in thread
From: Peter A. Cejchan @ 2012-06-14  9:32 UTC
  To: Fans of the OS Plan 9 from Bell Labs

This is from the manpage, but I am not sure what _exactly_ it means, and whether
it applies to your problem:
          Care should be taken when using the shell metacharacters
          $*[^|()=\ and newline in pattern; it is safest to enclose
          the entire expression in single quotes '...'.  An expression
          starting with '*' will treat the rest of the expression as
          literal characters.

more strange behavior:
% echo foo.c | 9 grep '*\.c'
%
% echo foo.c | 9 grep '*.c'
foo.c
% echo fooxc | 9 grep '*.c'
%
% echo fooxc | 9 grep '.*.c'
fooxc
% echo fooxc | 9 grep '.*\.c'
%
% echo foo.c | 9 grep '.*\.c'
foo.c
% echo foo.c | 9 grep '*foo.c'
foo.c
% echo foo.c | 9 grep '*.00.c'
%

Looks like
          " An expression
          starting with '*' will treat the rest of the expression as
          literal characters."
(see above) really applies (for unknown reasons).


However, I am just a 'toy programmer', so you were warned ;-)
Regards,
++pac


On Thu, Jun 14, 2012 at 10:28 AM, Gorka Guardiola <paurea@gmail.com> wrote:

> While playing with grep, I was surprised by grep '*\.c' not giving
> an error (* is missing an operand). Arguably * applied to empty ... [snip]
>


* Re: [9fans] empty *
  2012-06-14  8:28 [9fans] empty * Gorka Guardiola
  2012-06-14  9:32 ` Peter A. Cejchan
@ 2012-06-14  9:42 ` Anthony Martin
  2012-06-14 13:29   ` erik quanstrom
       [not found] ` <CAM6ozu51kzDLxB0KoT96udomrKyHWuHvwqSum6=aXmonpr8kXQ@mail.gmail.c>
  2 siblings, 1 reply; 17+ messages in thread
From: Anthony Martin @ 2012-06-14  9:42 UTC
  To: Fans of the OS Plan 9 from Bell Labs

% sed -n 20,32p /sys/src/cmd/grep/grep.y
prog:	/* empty */
	{
		yyerror("empty pattern");
	}
|	expr newlines
	{
		$$.beg = ral(Tend);
		$$.end = $$.beg;
		$$ = re2cat(re2star(re2or(re2char(0x00, '\n'-1), re2char('\n'+1, 0xff))), $$);
		$$ = re2cat($1, $$);
		$$ = re2cat(re2star(re2char(0x00, 0xff)), $$);
		topre = $$;
	}
%

The above code sets up the initial state
machine including the pattern passed on
the command line, $1.

This combined with the fact that multiple
"stars" are coalesced causes the weirdness
you're seeing.

  Anthony




* Re: [9fans] empty *
  2012-06-14  9:32 ` Peter A. Cejchan
@ 2012-06-14  9:54   ` Gorka Guardiola
  0 siblings, 0 replies; 17+ messages in thread
From: Gorka Guardiola @ 2012-06-14  9:54 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Jun 14, 2012 at 11:32 AM, Peter A. Cejchan <tyapca7@gmail.com> wrote:
> This is from the manpage, but I am not sure what _exactly_ it means, and whether it
> applies to your problem:
>           Care should be taken when using the shell metacharacters
>           $*[^|()=\ and newline in pattern; it is safest to enclose
>           the entire expression in single quotes '...'.  An expression
>           starting with '*' will treat the rest of the expression as
>           literal characters.
>

Everything is enclosed in '', so the shell is not seeing this.

G.




* Re: [9fans] empty *
       [not found] ` <CAM6ozu51kzDLxB0KoT96udomrKyHWuHvwqSum6=aXmonpr8kXQ@mail.gmail.c>
@ 2012-06-14 13:28   ` erik quanstrom
  0 siblings, 0 replies; 17+ messages in thread
From: erik quanstrom @ 2012-06-14 13:28 UTC
  To: 9fans

> % echo foo.c | 9 grep '*\.c'

correct.  match \.c as a literal string.  there is no match.

> % echo foo.c | 9 grep '*.c'
> foo.c

correct.  match .c as a literal string.  there is a match.

> % echo fooxc | 9 grep '*.c'
> %
> % echo fooxc | 9 grep '.*.c'
> fooxc

correct.  match 0-n of any character, then any 1 character, then a c.  there is a match.

> % echo fooxc | 9 grep '.*\.c'

correct.  this time there's no match because '.' is treated as a literal not
a pattern.

> % echo foo.c | 9 grep '.*\.c'
> foo.c

correct.  match 0-n any characters, then a literal '.' then literal 'c'.  there is a match.

> % echo foo.c | 9 grep '*foo.c'
> foo.c

correct.  match the literal string foo.c.  there is a match.

remember that the match doesn't have to be anchored by default, so i sometimes
do this
	grep $somesym `{find /sys/src|grep '\.[chys]$'}
this is also packaged up in the local version of 'g'; this would be equivalent
	g $somesym /sys/src

- erik




* Re: [9fans] empty *
  2012-06-14  9:42 ` Anthony Martin
@ 2012-06-14 13:29   ` erik quanstrom
  0 siblings, 0 replies; 17+ messages in thread
From: erik quanstrom @ 2012-06-14 13:29 UTC
  To: 9fans

> This combined with the fact that multiple
> "stars" are coalesced causes the weirdness
> you're seeing.

there is no case of multiple '*'s in the patterns peter gave.
there is a case of patterns beginning with '*' which treats the
rest of the pattern as a literal, but that's different.

- erik




* Re: [9fans] empty *
  2012-06-14 14:49             ` erik quanstrom
@ 2012-06-14 15:08               ` tlaronde
  0 siblings, 0 replies; 17+ messages in thread
From: tlaronde @ 2012-06-14 15:08 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Jun 14, 2012 at 10:49:51AM -0400, erik quanstrom wrote:
> >
> > > cpu%  echo hola | grep ''
> > > grep: empty pattern
> >
> > From the POSIX description, an empty pattern is not allowed. '*' is not
> > an empty pattern.
>
> i'm sorry this just isn't correct.  see the man page.
>
> plan 9 grep has no intentions of being posix.  grep '*' can be seen as a literal
> escape plus the pattern of ''.  this is an empty pattern, thus an error.

In this case, shouldn't the error message be exactly the same?

--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C




* Re: [9fans] empty *
  2012-06-14 14:46           ` erik quanstrom
@ 2012-06-14 14:50             ` Gorka Guardiola
  0 siblings, 0 replies; 17+ messages in thread
From: Gorka Guardiola @ 2012-06-14 14:50 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Jun 14, 2012 at 4:46 PM, erik quanstrom
<quanstro@labs.coraid.com> wrote:
>> cpu%  echo hola | grep '*'
>> grep: *: syntax error
>> cpu%  echo hola | grep ''
>> grep: empty pattern
>>
>>
>> grep '*' and grep '' should still be the same, shouldn't they?
>
> yes, but does it matter?

Probably not.

G.




* Re: [9fans] empty *
  2012-06-14 14:36           ` tlaronde
@ 2012-06-14 14:49             ` erik quanstrom
  2012-06-14 15:08               ` tlaronde
  0 siblings, 1 reply; 17+ messages in thread
From: erik quanstrom @ 2012-06-14 14:49 UTC
  To: 9fans

On Thu Jun 14 10:37:21 EDT 2012, tlaronde@polynum.com wrote:
> On Thu, Jun 14, 2012 at 04:13:12PM +0200, Gorka Guardiola wrote:
> > Also this:
> >
> > cpu%  echo hola | grep '*'
> > grep: *: syntax error
>
> Plan 9 regexps are mainly Extended Regular Expressions. If the POSIX
> description is taken, a leading '*' is a syntax error. I guess that the
> leading '*' followed by some non-empty pattern is a Plan 9 way to get
> "grep -F" ?
>
> > cpu%  echo hola | grep ''
> > grep: empty pattern
>
> From the POSIX description, an empty pattern is not allowed. '*' is not
> an empty pattern.

i'm sorry this just isn't correct.  see the man page.

plan 9 grep has no intentions of being posix.  grep '*' can be seen as a literal
escape plus the pattern of ''.  this is an empty pattern, thus an error.

- erik




* Re: [9fans] empty *
       [not found]         ` <CACm3i_hRKnkDW6SgFQQ3B4zfr3UHim=-x_uA0iYYRzZz0W8Xag@mail.gmail.c>
@ 2012-06-14 14:46           ` erik quanstrom
  2012-06-14 14:50             ` Gorka Guardiola
  0 siblings, 1 reply; 17+ messages in thread
From: erik quanstrom @ 2012-06-14 14:46 UTC
  To: 9fans

> cpu%  echo hola | grep '*'
> grep: *: syntax error
> cpu%  echo hola | grep ''
> grep: empty pattern
>
>
> grep '*' and grep '' should still be the same, shouldn't they?

yes, but does it matter?  you correctly get an error either way.

- erik




* Re: [9fans] empty *
  2012-06-14 14:13         ` Gorka Guardiola
@ 2012-06-14 14:36           ` tlaronde
  2012-06-14 14:49             ` erik quanstrom
  0 siblings, 1 reply; 17+ messages in thread
From: tlaronde @ 2012-06-14 14:36 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Jun 14, 2012 at 04:13:12PM +0200, Gorka Guardiola wrote:
> Also this:
>
> cpu%  echo hola | grep '*'
> grep: *: syntax error

Plan 9 regexps are mainly Extended Regular Expressions. If the POSIX
description is taken, a leading '*' is a syntax error. I guess that the
leading '*' followed by some non-empty pattern is a Plan 9 way to get
"grep -F" ?

> cpu%  echo hola | grep ''
> grep: empty pattern

From the POSIX description, an empty pattern is not allowed. '*' is not
an empty pattern.

--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C




* Re: [9fans] empty *
  2012-06-14 14:00       ` tlaronde
@ 2012-06-14 14:13         ` Gorka Guardiola
  2012-06-14 14:36           ` tlaronde
       [not found]         ` <CACm3i_hRKnkDW6SgFQQ3B4zfr3UHim=-x_uA0iYYRzZz0W8Xag@mail.gmail.c>
  1 sibling, 1 reply; 17+ messages in thread
From: Gorka Guardiola @ 2012-06-14 14:13 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Jun 14, 2012 at 4:00 PM,  <tlaronde@polynum.com> wrote:
> On Thu, Jun 14, 2012 at 09:44:25AM -0400, erik quanstrom wrote:
>> > > nope, that's not right.  * starting a pattern escapes the whole string.
>> > > this is unique to grep.
>
> I guess this is surprising because with a POSIX grep(1), if I read the
> description correctly:
>
> 1) If the * is the very first character of a BRE (since POSIX has BRE and ERE)
> it shall be treated as is---but the remainder of the expression is
> interpreted.
>
> 2) In an ERE, if the * is the very first character, or follows |,
> ^ or ( this is undefined.

Also this:

cpu%  echo hola | grep '*'
grep: *: syntax error
cpu%  echo hola | grep ''
grep: empty pattern


grep '*' and grep '' should still be the same, shouldn't they?

G.




* Re: [9fans] empty *
  2012-06-14 13:44     ` erik quanstrom
  2012-06-14 14:00       ` tlaronde
@ 2012-06-14 14:05       ` Lucio De Re
  1 sibling, 0 replies; 17+ messages in thread
From: Lucio De Re @ 2012-06-14 14:05 UTC
  To: 9fans

>> > nope, that's not right.  * starting a pattern escapes the whole string.
>> > this is unique to grep.
>> >
>> 
>> Argh, yes, it has a special meaning. I have somehow managed to miss
>> this for all this time.
> 
> it's easy to miss, but critical especially since we have other implementations
> that don't do this.  i'd argue that they should for consistency.

It seems to me that grep requires it, in some sense, because the Unix
grep has a -F (fixed-string) option that resembles it.  Other regexp users almost
invariably achieve this by other, simpler means.

++L





* Re: [9fans] empty *
  2012-06-14 13:44     ` erik quanstrom
@ 2012-06-14 14:00       ` tlaronde
  2012-06-14 14:13         ` Gorka Guardiola
       [not found]         ` <CACm3i_hRKnkDW6SgFQQ3B4zfr3UHim=-x_uA0iYYRzZz0W8Xag@mail.gmail.c>
  2012-06-14 14:05       ` Lucio De Re
  1 sibling, 2 replies; 17+ messages in thread
From: tlaronde @ 2012-06-14 14:00 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Jun 14, 2012 at 09:44:25AM -0400, erik quanstrom wrote:
> > > nope, that's not right.  * starting a pattern escapes the whole string.
> > > this is unique to grep.

I guess this is surprising because with a POSIX grep(1), if I read the
description correctly:

1) If the * is the very first character of a BRE (since POSIX has BRE and ERE)
it shall be treated as is---but the remainder of the expression is
interpreted.

2) In an ERE, if the * is the very first character, or follows |,
^ or ( this is undefined.

But I must admit that I was unaware, till Erik's message, of both the
Plan9 behavior, and even the "details" of the POSIX behavior...

-- 
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C




* Re: [9fans] empty *
       [not found]   ` <CACm3i_hoeuq10w5=Ri+-nocm1Str2USwTBW1m5iuhSxdU3_pMA@mail.gmail.c>
@ 2012-06-14 13:44     ` erik quanstrom
  2012-06-14 14:00       ` tlaronde
  2012-06-14 14:05       ` Lucio De Re
  0 siblings, 2 replies; 17+ messages in thread
From: erik quanstrom @ 2012-06-14 13:44 UTC
  To: 9fans

> > nope, that's not right.  * starting a pattern escapes the whole string.
> > this is unique to grep.
> >
> 
> Argh, yes, it has a special meaning. I have somehow managed to miss
> this for all this time.

it's easy to miss, but critical especially since we have other implementations
that don't do this.  i'd argue that they should for consistency.

- erik




* Re: [9fans] empty *
  2012-06-14 13:33 ` erik quanstrom
@ 2012-06-14 13:38   ` Gorka Guardiola
       [not found]   ` <CACm3i_hoeuq10w5=Ri+-nocm1Str2USwTBW1m5iuhSxdU3_pMA@mail.gmail.c>
  1 sibling, 0 replies; 17+ messages in thread
From: Gorka Guardiola @ 2012-06-14 13:38 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Jun 14, 2012 at 3:33 PM, erik quanstrom
<quanstro@labs.coraid.com> wrote:
> On Thu Jun 14 04:29:51 EDT 2012, paurea@gmail.com wrote:
>> While playing with grep, I was surprised by grep '*\.c' not giving
>> an error (* is missing an operand). Arguably * applied to empty
>
> nope, that's not right.  * starting a pattern escapes the whole string.
> this is unique to grep.
>

Argh, yes, it has a special meaning. I have somehow managed to miss
this for all this time.

G.




* Re: [9fans] empty *
       [not found] <CACm3i_g9m6sVaBvvydKYyBxwE=NGYmhmpM4Nko6gE5uUngSwXA@mail.gmail.c>
@ 2012-06-14 13:33 ` erik quanstrom
  2012-06-14 13:38   ` Gorka Guardiola
       [not found]   ` <CACm3i_hoeuq10w5=Ri+-nocm1Str2USwTBW1m5iuhSxdU3_pMA@mail.gmail.c>
  0 siblings, 2 replies; 17+ messages in thread
From: erik quanstrom @ 2012-06-14 13:33 UTC
  To: 9fans

On Thu Jun 14 04:29:51 EDT 2012, paurea@gmail.com wrote:
> While playing with grep, I was surprised by grep '*\.c' not giving
> an error (* is missing an operand). Arguably * applied to empty

nope, that's not right.  * starting a pattern escapes the whole string.
this is unique to grep.

from grep(1):

          [....]  An expression
          starting with '*' will treat the rest of the expression as
          literal characters.

- erik




end of thread [latest message: 2012-06-14 15:08 UTC]

Thread overview: 17+ messages
2012-06-14  8:28 [9fans] empty * Gorka Guardiola
2012-06-14  9:32 ` Peter A. Cejchan
2012-06-14  9:54   ` Gorka Guardiola
2012-06-14  9:42 ` Anthony Martin
2012-06-14 13:29   ` erik quanstrom
     [not found] ` <CAM6ozu51kzDLxB0KoT96udomrKyHWuHvwqSum6=aXmonpr8kXQ@mail.gmail.c>
2012-06-14 13:28   ` erik quanstrom
     [not found] <CACm3i_g9m6sVaBvvydKYyBxwE=NGYmhmpM4Nko6gE5uUngSwXA@mail.gmail.c>
2012-06-14 13:33 ` erik quanstrom
2012-06-14 13:38   ` Gorka Guardiola
     [not found]   ` <CACm3i_hoeuq10w5=Ri+-nocm1Str2USwTBW1m5iuhSxdU3_pMA@mail.gmail.c>
2012-06-14 13:44     ` erik quanstrom
2012-06-14 14:00       ` tlaronde
2012-06-14 14:13         ` Gorka Guardiola
2012-06-14 14:36           ` tlaronde
2012-06-14 14:49             ` erik quanstrom
2012-06-14 15:08               ` tlaronde
     [not found]         ` <CACm3i_hRKnkDW6SgFQQ3B4zfr3UHim=-x_uA0iYYRzZz0W8Xag@mail.gmail.c>
2012-06-14 14:46           ` erik quanstrom
2012-06-14 14:50             ` Gorka Guardiola
2012-06-14 14:05       ` Lucio De Re
