9front - general discussion about 9front
* Re: [9front] bug in sed
  2019-01-11 23:46 [9front] bug in sed umbraticus
@ 2019-01-11 23:45 ` Kurt H Maier
  2019-01-11 23:52 ` Eckard Brauer
  1 sibling, 0 replies; 28+ messages in thread
From: Kurt H Maier @ 2019-01-11 23:45 UTC (permalink / raw)
  To: 9front

On Sat, Jan 12, 2019 at 12:46:02PM +1300, umbraticus@prosimetrum.com wrote:
> 
> in (s)ed (yes, ed does it too), g means for every match per line,
> so s/^blah//g perhaps doesn't make sense, since there is only one ^ per line...

Agree that there is only one ^ per line, but that just means that g is
extraneous, not that it should change the behavior to having one ^ per
pass, which is what is happening now.

Even if s/^ //g doesn't make sense, it should still work.  Once per
line.

khm



* Re: [9front] bug in sed
@ 2019-01-11 23:46 umbraticus
  2019-01-11 23:45 ` Kurt H Maier
  2019-01-11 23:52 ` Eckard Brauer
  0 siblings, 2 replies; 28+ messages in thread
From: umbraticus @ 2019-01-11 23:46 UTC (permalink / raw)
  To: 9front

err, to be a little clearer:

in (s)sam, g means for every match in dot,
so s/^blah//g makes sense, since dot can be multi-line

in (s)ed (yes, ed does it too), g means for every match per line,
so s/^blah//g perhaps doesn't make sense, since there is only one ^ per line...
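
The two views above can be compared with a minimal sketch. It uses Python's re module as a stand-in for the Plan 9 regexp library (an assumption, the engines are not the same), and the sample text is the test case posted elsewhere in this thread. When ^ is only allowed to match at a real line start, substituting over the whole buffer and substituting line by line give the same result, with or without g:

```python
import re

# Sample input taken from the test case in this thread.
TEXT = "asdfasdf\nxasdf\n"

def dot_based(text):
    """sam-like view: one substitution over the whole buffer ("dot").
    re.MULTILINE lets ^ match after every newline inside the buffer."""
    return re.sub(r"^asdf", "", text, flags=re.MULTILINE)

def line_based(text):
    """sed/ed-like view: run the substitution on each line separately."""
    lines = text.splitlines(keepends=True)
    return "".join(re.sub(r"^asdf", "", line) for line in lines)

print(repr(dot_based(TEXT)))
print(repr(line_based(TEXT)))
```

Both functions remove only the leading asdf of the first line, which is the "once per line" behaviour argued for in this thread.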



* Re: [9front] bug in sed
  2019-01-11 23:46 [9front] bug in sed umbraticus
  2019-01-11 23:45 ` Kurt H Maier
@ 2019-01-11 23:52 ` Eckard Brauer
  1 sibling, 0 replies; 28+ messages in thread
From: Eckard Brauer @ 2019-01-11 23:52 UTC (permalink / raw)
  To: 9front

> err, to be a little clearer:
> 
> in (s)sam, g means for every match in dot,
> so s/^blah//g makes sense, since dot can be multi-line
> 
> in (s)ed (yes, ed does it too), g means for every match per line,
> so s/^blah//g perhaps doesn't make sense, since there is only one ^
> per line...

No, IMO the question is which is correct for s/^blah//g, either:

(1) find /each/ occurrence of pattern, then replace all of them with
replacement, or

(2) find /first/ occurrence of pattern, replace it with replacement and
repeat starting at current position until end of line/range.

In the first case, combining ^ in the pattern with g as the flag won't
make sense, while in the second it could indeed. Most probably (s)ed
implements (2), while (s)sam implements (1).
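
The two interpretations can be sketched as follows. This is only a model in Python, not the actual sed or sam code; the function names are hypothetical, and the second function deliberately rescans the remainder as if it were a fresh line, which is an assumption about how the iteration might be done:

```python
import re

def replace_all_at_once(pattern, repl, line):
    """Interpretation (1): find every match in the original line,
    then splice the replacements in."""
    out, last = [], 0
    for m in re.finditer(pattern, line):
        out.append(line[last:m.start()])
        out.append(repl)
        last = m.end()
    out.append(line[last:])
    return "".join(out)

def replace_iteratively(pattern, repl, line):
    """Interpretation (2), naive form: replace the first match, then
    treat whatever follows it as a brand-new string and repeat."""
    out, rest = [], line
    while rest:
        m = re.search(pattern, rest)
        # Stop on no match, or on an empty match at the very start,
        # which would otherwise loop forever.
        if m is None or m.end() == 0:
            break
        out.append(rest[:m.start()])
        out.append(repl)
        rest = rest[m.end():]  # ^ will match again at the new "start"
    out.append(rest)
    return "".join(out)

print(repr(replace_all_at_once(r"^asdf", "", "asdfasdf")))
print(repr(replace_iteratively(r"^asdf", "", "asdfasdf")))
```

On the line asdfasdf the first function leaves asdf and the second leaves nothing, which mirrors the ssam and sed outputs reported in the test case in this thread.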

E.

-- 
:)



* Re: [9front] bug in sed
  2019-01-12  0:52 umbraticus
@ 2019-01-12  0:58 ` Kurt H Maier
  0 siblings, 0 replies; 28+ messages in thread
From: Kurt H Maier @ 2019-01-12  0:58 UTC (permalink / raw)
  To: 9front

On Sat, Jan 12, 2019 at 01:52:27PM +1300, umbraticus@prosimetrum.com wrote:
> 
> So should "non-overlapping" and "subsequent" not apply to ^ matches?
> I guess this was hiro's earlier point.

There is only one ^, so even if you use g there should inevitably be
only one substitution.  We need to stop comparing this to ssam, which
aside from also editing text has no bearing on sed's or ed's behavior.


khm



* Re: [9front] bug in sed
@ 2019-01-12  0:52 umbraticus
  2019-01-12  0:58 ` Kurt H Maier
  0 siblings, 1 reply; 28+ messages in thread
From: umbraticus @ 2019-01-12  0:52 UTC (permalink / raw)
  To: 9front

yes, sam does it all at once, while sed iterates.

from sed(1):

                       g    Global.  Substitute for all non-
                            overlapping instances of the regular
                            expression rather than just the first one.

from ed(1):

               ... If the global replacement indicator `g' appears
               after the command, all subsequent matches on the
               line are also replaced...

So should "non-overlapping" and "subsequent" not apply to ^ matches?
I guess this was hiro's earlier point.
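
As a small illustration of what "non-overlapping" usually means, here is a sketch with Python's re module (an assumption: it is used only as a familiar example of left-to-right scanning, not as a model of Plan 9 sed). Each search resumes after the end of the previous match, so overlapping candidates are skipped:

```python
import re

# "aaaaa" contains "aa" at offsets 0, 1, 2 and 3 if overlaps are allowed,
# but a scan that resumes after each match only finds two of them.
spans = [m.span() for m in re.finditer(r"aa", "aaaaa")]
result = re.sub(r"aa", "b", "aaaaa")

print(spans)
print(result)
```

Under this reading, "subsequent" matches are searched for later in the same line; nothing in the wording requires the remainder to be treated as a new line with its own ^.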



* Re: [9front] bug in sed
@ 2019-01-11 23:49 umbraticus
  0 siblings, 0 replies; 28+ messages in thread
From: umbraticus @ 2019-01-11 23:49 UTC (permalink / raw)
  To: 9front

> Even if s/^ //g doesn't make sense, it should still work.  Once per line.

Yes, I agree.



* Re: [9front] bug in sed
  2019-01-11 23:06 umbraticus
@ 2019-01-11 23:29 ` hiro
  0 siblings, 0 replies; 28+ messages in thread
From: hiro @ 2019-01-11 23:29 UTC (permalink / raw)
  To: 9front

i think i know why some think this is obviously wrong:
if you do something like s/^x/xx/g it is never done more than once, so
why would empty replacement trigger different behavior?



* Re: [9front] bug in sed
@ 2019-01-11 23:06 umbraticus
  2019-01-11 23:29 ` hiro
  0 siblings, 1 reply; 28+ messages in thread
From: umbraticus @ 2019-01-11 23:06 UTC (permalink / raw)
  To: 9front

> ... you need ,s instead of s at the beginning of the sam command...

ssam sets dot to the whole file before running the command.

I agree that ssam behaviour seems right and sed wrong...
^ and $ should only match once per line and don't really make sense with g.

umbraticus



* Re: [9front] bug in sed
  2019-01-11 19:01         ` hiro
@ 2019-01-11 21:11           ` Xiao-Yong Jin
  0 siblings, 0 replies; 28+ messages in thread
From: Xiao-Yong Jin @ 2019-01-11 21:11 UTC (permalink / raw)
  To: 9front


> On Jan 11, 2019, at 1:01 PM, hiro <23hiro@gmail.com> wrote:
> 
> i totally expected that ^ and $ would get special treatment here, even
> if it's not mentioned.

Looks like ^ and $ are treated differently with g.



* Re: [9front] bug in sed
  2019-01-11 17:49     ` Sean Hinchee
@ 2019-01-11 19:03       ` hiro
  0 siblings, 0 replies; 28+ messages in thread
From: hiro @ 2019-01-11 19:03 UTC (permalink / raw)
  To: 9front

On 1/11/19, Sean Hinchee <henesy.dev@gmail.com> wrote:
> Yes, tenshi is my 9front vps. I'm pretty sure it's updated to latest
> release.

shiiiiit. please excuse me.



* Re: [9front] bug in sed
  2019-01-11 18:30       ` Ethan Gardener
@ 2019-01-11 19:01         ` hiro
  2019-01-11 21:11           ` Xiao-Yong Jin
  0 siblings, 1 reply; 28+ messages in thread
From: hiro @ 2019-01-11 19:01 UTC (permalink / raw)
  To: 9front

some seem to think it's obvious after reading this, but i don't:

                       g    Global.  Substitute for all non-
                            overlapping instances of the regular
                            expression rather than just the first one.

i would say it's not clear whether ^ would ever be part of
an "overlap".
i totally expected that ^ and $ would get special treatment here, even
if it's not mentioned.



* Re: [9front] bug in sed
  2019-01-11 11:21     ` hiro
  2019-01-11 12:39       ` hiro
@ 2019-01-11 18:30       ` Ethan Gardener
  2019-01-11 19:01         ` hiro
  1 sibling, 1 reply; 28+ messages in thread
From: Ethan Gardener @ 2019-01-11 18:30 UTC (permalink / raw)
  To: 9front

On Fri, Jan 11, 2019, at 11:21 AM, hiro wrote:
> though it also messes up strstr if it's NOT at the beginning of the
> line, so it is also too synthetic.

Thanks for verifying.  Ouch.. that's serious.



* Re: [9front] bug in sed
  2019-01-11 10:20   ` hiro
  2019-01-11 10:22     ` hiro
  2019-01-11 10:32     ` 有澤健治
@ 2019-01-11 17:49     ` Sean Hinchee
  2019-01-11 19:03       ` hiro
  2 siblings, 1 reply; 28+ messages in thread
From: Sean Hinchee @ 2019-01-11 17:49 UTC (permalink / raw)
  To: 9front

Yes, tenshi is my 9front vps. I'm pretty sure it's updated to latest 
release.

Cheers,
Sean

On 1/11/19 4:20 AM, hiro wrote:
> did you try on 9front henesy?
> 



* Re: [9front] bug in sed
  2019-01-11 15:01         ` Stanley Lieber
@ 2019-01-11 15:19           ` Eckard Brauer
  0 siblings, 0 replies; 28+ messages in thread
From: Eckard Brauer @ 2019-01-11 15:19 UTC (permalink / raw)
  To: 9front

Maybe it should simply be explained better in the man page.

man sam states:

       s/regexp/text/
              Substitute text for the first match to the regular expression in the
              range. Set dot to the modified range. In text the character & stands
              for the string that matched the expression. Backslash behaves as usual
              unless followed by a digit: \d stands for the string that matched the
              subexpression begun by the d-th left parenthesis.  If s is followed
              immediately by a number n, as in s2/x/y/, the n-th match in the range is
              substituted.  If the command is followed by a g, as in s/x/y/g, all 
              matches  in the range are substituted.

where the first two sentences would lead me to think that any further
substitution, as happens when g is given at the end, starts from the
already modified line. So the original problem would be a bug in sam.
But a different interpretation is of course possible.

E.

On Fri, 11 Jan 2019 10:01:20 -0500,
Stanley Lieber <sl@stanleylieber.com> wrote:

> On Jan 11, 2019, at 7:39 AM, hiro <23hiro@gmail.com> wrote:
>  [...]  
> 
> isn’t the difference because sed operates line by line, while sam
> operates on the dot?
> 
> if i understand correctly, you need ,s instead of s at the beginning
> of the sam command to apply the transform across the entire input.
> 
> can’t test now.
> 
> sl
> 
> 



-- 
:)



* Re: [9front] bug in sed
  2019-01-11 12:39       ` hiro
  2019-01-11 13:10         ` Nick Owens
@ 2019-01-11 15:01         ` Stanley Lieber
  2019-01-11 15:19           ` Eckard Brauer
  1 sibling, 1 reply; 28+ messages in thread
From: Stanley Lieber @ 2019-01-11 15:01 UTC (permalink / raw)
  To: 9front

On Jan 11, 2019, at 7:39 AM, hiro <23hiro@gmail.com> wrote:
> 
> here's another testcase without needing spaces
> 
> cpu% cat > test
> asdfasdf
> xasdf
> cpu% sed  's/^asdf//g' test
> 
> xasdf
> cpu% ssam  's/^asdf//g' test
> asdf
> xasdf

isn’t the difference because sed operates line by line, while sam operates on the dot?

if i understand correctly, you need ,s instead of s at the beginning of the sam command to apply the transform across the entire input.

can’t test now.

sl





* Re: [9front] bug in sed
  2019-01-11 12:39       ` hiro
@ 2019-01-11 13:10         ` Nick Owens
  2019-01-11 15:01         ` Stanley Lieber
  1 sibling, 0 replies; 28+ messages in thread
From: Nick Owens @ 2019-01-11 13:10 UTC (permalink / raw)
  To: 9front

in sed.c:/^substitute, when g is set, sed begins doing
match/substitute in a loop. after a substitution it moves the pointer
into the current string (called 'loc2') forward and tries to match
again. however, when it attempts to match the regexp again, the regexp
library is unaware that the string to be matched is not the true
beginning of the line. thus the match for ^ always succeeds.

compare OpenBSD regexec and sed, which use a REG_NOTBOL flag to
indicate that ^ cannot match in subsequent matches

https://man.openbsd.org/man3/regex.3
https://github.com/openbsd/src/blob/master/usr.bin/sed/process.c#L334
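
The idea behind that flag can be sketched in Python. This is not the Plan 9 or OpenBSD code; substitute_global is a hypothetical name, and Python's re module stands in for the regexp library. The sketch keeps the whole line and passes a start offset to Pattern.search, which Python documents as not being equivalent to slicing: ^ still only matches at the real beginning of the string. That plays a role similar to REG_NOTBOL on later iterations:

```python
import re

def substitute_global(pattern, repl, line):
    """Global substitution that keeps ^ tied to the true start of line."""
    regex = re.compile(pattern)
    out, pos = [], 0
    while pos <= len(line):
        # Search from pos in the ORIGINAL line, not in a slice of it,
        # so the engine knows pos is not the beginning of the line.
        m = regex.search(line, pos)
        if m is None:
            break
        out.append(line[pos:m.start()])
        out.append(repl)
        if m.end() == m.start():
            # Empty match: copy one character and step past it
            # so the loop always makes progress.
            if m.start() < len(line):
                out.append(line[m.start()])
            pos = m.start() + 1
        else:
            pos = m.end()
    out.append(line[pos:])
    return "".join(out)

print(repr(substitute_global(r"^asdf", "", "asdfasdf")))
```

With this loop the line asdfasdf loses only its first asdf, and a line of five spaces followed by x loses only two of them under s/^  //g.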

On Fri, Jan 11, 2019 at 4:58 AM hiro <23hiro@gmail.com> wrote:
>
> here's another testcase without needing spaces
>
> cpu% cat > test
> asdfasdf
> xasdf
>  cpu% sed  's/^asdf//g' test
>
> xasdf
> cpu% ssam  's/^asdf//g' test
> asdf
> xasdf



* Re: [9front] bug in sed
  2019-01-11 10:35         ` hiro
@ 2019-01-11 13:04           ` 有澤健治
  0 siblings, 0 replies; 28+ messages in thread
From: 有澤健治 @ 2019-01-11 13:04 UTC (permalink / raw)
  To: 9front

where is the definition of "g"?

> On 2019/01/11 19:35, hiro <23hiro@gmail.com> wrote:
> 
> fucking hell. he's even more lazy than i thought: he copied the
> fucking p9p github issue into our mailinglist without testing anything
> himself.
> 
> in any case, we do have differing behavior on 9front, too: ssam
> matches only once, sed matches multiple times, as often as possible
> (as i would expect actually).
> 
> why did one put g at the end if one wanted to match only once?
> in my opinion ssam's behavior might be wrong.




* Re: [9front] bug in sed
  2019-01-11 11:21     ` hiro
@ 2019-01-11 12:39       ` hiro
  2019-01-11 13:10         ` Nick Owens
  2019-01-11 15:01         ` Stanley Lieber
  2019-01-11 18:30       ` Ethan Gardener
  1 sibling, 2 replies; 28+ messages in thread
From: hiro @ 2019-01-11 12:39 UTC (permalink / raw)
  To: 9front

here's another testcase without needing spaces

cpu% cat > test
asdfasdf
xasdf
cpu% sed  's/^asdf//g' test

xasdf
cpu% ssam  's/^asdf//g' test
asdf
xasdf



* Re: [9front] bug in sed
  2019-01-11 11:19   ` hiro
@ 2019-01-11 11:21     ` hiro
  2019-01-11 12:39       ` hiro
  2019-01-11 18:30       ` Ethan Gardener
  0 siblings, 2 replies; 28+ messages in thread
From: hiro @ 2019-01-11 11:21 UTC (permalink / raw)
  To: 9front

though it also messes up strstr if it's NOT at the beginning of the
line, so it is also too synthetic.



* Re: [9front] bug in sed
  2019-01-11 10:55 ` Ethan Gardener
@ 2019-01-11 11:19   ` hiro
  2019-01-11 11:21     ` hiro
  0 siblings, 1 reply; 28+ messages in thread
From: hiro @ 2019-01-11 11:19 UTC (permalink / raw)
  To: 9front

eekee: i verified your example (s/(^|[^a-z_A-Z]+)str//g) and it does
make my opinion fade



* Re: [9front] bug in sed
  2019-01-11  5:07 Sean Hinchee
  2019-01-11  8:42 ` 有澤健治
@ 2019-01-11 10:55 ` Ethan Gardener
  2019-01-11 11:19   ` hiro
  1 sibling, 1 reply; 28+ messages in thread
From: Ethan Gardener @ 2019-01-11 10:55 UTC (permalink / raw)
  To: 9front

tried with same test text: 0, 3, 5, 0 initial spaces on consecutive lines.  double-match found in 9front (updated yesterday), and p9p (master branch downloaded last week).  also found in acme-sac 0.13 & 9pm because it's an easy test and i'm feeling energetic.

double-match not found with gnu sed 4.4.

while the test case itself is pointless, it could occur as part of a more complex regex where the g is reasonable.  i haven't tested this, but i believe it could, for example, trip up the following expression changing all function names beginning with str; it would mess up strstr if strstr occurred at the beginning of a line.
s/(^|[^a-z_A-Z]+)str//g
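
For comparison, here is a sketch of what that expression yields in an engine where ^ can only match at the true beginning of the line. Python's re.sub is used as the stand-in (an assumption, it is not the Plan 9 regexp library), and strip_str_prefix is a hypothetical helper name:

```python
import re

# The expression from the message above, without the s///g wrapper.
STR_PREFIX = r"(^|[^a-z_A-Z]+)str"

def strip_str_prefix(line):
    """Apply the substitution globally with an empty replacement."""
    return re.sub(STR_PREFIX, "", line)

print(repr(strip_str_prefix("strstr(s, t)")))
print(repr(strip_str_prefix(" strstr(s, t)")))
```

In both cases only the first str is removed, because the second str is preceded by a letter and is not at the start of the line. An implementation that lets ^ match again after each substitution would remove the second str as well.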



* Re: [9front] bug in sed
  2019-01-11 10:28       ` hiro
@ 2019-01-11 10:35         ` hiro
  2019-01-11 13:04           ` 有澤健治
  0 siblings, 1 reply; 28+ messages in thread
From: hiro @ 2019-01-11 10:35 UTC (permalink / raw)
  To: 9front

fucking hell. he's even more lazy than i thought: he copied the
fucking p9p github issue into our mailinglist without testing anything
himself.

in any case, we do have differing behavior on 9front, too: ssam
matches only once, sed matches multiple times, as often as possible
(as i would expect actually).

why did one put g at the end if one wanted to match only once?
in my opinion ssam's behavior might be wrong.



* Re: [9front] bug in sed
  2019-01-11 10:20   ` hiro
  2019-01-11 10:22     ` hiro
@ 2019-01-11 10:32     ` 有澤健治
  2019-01-11 17:49     ` Sean Hinchee
  2 siblings, 0 replies; 28+ messages in thread
From: 有澤健治 @ 2019-01-11 10:32 UTC (permalink / raw)
  To: 9front

sorry,
my mailer on mac somehow removed the leading spaces in lines.
I cannot send the mail correctly, so I added a character at the beginning of each line.
I am resending it once more.


> not for me.
> hebe% cat test.txt
> cat test.txt
> is
>  a test
> Bye
> hebe% sed 's/^  //g' test.txt
> cat test.txt
> is
>  a test
> Bye
> hebe% 







> On 2019/01/11 19:20, hiro <23hiro@gmail.com> wrote:
> 
> did you try on 9front henesy?




* Re: [9front] bug in sed
  2019-01-11 10:22     ` hiro
@ 2019-01-11 10:28       ` hiro
  2019-01-11 10:35         ` hiro
  0 siblings, 1 reply; 28+ messages in thread
From: hiro @ 2019-01-11 10:28 UTC (permalink / raw)
  To: 9front

the main problem is that his testcase sucks. it hasn't been minimized

Also his code shows different output on 9front.
But I experimented myself a bit and found a minimal testcase that
still shows what might create confusion:

cpu% echo '     x'  > test
cpu% sed 's/^  //g' test
 x



* Re: [9front] bug in sed
  2019-01-11 10:20   ` hiro
@ 2019-01-11 10:22     ` hiro
  2019-01-11 10:28       ` hiro
  2019-01-11 10:32     ` 有澤健治
  2019-01-11 17:49     ` Sean Hinchee
  2 siblings, 1 reply; 28+ messages in thread
From: hiro @ 2019-01-11 10:22 UTC (permalink / raw)
  To: 9front

i think henesy's scenario is hard to reproduce
1) you have to notice that the first indented line has 2 spaces and the second one has 3.
2) you probably have to use his unmaintained funtz9port

works on 9front:
cpu% cat > test
this
   is
  a
test
cpu% sed 's/^  //g' test
this
 is
a



* Re: [9front] bug in sed
  2019-01-11  8:42 ` 有澤健治
@ 2019-01-11 10:20   ` hiro
  2019-01-11 10:22     ` hiro
                       ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: hiro @ 2019-01-11 10:20 UTC (permalink / raw)
  To: 9front

did you try on 9front henesy?



* Re: [9front] bug in sed
  2019-01-11  5:07 Sean Hinchee
@ 2019-01-11  8:42 ` 有澤健治
  2019-01-11 10:20   ` hiro
  2019-01-11 10:55 ` Ethan Gardener
  1 sibling, 1 reply; 28+ messages in thread
From: 有澤健治 @ 2019-01-11  8:42 UTC (permalink / raw)
  To: 9front

not for me.

hebe% cat test.txt
is
 a test
Bye
hebe% 
hebe% sed 's/^  //g' test.txt
is
 a test
Bye
hebe% 


> On 2019/01/11 14:07, Sean Hinchee <henesy.dev@gmail.com> wrote:
> 
> All,
> 
> There seems to be a bug in sed where it eats things it shouldn't:
> 
> 
> tenshi% ed test.txt
> 0
> a
> This
>  is
>    a test.
> Bye
> .
> w
> 26
> q
> tenshi% sed 's/^  //g' test.txt
> This
> is
> a test.
> Bye
> tenshi% ssam 's/^  //g' test.txt
> This
> is
>  a test.
> Bye
> 
> 
> This bug has shown up in plan9port: https://github.com/9fans/plan9port/issues/183
> 
> Cheers,
> Sean




* [9front] bug in sed
@ 2019-01-11  5:07 Sean Hinchee
  2019-01-11  8:42 ` 有澤健治
  2019-01-11 10:55 ` Ethan Gardener
  0 siblings, 2 replies; 28+ messages in thread
From: Sean Hinchee @ 2019-01-11  5:07 UTC (permalink / raw)
  To: 9front

All,

There seems to be a bug in sed where it eats things it shouldn't:


tenshi% ed test.txt
0
a
This
   is
     a test.
Bye
.
w
26
q
tenshi% sed 's/^  //g' test.txt
This
is
a test.
Bye
tenshi% ssam 's/^  //g' test.txt
This
is
   a test.
Bye


This bug has shown up in plan9port: 
https://github.com/9fans/plan9port/issues/183

Cheers,
Sean


