ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
@ 2024-07-25 10:11 orisano (Nao Yonashiro) via ruby-core
  2024-07-25 11:13 ` [ruby-core:118683] " mame (Yusuke Endoh) via ruby-core
                   ` (26 more replies)
  0 siblings, 27 replies; 28+ messages in thread
From: orisano (Nao Yonashiro) via ruby-core @ 2024-07-25 10:11 UTC (permalink / raw)
  To: ruby-core; +Cc: orisano (Nao Yonashiro)

Issue #20652 has been reported by orisano (Nao Yonashiro).

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652

* Author: orisano (Nao Yonashiro)
* Status: Open
----------------------------------------
I recently upgraded from ruby 2.7.7 to 3.3.1 and noticed that the GC load increased.
When I used the allocation profiler to investigate, I found that memory allocation from gsub had increased.

The problem was code like this:
```ruby
s = "foo              "
s.gsub(/ (\s+)/) { " #{'&nbsp;' * Regexp.last_match(1).length}" }
```

When I compared heap-profiler results between 2.7.7 and 3.3.1, I found that the number of MatchData allocations had increased.

https://gist.github.com/orisano/98792dee260106e9b6fcb45bbabeb1e6

https://github.com/ruby/ruby/commit/abc0304cb28cb9dcc3476993bc487884c139fd11

I discovered that the cause is this commit, which stopped reusing backref to avoid race conditions.
Is there a way to reuse backref while still avoiding race conditions?
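
One way to observe the difference, as a rough sketch rather than the profiler setup used above: count object allocations around the `gsub` call with `GC.stat(:total_allocated_objects)`. The helper names `allocations_during` and `gsub_allocations` are made up for illustration, and the absolute numbers depend on the Ruby version, so only a comparison between two interpreters is meaningful.

```ruby
# Hypothetical helper: number of objects allocated while the block runs.
def allocations_during
  GC.disable                                   # avoid a GC run mid-measurement
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

# Hypothetical wrapper around the gsub call from the report.
def gsub_allocations(iterations = 1_000)
  s = "foo              "
  allocations_during do
    iterations.times do
      s.gsub(/ (\s+)/) { " #{'&nbsp;' * Regexp.last_match(1).length}" }
    end
  end
end

puts "objects allocated: #{gsub_allocations}"
```

Running the same script under 2.7 and 3.3 should show whether the per-call allocation count differs on a given machine.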



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118683] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
@ 2024-07-25 11:13 ` mame (Yusuke Endoh) via ruby-core
  2024-07-25 14:18 ` [ruby-core:118685] " jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: mame (Yusuke Endoh) via ruby-core @ 2024-07-25 11:13 UTC (permalink / raw)
  To: ruby-core; +Cc: mame (Yusuke Endoh)

Issue #20652 has been updated by mame (Yusuke Endoh).

Assignee set to jeremyevans0 (Jeremy Evans)

@jeremyevans0 What do you think? 

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109219

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118685] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
  2024-07-25 11:13 ` [ruby-core:118683] " mame (Yusuke Endoh) via ruby-core
@ 2024-07-25 14:18 ` jeremyevans0 (Jeremy Evans) via ruby-core
  2024-07-26  6:15 ` [ruby-core:118689] " byroot (Jean Boussier) via ruby-core
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2024-07-25 14:18 UTC (permalink / raw)
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20652 has been updated by jeremyevans0 (Jeremy Evans).


It's expected that fixing #17507 caused memory usage to increase.  If anyone can come up with an approach that fixes #17507 without causing an increase in memory usage, please submit a pull request for it.

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109222

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118689] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
  2024-07-25 11:13 ` [ruby-core:118683] " mame (Yusuke Endoh) via ruby-core
  2024-07-25 14:18 ` [ruby-core:118685] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2024-07-26  6:15 ` byroot (Jean Boussier) via ruby-core
  2024-07-26  7:19 ` [ruby-core:118691] " shyouhei (Shyouhei Urabe) via ruby-core
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-07-26  6:15 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20652 has been updated by byroot (Jean Boussier).


>  If anyone can come up with an approach that fixes #17507 without causing an increase in memory usage

I guess I'm a bit surprised by that, because I would have thought the backref was only accessible by the same fiber, but maybe I'm missing something.

But on a more general note, I very often wish I had a way to avoid the `MatchData` created by methods that take a Regexp; it's very rare that you need the `MatchData` created by `gsub` or `Regexp#===`. Most of the time it's fine, but when you are trying to optimize a hotspot, it's really something you'd want to be able to skip. That's in large part why `Regexp#match?` was added and popularized, but it is only usable in a subset of cases.

Maybe we could add a new Regexp flag to turn off this behavior?

e.g.

```ruby
case str
when /^a/
  p $~ # => #<MatchData "a">
when /^b/c
  p $~ # => nil
end
```
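
For comparison, here is a small sketch of the behaviour that exists today: `Regexp#match?` leaves `$~` alone, while `=~` replaces it. The method name `backref_behaviour` is invented for the example; the proposed `/c` flag above is not implemented and is not used here.

```ruby
def backref_behaviour
  "zzz" =~ /z/                 # seed $~ with a known MatchData
  seeded = $~

  /^a/.match?("abc")           # true/false only, $~ is not updated
  untouched = $~.equal?(seeded)

  "abc" =~ /^a/                # creates a new MatchData and stores it in $~
  replaced = !$~.equal?(seeded)

  { untouched_by_match_p: untouched, replaced_by_match_op: replaced }
end

p backref_behaviour
```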

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109225

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118691] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (2 preceding siblings ...)
  2024-07-26  6:15 ` [ruby-core:118689] " byroot (Jean Boussier) via ruby-core
@ 2024-07-26  7:19 ` shyouhei (Shyouhei Urabe) via ruby-core
  2024-07-26  9:23 ` [ruby-core:118692] " byroot (Jean Boussier) via ruby-core
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: shyouhei (Shyouhei Urabe) via ruby-core @ 2024-07-26  7:19 UTC (permalink / raw)
  To: ruby-core; +Cc: shyouhei (Shyouhei Urabe)

Issue #20652 has been updated by shyouhei (Shyouhei Urabe).


I agree this is counter-intuitive.  The #17507 problem was that

```ruby
i = lambda { ...(touches $~)... }

many.times { Thread.start { many.times { i.call } } }
```

would break.  This is because `$~` is `i`'s local variable.  `i` is shared across threads, and so is `$~`.

@jeremyevans0 fixed this so that no two `$~`s are identical, whether or not multiple threads are involved.

The fix of course increases memory usage.  But because the root cause of the problem is sharing local variables across threads, the bloat needed to unshare them seems kind of inevitable to me.  I also think this is a needed fix.  Nobody thinks `gsub` modifies `$~` behind the scenes.

I would agree with @byroot that it could be better if we had a `gsub` variant that doesn't touch `$~`.
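
The sharing described above can be made concrete with a sketch like the following. The thread and iteration counts are arbitrary, and `SHARED` and `stale_reads` are names invented here. Because every thread calls the same lambda, a read of `$~` may observe a match stored by another thread, so the reported count can be non-zero depending on the Ruby version and scheduling; no particular output is implied.

```ruby
# One lambda, created once, called from several threads.
SHARED = lambda do |word|
  word =~ /\w+/        # stores a MatchData in the $~ slot the lambda carries
  $~[0] == word        # false if another thread replaced $~ in between
end

# Hypothetical driver: counts reads of $~ that belonged to another thread.
def stale_reads(threads: 4, iterations: 2_000)
  workers = threads.times.map do |n|
    Thread.start do
      word = "thread#{n}"
      iterations.times.count { !SHARED.call(word) }
    end
  end
  workers.sum(&:value)   # Thread#value joins each worker
end

puts "stale $~ reads: #{stale_reads}"
```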

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109227

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118692] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (3 preceding siblings ...)
  2024-07-26  7:19 ` [ruby-core:118691] " shyouhei (Shyouhei Urabe) via ruby-core
@ 2024-07-26  9:23 ` byroot (Jean Boussier) via ruby-core
  2024-07-26 11:09 ` [ruby-core:118693] " Eregon (Benoit Daloze) via ruby-core
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-07-26  9:23 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20652 has been updated by byroot (Jean Boussier).


> it could be better if we had a gsub variant that doesn't touch $~.

Right, but the problem is beyond `gsub`, e.g.:

```ruby
>> "abba"[/(bb|[^b]{2})/]
=> "bb"
>> $~
=> #<MatchData "bb" 1:"bb">
```

So introducing `gsub(backref: false)` or `gsub_no_backref` would solve one case but would leave many others.

Hence why I think the only way to handle this general problem without introducing tons of new methods and lots of code churn would be a new Regexp flag. 
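
As a sketch of how widespread this is, the following checks a few everyday calls and records whether each one left a `MatchData` in `$~`. The method name `implicit_backref_setters` is invented here, and the list is not exhaustive.

```ruby
def implicit_backref_setters
  seen = {}

  $~ = nil
  "abba"[/bb/]                    # String#[] with a Regexp
  seen["String#[]"] = !$~.nil?

  $~ = nil
  /bb/ === "abba"                 # what `case ... when /bb/` calls
  seen["Regexp#==="] = !$~.nil?

  $~ = nil
  "abba".sub(/bb/, "")            # substitution without a block
  seen["String#sub"] = !$~.nil?

  seen
end

p implicit_backref_setters
```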

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109228

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118693] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (4 preceding siblings ...)
  2024-07-26  9:23 ` [ruby-core:118692] " byroot (Jean Boussier) via ruby-core
@ 2024-07-26 11:09 ` Eregon (Benoit Daloze) via ruby-core
  2024-07-26 12:05 ` [ruby-core:118694] " ko1 (Koichi Sasada) via ruby-core
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2024-07-26 11:09 UTC (permalink / raw)
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #20652 has been updated by Eregon (Benoit Daloze).


FWIW, what TruffleRuby does for this is to store `$~` as a frame-local thread-local variable, but thread-local only if more than 1 thread has been seen, otherwise it's stored directly in the frame:
https://github.com/oracle/truffleruby/blob/3cd422433deebe3fa664f8c4540811c42ca02e93/src/main/java/org/truffleruby/language/threadlocal/ThreadAndFrameLocalStorage.java

I'm not sure how it works on CRuby, but if `$~` is stored directly in the frame, then threads might see a different `$~` than they expect, which could lead to very subtle bugs.

I don't really like a Regexp flag for this because a Regexp might be used in different contexts and some usages might want `$~` and some might not.

I think in general a good fix to simplify this and avoid this kind of races would be to store `$~` in the caller frame (even if that's a block's frame) but not higher.
In this case it would be stored in the `lambda`'s frame and not outside.
That's also quite a bit faster.
Of course it would be somewhat incompatible, but how much code uses a `$~` outside a block when the Regexp call is made inside a block?
We could warn that such code should not rely on that for a release or so, before changing it.
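
The compatibility concern can be illustrated with current CRuby behaviour, where a match made inside a block is visible through `$~` in the enclosing method. This is a sketch with an invented method name; code shaped like this is what would have to change under the proposal above.

```ruby
def match_inside_block
  ["hello"].each { |s| s =~ /ll/ }   # the match happens inside the block
  $~ && $~[0]                        # but is read here, outside the block
end

p match_inside_block
```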

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109229

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118694] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (5 preceding siblings ...)
  2024-07-26 11:09 ` [ruby-core:118693] " Eregon (Benoit Daloze) via ruby-core
@ 2024-07-26 12:05 ` ko1 (Koichi Sasada) via ruby-core
  2024-07-26 12:09 ` [ruby-core:118695] " byroot (Jean Boussier) via ruby-core
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: ko1 (Koichi Sasada) via ruby-core @ 2024-07-26 12:05 UTC (permalink / raw)
  To: ruby-core; +Cc: ko1 (Koichi Sasada)

Issue #20652 has been updated by ko1 (Koichi Sasada).


Eregon (Benoit Daloze) wrote in #note-7:
> FWIW, what TruffleRuby does for this is to store `$~` as a frame-local thread-local variable, but thread-local only if more than 1 thread has been seen, otherwise it's stored directly in the frame:

off-topic:

What happens when a thread is created just after storing `$~`?

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109230

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118695] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (6 preceding siblings ...)
  2024-07-26 12:05 ` [ruby-core:118694] " ko1 (Koichi Sasada) via ruby-core
@ 2024-07-26 12:09 ` byroot (Jean Boussier) via ruby-core
  2024-07-26 13:54 ` [ruby-core:118697] " austin (Austin Ziegler) via ruby-core
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-07-26 12:09 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20652 has been updated by byroot (Jean Boussier).


> I don't really like a Regexp flag for this because a Regexp might be used in different contexts and some usages might want $~ and some might not (which could lead to a bunch of duplication).

I see what you mean, but such a flag would only really be worth using in places where saving that allocation is worth it, where right now you usually use a literal anyway, so I don't think duplication would be a concern.

Of course you could expect Rubocop and such to try to force it on everyone because it's faster™, but...

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109231

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118697] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (7 preceding siblings ...)
  2024-07-26 12:09 ` [ruby-core:118695] " byroot (Jean Boussier) via ruby-core
@ 2024-07-26 13:54 ` austin (Austin Ziegler) via ruby-core
  2024-07-26 17:04 ` [ruby-core:118702] " ko1 (Koichi Sasada) via ruby-core
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: austin (Austin Ziegler) via ruby-core @ 2024-07-26 13:54 UTC (permalink / raw)
  To: ruby-core; +Cc: austin (Austin Ziegler)

Issue #20652 has been updated by austin (Austin Ziegler).


byroot (Jean Boussier) wrote in #note-9:
> > I don't really like a Regexp flag for this because a Regexp might be used in different contexts and some usages might want $~ and some might not (which could lead to a bunch of duplication).
> 
> I see what you mean, but such flag would only really be worth using in places where saving that allocation is worth it, where right now you usually use a literal anyway, so I don't think duplication would be a concern.
> 
> Of course you could expect Rubocop and such to try to force it on everyone because it's faster™, but...

It could be a flag only accessible through `Regexp.new` (`Regexp.new(source, Regexp::OPTIMIZED_MATCHDATA)`) instead of having it with a literal (`/…/z`). The ugliness would make it less likely that Rubocop would try to force it on everyone.

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109234

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118702] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (8 preceding siblings ...)
  2024-07-26 13:54 ` [ruby-core:118697] " austin (Austin Ziegler) via ruby-core
@ 2024-07-26 17:04 ` ko1 (Koichi Sasada) via ruby-core
  2024-07-27  7:00 ` [ruby-core:118707] " byroot (Jean Boussier) via ruby-core
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: ko1 (Koichi Sasada) via ruby-core @ 2024-07-26 17:04 UTC (permalink / raw)
  To: ruby-core; +Cc: ko1 (Koichi Sasada)

Issue #20652 has been updated by ko1 (Koichi Sasada).


I came up with an idea: each `ec` points to an unescaped MatchData (instead of only `$~` pointing to it) and reuses it.
In other words, every generated MatchData would be cached in `ec->last_matchdata` (or similar) and reused across scopes.

```ruby
def foo
  if /.../ =~ ... # generate MatchData1
    bar()
  end
end

def bar
  m = $str.match(//) # reuse MatchData1
end
```

It is thread-safe and increases the opportunities for reuse.

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109239

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118707] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (9 preceding siblings ...)
  2024-07-26 17:04 ` [ruby-core:118702] " ko1 (Koichi Sasada) via ruby-core
@ 2024-07-27  7:00 ` byroot (Jean Boussier) via ruby-core
  2024-07-27  7:09 ` [ruby-core:118708] " byroot (Jean Boussier) via ruby-core
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-07-27  7:00 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20652 has been updated by byroot (Jean Boussier).


I don't follow. How can it be reused in your example?

```ruby
def foo
  if /foo/ =~ "foo" # generate MatchData1
    bar()
    p $~ # #<MatchData "all">
  end
end

def bar
  m = "all".match(/all/) # reuse MatchData1
  p $~ # #<MatchData "foo">
end

foo
```
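
For reference, this is a sketch of what current CRuby does with the same shape of code: each method has its own `$~`, so the match made in the callee does not replace the one the caller still holds. The method names are invented for the example.

```ruby
def callee_match
  "all".match(/all/)
  $~[0]                      # the callee's own $~
end

def caller_match
  "foo" =~ /foo/
  inner = callee_match
  [inner, $~[0]]             # the caller's $~ is still its own match
end

p caller_match
```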

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109242

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118708] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (10 preceding siblings ...)
  2024-07-27  7:00 ` [ruby-core:118707] " byroot (Jean Boussier) via ruby-core
@ 2024-07-27  7:09 ` byroot (Jean Boussier) via ruby-core
  2024-07-27 10:58 ` [ruby-core:118709] " Eregon (Benoit Daloze) via ruby-core
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-07-27  7:09 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20652 has been updated by byroot (Jean Boussier).


Interestingly, the NO_MATCH regexp option was suggested in the ticket that led to `Regexp#match?` [Feature #8110].
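
As a reminder of what `match?` already provides, here is a small sketch contrasting it with `=~`. The helper method names are made up for illustration. Each helper runs in a fresh method frame, so `$~` starts out as `nil`.

```ruby
# String#match? / Regexp#match? return only true or false and leave $~ alone.
def last_match_after_predicate(str)
  str.match?(/o+/)
  $~
end

# =~ records the match in the frame-local $~.
def last_match_after_operator(str)
  str =~ /o+/
  $~
end

p last_match_after_predicate("foo") # => nil
p last_match_after_operator("foo")  # => #<MatchData "oo">
```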

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109243

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118709] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (11 preceding siblings ...)
  2024-07-27  7:09 ` [ruby-core:118708] " byroot (Jean Boussier) via ruby-core
@ 2024-07-27 10:58 ` Eregon (Benoit Daloze) via ruby-core
  2024-07-27 11:05 ` [ruby-core:118710] " Eregon (Benoit Daloze) via ruby-core
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2024-07-27 10:58 UTC (permalink / raw)
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #20652 has been updated by Eregon (Benoit Daloze).


ko1 (Koichi Sasada) wrote in #note-8:
> what does it happen when creating a thread just after storing `$~`?

It's simply not set in the new thread, and that already seems to be the behavior on CRuby:
```
$ ruby -ve '"a" =~ /a/; p $~; Thread.new { p $~ }.join'
ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux]
#<MatchData "a">
nil
$ ruby -ve '"a" =~ /a/; p $~; Thread.new { p $~ }.join'
truffleruby 24.1.0-dev-c2e5209c, like ruby 3.2.2, GraalVM CE Native [x86_64-linux]
#<MatchData "a">
nil
```

ko1 (Koichi Sasada) wrote in #note-11:
> I found an idea that each thread points to unescaped MatchData rather than `$~` and reuse it.

I think that's too incompatible because `$~` is frame-local and thread-local, so we need multiple `$~` per thread, as @byroot showed.

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109244

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118710] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (12 preceding siblings ...)
  2024-07-27 10:58 ` [ruby-core:118709] " Eregon (Benoit Daloze) via ruby-core
@ 2024-07-27 11:05 ` Eregon (Benoit Daloze) via ruby-core
  2024-07-27 22:04 ` [ruby-core:118711] " ko1 (Koichi Sasada) via ruby-core
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2024-07-27 11:05 UTC (permalink / raw)
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #20652 has been updated by Eregon (Benoit Daloze).


byroot (Jean Boussier) wrote in #note-9:
> I see what you mean, but such flag would only really be worth using in places where saving that allocation is worth it, where right now you usually use a literal anyway, so I don't think duplication would be a concern.

I'm thinking of cases where Regexps are stored in constants and potentially composed of other regexps/strings, as https://github.com/ruby/uri/blob/master/lib/uri/rfc3986_parser.rb does, for example.
It seems bad to duplicate some of these regexps if (for the same large Regexp) we have some call sites which need the MatchData and some which don't.

Also `Regexp#match` (which returns a MatchData) would make no sense with that flag, so it feels like the wrong place to specify it.

Regarding `gsub` specifically, I think it shouldn't set `$~` unconditionally, i.e. it should only set it if a block is passed.
For example with `"abc".gsub("a", "d")` I don't see any point in setting `$~` after it.
That's potentially incompatible, but we could warn for a release or so, and I'd think few gems rely on that.
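
To illustrate the behavior under discussion, here is a short sketch showing that `gsub` with a replacement string (no block) still leaves a MatchData in `$~`. The helper name is hypothetical, and the sketch uses a Regexp pattern rather than the String pattern from the example above.

```ruby
# Non-block gsub: the replacement never reads $~,
# yet the last successful match is still visible afterwards.
def last_match_after_gsub
  "abc".gsub(/b/, "d")
  $~ && $~[0]
end

p last_match_after_gsub # => "b"
```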


----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109245

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118711] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (13 preceding siblings ...)
  2024-07-27 11:05 ` [ruby-core:118710] " Eregon (Benoit Daloze) via ruby-core
@ 2024-07-27 22:04 ` ko1 (Koichi Sasada) via ruby-core
  2024-07-28 11:57 ` [ruby-core:118716] " byroot (Jean Boussier) via ruby-core
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: ko1 (Koichi Sasada) via ruby-core @ 2024-07-27 22:04 UTC (permalink / raw)
  To: ruby-core; +Cc: ko1 (Koichi Sasada)

Issue #20652 has been updated by ko1 (Koichi Sasada).


Eregon (Benoit Daloze) wrote in #note-14:
> ko1 (Koichi Sasada) wrote in #note-11:
> > I found an idea that each thread points to unescaped MatchData rather than `$~` and reuse it.
> 
> I think that's too incompatible because `$~` is frame-local and thread-local, so we need multiple `$~` per thread, as @byroot showed.

No. It is not user-visible behavior, so there is no incompatibility.
(It doesn't change `$~`.)

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109246

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118716] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (14 preceding siblings ...)
  2024-07-27 22:04 ` [ruby-core:118711] " ko1 (Koichi Sasada) via ruby-core
@ 2024-07-28 11:57 ` byroot (Jean Boussier) via ruby-core
  2024-07-29 17:39 ` [ruby-core:118720] " Dan0042 (Daniel DeLorme) via ruby-core
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-07-28 11:57 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20652 has been updated by byroot (Jean Boussier).


> I'm thinking cases of Regexps being stored in constants and potentially composed of other regexps/strings, like https://github.com/ruby/uri/blob/master/lib/uri/rfc3986_parser.rb does it for example.

Sure, there are cases where it wouldn't be convenient. But adding this extra flag would only really make a difference in hotspots, so I don't mind too much if there are some cases where it's not super convenient.

So I don't think it's a good argument against it.

> Also Regexp#match (which returns a MatchData) would make no sense with that flag, so it feels the wrong place to specify it.

With the name I suggested, maybe, but with a proper name it would be fine.

> Regarding gsub/sub specifically, I think it shouldn't set $~, i.e. only set it if a block is passed.

That raises backward compatibility concerns and is unlikely to be accepted. Even if it were, the deprecation period would be annoying for not much gain.

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109250

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118720] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (15 preceding siblings ...)
  2024-07-28 11:57 ` [ruby-core:118716] " byroot (Jean Boussier) via ruby-core
@ 2024-07-29 17:39 ` Dan0042 (Daniel DeLorme) via ruby-core
  2024-07-29 19:19 ` [ruby-core:118722] " Dan0042 (Daniel DeLorme) via ruby-core
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Dan0042 (Daniel DeLorme) via ruby-core @ 2024-07-29 17:39 UTC (permalink / raw)
  To: ruby-core; +Cc: Dan0042 (Daniel DeLorme)

Issue #20652 has been updated by Dan0042 (Daniel DeLorme).


byroot (Jean Boussier) wrote in #note-4:
> Maybe we could add a new Regexp flag to turn off this behavior?

My first reaction was "Yes! This is exactly what we need!" but after thinking about it more, it feels un-rubyish. We shouldn't have to write code to micro-tweak performance like that. Ideally the interpreter/JIT should handle micro-optimizations. I'd be happier to see something like:
```ruby
def foo(str)
  str if str =~ /rx/
  # In this method we don't use $~ and friends, so the interpreter doesn't have to allocate a MatchData.
  # In theory there's an incompatibility with eval, but in practice I believe that's a non-issue (I'm open to being shown otherwise).
end
```


shyouhei (Shyouhei Urabe) wrote in #note-5:
> Nobody thinks `gsub` modifies `$~` behind the scene.

I'm not sure this is what you meant, but I definitely expect ANY regexp operation to modify `$~` (except where documented otherwise, like `Regexp#match?`).
BTW this includes String#sub; I sometimes write code like: `name, prefix = str.sub(/^(the) /i, ""), $1`
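
A runnable sketch of that idiom, assuming the intent is to strip a leading article and keep it separately (the method name `split_article` and the sample strings are invented for illustration):

```ruby
# Relies on String#sub updating $~ (and therefore $1) even without a block.
def split_article(str)
  name = str.sub(/^(the) /i, "") # strip a leading "the ", case-insensitively
  prefix = $1                    # capture group from the sub call above, or nil
  [name, prefix]
end

p split_article("The Beatles") # => ["Beatles", "The"]
p split_article("Queen")       # => ["Queen", nil]
```

Code like this would break if `sub`/`gsub` stopped setting `$~` when no block is given.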


----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109255

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118722] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (16 preceding siblings ...)
  2024-07-29 17:39 ` [ruby-core:118720] " Dan0042 (Daniel DeLorme) via ruby-core
@ 2024-07-29 19:19 ` Dan0042 (Daniel DeLorme) via ruby-core
  2024-07-29 19:40 ` [ruby-core:118723] " jeremyevans0 (Jeremy Evans) via ruby-core
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Dan0042 (Daniel DeLorme) via ruby-core @ 2024-07-29 19:19 UTC (permalink / raw)
  To: ruby-core; +Cc: Dan0042 (Daniel DeLorme)

Issue #20652 has been updated by Dan0042 (Daniel DeLorme).


After reading over https://github.com/ruby/ruby/pull/4734/files it seems there are two parts to it.
1. use a `set_match` pointer to return the match (this fixes the race condition)
2. always allocate a MatchData, never using `rb_backref_get()`

But it seems to me that #2 is only necessary if `set_match` is used. So what about using `rb_backref_get()` when possible? For example:
```c
match = set_match ? Qnil : rb_backref_get();
```

@jeremyevans0 wdyt?

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109257

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118723] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (17 preceding siblings ...)
  2024-07-29 19:19 ` [ruby-core:118722] " Dan0042 (Daniel DeLorme) via ruby-core
@ 2024-07-29 19:40 ` jeremyevans0 (Jeremy Evans) via ruby-core
  2024-07-29 21:29 ` [ruby-core:118724] " byroot (Jean Boussier) via ruby-core
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: jeremyevans0 (Jeremy Evans) via ruby-core @ 2024-07-29 19:40 UTC (permalink / raw)
  To: ruby-core; +Cc: jeremyevans0 (Jeremy Evans)

Issue #20652 has been updated by jeremyevans0 (Jeremy Evans).


Dan0042 (Daniel DeLorme) wrote in #note-19:
> After reading over https://github.com/ruby/ruby/pull/4734/files it seems there's two parts to it.
> 1. use a `set_match` pointer to return the match (this fixes the race condition)
> 2. always allocate a MatchData, never using `rb_backref_get()`
> 
> But it seems to me that #2 is only necessary if `set_match` is used. So what about using `rb_backref_get()` when possible? Like
> ```c
> match = set_match ? Qnil : rb_backref_get();
> ```
> 
> @jeremyevans0 wdyt?

I'm not sure it is thread-safe.  This would modify a shared backref in code paths where `set_match` is `NULL`.  I haven't audited the related code, so I'm not sure what code calls `rb_reg_search0` and `rb_reg_match`.  Feel free to give it a try and see if it passes the tests and reduces allocations.
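
One rough way to check the allocation side of that on CRuby is to count `T_MATCH` slots before and after the code under test. This is only a sketch: the helper name `match_data_allocated` is made up, it relies on the CRuby-specific `ObjectSpace.count_objects`, and it disables GC during the measurement so that the difference reflects allocations only.

```ruby
# Counts MatchData (T_MATCH) objects allocated while the block runs.
def match_data_allocated
  GC.start   # finish any pending collection first
  GC.disable # keep freed slots from skewing the count
  before = ObjectSpace.count_objects.fetch(:T_MATCH, 0)
  yield
  after = ObjectSpace.count_objects.fetch(:T_MATCH, 0)
  after - before
ensure
  GC.enable
end

count = match_data_allocated do
  s = "foo              "
  s.gsub(/ (\s+)/) { " #{'&nbsp;' * Regexp.last_match(1).length}" }
end
puts "MatchData objects allocated: #{count}"
```

Running the same script under two Ruby builds gives a quick before/after comparison. The exact numbers depend on the Ruby version, so none are shown here.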

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109258

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118724] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (18 preceding siblings ...)
  2024-07-29 19:40 ` [ruby-core:118723] " jeremyevans0 (Jeremy Evans) via ruby-core
@ 2024-07-29 21:29 ` byroot (Jean Boussier) via ruby-core
  2024-07-30  5:18 ` [ruby-core:118728] " ko1 (Koichi Sasada) via ruby-core
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-07-29 21:29 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20652 has been updated by byroot (Jean Boussier).


> I'd be happier to see something like:

@Eregon given how good TruffleRuby is at escape analysis and such, before I dive into why it wasn't done before: is TruffleRuby able to avoid creating the MatchData when it's not accessed, or is there something in the semantics that makes it impossible to predict?

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109259

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118728] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (19 preceding siblings ...)
  2024-07-29 21:29 ` [ruby-core:118724] " byroot (Jean Boussier) via ruby-core
@ 2024-07-30  5:18 ` ko1 (Koichi Sasada) via ruby-core
  2024-07-30  5:25 ` [ruby-core:118729] " byroot (Jean Boussier) via ruby-core
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: ko1 (Koichi Sasada) via ruby-core @ 2024-07-30  5:18 UTC (permalink / raw)
  To: ruby-core; +Cc: ko1 (Koichi Sasada)

Issue #20652 has been updated by ko1 (Koichi Sasada).


ko1 (Koichi Sasada) wrote in #note-16:
> No. It is not user visible behavior so no incompatibility.

Sorry, I was wrong. Please ignore it.

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109264

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118729] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (20 preceding siblings ...)
  2024-07-30  5:18 ` [ruby-core:118728] " ko1 (Koichi Sasada) via ruby-core
@ 2024-07-30  5:25 ` byroot (Jean Boussier) via ruby-core
  2024-07-30 11:37 ` [ruby-core:118736] " Eregon (Benoit Daloze) via ruby-core
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-07-30  5:25 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20652 has been updated by byroot (Jean Boussier).


> is there something in the semantic that make it impossible to predict?

Answering my own question:

```ruby
def match
  "foo" =~ /f(o)o/
  eval("$1")
end

p match
```


----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109265

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118736] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (21 preceding siblings ...)
  2024-07-30  5:25 ` [ruby-core:118729] " byroot (Jean Boussier) via ruby-core
@ 2024-07-30 11:37 ` Eregon (Benoit Daloze) via ruby-core
  2024-07-30 15:21 ` [ruby-core:118748] " byroot (Jean Boussier) via ruby-core
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2024-07-30 11:37 UTC (permalink / raw)
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #20652 has been updated by Eregon (Benoit Daloze).


@byroot It depends in which situation but generally yes it's able to avoid the allocation.
If there is no block around, partial escape analysis avoids the allocation of the Ruby MatchData object as long as it's not leaked/stored globally (like `$m = $~`), even if it is accessed in that method.
There might be still be an allocation of the internal data structure representing the group offsets, if the Regexp sometimes matches and sometimes not (but tail duplication can fix this in some cases, e.g. if there is not too much code to duplicate).

If there are one or more blocks around the regexp match, then `$~` is stored in the method's frame rather than the block's frame, so it is allocated unless a single compilation covers the method and inlines everything down to that block.

The case I checked is:
```ruby
def foo
  "a" =~ /(a)/
  $1
end

loop { foo() }
```
with `cd truffleruby && chruby truffleruby+graalvm-24.0.2 && jt -u ruby graph test.rb` and that shows the only allocation is for a new Ruby String (the return value).
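
For reference, the three situations described above can be written down as a minimal sketch. The method names and the `$leaked` global are hypothetical and only illustrate where `$~` ends up; the sketch does not measure what any particular compiler actually allocates.

```ruby
# Case 1: the match data is read but never leaves the method, so it is a
# candidate for escape analysis.
def local_match
  "a" =~ /(a)/
  $1
end

# Case 2 (hypothetical): storing $~ in a global keeps the MatchData reachable
# after the method returns, so it has to exist as a real object.
def leaked_match
  "a" =~ /(a)/
  $leaked = $~
  $1
end

# Case 3: the match happens inside a block, but $~ belongs to the enclosing
# method's frame, so the method body can still read it afterwards.
def match_in_block
  [1].each { "a" =~ /(a)/ }
  $1
end

p local_match
p leaked_match
p match_in_block
```

All three methods return the same string; they differ only in whether the MatchData can be observed from outside the frame that produced it.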

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109275

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118748] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (22 preceding siblings ...)
  2024-07-30 11:37 ` [ruby-core:118736] " Eregon (Benoit Daloze) via ruby-core
@ 2024-07-30 15:21 ` byroot (Jean Boussier) via ruby-core
  2024-07-30 15:32 ` [ruby-core:118749] " Dan0042 (Daniel DeLorme) via ruby-core
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-07-30 15:21 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20652 has been updated by byroot (Jean Boussier).


Right, so it's not as simple as marking the ISeq as not needing the backref because it doesn't use `getspecial`.

I think we could only realistically do it in MRI if we accepted that `$~` and such wouldn't be accessible from `eval`. Could be worth asking at the developer meeting, but I'd be surprised if it was accepted.
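
To illustrate why `eval` is the obstacle, here is a minimal sketch. `match_then_eval` is a hypothetical helper: its body never mentions `$~` or `$1`, yet code evaluated inside it can still read the backref of that frame.

```ruby
# Hypothetical helper: nothing in this method body refers to $~ or $1.
def match_then_eval(code)
  "a" =~ /(a)/
  eval(code) # the evaluated string still sees this frame's backref
end

p match_then_eval("$~")
p match_then_eval("$1") # => "a"
```

So a check on the method's own instructions cannot tell whether the backref will be needed, unless `eval` loses access to it.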

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109290

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:118749] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (23 preceding siblings ...)
  2024-07-30 15:21 ` [ruby-core:118748] " byroot (Jean Boussier) via ruby-core
@ 2024-07-30 15:32 ` Dan0042 (Daniel DeLorme) via ruby-core
  2025-02-24 17:33 ` [ruby-core:121152] " byroot (Jean Boussier) via ruby-core
  2025-02-24 17:35 ` [ruby-core:121153] " byroot (Jean Boussier) via ruby-core
  26 siblings, 0 replies; 28+ messages in thread
From: Dan0042 (Daniel DeLorme) via ruby-core @ 2024-07-30 15:32 UTC (permalink / raw)
  To: ruby-core; +Cc: Dan0042 (Daniel DeLorme)

Issue #20652 has been updated by Dan0042 (Daniel DeLorme).


> I'd be surprised if it was accepted.

Same here. Although perhaps I should clarify that `$~` would be accessible in eval if also present as a literal in the method.



----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-109292

* Author: orisano (Nao Yonashiro)
* Status: Open
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:121152] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (24 preceding siblings ...)
  2024-07-30 15:32 ` [ruby-core:118749] " Dan0042 (Daniel DeLorme) via ruby-core
@ 2025-02-24 17:33 ` byroot (Jean Boussier) via ruby-core
  2025-02-24 17:35 ` [ruby-core:121153] " byroot (Jean Boussier) via ruby-core
  26 siblings, 0 replies; 28+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2025-02-24 17:33 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20652 has been updated by byroot (Jean Boussier).


The commit says the issue is partially fixed, but I made a follow-up that elides the MatchData allocation in almost every case. So I think this issue can be considered fully fixed.

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-112089

* Author: orisano (Nao Yonashiro)
* Status: Closed
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [ruby-core:121153] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3
  2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
                   ` (25 preceding siblings ...)
  2025-02-24 17:33 ` [ruby-core:121152] " byroot (Jean Boussier) via ruby-core
@ 2025-02-24 17:35 ` byroot (Jean Boussier) via ruby-core
  26 siblings, 0 replies; 28+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2025-02-24 17:35 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20652 has been updated by byroot (Jean Boussier).


Sorry, I wasn't clear. The allocation is elided if the previous MatchData wasn't accessed. If it was, I'm afraid there's nothing we can safely do.
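
A rough way to observe the difference on a given Ruby build is sketched below. `match_data_allocated` is a hypothetical helper, not part of Ruby; it assumes that the change in `ObjectSpace.count_objects[:T_MATCH]` with GC disabled approximates the number of MatchData objects created. The numbers depend on the Ruby version, so none are shown.

```ruby
# Hypothetical helper: returns how many MatchData slots (T_MATCH) appeared
# while the block ran. GC is disabled so nothing is reclaimed mid-count.
def match_data_allocated
  GC.start
  was_disabled = GC.disable
  before = ObjectSpace.count_objects[:T_MATCH] || 0
  yield
  after = ObjectSpace.count_objects[:T_MATCH] || 0
  after - before
ensure
  GC.enable unless was_disabled
end

s  = "foo" + " " * 14
re = / (\s+)/

# The block never looks at the match data.
untouched = match_data_allocated do
  1_000.times { s.gsub(re) { " x" } }
end

# The block reads the match data, as in the original report.
accessed = match_data_allocated do
  1_000.times { s.gsub(re) { " #{'&nbsp;' * Regexp.last_match(1).length}" } }
end

puts "block ignores the match: #{untouched} MatchData"
puts "block reads the match:   #{accessed} MatchData"
```

Running the same script under different Ruby versions should make it visible whether the case where the MatchData is accessed still allocates.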

----------------------------------------
Misc #20652: Memory allocation for gsub has increased from Ruby 2.7 to 3.3
https://bugs.ruby-lang.org/issues/20652#change-112090

* Author: orisano (Nao Yonashiro)
* Status: Closed
* Assignee: jeremyevans0 (Jeremy Evans)
----------------------------------------

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2025-02-24 17:35 UTC | newest]

Thread overview: 28+ messages
2024-07-25 10:11 [ruby-core:118682] [Ruby master Misc#20652] Memory allocation for gsub has increased from Ruby 2.7 to 3.3 orisano (Nao Yonashiro) via ruby-core
2024-07-25 11:13 ` [ruby-core:118683] " mame (Yusuke Endoh) via ruby-core
2024-07-25 14:18 ` [ruby-core:118685] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-07-26  6:15 ` [ruby-core:118689] " byroot (Jean Boussier) via ruby-core
2024-07-26  7:19 ` [ruby-core:118691] " shyouhei (Shyouhei Urabe) via ruby-core
2024-07-26  9:23 ` [ruby-core:118692] " byroot (Jean Boussier) via ruby-core
2024-07-26 11:09 ` [ruby-core:118693] " Eregon (Benoit Daloze) via ruby-core
2024-07-26 12:05 ` [ruby-core:118694] " ko1 (Koichi Sasada) via ruby-core
2024-07-26 12:09 ` [ruby-core:118695] " byroot (Jean Boussier) via ruby-core
2024-07-26 13:54 ` [ruby-core:118697] " austin (Austin Ziegler) via ruby-core
2024-07-26 17:04 ` [ruby-core:118702] " ko1 (Koichi Sasada) via ruby-core
2024-07-27  7:00 ` [ruby-core:118707] " byroot (Jean Boussier) via ruby-core
2024-07-27  7:09 ` [ruby-core:118708] " byroot (Jean Boussier) via ruby-core
2024-07-27 10:58 ` [ruby-core:118709] " Eregon (Benoit Daloze) via ruby-core
2024-07-27 11:05 ` [ruby-core:118710] " Eregon (Benoit Daloze) via ruby-core
2024-07-27 22:04 ` [ruby-core:118711] " ko1 (Koichi Sasada) via ruby-core
2024-07-28 11:57 ` [ruby-core:118716] " byroot (Jean Boussier) via ruby-core
2024-07-29 17:39 ` [ruby-core:118720] " Dan0042 (Daniel DeLorme) via ruby-core
2024-07-29 19:19 ` [ruby-core:118722] " Dan0042 (Daniel DeLorme) via ruby-core
2024-07-29 19:40 ` [ruby-core:118723] " jeremyevans0 (Jeremy Evans) via ruby-core
2024-07-29 21:29 ` [ruby-core:118724] " byroot (Jean Boussier) via ruby-core
2024-07-30  5:18 ` [ruby-core:118728] " ko1 (Koichi Sasada) via ruby-core
2024-07-30  5:25 ` [ruby-core:118729] " byroot (Jean Boussier) via ruby-core
2024-07-30 11:37 ` [ruby-core:118736] " Eregon (Benoit Daloze) via ruby-core
2024-07-30 15:21 ` [ruby-core:118748] " byroot (Jean Boussier) via ruby-core
2024-07-30 15:32 ` [ruby-core:118749] " Dan0042 (Daniel DeLorme) via ruby-core
2025-02-24 17:33 ` [ruby-core:121152] " byroot (Jean Boussier) via ruby-core
2025-02-24 17:35 ` [ruby-core:121153] " byroot (Jean Boussier) via ruby-core
