ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:118998] [Ruby master Feature#20709] Improve String#rindex performance on OSX
@ 2024-08-31 18:05 zack.ref@gmail.com (Zack Deveau) via ruby-core
  2024-09-03 18:28 ` [ruby-core:119029] " jhawthorn (John Hawthorn) via ruby-core
  0 siblings, 1 reply; 2+ messages in thread
From: zack.ref@gmail.com (Zack Deveau) via ruby-core @ 2024-08-31 18:05 UTC (permalink / raw)
  To: ruby-core; +Cc: zack.ref@gmail.com (Zack Deveau)

Issue #20709 has been reported by zack.ref@gmail.com (Zack Deveau).

----------------------------------------
Feature #20709: Improve String#rindex performance on OSX
https://bugs.ruby-lang.org/issues/20709

* Author: zack.ref@gmail.com (Zack Deveau)
* Status: Open
----------------------------------------
`String#rindex` is much slower on OSX than on Linux hosts. This appears due to the lack of a `memrchr` implementation on OSX. The [fallback solution](https://github.com/ruby/ruby/blob/e69945fc57efba5c8895c21235e109145865952d/string.c#L4348-L4402) currently performs a `memcmp` on each character in the search string looking for a substring match.

```c
    while (s) {
        if (memcmp(s, t, slen) == 0) {
            return s - sbeg;
        }
        if (s <= sbeg) break;
        s = rb_enc_prev_char(sbeg, s, e, enc);
    }
```


```
ruby-master | 10 char substring (long) near the end of a search string | osx

search string enc: UTF-8
search string size: 8010

                        user     system      total        real
index (long substring)  0.000075   0.000000   0.000075 (  0.000074)
rindex                  0.000445   0.000000   0.000445 (  0.000448)
```

Proposed improvement is available on this pull request:



This patch proposes using an implementation of `memrchr` made for the interpreter on hosts that lack a native implementation. Below are benchmark results which demonstrate that this approach either improves or maintains performance of `String#rindex` for all cases we could come up with.

```c
static void*
rb_memrchr(const char *search_str, int chr, long search_len)
{
    const char *ptr = search_str + search_len;
    do {
       if (*(--ptr) == chr) return (void *)ptr;
    } while (ptr >= search_str);

    return ((void *)0);
}

```

Special thanks to @jhawthorn who provided some helpful feedback on this solution during development!

# Benchmarks

Each of the following were made with a variant of the following general purpose benchmark. Note that `String#index` speed can be [impacted by the size of the provided substring](https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274) (Denoted as long/short substring in the results, assume long substring when none is specified).
```
require "benchmark"

substring = "98279827AA"
the_string = ""
800.times{ the_string.concat(('a'..'z').to_a.shuffle[0,8].join)}
the_string.concat(substring)
200.times{ the_string.concat(('a'..'z').to_a.shuffle[0,8].join)}
puts the_string.encoding
puts the_string.size

Benchmark.bm do |x|
  x.report(:index) { 100.times { the_string.index(substring) } }
  x.report(:rindex) { 100.times { the_string.rindex(substring) } }
end
```

```
ruby-master | substring near the end | osx
https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274

UTF-8

       user     system      total        real
index  0.000075   0.000000   0.000075 (  0.000074)
rindex  0.000445   0.000000   0.000445 (  0.000448)
------------------------------------------------------
ruby-patched | substring near the end | osx
https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274

UTF-8
                        user     system      total        real
index (short substring) 0.000400   0.000001   0.000401 (  0.000399)
index (long substring)  0.000080   0.000000   0.000080 (  0.000080)
rindex                  0.000057   0.000001   0.000058 (  0.000057)
```


```
ruby-master | substring in the middle | osx

UTF-8
       user     system      total        real
index  0.000031   0.000000   0.000031 (  0.000030)
rindex  0.001086   0.000002   0.001088 (  0.001088)
------------------------------------------------------
ruby-patched | substring in the middle | osx

UTF-8
       user     system      total        real
index  0.000075   0.000000   0.000075 (  0.000075)
rindex  0.000130   0.000000   0.000130 (  0.000133)
```

```
ruby-master | substring in the middle with a short substring | osx
https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274

UTF-8
                        user     system      total        real
index (short substring) 0.000243   0.000004   0.000247 (  0.000246)
rindex                  0.001176   0.000005   0.001181 (  0.001182)
------------------------------------------------------
ruby-patched | substring in the middle with a short substring | osx
https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274

UTF-8
                         user     system      total        real
index (short substring)  0.000255   0.000005   0.000260 (  0.000259)
rindex                   0.000130   0.000000   0.000130 (  0.000133)
```

```
ruby-master | substring at the end | osx

UTF-8
       user     system      total        real
index  0.000049   0.000001   0.000050 (  0.000048)
rindex  0.000008   0.000001   0.000009 (  0.000007)
------------------------------------------------------
ruby-patched | substring at the end | osx

UTF-8
       user     system      total        real
index  0.000075   0.000000   0.000075 (  0.000074)
rindex  0.000006   0.000000   0.000006 (  0.000006)
```


```
ruby-master | substring at the start | osx

UTF-8
       user     system      total        real
index  0.000008   0.000001   0.000009 (  0.000007)
rindex  0.002444   0.000008   0.002452 (  0.002452)
------------------------------------------------------
ruby-patched | substring at the start | osx

UTF-8
       user     system      total        real
index  0.000015   0.000000   0.000015 (  0.000015)
rindex  0.000280   0.000001   0.000281 (  0.000280)
```



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 2+ messages in thread

* [ruby-core:119029] [Ruby master Feature#20709] Improve String#rindex performance on OSX
  2024-08-31 18:05 [ruby-core:118998] [Ruby master Feature#20709] Improve String#rindex performance on OSX zack.ref@gmail.com (Zack Deveau) via ruby-core
@ 2024-09-03 18:28 ` jhawthorn (John Hawthorn) via ruby-core
  0 siblings, 0 replies; 2+ messages in thread
From: jhawthorn (John Hawthorn) via ruby-core @ 2024-09-03 18:28 UTC (permalink / raw)
  To: ruby-core; +Cc: jhawthorn (John Hawthorn)

Issue #20709 has been updated by jhawthorn (John Hawthorn).

Status changed from Open to Closed

Merged in https://github.com/ruby/ruby/commit/e7cb70be4eb7411204f73ee748e317fefaa0410a

----------------------------------------
Feature #20709: Improve String#rindex performance on OSX
https://bugs.ruby-lang.org/issues/20709#change-109607

* Author: zack.ref@gmail.com (Zack Deveau)
* Status: Closed
----------------------------------------
`String#rindex` is much slower on OSX than on Linux hosts. This appears due to the lack of a `memrchr` implementation on OSX. The [fallback solution](https://github.com/ruby/ruby/blob/e69945fc57efba5c8895c21235e109145865952d/string.c#L4348-L4402) currently performs a `memcmp` on each character in the search string looking for a substring match.

```c
    while (s) {
        if (memcmp(s, t, slen) == 0) {
            return s - sbeg;
        }
        if (s <= sbeg) break;
        s = rb_enc_prev_char(sbeg, s, e, enc);
    }
```


```
ruby-master | 10 char substring (long) near the end of a search string | osx

search string enc: UTF-8
search string size: 8010

                        user     system      total        real
index (long substring)  0.000075   0.000000   0.000075 (  0.000074)
rindex                  0.000445   0.000000   0.000445 (  0.000448)
```

Proposed improvement is available on this pull request:
- https://github.com/ruby/ruby/pull/11519


This patch proposes using an implementation of `memrchr` made for the interpreter on hosts that lack a native implementation. Below are benchmark results which demonstrate that this approach either improves or maintains performance of `String#rindex` for all cases we could come up with.

```c
static void*
rb_memrchr(const char *search_str, int chr, long search_len)
{
    const char *ptr = search_str + search_len;
    do {
       if (*(--ptr) == chr) return (void *)ptr;
    } while (ptr >= search_str);

    return ((void *)0);
}

```

Special thanks to @jhawthorn who provided some helpful feedback on this solution during development!

# Benchmarks

Each of the following were made with a variant of the following general purpose benchmark. Note that `String#index` speed can be [impacted by the size of the provided substring](https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274) (Denoted as long/short substring in the results, assume long substring when none is specified).
```
require "benchmark"

substring = "98279827AA"
the_string = ""
800.times{ the_string.concat(('a'..'z').to_a.shuffle[0,8].join)}
the_string.concat(substring)
200.times{ the_string.concat(('a'..'z').to_a.shuffle[0,8].join)}
puts the_string.encoding
puts the_string.size

Benchmark.bm do |x|
  x.report(:index) { 100.times { the_string.index(substring) } }
  x.report(:rindex) { 100.times { the_string.rindex(substring) } }
end
```

```
ruby-master | substring near the end | osx
https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274

UTF-8

       user     system      total        real
index  0.000075   0.000000   0.000075 (  0.000074)
rindex  0.000445   0.000000   0.000445 (  0.000448)
------------------------------------------------------
ruby-patched | substring near the end | osx
https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274

UTF-8
                        user     system      total        real
index (short substring) 0.000400   0.000001   0.000401 (  0.000399)
index (long substring)  0.000080   0.000000   0.000080 (  0.000080)
rindex                  0.000057   0.000001   0.000058 (  0.000057)
```


```
ruby-master | substring in the middle | osx

UTF-8
       user     system      total        real
index  0.000031   0.000000   0.000031 (  0.000030)
rindex  0.001086   0.000002   0.001088 (  0.001088)
------------------------------------------------------
ruby-patched | substring in the middle | osx

UTF-8
       user     system      total        real
index  0.000075   0.000000   0.000075 (  0.000075)
rindex  0.000130   0.000000   0.000130 (  0.000133)
```

```
ruby-master | substring in the middle with a short substring | osx
https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274

UTF-8
                        user     system      total        real
index (short substring) 0.000243   0.000004   0.000247 (  0.000246)
rindex                  0.001176   0.000005   0.001181 (  0.001182)
------------------------------------------------------
ruby-patched | substring in the middle with a short substring | osx
https://github.com/ruby/ruby/blob/669d1f79d800d554918c7d5647b73a6431920ea8/re.c#L272-L274

UTF-8
                         user     system      total        real
index (short substring)  0.000255   0.000005   0.000260 (  0.000259)
rindex                   0.000130   0.000000   0.000130 (  0.000133)
```

```
ruby-master | substring at the end | osx

UTF-8
       user     system      total        real
index  0.000049   0.000001   0.000050 (  0.000048)
rindex  0.000008   0.000001   0.000009 (  0.000007)
------------------------------------------------------
ruby-patched | substring at the end | osx

UTF-8
       user     system      total        real
index  0.000075   0.000000   0.000075 (  0.000074)
rindex  0.000006   0.000000   0.000006 (  0.000006)
```


```
ruby-master | substring at the start | osx

UTF-8
       user     system      total        real
index  0.000008   0.000001   0.000009 (  0.000007)
rindex  0.002444   0.000008   0.002452 (  0.002452)
------------------------------------------------------
ruby-patched | substring at the start | osx

UTF-8
       user     system      total        real
index  0.000015   0.000000   0.000015 (  0.000015)
rindex  0.000280   0.000001   0.000281 (  0.000280)
```



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2024-09-03 18:28 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-31 18:05 [ruby-core:118998] [Ruby master Feature#20709] Improve String#rindex performance on OSX zack.ref@gmail.com (Zack Deveau) via ruby-core
2024-09-03 18:28 ` [ruby-core:119029] " jhawthorn (John Hawthorn) via ruby-core

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).