ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:124478] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3?
@ 2026-01-12  3:34 samyron (Scott Myron) via ruby-core
  2026-01-12  9:15 ` [ruby-core:124481] " byroot (Jean Boussier) via ruby-core
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: samyron (Scott Myron) via ruby-core @ 2026-01-12  3:34 UTC (permalink / raw)
  To: ruby-core; +Cc: samyron (Scott Myron)

Issue #21833 has been reported by samyron (Scott Myron).

----------------------------------------
Misc #21833: Switch default hash from SipHash13 to XXH3?
https://bugs.ruby-lang.org/issues/21833

* Author: samyron (Scott Myron)
* Status: Open
----------------------------------------
Has there been any consideration of switching to some other hash implementation?

I've searched through the issues and haven't found anything related to switching the default hash from SipHash13 to anything else. 

I created a [branch](https://github.com/ruby/ruby/compare/master...samyron:ruby:sm/xxh3) which switched `rb_memhash` from SipHash13 to [XXH3](https://github.com/Cyan4973/xxHash).

I created a few simple benchmarks and ran them on my M1 MacBook Air. The results are very promising.

```
% cat ~/string_hash.yml 
prelude: |
  # Generate sets of random lowercase strings, from tiny (3 bytes) to huge (65536 bytes)
  TINY_STRINGS = Array.new(100) { Array.new(3).map { (97 + rand(26)).chr }.join }.freeze
  SMALL_STRINGS = Array.new(100) { Array.new(8).map { (97 + rand(26)).chr }.join }.freeze
  MED_STRINGS  = Array.new(100) { Array.new(20).map { (97 + rand(26)).chr }.join }.freeze
  LARGE_STRINGS  = Array.new(100) { Array.new(200).map { (97 + rand(26)).chr }.join }.freeze
  HUGE_STRINGS  = Array.new(100) { Array.new(65536).map { (97 + rand(26)).chr }.join }.freeze

benchmark:
  tiny_strings: |
    TINY_STRINGS.each { |s| s.hash }
  
  small_strings: |
    SMALL_STRINGS.each { |s| s.hash }

  medium_strings: |
    MED_STRINGS.each { |s| s.hash }

  large_strings: |
    LARGE_STRINGS.each { |s| s.hash }

  huge_strings: |
    HUGE_STRINGS.each { |s| s.hash }

% benchmark-driver ~/string_hash.yml \
  -e ruby-master::~/.rubies/ruby-master/bin/ruby \
  -e ruby-xxhash::~/.rubies/ruby-xxhash/bin/ruby  \
  --output compare
Warming up --------------------------------------
        tiny_strings    262.513k i/s -    283.844k times in 1.081258s (3.81μs/i)
       small_strings    259.803k i/s -    280.445k times in 1.079454s (3.85μs/i)
      medium_strings    249.553k i/s -    267.531k times in 1.072041s (4.01μs/i)
       large_strings    116.426k i/s -    126.005k times in 1.082275s (8.59μs/i)
        huge_strings     498.481 i/s -     500.000 times in 1.003047s (2.01ms/i)
Calculating -------------------------------------
                     ruby-master  ruby-xxhash 
        tiny_strings    264.070k     288.960k i/s -    787.538k times in 2.982305s 2.725421s
       small_strings    259.941k     286.229k i/s -    779.407k times in 2.998394s 2.723019s
      medium_strings    249.249k     283.952k i/s -    748.658k times in 3.003655s 2.636561s
       large_strings    116.572k     240.823k i/s -    349.278k times in 2.996244s 1.450351s
        huge_strings     500.164       5.296k i/s -      1.495k times in 2.989019s 0.282263s

Comparison:
                     tiny_strings
         ruby-xxhash:    288960.1 i/s 
         ruby-master:    264070.2 i/s - 1.09x  slower

                    small_strings
         ruby-xxhash:    286229.0 i/s 
         ruby-master:    259941.5 i/s - 1.10x  slower

                   medium_strings
         ruby-xxhash:    283952.5 i/s 
         ruby-master:    249249.0 i/s - 1.14x  slower

                    large_strings
         ruby-xxhash:    240823.1 i/s 
         ruby-master:    116571.9 i/s - 2.07x  slower

                     huge_strings
         ruby-xxhash:      5296.5 i/s 
         ruby-master:       500.2 i/s - 10.59x  slower
```
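
To put the numbers above in perspective, here is a rough sketch that recomputes the speedup and an approximate hashing throughput from the reported i/s figures. The string sizes and the 100 strings per iteration come from the prelude; the constant and helper names are made up for illustration. The throughput includes the `each` block and method dispatch overhead, so it is only a lower bound on what the hash function itself does, and that overhead probably dominates the tiny and small cases.

```ruby
# Iterations per second copied from the "Comparison" output above (M1 MacBook Air).
# :bytes is the length of each string in that benchmark's prelude array.
RESULTS = {
  "tiny_strings"   => { bytes: 3,      master: 264_070.2, xxh3: 288_960.1 },
  "small_strings"  => { bytes: 8,      master: 259_941.5, xxh3: 286_229.0 },
  "medium_strings" => { bytes: 20,     master: 249_249.0, xxh3: 283_952.5 },
  "large_strings"  => { bytes: 200,    master: 116_571.9, xxh3: 240_823.1 },
  "huge_strings"   => { bytes: 65_536, master: 500.2,     xxh3: 5_296.5 },
}.freeze

# Each benchmark iteration hashes every string in a 100-element array.
STRINGS_PER_ITERATION = 100

# How many times faster the XXH3 build is than master.
def speedup(row)
  row[:xxh3] / row[:master]
end

# Approximate bytes hashed per second for one build (:master or :xxh3).
def bytes_per_second(row, build)
  row[build] * STRINGS_PER_ITERATION * row[:bytes]
end

RESULTS.each do |name, row|
  printf("%-15s %5.2fx  master %9.1f MB/s  xxh3 %9.1f MB/s\n",
         name,
         speedup(row),
         bytes_per_second(row, :master) / 1e6,
         bytes_per_second(row, :xxh3) / 1e6)
end
```
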

Running something a bit more real-world:

```
% cat ~/json_parse.yml 
prelude: |
  require 'json'
  activitypub_json_txt = File.read("/Users/scott/Development/json/benchmark/data/activitypub.json")
  twitter_json_txt = File.read("/Users/scott/Development/json/benchmark/data/twitter.json")
  citm_catalog_json_txt = File.read("/Users/scott/Development/json/benchmark/data/citm_catalog.json")
  ohai_json_txt = File.read("/Users/scott/Development/json/benchmark/data/ohai.json")

benchmark:
  parse_activitypub_json: |
    JSON.parse(activitypub_json_txt)
  parse_twitter_json_txt: |
    JSON.parse(twitter_json_txt)
  parse_citm_catalog_json_txt: |
    JSON.parse(citm_catalog_json_txt)
  parse_ohai_json_txt: |
    JSON.parse(ohai_json_txt)

% benchmark-driver ~/json_parse.yml \
  -e ruby-master::~/.rubies/ruby-master/bin/ruby \
  -e ruby-xxhash::~/.rubies/ruby-xxhash/bin/ruby  \
  --output compare
Warming up --------------------------------------
     parse_activitypub_json     10.969k i/s -     12.023k times in 1.096043s (91.16μs/i)
     parse_twitter_json_txt      1.169k i/s -      1.265k times in 1.082330s (855.60μs/i)
parse_citm_catalog_json_txt     591.782 i/s -     600.000 times in 1.013887s (1.69ms/i)
        parse_ohai_json_txt     12.000k i/s -     12.782k times in 1.065168s (83.33μs/i)
Calculating -------------------------------------
                            ruby-master  ruby-xxhash 
     parse_activitypub_json     10.986k      11.071k i/s -     32.908k times in 2.995440s 2.972542s
     parse_twitter_json_txt      1.162k       1.172k i/s -      3.506k times in 3.016331s 2.991486s
parse_citm_catalog_json_txt     588.758      601.926 i/s -      1.775k times in 3.014820s 2.948868s
        parse_ohai_json_txt     10.747k      12.400k i/s -     35.999k times in 3.349753s 2.903138s

Comparison:
                  parse_activitypub_json
                ruby-xxhash:     11070.7 i/s 
                ruby-master:     10986.0 i/s - 1.01x  slower

                  parse_twitter_json_txt
                ruby-xxhash:      1172.0 i/s 
                ruby-master:      1162.3 i/s - 1.01x  slower

             parse_citm_catalog_json_txt
                ruby-xxhash:       601.9 i/s 
                ruby-master:       588.8 i/s - 1.02x  slower

                     parse_ohai_json_txt
                ruby-xxhash:     12400.0 i/s 
                ruby-master:     10746.8 i/s - 1.15x  slower
```

Admittedly, I'm not a hash expert nor a cryptographer. There don't seem to be any known vulnerabilities with XXH3 that I have found.




-- 
https://bugs.ruby-lang.org/
______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/


* [ruby-core:124481] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3?
  2026-01-12  3:34 [ruby-core:124478] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3? samyron (Scott Myron) via ruby-core
@ 2026-01-12  9:15 ` byroot (Jean Boussier) via ruby-core
  2026-01-12 15:00 ` [ruby-core:124483] " samyron (Scott Myron) via ruby-core
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2026-01-12  9:15 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #21833 has been updated by byroot (Jean Boussier).


> Has there been any consideration of switching to some other hash implementation?

There have been a few in the past, e.g. [Feature #16851].

> Admittedly, I'm not a hash expert nor a cryptographer. There don't seem to be any known vulnerabilities with XXH3 that I have found.

Well, the main concern is HashDOS, but looking at your branch, it seems you seed the hash function, so it's fine on that front.
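
For readers unfamiliar with what seeding buys here, a small sketch: `String#hash` is stable within one process but is keyed with a per-process random seed, so an attacker cannot precompute colliding keys offline. The helper name `hash_in_new_process` is made up for this example, and it assumes the running interpreter is allowed to spawn a child process.

```ruby
require "rbconfig"

# Ask a fresh Ruby process (same interpreter binary) for the hash of a string.
def hash_in_new_process(str)
  IO.popen([RbConfig.ruby, "-e", "print ARGV[0].hash", str], &:read).to_i
end

key = "ruby-core"

# Within one process, equal strings always hash to the same value.
same_process = [key.hash, key.dup.hash]

# Each new process picks its own seed, so the values are expected to differ.
other_processes = [hash_in_new_process(key), hash_in_new_process(key)]

puts "same process:    #{same_process.inspect}"
puts "other processes: #{other_processes.inspect}"
```

Any replacement for SipHash13 has to keep this property, which is why the seeded or secret-keyed XXH3 variants matter rather than the plain unseeded one.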

----------------------------------------
Misc #21833: Switch default hash from SipHash13 to XXH3?
https://bugs.ruby-lang.org/issues/21833#change-116031


* [ruby-core:124483] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3?
  2026-01-12  3:34 [ruby-core:124478] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3? samyron (Scott Myron) via ruby-core
  2026-01-12  9:15 ` [ruby-core:124481] " byroot (Jean Boussier) via ruby-core
@ 2026-01-12 15:00 ` samyron (Scott Myron) via ruby-core
  2026-01-12 15:09 ` [ruby-core:124484] " samyron (Scott Myron) via ruby-core
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: samyron (Scott Myron) via ruby-core @ 2026-01-12 15:00 UTC (permalink / raw)
  To: ruby-core; +Cc: samyron (Scott Myron)

Issue #21833 has been updated by samyron (Scott Myron).


The same benchmarks on an M4 Pro:

```
benchmark-driver ~/Downloads/hash_strings.yml -e "ruby-master::~/.rubies/ruby-master/bin/ruby" -e "ruby-xxhash::~/.rubies/ruby-xxhash/bin/ruby"
<snip>
Comparison:
               tiny_hash_creation
         ruby-xxhash:     12650.7 i/s 
         ruby-master:     11909.3 i/s - 1.06x  slower

                med_hash_creation
         ruby-xxhash:     13716.7 i/s 
         ruby-master:     12271.1 i/s - 1.12x  slower

              large_hash_creation
         ruby-xxhash:     11178.4 i/s 
         ruby-master:      7120.5 i/s - 1.57x  slower

               huge_hash_creation
         ruby-xxhash:       235.8 i/s 
         ruby-master:        43.8 i/s - 5.38x  slower
```

```
benchmark-driver ~/Downloads/json_parse.yml -e "ruby-master::~/.rubies/ruby-master/bin/ruby" -e "ruby-xxhash::~/.rubies/ruby-xxhash/bin/ruby"
<snip>
Comparison:
                  parse_activitypub_json
                ruby-xxhash:     16495.3 i/s 
                ruby-master:     16192.3 i/s - 1.02x  slower

                  parse_twitter_json_txt
                ruby-xxhash:      1828.2 i/s 
                ruby-master:      1774.9 i/s - 1.03x  slower

             parse_citm_catalog_json_txt
                ruby-xxhash:       881.7 i/s 
                ruby-master:       844.5 i/s - 1.04x  slower

                     parse_ohai_json_txt
                ruby-xxhash:     18377.8 i/s 
                ruby-master:     17193.5 i/s - 1.07x  slower
```



----------------------------------------
Misc #21833: Switch default hash from SipHash13 to XXH3?
https://bugs.ruby-lang.org/issues/21833#change-116033


* [ruby-core:124484] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3?
  2026-01-12  3:34 [ruby-core:124478] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3? samyron (Scott Myron) via ruby-core
  2026-01-12  9:15 ` [ruby-core:124481] " byroot (Jean Boussier) via ruby-core
  2026-01-12 15:00 ` [ruby-core:124483] " samyron (Scott Myron) via ruby-core
@ 2026-01-12 15:09 ` samyron (Scott Myron) via ruby-core
  2026-01-12 20:10 ` [ruby-core:124486] " bdewater (Bart de Water) via ruby-core
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: samyron (Scott Myron) via ruby-core @ 2026-01-12 15:09 UTC (permalink / raw)
  To: ruby-core; +Cc: samyron (Scott Myron)

Issue #21833 has been updated by samyron (Scott Myron).


byroot (Jean Boussier) wrote in #note-2:
> Well, the main concern is HashDOS, but looking at your branch, it seems you seed the hash function, so it's fine on that front.

I did note the HashDOS conversation on https://bugs.ruby-lang.org/issues/13017, so I used `XXH3_64bits_withSecret` as an attempt to mitigate HashDOS by using the default secret and seed.

Reading through the xxhash docs, I think `XXH3_64bits_withSecretandSeed` _might_ even be faster, but I have not tried it yet.

----------------------------------------
Misc #21833: Switch default hash from SipHash13 to XXH3?
https://bugs.ruby-lang.org/issues/21833#change-116034


* [ruby-core:124486] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3?
  2026-01-12  3:34 [ruby-core:124478] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3? samyron (Scott Myron) via ruby-core
                   ` (2 preceding siblings ...)
  2026-01-12 15:09 ` [ruby-core:124484] " samyron (Scott Myron) via ruby-core
@ 2026-01-12 20:10 ` bdewater (Bart de Water) via ruby-core
  2026-01-19 23:27   ` [ruby-core:124593] " Vladimir Makarov via ruby-core
  2026-01-12 20:20 ` [ruby-core:124487] " samyron (Scott Myron) via ruby-core
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 10+ messages in thread
From: bdewater (Bart de Water) via ruby-core @ 2026-01-12 20:10 UTC (permalink / raw)
  To: ruby-core; +Cc: bdewater (Bart de Water)

Issue #21833 has been updated by bdewater (Bart de Water).


FWIW 
- https://github.com/Nicoshev/rapidhash claims to be even faster and passes the SMHasher tests
- Since Rust 1.36 they switched from SipHash13 to https://github.com/rust-lang/hashbrown for hashmaps

----------------------------------------
Misc #21833: Switch default hash from SipHash13 to XXH3?
https://bugs.ruby-lang.org/issues/21833#change-116038

* Author: samyron (Scott Myron)
* Status: Open
----------------------------------------

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:124487] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3?
  2026-01-12  3:34 [ruby-core:124478] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3? samyron (Scott Myron) via ruby-core
                   ` (3 preceding siblings ...)
  2026-01-12 20:10 ` [ruby-core:124486] " bdewater (Bart de Water) via ruby-core
@ 2026-01-12 20:20 ` samyron (Scott Myron) via ruby-core
  2026-01-13 12:50 ` [ruby-core:124515] " samyron (Scott Myron) via ruby-core
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: samyron (Scott Myron) via ruby-core @ 2026-01-12 20:20 UTC (permalink / raw)
  To: ruby-core; +Cc: samyron (Scott Myron)

Issue #21833 has been updated by samyron (Scott Myron).


bdewater (Bart de Water) wrote in #note-6:
> FWIW 
> - https://github.com/Nicoshev/rapidhash claims to be even faster and passes the SMHasher tests
> - Since Rust 1.36 they switched from SipHash13 to https://github.com/rust-lang/hashbrown for hashmaps

I can give rapidhash a try. It's based on wyHash, which Go uses (at least in some cases): https://github.com/golang/go/blob/cbe153806e67a16e362a1cdbbf1741d4ce82e98a/src/runtime/hash64.go

----------------------------------------
Misc #21833: Switch default hash from SipHash13 to XXH3?
https://bugs.ruby-lang.org/issues/21833#change-116039

* Author: samyron (Scott Myron)
* Status: Open
----------------------------------------

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:124515] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3?
  2026-01-12  3:34 [ruby-core:124478] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3? samyron (Scott Myron) via ruby-core
                   ` (4 preceding siblings ...)
  2026-01-12 20:20 ` [ruby-core:124487] " samyron (Scott Myron) via ruby-core
@ 2026-01-13 12:50 ` samyron (Scott Myron) via ruby-core
  2026-02-11  3:56 ` [ruby-core:124766] " samyron (Scott Myron) via ruby-core
  2026-02-11  7:21 ` [ruby-core:124767] " byroot (Jean Boussier) via ruby-core
  7 siblings, 0 replies; 10+ messages in thread
From: samyron (Scott Myron) via ruby-core @ 2026-01-13 12:50 UTC (permalink / raw)
  To: ruby-core; +Cc: samyron (Scott Myron)

Issue #21833 has been updated by samyron (Scott Myron).


[rapidhash](https://github.com/Nicoshev/rapidhash) is (mostly) faster than XXH3 on my M1 MacBook Air. The exception is `large_hash_creation`, where two runs have reported xxhash as faster than rapidhash.

Note that rapidhash uses a single 64-bit seed, whereas XXH3 uses a 136-byte secret.

Hashing strings:
```
               tiny_hash_creation
      ruby-rapidhash:      9267.6 i/s 
         ruby-xxhash:      8970.8 i/s - 1.03x  slower
         ruby-master:      8329.4 i/s - 1.11x  slower

                med_hash_creation
      ruby-rapidhash:      9276.3 i/s 
         ruby-xxhash:      9274.3 i/s - 1.00x  slower
         ruby-master:      8097.3 i/s - 1.15x  slower

              large_hash_creation
         ruby-xxhash:      7758.0 i/s 
      ruby-rapidhash:      7597.1 i/s - 1.02x  slower
         ruby-master:      4318.7 i/s - 1.80x  slower

               huge_hash_creation
      ruby-rapidhash:       187.1 i/s 
         ruby-xxhash:       165.4 i/s - 1.13x  slower
         ruby-master:        25.1 i/s - 7.45x  slower
```


JSON Parsing:
```
Comparison:
                  parse_activitypub_json
             ruby-rapidhash:     11210.7 i/s 
                ruby-xxhash:     11186.6 i/s - 1.00x  slower
                ruby-master:     11146.3 i/s - 1.01x  slower

                  parse_twitter_json_txt
             ruby-rapidhash:      1199.5 i/s 
                ruby-xxhash:      1186.2 i/s - 1.01x  slower
                ruby-master:      1169.7 i/s - 1.03x  slower

             parse_citm_catalog_json_txt
                ruby-xxhash:       611.1 i/s 
             ruby-rapidhash:       609.1 i/s - 1.00x  slower
                ruby-master:       595.1 i/s - 1.03x  slower

                     parse_ohai_json_txt
             ruby-rapidhash:     12557.8 i/s 
                ruby-xxhash:     12365.8 i/s - 1.02x  slower
                ruby-master:     10824.1 i/s - 1.16x  slower

```

----------------------------------------
Misc #21833: Switch default hash from SipHash13 to XXH3?
https://bugs.ruby-lang.org/issues/21833#change-116075

* Author: samyron (Scott Myron)
* Status: Open
----------------------------------------

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:124593] Re: [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3?
  2026-01-12 20:10 ` [ruby-core:124486] " bdewater (Bart de Water) via ruby-core
@ 2026-01-19 23:27   ` Vladimir Makarov via ruby-core
  0 siblings, 0 replies; 10+ messages in thread
From: Vladimir Makarov via ruby-core @ 2026-01-19 23:27 UTC (permalink / raw)
  To: ruby-core; +Cc: Vladimir Makarov


On 1/12/26 3:10 PM, bdewater (Bart de Water) via ruby-core wrote:
> Issue #21833 has been updated by bdewater (Bart de Water).
>
>
> FWIW
> - https://github.com/Nicoshev/rapidhash claims to be even faster and passes the SMHasher tests
> - Since Rust 1.36 they switched from SipHash13 to https://github.com/rust-lang/hashbrown for hashmaps
>
Most of the fastest hash functions are based on multiplications as a
fast and portable way to mix data value bits. Instead of mixing N bits
at a time, you mix NxN bits with a single instruction. However, this
is no longer sufficient: the fastest hash functions now mix two data
words with a single multiplication.

rapidhash, wyHash, and xxHash are exactly this kind of function.

rapidhash and wyHash are very weak in terms of collision
resistance. Please, just look at

https://github.com/wangyi-fudan/wyhash/blob/46cebe9dc4e51f94d0dca287733bc5a94f76a10d/wyhash.h#L130 
for wyHash

and 
https://github.com/Nicoshev/rapidhash/blob/d60698faa10916879f85b2799bfdc6996b94c2b7/rapidhash.h#L383 
for rapidhash

Basically, they contain the following code:

```
update(state, mum(data64[n]^constant,data64[n+1]^state))
```

where `mum` is `uint64 mum(uint64 a, uint64 b) {uint128 r=(uint128)a*b; return
(uint64)r ^ (uint64)(r>>64);}`

If `data64[n] == constant`, then `mum` returns zero independently of the
value of `data64[n + 1]`. As a result, it is easy to generate many
inputs with the same hash value, causing hash tables to exhibit
quadratic behavior and enabling denial-of-service attacks on servers
that use hash tables.
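
The zero-product case can be reproduced in a few lines of Ruby. This is a minimal sketch, not code taken from wyHash or rapidhash: `MASK64`, `CONSTANT`, `mum`, and `mix_pair` are illustrative names, and `CONSTANT` is an arbitrary value standing in for the algorithm's fixed constant.

```ruby
MASK64 = (1 << 64) - 1

# Arbitrary stand-in for the fixed constant XORed into the first data word
# (an assumption for illustration, not a value from any real hash function).
CONSTANT = 0x9E3779B97F4A7C15

# 64x64 -> 128-bit multiply, then fold the high half into the low half.
def mum(a, b)
  r = (a & MASK64) * (b & MASK64)
  (r & MASK64) ^ (r >> 64)
end

# One mixing step over two consecutive 64-bit data words.
def mix_pair(word0, word1, state)
  mum(word0 ^ CONSTANT, word1 ^ state)
end

state = 0x0123456789ABCDEF

# When the first word equals CONSTANT, the first factor is zero, so the
# second word no longer influences the result.
outputs = [1, 2, 0xDEADBEEF, MASK64].map { |w1| mix_pair(CONSTANT, w1, state) }
p outputs.uniq # => [0]
```

Every choice of the second word produces the same mixed value here, which is what lets an attacker build arbitrarily many colliding inputs once the constant is known.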

Go uses AES instructions (on some x86 and arm64 CPUs) for map
hashing. If AES instructions are unavailable, it uses a hash function
“inspired by wyHash,” but without this vulnerability. It contains
analogous code:

https://github.com/golang/go/blob/532e3203492ebcac67b2f3aa2a52115f49d51997/src/runtime/hash64.go#L49-L51

However, instead of constants, Go uses randomly generated values. This
considerably decreases hash speed (because it requires additional
memory reads), but it makes the hash function much less vulnerable.

xxHash is somewhat better than wyHash and rapidhash. It has the 
following code:

https://github.com/Cyan4973/xxHash/blob/66979328cf3f15cecdc61ea58c9f81e6071f8983/xxhash.h#L4787-L4791

which is essentially:

```
update(state, mum(data64[n]^(constant1 + seed),data64[n+1]^(constant2 - seed)))
```

If the seed is known, the same type of attack can be
performed. Therefore, xxHash should not be used with the default or
any other constant seed.
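
To make the dependence on the seed concrete, here is a small Ruby sketch of the same step with a seed folded in. It is a simplification under stated assumptions, not XXH3 itself: `CONSTANT1`, `CONSTANT2`, `seeded_mix`, and `colliding_first_word` are hypothetical names, and the constants are arbitrary.

```ruby
require "securerandom"

MASK64 = (1 << 64) - 1

# Arbitrary illustrative constants (assumptions, not values from xxHash).
CONSTANT1 = 0xA5A5A5A5A5A5A5A5
CONSTANT2 = 0x5A5A5A5A5A5A5A5A

# 64x64 -> 128-bit multiply, then fold the high half into the low half.
def mum(a, b)
  r = (a & MASK64) * (b & MASK64)
  (r & MASK64) ^ (r >> 64)
end

# Seeded mixing step: each data word is XORed with (constant +/- seed),
# with the arithmetic done modulo 2**64.
def seeded_mix(word0, word1, seed)
  k0 = (CONSTANT1 + seed) & MASK64
  k1 = (CONSTANT2 - seed) & MASK64
  mum(word0 ^ k0, word1 ^ k1)
end

# With a known seed, this first word makes the first factor zero.
def colliding_first_word(seed)
  (CONSTANT1 + seed) & MASK64
end

known_seed = 0
bad_word = colliding_first_word(known_seed)
p [1, 2, 3].map { |w1| seeded_mix(bad_word, w1, known_seed) }.uniq # => [0]

# Under a secret random seed, the same crafted word almost surely no
# longer cancels the first factor, so the second word matters again.
secret_seed = SecureRandom.random_number(1 << 64)
p [1, 2, 3].map { |w1| seeded_mix(bad_word, w1, secret_seed) }.uniq.size
```

The crafted word only works against the seed it was computed for, which is why a per-process random seed removes this particular shortcut.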

The solution to collision attacks against multiplication-based hash
functions is either not to mix two data words in a single
multiplication, or to detect a zero multiplication and always return a
value that depends on both inputs. The first approach significantly
reduces hash speed. The second approach has a much smaller performance
impact, since modern CPUs allow such code to be vectorized and written
without introducing branches.
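
One way to picture the second approach is the Ruby sketch below. It only illustrates the idea of detecting a zero factor and falling back to a value that depends on both operands. It is not the code used by VMUM V2 or any other library, it uses a branch for readability where a production implementation would stay branch-free, and `guarded_mum` and its XOR fallback are hypothetical.

```ruby
MASK64 = (1 << 64) - 1

# 64x64 -> 128-bit multiply, then fold the high half into the low half.
def mum(a, b)
  r = (a & MASK64) * (b & MASK64)
  (r & MASK64) ^ (r >> 64)
end

# Hypothetical guard: if either factor is zero the product carries no
# information, so return a value derived from both operands instead.
def guarded_mum(a, b)
  a &= MASK64
  b &= MASK64
  return a ^ b if a.zero? || b.zero?

  mum(a, b)
end

p mum(0, 1) == mum(0, 2)                 # => true
p guarded_mum(0, 1) == guarded_mum(0, 2) # => false
```

The XOR fallback is chosen only because it is easy to read; a real design would need a fallback that mixes as thoroughly as the multiplication it replaces.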

[VMUM V2](https://github.com/vnmakarov/mum-hash) uses the latter approach and
has performance competitive with wyHash, rapidhash, and xxHash.

**In brief, using RapidHash and wyHash is dangerous. XXHash should only
be used with randomly generated seeds. SipHash is a safe choice (like
VMUM V2), as it is collision-resistant regardless of the seed. (Full
disclosure: as the author of VMUM V2, I may be biased.)**



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [ruby-core:124766] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3?
  2026-01-12  3:34 [ruby-core:124478] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3? samyron (Scott Myron) via ruby-core
                   ` (5 preceding siblings ...)
  2026-01-13 12:50 ` [ruby-core:124515] " samyron (Scott Myron) via ruby-core
@ 2026-02-11  3:56 ` samyron (Scott Myron) via ruby-core
  2026-02-11  7:21 ` [ruby-core:124767] " byroot (Jean Boussier) via ruby-core
  7 siblings, 0 replies; 10+ messages in thread
From: samyron (Scott Myron) via ruby-core @ 2026-02-11  3:56 UTC (permalink / raw)
  To: ruby-core; +Cc: samyron (Scott Myron)

Issue #21833 has been updated by samyron (Scott Myron).


Anonymous wrote in #note-9:
> <snip>
>  **In brief, using RapidHash and wyHash is dangerous. XXHash should only
>  be used with randomly generated seeds. SipHash is a safe choice (like
>  VMUM V2), as it is collision-resistant regardless of the seed. (Full
>  disclosure: as the author of VMUM V2, I may be biased.)**

Thank you for this awesome reply! I learned a lot from it, and I appreciate the thorough explanation of the issue with these multiplication-based hash functions.

Note that the [a5hash](https://github.com/avaneev/a5hash) project describes (I believe) this same problem, which it calls "blinding multiplication". This is new terminology to me, so I'm leaving it here in case others find it helpful.

I used both a [random secret and seed](https://github.com/samyron/ruby/blob/1e4ff4ae311b7b1d0bc1dd4eb0e6750da714edc7/random.c#L1773-L1787) when incorporating XXH3 into Ruby.
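
For readers unfamiliar with the distinction, a rough Ruby-level picture of "a random secret and seed" is sketched below. This is only an illustration: the linked change does this in C inside `random.c`, and the names here (`SECRET_SIZE`, `HashKeys`, `generate_hash_keys`) are made up. The 136-byte size follows the secret size mentioned earlier in this thread.

```ruby
require "securerandom"

# Secret length in bytes, as quoted earlier in the thread (an assumption
# about the branch, not read from its source).
SECRET_SIZE = 136

# Hypothetical container for the two per-process values.
HashKeys = Struct.new(:secret, :seed)

# Draw both values from the system CSPRNG.
def generate_hash_keys
  HashKeys.new(
    SecureRandom.random_bytes(SECRET_SIZE),
    SecureRandom.random_number(1 << 64)
  )
end

keys = generate_hash_keys
puts keys.secret.bytesize
puts keys.seed.bit_length <= 64
```

Generating the values once at startup keeps hash values stable within a process while making them unpredictable from outside it.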


----------------------------------------
Misc #21833: Switch default hash from SipHash13 to XXH3?
https://bugs.ruby-lang.org/issues/21833#change-116375

* Author: samyron (Scott Myron)
* Status: Open
----------------------------------------
Has there been any consideration switching to some other hash implementation?

I've searched through the issues and haven't found anything related to switching the default hash from SipHash13 to anything else. 

I created a [branch](https://github.com/ruby/ruby/compare/master...samyron:ruby:sm/xxh3) which switched `rb_memhash` from SipHash13 to [XXH3](https://github.com/Cyan4973/xxHash).

I created a few simple benchmarks and ran them on my M1 Macbook Air. The results are very promising.

```
% cat ~/string_hash.yml 
prelude: |
  # Generate sets of short vs medium strings
  TINY_STRINGS = Array.new(100) { Array.new(3).map { (97 + rand(26)).chr }.join }.freeze
  SMALL_STRINGS = Array.new(100) { Array.new(8).map { (97 + rand(26)).chr }.join }.freeze
  MED_STRINGS  = Array.new(100) { Array.new(20).map { (97 + rand(26)).chr }.join }.freeze
  LARGE_STRINGS  = Array.new(100) { Array.new(200).map { (97 + rand(26)).chr }.join }.freeze
  HUGE_STRINGS  = Array.new(100) { Array.new(65536).map { (97 + rand(26)).chr }.join }.freeze

benchmark:
  tiny_strings: |
    TINY_STRINGS.each { |s| s.hash }
  
  small_strings: |
    SMALL_STRINGS.each { |s| s.hash }

  medium_strings: |
    MED_STRINGS.each { |s| s.hash }

  large_strings: |
    LARGE_STRINGS.each { |s| s.hash }

  huge_strings: |
    HUGE_STRINGS.each { |s| s.hash 

% benchmark-driver ~/string_hash.yml \
  -e ruby-master::~/.rubies/ruby-master/bin/ruby \
  -e ruby-xxhash::~/.rubies/ruby-xxhash/bin/ruby  \
  --output compare
Warming up --------------------------------------
        tiny_strings    262.513k i/s -    283.844k times in 1.081258s (3.81μs/i)
       small_strings    259.803k i/s -    280.445k times in 1.079454s (3.85μs/i)
      medium_strings    249.553k i/s -    267.531k times in 1.072041s (4.01μs/i)
       large_strings    116.426k i/s -    126.005k times in 1.082275s (8.59μs/i)
        huge_strings     498.481 i/s -     500.000 times in 1.003047s (2.01ms/i)
Calculating -------------------------------------
                     ruby-master  ruby-xxhash 
        tiny_strings    264.070k     288.960k i/s -    787.538k times in 2.982305s 2.725421s
       small_strings    259.941k     286.229k i/s -    779.407k times in 2.998394s 2.723019s
      medium_strings    249.249k     283.952k i/s -    748.658k times in 3.003655s 2.636561s
       large_strings    116.572k     240.823k i/s -    349.278k times in 2.996244s 1.450351s
        huge_strings     500.164       5.296k i/s -      1.495k times in 2.989019s 0.282263s

Comparison:
                     tiny_strings
         ruby-xxhash:    288960.1 i/s 
         ruby-master:    264070.2 i/s - 1.09x  slower

                    small_strings
         ruby-xxhash:    286229.0 i/s 
         ruby-master:    259941.5 i/s - 1.10x  slower

                   medium_strings
         ruby-xxhash:    283952.5 i/s 
         ruby-master:    249249.0 i/s - 1.14x  slower

                    large_strings
         ruby-xxhash:    240823.1 i/s 
         ruby-master:    116571.9 i/s - 2.07x  slower

                     huge_strings
         ruby-xxhash:      5296.5 i/s 
         ruby-master:       500.2 i/s - 10.59x  slower
```

Running something a bit more real-world:

```
% cat ~/json_parse.yml 
prelude: |
  require 'json'
  activitypub_json_txt = File.read("/Users/scott/Development/json/benchmark/data/activitypub.json")
  twitter_json_txt = File.read("/Users/scott/Development/json/benchmark/data/twitter.json")
  citm_catalog_json_txt = File.read("/Users/scott/Development/json/benchmark/data/citm_catalog.json")
  ohai_json_txt = File.read("/Users/scott/Development/json/benchmark/data/ohai.json")

benchmark:
  parse_activitypub_json: |
    JSON.parse(activitypub_json_txt)
  parse_twitter_json_txt: |
    JSON.parse(twitter_json_txt)
  parse_citm_catalog_json_txt: |
    JSON.parse(citm_catalog_json_txt)
  parse_ohai_json_txt: |
    JSON.parse(ohai_json_txt)

% benchmark-driver ~/json_parse.yml \
  -e ruby-master::~/.rubies/ruby-master/bin/ruby \
  -e ruby-xxhash::~/.rubies/ruby-xxhash/bin/ruby  \
  --output compare
Warming up --------------------------------------
     parse_activitypub_json     10.969k i/s -     12.023k times in 1.096043s (91.16μs/i)
     parse_twitter_json_txt      1.169k i/s -      1.265k times in 1.082330s (855.60μs/i)
parse_citm_catalog_json_txt     591.782 i/s -     600.000 times in 1.013887s (1.69ms/i)
        parse_ohai_json_txt     12.000k i/s -     12.782k times in 1.065168s (83.33μs/i)
Calculating -------------------------------------
                            ruby-master  ruby-xxhash 
     parse_activitypub_json     10.986k      11.071k i/s -     32.908k times in 2.995440s 2.972542s
     parse_twitter_json_txt      1.162k       1.172k i/s -      3.506k times in 3.016331s 2.991486s
parse_citm_catalog_json_txt     588.758      601.926 i/s -      1.775k times in 3.014820s 2.948868s
        parse_ohai_json_txt     10.747k      12.400k i/s -     35.999k times in 3.349753s 2.903138s

Comparison:
                  parse_activitypub_json
                ruby-xxhash:     11070.7 i/s 
                ruby-master:     10986.0 i/s - 1.01x  slower

                  parse_twitter_json_txt
                ruby-xxhash:      1172.0 i/s 
                ruby-master:      1162.3 i/s - 1.01x  slower

             parse_citm_catalog_json_txt
                ruby-xxhash:       601.9 i/s 
                ruby-master:       588.8 i/s - 1.02x  slower

                     parse_ohai_json_txt
                ruby-xxhash:     12400.0 i/s 
                ruby-master:     10746.8 i/s - 1.15x  slower
```
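
One possible way to summarize the four JSON results in a single number is the geometric mean of the per-benchmark speedups. The sketch below assumes the i/s figures reported above; `JSON_RESULTS`, `geometric_mean`, and `GEOMEAN` are hypothetical names chosen for this example, and the geometric mean is only one of several reasonable summaries.

```ruby
# i/s figures copied from the "Comparison" output above: [ruby-master, ruby-xxhash]
JSON_RESULTS = {
  "parse_activitypub_json"      => [10_986.0, 11_070.7],
  "parse_twitter_json_txt"      => [1_162.3, 1_172.0],
  "parse_citm_catalog_json_txt" => [588.8, 601.9],
  "parse_ohai_json_txt"         => [10_746.8, 12_400.0],
}.freeze

# Geometric mean: exp of the arithmetic mean of the logarithms.
def geometric_mean(values)
  Math.exp(values.sum { |v| Math.log(v) } / values.size)
end

JSON_SPEEDUPS = JSON_RESULTS.values.map { |master, xxhash| xxhash / master }
GEOMEAN = geometric_mean(JSON_SPEEDUPS)

puts format("geometric mean speedup: %.3fx", GEOMEAN)
```

Because three of the four speedups are within about 2% of 1.0, the summary is dominated by the `parse_ohai_json_txt` result.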

Admittedly, I'm neither a hash expert nor a cryptographer. I haven't found any known vulnerabilities in XXH3.




-- 
https://bugs.ruby-lang.org/
______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/


* [ruby-core:124767] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3?
  2026-01-12  3:34 [ruby-core:124478] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3? samyron (Scott Myron) via ruby-core
                   ` (6 preceding siblings ...)
  2026-02-11  3:56 ` [ruby-core:124766] " samyron (Scott Myron) via ruby-core
@ 2026-02-11  7:21 ` byroot (Jean Boussier) via ruby-core
  7 siblings, 0 replies; 10+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2026-02-11  7:21 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #21833 has been updated by byroot (Jean Boussier).


@samyron if you wish to bring this forward, I'd suggest trying the ruby-bench headline benchmarks: https://github.com/ruby/ruby-bench?tab=readme-ov-file#specific-categories 

If your patch shows an improvement on these much more general and meaty benchmarks, I think it would be a strong argument in favor. 

And either way, if you want this patch to get attention, you'll have to add it to the devmeeting agenda.

----------------------------------------
Misc #21833: Switch default hash from SipHash13 to XXH3?
https://bugs.ruby-lang.org/issues/21833#change-116376

* Author: samyron (Scott Myron)
* Status: Open
----------------------------------------
Has there been any consideration of switching to another hash implementation?

I've searched through the issues and haven't found anything related to switching the default hash from SipHash13 to anything else. 

I created a [branch](https://github.com/ruby/ruby/compare/master...samyron:ruby:sm/xxh3) which switched `rb_memhash` from SipHash13 to [XXH3](https://github.com/Cyan4973/xxHash).

I created a few simple benchmarks and ran them on my M1 MacBook Air. The results are very promising.

```
% cat ~/string_hash.yml 
prelude: |
  # Generate sets of short vs medium strings
  TINY_STRINGS = Array.new(100) { Array.new(3).map { (97 + rand(26)).chr }.join }.freeze
  SMALL_STRINGS = Array.new(100) { Array.new(8).map { (97 + rand(26)).chr }.join }.freeze
  MED_STRINGS  = Array.new(100) { Array.new(20).map { (97 + rand(26)).chr }.join }.freeze
  LARGE_STRINGS  = Array.new(100) { Array.new(200).map { (97 + rand(26)).chr }.join }.freeze
  HUGE_STRINGS  = Array.new(100) { Array.new(65536).map { (97 + rand(26)).chr }.join }.freeze

benchmark:
  tiny_strings: |
    TINY_STRINGS.each { |s| s.hash }
  
  small_strings: |
    SMALL_STRINGS.each { |s| s.hash }

  medium_strings: |
    MED_STRINGS.each { |s| s.hash }

  large_strings: |
    LARGE_STRINGS.each { |s| s.hash }

  huge_strings: |
    HUGE_STRINGS.each { |s| s.hash }

% benchmark-driver ~/string_hash.yml \
  -e ruby-master::~/.rubies/ruby-master/bin/ruby \
  -e ruby-xxhash::~/.rubies/ruby-xxhash/bin/ruby  \
  --output compare
Warming up --------------------------------------
        tiny_strings    262.513k i/s -    283.844k times in 1.081258s (3.81μs/i)
       small_strings    259.803k i/s -    280.445k times in 1.079454s (3.85μs/i)
      medium_strings    249.553k i/s -    267.531k times in 1.072041s (4.01μs/i)
       large_strings    116.426k i/s -    126.005k times in 1.082275s (8.59μs/i)
        huge_strings     498.481 i/s -     500.000 times in 1.003047s (2.01ms/i)
Calculating -------------------------------------
                     ruby-master  ruby-xxhash 
        tiny_strings    264.070k     288.960k i/s -    787.538k times in 2.982305s 2.725421s
       small_strings    259.941k     286.229k i/s -    779.407k times in 2.998394s 2.723019s
      medium_strings    249.249k     283.952k i/s -    748.658k times in 3.003655s 2.636561s
       large_strings    116.572k     240.823k i/s -    349.278k times in 2.996244s 1.450351s
        huge_strings     500.164       5.296k i/s -      1.495k times in 2.989019s 0.282263s

Comparison:
                     tiny_strings
         ruby-xxhash:    288960.1 i/s 
         ruby-master:    264070.2 i/s - 1.09x  slower

                    small_strings
         ruby-xxhash:    286229.0 i/s 
         ruby-master:    259941.5 i/s - 1.10x  slower

                   medium_strings
         ruby-xxhash:    283952.5 i/s 
         ruby-master:    249249.0 i/s - 1.14x  slower

                    large_strings
         ruby-xxhash:    240823.1 i/s 
         ruby-master:    116571.9 i/s - 2.07x  slower

                     huge_strings
         ruby-xxhash:      5296.5 i/s 
         ruby-master:       500.2 i/s - 10.59x  slower
```
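
For readers who do not have benchmark-driver installed, a rough stdlib-only approximation of the string benchmark might look like the sketch below. It is not the author's harness: `random_strings`, `hash_ips`, and `SIZES` are hypothetical names, the 0.2-second measurement window is an arbitrary assumption, and the 65536-character case is left out to keep the run short.

```ruby
# String lengths mirroring the prelude above (the 65536-character case is omitted).
SIZES = { tiny: 3, small: 8, medium: 20, large: 200 }.freeze

# Build `count` random lowercase strings of `length` characters.
def random_strings(length, count = 100)
  Array.new(count) { Array.new(length) { (97 + rand(26)).chr }.join }.freeze
end

# Hypothetical helper: how many times per second the whole array can be hashed.
def hash_ips(strings, seconds: 0.2)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  iterations = 0
  start = clock.call
  while clock.call - start < seconds
    strings.each { |s| s.hash }
    iterations += 1
  end
  iterations / (clock.call - start)
end

IPS = SIZES.transform_values { |length| hash_ips(random_strings(length)) }

IPS.each { |name, ips| puts format("%-8s %12.1f i/s", name, ips) }
```

Running the same script under each Ruby build gives numbers that can be compared by hand, although without benchmark-driver's warm-up phase they will be noisier than the figures above.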

Running something a bit more real-world:

```
% cat ~/json_parse.yml 
prelude: |
  require 'json'
  activitypub_json_txt = File.read("/Users/scott/Development/json/benchmark/data/activitypub.json")
  twitter_json_txt = File.read("/Users/scott/Development/json/benchmark/data/twitter.json")
  citm_catalog_json_txt = File.read("/Users/scott/Development/json/benchmark/data/citm_catalog.json")
  ohai_json_txt = File.read("/Users/scott/Development/json/benchmark/data/ohai.json")

benchmark:
  parse_activitypub_json: |
    JSON.parse(activitypub_json_txt)
  parse_twitter_json_txt: |
    JSON.parse(twitter_json_txt)
  parse_citm_catalog_json_txt: |
    JSON.parse(citm_catalog_json_txt)
  parse_ohai_json_txt: |
    JSON.parse(ohai_json_txt)

% benchmark-driver ~/json_parse.yml \
  -e ruby-master::~/.rubies/ruby-master/bin/ruby \
  -e ruby-xxhash::~/.rubies/ruby-xxhash/bin/ruby  \
  --output compare
Warming up --------------------------------------
     parse_activitypub_json     10.969k i/s -     12.023k times in 1.096043s (91.16μs/i)
     parse_twitter_json_txt      1.169k i/s -      1.265k times in 1.082330s (855.60μs/i)
parse_citm_catalog_json_txt     591.782 i/s -     600.000 times in 1.013887s (1.69ms/i)
        parse_ohai_json_txt     12.000k i/s -     12.782k times in 1.065168s (83.33μs/i)
Calculating -------------------------------------
                            ruby-master  ruby-xxhash 
     parse_activitypub_json     10.986k      11.071k i/s -     32.908k times in 2.995440s 2.972542s
     parse_twitter_json_txt      1.162k       1.172k i/s -      3.506k times in 3.016331s 2.991486s
parse_citm_catalog_json_txt     588.758      601.926 i/s -      1.775k times in 3.014820s 2.948868s
        parse_ohai_json_txt     10.747k      12.400k i/s -     35.999k times in 3.349753s 2.903138s

Comparison:
                  parse_activitypub_json
                ruby-xxhash:     11070.7 i/s 
                ruby-master:     10986.0 i/s - 1.01x  slower

                  parse_twitter_json_txt
                ruby-xxhash:      1172.0 i/s 
                ruby-master:      1162.3 i/s - 1.01x  slower

             parse_citm_catalog_json_txt
                ruby-xxhash:       601.9 i/s 
                ruby-master:       588.8 i/s - 1.02x  slower

                     parse_ohai_json_txt
                ruby-xxhash:     12400.0 i/s 
                ruby-master:     10746.8 i/s - 1.15x  slower
```

Admittedly, I'm neither a hash expert nor a cryptographer. I haven't found any known vulnerabilities in XXH3.





end of thread, other threads:[~2026-02-11  7:21 UTC | newest]

Thread overview: 10+ messages
2026-01-12  3:34 [ruby-core:124478] [Ruby Misc#21833] Switch default hash from SipHash13 to XXH3? samyron (Scott Myron) via ruby-core
2026-01-12  9:15 ` [ruby-core:124481] " byroot (Jean Boussier) via ruby-core
2026-01-12 15:00 ` [ruby-core:124483] " samyron (Scott Myron) via ruby-core
2026-01-12 15:09 ` [ruby-core:124484] " samyron (Scott Myron) via ruby-core
2026-01-12 20:10 ` [ruby-core:124486] " bdewater (Bart de Water) via ruby-core
2026-01-19 23:27   ` [ruby-core:124593] " Vladimir Makarov via ruby-core
2026-01-12 20:20 ` [ruby-core:124487] " samyron (Scott Myron) via ruby-core
2026-01-13 12:50 ` [ruby-core:124515] " samyron (Scott Myron) via ruby-core
2026-02-11  3:56 ` [ruby-core:124766] " samyron (Scott Myron) via ruby-core
2026-02-11  7:21 ` [ruby-core:124767] " byroot (Jean Boussier) via ruby-core
