ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:116134] [Ruby master Bug#20174] Ruby 3.2 jit_cont_free segfault with YJIT enabled
@ 2024-01-10  5:55 ziggythehamster (Keith Gable) via ruby-core
  2024-01-10  9:21 ` [ruby-core:116136] " byroot (Jean Boussier) via ruby-core
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: ziggythehamster (Keith Gable) via ruby-core @ 2024-01-10  5:55 UTC (permalink / raw)
  To: ruby-core; +Cc: ziggythehamster (Keith Gable)

Issue #20174 has been reported by ziggythehamster (Keith Gable).

----------------------------------------
Bug #20174: Ruby 3.2 jit_cont_free segfault with YJIT enabled
https://bugs.ruby-lang.org/issues/20174

* Author: ziggythehamster (Keith Gable)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) +YJIT [x86_64-linux]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Ruby 3.2 segfaults reproducibly for us on aarch64 (Graviton2) and x86_64 with YJIT enabled ... however all of my attempts to make a minimal reproducible test case have failed. The control frame of the segfault isn't consistent, but the C backtrace is. It doesn't occur in 3.3, so I used the backtrace to find the function (jit_cont_free) and compare it between 3.2 and 3.3. The only change that seemed like a plausible candidate was a one-line guard added by k0kubun in e07e9f8491d9ab8b22d2bdf6a8aeba834dac7eef, so I added .patch to the end of the URL on GitHub and added that as a patch against 3.2. This resolved the problem. I would therefore suggest backporting that change from 3.3 to 3.2 :).
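
To check whether a given process matches the configuration described above (a 3.2.x interpreter with YJIT actually turned on), something like the following could be run. This is only a sketch: `yjit_enabled?` and `affected_configuration?` are hypothetical helper names, and the version test rests solely on this report's observation that the crash occurs on 3.2 and not on 3.3. It does not account for 3.2 builds that already carry the patch.

```ruby
# Hypothetical helpers, not part of Ruby itself.

# True when YJIT is compiled in and enabled for this process.
# RubyVM::YJIT may be undefined on builds without YJIT support,
# so guard the constant lookup with defined?.
def yjit_enabled?
  !!(defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?)
end

# True for the configuration described in this report:
# a Ruby 3.2.x interpreter running with YJIT enabled.
def affected_configuration?(version = RUBY_VERSION, yjit = yjit_enabled?)
  version.start_with?('3.2.') && yjit
end

puts "ruby #{RUBY_VERSION}, YJIT enabled: #{yjit_enabled?}"
puts "matches reported configuration: #{affected_configuration?}"
```

Running the same script with and without `--yjit` is a quick way to confirm which mode a deployment is really using before comparing crash behaviour.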

The change that triggered this is a two-line change to a gemspec (one that we need to refactor), made because Rake released a new version. I have censored and annotated it below:

```
# frozen_string_literal: true

$LOAD_PATH.push File.expand_path('lib', __dir__)

require 'rubygems/dependency_installer'
# before: Gem::DependencyInstaller.new.install(Gem::Dependency.new('rake'))
# after:
Gem::DependencyInstaller.new.install(Gem::Dependency.new('rake', '~> 13.1.0'))

require 'rake/file_list'

Gem::Specification.new do |spec|
  spec.name        = 'censored'
  spec.version     = '0.0.1.pre'
  spec.author      = 'censored'
  spec.email       = 'censored'

  spec.summary     = 'censored'
  spec.description = 'censored'
  spec.homepage    = 'censored'
  spec.license     = 'All rights reserved'

  spec.required_ruby_version = '>= 2.6.0'

  spec.metadata['homepage_uri'] = spec.homepage

  gitignore  = File.read('.gitignore').lines.reject { |l| l.match?(/\A\s*#/) || l.match?(/\A\s*\z/) }.map(&:chomp)
  spec.files = Rake::FileList.new('\.[a-zA-Z0-9]*', '\.[a-zA-Z0-9]*/*', '**/*')
                             .exclude(gitignore)
                             .reject { |f| File.directory?(f) || f.match(%r{\A(test|spec|features|vendor|.git|.bundle)/}) }

  to_include = gitignore.select { |l| l.match?(/\A\s*!/) }.map { |l| l.delete_prefix('!') }
  spec.files += Rake::FileList.new(to_include).reject { |f| File.directory?(f) }

  spec.require_paths = ['lib']

  spec.add_development_dependency 'awesome_print',         '~> 1.8.0'
  spec.add_development_dependency 'pry',                   '~> 0.14.2'
  # before:   spec.add_development_dependency 'rake',                  '~> 13.0.1'
  # after:
  spec.add_development_dependency 'rake',                  '~> 13.1.0'
  spec.add_development_dependency 'rdoc',                  '~> 6.3.1'
  spec.add_development_dependency 'rspec',                 '~> 3.11.0'
  spec.add_development_dependency 'rspec_junit_formatter', '~> 0.5.1'
  spec.add_development_dependency 'rubocop',               '~> 1.39.0'
  spec.add_development_dependency 'rubocop-packaging',     '~> 0.5.2'
  spec.add_development_dependency 'rubocop-rake',          '~> 0.6.0'
  spec.add_development_dependency 'rubocop-rspec',         '~> 2.12.1'
  spec.add_development_dependency 'simplecov',             '~> 0.21.2'
  spec.add_development_dependency 'simplecov-cobertura',   '~> 2.1.0'
  spec.add_development_dependency 'yard',                  '~> 0.9.25'

  spec.add_runtime_dependency 'activesupport',      '>= 5.1.7', '< 8'
  spec.add_runtime_dependency 'censored-m',         '~> 0.1.72'
  spec.add_runtime_dependency 'censored-r',         '~> 0.1.175'
  spec.add_runtime_dependency 'aws-sdk-athena',     '~> 1.43'
  spec.add_runtime_dependency 'aws-sdk-cloudwatch', '~> 1.5'
  spec.add_runtime_dependency 'aws-sdk-core',       '~> 3.122'
  spec.add_runtime_dependency 'aws-sdk-dynamodb',   '~> 1.5'
  spec.add_runtime_dependency 'aws-sdk-firehose',   '~> 1.1'
  spec.add_runtime_dependency 'aws-sdk-glue',       '~> 1.108'
  spec.add_runtime_dependency 'aws-sdk-kinesis',    '~> 1.13'
  spec.add_runtime_dependency 'aws-sdk-redshift',   '~> 1.2'
  spec.add_runtime_dependency 'aws-sdk-s3',         '~> 1.9'
  spec.add_runtime_dependency 'aws-sdk-sns',        '~> 1.3'
  spec.add_runtime_dependency 'aws-sdk-sqs',        '~> 1.3'
  spec.add_runtime_dependency 'aws-sdk-ssm',        '~> 1.76'
  spec.add_runtime_dependency 'concurrent-ruby',    '>= 1.1.5'
  spec.add_runtime_dependency 'dry-configurable',   '~> 0.13'
end

```

However, I cannot turn this gemspec into a reproducer on my end. The smallest change makes the segfault go away.
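
For context on why the two annotated lines had to change when Rake 13.1.0 came out, the pessimistic constraint `~> 13.0.1` only admits 13.0.x releases. The sketch below checks this with RubyGems' own `Gem::Requirement`; the helper name `satisfied?` is made up for illustration.

```ruby
require 'rubygems'

# Hypothetical helper: does +version+ satisfy the +requirement+ string?
def satisfied?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

# '~> 13.0.1' means '>= 13.0.1' and '< 13.1', so 13.1.0 is rejected.
puts satisfied?('~> 13.0.1', '13.1.0') # false

# '~> 13.1.0' means '>= 13.1.0' and '< 13.2', so 13.1.0 is accepted.
puts satisfied?('~> 13.1.0', '13.1.0') # true
```

This only illustrates the version arithmetic behind the gemspec edit; it says nothing about why that edit happens to expose the segfault.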

To avoid a gigantic issue description, I have attached the censored segfault backtraces and the RbConfig from the x86_64 build (since I'm compiling my own Ruby). I'll note again that while aarch64 and x86_64 appear to have failed while doing something in optparse this time, it appears random (or more likely: GC pressure related). I've also seen it fail when doing a require_relative much earlier. It always fails with the same C backtrace.

I have absolutely no idea why `cont` might be NULL. The backtrace shows it is called by `cont_free`, which has a `VM_ASSERT` for detecting this condition. Obviously, there must be some situation where jit_cont becomes NULL due to YJIT, but I have no idea what that situation is.

In case someone thinks this might be compiler/compiler option related, I am using the following:

* Amazon Linux 2
* LLVM/Clang 11, with `gcc10-ld.gold` as the linker because Amazon Linux doesn't ship lld
* rustc 1.68.2 (9eb3afe9e 2023-03-27) (Amazon Linux 1.68.2-1.amzn2.0.3)
* OpenSSL 3.0.12 (self-compiled with corp-dictated hardened configuration options ... I can share if someone thinks this is relevant)
* `extra_warnflags="-Wno-address-of-packed-member -Wno-declaration-after-statement -Wno-register"`
* aarch64: `optflags="-O3 -mcpu=neoverse-n1"`
* x86_64: `optflags="-O3 -march=sandybridge"`
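
Anyone comparing their own build against the settings listed above could dump the corresponding values from `RbConfig`. The following is a rough sketch: the selection of keys is my own guess at what is relevant here, and `build_settings` is a hypothetical helper name.

```ruby
require 'rbconfig'

# Keys chosen for illustration; fetch with a default so the script still
# runs on builds or implementations that do not define one of them.
BUILD_KEYS = %w[CC optflags warnflags LDFLAGS arch configure_args].freeze

# Hypothetical helper: returns a Hash of key => configured value.
def build_settings(keys = BUILD_KEYS)
  keys.to_h { |key| [key, RbConfig::CONFIG.fetch(key, '(not set)')] }
end

build_settings.each { |key, value| puts "#{key}: #{value}" }
```

The output can be diffed against the attached `rbconfig_x86_64.txt` to spot differences in compiler or optimization flags.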

---Files--------------------------------
bt_aarch64.txt (34.5 KB)
bt_x86_64.txt (30.3 KB)
rbconfig_x86_64.txt (9.36 KB)


-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:116136] [Ruby master Bug#20174] Ruby 3.2 jit_cont_free segfault with YJIT enabled
  2024-01-10  5:55 [ruby-core:116134] [Ruby master Bug#20174] Ruby 3.2 jit_cont_free segfault with YJIT enabled ziggythehamster (Keith Gable) via ruby-core
@ 2024-01-10  9:21 ` byroot (Jean Boussier) via ruby-core
  2024-01-10 18:55 ` [ruby-core:116156] " ziggythehamster (Keith Gable) via ruby-core
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-01-10  9:21 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20174 has been updated by byroot (Jean Boussier).

Status changed from Open to Closed
Backport changed from 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN to 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED, 3.3: DONTNEED

Thanks for the report. Editing the issue to mark this commit for backport.


Commit to backport: `e07e9f8491d9ab8b22d2bdf6a8aeba834dac7eef`

----------------------------------------
Bug #20174: Ruby 3.2 jit_cont_free segfault with YJIT enabled
https://bugs.ruby-lang.org/issues/20174#change-106142

* Author: ziggythehamster (Keith Gable)
* Status: Closed
* Priority: Normal
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) +YJIT [x86_64-linux]
* Backport: 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED, 3.3: DONTNEED
----------------------------------------

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:116156] [Ruby master Bug#20174] Ruby 3.2 jit_cont_free segfault with YJIT enabled
  2024-01-10  5:55 [ruby-core:116134] [Ruby master Bug#20174] Ruby 3.2 jit_cont_free segfault with YJIT enabled ziggythehamster (Keith Gable) via ruby-core
  2024-01-10  9:21 ` [ruby-core:116136] " byroot (Jean Boussier) via ruby-core
@ 2024-01-10 18:55 ` ziggythehamster (Keith Gable) via ruby-core
  2024-01-10 18:56 ` [ruby-core:116157] " byroot (Jean Boussier) via ruby-core
  2024-01-18  3:20 ` [ruby-core:116285] " nagachika (Tomoyuki Chikanaga) via ruby-core
  3 siblings, 0 replies; 5+ messages in thread
From: ziggythehamster (Keith Gable) via ruby-core @ 2024-01-10 18:55 UTC (permalink / raw)
  To: ruby-core; +Cc: ziggythehamster (Keith Gable)

Issue #20174 has been updated by ziggythehamster (Keith Gable).


byroot (Jean Boussier) wrote in #note-1:
> Thanks for the report. Editing the issue to mark this commit for backport.
> 
> 
> Commit to backport: `e07e9f8491d9ab8b22d2bdf6a8aeba834dac7eef`

This being my first bug - did you mean to make it status Closed?

----------------------------------------
Bug #20174: Ruby 3.2 jit_cont_free segfault with YJIT enabled
https://bugs.ruby-lang.org/issues/20174#change-106161

* Author: ziggythehamster (Keith Gable)
* Status: Closed
* Priority: Normal
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) +YJIT [x86_64-linux]
* Backport: 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED, 3.3: DONTNEED
----------------------------------------

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:116157] [Ruby master Bug#20174] Ruby 3.2 jit_cont_free segfault with YJIT enabled
  2024-01-10  5:55 [ruby-core:116134] [Ruby master Bug#20174] Ruby 3.2 jit_cont_free segfault with YJIT enabled ziggythehamster (Keith Gable) via ruby-core
  2024-01-10  9:21 ` [ruby-core:116136] " byroot (Jean Boussier) via ruby-core
  2024-01-10 18:55 ` [ruby-core:116156] " ziggythehamster (Keith Gable) via ruby-core
@ 2024-01-10 18:56 ` byroot (Jean Boussier) via ruby-core
  2024-01-18  3:20 ` [ruby-core:116285] " nagachika (Tomoyuki Chikanaga) via ruby-core
  3 siblings, 0 replies; 5+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-01-10 18:56 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20174 has been updated by byroot (Jean Boussier).


Yes, it's how you mark a commit for backport (closed ticket with the backport field filled)

----------------------------------------
Bug #20174: Ruby 3.2 jit_cont_free segfault with YJIT enabled
https://bugs.ruby-lang.org/issues/20174#change-106162

* Author: ziggythehamster (Keith Gable)
* Status: Closed
* Priority: Normal
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) +YJIT [x86_64-linux]
* Backport: 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED, 3.3: DONTNEED
----------------------------------------

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:116285] [Ruby master Bug#20174] Ruby 3.2 jit_cont_free segfault with YJIT enabled
  2024-01-10  5:55 [ruby-core:116134] [Ruby master Bug#20174] Ruby 3.2 jit_cont_free segfault with YJIT enabled ziggythehamster (Keith Gable) via ruby-core
                   ` (2 preceding siblings ...)
  2024-01-10 18:56 ` [ruby-core:116157] " byroot (Jean Boussier) via ruby-core
@ 2024-01-18  3:20 ` nagachika (Tomoyuki Chikanaga) via ruby-core
  3 siblings, 0 replies; 5+ messages in thread
From: nagachika (Tomoyuki Chikanaga) via ruby-core @ 2024-01-18  3:20 UTC (permalink / raw)
  To: ruby-core; +Cc: nagachika (Tomoyuki Chikanaga)

Issue #20174 has been updated by nagachika (Tomoyuki Chikanaga).

Backport changed from 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED, 3.3: DONTNEED to 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONE, 3.3: DONTNEED

ruby_3_2 3302e251dccec1e981945ab19d316d0856c68bf6 merged revision(s) e07e9f8491d9ab8b22d2bdf6a8aeba834dac7eef.

----------------------------------------
Bug #20174: Ruby 3.2 jit_cont_free segfault with YJIT enabled
https://bugs.ruby-lang.org/issues/20174#change-106307

* Author: ziggythehamster (Keith Gable)
* Status: Closed
* Priority: Normal
* ruby -v: ruby 3.2.2 (2023-03-30 revision e51014f9c0) +YJIT [x86_64-linux]
* Backport: 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONE, 3.3: DONTNEED
----------------------------------------
Ruby 3.2 segfaults reproducibly for us on aarch64 (Graviton2) and x86_64 with YJIT enabled ... however all of my attempts to make a minimal reproducible test case have failed. The control frame of the segfault isn't consistent, but the C backtrace is. It doesn't occur in 3.3, so I used the backtrace to find the function (jit_cont_free) and compare it between 3.2 and 3.3. The only change that seemed like a plausible candidate was a one-line guard added by k0kubun in e07e9f8491d9ab8b22d2bdf6a8aeba834dac7eef, so I added .patch to the end of the URL on GitHub and added that as a patch against 3.2. This resolved the problem. I would therefore suggest backporting that change from 3.3 to 3.2 :).

The change that triggered this is a two-line change to a gemspec (which we need to refactor), made because Rake released a new version. The gemspec, censored and annotated, is below:

```
# frozen_string_literal: true

$LOAD_PATH.push File.expand_path('lib', __dir__)

require 'rubygems/dependency_installer'
# before: Gem::DependencyInstaller.new.install(Gem::Dependency.new('rake'))
# after:
Gem::DependencyInstaller.new.install(Gem::Dependency.new('rake', '~> 13.1.0'))

require 'rake/file_list'

Gem::Specification.new do |spec|
  spec.name        = 'censored'
  spec.version     = '0.0.1.pre'
  spec.author      = 'censored'
  spec.email       = 'censored'

  spec.summary     = 'censored'
  spec.description = 'censored'
  spec.homepage    = 'censored'
  spec.license     = 'All rights reserved'

  spec.required_ruby_version = '>= 2.6.0'

  spec.metadata['homepage_uri'] = spec.homepage

  gitignore  = File.read('.gitignore').lines.reject { |l| l.match?(/\A\s*#/) || l.match?(/\A\s*\z/) }.map(&:chomp)
  spec.files = Rake::FileList.new('\.[a-zA-Z0-9]*', '\.[a-zA-Z0-9]*/*', '**/*')
                             .exclude(gitignore)
                             .reject { |f| File.directory?(f) || f.match(%r{\A(test|spec|features|vendor|.git|.bundle)/}) }

  to_include = gitignore.select { |l| l.match?(/\A\s*!/) }.map { |l| l.delete_prefix('!') }
  spec.files += Rake::FileList.new(to_include).reject { |f| File.directory?(f) }

  spec.require_paths = ['lib']

  spec.add_development_dependency 'awesome_print',         '~> 1.8.0'
  spec.add_development_dependency 'pry',                   '~> 0.14.2'
  # before:   spec.add_development_dependency 'rake',                  '~> 13.0.1'
  # after:
  spec.add_development_dependency 'rake',                  '~> 13.1.0'
  spec.add_development_dependency 'rdoc',                  '~> 6.3.1'
  spec.add_development_dependency 'rspec',                 '~> 3.11.0'
  spec.add_development_dependency 'rspec_junit_formatter', '~> 0.5.1'
  spec.add_development_dependency 'rubocop',               '~> 1.39.0'
  spec.add_development_dependency 'rubocop-packaging',     '~> 0.5.2'
  spec.add_development_dependency 'rubocop-rake',          '~> 0.6.0'
  spec.add_development_dependency 'rubocop-rspec',         '~> 2.12.1'
  spec.add_development_dependency 'simplecov',             '~> 0.21.2'
  spec.add_development_dependency 'simplecov-cobertura',   '~> 2.1.0'
  spec.add_development_dependency 'yard',                  '~> 0.9.25'

  spec.add_runtime_dependency 'activesupport',      '>= 5.1.7', '< 8'
  spec.add_runtime_dependency 'censored-m',         '~> 0.1.72'
  spec.add_runtime_dependency 'censored-r',         '~> 0.1.175'
  spec.add_runtime_dependency 'aws-sdk-athena',     '~> 1.43'
  spec.add_runtime_dependency 'aws-sdk-cloudwatch', '~> 1.5'
  spec.add_runtime_dependency 'aws-sdk-core',       '~> 3.122'
  spec.add_runtime_dependency 'aws-sdk-dynamodb',   '~> 1.5'
  spec.add_runtime_dependency 'aws-sdk-firehose',   '~> 1.1'
  spec.add_runtime_dependency 'aws-sdk-glue',       '~> 1.108'
  spec.add_runtime_dependency 'aws-sdk-kinesis',    '~> 1.13'
  spec.add_runtime_dependency 'aws-sdk-redshift',   '~> 1.2'
  spec.add_runtime_dependency 'aws-sdk-s3',         '~> 1.9'
  spec.add_runtime_dependency 'aws-sdk-sns',        '~> 1.3'
  spec.add_runtime_dependency 'aws-sdk-sqs',        '~> 1.3'
  spec.add_runtime_dependency 'aws-sdk-ssm',        '~> 1.76'
  spec.add_runtime_dependency 'concurrent-ruby',    '>= 1.1.5'
  spec.add_runtime_dependency 'dry-configurable',   '~> 0.13'
end

```

However, I cannot turn this gemspec into a reproducer on my end. The smallest change makes the segfault go away.
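
For context on what the two changed lines do, here is a minimal sketch of how RubyGems evaluates the old and new pessimistic constraints. The three version numbers are illustrative assumptions, not versions taken from the report; the point is only that `~> 13.0.1` and `~> 13.1.0` accept disjoint ranges, so the edit forces a different Rake to be resolved and installed when the gemspec is loaded.

```ruby
require 'rubygems'

# The old and new Rake constraints from the gemspec above.
old_req = Gem::Requirement.new('~> 13.0.1') # >= 13.0.1, < 13.1
new_req = Gem::Requirement.new('~> 13.1.0') # >= 13.1.0, < 13.2

# Illustrative version numbers only (assumption, not from the report).
versions = %w[13.0.6 13.1.0 13.2.0].map { |v| Gem::Version.new(v) }

# Record which constraint each version satisfies.
results = versions.to_h do |v|
  [v.to_s, { old: old_req.satisfied_by?(v), new: new_req.satisfied_by?(v) }]
end

results.each do |version, r|
  puts "rake #{version}: old=#{r[:old]} new=#{r[:new]}"
end
```

This only shows which versions each constraint admits; it says nothing about why loading a different Rake changes when the crash appears.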

To avoid a gigantic issue description, I have attached the censored segfault backtraces and the RbConfig from the x86_64 build (since I'm compiling my own Ruby). I'll note again that while aarch64 and x86_64 appear to have failed while doing something in optparse this time, it appears random (or more likely: GC pressure related). I've also seen it fail when doing a require_relative much earlier. It always fails with the same C backtrace.

I have absolutely no idea why `cont` might be NULL. The backtrace shows it is called by `cont_free`, which has a `VM_ASSERT` for detecting this condition. Obviously, there must be some situation where jit_cont becomes NULL due to YJIT, but I have no idea what that situation is.

In case someone thinks this might be compiler/compiler option related, I am using the following:

* Amazon Linux 2
* LLVM/Clang 11 except that because Amazon Linux doesn't ship lld, we are using `gcc10-ld.gold` as the linker
* rustc 1.68.2 (9eb3afe9e 2023-03-27) (Amazon Linux 1.68.2-1.amzn2.0.3)
* OpenSSL 3.0.12 (self-compiled with corp-dictated hardened configuration options ... I can share if someone thinks this is relevant)
* `extra_warnflags="-Wno-address-of-packed-member -Wno-declaration-after-statement -Wno-register"`
* aarch64: `optflags="-O3 -mcpu=neoverse-n1"`
* x86_64: `optflags="-O3 -march=sandybridge"`
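
For anyone comparing their own build against the list above, a small sketch like the following prints the relevant settings from a running interpreter. The choice of `RbConfig::CONFIG` keys is an assumption about what is useful to compare; a key that a given build does not define simply prints as `nil`.

```ruby
require 'rbconfig'

# RubyVM::YJIT is only defined when the interpreter was built with YJIT
# support, so guard the call to keep this runnable on any Ruby.
yjit_enabled = !!(defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?)

# Keys chosen for illustration; adjust to whatever the comparison needs.
build_info = {
  'ruby'     => RUBY_DESCRIPTION,
  'platform' => RUBY_PLATFORM,
  'yjit'     => yjit_enabled,
  'CC'       => RbConfig::CONFIG['CC'],
  'optflags' => RbConfig::CONFIG['optflags'],
  'LDFLAGS'  => RbConfig::CONFIG['LDFLAGS']
}

build_info.each { |key, value| puts format('%-9s %s', "#{key}:", value.inspect) }
```

Running it with and without `--yjit` shows whether YJIT is actually active in the process being tested.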

---Files--------------------------------
bt_aarch64.txt (34.5 KB)
bt_x86_64.txt (30.3 KB)
rbconfig_x86_64.txt (9.36 KB)


-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/


Thread overview: 5+ messages
2024-01-10  5:55 [ruby-core:116134] [Ruby master Bug#20174] Ruby 3.2 jit_cont_free segfault with YJIT enabled ziggythehamster (Keith Gable) via ruby-core
2024-01-10  9:21 ` [ruby-core:116136] " byroot (Jean Boussier) via ruby-core
2024-01-10 18:55 ` [ruby-core:116156] " ziggythehamster (Keith Gable) via ruby-core
2024-01-10 18:56 ` [ruby-core:116157] " byroot (Jean Boussier) via ruby-core
2024-01-18  3:20 ` [ruby-core:116285] " nagachika (Tomoyuki Chikanaga) via ruby-core
