ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:122221] [Ruby Feature#21358] Advanced filtering support for #dig
@ 2025-05-21 23:01 lovro-bikic via ruby-core
  2025-05-22 10:00 ` [ruby-core:122230] " nobu (Nobuyoshi Nakada) via ruby-core
  2025-06-05  9:08 ` [ruby-core:122441] " matz (Yukihiro Matsumoto) via ruby-core
  0 siblings, 2 replies; 3+ messages in thread
From: lovro-bikic via ruby-core @ 2025-05-21 23:01 UTC
  To: ruby-core; +Cc: lovro-bikic

Issue #21358 has been reported by lovro-bikic (Lovro Bikić).

----------------------------------------
Feature #21358: Advanced filtering support for #dig
https://bugs.ruby-lang.org/issues/21358

* Author: lovro-bikic (Lovro Bikić)
* Status: Open
----------------------------------------
Currently, `#dig` can be used to access nested data structures using "simple" keys, such as array indices or hash keys.

Real-world applications sometimes require non-trivial data access, for example, finding an item in an array based on some criteria, then returning some property of that item.

This feature request is to add support for such non-trivial access to `#dig`, concretely by allowing `Proc`s as dig keys.

## Introductory example

Given the following data structure (similar to one from [dig docs](https://docs.ruby-lang.org/en/3.4/dig_methods_rdoc.html)):
```ruby
item = {
  id: '0001',
  batters: {
    batter: [
      { id: '1001', type: 'Regular' },
      { id: '1002', type: 'Chocolate' },
      { id: '1003', type: 'Blueberry' },
      { id: '1004', type: 'Devils Food' }
    ]
  }
}
```
and the requirement "find ID of batter with type 'Chocolate'" (assuming we don't know its position in the array), currently the datum can be retrieved like so:
```ruby
item.dig(:batters, :batter)&.find { it[:type] == 'Chocolate' }&.[](:id)
# => "1002"
```

If `#dig` supported `Proc`s as keys, the solution could be:
```ruby
item.dig(:batters, :batter, -> { it[:type] == 'Chocolate' }, :id)
# => "1002"
```

## Implementation

Here's a monkey patch which adds `Proc` support to `Array#dig`, to play around with:
```ruby
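class Array
  # Keep the built-in implementation reachable for non-Proc keys.
  alias_method :original_dig, :dig

  def dig(key, *identifiers)
    case key
    when Proc
      # Proc key: return the first element for which the proc is truthy.
      val = find(&key)

      # Stop here if no keys remain; otherwise keep digging (nil-safe).
      identifiers.empty? ? val : val&.dig(*identifiers)
    else
      original_dig(key, *identifiers)
    end
  end
end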
```

This code also shows how I would define `Proc` argument behavior for arrays: the proc is called with each array item, and the *first* item for which it returns a truthy value is returned.
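The same semantics can be tried without patching `Array`. The following is only a minimal sketch: `dig_with_procs` is a hypothetical helper name, not part of Ruby or of the proposal, and it assumes every intermediate value responds to `#dig` (and to `#find` wherever a proc is used as a key).

```ruby
# Hypothetical helper (not part of Ruby): walks `keys` one step at a time.
def dig_with_procs(obj, *keys)
  keys.reduce(obj) do |current, key|
    # Mirror #dig: once a step yields nil, the whole lookup is nil.
    return nil if current.nil?

    if key.is_a?(Proc)
      # Proc key: first element the proc accepts (#find semantics).
      current.find(&key)
    else
      # Plain key: defer to the built-in single-step #dig.
      current.dig(key)
    end
  end
end

item = {
  id: '0001',
  batters: {
    batter: [
      { id: '1001', type: 'Regular' },
      { id: '1002', type: 'Chocolate' }
    ]
  }
}

dig_with_procs(item, :batters, :batter, ->(b) { b[:type] == 'Chocolate' }, :id)
# => "1002"
dig_with_procs(item, :batters, :batter, ->(b) { b[:type] == 'Vanilla' }, :id)
# => nil
```

The lambdas take an explicit parameter instead of `it`, so the sketch does not depend on Ruby 3.4.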

## Precedent

I see this feature as similar to [JSONPath's filter selector](https://www.rfc-editor.org/rfc/rfc9535.html#name-filter-selector), where the [equivalent path](https://jsonpath.com/#eJx1jlFrwjAQx79KOAQnlFKnPliQjTn2LHs1PsT2rIEs0eQilNLvvtN2k4LCwR3_-_0uaSAgkbYV5A2grbRFyMEfiuVitoAEXKRTpI2iY4D8oEzANoFzRF8zNkr3igh96Pv27eU9pfqEYrUS4_XRFc4owvFkl-qSj5WuiD9oidVGWiEk6FJCLiHLsqmEpMv6m7wQN-qecbTtEsGrP3nay3B9-ZZ8YxWNYrxNHuKvQ_z_o0-F2VD4MBH36H39VJgPhU-8aBPEl3NMtJ2xuzaeuaD9Bf90dqM=) for the introductory example would be:
```
$.batters.batter[?(@.type == 'Chocolate')].id
```

I am not familiar with similar implementations in other programming languages, so I cannot draw parallels there.

## Real-world examples

Here's a [GitHub code search](https://github.com/search?q=%2F(%5Cw%5C%5B%5Cw%2B%3F%5C%5D%7C%5C.(fetch%7Cdig%7Cat%7C%5C%5B%5Cw%2B%3F%5C%5D)%5C(.%2B%3F%5C))%26%3F%5C.(find%7Cdetect)%20%5C%7B%2F%20lang%3Aruby&type=code&p=1) with potential candidates that could be refactored with this new feature. There are some false positives on this link, sorry about that. The regex is also probably incomplete, so there could be more results. 

I will highlight two examples from the results and how they could be refactored:
```ruby
# https://github.com/moraki-finance/ruby-experian/blob/84f7def9987b6377f4718a0730fdb564d6e9a0fb/lib/experian/trade_report.rb#L89
value_section&.find { |d| d["Tipo"] == value_name }&.dig("ListaValores", "Valor")&.find { |v| v["Periodo"] == period.to_s }&.dig("Individual")&.to_i
value_section&.dig(-> { it["Tipo"] == value_name }, "ListaValores", "Valor", -> { it["Periodo"] == period.to_s }, "Individual")&.to_i

# https://github.com/cyfronet-fid/marketplace/blob/f84947777aa79d02fb987092416a3cb143db3d01/lib/import/datasources.rb#L59
datasource_data&.dig("identifiers", "alternativeIdentifiers")&.find { |id| id["type"] == "PID" }&.[]("value")
datasource_data&.dig("identifiers", "alternativeIdentifiers", -> { it["type"] == "PID" }, "value")
```

## Final thoughts

The main benefit of this feature would be not having to insert other methods between `#dig` calls, such as `#find` in the examples above. This could lead to cleaner code when handling complex data structures (such as those returned from JSON APIs), with fewer uses of safe navigation.

On the other hand, a `Proc` argument could be misinterpreted by developers to mean something else (e.g. return _all_ items for which the proc returns a truthy value). `Proc` arguments could also be seen as non-idiomatic Ruby.

It should also be clearly defined how many results such `dig`s return: just the first one for which the proc is truthy (`#find` behavior), or all for which it is truthy (`#select` behavior). I admit that at this point I'm not entirely sure, though I'm leaning towards the former (`#find`).
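To make the two options concrete, here is a small sketch using only existing `Array` methods. The method names `first_match_id` and `all_match_ids` are made up for illustration. The point is what the *next* key has to be applied to in each case.

```ruby
# #find semantics: one element (or nil), so the next key applies to it directly.
def first_match_id(batters, predicate)
  batters.find(&predicate)&.dig(:id)
end

# #select semantics: an array of matches, so the next key has to be
# mapped over each match; calling dig(:id) on the array itself would
# raise TypeError, because Array#dig expects an integer index.
def all_match_ids(batters, predicate)
  batters.select(&predicate).map { |batter| batter.dig(:id) }
end

batters = [
  { id: '1001', type: 'Regular' },
  { id: '1002', type: 'Chocolate' },
  { id: '1003', type: 'Chocolate' }
]
chocolate = ->(batter) { batter[:type] == 'Chocolate' }

first_match_id(batters, chocolate) # => "1002"
all_match_ids(batters, chocolate)  # => ["1002", "1003"]
```

With `#select` semantics, every key after the proc would need this implicit mapping, which is a larger change to how `#dig` behaves than `#find` semantics.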

Final note: in this feature request I considered only `Array#dig`, but `Hash#dig` could be considered as well (the proc receives each key/value pair).
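A possible shape for the `Hash` case is sketched below. Everything beyond "the proc receives each key/value pair" is an assumption: `hash_dig_with_proc` is a hypothetical helper, and the choice to continue digging into the matched *value* (rather than the `[key, value]` pair that `Hash#find` returns) is just one option.

```ruby
# Hypothetical helper (not part of Ruby): Proc-aware lookup on a Hash.
def hash_dig_with_proc(hash, predicate, *rest)
  # Hash#find yields each key/value pair and returns the first
  # matching [key, value] array, or nil when nothing matches.
  _key, value = hash.find { |k, v| predicate.call(k, v) }

  # Assumption: digging continues into the matched value.
  rest.empty? ? value : value&.dig(*rest)
end

sizes = { small: { price: 3 }, large: { price: 7 } }

hash_dig_with_proc(sizes, ->(_name, info) { info[:price] > 5 }, :price)
# => 7
hash_dig_with_proc(sizes, ->(name, _info) { name == :medium }, :price)
# => nil
```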




-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/


* [ruby-core:122230] [Ruby Feature#21358] Advanced filtering support for #dig
  2025-05-21 23:01 [ruby-core:122221] [Ruby Feature#21358] Advanced filtering support for #dig lovro-bikic via ruby-core
@ 2025-05-22 10:00 ` nobu (Nobuyoshi Nakada) via ruby-core
  2025-06-05  9:08 ` [ruby-core:122441] " matz (Yukihiro Matsumoto) via ruby-core
  1 sibling, 0 replies; 3+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2025-05-22 10:00 UTC
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #21358 has been updated by nobu (Nobuyoshi Nakada).


You might be able to achieve similar behavior using pattern matching.

```ruby
item => batters: {batter: [*, {type: "Chocolate", id:}, *]}
id #=> "1002"
```
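One difference from `#dig` is worth noting: the rightward `=>` form raises `NoMatchingPatternError` (or a subclass of it) when nothing matches, while `#dig` returns `nil`. A minimal sketch of a `nil`-returning variant using `case`/`in` follows; the method name `chocolate_batter_id` is made up for illustration.

```ruby
# Returns the id of the first 'Chocolate' batter, or nil when the
# structure does not match (missing keys, no such batter, ...).
def chocolate_batter_id(item)
  case item
  in { batters: { batter: [*, { type: 'Chocolate', id: }, *] } }
    # The find pattern [*, pattern, *] binds `id` from the first match.
    id
  else
    nil
  end
end

chocolate_batter_id({ batters: { batter: [{ id: '1002', type: 'Chocolate' }] } })
# => "1002"
chocolate_batter_id({ batters: { batter: [] } })
# => nil
```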


----------------------------------------
Feature #21358: Advanced filtering support for #dig
https://bugs.ruby-lang.org/issues/21358#change-113378

* Author: lovro-bikic (Lovro Bikić)
* Status: Open
----------------------------------------

* [ruby-core:122441] [Ruby Feature#21358] Advanced filtering support for #dig
  2025-05-21 23:01 [ruby-core:122221] [Ruby Feature#21358] Advanced filtering support for #dig lovro-bikic via ruby-core
  2025-05-22 10:00 ` [ruby-core:122230] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2025-06-05  9:08 ` matz (Yukihiro Matsumoto) via ruby-core
  1 sibling, 0 replies; 3+ messages in thread
From: matz (Yukihiro Matsumoto) via ruby-core @ 2025-06-05  9:08 UTC
  To: ruby-core; +Cc: matz (Yukihiro Matsumoto)

Issue #21358 has been updated by matz (Yukihiro Matsumoto).


I prefer pattern matching.

Matz.


----------------------------------------
Feature #21358: Advanced filtering support for #dig
https://bugs.ruby-lang.org/issues/21358#change-113622

* Author: lovro-bikic (Lovro Bikić)
* Status: Feedback
----------------------------------------
