From: "tenderlovemaking (Aaron Patterson) via ruby-core" <ruby-core@ml.ruby-lang.org>
To: ruby-core@ml.ruby-lang.org
Cc: "tenderlovemaking (Aaron Patterson)" <noreply@ruby-lang.org>
Subject: [ruby-core:120559] [Ruby master Feature#21005] Update the source location method to include line start/stop and column start/stop details
Date: Wed, 08 Jan 2025 17:22:04 +0000 (UTC)
Message-ID: <redmine.journal-111369.20250108172204.40939@ruby-lang.org>
In-Reply-To: <redmine.issue-21005.20250105210016.40939@ruby-lang.org>
Issue #21005 has been updated by tenderlovemaking (Aaron Patterson).
Eregon (Benoit Daloze) wrote in #note-14:
> Maybe CRuby does not currently preserve the information of end line and start/end column for procs and methods?
I think we do, but I can investigate. IIRC it's on the InstructionSequence object (so we'd still have to use RubyVM::InstructionSequence on CRuby).
> For `def` it would be trivial to preserve it but I guess for blocks and `define_method` it might be trickier.
> For such cases `source_location` could internally use the `node_id` stuff if that's easier or deemed a better trade-off on CRuby.
I _think_ start / column info is always there, but not 100% sure.
> In summary:
> * I think we can build `Prism.node_for(Proc|Method|UnboundMethod)` on `(Proc|Method|UnboundMethod)#source_location` with start/end line/column.
> * Those would all be public APIs working on all Ruby implementations.
> * Users don't need to know about low-level implementation-specific (i.e. CRuby-only) concepts like `node_id`.
👍
Dan0042 (Daniel DeLorme) wrote in #note-15:
> Eregon (Benoit Daloze) wrote in #note-3:
> > If it's a new method, I think we should return a "code location" object (could be `Ruby::CodeLocation` or `Ruby::Location` or `Ruby::SourceLocation` or so) and have the following methods (inspired from https://bugs.ruby-lang.org/issues/6012#note-19):
> > * start_line
> > * end_line
>
> I really like the idea of a `#source` method that returns a `Ruby::SourceLocation` object. However, when there's a start and end, I believe Ruby should ideally align with its own core conventions and return a `Range`. For example, `method.source.lines => start...end`. While I understand concerns about the allocation of `Range` objects and performance, I feel that: 1) this might be an example of premature micro-optimization, and 2) from an API design perspective, a `Range` object feels like the natural default. Separate `start`/`end` accessors could remain a low-level, performance-focused API if truly necessary.
>
> As an alternative to `start`/`end` accessors, it would be even better if the `Range`-returning method could be optimized via opcode. Referring back to the idea of `Range#bounds` in #20080, we could have something like `start_line, end_line = method.source.lines.bounds`, which could be optimized via opcode to avoid allocating a `Range` object entirely.
>
> It would be great to see Ruby continue to embrace its own language idioms and explore such optimizations for a more elegant API.
As @mame was pointing out, I don't think a single range for lines makes sense.
Consider the following code:
```ruby
def foo; <<FOO; end; def bar; <<BAR; end
foo method
FOO
bar method
BAR
```
What should the `lines` method report for the source code for `bar`? It cannot be a single `Range` because lines 2 and 3 are part of the `foo` method. `bar` is only defined on lines 1, 4, and 5. If we were to provide a `source` method to return the text of the `bar` method object, what text would it return?
Since heredocs are allowed to extend beyond the `end` of the method / block, I really don't think it makes sense to try to provide a single start / end line. In order to truly provide the source location of the method, we would have to return multiple objects and I think that type of interface would just be too cumbersome to use. I completely agree that we should provide a way to get the AST for a method rather than try to provide line / column information alone.
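To make the overlap concrete, here is a minimal sketch that evaluates the snippet above from a string and inspects both methods. The file name `heredoc_demo.rb` and the `naive_bar_span` variable are made up for illustration, and only the first two elements of `source_location` are read so the sketch does not depend on how many elements a given Ruby version returns.

```ruby
# The same five lines as above, held in a string so they can be evaluated
# under a made-up file name (an assumption for this sketch).
source = <<~'RUBY'
  def foo; <<FOO; end; def bar; <<BAR; end
  foo method
  FOO
  bar method
  BAR
RUBY

eval(source, TOPLEVEL_BINDING, "heredoc_demo.rb", 1)

# Only the path and the starting line are used here.
foo_file, foo_line = method(:foo).source_location
bar_file, bar_line = method(:bar).source_location

# Both definitions start on the same line, yet their heredoc bodies
# live on different lines further down.
puts "foo: #{foo_file}:#{foo_line} returns #{foo.inspect}"
puts "bar: #{bar_file}:#{bar_line} returns #{bar.inspect}"

# A naive "start line through line 5" span for bar also covers
# lines 2 and 3, which hold foo's heredoc body.
naive_bar_span = source.lines[(bar_line - 1)..4].join
puts naive_bar_span.include?("foo method")
```

Any single start/end line pair for `bar` has to either include text that belongs to `foo` or leave out `bar`'s own heredoc body, which is the problem described above.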
----------------------------------------
Feature #21005: Update the source location method to include line start/stop and column start/stop details
https://bugs.ruby-lang.org/issues/21005#change-111369
* Author: bkuhlmann (Brooke Kuhlmann)
* Status: Open
----------------------------------------
## Why
👋 Hello. After discussing with Kevin Newton and Benoit Daloze in [Feature 20999](https://bugs.ruby-lang.org/issues/20999), I'd like to propose adding line start/stop and column start/stop information to the `#source_location` method for the following objects:
- [Binding](https://docs.ruby-lang.org/en/master/Binding.html)
- [Proc](https://docs.ruby-lang.org/en/master/Proc.html)
- [Method](https://docs.ruby-lang.org/en/master/Method.html)
- [UnboundMethod](https://docs.ruby-lang.org/en/master/UnboundMethod.html)
At the moment, when using `#source_location`, you only get the following information:
``` ruby
def demo = "A demonstration."
# From disk.
method(:demo).source_location # ["/Users/bkuhlmann/Engineering/Misc/demo", 15]
# From memory.
method(:demo).source_location # ["(irb)", 3]
```
Notice that when asking for the source location, we only get the path/location as the first element and the line number as the second. I'd like to obtain a much richer set of data, including line start/stop and column start/stop, so I can avoid leaning on the `RubyVM` for this information. Example:
``` ruby
def demo = "A demonstration."
# From disk.
instructions = RubyVM::InstructionSequence.of method(:demo)
puts [instructions.absolute_path, *instructions.to_a.dig(4, :code_location)]
[
"/Users/bkuhlmann/Engineering/Misc/demo", # Source path.
15, # Line start.
0, # Column start.
15, # Line stop.
29 # Column stop.
]
# From memory.
instructions = RubyVM::InstructionSequence.of method(:demo)
puts instructions.script_lines
[
"def demo = \"A demonstration.\"\n",
""
]
```
With access to the path (or lack thereof in the case of IRB), line start/stop, and column start/stop, we could avoid using the RubyVM to obtain raw source code for any of these objects. This would not only enhance debugging situations but also improve Domain Specific Languages that wish to leverage this information for introducing new features and/or new debugging capabilities to the language.
## How
Building upon the examples provided above, I'd like to see `Binding`, `Proc`, `Method`, and `UnboundMethod` respond to `#source_location` as follows:
``` ruby
[
"/Users/bkuhlmann/Engineering/Misc/demo", # Source path.
15, # Line start.
15, # Line stop.
0, # Column start.
29 # Column stop.
]
```
Notice, for data grouping purposes, I changed the array structure to always start with the path as the first element, followed by line information, and ending with column information. Alternatively, it might be nice to improve upon the above by answering a hash each time, instead, for a more self-describing data structure. Example:
``` ruby
{
path: "/Users/bkuhlmann/Engineering/Misc/demo",
line_start: 15,
line_stop: 15,
column_start: 0,
column_stop: 29
}
```
For in-memory situations like IRB, it would be nice to answer the equivalent of `RubyVM::InstructionSequence#script_lines`, which would always be an `Array` with no line or column information since only the source code is necessary. Example:
``` ruby
[
"def demo = \"A demonstration.\"\n",
""
]
```
From a pattern matching perspective, this could provide the best of both worlds, especially if information is answered as either a `Hash` or an `Array`. Example:
``` ruby
def demo = "A demonstration."
case method(:demo).source_location
in Hash then puts "Source information obtained from disk."
in Array then puts "Source obtained from memory."
else fail TypeError, "Unrecognized source location type."
end
```
The above is only a simple example, but there's a lot we could do with this information if the above pattern match was enhanced to deal with the extraction and formatting of the actual source code!
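One way such an enhanced pattern match might look is sketched below. The `extract_source` helper and its arguments are hypothetical, the hash keys follow the shape proposed earlier, and columns are assumed to be character offsets into ASCII-only lines (with multibyte source, byte offsets would need different handling).

``` ruby
# Hypothetical helper: return the source text described by a location.
# `lines` holds the file contents, one entry per line (as from File.readlines).
def extract_source(lines, location)
  case location
  in {line_start:, line_stop:, column_start:, column_stop:}
    selected = lines[(line_start - 1)..(line_stop - 1)].map(&:dup)
    # Trim the last line first so a single-line span keeps correct offsets.
    selected[-1] = selected[-1][0...column_stop]
    selected[0] = selected[0][column_start..]
    selected.join
  in Array => script_lines
    # In-memory case: the location already is the source code.
    script_lines.join
  end
end

lines = ["x = 1\n", "def demo = \"A demonstration.\"\n"]
location = {path: "demo.rb", line_start: 2, line_stop: 2, column_start: 0, column_stop: 29}

puts extract_source(lines, location)
puts extract_source(lines, ["def demo = \"A demonstration.\"\n", ""])
```

This only works when a definition occupies one contiguous span of text, which is the assumption the proposed array and hash shapes make.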
## Notes
This feature request is related to the following discussions in case more context is of help:
- [Feature 6012](https://bugs.ruby-lang.org/issues/6012)
- [Feature 20999](https://bugs.ruby-lang.org/issues/20999)
--
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/