ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:119495] [Ruby master Feature#20792] String#forcible_encoding?
@ 2024-10-09 15:44 kddnewton (Kevin Newton) via ruby-core
  2024-10-09 18:20 ` [ruby-core:119496] " Eregon (Benoit Daloze) via ruby-core
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: kddnewton (Kevin Newton) via ruby-core @ 2024-10-09 15:44 UTC (permalink / raw)
  To: ruby-core; +Cc: kddnewton (Kevin Newton)

Issue #20792 has been reported by kddnewton (Kevin Newton).

----------------------------------------
Feature #20792: String#forcible_encoding?
https://bugs.ruby-lang.org/issues/20792

* Author: kddnewton (Kevin Newton)
* Status: Open
----------------------------------------
I would like to add a method to String called `forcible_encoding?(encoding)`. It would return true or false depending on whether the receiver can be forced into the given encoding without producing an invalid string. It would be effectively equivalent to:

```ruby
class String
  def forcible_encoding?(enc)
    original = encoding
    # Temporarily retag the receiver, then check the bytes in the new encoding.
    result = force_encoding(enc).valid_encoding?
    # Restore the original encoding before returning.
    force_encoding(original)
    result
  end
end
```

I would like this method because there are extremely rare but possible circumstances where source files are marked as binary but contain UTF-8-encoded characters. In that case I would like to check if it's possible to cleanly force UTF-8 before actually doing it. The code I'm trying to replace is here: https://github.com/ruby/prism/blob/d6e9b8de36b4d18debfe36e4545116539964ceeb/lib/prism/parse_result.rb#L15-L30.

The pull request for the code is here: https://github.com/ruby/ruby/pull/11851.
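
To make the use case concrete, here is a minimal sketch of the check using only existing String methods. The helper name `utf8_if_forcible` is hypothetical and not part of the proposal. It works on a copy so that the receiver keeps its encoding.

```ruby
# Hypothetical helper (not part of the proposal): returns a UTF-8 copy of
# +source+ when its bytes are valid UTF-8, otherwise returns +source+ as is.
def utf8_if_forcible(source)
  candidate = source.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : source
end

# UTF-8 bytes of "café", but tagged as binary.
tagged_binary = "caf\xC3\xA9".b
# A lone 0xFF byte is never valid UTF-8.
broken = "\xFF".b

utf8_if_forcible(tagged_binary).encoding # Encoding::UTF_8
utf8_if_forcible(broken).encoding        # Encoding::BINARY
```

Note that this sketch allocates a copy even when the check fails, which is the cost a built-in predicate would avoid.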



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/lists/ruby-core.ml.ruby-lang.org/


* [ruby-core:119496] [Ruby master Feature#20792] String#forcible_encoding?
  2024-10-09 15:44 [ruby-core:119495] [Ruby master Feature#20792] String#forcible_encoding? kddnewton (Kevin Newton) via ruby-core
@ 2024-10-09 18:20 ` Eregon (Benoit Daloze) via ruby-core
  2024-10-09 18:23 ` [ruby-core:119497] " Eregon (Benoit Daloze) via ruby-core
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2024-10-09 18:20 UTC (permalink / raw)
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #20792 has been updated by Eregon (Benoit Daloze).


I have wanted this feature too. How about adding an optional argument to `String#valid_encoding?`,
like `binary_string.valid_encoding?(Encoding::UTF_8)`?
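
As a rough illustration of how such an optional argument might behave, the sketch below uses a standalone method rather than patching `String`. The name `valid_encoding_in?` is an assumption made for this example. With no encoding given it falls back to the existing `valid_encoding?`.

```ruby
# Hypothetical stand-in for the suggested String#valid_encoding?(encoding).
# With no encoding given it behaves like the existing valid_encoding?.
def valid_encoding_in?(string, encoding = string.encoding)
  return string.valid_encoding? if encoding == string.encoding
  # Work on a copy so the receiver keeps its encoding (and may be frozen).
  string.dup.force_encoding(encoding).valid_encoding?
end

binary_string = "r\xC3\xA9sum\xC3\xA9".b.freeze

valid_encoding_in?(binary_string)                     # true: binary strings are always valid
valid_encoding_in?(binary_string, Encoding::UTF_8)    # true
valid_encoding_in?(binary_string, Encoding::US_ASCII) # false: bytes above 0x7F
```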

----------------------------------------
Feature #20792: String#forcible_encoding?
https://bugs.ruby-lang.org/issues/20792#change-110110


* [ruby-core:119497] [Ruby master Feature#20792] String#forcible_encoding?
  2024-10-09 15:44 [ruby-core:119495] [Ruby master Feature#20792] String#forcible_encoding? kddnewton (Kevin Newton) via ruby-core
  2024-10-09 18:20 ` [ruby-core:119496] " Eregon (Benoit Daloze) via ruby-core
@ 2024-10-09 18:23 ` Eregon (Benoit Daloze) via ruby-core
  2024-10-09 18:53 ` [ruby-core:119498] " Eregon (Benoit Daloze) via ruby-core
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2024-10-09 18:23 UTC (permalink / raw)
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #20792 has been updated by Eregon (Benoit Daloze).


In terms of performance, though, both methods need two code range scans when the String is valid, is then passed to `force_encoding(Encoding::UTF_8)`, and is later used in an operation that needs the code range.
Maybe the String should stay in the argument encoding if it's valid in it?

----------------------------------------
Feature #20792: String#forcible_encoding?
https://bugs.ruby-lang.org/issues/20792#change-110111


* [ruby-core:119498] [Ruby master Feature#20792] String#forcible_encoding?
  2024-10-09 15:44 [ruby-core:119495] [Ruby master Feature#20792] String#forcible_encoding? kddnewton (Kevin Newton) via ruby-core
  2024-10-09 18:20 ` [ruby-core:119496] " Eregon (Benoit Daloze) via ruby-core
  2024-10-09 18:23 ` [ruby-core:119497] " Eregon (Benoit Daloze) via ruby-core
@ 2024-10-09 18:53 ` Eregon (Benoit Daloze) via ruby-core
  2024-10-09 19:07 ` [ruby-core:119499] " austin (Austin Ziegler) via ruby-core
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2024-10-09 18:53 UTC (permalink / raw)
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #20792 has been updated by Eregon (Benoit Daloze).


I think I discussed this with @byroot a couple of times as well. What about `String#with_encoding(Encoding)`, which is like `force_encoding` but doesn't mutate the receiver?
This would then be efficient in that it would scan the code range only once:
```ruby
utf8 = binary_str.with_encoding(Encoding::UTF_8)
if utf8.valid_encoding?
  return utf8
else
  return binary_str
end
```

And of course it would shorten `str.dup.force_encoding(Encoding::UTF_8)` to `str.with_encoding(Encoding::UTF_8)`.
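
For readers who want to try the semantics today, here is a minimal approximation written as a plain method. The free-standing `with_encoding(string, encoding)` signature is an assumption for this sketch. A pure-Ruby version like this only reproduces the behavior, not the single code range scan, which would need a native implementation.

```ruby
# Hypothetical approximation of the proposed String#with_encoding, written as
# a plain method: same bytes, new encoding, receiver left untouched.
def with_encoding(string, encoding)
  string.dup.force_encoding(encoding)
end

binary_str = "\xE3\x81\x82".b.freeze # UTF-8 bytes of U+3042, tagged binary
utf8 = with_encoding(binary_str, Encoding::UTF_8)

utf8.valid_encoding? # true
utf8 == "\u3042"     # true
binary_str.encoding  # still Encoding::BINARY
```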

----------------------------------------
Feature #20792: String#forcible_encoding?
https://bugs.ruby-lang.org/issues/20792#change-110112


* [ruby-core:119499] [Ruby master Feature#20792] String#forcible_encoding?
  2024-10-09 15:44 [ruby-core:119495] [Ruby master Feature#20792] String#forcible_encoding? kddnewton (Kevin Newton) via ruby-core
                   ` (2 preceding siblings ...)
  2024-10-09 18:53 ` [ruby-core:119498] " Eregon (Benoit Daloze) via ruby-core
@ 2024-10-09 19:07 ` austin (Austin Ziegler) via ruby-core
  2024-10-09 22:34 ` [ruby-core:119500] " Eregon (Benoit Daloze) via ruby-core
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: austin (Austin Ziegler) via ruby-core @ 2024-10-09 19:07 UTC (permalink / raw)
  To: ruby-core; +Cc: austin (Austin Ziegler)

Issue #20792 has been updated by austin (Austin Ziegler).


Eregon (Benoit Daloze) wrote in #note-3:
> I think I discussed this with @byroot a couple times as well, what about `String#with_encoding(Encoding)`, which is like `force_encoding` but doesn't mutate the receiver.
> Then this would be efficient in that it would scan the code range once:
> ```ruby
> utf8 = binary_str.with_encoding(Encoding::UTF_8)
> if utf8.valid_encoding?
>   return utf8
> else
>   return binary_str
> end
> ```
> 
> And of course it would shorten `str.dup.force_encoding(Encoding::UTF_8)` to `str.with_encoding(Encoding::UTF_8)`.

A variation of this would be something like:

```ruby
if utf8 = binary_str.try_encoding(Encoding::UTF_8)
  return utf8
else
  return binary_str
end
```

The implementation would be the equivalent of:

```ruby
def try_encoding(encoding)
  target = self.dup.force_encoding(encoding)
  target.valid_encoding? ? target : nil
end
```

----------------------------------------
Feature #20792: String#forcible_encoding?
https://bugs.ruby-lang.org/issues/20792#change-110113


* [ruby-core:119500] [Ruby master Feature#20792] String#forcible_encoding?
  2024-10-09 15:44 [ruby-core:119495] [Ruby master Feature#20792] String#forcible_encoding? kddnewton (Kevin Newton) via ruby-core
                   ` (3 preceding siblings ...)
  2024-10-09 19:07 ` [ruby-core:119499] " austin (Austin Ziegler) via ruby-core
@ 2024-10-09 22:34 ` Eregon (Benoit Daloze) via ruby-core
  2024-10-10  3:44 ` [ruby-core:119501] " nirvdrum (Kevin Menard) via ruby-core
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2024-10-09 22:34 UTC (permalink / raw)
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #20792 has been updated by Eregon (Benoit Daloze).


Right, that would also solve this specific usage.
However I think `String#with_encoding` is a good general method to have, as for instance there are [lots of `.dup.force_encoding`](https://github.com/search?q=.dup.force_encoding+language%3ARuby&type=code&l=Ruby).
It's one of the rare cases where the Ruby core API doesn't have a nice way to do something in a non-mutating manner, so that would fix it.
It's especially striking for frozen literal strings where one just wants it in some specific encoding, but currently one has to `"foo".dup.force_encoding(SOME_ENCODING)`.
There is `String#b` but that's only for the BINARY encoding (and the name is not really clear).
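
The frozen-literal inconvenience can be reproduced with existing methods only. The following is a small sketch, with `.freeze` standing in for a frozen string literal:

```ruby
literal = "foo".freeze

# force_encoding mutates its receiver, so it raises on a frozen string.
begin
  literal.force_encoding(Encoding::UTF_16LE)
  raised = false
rescue FrozenError
  raised = true
end

# Today's workaround: copy first, then retag the copy.
copy = literal.dup.force_encoding(Encoding::ISO_8859_1)

# String#b is a built-in non-mutating variant, but only for BINARY.
bin = literal.b

raised        # true
copy.encoding # Encoding::ISO_8859_1
bin.encoding  # Encoding::BINARY
```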

----------------------------------------
Feature #20792: String#forcible_encoding?
https://bugs.ruby-lang.org/issues/20792#change-110114


* [ruby-core:119501] [Ruby master Feature#20792] String#forcible_encoding?
  2024-10-09 15:44 [ruby-core:119495] [Ruby master Feature#20792] String#forcible_encoding? kddnewton (Kevin Newton) via ruby-core
                   ` (4 preceding siblings ...)
  2024-10-09 22:34 ` [ruby-core:119500] " Eregon (Benoit Daloze) via ruby-core
@ 2024-10-10  3:44 ` nirvdrum (Kevin Menard) via ruby-core
  2024-10-11  8:05 ` [ruby-core:119509] " nobu (Nobuyoshi Nakada) via ruby-core
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: nirvdrum (Kevin Menard) via ruby-core @ 2024-10-10  3:44 UTC (permalink / raw)
  To: ruby-core; +Cc: nirvdrum (Kevin Menard)

Issue #20792 has been updated by nirvdrum (Kevin Menard).


Eregon (Benoit Daloze) wrote in #note-1:
> I have wanted this feature too, how about adding an optional argument to `String#valid_encoding?`?
> Like `binary_string.valid_encoding?(Encoding::UTF_8)`.

I'm partial to this one. Alternatively, it could be nice to have the inverse: `Encoding#valid_string?` (or `Encoding#valid_bytes?`). I like the symmetry, but you'd give up the short-hand of specifying the encoding by its string name.

----------------------------------------
Feature #20792: String#forcible_encoding?
https://bugs.ruby-lang.org/issues/20792#change-110115


* [ruby-core:119509] [Ruby master Feature#20792] String#forcible_encoding?
  2024-10-09 15:44 [ruby-core:119495] [Ruby master Feature#20792] String#forcible_encoding? kddnewton (Kevin Newton) via ruby-core
                   ` (5 preceding siblings ...)
  2024-10-10  3:44 ` [ruby-core:119501] " nirvdrum (Kevin Menard) via ruby-core
@ 2024-10-11  8:05 ` nobu (Nobuyoshi Nakada) via ruby-core
  2024-10-11 15:04 ` [ruby-core:119512] " byroot (Jean Boussier) via ruby-core
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: nobu (Nobuyoshi Nakada) via ruby-core @ 2024-10-11  8:05 UTC (permalink / raw)
  To: ruby-core; +Cc: nobu (Nobuyoshi Nakada)

Issue #20792 has been updated by nobu (Nobuyoshi Nakada).


nirvdrum (Kevin Menard) wrote in #note-6:
> I'm partial to this one. Alternatively, it could be nice to have the inverse: `Encoding#valid_string?` (or `Encoding#valid_bytes?`).

I prefer `valid_sequence?` over `valid_bytes?`.

> I like the symmetry, but you'd give up the short-hand of specifying the encoding by its string name.

`Encoding.valid_string?(str, enc)` may be possible as a short-hand for `Encoding.find(enc).valid_string?(str)`.
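
A sketch of what that short-hand could look like, using only existing APIs. The method name `valid_string_in?` is hypothetical and is defined as a plain method here rather than on `Encoding`:

```ruby
# Hypothetical stand-in for the suggested Encoding.valid_string?(str, enc).
# +enc+ may be an Encoding or a name such as "UTF-8".
def valid_string_in?(str, enc)
  encoding = enc.is_a?(Encoding) ? enc : Encoding.find(enc)
  # Check a retagged copy so +str+ itself is not modified.
  str.dup.force_encoding(encoding).valid_encoding?
end

bytes = "na\xC3\xAFve".b

valid_string_in?(bytes, Encoding::UTF_8) # true
valid_string_in?(bytes, "utf-8")         # true: name lookup is case-insensitive
valid_string_in?(bytes, "US-ASCII")      # false
```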

----------------------------------------
Feature #20792: String#forcible_encoding?
https://bugs.ruby-lang.org/issues/20792#change-110124


* [ruby-core:119512] [Ruby master Feature#20792] String#forcible_encoding?
  2024-10-09 15:44 [ruby-core:119495] [Ruby master Feature#20792] String#forcible_encoding? kddnewton (Kevin Newton) via ruby-core
                   ` (6 preceding siblings ...)
  2024-10-11  8:05 ` [ruby-core:119509] " nobu (Nobuyoshi Nakada) via ruby-core
@ 2024-10-11 15:04 ` byroot (Jean Boussier) via ruby-core
  2024-10-14 17:41 ` [ruby-core:119525] " kddnewton (Kevin Newton) via ruby-core
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: byroot (Jean Boussier) via ruby-core @ 2024-10-11 15:04 UTC (permalink / raw)
  To: ruby-core; +Cc: byroot (Jean Boussier)

Issue #20792 has been updated by byroot (Jean Boussier).


> as a short-hand for Encoding.find(enc).valid_string?(str).

I suspect most of the time you'd check a specific encoding, so `Encoding::UTF_8.valid_sequence?(str)` would be enough.

That said, I think that what @Eregon mentioned in terms of performance makes sense. Most of the time when using such a method, the next step will be to dup the string and call `force_encoding` on it.

So a higher level method would have the advantage of being able to store the computed coderange on the resulting string.

----------------------------------------
Feature #20792: String#forcible_encoding?
https://bugs.ruby-lang.org/issues/20792#change-110127


* [ruby-core:119525] [Ruby master Feature#20792] String#forcible_encoding?
  2024-10-09 15:44 [ruby-core:119495] [Ruby master Feature#20792] String#forcible_encoding? kddnewton (Kevin Newton) via ruby-core
                   ` (7 preceding siblings ...)
  2024-10-11 15:04 ` [ruby-core:119512] " byroot (Jean Boussier) via ruby-core
@ 2024-10-14 17:41 ` kddnewton (Kevin Newton) via ruby-core
  2024-10-14 19:24 ` [ruby-core:119526] " Eregon (Benoit Daloze) via ruby-core
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: kddnewton (Kevin Newton) via ruby-core @ 2024-10-14 17:41 UTC (permalink / raw)
  To: ruby-core; +Cc: kddnewton (Kevin Newton)

Issue #20792 has been updated by kddnewton (Kevin Newton).


I think the advantage right now is that it doesn't require a mutable string to check. It seems like all of these other options would? Unless you mean to make `with_encoding` return `nil` if it wasn't valid?

----------------------------------------
Feature #20792: String#forcible_encoding?
https://bugs.ruby-lang.org/issues/20792#change-110140


* [ruby-core:119526] [Ruby master Feature#20792] String#forcible_encoding?
  2024-10-09 15:44 [ruby-core:119495] [Ruby master Feature#20792] String#forcible_encoding? kddnewton (Kevin Newton) via ruby-core
                   ` (8 preceding siblings ...)
  2024-10-14 17:41 ` [ruby-core:119525] " kddnewton (Kevin Newton) via ruby-core
@ 2024-10-14 19:24 ` Eregon (Benoit Daloze) via ruby-core
  2024-10-21  8:50 ` [ruby-core:119538] " kddnewton (Kevin Newton) via ruby-core
  2024-10-21 10:52 ` [ruby-core:119571] " Eregon (Benoit Daloze) via ruby-core
  11 siblings, 0 replies; 13+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2024-10-14 19:24 UTC (permalink / raw)
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #20792 has been updated by Eregon (Benoit Daloze).


> I think the advantage right now is that it doesn't require a mutable string to check.

`with_encoding` would always be the same as `.dup.force_encoding` (except slightly more efficient).
It doesn't mutate the receiver.

For the use case in the description, you could then use `valid_encoding?` as in #note-3 (https://bugs.ruby-lang.org/issues/20792#note-3).
That might compute the code range (if not already computed), but that's needed to know whether the encoding is valid anyway, and, interestingly, the new String will remember this code range for further operations.

With `forcible_encoding?`, the code range can't be remembered, so it would have to be recomputed on `force_encoding` or on the next operation needing it.

----------------------------------------
Feature #20792: String#forcible_encoding?
https://bugs.ruby-lang.org/issues/20792#change-110141


* [ruby-core:119538] [Ruby master Feature#20792] String#forcible_encoding?
  2024-10-09 15:44 [ruby-core:119495] [Ruby master Feature#20792] String#forcible_encoding? kddnewton (Kevin Newton) via ruby-core
                   ` (9 preceding siblings ...)
  2024-10-14 19:24 ` [ruby-core:119526] " Eregon (Benoit Daloze) via ruby-core
@ 2024-10-21  8:50 ` kddnewton (Kevin Newton) via ruby-core
  2024-10-21 10:52 ` [ruby-core:119571] " Eregon (Benoit Daloze) via ruby-core
  11 siblings, 0 replies; 13+ messages in thread
From: kddnewton (Kevin Newton) via ruby-core @ 2024-10-21  8:50 UTC (permalink / raw)
  To: ruby-core; +Cc: kddnewton (Kevin Newton)

Issue #20792 has been updated by kddnewton (Kevin Newton).


Yeah, I'm saying it doesn't require a mutable string because it just checks the current string; it doesn't require a dup/allocation. So this avoids having to allocate a new string to check if it's possible.

----------------------------------------
Feature #20792: String#forcible_encoding?
https://bugs.ruby-lang.org/issues/20792#change-110154


* [ruby-core:119571] [Ruby master Feature#20792] String#forcible_encoding?
  2024-10-09 15:44 [ruby-core:119495] [Ruby master Feature#20792] String#forcible_encoding? kddnewton (Kevin Newton) via ruby-core
                   ` (10 preceding siblings ...)
  2024-10-21  8:50 ` [ruby-core:119538] " kddnewton (Kevin Newton) via ruby-core
@ 2024-10-21 10:52 ` Eregon (Benoit Daloze) via ruby-core
  11 siblings, 0 replies; 13+ messages in thread
From: Eregon (Benoit Daloze) via ruby-core @ 2024-10-21 10:52 UTC (permalink / raw)
  To: ruby-core; +Cc: Eregon (Benoit Daloze)

Issue #20792 has been updated by Eregon (Benoit Daloze).


Right.
But if it is valid in that encoding, wouldn't you always or almost always then want the String (or a copy of it) in that encoding?
If you do, `.with_encoding` would be more efficient, as it keeps the computed code range.
If you don't, then indeed just a predicate would avoid the String instance allocation (Strings are copy-on-write, so it's just one object allocation; the string bytes are not copied).

Do you have an example where you wouldn't want a String in that encoding?
The [linked example](https://github.com/ruby/prism/blob/d6e9b8de36b4d18debfe36e4545116539964ceeb/lib/prism/parse_result.rb#L15-L30) needs a UTF-8 String on line 19. So `with_encoding` seems a perfect fit there and more efficient than the predicate (1 vs 2 coderange scans).
The code also has to explicitly work around `force_encoding` being in-place (a common inconvenience with `force_encoding`) on line 27, which `with_encoding` addresses.

----------------------------------------
Feature #20792: String#forcible_encoding?
https://bugs.ruby-lang.org/issues/20792#change-110185


Thread overview: 13+ messages
2024-10-09 15:44 [ruby-core:119495] [Ruby master Feature#20792] String#forcible_encoding? kddnewton (Kevin Newton) via ruby-core
2024-10-09 18:20 ` [ruby-core:119496] " Eregon (Benoit Daloze) via ruby-core
2024-10-09 18:23 ` [ruby-core:119497] " Eregon (Benoit Daloze) via ruby-core
2024-10-09 18:53 ` [ruby-core:119498] " Eregon (Benoit Daloze) via ruby-core
2024-10-09 19:07 ` [ruby-core:119499] " austin (Austin Ziegler) via ruby-core
2024-10-09 22:34 ` [ruby-core:119500] " Eregon (Benoit Daloze) via ruby-core
2024-10-10  3:44 ` [ruby-core:119501] " nirvdrum (Kevin Menard) via ruby-core
2024-10-11  8:05 ` [ruby-core:119509] " nobu (Nobuyoshi Nakada) via ruby-core
2024-10-11 15:04 ` [ruby-core:119512] " byroot (Jean Boussier) via ruby-core
2024-10-14 17:41 ` [ruby-core:119525] " kddnewton (Kevin Newton) via ruby-core
2024-10-14 19:24 ` [ruby-core:119526] " Eregon (Benoit Daloze) via ruby-core
2024-10-21  8:50 ` [ruby-core:119538] " kddnewton (Kevin Newton) via ruby-core
2024-10-21 10:52 ` [ruby-core:119571] " Eregon (Benoit Daloze) via ruby-core
