ruby-dev (Japanese) list archive (unofficial mirror)
* [ruby-dev:51165] [Ruby master Bug#18588] ruby -e 'p gets' with japanese charactors gets additional invalid leading chars and caught Encoding::InvalidByteSequenceError
@ 2022-02-16 15:52 YO4 (Yoshinao Muramatsu)
  2022-03-09 14:01 ` [ruby-dev:51168] " YO4 (Yoshinao Muramatsu)
  0 siblings, 1 reply; 2+ messages in thread
From: YO4 (Yoshinao Muramatsu) @ 2022-02-16 15:52 UTC
  To: ruby-dev

Issue #18588 has been reported by YO4 (Yoshinao Muramatsu).

----------------------------------------
Bug #18588: ruby -e 'p gets' with japanese charactors gets additional invalid leading chars and caught Encoding::InvalidByteSequenceError
https://bugs.ruby-lang.org/issues/18588

* Author: YO4 (Yoshinao Muramatsu)
* Status: Open
* Priority: Normal
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
### When a line starting with a Japanese character is entered from the console, Ruby almost always receives additional invalid leading bytes.

## Steps to reproduce

```
R:\ruby32\bin>ruby -e 'p gets'
あ
-e:1:in `gets': "\\xA0" on Windows-31J (Encoding::InvalidByteSequenceError)
        from -e:1:in `gets'
        from -e:1:in `<main>'
```

## expected result

```
R:\ruby32\bin>ruby -e 'p gets'
あ
"あ"
```

## your ruby version (ruby -v)

```
R:\ruby32\bin>ruby -v
ruby 3.2.0dev (2022-02-16T08:57:04Z master 00c7a0d491) [x64-mswin64_140]

R:\ruby32\bin>ver

Microsoft Windows [Version 10.0.19043.1526]
```

## other observations
### environment
* In a Command Prompt window with Legacy Console mode enabled, this issue does NOT occur.
* On Windows Terminal, this issue occurs.
* On Windows Sandbox (Japanese locale), this issue occurs.
* RubyInstaller binaries have the same issue:

```
C:\src\git>ruby -v
ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x64-mingw-ucrt]

C:\src\git>ruby -Eutf-8 -e 'p gets'
あ
-e:1:in `gets': "\\xA0" on Windows-31J (Encoding::InvalidByteSequenceError)
        from -e:1:in `gets'
        from -e:1:in `<main>'
```

### A line starting with single-byte character(s) is read correctly.

```
R:\ruby32\bin>ruby -e 'p gets'
:あ
":あ\n"  # <= valid
```

### The external encoding affects the behavior
* With Windows-31J, a second Enter key press is required before the line is returned.

```
R:\ruby32\bin>ruby -EWindows-31J -e 'p gets'
あ
   # <= Second enter key required
"\xA0\xFFあ\n" # <= \xA0\xFF are the additional bytes
```
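
The extra bytes can be examined offline by rebuilding the observed string. The following is a minimal sketch that assumes the byte values printed above; the variable names are illustrative and do not come from the report:

```ruby
# Bytes observed from `gets` in the transcript above: two stray bytes,
# then "あ" (0x82 0xA0 in Windows-31J), then a newline.
observed = "\xA0\xFF\x82\xA0\n".b.force_encoding(Encoding::Windows_31J)

# The stray bytes are not valid Windows-31J, so the whole string is invalid.
prefix_invalid = !observed.valid_encoding?

# String#scrub drops the invalid bytes; what remains transcodes cleanly.
cleaned = observed.scrub("").encode(Encoding::UTF_8)

p prefix_invalid
p cleaned
```

Scrubbing is a diagnostic aid here, not a workaround: when the trail byte of the first character is below 0x80, the stray bytes are valid ASCII (for example `"@\x00"` for the fullwidth space), so `scrub` would leave them in place.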

### character variations

```
R:\ruby32\bin>ruby -EWindows-31J -e 'p gets.b'
あ  # <= \x{82A0}

"\xA0\xFF\x82\xA0\n"

R:\ruby32\bin>ruby -EWindows-31J -e 'p gets.b'
   # <= \x{8140} fullwidth space

"@\x00\x81@\n"

R:\ruby32\bin>ruby -EWindows-31J -e 'p gets.b'
、  # <= \x{8141}

"A\x00\x81A\n"

R:\ruby32\bin>ruby -EWindows-31J -e 'p gets.b'
。  # <= \x{8142}

"B\x00\x81B\n"
```
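
One pattern fits all four samples above: the stray prefix equals the trail byte of the first character, sign-extended to a 16-bit little-endian value. This is an interpretation of the data, not something the report states, and `predicted_prefix` is a hypothetical helper written only to check it:

```ruby
# Hypothetical helper: predicts the stray prefix for a line starting with `char`.
def predicted_prefix(char)
  trail = char.encode(Encoding::Windows_31J).bytes.last
  signed = trail >= 0x80 ? trail - 0x100 : trail  # reinterpret as signed 8-bit
  [signed].pack("s<")                             # widen to 16-bit little-endian
end

# First character => stray prefix seen in the transcript above.
OBSERVED = {
  "\u3042" => "\xA0\xFF".b,  # あ (0x82A0)
  "\u3000" => "@\x00".b,     # fullwidth space (0x8140)
  "\u3001" => "A\x00".b,     # 、 (0x8141)
  "\u3002" => "B\x00".b,     # 。 (0x8142)
}

OBSERVED.each do |char, prefix|
  puts format("%s predicted=%p observed=%p", char.dump, predicted_prefix(char), prefix)
end
```

If the pattern holds, it would be consistent with a single byte of the character being treated as a signed `char` and widened to 16 bits, but four samples alone do not establish the cause.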

### sysread returns a valid value.

```
R:\ruby32\bin>ruby -e 'p STDIN.sysread(1024).force_encoding(Encoding::Windows_31J)'
あ
"\x{82A0}\r\n" # <= valid
```
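
Based on this observation, a script running on Ruby 3.1 or later could read console input through `sysread` instead of `gets`. The following is a minimal sketch under that assumption; `read_console_chunk` is a hypothetical helper, and the 1024-byte limit and the Windows-31J default mirror the command above rather than any recommended values:

```ruby
# Hypothetical helper: reads one chunk with sysread and returns it as UTF-8.
# sysread returns whatever is available (up to max_bytes); for a line-buffered
# console that is normally one line, which is an assumption of this sketch.
def read_console_chunk(io = STDIN, external: Encoding::Windows_31J, max_bytes: 1024)
  raw = io.sysread(max_bytes)
  # Tag the bytes with the console encoding, then transcode and turn CRLF into LF.
  raw.force_encoding(external).encode(Encoding::UTF_8, universal_newline: true)
end
```

For example, `line = read_console_chunk` could stand in for `line = gets`. On Ruby 3.0 and earlier the report shows `sysread` itself returning the stray bytes, so this approach would not help there.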

### STDIN.binmode cannot resolve this.

```
R:\ruby32\bin>ruby -e 'STDIN.binmode; p gets.force_encoding(Encoding::Windows_31J)'
あ
   # <= Second enter key required
"\xA0\xFF\x{82A0}\r\r\n" # <= invalid
```

### Ruby 3.0 and earlier versions behave differently. In particular, sysread returns an invalid value.

```
C:\src\git>ruby -v
ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x64-mingw32]

C:\src\git>ruby -Eutf-8 -e 'p gets'
あ
   # <= Second enter key required
"\xA0\xFF\x82\xA0\n"  # <= no exception is raised, but the value is invalid
C:\src\git>ruby -EWindows-31J -e 'p gets'
あ
   # <= Second enter key required
"\xA0\xFFあ\n"  # <= also invalid value
C:\src\git>ruby -e 'p STDIN.sysread(1024).force_encoding(Encoding::Windows_31J)'
あ
"\xA0\xFF\x{82A0}\r"
```

## conclusion
1.  On Ruby 3.1/3.2dev, gets returns an invalid value while sysread returns a valid one.
1.  sysread returns a valid value on Ruby 3.1/3.2dev but an invalid value on 3.0.
1.  The fact that it works fine in the legacy console suggests that Windows has some issue, but the observations above suggest that Ruby can work around it.



-- 
https://bugs.ruby-lang.org/


* [ruby-dev:51168] [Ruby master Bug#18588] ruby -e 'p gets' with japanese charactors gets additional invalid leading chars and caught Encoding::InvalidByteSequenceError
  2022-02-16 15:52 [ruby-dev:51165] [Ruby master Bug#18588] ruby -e 'p gets' with japanese charactors gets additional invalid leading chars and caught Encoding::InvalidByteSequenceError YO4 (Yoshinao Muramatsu)
@ 2022-03-09 14:01 ` YO4 (Yoshinao Muramatsu)
  0 siblings, 0 replies; 2+ messages in thread
From: YO4 (Yoshinao Muramatsu) @ 2022-03-09 14:01 UTC
  To: ruby-dev

Issue #18588 has been updated by YO4 (Yoshinao Muramatsu).


It seems that when the ANSI version of PeekConsoleInput reads a multibyte character only partially, a subsequent ReadFile returns wrong data on newer Windows 10 versions.
I reported this to microsoft/terminal (https://github.com/microsoft/terminal/issues/12626).

To avoid this behavior, we can use the Unicode version of PeekConsoleInput/ReadConsoleInput.
PR: https://github.com/ruby/ruby/pull/5634

