* non-ascii chars in cmd.exe (Windows)
@ 2020-11-12 21:39 Pablo Rodriguez
2020-11-12 21:55 ` Hans Hagen
0 siblings, 1 reply; 5+ messages in thread
From: Pablo Rodriguez @ 2020-11-12 21:39 UTC (permalink / raw)
To: mailing list for ConTeXt users
Dear list,
I have the following sample:
\starttext
\startluacode
io.write(' Name? ')
document.name = io.read() or ''
\stopluacode
\cldcontext{document.name} is the name.
\stoptext
Running it on Linux, I can input non-ascii characters.
When running on Windows, if the input contains a non-ASCII character,
document.name is empty.
I have no problem passing arguments with Unicode characters from cmd
(such as in '--arguments="name={αβγ}"').
I’m using the current latest version (ConTeXt MkIV 2020.11.08 12:42).
I’m not sure what I am missing or whether I have hit a bug.
Many thanks for your help,
Pablo
--
http://www.ousia.tk
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://context.aanhet.net
archive : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___________________________________________________________________________________
* Re: non-ascii chars in cmd.exe (Windows)
2020-11-12 21:39 non-ascii chars in cmd.exe (Windows) Pablo Rodriguez
@ 2020-11-12 21:55 ` Hans Hagen
2020-11-13 13:49 ` Pablo Rodriguez
0 siblings, 1 reply; 5+ messages in thread
From: Hans Hagen @ 2020-11-12 21:55 UTC (permalink / raw)
To: mailing list for ConTeXt users, Pablo Rodriguez
On 11/12/2020 10:39 PM, Pablo Rodriguez wrote:
> Dear list,
>
> I have the following sample:
>
> \starttext
> \startluacode
> io.write(' Name? ')
> document.name = io.read() or ''
> \stopluacode
> \cldcontext{document.name} is the name.
> \stoptext
>
> Running it on Linux, I can input non-ascii characters.
>
> When running on Windows, if the input contains a non-ASCII character,
> document.name is empty.
>
> I have no problem passing arguments with Unicode characters from cmd
> (such as in '--arguments="name={αβγ}"').
>
> I’m using current latest (ConTeXt MkIV 2020.11.08 12:42).
>
> I’m not sure what I am missing or whether I have hit a bug.
This has been discussed before, I think. It has to do with how you
configured your system (which encoding) and how consistently you take
that into account. If you have a mixed setup, just don't use non-ASCII.
Anyway, in LMTX all file, command-line and system operations are UTF-8
and on Windows get translated into wide system calls, so there it should
work OK if you use UTF-8.
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
* Re: non-ascii chars in cmd.exe (Windows)
2020-11-12 21:55 ` Hans Hagen
@ 2020-11-13 13:49 ` Pablo Rodriguez
2020-11-13 14:15 ` Hans Hagen
0 siblings, 1 reply; 5+ messages in thread
From: Pablo Rodriguez @ 2020-11-13 13:49 UTC (permalink / raw)
To: ntg-context
On 11/12/20 10:55 PM, Hans Hagen wrote:
> On 11/12/2020 10:39 PM, Pablo Rodriguez wrote:
>> Dear list,
>>
>> I have the following sample:
>>
>> \starttext
>> \startluacode
>> io.write(' Name? ')
>> document.name = io.read() or ''
>> \stopluacode
>> \cldcontext{document.name} is the name.
>> \stoptext
>> [...]
>> I’m using current latest (ConTeXt MkIV 2020.11.08 12:42).
>>
>> I’m not sure what I am missing or whether I have hit a bug.
>
> This has been discussed before, I think. It has to do with how you
> configured your system (which encoding) and how consistently you take
> that into account. If you have a mixed setup, just don't use non-ASCII.
Many thanks for your reply, Hans.
I use chcp 65001. I have no problem displaying UTF-8 messages and
passing arguments with UTF-8 characters.
> Anyway, in LMTX all file, command-line and system operations are UTF-8
> and on Windows get translated into wide system calls, so there it should
> work OK if you use UTF-8.
Well, according to https://ss64.com/nt/cmd.html the console uses
UTF-16LE. Or UCS-2 (as described in
https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/#console-built-in-a-pre-unicode-dawn).
Since console input is UTF-16LE, I guess I may need something in Lua
similar to 'regimes.translate(str, "utf16le")'.
How can I convert a string from UTF-16LE (into UTF-8)?
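A minimal sketch of such a conversion in plain Lua (5.3 or later, for
string.unpack and utf8.char). The function name utf16le_to_utf8 is
hypothetical and not a ConTeXt API, and nothing in this thread establishes
that io.read() really returns UTF-16LE bytes on Windows; the sketch only
shows the byte-level conversion, including surrogate pairs.

```lua
-- Hypothetical helper (not part of ConTeXt): convert a string holding
-- UTF-16LE bytes into UTF-8. Needs Lua 5.3+ (string.unpack, utf8.char).
local function utf16le_to_utf8(s)
  local out, i, n = {}, 1, #s
  while i + 1 <= n do
    -- read one little-endian 16-bit code unit
    local unit = string.unpack("<I2", s, i)
    i = i + 2
    -- a high surrogate followed by a low surrogate encodes one
    -- code point above U+FFFF
    if unit >= 0xD800 and unit <= 0xDBFF and i + 1 <= n then
      local low = string.unpack("<I2", s, i)
      if low >= 0xDC00 and low <= 0xDFFF then
        unit = 0x10000 + (unit - 0xD800) * 0x400 + (low - 0xDC00)
        i = i + 2
      end
    end
    -- unpaired surrogates are passed through unchanged (no validation);
    -- a trailing odd byte is ignored
    out[#out + 1] = utf8.char(unit)
  end
  return table.concat(out)
end

-- "αβγ" as UTF-16LE bytes
print(utf16le_to_utf8("\xB1\x03\xB2\x03\xB3\x03"))
```

The sketch only makes sense if the bytes being read are in fact UTF-16LE,
which would have to be checked first.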
Many thanks for your help,
Pablo
--
http://www.ousia.tk
* Re: non-ascii chars in cmd.exe (Windows)
2020-11-13 13:49 ` Pablo Rodriguez
@ 2020-11-13 14:15 ` Hans Hagen
2020-11-13 20:01 ` Pablo Rodriguez
0 siblings, 1 reply; 5+ messages in thread
From: Hans Hagen @ 2020-11-13 14:15 UTC (permalink / raw)
To: mailing list for ConTeXt users, Pablo Rodriguez
On 11/13/2020 2:49 PM, Pablo Rodriguez wrote:
> On 11/12/20 10:55 PM, Hans Hagen wrote:
>> On 11/12/2020 10:39 PM, Pablo Rodriguez wrote:
>>> Dear list,
>>>
>>> I have the following sample:
>>>
>>> \starttext
>>> \startluacode
>>> io.write(' Name? ')
>>> document.name = io.read() or ''
>>> \stopluacode
>>> \cldcontext{document.name} is the name.
>>> \stoptext
>>> [...]
>>> I’m using current latest (ConTeXt MkIV 2020.11.08 12:42).
>>>
>>> I’m not sure what I am missing or whether I have hit a bug.
>>
>> This has been discussed before, I think. It has to do with how you
>> configured your system (which encoding) and how consistently you take
>> that into account. If you have a mixed setup, just don't use non-ASCII.
>
> Many thanks for your reply, Hans.
>
> I use chcp 65001. I have no problem displaying UTF-8 messages and
> passing arguments with UTF-8 characters.
>
>> Anyway, in LMTX all file, command-line and system operations are UTF-8
>> and on Windows get translated into wide system calls, so there it should
>> work OK if you use UTF-8.
>
> Well, according to https://ss64.com/nt/cmd.html the console uses
> UTF-16LE. Or UCS-2 (as described in
> https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/#console-built-in-a-pre-unicode-dawn).
>
> Since console input is UTF-16LE, I guess I may need something in Lua
> similar to 'regimes.translate(str, "utf16le")'.
The console uses whatever code page you have configured, and that also
relates to the code page used for filenames. The 16-bit values are
used deep down, and what you use gets translated into that (often folks
still use some 8-bit code page, so that then gets mapped). There is no
way the system can know what encoding you provide, as it is just bytes
in whatever encoding is used.
So you need to look at what your system is configured for.
(This is independent of the output to the console, which is what the
65001 affects.)
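One way to see which bytes actually arrive is to dump them. The following
is a minimal sketch in plain Lua (5.3 or later, for utf8.len); the helper
name describe_bytes is hypothetical and not a ConTeXt function. It returns
the bytes of a string in hex and reports whether they form valid UTF-8,
which helps to tell a UTF-8 input from one in an 8-bit code page.

```lua
-- Hypothetical diagnostic helper (not part of ConTeXt): show the raw
-- bytes of a string and whether they are well-formed UTF-8.
local function describe_bytes(s)
  local hex = {}
  for i = 1, #s do
    hex[#hex + 1] = string.format("%02X", s:byte(i))
  end
  -- utf8.len returns nil (plus a position) for invalid UTF-8
  local is_utf8 = utf8.len(s) ~= nil
  return table.concat(hex, " "), is_utf8
end

print(describe_bytes("\xCE\xB1"))  -- the UTF-8 bytes of α
print(describe_bytes("\xE9"))      -- é as a single byte in code page 1252
```

Applied to the result of io.read(), an empty dump would mean that no bytes
arrived at all, as opposed to bytes in an unexpected encoding.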
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
* Re: non-ascii chars in cmd.exe (Windows)
2020-11-13 14:15 ` Hans Hagen
@ 2020-11-13 20:01 ` Pablo Rodriguez
0 siblings, 0 replies; 5+ messages in thread
From: Pablo Rodriguez @ 2020-11-13 20:01 UTC (permalink / raw)
To: ntg-context
On 11/13/20 3:15 PM, Hans Hagen wrote:
> On 11/13/2020 2:49 PM, Pablo Rodriguez wrote:
>> [...]
>> Since console input is UTF-16LE, I guess I may need something in Lua
>> similar to 'regimes.translate(str, "utf16le")'.
>
> the console uses whatever code page you have configured and it also
> relates to the code page used for filenames .. the 16 bit values are
> used deep down and what you use gets translated into that (often folks
> still use some 8 bit code page so that then gets mapped) .. there is no
> way the system can know if what you provide is as it's bytes in whatever
> encoding used
>
> so, you need to look what your system is configured for
I think that code page is called the locale code page (as distinct from
the console code page, which is what chcp is supposed to change).
I wonder whether this is named codeset in this Windows tool:
    > wmic os get locale, oslanguage, codeset
    CodeSet  Locale  OSLanguage
    1252     0c0a    3082
> (this is independent of the output to the console which is what the
> 65001 does)
It seems that UTF-8 as locale codepage is only available in Win10.
An easy workaround is to rewrite the batch file to:
chcp 65001
set /P "name=Name? "
context --arguments="name={%name%}" document.tex
Many thanks for your help,
Pablo
--
http://www.ousia.tk