ntg-context - mailing list for ConTeXt users
* Unicode text conversion
@ 2017-01-27 12:48 Procházka Lukáš Ing.
  2017-01-27 13:08 ` Hans Hagen
  0 siblings, 1 reply; 8+ messages in thread
From: Procházka Lukáš Ing. @ 2017-01-27 12:48 UTC (permalink / raw)
  To: ConTeXt

Hello,

Does ConTeXt contain a built-in Lua routine to convert text in UTF-16 (and other encodings) to a UTF-8 string?

Something like:

----
\startluacode
   local str = "A unicode string" -- Or e.g. string loaded from a file

   str = convert(str, "utf16", "utf8")

   context(str)
\stopluacode
----

TIA.

Best regards,

Lukas


-- 
Ing. Lukáš Procházka | mailto:LPr@pontex.cz
Pontex s. r. o.      | mailto:pontex@pontex.cz | http://www.pontex.cz | IDDS:nrpt3sn
Bezová 1658
147 14 Praha 4

Tel: +420 241 096 751 (+420 720 951 172)
Fax: +420 244 461 038

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________


* Re: Unicode text conversion
  2017-01-27 12:48 Unicode text conversion Procházka Lukáš Ing.
@ 2017-01-27 13:08 ` Hans Hagen
  2017-01-27 15:16   ` Procházka Lukáš Ing.
  2017-06-14 12:07   ` Procházka Lukáš Ing.
  0 siblings, 2 replies; 8+ messages in thread
From: Hans Hagen @ 2017-01-27 13:08 UTC (permalink / raw)
  To: ntg-context

On 1/27/2017 1:48 PM, Procházka Lukáš Ing. wrote:
> Hello,
>
> does ConTeXt contain a built-in Lua conversion routine to convert text
> in UTF-16 (and others) to UTF-8 string?
>
> Something like:
>
> ----
> \startluacode
>   local str = "A unicode string" -- Or e.g. string loaded from a file
>
>   str = convert(str, "utf16", "utf8")
>
>   context(str)
> \stopluacode
> ----

utf.utf16_to_utf8_le
utf.utf16_to_utf8_be
utf.utf32_to_utf8_le
utf.utf32_to_utf8_be

Normally, files that are in UTF-16 and have a BOM are handled properly
without any explicit conversion.
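
A minimal usage sketch of the functions listed above, intended to sit inside \startluacode ... \stopluacode. It assumes that utf.utf16_to_utf8_le accepts a string and returns a UTF-8 string (the thread gives only the function names, not their signatures) and that the input is little-endian UTF-16. The file name utf16-sample.txt is hypothetical.

```lua
-- Sketch only: runs inside \startluacode ... \stopluacode in ConTeXt MkIV.
-- "utf16-sample.txt" is a hypothetical file name.
local raw = io.loaddata("utf16-sample.txt") -- raw bytes, assumed UTF-16 little endian

if raw and raw ~= "" then
    -- Assumption: the converter takes a string and returns a UTF-8 string.
    -- How a leading BOM is treated is not covered in this thread.
    local str = utf.utf16_to_utf8_le(raw)
    context(str)
else
    context("file missing or empty")
end
```

For big-endian input the _be variant would be the one to try, and the utf32 functions follow the same naming pattern.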

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: Unicode text conversion
  2017-01-27 13:08 ` Hans Hagen
@ 2017-01-27 15:16   ` Procházka Lukáš Ing.
  2017-06-14 12:07   ` Procházka Lukáš Ing.
  1 sibling, 0 replies; 8+ messages in thread
From: Procházka Lukáš Ing. @ 2017-01-27 15:16 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hello Hans,

that's it, thank you!

Best regards,

Lukas


On Fri, 27 Jan 2017 14:08:42 +0100, Hans Hagen <pragma@wxs.nl> wrote:

> utf16_to_utf8_le



* Re: Unicode text conversion
  2017-01-27 13:08 ` Hans Hagen
  2017-01-27 15:16   ` Procházka Lukáš Ing.
@ 2017-06-14 12:07   ` Procházka Lukáš Ing.
  2017-06-14 12:21     ` Hans Hagen
  1 sibling, 1 reply; 8+ messages in thread
From: Procházka Lukáš Ing. @ 2017-06-14 12:07 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hello,

is there also a way to convert CP1250 to UTF8 and vice versa?

Best regards,

Lukas



* Re: Unicode text conversion
  2017-06-14 12:07   ` Procházka Lukáš Ing.
@ 2017-06-14 12:21     ` Hans Hagen
  2017-06-14 12:31       ` Procházka Lukáš Ing.
  0 siblings, 1 reply; 8+ messages in thread
From: Hans Hagen @ 2017-06-14 12:21 UTC (permalink / raw)
  To: ntg-context

On 6/14/2017 2:07 PM, Procházka Lukáš Ing. wrote:
> Hello,
> 
> is there also a way to convert CP1250 to UTF8 and vice versa?

regimes.toregime('8859-1',"abcde Ä","?")

there's also fromregime
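
A sketch of how the two functions might be combined for the CP1250 case asked about, intended for the body of a \startluacode block. The toregime call has the same shape as the example above, with cp1250 as the regime name. The argument order for regimes.fromregime (regime first, then the string) is an assumption, since the thread names the function without showing a call.

```lua
-- Sketch only: runs inside \startluacode ... \stopluacode in ConTeXt MkIV.
local utf8text = "Příliš žluťoučký kůň"

-- UTF-8 -> CP1250, same call shape as the 8859-1 example above.
local cp1250bytes = regimes.toregime("cp1250", utf8text, "?")

-- CP1250 -> UTF-8. Assumption: fromregime takes (regime, string);
-- check regi-ini.lua in your installation before relying on this.
local roundtrip = regimes.fromregime("cp1250", cp1250bytes)

context(roundtrip)
```

Note the direction: toregime goes from UTF-8 to the named 8-bit regime, so its first argument names the target encoding and not the source.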


* Re: Unicode text conversion
  2017-06-14 12:21     ` Hans Hagen
@ 2017-06-14 12:31       ` Procházka Lukáš Ing.
  2017-06-14 17:19         ` Pablo Rodriguez
  2017-06-16  6:27         ` Procházka Lukáš Ing.
  0 siblings, 2 replies; 8+ messages in thread
From: Procházka Lukáš Ing. @ 2017-06-14 12:31 UTC (permalink / raw)
  To: mailing list for ConTeXt users

OK, thank you.
I deduce that

regimes.toregime('8859-1',"abcde Ä","?")

actually means:

- "convert from the current regime" (e.g. UTF-8)
- regimes.toregime(<regime-to-convert-to>, <string-to-convert>, <what-does-this-mean?>)

Lukas



* Re: Unicode text conversion
  2017-06-14 12:31       ` Procházka Lukáš Ing.
@ 2017-06-14 17:19         ` Pablo Rodriguez
  2017-06-16  6:27         ` Procházka Lukáš Ing.
  1 sibling, 0 replies; 8+ messages in thread
From: Pablo Rodriguez @ 2017-06-14 17:19 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 06/14/2017 02:31 PM, Procházka Lukáš Ing. wrote:
> OK, thank you;
> I deduce:
> 
> regimes.toregime('8859-1',"abcde Ä","?")
> 
> means actually:
> 
> - "convert from the current regime" (be e.g. UTF8)
> - regimes.toregime(<regime-to-convert-to>, <string-to-convert>, <what-does-this-mean?>)
Hi Lukas,

 -- Usage:
 -- regimes.toregime(<target-encoding>, <text>, <character-on-failure>))

From http://wiki.contextgarden.net/Encodings_and_Regimes.
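
To illustrate the third argument, here is a small sketch for the body of a \startluacode block. It feeds toregime a character that ISO 8859-1 cannot represent. If the fallback works as the usage line describes, that character should come back as the supplied replacement. The sample string is made up for illustration.

```lua
-- Sketch only: runs inside \startluacode ... \stopluacode in ConTeXt MkIV.
-- The euro sign (U+20AC) has no code point in ISO 8859-1.
local out = regimes.toregime("8859-1", "abc € def", "?")

-- Inspect the result on the console; per the usage line, the unmappable
-- character is expected to be replaced by the third argument.
print(out)
```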

Just in case it helps,

Pablo
-- 
http://www.ousia.tk

* Re: Unicode text conversion
  2017-06-14 12:31       ` Procházka Lukáš Ing.
  2017-06-14 17:19         ` Pablo Rodriguez
@ 2017-06-16  6:27         ` Procházka Lukáš Ing.
  1 sibling, 0 replies; 8+ messages in thread
From: Procházka Lukáš Ing. @ 2017-06-16  6:27 UTC (permalink / raw)
  To: mailing list for ConTeXt users

[-- Attachment #1: Type: text/plain, Size: 2237 bytes --]

Hello,

thank you for the answers.

So - IMHO the following code should provide CP-1250 to UTF-8 conversion:

----
\enableregime[cp1250]

\startluacode
   local cvt = function(fn)
     print(fn)

     local str = io.loaddata(fn)

--print(str)

     str = regimes.toregime("utf", str, "?")

     io.savedata(fn .. "~", str)
   end

   --

   cvt("_01-Identifikacni-Udaje.mkiv")
\stopluacode
----

But ConTeXt (yesterday's beta) fails with:

----
lua error       > lua error on line 17 in file X://Users/LPr/~/~Asci/Cvt2UTF8.mkiv:

...eta/tex/texmf-context/tex/context/base/mkiv/regi-ini.lua:127: bad argument #1 to 'for iterator' (table expected, got boolean)
stack traceback:
         [C]: in function 'for iterator'
         ...eta/tex/texmf-context/tex/context/base/mkiv/regi-ini.lua:127: in function '__index'
         ...eta/tex/texmf-context/tex/context/base/mkiv/regi-ini.lua:182: in function 'toregime'
         [ctxlua]:7: in function 'cvt'
         [ctxlua]:14: in main chunk

  7         local str = io.loaddata(fn)
  8
  9         str = regimes.toregime("utf", str, "?")
10
11         io.savedata(fn .. "~", str)
12       end
13
14       --
15
16       cvt("_01-Identifikacni-Udaje.mkiv")
17 >>  \stopluacode
----

What's wrong with my code?

Any help would be appreciated.

Best regards,

Lukas



[-- Attachment #2: Cvt2UTF8.mkiv --]
[-- Type: application/octet-stream, Size: 560 bytes --]

\enableregime[cp1250]

\startluacode
  local cvt = function(fn)
    print(fn)

    local str = io.loaddata(fn)

--print(str)

    str = regimes.toregime("utf", str, "?")

    io.savedata(fn .. "~", str)
  end

  --

  cvt("_01-Identifikacni-Udaje.mkiv")
\stopluacode

\startluacode
  do return end

  local pa = lfs.currentdir()

  for fn in lfs.dir(pa) do
    fn = pa .. "/" .. fn

    local atts = lfs.attributes(fn)

    if atts and atts.mode == "file" and fn:find("%.mkiv$") then
      cvt(fn)
    end
  end
\stopluacode

[-- Attachment #3: _01-Identifikacni-Udaje.mkiv --]
[-- Type: application/octet-stream, Size: 6620 bytes --]
