* Unicode text conversion
From: Procházka Lukáš Ing. @ 2017-01-27 12:48 UTC
To: ConTeXt

Hello,

does ConTeXt contain a built-in Lua routine to convert text in UTF-16 (and other encodings) to a UTF-8 string?

Something like:

----
\startluacode
  local str = "A unicode string" -- Or e.g. a string loaded from a file

  str = convert(str, "utf16", "utf8")

  context(str)
\stopluacode
----

TIA.

Best regards,

Lukas

-- 
Ing. Lukáš Procházka | mailto:LPr@pontex.cz
Pontex s. r. o. | mailto:pontex@pontex.cz | http://www.pontex.cz | IDDS:nrpt3sn
Bezová 1658
147 14 Praha 4

Tel: +420 241 096 751
(+420 720 951 172)
Fax: +420 244 461 038

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________
* Re: Unicode text conversion
From: Hans Hagen @ 2017-01-27 13:08 UTC
To: ntg-context

On 1/27/2017 1:48 PM, Procházka Lukáš Ing. wrote:
> does ConTeXt contain a built-in Lua conversion routine to convert text
> in UTF-16 (and others) to UTF-8 string?

utf.utf16_to_utf8_le
utf.utf16_to_utf8_be
utf.utf32_to_utf8_le
utf.utf32_to_utf8_be

Normally, when files are in UTF-16 and have a BOM, they will be dealt with properly.

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
       tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------
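For readers who want to see what such a conversion involves, here is a minimal sketch in plain Lua 5.3 or later, independent of ConTeXt. It is not the code behind the `utf.*` helpers listed above; `utf16_to_utf8` is a hypothetical name chosen for illustration. It reads 16-bit code units in the given byte order, skips a leading byte order mark, and combines surrogate pairs.

```lua
-- Sketch only: plain Lua 5.3+, not ConTeXt's implementation.
-- utf16_to_utf8 is a hypothetical helper name.
local function utf16_to_utf8(bytes, big_endian)
    local fmt = big_endian and ">I2" or "<I2" -- 16-bit unsigned, BE or LE
    local out, pos, len = {}, 1, #bytes

    -- Skip a leading byte order mark (U+FEFF) if there is one.
    if len >= 2 and string.unpack(fmt, bytes, 1) == 0xFEFF then
        pos = 3
    end

    while pos + 1 <= len do
        local unit
        unit, pos = string.unpack(fmt, bytes, pos)
        if unit >= 0xD800 and unit <= 0xDBFF and pos + 1 <= len then
            -- High surrogate: combine it with a following low surrogate.
            local low = string.unpack(fmt, bytes, pos)
            if low >= 0xDC00 and low <= 0xDFFF then
                pos = pos + 2
                unit = 0x10000 + (unit - 0xD800) * 0x400 + (low - 0xDC00)
            end
        end
        out[#out + 1] = utf8.char(unit)
    end

    return table.concat(out)
end

-- Usage: "A" followed by U+00E9 in UTF-16LE, without a BOM.
local converted = utf16_to_utf8("A\0" .. "\xE9\0")
```

The byte order is passed explicitly here because, without a BOM, it cannot be detected reliably from the data itself, which is presumably why the helpers above come in `_le` and `_be` variants.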
* Re: Unicode text conversion
From: Procházka Lukáš Ing. @ 2017-01-27 15:16 UTC
To: mailing list for ConTeXt users

Hello Hans,

that's it, thank you!

Best regards,

Lukas

On Fri, 27 Jan 2017 14:08:42 +0100, Hans Hagen <pragma@wxs.nl> wrote:

> utf16_to_utf8_le
* Re: Unicode text conversion
From: Procházka Lukáš Ing. @ 2017-06-14 12:07 UTC
To: mailing list for ConTeXt users

Hello,

is there also a way to convert CP1250 to UTF-8 and vice versa?

Best regards,

Lukas
* Re: Unicode text conversion
From: Hans Hagen @ 2017-06-14 12:21 UTC
To: ntg-context

On 6/14/2017 2:07 PM, Procházka Lukáš Ing. wrote:
> is there also a way to convert CP1250 to UTF8 and vice versa?

regimes.toregime('8859-1',"abcde Ä","?")

there's also fromregime
* Re: Unicode text conversion
From: Procházka Lukáš Ing. @ 2017-06-14 12:31 UTC
To: mailing list for ConTeXt users

OK, thank you;
I deduce that

regimes.toregime('8859-1',"abcde Ä","?")

actually means:

- "convert from the current regime" (e.g. UTF-8)
- regimes.toregime(<regime-to-convert-to>, <string-to-convert>, <what-does-this-mean?>)

Lukas
* Re: Unicode text conversion
From: Pablo Rodriguez @ 2017-06-14 17:19 UTC
To: mailing list for ConTeXt users

On 06/14/2017 02:31 PM, Procházka Lukáš Ing. wrote:
> - regimes.toregime(<regime-to-convert-to>, <string-to-convert>, <what-does-this-mean?>)

Hi Lukas,

-- Usage:
-- regimes.toregime(<target-encoding>, <text>, <character-on-failure>)

From http://wiki.contextgarden.net/Encodings_and_Regimes.

Just in case it helps,

Pablo
-- 
http://www.ousia.tk
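The usage line above says the third argument is the character emitted when a character cannot be represented in the target encoding. A minimal sketch of that behaviour in plain Lua 5.3 or later, outside ConTeXt, might look as follows. `to_regime` and the four-entry `to_cp1250` table are hypothetical stand-ins for illustration, not ConTeXt's implementation or its regime data.

```lua
-- Sketch only: a tiny excerpt of CP1250, keyed by UTF-8 character.
-- Not ConTeXt's regime tables.
local to_cp1250 = {
    [utf8.char(0x00E1)] = "\xE1", -- a with acute
    [utf8.char(0x010D)] = "\xE8", -- c with caron
    [utf8.char(0x0161)] = "\x9A", -- s with caron
    [utf8.char(0x017E)] = "\x9E", -- z with caron
}

-- Hypothetical helper: convert UTF-8 text to an 8-bit encoding.
local function to_regime(map, text, fallback)
    fallback = fallback or "?"
    -- utf8.charpattern matches one UTF-8 encoded character at a time.
    return (text:gsub(utf8.charpattern, function(ch)
        if #ch == 1 then
            return ch               -- ASCII passes through unchanged
        end
        return map[ch] or fallback  -- unmapped characters get the fallback
    end))
end

-- Usage: Greek capital omega (U+03A9) has no CP1250 code, so it becomes "?".
local encoded = to_regime(to_cp1250, "abc " .. utf8.char(0x03A9), "?")
```

Making the fallback a parameter lets the caller choose between a visible marker such as "?" and silently dropping characters by passing an empty string.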
* Re: Unicode text conversion
From: Procházka Lukáš Ing. @ 2017-06-16 6:27 UTC
To: mailing list for ConTeXt users

[-- Attachment #1: Type: text/plain, Size: 2237 bytes --]

Hello,

thank you for the answers.

So, IMHO, the following code should provide CP-1250 to UTF-8 conversion:

----
\enableregime[cp1250]

\startluacode
  local cvt = function(fn)
    print(fn)

    local str = io.loaddata(fn)
    --print(str)

    str = regimes.toregime("utf", str, "?")

    io.savedata(fn .. "~", str)
  end

  --

  cvt("_01-Identifikacni-Udaje.mkiv")
\stopluacode
----

But ConTeXt (yesterday's beta) fails with:

----
lua error       > lua error on line 17 in file X://Users/LPr/~/~Asci/Cvt2UTF8.mkiv:

...eta/tex/texmf-context/tex/context/base/mkiv/regi-ini.lua:127: bad argument #1 to 'for iterator' (table expected, got boolean)
stack traceback:
        [C]: in function 'for iterator'
        ...eta/tex/texmf-context/tex/context/base/mkiv/regi-ini.lua:127: in function '__index'
        ...eta/tex/texmf-context/tex/context/base/mkiv/regi-ini.lua:182: in function 'toregime'
        [ctxlua]:7: in function 'cvt'
        [ctxlua]:14: in main chunk

 7          local str = io.loaddata(fn)
 8
 9          str = regimes.toregime("utf", str, "?")
10
11          io.savedata(fn .. "~", str)
12        end
13
14        --
15
16        cvt("_01-Identifikacni-Udaje.mkiv")
17 >>   \stopluacode
----

What's wrong with my code?

Any help would be appreciated.

Best regards,

Lukas

[-- Attachment #2: Cvt2UTF8.mkiv --]
[-- Type: application/octet-stream, Size: 560 bytes --]

\enableregime[cp1250]

\startluacode
  local cvt = function(fn)
    print(fn)

    local str = io.loaddata(fn)
    --print(str)

    str = regimes.toregime("utf", str, "?")

    io.savedata(fn .. "~", str)
  end

  --

  cvt("_01-Identifikacni-Udaje.mkiv")
\stopluacode

\startluacode
  do return end

  local pa = lfs.currentdir()

  for fn in lfs.dir(pa) do
    fn = pa .. "/" .. fn

    local atts = lfs.attributes(fn)

    if atts and atts.mode == "file" and fn:find("%.mkiv$") then
      cvt(fn)
    end
  end
\stopluacode

[-- Attachment #3: _01-Identifikacni-Udaje.mkiv --]
[-- Type: application/octet-stream, Size: 6620 bytes --]
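The thread ends without a reply to this question. One plausible reading of the traceback (an assumption, not confirmed anywhere in the thread) is that `regimes.toregime` converts UTF-8 text to an 8-bit regime, and that "utf" is not the name of such a regime, so the lookup inside regi-ini.lua fails. Reading a CP1250 file into UTF-8 is the opposite direction, which is presumably what the `fromregime` function mentioned earlier is for; its exact arguments are not given in this thread. The direction itself can be sketched in plain Lua 5.3 or later, outside ConTeXt. `from_regime` and the `from_cp1250` excerpt are hypothetical names for illustration only.

```lua
-- Sketch only: a tiny excerpt of CP1250, keyed by byte.
-- Not ConTeXt's regime tables.
local from_cp1250 = {
    ["\xE1"] = utf8.char(0x00E1), -- a with acute
    ["\xE8"] = utf8.char(0x010D), -- c with caron
    ["\x9A"] = utf8.char(0x0161), -- s with caron
    ["\x9E"] = utf8.char(0x017E), -- z with caron
}

-- Hypothetical helper: convert 8-bit encoded bytes to UTF-8.
local function from_regime(map, bytes, fallback)
    fallback = fallback or "?"
    -- Only bytes above 0x7F need translating; ASCII is identical in UTF-8.
    return (bytes:gsub("[\x80-\xFF]", function(b)
        return map[b] or fallback
    end))
end

-- Usage: the CP1250 bytes E8 61 6A become the UTF-8 text for U+010D "aj".
local decoded = from_regime(from_cp1250, "\xE8" .. "aj")
```

A file converter along the lines of `cvt` above would load the raw bytes, apply a conversion in this direction, and save the result; whether the `\enableregime[cp1250]` line is still needed in that case is not something the thread settles.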
Thread overview: 8+ messages
2017-01-27 12:48 Unicode text conversion Procházka Lukáš Ing.
2017-01-27 13:08 ` Hans Hagen
2017-01-27 15:16   ` Procházka Lukáš Ing.
2017-06-14 12:07   ` Procházka Lukáš Ing.
2017-06-14 12:21     ` Hans Hagen
2017-06-14 12:31       ` Procházka Lukáš Ing.
2017-06-14 17:19         ` Pablo Rodriguez
2017-06-16  6:27         ` Procházka Lukáš Ing.