On 2012-02-10 12:11, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> ... Well, my information was not correct.
>
> There are characters > 127 in the file, like "ř", "š"...
>
> Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.

So it wasn’t ASCII after all ;-) No problem, just use iconv:

    iconv -f CP1250 -t UTF8 infile > outfile

I do this a lot with movie subtitles …

Hth,
Philipp

PS: If you still insist on converting at the Lua end only then
your starting point might be “regi-cp1250.lua” in the Context
base/ dir.

> But I have problem loading them into ConTeXt.
>
> I need to convert the bytes > 127 to UTF sequence, which would be acceptable by ConTeXt.
>
> @Thomas:
>
> The table looks nice but there are no entries for CP 1250 to UTF conversion.
>
> I prepared some tables: character conversion and removal of diacritics (see the attachment);
> maybe it would be handful to include them into ConTeXt somehow.
>
> Best regards,
>
> Lukas
>
>
> On Fri, 10 Feb 2012 11:57:32 +0100, Philipp Gesang wrote:
>
> >On 2012-02-10 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> >>Hello,
> >>
> >>I have many files with ASCII encoding; this encoding must be kept as these files are processed also by another program.
> >>
> >>When I work with them in ConTeXt, I need to convert them to UTF.
> >
> >Not needed, as every ASCII string is a valid UTF8 string:
> >  “The UTF encoding has several good properties. By far the most
> >  important is that a byte in the ASCII range 0-127 represents
> >  itself in UTF. Thus UTF is backward compatible with ASCII.”
> >  http://doc.cat-v.org/plan_9/4th_edition/papers/utf
> >You can use them in Luatex without further conversion.
> >
> >Regards
> >Philipp
> >
> >
> >>
> >>Does Lua (in ConTeXt scope) offer a transformation function or a table of chars [ASCII-code] -> [UTF-code] or anything to provide the conversion?
> >>
> >>Something like:
> >>
> >>\startluacode
> >>  local str = loadFile("a.txt") -- ASCII coded
> >>
> >>  str = context.ACSII2UTF(str) -- Or something like this
> >>\stopluacode
> >>
> >>Best regards,
> >>
> >>Lukas
> >>
> >>
> >>--
> >>Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
> >>Pontex s. r. o. [mailto:pontex@pontex.cz] [http://www.pontex.cz]
> >>Bezová 1658
> >>147 14 Praha 4
> >>
> >>Tel: +420 244 062 238
> >>Fax: +420 244 461 038
> >>
> >>___________________________________________________________________________________
> >>If your question is of interest to others as well, please add an entry to the Wiki!
> >>
> >>maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> >>webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
> >>archive  : http://foundry.supelec.fr/projects/contextrev/
> >>wiki     : http://contextgarden.net
> >>___________________________________________________________________________________
> >
> >

--
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
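
PPS: In case it helps, here is a minimal sketch of the iconv route on a throwaway file, so the result can be checked before touching the real data. The file names (sample-cp1250.txt, sample-utf8.txt) are made up for illustration, and I'm assuming a POSIX shell with iconv and od available. The sample holds the two characters mentioned above: in CP1250 "ř" is byte 0xF8 and "š" is byte 0x9A.

```shell
# Write "ř" (0xF8) and "š" (0x9A) as raw CP1250 bytes, using octal escapes,
# followed by a newline. The file name is a placeholder.
printf '\370\232\n' > sample-cp1250.txt

# Convert from CP1250 to UTF-8, leaving the original file untouched.
iconv -f CP1250 -t UTF-8 sample-cp1250.txt > sample-utf8.txt

# Dump the converted bytes in hex; expected: c5 99 c5 a1 0a
# (U+0159 and U+0161 take two bytes each in UTF-8, then the newline).
od -An -tx1 sample-utf8.txt
```

Since the originals have to stay in CP1250 for the other program, writing the converted copy to a separate file (as here) and loading only that copy into ConTeXt seems the safer arrangement.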