From: Procházka Lukáš Ing. - Pontex s. r. o.
To: mailing list for ConTeXt users
Subject: Re: UTF conversion via Lua (now with attachment)
Date: Fri, 10 Feb 2012 12:13:38 +0100
In-Reply-To: <20120210105732.GA30993@phlegethon>

... Well, my information was not correct.

There are characters > 127 in the file, like "ř", "š"...

Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.

But I have a problem loading them into ConTeXt.

I need to convert the bytes > 127 to UTF sequences that ConTeXt accepts.

@Thomas:

The table looks nice, but there are no entries for CP 1250 to UTF conversion.

I prepared some tables: character conversion and removal of diacritics (see the attachment); maybe it would be handy to include them in ConTeXt somehow.

Best regards,

Lukas

On Fri, 10 Feb 2012 11:57:32 +0100, Philipp Gesang wrote:

> On 2012-02-10 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
>> Hello,
>>
>> I have many files with ASCII encoding; this encoding must be kept as these files are processed also by another program.
>>
>> When I work with them in ConTeXt, I need to convert them to UTF.
>
> Not needed, as every ASCII string is a valid UTF8 string:
> “The UTF encoding has several good properties. By far the most
> important is that a byte in the ASCII range 0-127 represents
> itself in UTF. Thus UTF is backward compatible with ASCII.”
> http://doc.cat-v.org/plan_9/4th_edition/papers/utf
> You can use them in Luatex without further conversion.
>
> Regards
> Philipp
>
>
>>
>> Does Lua (in ConTeXt scope) offer a transformation function or a table of chars [ASCII-code] -> [UTF-code] or anything to provide the conversion?
>>
>> Something like:
>>
>> \startluacode
>> local str = loadFile("a.txt") -- ASCII coded
>>
>> str = context.ACSII2UTF(str) -- Or something like this
>> \stopluacode
>>
>> Best regards,
>>
>> Lukas
>>
>>
>> --
>> Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
>> Pontex s. r. o. [mailto:pontex@pontex.cz] [http://www.pontex.cz]
>> Bezová 1658
>> 147 14 Praha 4
>>
>> Tel: +420 244 062 238
>> Fax: +420 244 461 038
>>
>> ___________________________________________________________________________________
>> If your question is of interest to others as well, please add an entry to the Wiki!
>>
>> maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
>> webpage : http://www.pragma-ade.nl / http://tex.aanhet.net
>> archive : http://foundry.supelec.fr/projects/contextrev/
>> wiki : http://contextgarden.net
>> ___________________________________________________________________________________
>

-- 
Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
Pontex s. r. o. [mailto:pontex@pontex.cz] [http://www.pontex.cz]
Bezová 1658
147 14 Praha 4

Tel: +420 244 062 238
Fax: +420 244 461 038

Attachment: Cz2UTF.lua (Base64-decoded; single CP 1250 bytes are shown as Lua decimal escapes, UTF-8 strings as text)

-- Recode.lua, (C) Lukas Prochazka, 2/2012

Cz2Cz =
{ ["\225"] = "a",  -- á
  ["\232"] = "c",  -- č
  ["\239"] = "d",  -- ď
  ["\233"] = "e",  -- é
  ["\236"] = "e",  -- ě
  ["\237"] = "i",  -- í
  ["\242"] = "n",  -- ň
  ["\243"] = "o",  -- ó
  ["\248"] = "r",  -- ř
  ["\154"] = "s",  -- š
  ["\157"] = "t",  -- ť
  ["\250"] = "u",  -- ú
  ["\249"] = "u",  -- ů
  ["\253"] = "y",  -- ý
  ["\158"] = "z",  -- ž

  ["\193"] = "A",  -- Á
  ["\200"] = "C",  -- Č
  ["\207"] = "D",  -- Ď
  ["\201"] = "E",  -- É
  ["\204"] = "E",  -- Ě
  ["\205"] = "I",  -- Í
  ["\210"] = "N",  -- Ň
  ["\211"] = "O",  -- Ó
  ["\216"] = "R",  -- Ř
  ["\138"] = "S",  -- Š
  ["\141"] = "T",  -- Ť
  ["\218"] = "U",  -- Ú
  ["\217"] = "U",  -- Ů
  ["\221"] = "Y",  -- Ý
  ["\142"] = "Z",  -- Ž
}

----------------------------------------------------------------------------------------------------

Cz2UTF8 =
{ ["\225"] = "á",
  ["\232"] = "č",
  ["\239"] = "ď",
  ["\233"] = "é",
  ["\236"] = "ě",
  ["\237"] = "í",
  ["\242"] = "ň",
  ["\243"] = "ó",
  ["\248"] = "ř",
  ["\154"] = "š",
  ["\157"] = "ť",
  ["\250"] = "ú",
  ["\249"] = "ů",
  ["\253"] = "ý",
  ["\158"] = "ž",

  ["\193"] = "Á",
  ["\200"] = "Č",
  ["\207"] = "Ď",
  ["\201"] = "É",
  ["\204"] = "Ě",
  ["\205"] = "Í",
  ["\210"] = "Ň",
  ["\211"] = "Ó",
  ["\216"] = "Ř",
  ["\138"] = "Š",
  ["\141"] = "Ť",
  ["\218"] = "Ú",
  ["\217"] = "Ů",
  ["\221"] = "Ý",
  ["\142"] = "Ž",
}

----------------------------------------------------------------------------------------------------

UTF82Cz =
{ ["á"] = "\225",
  ["č"] = "\232",
  ["ď"] = "\239",
  ["é"] = "\233",
  ["ě"] = "\236",
  ["í"] = "\237",
  ["ň"] = "\242",
  ["ó"] = "\243",
  ["ř"] = "\248",
  ["š"] = "\154",
  ["ť"] = "\157",
  ["ú"] = "\250",
  ["ů"] = "\249",
  ["ý"] = "\253",
  ["ž"] = "\158",

  ["Á"] = "\193",
  ["Č"] = "\200",
  ["Ď"] = "\207",
  ["É"] = "\201",
  ["Ě"] = "\204",
  ["Í"] = "\205",
  ["Ň"] = "\210",
  ["Ó"] = "\211",
  ["Ř"] = "\216",
  ["Š"] = "\138",
  ["Ť"] = "\141",
  ["Ú"] = "\218",
  ["Ů"] = "\217",
  ["Ý"] = "\221",
  ["Ž"] = "\142",
}
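
The conversion asked for in this message (turning the CP 1250 bytes above 127 into UTF-8 sequences) could be sketched in plain Lua roughly as follows. This is only an illustration under stated assumptions: the names `cp1250_to_unicode`, `utf8_two_bytes`, `byte_to_utf8` and `cp1250_to_utf8` are hypothetical and not part of ConTeXt or LuaTeX, and only the Czech letters are covered. Bytes are written as numeric values so the sketch does not depend on the encoding of the source file.

```lua
-- Minimal sketch, not ConTeXt API: all names below are hypothetical.
-- Maps the Czech letters of CP 1250 (one byte each) to UTF-8 sequences.

-- CP 1250 byte value -> Unicode code point
local cp1250_to_unicode = {
  -- lowercase: á č ď é ě í ň ó ř š ť ú ů ý ž
  [0xE1] = 0x00E1, [0xE8] = 0x010D, [0xEF] = 0x010F, [0xE9] = 0x00E9,
  [0xEC] = 0x011B, [0xED] = 0x00ED, [0xF2] = 0x0148, [0xF3] = 0x00F3,
  [0xF8] = 0x0159, [0x9A] = 0x0161, [0x9D] = 0x0165, [0xFA] = 0x00FA,
  [0xF9] = 0x016F, [0xFD] = 0x00FD, [0x9E] = 0x017E,
  -- uppercase: Á Č Ď É Ě Í Ň Ó Ř Š Ť Ú Ů Ý Ž
  [0xC1] = 0x00C1, [0xC8] = 0x010C, [0xCF] = 0x010E, [0xC9] = 0x00C9,
  [0xCC] = 0x011A, [0xCD] = 0x00CD, [0xD2] = 0x0147, [0xD3] = 0x00D3,
  [0xD8] = 0x0158, [0x8A] = 0x0160, [0x8D] = 0x0164, [0xDA] = 0x00DA,
  [0xD9] = 0x016E, [0xDD] = 0x00DD, [0x8E] = 0x017D,
}

-- Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes.
local function utf8_two_bytes(cp)
  return string.char(0xC0 + math.floor(cp / 64), 0x80 + cp % 64)
end

-- Build a lookup table keyed by the single CP 1250 byte.
local byte_to_utf8 = {}
for byte, cp in pairs(cp1250_to_unicode) do
  byte_to_utf8[string.char(byte)] = utf8_two_bytes(cp)
end

-- Replace every mapped byte above 127; ASCII and unmapped bytes
-- pass through unchanged (gsub keeps the match when the table has no entry).
local function cp1250_to_utf8(str)
  return (str:gsub("[\128-\255]", byte_to_utf8))
end

-- "\248e\154" is the byte sequence for "řeš" in CP 1250.
print(cp1250_to_utf8("\248e\154"))
```

To apply this to a file, the data would be read with `io.open(name, "rb")` so the bytes arrive unmodified, then passed through `cp1250_to_utf8` before further processing. A complete solution would need the full CP 1250 table rather than the Czech subset used here.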