From: Philipp Gesang
Newsgroups: gmane.comp.tex.context
Subject: Re: UTF conversion via Lua
Date: Fri, 10 Feb 2012 12:30:23 +0100
Message-ID: <20120210113023.GB30993@phlegethon>
References: <20120210105732.GA30993@phlegethon>
To: mailing list for ConTeXt users

On 2012-02-10 12:11, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> ... Well, my information was not correct.
>
> There are characters > 127 in the file, like "ř", "š"...
>
> Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.

So it wasn’t ASCII after all ;-)
No problem, just use iconv:

    iconv -f CP1250 -t UTF8 infile > outfile

I do this a lot with movie subtitles …

Hth, Philipp

PS: If you still insist on converting at the Lua end only then
    your starting point might be “regi-cp1250.lua” in the Context
    base/ dir.

>
> But I have problem loading them into ConTeXt.
>
> I need to convert the bytes > 127 to UTF sequence, which would be acceptable by ConTeXt.
>
> @Thomas:
>
> The table looks nice but there are no entries for CP 1250 to UTF conversion.
>
> I prepared some tables: character conversion and removal of diacritics (see the attachment);
> maybe it would be handful to include them into ConTeXt somehow.
>
> Best regards,
>
> Lukas
>
>
> On Fri, 10 Feb 2012 11:57:32 +0100, Philipp Gesang wrote:
>
> >On 2012-02-10 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> >>Hello,
> >>
> >>I have many files with ASCII encoding; this encoding must be kept as these files are processed also by another program.
> >>
> >>When I work with them in ConTeXt, I need to convert them to UTF.
> >
> >Not needed, as every ASCII string is a valid UTF8 string:
> >  “The UTF encoding has several good properties. By far the most
> >  important is that a byte in the ASCII range 0-127 represents
> >  itself in UTF. Thus UTF is backward compatible with ASCII.”
> >  http://doc.cat-v.org/plan_9/4th_edition/papers/utf
> >You can use them in Luatex without further conversion.
> >
> >Regards
> >Philipp
> >
> >
> >>
> >>Does Lua (in ConTeXt scope) offer a transformation function or a table of chars [ASCII-code] -> [UTF-code] or anything to provide the conversion?
> >>
> >>Something like:
> >>
> >>\startluacode
> >>  local str = loadFile("a.txt") -- ASCII coded
> >>
> >>  str = context.ACSII2UTF(str) -- Or something like this
> >>\stopluacode
> >>
> >>Best regards,
> >>
> >>Lukas
> >>
> >>
> >>--
> >>Ing. Lukáš Procházka [mailto:LPr@pontex.cz]
> >>Pontex s. r. o. [mailto:pontex@pontex.cz] [http://www.pontex.cz]
> >>Bezová 1658
> >>147 14 Praha 4
> >>
> >>Tel: +420 244 062 238
> >>Fax: +420 244 461 038
> >>
> >>___________________________________________________________________________________
> >>If your question is of interest to others as well, please add an entry to the Wiki!
> >>
> >>maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> >>webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
> >>archive  : http://foundry.supelec.fr/projects/contextrev/
> >>wiki     : http://contextgarden.net
> >>___________________________________________________________________________________

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
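
Since the original files have to stay in CP 1250 for the other program, the iconv call from the message can be applied to copies instead of converting in place. The following is only a sketch under assumptions: the function name `convert_cp1250`, the two directory arguments, and the `*.txt` pattern are hypothetical and not taken from the thread; only the `iconv -f CP1250` conversion itself comes from the message.

```shell
# Sketch: write UTF-8 copies of CP1250 files into a separate directory,
# leaving the originals untouched for the other program.
# convert_cp1250, its arguments and the *.txt pattern are illustrative names.
convert_cp1250() {
    src_dir=$1   # directory holding the CP1250-encoded originals
    out_dir=$2   # directory that receives the UTF-8 copies

    mkdir -p "$out_dir" || return 1

    for f in "$src_dir"/*.txt; do
        # Skip the literal pattern when no file matches.
        [ -e "$f" ] || continue
        # Same conversion as in the message, one file at a time.
        iconv -f CP1250 -t UTF-8 "$f" > "$out_dir/$(basename "$f")" || return 1
    done
}

# Example call (paths are placeholders):
#   convert_cp1250 ./data ./data-utf8
```

ConTeXt would then read the files from the output directory, while the other program keeps working on the untouched originals.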