From: Philipp Gesang
Newsgroups: gmane.comp.tex.context
Subject: Re: Problem with Lua processing UTF8 substrings
Date: Wed, 1 Feb 2012 21:05:55 +0100
Message-ID: <20120201200555.GA1647@phlegethon>
References: <4F2991C9.2090602@gyza.cz>
In-Reply-To: <4F2991C9.2090602@gyza.cz>
To: hajtmar@gyza.cz, mailing list for ConTeXt users

On 2012-02-01 20:26, Jaroslav Hajtmar wrote:
> I want to use Lua to write characters (substrings) from a string,
> but I get an error message:
>
> ! String contains an invalid utf-8 sequence.
>
> Can you please someone help?

Have you tried the unicode library? The standard string library
operates on bytes; extracting a single byte therefore yields an
incomplete multibyte character if the codepoint is beyond ASCII.
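To see the byte-level behaviour outside of ConTeXt, here is a minimal sketch in plain Lua. It assumes a stock Lua 5.1 (or later) interpreter and uses nothing from ConTeXt; the variable names are only illustrative, and the test string is written as byte escapes so the result does not depend on the file encoding.

```lua
-- "šě" as decimal byte escapes: š = 0xC5 0xA1, ě = 0xC4 0x9B.
local s = "\197\161\196\155"

-- The length operator (like string.len) counts bytes, not characters:
-- two characters occupy four bytes here.
local byte_length = #s

-- string.sub(s, 1, 1) returns only the lead byte 0xC5 of "š".
local first_byte = string.sub(s, 1, 1)

-- A lead byte in 0xC2..0xDF announces a two-byte sequence, so a lone
-- byte from that range is an incomplete UTF-8 sequence.
local b = string.byte(first_byte)
local is_truncated = b >= 0xC2 and b <= 0xDF and #first_byte < 2

print(byte_length)            -- 4
print(string.byte(s, 1, -1))  -- 197 161 196 155 (tab-separated)
print(is_truncated)           -- true
```

Passing such a lone lead byte to context() is presumably what triggers the "invalid utf-8 sequence" message quoted above.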
·································································
\def\mymacro#1{%
  \startluacode
    local utf    = unicode.utf8
    local target = [==[\detokenize{#1}]==]
    for i=1, utf.len(target) do
      context(utf.sub(target,i,i)..", ")
    end
  \stopluacode%
}

%% alternatively, use utfcharacters
\define[1]\myothermacro{%
  \startluacode
    local result = { }
    for i in string.utfcharacters[==[\detokenize{#1}]==] do
      result[\letterhash result+1] = i
    end
    context(table.concat(result, ", "))
  \stopluacode
}

\starttext
  \mymacro{šěřěžřýčřčžáýčý}\par
  \myothermacro{šěřěžřýčřčžáýčý}
\stoptext
·································································

(Lazy people would just do a “local string = unicode.utf8” at the
top of the file.)
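The collect-then-concat idea of \myothermacro can also be tried in plain Lua. The sketch below is only an approximation: utf_chars is a hypothetical stand-in for string.utfcharacters, not ConTeXt's implementation, and it assumes the input is already valid UTF-8.

```lua
-- Hypothetical helper: iterate over the UTF-8 characters of a string.
-- The pattern matches one non-continuation byte followed by any number
-- of continuation bytes (0x80..0xBF). It does not validate the input.
local function utf_chars(s)
  return string.gmatch(s, "[^\128-\191][\128-\191]*")
end

-- "šěř" as decimal byte escapes (two bytes per character).
local target = "\197\161\196\155\197\153"

-- Collect whole characters in a table, then join them once at the end.
local result = { }
for ch in utf_chars(target) do
  result[#result + 1] = ch
end

local joined = table.concat(result, ", ")
print(#result)  -- 3
print(joined)   -- the three characters separated by ", "
```

Building the table first and joining once avoids emitting a trailing ", " after the last character, which the loop in \mymacro does.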

Regards
Philipp

>
> Thanks
> Jaroslav Hajtmar
>
> Here is my minimal example:
>
> \def\mymacro#1{\ctxlua{for i=1, string.len('#1') do
> context(string.sub('#1',i,i)..", ") end}}
>
> \starttext
>
> %\mymacro{šěřěžřýčřčžáýčý} % Here is a problem
> \mymacro{asdfghjklqwertt} % Here is all OK
>
> \stoptext
>
> ___________________________________________________________________________________
> If your question is of interest to others as well, please add an entry to the Wiki!
>
> maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
> archive  : http://foundry.supelec.fr/projects/contextrev/
> wiki     : http://contextgarden.net
> ___________________________________________________________________________________

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments