* Problem with Lua processing UTF8 substrings
From: Jaroslav Hajtmar @ 2012-02-01 19:26 UTC (permalink / raw)
To: mailing list for ConTeXt users
Hello ConTeXists,
I want to use Lua to write out the individual characters (substrings) of
a string, but I get an error message:
! String contains an invalid utf-8 sequence.
I tried various Lua functions for working with UTF-8 strings, for example:
string.subutf8(string, start[,end])
for i, char in str:nextutf8(orig_pos)
string.lenutf8(string),
but without success.
Can someone please help?
Thanks
Jaroslav Hajtmar
Here is my minimal example:
\def\mymacro#1{\ctxlua{for i=1, string.len('#1') do
context(string.sub('#1',i,i)..", ") end}}
\starttext
%\mymacro{šěřěžřýčřčžáýčý} % this line triggers the error
\mymacro{asdfghjklqwertt} % ASCII-only input works
\stoptext
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://tex.aanhet.net
archive : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___________________________________________________________________________________
* Re: Problem with Lua processing UTF8 substrings
From: Philipp Gesang @ 2012-02-01 20:05 UTC (permalink / raw)
To: hajtmar, mailing list for ConTeXt users
On 2012-02-01 20:26, Jaroslav Hajtmar wrote:
> I want to use Lua to write out the individual characters (substrings)
> of a string, but I get an error message:
>
> ! String contains an invalid utf-8 sequence.
>
> Can someone please help?
Have you tried the unicode library? The standard string library
operates on bytes, so extracting a single byte yields an incomplete
multibyte character whenever the codepoint lies outside the ASCII range.
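The byte-versus-character mismatch can be seen without ConTeXt. Here is a
minimal sketch in plain Lua; the sample string and the variable names are
illustrative assumptions, not taken from the original example:

```lua
-- "šě" written as explicit UTF-8 bytes, so the result does not depend on
-- the encoding of this file: š = C5 A1 (197 161), ě = C4 9B (196 155).
local s = "\197\161\196\155"

-- string.len counts bytes, not characters.
local byte_length = string.len(s)

-- string.sub(s, 1, 1) returns only the lead byte of "š"; on its own
-- this single byte is not valid UTF-8.
local first = string.sub(s, 1, 1)

print(byte_length)         -- 4 bytes for 2 characters
print(string.byte(first))  -- 197, the lead byte 0xC5
```

The original macro hands exactly such single bytes to context(), which is
presumably what provokes the "invalid utf-8 sequence" error.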
·································································
\def\mymacro#1{%
\startluacode
local utf = unicode.utf8
local target = [==[\detokenize{#1}]==]
for i=1, utf.len(target) do
context(utf.sub(target,i,i)..", ")
end
\stopluacode%
}
%% alternatively, use utfcharacters
\define[1]\myothermacro{%
\startluacode
local result = { }
for i in string.utfcharacters[==[\detokenize{#1}]==] do
result[\letterhash result+1] = i
end
context(table.concat(result, ", "))
\stopluacode
}
\starttext
\mymacro{šěřěžřýčřčžáýčý}\par
\myothermacro{šěřěžřýčřčžáýčý}
\stoptext
·································································
(Lazy people would just do a “local string = unicode.utf8” at the
top of the file.)
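For readers who want to try the iterator approach outside ConTeXt, the
following is a rough plain-Lua approximation of what \myothermacro does.
The helper name utf_chars is hypothetical and not part of any library;
the sketch assumes valid UTF-8 input without NUL bytes:

```lua
-- Hypothetical helper: split a UTF-8 string into a table of characters.
-- The pattern matches one lead byte (ASCII or 0xC2-0xF4) followed by any
-- number of continuation bytes (0x80-0xBF).
local function utf_chars(s)
  local chars = {}
  for ch in string.gmatch(s, "[\1-\127\194-\244][\128-\191]*") do
    chars[#chars + 1] = ch
  end
  return chars
end

-- "šěa" as explicit UTF-8 bytes: two 2-byte characters and one ASCII one.
local chars = utf_chars("\197\161\196\155a")

print(#chars)                     -- 3
print(table.concat(chars, ", "))  -- the characters joined by ", "
```

Inside ConTeXt the string.utfcharacters iterator shown above remains the
simpler choice; this sketch only illustrates the idea.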
Regards
Philipp
--
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org - against proprietary attachments
* Re: Problem with Lua processing UTF8 substrings
From: Jaroslav Hajtmar @ 2012-02-01 20:17 UTC (permalink / raw)
To: mailing list for ConTeXt users
Hello Philipp.
Thanks very much for the very quick and perfect help.
Is there a manual or other source where I can find this (and
similar) information?
Thanks once more
Jaroslav Hajtmar
* Re: Problem with Lua processing UTF8 substrings
From: Philipp Gesang @ 2012-02-01 20:35 UTC (permalink / raw)
To: hajtmar, mailing list for ConTeXt users
On 2012-02-01 21:17, Jaroslav Hajtmar wrote:
> Hello Philipp.
> Thanks very much for the very quick and perfect help.
> Is there a manual or other source where I can find this (and
> similar) information?
I’m sorry to disappoint you, but the utf library is documented
only in its source.[1] Luckily it covers all the functionality of
the native string library, so its usage should be equivalent,
except that it also works on UTF-8 sequences. If you know some
German, there is also a blog post by Patrick.[2]
The string.utfcharacters iterator is covered in luatexref-t.pdf.
Hope this helps
Philipp
[1] http://files.luaforge.net/releases/sln/slnunicode/1.1a
[2] http://www.luatex.de/2010/02/selene-unicode-bibliothek/?iframe=true
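To make the "same interface, but counted in characters" point concrete,
here is a small plain-Lua sketch. The functions utf_len and utf_sub are
hypothetical stand-ins written for this illustration, not the library's
implementation; they handle only positive indices and assume valid UTF-8
input without NUL bytes:

```lua
-- One UTF-8 character: a lead byte plus optional continuation bytes.
local CHAR = "[\1-\127\194-\244][\128-\191]*"

-- Hypothetical stand-in for utf.len: count characters instead of bytes.
local function utf_len(s)
  local _, count = string.gsub(s, CHAR, "")
  return count
end

-- Hypothetical stand-in for utf.sub: i and j are character positions.
local function utf_sub(s, i, j)
  local out, n = {}, 0
  for ch in string.gmatch(s, CHAR) do
    n = n + 1
    if n >= i and n <= j then out[#out + 1] = ch end
  end
  return table.concat(out)
end

local s = "\197\161\196\155a"            -- "šěa" as explicit UTF-8 bytes
print(string.len(s), utf_len(s))         -- 5 and 3
print(utf_sub(s, 2, 2) == "\196\155")    -- true: the whole character "ě"
print(string.sub(s, 2, 2) == "\161")     -- true: a lone continuation byte
```

The calls look the same as their string counterparts; only the unit of
the indices changes from bytes to characters.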
* Re: Problem with Lua processing UTF8 substrings
From: Jaroslav Hajtmar @ 2012-02-01 20:38 UTC (permalink / raw)
To: mailing list for ConTeXt users
Thanks, Philipp!
That is fine; I do not mind studying the code.
Many thanks.
Jaroslav