ntg-context - mailing list for ConTeXt users
* Problem with Lua processing UTF8 substrings
@ 2012-02-01 19:26 Jaroslav Hajtmar
  2012-02-01 20:05 ` Philipp Gesang
From: Jaroslav Hajtmar @ 2012-02-01 19:26 UTC
  To: mailing list for ConTeXt users

Hello ConTeXists,

I want to use Lua to write out the individual characters (substrings) of a
string, but I get an error message:

! String contains an invalid utf-8 sequence.

I tried various Lua functions for working with UTF-8 strings, for example:
string.subutf8(string, start[,end])
for i, char in str:nextutf8(orig_pos)
string.lenutf8(string)

but without success.

Can someone please help?

Thanks
Jaroslav Hajtmar

Here is my minimal example:

\def\mymacro#1{\ctxlua{for i=1, string.len('#1') do 
context(string.sub('#1',i,i)..", ") end}}

\starttext

%\mymacro{šěřěžřýčřčžáýčý} % Here is a problem
\mymacro{asdfghjklqwertt} % Here is all OK

\stoptext

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________



* Re: Problem with Lua processing UTF8 substrings
  2012-02-01 19:26 Problem with Lua processing UTF8 substrings Jaroslav Hajtmar
@ 2012-02-01 20:05 ` Philipp Gesang
  2012-02-01 20:17   ` Jaroslav Hajtmar
From: Philipp Gesang @ 2012-02-01 20:05 UTC
  To: hajtmar, mailing list for ConTeXt users


On 2012-02-01 20:26, Jaroslav Hajtmar wrote:
> I want to use Lua to write characters (substrings) from a string,
> but I get an error message:
> 
> ! String contains an invalid utf-8 sequence.
> 
> Can someone please help?

Have you tried the unicode library? The standard string library
operates on bytes, so extracting a single byte yields an incomplete
multibyte character whenever the code point lies beyond ASCII.
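
To make the byte-versus-character point concrete, here is a small
plain-Lua sketch that runs without LuaTeX or the unicode library. The
helper name utf8_len is made up for this illustration, the test string
"ša" is written with decimal byte escapes so that the file encoding does
not matter, and the input is assumed to be valid UTF-8.

```lua
-- "š" is U+0161, which UTF-8 encodes as the two bytes 0xC5 0xA1 (197, 161).
local s = "\197\161a"            -- the string "ša"

-- The standard library counts and slices bytes, not characters.
local byte_len   = #s            -- 3 bytes for 2 characters
local first_byte = s:sub(1, 1)   -- only the lead byte of "š": not valid UTF-8

-- Hypothetical helper: count characters by counting every byte that is
-- not a continuation byte (0x80-0xBF). Assumes valid UTF-8 input.
local function utf8_len(str)
  local _, n = str:gsub("[^\128-\191]", "")
  return n
end

print(byte_len, utf8_len(s), first_byte == "\197")
```

Handing such a lone lead byte back to TeX is presumably what triggers the
"invalid utf-8 sequence" error in the original example.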

·································································

\def\mymacro#1{%
  \startluacode
    local utf = unicode.utf8
    local target = [==[\detokenize{#1}]==]
    for i=1, utf.len(target) do
      context(utf.sub(target,i,i)..", ")
    end
  \stopluacode%
}

%% alternatively, use utfcharacters
\define[1]\myothermacro{%
  \startluacode
    local result = { }
    for i in string.utfcharacters[==[\detokenize{#1}]==] do
      result[\letterhash result+1] = i
    end
    context(table.concat(result, ", "))
  \stopluacode
}

\starttext

\mymacro{šěřěžřýčřčžáýčý}\par
\myothermacro{šěřěžřýčřčžáýčý}

\stoptext

·································································

(Lazy people would just do a “local string = unicode.utf8” at the
top of the file.)

Regards
Philipp





-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments


* Re: Problem with Lua processing UTF8 substrings
  2012-02-01 20:05 ` Philipp Gesang
@ 2012-02-01 20:17   ` Jaroslav Hajtmar
  2012-02-01 20:35     ` Philipp Gesang
From: Jaroslav Hajtmar @ 2012-02-01 20:17 UTC
  To: mailing list for ConTeXt users

Hello Philipp,
Thanks very much for the quick and perfect help.
Is there a manual or other source where I can read up on this and
similar topics?

Thanks once more,
Jaroslav Hajtmar




* Re: Problem with Lua processing UTF8 substrings
  2012-02-01 20:17   ` Jaroslav Hajtmar
@ 2012-02-01 20:35     ` Philipp Gesang
  2012-02-01 20:38       ` Jaroslav Hajtmar
From: Philipp Gesang @ 2012-02-01 20:35 UTC
  To: hajtmar, mailing list for ConTeXt users


On 2012-02-01 21:17, Jaroslav Hajtmar wrote:
> Hello Philipp,
> Thanks very much for the quick and perfect help.
> Is there a manual or other source where I can read up on this and
> similar topics?

I’m sorry to disappoint you, but the utf library is documented only
in its source.[1] Luckily it covers all the functionality of the
native string library, so its usage should be equivalent, except
that it also works on UTF-8 sequences. If you know some German,
there is also a blog post by Patrick.[2]

The string.utfcharacters iterator is covered in luatexref-t.pdf.
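
For experimenting outside LuaTeX, the idea behind such an iterator can be
approximated in plain Lua. This is only a sketch under assumptions, not
how LuaTeX implements string.utfcharacters: the name utf_characters is
invented here, and the pattern assumes well-formed UTF-8 without NUL
bytes.

```lua
-- Hypothetical stand-in for an iterator over UTF-8 characters.
-- The pattern matches one lead byte (ASCII or 0xC2-0xF4) followed by any
-- continuation bytes (0x80-0xBF); it assumes well-formed input.
local function utf_characters(str)
  return str:gmatch("[\1-\127\194-\244][\128-\191]*")
end

-- "šěa" written with decimal byte escapes: š = C5 A1, ě = C4 9B.
local input = "\197\161\196\155a"

local result = {}
for ch in utf_characters(input) do
  result[#result + 1] = ch
end

-- Three characters, joined the same way as in \myothermacro.
print(#result, table.concat(result, ", "))
```

Collecting the pieces in a table and joining them once with table.concat
avoids building the output by repeated string concatenation.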

Hope this helps
Philipp

[1] http://files.luaforge.net/releases/sln/slnunicode/1.1a
[2] http://www.luatex.de/2010/02/selene-unicode-bibliothek/?iframe=true




* Re: Problem with Lua processing UTF8 substrings
  2012-02-01 20:35     ` Philipp Gesang
@ 2012-02-01 20:38       ` Jaroslav Hajtmar
From: Jaroslav Hajtmar @ 2012-02-01 20:38 UTC
  To: mailing list for ConTeXt users

Thanks, Philipp!
That is fine; I do not mind studying the code ...

Many thanks,
Jaroslav





