ntg-context - mailing list for ConTeXt users
* lua questions
@ 2009-01-22 21:24 Thomas A. Schmitz
  2009-01-23 11:13 ` Hans Hagen
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas A. Schmitz @ 2009-01-22 21:24 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hi all,

this is a bit OT and should probably go to a lua list, but since some  
people here are very proficient in lua and I feel less embarrassed  
about noob questions here... I have a half-functioning python script  
to convert entries from a classics database into the bibtex format. I  
want to rewrite it in lua and make it more functional. Three little  
problems/questions:

1. I found a script to convert Roman numerals via lpeg here:
http://lua-users.org/wiki/LpegRecipes
but it uses the syntax lpeg.Ca, which my lpeg doesn't recognize and
which I can't find in the lpeg manual. According to a talk by Roberto
Ierusalimschy, "lpeg.Ca(patt) - "accumulates" the nested captures."
(http://www.inf.puc-rio.br/~roberto/lpeg/slides-lpeg-workshop2008.pdf)
Is this obsolete, has it been replaced by anything?

2. How can I check if a string begins with a class of words "(Der |Die  
|Das |The |An )" etc. and strip these words from the string? I do it  
with a compiled regexp in python, but "Programming in lua" has this to  
say: "Unlike some other systems, in Lua a modifier can only be applied  
to a character class; there is no way to group patterns under a  
modifier. For instance, there is no pattern that matches an optional  
word (unless the word has only one letter). Usually you can circumvent  
this limitation using some of the advanced techniques that we will see  
later." I haven't found these techniques yet.

3. How can I compare strings with utf8 characters? My naive approach
    if string.find(record, "Résumé")
doesn't appear to work (while the same method does work if the string  
has only ASCII characters).

Sorry if this is OT, and I'll be grateful for any pointers.

Thomas
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________



* Re: lua questions
  2009-01-22 21:24 lua questions Thomas A. Schmitz
@ 2009-01-23 11:13 ` Hans Hagen
  2009-01-23 13:04   ` Thomas A. Schmitz
  2009-01-29 12:35   ` Thomas A. Schmitz
  0 siblings, 2 replies; 5+ messages in thread
From: Hans Hagen @ 2009-01-23 11:13 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Thomas A. Schmitz wrote:
> Hi all,
> 
> this is a bit OT and should probably go to a lua list, but since some 
> people here are very proficient in lua and I feel less embarrassed about 
> noob questions here... I have a half-functioning python script to 
> convert entries from a classics database into the bibtex format. I want 
> to rewrite it in lua and make it more functional. Three little 
> problems/questions:
> 
> 1. I found a script to convert Roman numerals via lpeg here: 
> http://lua-users.org/wiki/LpegRecipes but it uses the syntax lpeg.Ca 
> which my lpeg doesn't recognize and which I can't find in the lpeg 
> manual. According to a talk by Roberto Ierusalimschy, "lpeg.Ca(patt) - 
> "accumulates" the nested captures." 
> (http://www.inf.puc-rio.br/~roberto/lpeg/slides-lpeg-workshop2008.pdf) 
> Is this obsolete, has it been replaced by anything?

here is a variant that implements a function (and does not use the env 
trick)

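do
     -- lpeg.Ca is the accumulator capture of lpeg 0.8.x (the version in luatex)
     local add = function (x,y) return x+y end
     local P,Ca,Cc = lpeg.P,lpeg.Ca,lpeg.Cc
     -- XC=90 added: without it a numeral such as XC stops after the X
     local symbols = { I=1,V=5,X=10,L=50,C=100,D=500,M=1000,IV=4,IX=9,XL=40,XC=90,CD=400,CM=900 }
     local adders = { }
     -- each symbol matches its text and adds its value to the accumulator
     for s,n in pairs(symbols) do adders[s] = P(s)*Cc(n)/add end
     local MS = adders.M^0
     local CS = (adders.D*adders.C^(-4)+adders.CD+adders.CM+adders.C^(-4))^(-1)
     local XS = (adders.L*adders.X^(-4)+adders.XL+adders.XC+adders.X^(-4))^(-1)
     local IS = (adders.V*adders.I^(-4)+adders.IX+adders.IV+adders.I^(-4))^(-1)
     -- start at 0 and accumulate thousands, hundreds, tens and units in order
     local p = Ca(Cc(0)*MS*CS*XS*IS)
     function string:romantonumber()
         return p:match(self:upper())
     end
end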

print(string.romantonumber("MMIX"))
print(string.romantonumber("MMIIIX"))


just run such a script using

mtxrun --script yourscript.lua

(as luatex (texlua) has the latest lpeg built in)


> 2. How can I check if a string begins with a class of words "(Der |Die 
> |Das |The |An )" etc. and strip these words from the string? I do it 
> with a compiled regexp in python, but "Programming in lua" has this to 
> say: "Unlike some other systems, in Lua a modifier can only be applied 
> to a character class; there is no way to group patterns under a 
> modifier. For instance, there is no pattern that matches an optional 
> word (unless the word has only one letter). Usually you can circumvent 
> this limitation using some of the advanced techniques that we will see 
> later." I haven't found these techniques yet.

local stripped = {
     "Der", "Die", "Das"
}

local p = lpeg.P(false)

for k, v in ipairs(stripped) do
     p = p + lpeg.P(v)
end

local w = p * " "

local stripper = lpeg.Cs(((w/"") + lpeg.C(1))^0)

lpeg.print(stripper)

str = "Germans somehow always talk about Der Thomas and Der Hans"

print(stripper:match(str))
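
The stripper above removes the listed words wherever they occur in the string. For the case in the question, where only a leading article should go, a minimal sketch could look like the following. The names `articles`, `strip_leading` and `strip_article` are made up for illustration, and the `require` line assumes a standalone Lua with lpeg installed (in luatex, lpeg is already available as a global).

```lua
local lpeg = require("lpeg")  -- assumption: standalone Lua; not needed inside luatex
local P, Cs = lpeg.P, lpeg.Cs

-- hypothetical word list, taken from the examples in the question
local articles = { "Der", "Die", "Das", "The", "An" }

-- build an ordered choice of all articles
local article = P(false)
for _, word in ipairs(articles) do
    article = article + P(word)
end

-- at most one "article + space" at the very start is replaced by nothing;
-- the rest of the string is copied unchanged
local strip_leading = Cs(((article * " ") / "")^-1 * P(1)^0)

local function strip_article(title)
    return strip_leading:match(title)
end

print(strip_article("Der Zauberberg"))
print(strip_article("Thomas and Der Hans"))
```

Because a pattern given to match is tried only at the start of the subject, an article later in the string is left alone, and a word like "Theater" is untouched because the article must be followed by a space.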


> 3. How can I compare strings with utf8 characters? My naive approach
>    if string.find(record, "Résumé")
> doesn't appear to work (while the same method does work if the string 
> has only ASCII characters).

since lua is 8 bit clean utf should just work
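
A small sketch of what "just work" means here: string.find compares bytes, so a UTF-8 needle is found whenever the record contains exactly the same byte sequence. The two spellings below are only a guess at what could go wrong (a precomposed "é" versus "e" plus a combining accent); a record stored in another encoding such as Latin-1 would fail in the same way. The variable names are made up for the example.

```lua
-- "Résumé" with precomposed é (U+00E9, UTF-8 bytes 195 169)
local precomposed = "R\195\169sum\195\169"
-- the same word with "e" followed by a combining acute accent (U+0301, bytes 204 129)
local decomposed  = "Re\204\129sume\204\129"

local record = "title = {" .. precomposed .. "}"

-- the fourth argument (true) asks for a plain substring search, no pattern magic
print(string.find(record, precomposed, 1, true))  --> 10  17  (byte positions)
print(string.find(record, decomposed, 1, true))   --> nil
```

The positions are byte offsets, not character counts, which matters as soon as the result is used with string.sub.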

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
      tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: lua questions
  2009-01-23 11:13 ` Hans Hagen
@ 2009-01-23 13:04   ` Thomas A. Schmitz
  2009-01-29 12:35   ` Thomas A. Schmitz
  1 sibling, 0 replies; 5+ messages in thread
From: Thomas A. Schmitz @ 2009-01-23 13:04 UTC (permalink / raw)
  To: mailing list for ConTeXt users


On Jan 23, 2009, at 12:13 PM, Hans Hagen wrote:

>
> here is a variant that implements a function (and does not use the  
> env trick)
>
> do
>    local add = function (x,y) return x+y end
>    local P,Ca,Cc= lpeg.P,lpeg.Ca,lpeg.Cc
>    local symbols =  
> { I=1,V=5,X=10,L=50,C=100,D=500,M=1000,IV=4,IX=9,XL=40,CD=400,CM=900}
>    local adders = { }
>    for s,n in pairs(symbols) do adders[s] = P(s)*Cc(n)/add end
>    local MS = adders.M^0
>    local CS = (adders.D*adders.C^(-4)+adders.CD+adders.CM 
> +adders.C^(-4))^(-1)
>    local XS = (adders.L*adders.X^(-4)+adders.XL+adders.X^(-4))^(-1)
>    local IS = (adders.V*adders.I^(-4)+adders.IX+adders.IV 
> +adders.I^(-4))^(-1)
>    local p = Ca(Cc(0)*MS*CS*XS*IS)
>    function string:romantonumber()
>        return p:match(self:upper())
>    end
> end
>
> print(string.romantonumber("MMIX"))
> print(string.romantonumber("MMIIIX"))
>
>
> just run such script using
>
> mtxrun --script yourscript.lua
>
> as luatex (texlua) has the latest lpeg built in)
>
Brilliant! This one does work when I use it with luatex (not with my  
system lua though, even though I have the latest released version of  
lpeg, 0.9, installed). Bizarre...

>
>> 2. How can I check if a string begins with a class of words "(Der | 
>> Die |Das |The |An )" etc. and strip these words from the string? I  
>> do it with a compiled regexp in python, but "Programming in lua"  
>> has this to say: "Unlike some other systems, in Lua a modifier can  
>> only be applied to a character class; there is no way to group  
>> patterns under a modifier. For instance, there is no pattern that  
>> matches an optional word (unless the word has only one letter).  
>> Usually you can circumvent this limitation using some of the  
>> advanced techniques that we will see later." I haven't found these  
>> techniques yet.
>
> local stripped = {
>    "Der", "Die", "Das"
> }
>
> local p = lpeg.P(false)
>
> for k, v in ipairs(stripped) do
>    p = p + lpeg.P(v)
> end
>
> local w = p * " "
>
> local stripper = lpeg.Cs(((w/"") + lpeg.C(1))^0)
>
> lpeg.print(stripper)
>
> str = "Germans somehow always talk about Der Thomas and Der Hans"
>
> print(stripper:match(str))
>

Brilliant again! I can run with that, looks great! And who doesn't  
want a "local stripper" in his code?

>
>> 3. How can I compare strings with utf8 characters? My naive approach
>>   if string.find(record, "Résumé")
>> doesn't appear to work (while the same method does work if the  
>> string has only ASCII characters).
>
> since lua is 8 bit clean utf should just work

OK, then the problem must be somewhere else. I'll investigate.

Thanks a lot, and best wishes

Thomas

* Re: lua questions
  2009-01-23 11:13 ` Hans Hagen
  2009-01-23 13:04   ` Thomas A. Schmitz
@ 2009-01-29 12:35   ` Thomas A. Schmitz
  2009-01-29 12:42     ` Taco Hoekwater
  1 sibling, 1 reply; 5+ messages in thread
From: Thomas A. Schmitz @ 2009-01-29 12:35 UTC (permalink / raw)
  To: mailing list for ConTeXt users


On Jan 23, 2009, at 12:13 PM, Hans Hagen wrote:

> Thomas A. Schmitz wrote:
>> it uses the syntax lpeg.Ca which my lpeg doesn't recognize and  
>> which I can't find in the lpeg manual.

[useful information snipped]

>
> just run such script using
>
> mtxrun --script yourscript.lua
>
> as luatex (texlua) has the latest lpeg built in)
>

Just one remark: my lpeg is

/*
** $Id: lpeg.c,v 1.98 2008/10/11 20:20:43 roberto Exp $

and doesn't have the lpeg.Ca pattern. The lpeg that comes with luatex is

/*
** $Id: lpeg.c,v 1.86 2008/03/07 17:20:19 roberto Exp $

so it's older, and it does have the lpeg.Ca pattern accumulator.
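
For a standalone lpeg 0.9 or later, the same converter can be sketched with the fold capture lpeg.Cf, which, as far as the LPeg release notes go, took the place of the accumulator capture in 0.9. This is an adaptation under that assumption, not the code from the wiki: the XC entry and the end-of-string check are additions, and `romantonumber` is written as a plain local function.

```lua
local lpeg = require("lpeg")  -- assumption: standalone Lua with lpeg >= 0.9
local P, Cc, Cf = lpeg.P, lpeg.Cc, lpeg.Cf

local symbols = {
    I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000,
    IV = 4, IX = 9, XL = 40, XC = 90, CD = 400, CM = 900,
}

-- each symbol matches its text and produces its numeric value
local v = {}
for s, n in pairs(symbols) do
    v[s] = P(s) * Cc(n)
end

local MS = v.M^0
local CS = (v.D * v.C^-4 + v.CD + v.CM + v.C^-4)^-1
local XS = (v.L * v.X^-4 + v.XL + v.XC + v.X^-4)^-1
local IS = (v.V * v.I^-4 + v.IX + v.IV + v.I^-4)^-1

-- Cf folds all produced values with the function, starting from the initial 0;
-- P(-1) only succeeds at the end of the string, so trailing garbage gives nil
local roman = Cf(Cc(0) * MS * CS * XS * IS, function(acc, n) return acc + n end) * P(-1)

local function romantonumber(s)
    return roman:match(s:upper())
end

print(romantonumber("MMIX"))    --> 2009
print(romantonumber("mcmxciv")) --> 1994
print(romantonumber("MMIIIX"))  --> nil
```

The only structural change from the Ca version is that the addition moves out of the individual symbols and into the single function handed to Cf.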

And can I ask one more question about lpeg? Suppose I have the string

"{\em This string is \quotation{heavily} emphasized.}"

and want to transform that into something like

"\color[red]{This string is \quotation{heavily} emphasized.}"

How would I go about this using lpeg? I must use a lpeg.V somewhere,  
but I can't figure out where and how.
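
One possible shape for this, as a sketch only: lpeg.V is needed for the nested braces, since a brace group may contain further brace groups. The names `other`, `group`, `em` and `convert` are invented for the example, the `require` line assumes a standalone Lua (in luatex lpeg is a global), and an {\em ...} nested inside another one is not rewritten.

```lua
local lpeg = require("lpeg")  -- assumption: standalone Lua; not needed inside luatex
local P, S, V, Cs = lpeg.P, lpeg.S, lpeg.V, lpeg.Cs

local other = 1 - S("{}")  -- any single character that is not a brace

-- recursive rule: a brace group holds non-brace characters or nested groups
local group = P{
    "group",
    group = "{" * (other + V("group"))^0 * "}",
}

-- the opening "{\em " is rewritten; body and closing brace are kept as they are
local em = (P("{\\em ") / "\\color[red]{") * (other + group)^0 * "}"

-- apply the rewrite wherever it matches, copy every other character
local convert = Cs((em + 1)^0)

local input = "{\\em This string is \\quotation{heavily} emphasized.}"
print(convert:match(input))
```

If the braces after "{\em " are not balanced, the `em` pattern fails as a whole and the text is copied through unchanged, which seems a safer default than a partial rewrite.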

Thanks, and all best

Thomas

* Re: lua questions
  2009-01-29 12:35   ` Thomas A. Schmitz
@ 2009-01-29 12:42     ` Taco Hoekwater
  0 siblings, 0 replies; 5+ messages in thread
From: Taco Hoekwater @ 2009-01-29 12:42 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Thomas A. Schmitz wrote:
> 
> 
> The lpeg that comes with luatex is

lpeg in luatex is still 0.8.1

Best wishes,
Taco