ntg-context - mailing list for ConTeXt users
* removing word in filtered XML
@ 2020-08-19 16:10 Pablo Rodriguez
  2020-08-20  9:08 ` Taco Hoekwater
  2020-08-20  9:27 ` Hans Hagen
  0 siblings, 2 replies; 9+ messages in thread
From: Pablo Rodriguez @ 2020-08-19 16:10 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Dear list,

I have the following sample:

  \startbuffer[demo]
  <html>
    <body>
      <div id="First">
        <p>This is
          <span class="special">One of the best</span> a paragraph.</p>
        <p>This is another paragraph.</p>
        <p>This is another
          <span class="special">Two of the best</span> paragraph.</p>
        <p>This is another
          <span class="special">Three</span> paragraph.</p>
        <p>This is another
          <span class="special">Four of five</span> paragraph.</p>
      </div>
    </body>
  </html>
  \stopbuffer

  \startxmlsetups xml:initialize
    \xmlsetsetup{#1}{html}{xml:gen}
  \stopxmlsetups

  \xmlregistersetup{xml:initialize}

  \startxmlsetups xml:gen
     \xmlfilter{#1}{/**/div/command(xml:special)}
  \stopxmlsetups

  \startxmlsetups xml:special
    %~ \startitem
    \cldcontext{string.gsub(lxml.flush([[#1]]),
       " of the ", "")}\stopitem
  \stopxmlsetups

  \starttext
    \xmlprocessbuffer{main}{demo}{}
  \stoptext

Is there any way to remove " of " and " of the " in the filtered content
(xml:special)?

Sorry, Lua code is crap for sure.

Many thanks for your help,

Pablo
--
http://www.ousia.tk
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________


* Re: removing word in filtered XML
  2020-08-19 16:10 removing word in filtered XML Pablo Rodriguez
@ 2020-08-20  9:08 ` Taco Hoekwater
  2020-08-20 10:23   ` Pablo Rodriguez
  2020-08-20  9:27 ` Hans Hagen
  1 sibling, 1 reply; 9+ messages in thread
From: Taco Hoekwater @ 2020-08-20  9:08 UTC (permalink / raw)
  To: mailing list for ConTeXt users



> On 19 Aug 2020, at 18:10, Pablo Rodriguez <oinos@gmx.es> wrote:
> 
> Is there any way to remove " of " and " of the " in the filtered content
> (xml:special)?

There is pretty much always ‘a way’, but I do not know of a ‘nice’ way.
Your problem is that lxml.flush() and friends do not return a value;
they just do a direct context("xxxx") call behind the scenes, so there
is no return string for you to modify.

Also, the special (catcode, space handling) rules for setups and \cldcontext
do not help you.

That does not mean it can’t be done. As I don’t know of a nice way,
here is a low-level ‘ugly’ way:

\startluacode
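function filter(a)
    -- look up the node for the id passed in from the setup
    local div = lxml.getid(a)
    process(div)
    lxml.flush(div)
end
function process(div)
    -- div.dt holds the children: strings are text, tables are elements
    for c=1,#div.dt do
        if type(div.dt[c]) == 'string' then
            div.dt[c] = string.gsub(div.dt[c], " of the ", "")
        else
            process(div.dt[c])
        end
    end
end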
\stopluacode

 \startxmlsetups xml:special
   \ctxlua{filter([[#1]])}
 \stopxmlsetups


process() is recursive because your xml:special gets the whole <div>. Not sure if you intended it that way.
And if it can be done nicer, I am sure someone will correct me :)
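
The recursion described above can be tried outside ConTeXt with a small
sketch. Everything here is a stand-in: the table `div` only imitates the
shape of a parsed element (children in `dt`, text nodes as strings), the
helper `text` is hypothetical and exists only to inspect the result, and
the replacement is a single space rather than the empty string used in
the thread, so that the neighbouring words stay apart.

```lua
-- Hypothetical stand-in for a parsed element: children live in the `dt`
-- array, text nodes are plain strings, nested elements are tables.
local div = {
    tg = "div",
    dt = {
        { tg = "p", dt = {
            "This is ",
            { tg = "span", dt = { "One of the best" } },
            " a paragraph.",
        } },
        { tg = "p", dt = { "This is another paragraph." } },
    },
}

-- Walk the tree depth-first and rewrite every text node in place.
local function process(e)
    for i = 1, #e.dt do
        local child = e.dt[i]
        if type(child) == "string" then
            -- Parentheses keep only the first result of gsub (the string).
            e.dt[i] = (string.gsub(child, " of the ", " "))
        else
            process(child)
        end
    end
end

-- Concatenate all text nodes, only to inspect the result outside ConTeXt.
local function text(e)
    local t = { }
    for i = 1, #e.dt do
        local child = e.dt[i]
        t[#t + 1] = type(child) == "string" and child or text(child)
    end
    return table.concat(t)
end

process(div)
print(text(div))
```

In a real run the tree comes from lxml.getid() and is flushed with
lxml.flush(), as in the code above; the mock table only makes the
traversal visible in a plain Lua interpreter.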

Best wishes,
Taco

* Re: removing word in filtered XML
  2020-08-19 16:10 removing word in filtered XML Pablo Rodriguez
  2020-08-20  9:08 ` Taco Hoekwater
@ 2020-08-20  9:27 ` Hans Hagen
  2020-08-20 10:38   ` Pablo Rodriguez
  1 sibling, 1 reply; 9+ messages in thread
From: Hans Hagen @ 2020-08-20  9:27 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 8/19/2020 6:10 PM, Pablo Rodriguez wrote:
> Is there any way to remove " of " and " of the " in the filtered content
> (xml:special)?
> 
> Sorry, Lua code is crap for sure.
\startbuffer[demo]
   <html>
     <body>
       <div id="First">
         <p>This is <span class="special">One of the best</span> a paragraph.</p>
         <p>This is another paragraph.</p>
         <p>This is another <span class="special">Two of the best</span> paragraph.</p>
         <p>This is another <span class="special">Three</span> paragraph.</p>
         <p>This is another <span class="special">Four of five</span> paragraph.</p>
       </div>
       <div id="Second">
         <p>This is <span class="special">One of the best</span> a paragraph.</p>
         <p>This is another paragraph.</p>
         <p>This is another <span class="special">Two of the best</span> paragraph.</p>
         <p>This is another <span class="special">Three</span> paragraph.</p>
         <p>This is another <span class="special">Four of five</span> paragraph.</p>
       </div>
     </body>
   </html>
\stopbuffer

\startxmlsetups xml:initialize
     \xmlsetsetup{#1}{html}{xml:gen}
     \xmlsetsetup{#1}{span[@class='special']}{xml:span:special}
\stopxmlsetups

\xmlregistersetup{xml:initialize}

\startxmlsetups xml:gen
     \startitemize
         \xmlfilter{#1}{/**/div/command(xml:special)}
     \stopitemize
\stopxmlsetups

\startxmlsetups xml:special
     \startitem
         <\xmlflush{#1}>
     \stopitem
\stopxmlsetups

\startxmlsetups xml:span:special
     (\cldcontext{(string.gsub([[\xmlraw{#1}{.}]]," of the ", ""))})
\stopxmlsetups

\starttext
     \xmlprocessbuffer{main}{demo}{}
\stoptext

Or make a finalizer as Taco posted.
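
A note on the extra parentheses around string.gsub in xml:span:special:
in Lua, gsub returns two values, the new string and the number of
substitutions, and wrapping the call in parentheses keeps only the
first. Presumably that is why they are there, so that only the string
is handed on. A minimal plain-Lua sketch (the variable names are made up
for illustration, and a space is used as the replacement):

```lua
local s = "One of the best"

-- string.gsub returns the new string and the number of substitutions.
local result, count = string.gsub(s, " of the ", " ")
print(result, count)

-- select("#", ...) counts the values an expression list delivers.
local unwrapped = select("#", string.gsub(s, " of the ", " "))
local wrapped   = select("#", (string.gsub(s, " of the ", " ")))
print(unwrapped, wrapped)
```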

Hans



-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: removing word in filtered XML
  2020-08-20  9:08 ` Taco Hoekwater
@ 2020-08-20 10:23   ` Pablo Rodriguez
  0 siblings, 0 replies; 9+ messages in thread
From: Pablo Rodriguez @ 2020-08-20 10:23 UTC (permalink / raw)
  To: ntg-context

On 8/20/20 11:08 AM, Taco Hoekwater wrote:
> [...]
> There is pretty much always ‘a way’, but I do not know of a ’nice’ way.
> Your problem is that lxml.flush() and friends do not return a value,
> they just do a direct context(‘xxxx’) call behind the scenes with no
> return string for you to modify.

Many thanks for your explanation, Taco.

> Also, the special (catcode, space handling) rules for setups and \cldcontext
> do not help you.
>
> That does not mean it can’t be done. As I don’t know a of a nice way,
> here is a low-level ‘ugly' way:
>
> \startluacode
> function filter(a)
>     local div = lxml.getid(a)
>     process(div)
>     lxml.flush(div)
> end
> function process(div)
>     for c=1,#div.dt do
>         if type(div.dt[c]) == 'string' then
>             div.dt[c] = string.gsub(div.dt[c], " of the ", "")
>         else
>             process(div.dt[c])
>         end
>     end
> end
> \stopluacode
>
>  \startxmlsetups xml:special
>    \ctxlua{filter([[#1]])}
>  \stopxmlsetups
>
> process() is recursive because your xml:special gets the whole <div>.
> Not sure if you intended it that way. And if it can be done nicer, I
> am sure someone will correct me :)

You’re right, my xml:special wasn’t intended to get the whole <div>. I
was tinkering with a previous sample. And I removed an \xmlfilter. Since
I got no output, I didn’t see what I was missing.

Many thanks for your help,

Pablo
--
http://www.ousia.tk

* Re: removing word in filtered XML
  2020-08-20  9:27 ` Hans Hagen
@ 2020-08-20 10:38   ` Pablo Rodriguez
  2020-08-20 11:10     ` Hans Hagen
  0 siblings, 1 reply; 9+ messages in thread
From: Pablo Rodriguez @ 2020-08-20 10:38 UTC (permalink / raw)
  To: ntg-context

On 8/20/20 11:27 AM, Hans Hagen wrote:
> On 8/19/2020 6:10 PM, Pablo Rodriguez wrote:
>> [...]
>> Is there any way to remove " of " and " of the " in the filtered content
>> (xml:special)?
>>
>> Sorry, Lua code is crap for sure.
> [...]
> \startxmlsetups xml:initialize
>      \xmlsetsetup{#1}{html}{xml:gen}
>      \xmlsetsetup{#1}{span[@class='special']}{xml:span:special}
> \stopxmlsetups
> [...]
> \startxmlsetups xml:span:special
>      (\cldcontext{(string.gsub([[\xmlraw{#1}{.}]]," of the ", ""))})
> \stopxmlsetups

Many thanks for your reply, Hans.

I now see that \xmlraw is the way to go.

I have two questions about word replacement in Lua (maybe there is some
lpeg magic that could be used).

This time, I have to remove two words, such as in:

  string.gsub([[\xmlraw{#1}{.}]]," del ", " "):gsub(" de la ", " ")}

But they could be more (and replacements might be added to that list).

Is there a more elegant way than appending :gsub()?

Is there also a proper way for word scanning?

A "word" can be "Word ", " word ", " word.", " word?" (and so on). I would
like to avoid having to code all combinations (of course, if this were
already available).
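
One way to avoid both the chained :gsub() calls and the hand-coded
combinations, sketched in plain Lua under some assumptions: the names
`remove`, `escape` and `strip` are hypothetical, the list of phrases is
only an example, and the frontier pattern %f works on bytes, so it sees
ASCII letters and digits as word characters but not accented UTF-8
letters. Matching is case-sensitive.

```lua
-- Hypothetical list of phrases to drop; longer phrases come first so that
-- "de la" is handled before the shorter "de" could split it.
local remove = { "de la", "del", "de" }

-- Escape pattern magic characters so each phrase is matched literally.
local function escape(s)
    return (s:gsub("%p", "%%%0"))
end

local function strip(str)
    for _, phrase in ipairs(remove) do
        -- %f[%w] and %f[%W] are frontier patterns: they match the empty
        -- string at the start and at the end of a run of alphanumerics.
        local pattern = "%f[%w]" .. escape(phrase) .. "%f[%W]"
        str = str:gsub(pattern, "")
    end
    -- Collapse the doubled spaces left behind and trim both ends.
    str = str:gsub("%s+", " "):gsub("^ ", ""):gsub(" $", "")
    return str
end

print(strip("la casa del padre"))
print(strip("casa de la playa."))
print(strip("delfin del mar?"))
```

Because the frontier tests sit on the word edges, punctuation after the
word and the start or end of the string need no special cases, and
"delfin" is left alone even though it begins with "del".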

Many thanks for your help,

Pablo
--
http://www.ousia.tk

* Re: removing word in filtered XML
  2020-08-20 10:38   ` Pablo Rodriguez
@ 2020-08-20 11:10     ` Hans Hagen
  2020-08-20 14:20       ` Pablo Rodriguez
  0 siblings, 1 reply; 9+ messages in thread
From: Hans Hagen @ 2020-08-20 11:10 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Pablo Rodriguez

On 8/20/2020 12:38 PM, Pablo Rodriguez wrote:
> On 8/20/20 11:27 AM, Hans Hagen wrote:
>> On 8/19/2020 6:10 PM, Pablo Rodriguez wrote:
>>> [...]
>>> Is there any way to remove " of " and " of the " in the filtered content
>>> (xml:special)?
>>>
>>> Sorry, Lua code is crap for sure.
>> [...]
>> \startxmlsetups xml:initialize
>>       \xmlsetsetup{#1}{html}{xml:gen}
>>       \xmlsetsetup{#1}{span[@class='special']}{xml:span:special}
>> \stopxmlsetups
>> [...]
>> \startxmlsetups xml:span:special
>>       (\cldcontext{(string.gsub([[\xmlraw{#1}{.}]]," of the ", ""))})
>> \stopxmlsetups
> 
> Many thanks for your reply, Hans.
> 
> I now see that \xmlraw is the way to go.
> 
> I have two questions in word replacement and Lua (maybe there is some
> lpeg magic that could be used).
> 
> This time, I have to remove two words, such as in:
> 
>    string.gsub([[\xmlraw{#1}{.}]]," del ", " "):gsub(" de la ", " ")}
> 
> But they could be more (and replacements might be added to that list).
> 
> Is there a more elegant way than appending :gsub()?
> 
> Is there also a proper way for word scanning?
> 
> A "word" can be "Word ", " word " " word." " word?" (and so on). I would
> like to avoid having to code all combinations (of course, if this were
> already available).
old stuff present for a long time ... probably documented somewhere ...
if not then you have to wikify it ...

\starttext

\replaceword[whatever][this][that]
\replaceword[whatever][that][this]

\startlines
it is this or that
{\setreplacements[whatever]it is this or that}
it is this or that
\stoplines

\stoptext



-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: removing word in filtered XML
  2020-08-20 11:10     ` Hans Hagen
@ 2020-08-20 14:20       ` Pablo Rodriguez
  2020-08-21 12:59         ` Hans Hagen
  0 siblings, 1 reply; 9+ messages in thread
From: Pablo Rodriguez @ 2020-08-20 14:20 UTC (permalink / raw)
  To: ntg-context

On 8/20/20 1:10 PM, Hans Hagen wrote:
> On 8/20/2020 12:38 PM, Pablo Rodriguez wrote:
>> [...]
>> This time, I have to remove two words, such as in:
>>
>>    string.gsub([[\xmlraw{#1}{.}]]," del ", " "):gsub(" de la ", " ")}
>>
>> But they could be more (and replacements might be added to that list).
>>
>> Is there a more elegant way than appending :gsub()?
>>
>> Is there also a proper way for word scanning?
>>
>> A "word" can be "Word ", " word " " word." " word?" (and so on). I would
>> like to avoid having to code all combinations (of course, if this were
>> already available).
>
> old stuff present for a long time ... probaly documented somewhere ...
> if not than you have to wikify it ...

Many thanks for your reply, Hans.

It is already wikified
(https://wiki.contextgarden.net/Ligatures#Replacements).

I wonder whether \replaceword could be extended to replace multiple
words and also to remove them.

  \starttext

  \replaceword[whatever][this or][no]
  \replaceword[whatever][that][]

  \startlines
  it is this or that
  {\setreplacements[whatever]it is this or that}
  {\setreplacements[whatever]it is this or that}
  it is this or that
  \stoplines

  \stoptext

Many thanks for your help,

Pablo
--
http://www.ousia.tk

* Re: removing word in filtered XML
  2020-08-20 14:20       ` Pablo Rodriguez
@ 2020-08-21 12:59         ` Hans Hagen
  2020-08-21 13:34           ` Pablo Rodriguez
  0 siblings, 1 reply; 9+ messages in thread
From: Hans Hagen @ 2020-08-21 12:59 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 8/20/2020 4:20 PM, Pablo Rodriguez wrote:

>    \replaceword[whatever][this or][no]
>    \replaceword[whatever][that][]
this feature creep is in the next upload

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: removing word in filtered XML
  2020-08-21 12:59         ` Hans Hagen
@ 2020-08-21 13:34           ` Pablo Rodriguez
  0 siblings, 0 replies; 9+ messages in thread
From: Pablo Rodriguez @ 2020-08-21 13:34 UTC (permalink / raw)
  To: ntg-context

On 8/21/20 2:59 PM, Hans Hagen wrote:
> On 8/20/2020 4:20 PM, Pablo Rodriguez wrote:
>
>>    \replaceword[whatever][this or][no]
>>    \replaceword[whatever][that][]
> this feature creep is in the next upload

Hans,

many thanks for the new feature.

Pablo
--
http://www.ousia.tk
