public inbox archive for pandoc-discuss@googlegroups.com
* Converting everything that’s inside a specific div (including other div) while excluding everything else
@ 2020-09-29 20:45 Butch
       [not found] ` <ee79a1ca-efb1-463c-ace9-5398c8e623e3n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Butch @ 2020-09-29 20:45 UTC (permalink / raw)
  To: pandoc-discuss



Hello,

I am trying to convert specific parts of an HTML file to Markdown. I want
to convert everything that’s inside a specific div (including nested divs)
while excluding everything else. Is that possible?

Here is an example. I want to take this:

<div class="show">
    <p>This is the outer text.</p>
    <div class="inner">
        <p>This is the inner text.</p>
    </div>
</div>
<div class="hide">
    <p>This is the hidden text.</p>
</div>

And convert it so I have this:

::: {.show}
This is the outer text.

::: {.inner}
This is the inner text.
:::
:::

I.e., I want to convert everything that’s inside <div class="show">
(including nested divs) and to exclude everything else in the document.

If I use a filter like this:

function Div(el)
    if el.classes[1] == "show" then
        return el
    else
        return {}
    end
end

The resulting Markdown will be:

::: {.show}
This is the outer text.
:::

Which is kind of expected. So what can I do to include in the conversion
not only <div class="show">, but also all the other divs inside it?

The actual HTML files I want to convert are very large, so I can’t list all
the classes I want to include in (or exclude from) the conversion.

Thanks in advance.


-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/ee79a1ca-efb1-463c-ace9-5398c8e623e3n%40googlegroups.com.


* Re: Converting everything that’s inside a specific div (including other div) while excluding everything else
       [not found] ` <ee79a1ca-efb1-463c-ace9-5398c8e623e3n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-09-29 22:41   ` John MacFarlane
       [not found]     ` <m2zh58xl1w.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: John MacFarlane @ 2020-09-29 22:41 UTC (permalink / raw)
  To: Butch, pandoc-discuss


This is the sort of thing that is currently a bit tricky
with our filter architecture.

One idea might be to do several passes (i.e., several filters, which
you can include in the same Lua file; see the docs).

In the first pass, you'd set a special attribute keep="false" on
all Divs.

In the second pass, you'd match Divs with the 'show' class,
and use walk_block to set keep="true" on all Divs inside it.
You'd also set keep="true" on it.

In the third pass, you'd match Divs and remove them if
keep="false".

I think something like this could work.
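
The three passes described above could be sketched roughly as follows. This is an untested illustration, not a filter taken from the thread: the attribute name keep comes from the message, while the names mark_all, mark_shown and drop_unmarked are made up here. It assumes that pandoc.walk_block and the List method includes are available in the pandoc version in use, and that assigning nil to an attribute removes it.

```lua
-- Pass 1: mark every Div as not kept.
local mark_all = {
  Div = function (el)
    el.attributes['keep'] = 'false'
    return el
  end
}

-- Pass 2: mark each 'show' Div and every Div nested inside it.
local mark_shown = {
  Div = function (el)
    if el.classes:includes('show') then
      -- walk_block visits the contents of el, not el itself.
      el = pandoc.walk_block(el, {
        Div = function (inner)
          inner.attributes['keep'] = 'true'
          return inner
        end
      })
      el.attributes['keep'] = 'true'
      return el
    end
  end
}

-- Pass 3: drop unmarked Divs and strip the helper attribute from the rest.
local drop_unmarked = {
  Div = function (el)
    if el.attributes['keep'] == 'false' then
      return {}
    end
    el.attributes['keep'] = nil  -- assumption: nil removes the attribute
    return el
  end
}

-- Filters in the returned list run one after the other.
return { mark_all, mark_shown, drop_unmarked }
```

Two caveats with this sketch: a 'show' Div that sits inside a Div without that class would still be dropped together with its parent in the third pass, and blocks outside any Div are left untouched.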



* Re: Converting everything that’s inside a specific div (including other div) while excluding everything else
       [not found]     ` <m2zh58xl1w.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2020-09-30  5:11       ` Butch
       [not found]         ` <d6b951e8-141e-4497-85cb-f5ecc8b992a4n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Butch @ 2020-09-30  5:11 UTC (permalink / raw)
  To: pandoc-discuss




Thanks for the response. Yeah, I’ll either do what you suggested or use 
some external filters before passing the files through Pandoc.




* Re: Converting everything that’s inside a specific div (including other div) while excluding everything else
       [not found]         ` <d6b951e8-141e-4497-85cb-f5ecc8b992a4n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-09-30  8:49           ` Albert Krewinkel
       [not found]             ` <87lfgrk5sb.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Albert Krewinkel @ 2020-09-30  8:49 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Maybe there's another way:

1. Collect all divs in the order that pandoc sees them.
2. In a second traversal, check whether we want to keep the div. If
   so, use the div that we stored before, as it will still contain
   all of its children.

Here's the code:

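    -- Divs recorded during the first pass, in traversal order.
    local divs = pandoc.List()
    -- Position of the current div during the second pass.
    local div_index = 0

    -- First pass: store every div unchanged, children included.
    local function collect (d)
      divs:insert(d)
    end

    -- Second pass: divs are visited in the same order as before, so
    -- divs[div_index] is the untouched copy of the current div.
    local function filter (div)
      div_index = div_index + 1
      if div.classes[1] == 'show' then
        return divs[div_index]
      else
        return {}
      end
    end

    -- Two filters in one file; pandoc runs them one after the other.
    return {
      {Div = collect},
      {Div = filter}
    }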
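
The two-pass filter above relies on pandoc visiting divs in the same order in both passes, with nested divs handled before the div that contains them. A small diagnostic filter, offered here only as a sketch, can make that order visible by writing each div's classes to stderr as it is visited. It assumes that div.classes behaves like an ordinary Lua list of strings, so that table.concat can join it.

```lua
-- Write the classes of each Div to stderr in the order pandoc visits them.
-- Returning nothing leaves the document unchanged.
function Div (div)
  io.stderr:write(table.concat(div.classes, ' '), '\n')
end
```

Running it over the example from the first message (for instance with pandoc -f html -t markdown and the --lua-filter option) should list the inner div before the show div that contains it, which is also why the single-pass filter from the original question loses the inner div.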



--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124


* Re: Converting everything that’s inside a specific div (including other div) while excluding everything else
       [not found]             ` <87lfgrk5sb.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2020-09-30 19:26               ` John MacFarlane
       [not found]                 ` <m2eemjxe0f.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: John MacFarlane @ 2020-09-30 19:26 UTC (permalink / raw)
  To: Albert Krewinkel, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Good idea -- I agree, that's probably a better approach!



* Re: Converting everything that’s inside a specific div (including other div) while excluding everything else
       [not found]                 ` <m2eemjxe0f.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2020-10-01  5:24                   ` Butch
       [not found]                     ` <d8bdce6a-7632-4107-a700-0b228c9c3f74n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Butch @ 2020-10-01  5:24 UTC (permalink / raw)
  To: pandoc-discuss




Thanks for your help, Albert.

This filter seems to work only if <div class="show"> is not inside another 
block. Would it be possible to change it to work with all the divs in the 
document? In my example, how could I keep only <div class="inner">?

In addition to that, I'd need to exclude all other elements outside the 
divs I want to keep, even if they are not inside a div. For example, a 
<p>Paragraph</p> that is not inside any div should be excluded. Would that 
be feasible?




* Re: Converting everything that’s inside a specific div (including other div) while excluding everything else
       [not found]                     ` <d8bdce6a-7632-4107-a700-0b228c9c3f74n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-10-02 17:43                       ` Albert Krewinkel
       [not found]                         ` <87d020jzgd.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Albert Krewinkel @ 2020-10-02 17:43 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Butch writes:

> Thanks for your help, Albert.
>
> This filter seems to work only if <div class="show"> is not inside another
> block. Would it be possible to change it to work with all the divs in the
> document? In my example, how could I keep only <div class="inner">?

Taken by itself, that task is really difficult to achieve, but...

> In addition to that, I'd need to exclude all other elements outside the
> divs I want to keep, even if they are not inside a div. For example, a
> <p>Paragraph</p> that is not inside any div should be excluded. Would that
> be feasible?

with this additional requirement it becomes easy again: we can collect
all Div which we'd like to keep, then we replace the document with the
list of collected divs:

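    -- All "show" divs found during traversal, children included.
    local keep = pandoc.List()

    function Div (div)
      if div.classes[1] == 'show' then
        -- Record the div; returning nothing leaves it unchanged.
        keep:insert(div)
      end
    end

    -- Runs after all Div calls: replace the body with the kept divs.
    function Pandoc (doc)
      doc.blocks = keep
      return doc
    end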

The limitation here is that we assume that "show" divs will not be
nested; nested "show" divs would be included twice.
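
One possible way around the nesting limitation, sketched here as an assumption rather than something proposed in the thread, is top-down traversal. This needs a pandoc release that supports the traverse = 'topdown' filter setting (2.17 or later, which is newer than the versions current when this thread was written). Returning false as a second value stops pandoc from descending into the returned element, so a "show" div nested inside a collected one would not be visited again. The names collect and replace are made up for the example.

```lua
-- "show" divs found during traversal, outermost ones only.
local keep = pandoc.List()

-- Filter 1: walk the document top-down and collect "show" divs.
local collect = {
  traverse = 'topdown',
  Div = function (div)
    if div.classes:includes('show') then
      keep:insert(div)
      -- The second return value `false` skips the children of this div,
      -- so nested "show" divs are not collected a second time.
      return div, false
    end
  end
}

-- Filter 2: replace the document body with the collected divs.
local replace = {
  Pandoc = function (doc)
    doc.blocks = keep
    return doc
  end
}

return { collect, replace }
```

The two steps are kept in separate filters so that the replacement only runs after the whole document has been walked.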

Cheers

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124



* Re: Converting everything that’s inside a specific div (including other div) while excluding everything else
       [not found]                         ` <87d020jzgd.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2020-10-03  5:29                           ` Butch
  0 siblings, 0 replies; 8+ messages in thread
From: Butch @ 2020-10-03  5:29 UTC (permalink / raw)
  To: pandoc-discuss




Thanks a lot, Albert! That worked great for my purposes.



