Thanks for your help, Albert. This filter seems to work only if <div class="show">
is not inside another block. Would it be possible to change it to work with all the divs in the document? In my example, how could I keep only
<div class="show">? In addition to that, I'd need to exclude all other elements outside the divs I want to keep, even if they are not inside a div. For example, a
<p>Paragraph</p>
that is not inside any div should be excluded. Would that be feasible?

On Wednesday, 30 September 2020 at 16:26:27 UTC-3 John MacFarlane wrote:

>
> Good idea -- I agree, that's probably a better approach!
>
> Albert Krewinkel writes:
>
> > Maybe there's another way:
> >
> > 1. Collect all divs in the order that pandoc sees them.
> > 2. In a second traversal, check whether we want to keep the div. If
> >    so, use the div that we stored before, as it will still contain
> >    all children.
> >
> > Here's the code:
> >
> > local divs = pandoc.List()
> > local div_index = 0
> >
> > function collect (d)
> >   divs:insert(d)
> > end
> >
> > function filter (div)
> >   div_index = div_index + 1
> >   if div.classes[1] == 'show' then
> >     return divs[div_index]
> >   else
> >     return {}
> >   end
> > end
> >
> > return {
> >   {Div = collect},
> >   {Div = filter}
> > }
> >
> > Butch writes:
> >
> >> Thanks for the response. Yeah, I’ll either do what you suggested or use
> >> some external filters before passing the files through Pandoc.
> >>
> >>
> >> On Tuesday, 29 September 2020 at 19:42:05 UTC-3 John MacFarlane wrote:
> >>
> >>>
> >>> This is the sort of thing that is currently a bit tricky
> >>> with our filter architecture.
> >>>
> >>> One idea might be to do several passes (i.e., several filters, which
> >>> you can include in the same lua file; see the docs).
> >>>
> >>> In the first pass, you'd set a special attribute keep="false" on
> >>> all Divs.
> >>>
> >>> In the second pass, you'd match Divs with the 'show' class,
> >>> and use walk_block to set keep="true" on all Divs inside it.
> >>> You'd also set keep="true" on it.
> >>>
> >>> In the third pass, you'd match Divs and remove them if
> >>> keep="false".
> >>>
> >>> I think something like this could work.
> >>>
> >>> Butch writes:
> >>>
> >>> > Hello,
> >>> >
> >>> > I am trying to convert specific parts of an HTML file to Markdown.
> >>> > I want to convert everything that’s inside a specific div (including other
> >>> > div) while excluding everything else. Is that possible?
> >>> >
> >>> > Here is an example. I want to take this:
> >>> >
> >>> > <div class="show">
> >>> >   <p>This is the outer text.</p>
> >>> >   <div class="inner">
> >>> >     <p>This is the inner text.</p>
> >>> >   </div>
> >>> > </div>
> >>> > <div class="…">
> >>> >   <p>This is the hidden text.</p>
> >>> > </div>
> >>> >
> >>> > And convert it so I have this:
> >>> >
> >>> > ::: {.show}
> >>> > This is the outer text.
> >>> >
> >>> > ::: {.inner}
> >>> > This is the inner text.
> >>> > :::
> >>> > :::
> >>> >
> >>> > I.e., I want to convert everything that’s inside <div class="show">
> >>> > (including other div) and to exclude everything else in the document.
> >>> >
> >>> > If I use a filter like this:
> >>> >
> >>> > function Div(el)
> >>> >   if el.classes[1] == "show" then
> >>> >     return el
> >>> >   else
> >>> >     return {}
> >>> >   end
> >>> > end
> >>> >
> >>> > The resulting Markdown will be:
> >>> >
> >>> > ::: {.show}
> >>> > This is the outer text.
> >>> > :::
> >>> >
> >>> > Which is kind of expected. So what can I do to include in the conversion
> >>> > not only <div class="show">, but also all the other div inside it?
> >>> >
> >>> > The actual HTML files I want to convert are very large, so I can’t list all
> >>> > the classes I want to include (or exclude from) in the conversion.
> >>> >
> >>> > Thanks in advance.
> >>> >
> >>> >
> >>> > --
> >>> > You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> >>> > To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> >>> > To view this discussion on the web visit
> >>> > https://groups.google.com/d/msgid/pandoc-discuss/ee79a1ca-efb1-463c-ace9-5398c8e623e3n%40googlegroups.com.
> >>> >
> >
> > --
> > Albert Krewinkel
> > GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
> >
> > --
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/pandoc-discuss/87lfgrk5sb.fsf%40zeitkraut.de.

--
To view this discussion on the web visit
https://groups.google.com/d/msgid/pandoc-discuss/d8bdce6a-7632-4107-a700-0b228c9c3f74n%40googlegroups.com.
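
For the question at the top of this message (keep every div with class "show" wherever it sits, and drop everything else, including paragraphs outside any div), one possible approach is a single document-level filter instead of per-Div passes. What follows is only a rough sketch, not something taken from the replies in this thread. The helper names has_class and collect_shown are made up for illustration, and the sketch assumes that "show" divs are only ever nested inside other divs (it does not look inside lists, block quotes, or tables).

```lua
-- Sketch: keep every Div that carries the class 'show', wherever it is
-- nested, and drop all other content (including top-level paragraphs).
-- Assumption: 'show' Divs are only ever nested inside other Divs.

-- True if the element has the given class. A plain loop is used so the
-- helper works on pandoc's class list as well as on an ordinary Lua table.
local function has_class(el, name)
  for _, class in ipairs(el.classes or {}) do
    if class == name then
      return true
    end
  end
  return false
end

-- Append to `kept` every outermost 'show' Div found in `blocks`.
-- A matching Div is stored whole, so its children (inner Divs included)
-- come along; non-matching Divs are only searched, never stored.
local function collect_shown(blocks, kept)
  for _, block in ipairs(blocks) do
    if block.t == 'Div' then
      if has_class(block, 'show') then
        table.insert(kept, block)
      else
        collect_shown(block.content, kept)
      end
    end
  end
  return kept
end

-- Replace the document body with the collected Divs, keeping the metadata.
function Pandoc(doc)
  return pandoc.Pandoc(collect_shown(doc.blocks, {}), doc.meta)
end
```

Saved under a name such as keep-shown.lua (a hypothetical file name), this would be passed to pandoc with --lua-filter=keep-shown.lua. Because a matching div is kept as a whole, a "show" div nested inside another "show" div appears once, as part of its parent, rather than twice.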