public inbox archive for pandoc-discuss@googlegroups.com
* Converting everything that’s inside a specific div (including other div) while excluding everything else
@ 2020-09-29 20:45 Butch
From: Butch @ 2020-09-29 20:45 UTC
  To: pandoc-discuss



Hello,

I am trying to convert specific parts of an HTML file to Markdown. I want
to convert everything that’s inside a specific div (including other divs)
while excluding everything else. Is that possible?

Here is an example. I want to take this:

<div class="show">
    <p>This is the outer text.</p>
    <div class="inner">
        <p>This is the inner text.</p>
    </div>
</div>
<div class="hide">
    <p>This is the hidden text.</p>
</div>

And convert it so I have this:

::: {.show}
This is the outer text.

::: {.inner}
This is the inner text.
:::
:::

I.e., I want to convert everything that’s inside <div class="show">
(including other divs) and to exclude everything else in the document.

If I use a filter like this:

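function Div(el)
    -- Keep a div only if its first class is "show".
    if el.classes[1] == "show" then
        return el
    else
        -- Every other div is deleted, including .inner,
        -- even though it sits inside .show.
        return {}
    end
end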

The resulting Markdown will be:

::: {.show}
This is the outer text.
:::

Which is kind of expected. So what can I do to include in the conversion
not only <div class="show">, but also all the divs nested inside it?

The actual HTML files I want to convert are very large, so I can’t list all
the classes I want to include in (or exclude from) the conversion.
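One possible approach, offered as a sketch and not as the answer given later in this thread: instead of deciding Div by Div, use a `Pandoc` function, which receives the whole document, and rebuild the block list from the `show` Divs alone. Because a kept Div is returned whole, its nested Divs (such as `.inner`) survive, and no other class names need to be listed. This assumes a pandoc version whose Lua API exposes `classes` as a `pandoc.List` (so `:includes` is available) and that the `show` Div is either top-level or nested only inside other Divs. The file name `keep-show.lua` is hypothetical.

```lua
-- keep-show.lua (hypothetical name): keep only Divs with class "show",
-- together with everything nested inside them; drop all other blocks.

-- Walk a list of blocks. A Div carrying the "show" class is kept whole,
-- so nested Divs such as .inner come along untouched. Other Divs are
-- searched recursively, in case the target is wrapped in container Divs.
-- Non-Div blocks found outside a .show Div are simply not collected.
local function collect(blocks, keep)
  for _, block in ipairs(blocks) do
    if block.t == "Div" then
      if block.classes:includes("show") then
        keep[#keep + 1] = block
      else
        collect(block.content, keep)
      end
    end
  end
  return keep
end

-- Pandoc() is called once with the whole document, so the block list can
-- be replaced wholesale instead of filtering element by element.
function Pandoc(doc)
  return pandoc.Pandoc(collect(doc.blocks, {}), doc.meta)
end
```

Usage, with placeholder file names: `pandoc -f html -t markdown --lua-filter=keep-show.lua input.html`. Anything outside a `show` Div, including top-level paragraphs and the `hide` Div, is dropped. Blocks other than Divs (for example a block quote wrapping a `show` Div) are not searched, so the recursion would need extending if the real HTML nests the target that way.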

Thanks in advance.


Thread overview: 8+ messages
2020-09-29 20:45 Converting everything that’s inside a specific div (including other div) while excluding everything else Butch
2020-09-29 22:41   ` John MacFarlane
2020-09-30  5:11       ` Butch
2020-09-30  8:49           ` Albert Krewinkel
2020-09-30 19:26               ` John MacFarlane
2020-10-01  5:24                   ` Butch
2020-10-02 17:43                       ` Albert Krewinkel
2020-10-03  5:29                           ` Butch
