* Is there a way to change the way Pandoc parses HTML inside of markdown documents?
@ 2021-08-16 21:43 pompez
[not found] ` <aae29ca7-60ca-4349-af03-939f0ac503efn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: pompez @ 2021-08-16 21:43 UTC (permalink / raw)
To: pandoc-discuss
I'm just starting out with Lua filters, so apologies if this has already
been answered. The question is also posted on StackOverflow:
<https://stackoverflow.com/questions/68809527/is-there-a-way-to-change-the-way-pandoc-parses-html-inside-of-markdown-documents>
I'm using Pandoc to convert markdown to HTML. My markdown files also
contain some raw HTML. In the examples, I'll be using `<mark>` and `<u>`.
Let's say I want to change every `<mark>` to a `<u>` tag. We parse the
input as HTML and look at the AST.
```
$ echo '<u>foo</u> & <mark>bar</mark>' | pandoc --from=html --to native
[Plain [Underline [Str "foo"],Space,Str "&",Space,Span ("", ["mark"],[])
[Str "bar"]]]
```
On this structure, we can use a simple filter that replaces the `Span`
elements representing `<mark>` tags with `Underline` elements.
```
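function Span(elem)
  -- gmatch() returns an iterator, which is always truthy, and classes[1]
  -- is nil for a Span without classes; test class membership instead.
  if elem.classes:includes('mark') then
    return pandoc.Underline(elem.content)
  end
end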
```
```
[Plain [Underline [Str "foo"],Space,Str "&",Space,Underline [Str "bar"]]]
```
This is good. But if we parse the same input as markdown, we get a much
less convenient structure.
```
$ echo '<u>foo</u> & <mark>bar</mark>' | pandoc --from=markdown+raw_html --to native
[Para [RawInline (Format "html") "<u>",Str "foo",RawInline (Format "html")
"</u>",Space,Str "&",Space,RawInline (Format "html") "<mark>",Str
"bar",RawInline (Format "html") "</mark>"]]
```
But if we had additional criteria for deciding when to replace `<mark>` with
`<u>` (the content, for example), we would have to identify the matching
opening and closing `RawInline` elements ourselves.
Are there any good solutions to this problem? Is there a way to have HTML
inside Markdown parsed the same way it is when the input format is HTML? Or
is there a way to solve this in a Lua filter without writing parsing code?
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/aae29ca7-60ca-4349-af03-939f0ac503efn%40googlegroups.com.
* Re: Is there a way to change the way Pandoc parses HTML inside of markdown documents?
[not found] ` <aae29ca7-60ca-4349-af03-939f0ac503efn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2021-08-16 22:08 ` John MacFarlane
[not found] ` <yh480k1r6tt53d.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: John MacFarlane @ 2021-08-16 22:08 UTC (permalink / raw)
To: pompez, pandoc-discuss
I'm afraid you'll have to write some parsing code...
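As a rough illustration of what such parsing code could look like, here is a minimal Lua filter sketch that pairs the opening and closing `RawInline` elements from the Markdown AST shown in the question and wraps whatever lies between them in `Underline`. The helper names `is_raw_html` and `wrap_marks` are hypothetical, and the sketch assumes that `<mark>` tags carry no attributes and are not nested. It relies on the `Inlines` filter function, so it needs a pandoc version that provides it.

```lua
-- Hypothetical helper: true if `el` is a raw HTML inline with exactly this text.
local function is_raw_html(el, text)
  return el.t == 'RawInline' and el.format == 'html' and el.text == text
end

-- Wrap everything between a raw <mark> and the next raw </mark> using
-- `make_underline`. Assumes attribute-free, non-nested tags.
local function wrap_marks(inlines, make_underline)
  local result = {}
  local opening, buffer = nil, nil   -- both set while inside an open <mark>
  for _, el in ipairs(inlines) do
    if not buffer and is_raw_html(el, '<mark>') then
      opening, buffer = el, {}
    elseif buffer and is_raw_html(el, '</mark>') then
      table.insert(result, make_underline(buffer))
      opening, buffer = nil, nil
    elseif buffer then
      table.insert(buffer, el)
    else
      table.insert(result, el)
    end
  end
  if buffer then
    -- No closing tag in this list: put the original elements back untouched.
    table.insert(result, opening)
    for _, el in ipairs(buffer) do table.insert(result, el) end
  end
  return result
end

-- Filter entry point: pandoc calls this for every list of inlines.
function Inlines(inlines)
  return wrap_marks(inlines, pandoc.Underline)
end
```

Saved under a name such as `mark.lua` (a made-up file name), the filter could be applied with `pandoc --from=markdown+raw_html --lua-filter=mark.lua --to native`. The constructor is passed in as a parameter only so that the pairing logic can be exercised outside pandoc.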
* Re: Is there a way to change the way Pandoc parses HTML inside of markdown documents?
[not found] ` <yh480k1r6tt53d.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2021-08-16 22:55 ` pompez
2021-08-17 10:37 ` William Lupton
1 sibling, 0 replies; 6+ messages in thread
From: pompez @ 2021-08-16 22:55 UTC (permalink / raw)
To: pandoc-discuss
That's okay. Just wanted to know beforehand. Thanks.
On Tuesday, August 17, 2021 at 12:09:15 AM UTC+2 John MacFarlane wrote:
>
> I'm afraid you'll have to write some parsing code...
* Re: Is there a way to change the way Pandoc parses HTML inside of markdown documents?
[not found] ` <yh480k1r6tt53d.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
2021-08-16 22:55 ` pompez
@ 2021-08-17 10:37 ` William Lupton
[not found] ` <CAEe_xxj-kp22oToH4o5J54s16W4WzMkiaEicOy+TuqDZf5LP3g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
1 sibling, 1 reply; 6+ messages in thread
From: William Lupton @ 2021-08-17 10:37 UTC (permalink / raw)
To: pandoc-discuss; +Cc: pompez
Could pandoc.read(markup, "html")
<https://pandoc.org/lua-filters.html#pandoc.read> help?
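For reference, a minimal sketch of how `pandoc.read` might be used inside a filter: it re-parses the text of raw HTML blocks with the HTML reader and splices the resulting blocks into the document. The helper name `reparse_html` and its injectable `read` parameter are illustrative assumptions, not part of pandoc. This only helps when a raw element holds a complete HTML fragment; in the Markdown AST from the original question, `<mark>` and `</mark>` arrive as separate `RawInline` elements, so the pairing problem remains.

```lua
-- Hypothetical helper: re-parse a raw HTML element with the given reader.
-- `read` is a parameter only so the helper can be tried outside pandoc.
local function reparse_html(el, read)
  if el.format ~= 'html' then
    return nil                       -- nil leaves the element unchanged
  end
  -- pandoc.read returns a Pandoc document; its blocks replace the raw block.
  return read(el.text, 'html').blocks
end

-- Filter entry point for raw blocks.
function RawBlock(el)
  return reparse_html(el, pandoc.read)
end
```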
On Mon, 16 Aug 2021 at 23:09, John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:
>
> I'm afraid you'll have to write some parsing code...
* Re: Is there a way to change the way Pandoc parses HTML inside of markdown documents?
[not found] ` <CAEe_xxj-kp22oToH4o5J54s16W4WzMkiaEicOy+TuqDZf5LP3g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2021-08-17 11:24 ` Bastien DUMONT
2021-08-24 8:44 ` pompez
1 sibling, 0 replies; 6+ messages in thread
From: Bastien DUMONT @ 2021-08-17 11:24 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
> On this structure, we can use a simple filter which replaces `Span`
> elements representing the `<mark>` tag and replaces with `Underline`
> elements.
>
> ```
> function Span(elem)
>   if elem.classes:includes('mark') then
>     return pandoc.Underline(elem.content)
>   end
> end
> ```
To apply the same code to a Markdown input file, you can use inline spans like this:
`[foo]{.underline} & [bar]{.mark}`.
* Re: Is there a way to change the way Pandoc parses HTML inside of markdown documents?
[not found] ` <CAEe_xxj-kp22oToH4o5J54s16W4WzMkiaEicOy+TuqDZf5LP3g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2021-08-17 11:24 ` Bastien DUMONT
@ 2021-08-24 8:44 ` pompez
1 sibling, 0 replies; 6+ messages in thread
From: pompez @ 2021-08-24 8:44 UTC (permalink / raw)
To: pandoc-discuss
Sorry for the late reply. In my case, I'd still like to recognize the
contents inside the block.
On Tuesday, August 17, 2021 at 12:37:37 PM UTC+2 William Lupton wrote:
> Could pandoc.read(markup, "html")
> <https://pandoc.org/lua-filters.html#pandoc.read> help?