public inbox archive for pandoc-discuss@googlegroups.com
* Roundtripping md->html->md: footnotes
@ 2020-02-27 20:23 Denis Maier
       [not found] ` <3f582e2a-87f2-4981-8598-44e515308471-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Denis Maier @ 2020-02-27 20:23 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1890 bytes --]

Hey,
I have just tried converting a very simple markdown file to html and back.

```
Text^[footnote]
More Text^[Another footnote]
```

`pandoc test.md -o test.html` gives me:

```
<p>Text<a href="#fn1" class="footnote-ref" id="fnref1" 
role="doc-noteref"><sup>1</sup></a> More Text<a href="#fn2" 
class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a></p>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>footnote<a href="#fnref1" 
class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2" role="doc-endnote"><p>Another footnote<a href="#fnref2" 
class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>
```

`pandoc test.html -o test2.md` produces:

```
Text[^1^](#fn1){#fnref1 .footnote-ref} More Text[^2^](#fn2){#fnref2
.footnote-ref}

::: {.section .footnotes role="doc-endnotes"}

------------------------------------------------------------------------

1.  ::: {#fn1}
    footnote[↩︎](#fnref1){.footnote-back}
    :::

2.  ::: {#fn2}
    Another footnote[↩︎](#fnref2){.footnote-back}
    :::
:::
```

Is there a way to produce correct markdown from html with footnotes?

Background: Roundtripping is not the main issue here; I was wondering whether I
could use pandoc in a LaTeX->docx chain: first converting LaTeX to HTML
with tex4ht or lwarp, and then using pandoc to convert the HTML to docx or
some other format.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/3f582e2a-87f2-4981-8598-44e515308471%40googlegroups.com.


* Re: Roundtripping md->html->md: footnotes
       [not found] ` <3f582e2a-87f2-4981-8598-44e515308471-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-02-27 22:33   ` BPJ
  2020-02-28 17:33   ` John MacFarlane
  2020-02-28 23:34   ` Benct Philip Jonsson
  2 siblings, 0 replies; 6+ messages in thread
From: BPJ @ 2020-02-27 22:33 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1: Type: text/plain, Size: 2950 bytes --]

It could surely be done with a filter, since everything is nicely tagged with
classes and ids: one pass to pull the note texts out of the divs and into a
Lua table indexed by note id, and then another pass to replace the links
with note elements containing those note texts.
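
To make the two passes concrete, here is a toy sketch in plain Lua that runs without pandoc. It works on a flat list of ordinary tables, not on pandoc's AST: the field names `kind`, `id`, `target` and `text` and the item kind `notediv` are made up for this illustration, and a real filter would use pandoc's own element types instead.

```lua
-- Toy stand-in for a parsed document; NOT pandoc's real AST.
-- All field names here are hypothetical.
local doc = {
  { kind = "text", text = "Text" },
  { kind = "link", target = "#fn1" },
  { kind = "text", text = "More Text" },
  { kind = "link", target = "#fn2" },
  { kind = "notediv", id = "fn1", text = "footnote" },
  { kind = "notediv", id = "fn2", text = "Another footnote" },
}

-- Pass 1: pull the note texts out into a table indexed by note id,
-- and drop the note divs from the document.
local function collect_notes(items)
  local notes, rest = {}, {}
  for _, item in ipairs(items) do
    if item.kind == "notediv" then
      notes[item.id] = item.text
    else
      rest[#rest + 1] = item
    end
  end
  return notes, rest
end

-- Pass 2: replace every link that points at a collected note
-- with a note element carrying that note's text.
local function replace_refs(items, notes)
  local out = {}
  for _, item in ipairs(items) do
    local id = item.kind == "link" and item.target:match("^#(.+)")
    if id and notes[id] then
      out[#out + 1] = { kind = "note", text = notes[id] }
    else
      out[#out + 1] = item
    end
  end
  return out
end

local notes, rest = collect_notes(doc)
local result = replace_refs(rest, notes)
```

The first pass has to finish before the second starts, because a reference can appear in the text long before the note it points to.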



* Re: Roundtripping md->html->md: footnotes
       [not found] ` <3f582e2a-87f2-4981-8598-44e515308471-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2020-02-27 22:33   ` BPJ
@ 2020-02-28 17:33   ` John MacFarlane
  2020-02-28 23:34   ` Benct Philip Jonsson
  2 siblings, 0 replies; 6+ messages in thread
From: John MacFarlane @ 2020-02-28 17:33 UTC (permalink / raw)
  To: Denis Maier, pandoc-discuss


I take it the main issue here is that the HTML footnotes
are not recognized as footnotes.  (Other issues like the
fenced divs and attributes can be affected by disabling
pandoc extensions.)

There's an open issue relevant to this:
https://github.com/jgm/pandoc/issues/5294



* Re: Roundtripping md->html->md: footnotes
       [not found] ` <3f582e2a-87f2-4981-8598-44e515308471-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2020-02-27 22:33   ` BPJ
  2020-02-28 17:33   ` John MacFarlane
@ 2020-02-28 23:34   ` Benct Philip Jonsson
       [not found]     ` <526f8e29-f2d9-b09f-0fd4-87fd801bb780-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2 siblings, 1 reply; 6+ messages in thread
From: Benct Philip Jonsson @ 2020-02-28 23:34 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw, Denis Maier

[-- Attachment #1: Type: text/plain, Size: 739 bytes --]

On 2020-02-27 21:23, Denis Maier wrote:
> Hey,
> I have just tried converting a very simple markdown file to html and back.
> 
> ```
> Text^[footnote]
> More Text^[Another footnote]
> ```

The attached filter does the trick if run during the html to markdown 
conversion.  I haven't tested it extensively though, just on your example.



[-- Attachment #2: pdc-notes-roundtrip.lua --]
[-- Type: text/x-lua, Size: 1165 bytes --]

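-- Note contents collected from the footnotes section, keyed by note id
-- (e.g. "fn1").
local notes = {}

-- Removes the "↩︎" backlinks found inside each note.
local remove_backlinks = {
  Link = function (link)
    if link.classes:includes('footnote-back') then
      return {} -- delete backlink
    else
      return nil -- leave other links unchanged
    end
  end,
}

-- Runs inside the footnotes div: stores the content of each nested div
-- under that div's identifier.
local collect_contents = {
  Div = function (div)
    -- skip the footnotes div itself, should it be visited
    if div.classes:includes('footnotes') then return nil end
    div = pandoc.walk_block(div, remove_backlinks)
    local id = div.identifier
    notes[id] = div.content
    return nil -- we delete everything later
  end,
}

-- Finds the footnotes div, harvests its notes, and removes it.
local collect_notes = {
  Div = function (div)
    if not div.classes:includes('footnotes') then return nil end
    pandoc.walk_block(div, collect_contents)
    return {} -- delete the footnotes div
  end,
}

-- Replaces each footnote reference link with a real Note element.
local replace_refs = {
  Link = function (link)
    if not link.classes:includes('footnote-ref') then return nil end
    -- the link target is "#" followed by the note id, e.g. "#fn1"
    local fn_id = link.target:match("^%#(.+)")
    if not fn_id or not notes[fn_id] then return nil end
    return pandoc.Note(notes[fn_id])
  end,
}

-- Wrap the document in a temporary Div so that both passes can be run
-- with pandoc.walk_block: first collect the notes, then replace the refs.
function Pandoc (doc)
  local blocks = pandoc.Div(doc.blocks)
  blocks = pandoc.walk_block(blocks, collect_notes)
  blocks = pandoc.walk_block(blocks, replace_refs)
  doc.blocks = blocks.content
  return doc
end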



* Re: Roundtripping md->html->md: footnotes
       [not found]     ` <526f8e29-f2d9-b09f-0fd4-87fd801bb780-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2020-02-29 16:29       ` Denis Maier
       [not found]         ` <0b7b7d86-bcb8-4910-bc30-1cbf72699ed0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Denis Maier @ 2020-02-29 16:29 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1101 bytes --]

Wow, neat. Thank you so much! That's also a very nice example for understanding
how filters work. I'll see whether this can be used. (Perhaps it can also be
adapted to HTML produced from LaTeX.)

It would nevertheless be great if this could be added to pandoc's core.


* Re: Roundtripping md->html->md: footnotes
       [not found]         ` <0b7b7d86-bcb8-4910-bc30-1cbf72699ed0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-03-02 11:16           ` BPJ
  0 siblings, 0 replies; 6+ messages in thread
From: BPJ @ 2020-03-02 11:16 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1: Type: text/plain, Size: 4601 bytes --]

It's actually kind of weird in that it is a filter which runs a filter
which runs a filter which runs a filter (`Pandoc` → `collect_notes` →
`collect_contents` → `remove_backlinks`), each on the content of some
element selected by the previous one. Still, I think this is the most
efficient way of performing the task, and in the case of `collect_notes`
running `collect_contents` possibly the only way, since the divs containing
the content of each individual note don't have any identifying classes.

Also, the `collect_notes` and `replace_refs` filters are run "manually" from
within a global `Pandoc` function, rather than being returned as a list of
two filters to be run in succession by the filter engine. That approach would
probably have worked too, but I feel that the way I did it shows more clearly
what is going on, at the price of not doing it the standard way.

The `remove_backlinks` filter could probably have been run on the whole
document too, but that seems wasteful: I already have a reference to the div
that contains the backlinks right there, so it is more efficient to run the
filter locally than to visit every link in the document only to remove those
few. I guess I should time the various alternatives on a larger document to
see which is actually faster, but since the differences are probably counted
in milliseconds anyway, opting for clarity to a human reading the code is
hopefully the best choice.
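
For comparison, the "standard way" mentioned above might look roughly like the sketch below. The filter tables are the same as in the attached filter, but the file ends by returning them as a list instead of defining a global `Pandoc` function; pandoc then runs each filter in the list over the whole document, one after the other. This is an untested sketch that assumes the same HTML structure as in the original example, not a drop-in replacement.

```lua
-- Sketch of the list-returning variant; assumes footnotes HTML shaped like
-- the example in this thread (classes footnotes/footnote-ref/footnote-back).
local notes = {}

-- Deletes the backlinks inside a note.
local remove_backlinks = {
  Link = function (link)
    if link.classes:includes('footnote-back') then
      return {}
    end
  end,
}

-- Stores each note div's content under its identifier.
local collect_contents = {
  Div = function (div)
    if div.classes:includes('footnotes') then return nil end
    div = pandoc.walk_block(div, remove_backlinks)
    notes[div.identifier] = div.content
    return nil
  end,
}

-- First filter: harvest the notes and delete the footnotes div.
local collect_notes = {
  Div = function (div)
    if not div.classes:includes('footnotes') then return nil end
    pandoc.walk_block(div, collect_contents)
    return {}
  end,
}

-- Second filter: turn footnote reference links into Note elements.
local replace_refs = {
  Link = function (link)
    if not link.classes:includes('footnote-ref') then return nil end
    local fn_id = link.target:match("^#(.+)")
    if not fn_id or not notes[fn_id] then return nil end
    return pandoc.Note(notes[fn_id])
  end,
}

-- Pandoc applies the filters in this list in order, each to the result of
-- the previous one, so `notes` is fully populated before replace_refs runs.
return { collect_notes, replace_refs }
```

The nested calls to `pandoc.walk_block` stay as they were; only the outermost step changes, which is why the difference between the two variants is mostly one of presentation.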

BTW this is yet another of my Perl filters ported to Lua, and I just
reproduced the way I had done it in Perl. Both engines use the approach
where each filter is an associative array of closures indexed by element
type tag, so the translation was very straightforward. The main difference
is that in Perl each subfilter is compiled into a function reference once
before being used, which is at least a bit more efficient in Perl.

FWIW I will sooner or later have ported all my Perl filters to Lua, except
those using Perl regular expressions which can't be reproduced with
regular Lua patterns. I guess those too could be ported using
[Lpeg][]/[re][], or rather [LuLpeg][], since Lpeg is a C library and
doesn't work with the prebuilt Pandoc binaries. At least on my laptop, though,
making LuLpeg available to Lua 5.3 was a hassle which I can hardly expect
people to go through. The tl;dr is that you first have to install luajit,
then install LuLpeg using luajit, and then copy/symlink that installation
to where Lua 5.3 expects to find it, where it then works quite happily.


[Lpeg]: http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html

[re]: http://www.inf.puc-rio.br/~roberto/lpeg/re.html

[LuLpeg]: https://github.com/pygy/LuLPeg

