* Semantic losses in HTML to Markdown back conversion
@ 2017-06-07 6:15 Paul Netsaver
From: Paul Netsaver @ 2017-06-07 6:15 UTC
To: pandoc-discuss
Hi, this post started as an issue on GitHub (jgm/pandoc):
<https://github.com/jgm/pandoc/issues/3724>.
I would like to discuss the possibility of making *HTML generated from
lightweight markup fully reversible*.
Consider an environment that uses Markdown for input and pandoc for
HTML preview and export: there will be a collection of source md files and
generated HTML files.
Now suppose that later, in that environment or in a different one, only the
HTML files are available.
Or, alternatively, consider an environment that works mainly with HTML
files and translates to Markdown on the fly only for user edits.
In such cases it would be useful to have a fully reversible notation, so
that the round trip
MDcode --> HTML --> MDcode gives back exactly the same source.
As an example, see what happens with footnotes:
That's some text with a footnote.[^1]
[^1]: And that's the footnote.
The above markdown code will be translated into:
<p>That’s some text with a footnote.<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a></p>
<div class="footnotes">
<hr />
<ol>
<li id="fn1"><p>And that’s the footnote.<a href="#fnref1">↩</a></p></li>
</ol>
</div>
Now, converting that HTML back to Markdown gives:
That’s some text with a footnote.[^1^]
<div class="footnotes">
------------------------------------------------------------------------
1. <div id="fn1">
</div>
And that’s the footnote.[↩]
</div>
[^1^]: #fn1{#fnref1 .footnoteRef}
[↩]: #fnref1
If these conversions are repeated back and forth, both the md and the HTML
code will keep growing with every pass!
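The growth described above could be measured with something like the following Python sketch. It is only an illustration: `pandoc_convert`, `round_trip_lengths` and `is_reversible` are names made up here, and the helper assumes a `pandoc` executable on the PATH that accepts the usual `-f`/`-t` options.

```python
import subprocess
from typing import Callable, List


def pandoc_convert(text: str, src: str, dst: str) -> str:
    """Convert text with the pandoc CLI (assumption: pandoc is on PATH)."""
    result = subprocess.run(
        ["pandoc", "-f", src, "-t", dst],
        input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout


def round_trip_lengths(
    source: str,
    forward: Callable[[str], str],
    backward: Callable[[str], str],
    passes: int = 3,
) -> List[int]:
    """Return the source length before the first pass and after each pass."""
    lengths = [len(source)]
    current = source
    for _ in range(passes):
        # One pass = forward conversion followed by the back conversion.
        current = backward(forward(current))
        lengths.append(len(current))
    return lengths


def is_reversible(
    source: str,
    forward: Callable[[str], str],
    backward: Callable[[str], str],
) -> bool:
    """True if one forward/backward pass gives back exactly the same source."""
    return backward(forward(source)) == source


# Possible usage with pandoc (not run here):
#   to_html = lambda s: pandoc_convert(s, "markdown", "html")
#   to_md = lambda s: pandoc_convert(s, "html", "markdown")
#   print(round_trip_lengths(md_source, to_html, to_md))
```

A reversible pair of converters would give a constant list of lengths; a list that keeps increasing is the accumulation of garbage discussed in this post.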
Specifically, pandoc's *HTML to Markdown* converter ignores the footnotes and
footnoteRef classes; even so, reconstructing the original code is not
easy. It might still be possible to recover the original code by
agreeing on fixed conventions for the *footnote section*, *block structure*,
*class names*, etc.
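To make the idea of fixed conventions concrete, here is a rough Python sketch that recognizes exactly the footnote markup shown in the example above and turns it back into `[^n]` markers and definitions. It is not how pandoc works: `extract_footnotes` is a hypothetical helper, the regular expressions assume the HTML matches the example character for character (no nested `<div>` inside the footnote section), and the remaining HTML would still have to go through a normal converter.

```python
import re

# Assumed convention: the reference link exactly as in the example above.
REF = re.compile(
    r'<a href="#fn(\d+)" class="footnoteRef" id="fnref\1"><sup>\1</sup></a>'
)
# Assumed convention: one footnote section with no nested <div> inside it.
SECTION = re.compile(r'<div class="footnotes">.*?</div>', re.DOTALL)
# Assumed convention: each footnote is a single-paragraph list item
# ending with the back-link.
ITEM = re.compile(
    r'<li id="fn(\d+)"><p>(.*?)<a href="#fnref\1">↩</a></p></li>', re.DOTALL
)


def extract_footnotes(html: str):
    """Return (html_with_markdown_markers, markdown_footnote_definitions)."""
    definitions = []
    section = SECTION.search(html)
    if section:
        for number, text in ITEM.findall(section.group(0)):
            definitions.append(f"[^{number}]: {text.strip()}")
        # Drop the whole footnote section from the body.
        html = html[:section.start()] + html[section.end():]
    # Replace each reference link with a Markdown footnote marker.
    html = REF.sub(lambda m: f"[^{m.group(1)}]", html)
    return html.strip(), definitions
```

The point is only that, once the class names and the block structure are fixed, the footnote can be recognized unambiguously; any HTML that deviates from the convention is left untouched.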
I suspect this is general behavior for all the common idioms without
dedicated elements
<https://www.w3.org/TR/html5/common-idioms.html#common-idioms>...
Of course, this would not be possible when the conversions are performed in
different environments.
Please also consider an application that translates web pages into Markdown
articles.
In this case a semantic conversion would no longer be possible, but at
least, once the first conversion to Markdown has been performed (and the
related *semantic loss* accepted), from that point on no further garbage
should accumulate...
In other words, I'd prefer *lightweight constructs that map uniquely
onto HTML constructs*. Most applications probably won't need this, but
after all we're talking about a new standard, and we don't really know in
how many (and which) different environments these processes will run,
so... the more general, the more robust...
What do you think?
Netsaver Paul (Rome, IT)