Semantic losses in HTML to Markdown back conversion

public inbox archive for pandoc-discuss@googlegroups.com

From: Paul Netsaver @ 2017-06-07  6:15 UTC (permalink / raw)
  To: pandoc-discuss

Hi, this post started as an issue on the pandoc GitHub tracker 
<https://github.com/jgm/pandoc/issues/3724>.

I would like to discuss the possibility of making *HTML generated from 
lightweight markup fully reversible*.
Consider an environment that uses Markdown for input and pandoc for HTML 
preview and export: it will hold a collection of source md files and 
generated HTML files.
Now suppose that later, in that environment or in a different one, only the 
HTML files are available.
Or, as an alternative, consider an environment that works mainly with HTML 
files and translates to Markdown only on the fly, for user edits.
In such cases it would be useful to have a fully reversible notation, so 
that the transformation
MDcode --> HTML --> MDcode gives back exactly the same source.
As an example, see what happens with footnotes:

That's some text with a footnote.[^1]

[^1]: And that's the footnote.

The Markdown above is translated into:

<p>That’s some text with a footnote.<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a></p>
<div class="footnotes">
<hr />
<ol>
<li id="fn1"><p>And that’s the footnote.<a href="#fnref1">↩</a></p></li>
</ol>
</div>

Now, converting that HTML back to Markdown gives:

That’s some text with a footnote.[^1^]

<div class="footnotes">

------------------------------------------------------------------------

1.  <div id="fn1">

    </div>

    And that’s the footnote.[↩]

</div>

  [^1^]: #fn1{#fnref1 .footnoteRef}
  [↩]: #fnref1

Repeating these steps back and forth, both the md and the HTML code will 
explode!
Specifically, pandoc's *html to markdown* converter ignores the footnotes 
and footnoteRef classes, and in any case reconstructing the original code 
is not easy. It might still be possible to recover the original code by 
adopting fixed conventions for the *footnote section*, *block structure*, 
*classnames*, etc.
I suspect this behavior is general for all "Common idioms without 
dedicated elements" 
<https://www.w3.org/TR/html5/common-idioms.html#common-idioms>...
Of course, this would not be possible when the steps are performed in 
different environments.
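
To illustrate the idea of fixed conventions for the footnote section and class names, here is a minimal Python sketch that recognizes exactly the HTML shape shown above and rebuilds the Markdown footnote syntax. It is not how pandoc works; the function name `html_footnotes_to_markdown` and the regular expressions are my own assumptions, it handles only plain `<p>` paragraphs, and other pandoc versions may emit different markup.

```python
import re
from html import unescape

# Patterns tied to the conventions in the HTML sample above (assumed fixed).
REF = re.compile(
    r'<a href="#fn(\d+)" class="footnoteRef" id="fnref\1"><sup>\1</sup></a>'
)
SECTION = re.compile(r'<div class="footnotes">.*?</div>', re.S)
NOTE = re.compile(
    r'<li id="fn(\d+)"><p>(.*?)<a href="#fnref\1">↩</a></p></li>', re.S
)
PARA = re.compile(r"<p>(.*?)</p>", re.S)

SAMPLE_HTML = """<p>That’s some text with a footnote.<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a></p>
<div class="footnotes">
<hr />
<ol>
<li id="fn1"><p>And that’s the footnote.<a href="#fnref1">↩</a></p></li>
</ol>
</div>"""


def html_footnotes_to_markdown(html: str) -> str:
    """Rebuild Markdown footnotes from HTML that follows the conventions above."""
    # Collect (number, text) pairs from the footnote section.
    notes = NOTE.findall(html)
    # Drop the footnote section, then turn each reference link into [^n].
    body = SECTION.sub("", html)
    body = REF.sub(lambda m: f"[^{m.group(1)}]", body)
    # Only plain paragraphs are handled in this sketch.
    paragraphs = [unescape(p.strip()) for p in PARA.findall(body)]
    definitions = [f"[^{n}]: {unescape(text.strip())}" for n, text in notes]
    return "\n\n".join(paragraphs + definitions) + "\n"


if __name__ == "__main__":
    print(html_footnotes_to_markdown(SAMPLE_HTML), end="")
```

Note that the typographic apostrophes introduced by the first conversion are not undone here, which is already a small example of a loss that conventions on class names alone cannot reverse.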

Please also consider an application that translates web pages into Markdown 
articles.
In this case a semantic conversion would no longer be possible, but at 
least, once the first pass to Markdown has been performed (and the related 
*semantic loss* accepted), from that moment onwards no further garbage 
should accumulate...
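
The requirement above (accept a loss once, then stay stable) can be checked mechanically. The following Python sketch counts how many MD --> HTML --> MD round trips are needed before the Markdown stops changing. The helper names `pandoc_convert` and `round_trips_until_stable` are hypothetical; the only pandoc options used are `-f` and `-t`, and the pandoc executable is assumed to be on the PATH.

```python
import subprocess
from typing import Callable, Optional

Converter = Callable[[str], str]


def pandoc_convert(text: str, src: str, dst: str) -> str:
    """Convert text from format src to format dst by calling the pandoc CLI."""
    result = subprocess.run(
        ["pandoc", "-f", src, "-t", dst],
        input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout


def round_trips_until_stable(
    md: str, to_html: Converter, to_md: Converter, max_rounds: int = 5
) -> Optional[int]:
    """Return how many lossy round trips occur before the Markdown is stable.

    0 means the source is fully reversible, 1 means one loss was accepted and
    nothing accumulated afterwards, None means the text was still changing
    after max_rounds round trips.
    """
    current = md
    for lossy_rounds in range(max_rounds):
        after = to_md(to_html(current))
        if after == current:
            return lossy_rounds
        current = after
    return None
```

With pandoc installed, it could be called as `round_trips_until_stable(source, lambda s: pandoc_convert(s, "markdown", "html"), lambda s: pandoc_convert(s, "html", "markdown"))`. The comparison is exact, so even whitespace differences count as a change.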

In other words, I'd prefer *lightweight constructs that can be mapped 
uniquely to HTML constructs*. For most applications this is probably not 
necessary, but in the end we are talking about a new standard, and we do 
not really know in how many (and which) different environments these 
processes will run, so... the more general, the more robust...

What do you think?

Netsaver Paul (Rome, IT)
