public inbox archive for pandoc-discuss@googlegroups.com
From: BP Jonsson <bpjonsson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: md-docx-md roundtripping not working
Date: Mon, 24 Dec 2018 12:32:58 +0100
Message-ID: <CAFC_yuT3isv+2MG+Fk0swv5PB+hSHc39zFjum1qL3zyPui7N=Q@mail.gmail.com>
In-Reply-To: <de0093bf-6f3a-45cc-be5b-02764dd126bd-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>

I think this is to be expected, since pandoc inserts the title, author and
date as *text* elements at the top of the docx document, which is usually
what you want. It's the same with HTML. In an HTML template you can arrange
for the title heading, author etc. to be marked with a class, so that a
filter can turn them back into metadata when converting back to markdown.
I don't know whether the docx writer applies any special styles to these
elements, but if it does you could read the file with the `+styles`
extension and catch them with a filter.
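
To make the filter idea concrete, here is a rough, untested-against-pandoc
sketch of a JSON filter in Python (standard library only). It assumes the
docx reader with `+styles` yields leading Divs carrying a `custom-style`
attribute, as in the native output quoted below, and that each such Div
holds a single paragraph. The style names in `STYLE_TO_KEY` are simply the
ones that appear in that output; the mapping and the file name
`lift_meta.py` are placeholders of mine, so adjust them to whatever your
docx actually contains.

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter: turn leading styled Divs back into metadata."""
import json
import sys

# Assumed mapping from docx style name to metadata key. These three names are
# the ones seen in the quoted native output; other files may use other names.
STYLE_TO_KEY = {
    "Titel": "title",
    "Author": "author",
    "Datum": "date",
}


def custom_style(block):
    """Return the custom-style value of a Div block, or None for anything else."""
    if block.get("t") != "Div":
        return None
    (_ident, _classes, keyvals), _content = block["c"]
    return dict(keyvals).get("custom-style")


def lift_metadata(doc):
    """Move leading title/author/date Divs of a pandoc JSON AST into doc['meta']."""
    blocks = doc["blocks"]
    consumed = 0
    for block in blocks:
        key = STYLE_TO_KEY.get(custom_style(block))
        content = block["c"][1] if key else []
        # Stop at the first block that is not a known Div with one paragraph.
        if key is None or len(content) != 1 or content[0]["t"] != "Para":
            break
        doc["meta"][key] = {"t": "MetaInlines", "c": content[0]["c"]}
        consumed += 1
    doc["blocks"] = blocks[consumed:]
    return doc


if __name__ == "__main__":
    # pandoc pipes the JSON AST in on stdin and reads the result from stdout.
    raw = sys.stdin.read()
    if raw.strip():  # ignore empty input so the file can also be run bare
        json.dump(lift_metadata(json.loads(raw)), sys.stdout)
```

If that works for you, something like
`pandoc test.docx -f docx+styles -t markdown -s --filter ./lift_meta.py`
should give the title, author and date back as a metadata block (the `-s` is
needed for the markdown writer to emit metadata at all). The other
`custom-style` Divs, such as `FirstParagraph`, are left alone and would need
their own handling.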

/bpj


On Sun, 23 Dec 2018 at 22:55, Denis Maier <maier.de-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> I have tested if I can convert a docx produced with pandoc back to
> markdown after making a few changes in the docx. Normal paragraphs,
> blockquotes, headings and footnotes work fine. However, author, title, and
> date end up as normal paragraphs in the resulting markdown file, whereas
> the original source had a yaml metadata block.
>
> I have this file (test.md) and use `pandoc test.md -o test.docx`
>
> ```
> ---
> author: Author
> title: Test
> date: Dezember 2018
> ---
>
> Heading
> =======
>
> Test Test Test
> ```
>
> I can then convert the resulting docx to pandoc's native format:
>
> ```
> Pandoc (Meta {unMeta = fromList [("author",MetaInlines [Str
> "Author"]),("date",MetaInlines [Str "Dezember",Space,Str
> "2018"]),("title",MetaInlines [Str "Test"])]})
> [Header 1 ("heading",[],[]) [Str "Heading"]
> ,Div ("",[],[("custom-style","FirstParagraph")])
>  [Para [Str "Test",Space,Str "Test",Space,Str "Test"]]]
> ```
>
> Now, after making one small edit, converting this to pandoc's native format
> (`pandoc test.docx -f docx+styles -t native`) gives me:
>
> ```
> Pandoc (Meta {unMeta = fromList []})
> [Div ("",[],[("custom-style","Titel")])
>  [Para [Str "Test"]]
> ,Div ("",[],[("custom-style","Author")])
>  [Para [Str "Author"]]
> ,Div ("",[],[("custom-style","Datum")])
>  [Para [Str "Dezember",Space,Str "2018"]]
> ,Header 1 ("heading",[],[]) [Str "Heading"]
> ,Div ("",[],[("custom-style","FirstParagraph")])
>  [Para [Str "Test",Space,Str "Test",Space,Str "Test.",Space,Str
> "Another",Space,Str "Test."]]]
> ```
>
> What is going wrong here? As you can see, my change was trivial and did not
> touch the metadata. Nevertheless, we end up with different styles.

