* md-docx-md roundtripping not working
@ 2018-12-23 21:55 Denis Maier
From: Denis Maier @ 2018-12-23 21:55 UTC (permalink / raw)
To: pandoc-discuss
I tested whether I can convert a docx produced with pandoc back to markdown
after making a few changes in the docx. Normal paragraphs, blockquotes,
headings and footnotes work fine. However, author, title, and date end up
as normal paragraphs in the resulting markdown file, whereas the original
source had a YAML metadata block.
I have this file (test.md) and use `pandoc test.md -o test.docx`:
```
---
author: Author
title: Test
date: Dezember 2018
---

Heading
=======

Test Test Test
```
I can then convert the resulting docx to pandoc's native format:
```
Pandoc (Meta {unMeta = fromList [("author",MetaInlines [Str
"Author"]),("date",MetaInlines [Str "Dezember",Space,Str
"2018"]),("title",MetaInlines [Str "Test"])]})
[Header 1 ("heading",[],[]) [Str "Heading"]
,Div ("",[],[("custom-style","FirstParagraph")])
[Para [Str "Test",Space,Str "Test",Space,Str "Test"]]]
```
Now, after making one small edit to the docx, converting it to pandoc's
native format (`pandoc test.docx -f docx+styles -t native`) gives me:
```
Pandoc (Meta {unMeta = fromList []})
[Div ("",[],[("custom-style","Titel")])
[Para [Str "Test"]]
,Div ("",[],[("custom-style","Author")])
[Para [Str "Author"]]
,Div ("",[],[("custom-style","Datum")])
[Para [Str "Dezember",Space,Str "2018"]]
,Header 1 ("heading",[],[]) [Str "Heading"]
,Div ("",[],[("custom-style","FirstParagraph")])
[Para [Str "Test",Space,Str "Test",Space,Str "Test.",Space,Str
"Another",Space,Str "Test."]]]
```
What is going wrong here? As you can see, my change was trivial and did
not touch the metadata. Nevertheless, we end up with different styles.
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/de0093bf-6f3a-45cc-be5b-02764dd126bd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
* Re: md-docx-md roundtripping not working
@ 2018-12-24 11:32 ` BP Jonsson
From: BP Jonsson @ 2018-12-24 11:32 UTC (permalink / raw)
To: pandoc-discuss
I think this is to be expected, since Pandoc inserts the title, author
and date as *text* elements at the top of the docx document, which is
usually what you want. It's the same with HTML. In an HTML template you
can arrange for the title heading, author, etc. to be marked with a class
so that you can use a filter to turn them back into metadata when
converting back to markdown. I don't know whether the docx writer applies
any special styles to these elements, but if it does you could use the
`+styles` extension and catch them with a filter.
/bpj
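
As a rough illustration of the "catch them with a filter" idea, here is a minimal sketch in Python that works on pandoc's JSON representation of the document. The function name `lift_styled_divs`, the helper `styled_para`, the style-to-key table, and the API version number in the example are all assumptions made for illustration; only the style names "Titel", "Author" and "Datum" come from the native output posted earlier in the thread.

```python
import json

# Assumed mapping from paragraph style names to metadata keys. "Titel" and
# "Datum" are the localized names from the native output earlier in the
# thread; other locales would need their own entries.
STYLE_TO_META = {
    "Title": "title", "Titel": "title",
    "Author": "author",
    "Date": "date", "Datum": "date",
}


def lift_styled_divs(doc):
    """Return a copy of a pandoc JSON document in which Divs whose
    custom-style is listed in STYLE_TO_META are moved into the metadata."""
    meta = dict(doc.get("meta", {}))
    remaining = []
    for block in doc.get("blocks", []):
        if block["t"] == "Div":
            (_ident, _classes, attrs), content = block["c"]
            key = STYLE_TO_META.get(dict(attrs).get("custom-style"))
            # Only lift the simple case: a Div wrapping exactly one Para.
            if key and len(content) == 1 and content[0]["t"] == "Para":
                meta[key] = {"t": "MetaInlines", "c": content[0]["c"]}
                continue
        remaining.append(block)
    return {**doc, "meta": meta, "blocks": remaining}


def styled_para(style, *words):
    """Hypothetical helper: a Div with a custom-style around one Para."""
    inlines = []
    for i, word in enumerate(words):
        if i:
            inlines.append({"t": "Space"})
        inlines.append({"t": "Str", "c": word})
    attr = ["", [], [["custom-style", style]]]
    return {"t": "Div", "c": [attr, [{"t": "Para", "c": inlines}]]}


# Hand-built stand-in for the output of `pandoc -f docx+styles -t json`.
example = {
    "pandoc-api-version": [1, 17, 5, 4],  # placeholder version number
    "meta": {},
    "blocks": [
        styled_para("Titel", "Test"),
        styled_para("Author", "Author"),
        styled_para("Datum", "Dezember", "2018"),
        styled_para("FirstParagraph", "Test", "Test", "Test"),
    ],
}

result = lift_styled_divs(example)
print(json.dumps(result["meta"], indent=2))
```

To use something like this as a real JSON filter, one would wrap it in a script that reads the document from standard input with `json.load`, writes the result to standard output with `json.dump`, and is passed to pandoc via `--filter`, reading with `-f docx+styles` and writing with `-s` so that the markdown writer emits the metadata block.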
* Re: md-docx-md roundtripping not working
@ 2018-12-25 15:38 ` Denis Maier
From: Denis Maier @ 2018-12-25 15:38 UTC (permalink / raw)
To: pandoc-discuss
Well, the point is that I can convert an unmodified pandoc-produced docx
back to markdown, and the metadata will end up in a YAML metadata block.
The problem comes up only after I modify the docx in Word. As you can see,
Word changes the names of the styles, perhaps because my default system
language is German. I guess the problem is related to this.
Denis
* Re: md-docx-md roundtripping not working
@ 2018-12-25 23:59 ` BP Jonsson
From: BP Jonsson @ 2018-12-25 23:59 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw, Denis Maier
I see. I did some exploration and experimentation along these lines.
For better or worse, I don't have any OS where Word runs available
at the moment, but I created a docx file with author/title/date
metadata fields and changed it in LibreOffice, including changing
the document language to Swedish. Nothing similar happened,
but I noted that the title/author/date paragraphs inserted by
Pandoc have the named paragraph styles Title, Author and Date
respectively. The last two are listed as custom styles
and so are probably defined by Pandoc. When I change the paragraph
style of any of those paragraphs, the metadata fields disappear
when I convert with `pandoc -so output.md input.docx`. (Note the
-s (aka --standalone) option: without it no metadata is
included at all!)
When I changed the paragraph styles back, the metadata fields
reappeared. In fact, Pandoc seems to honor any paragraphs using one
of these paragraph styles but ignore the document properties, so
quite possibly all you need to do is check that those named
paragraph styles are properly applied in your modified docx file.
/bpj
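
One way to check which paragraph styles a modified docx actually references, without opening Word, is to read `word/document.xml` from the docx archive directly. The following is only a sketch using the Python standard library; the function name `paragraph_style_ids` is made up for illustration, and it reports the style IDs stored in each paragraph's `w:pStyle` element, which may differ from the display names an editor shows.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_style_ids(docx):
    """Return the w:pStyle value of every paragraph in document order,
    using "" for paragraphs without an explicit style.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    ids = []
    for para in root.iter(f"{{{W_NS}}}p"):
        pstyle = para.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        ids.append(pstyle.get(f"{{{W_NS}}}val") if pstyle is not None else "")
    return ids
```

Calling `paragraph_style_ids("test.docx")` on the file before and after the edit would show whether the first paragraphs still carry Title, Author and Date or have been renamed to something like Titel and Datum.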