* md-docx-md round-tripping not working
From: Denis Maier @ 2018-12-23 21:55 UTC
To: pandoc-discuss

I have tested whether I can convert a docx produced with pandoc back to markdown after making a few changes in the docx. Normal paragraphs, blockquotes, headings and footnotes work fine. However, author, title, and date end up as normal paragraphs in the resulting markdown file, whereas the original source had a YAML metadata block.

I have this file (test.md) and use `pandoc test.md -o test.docx`:

```
---
author: Author
title: Test
date: Dezember 2018
---

Heading
=======

Test Test Test
```

I can then convert the resulting docx to pandoc's native format:

```
Pandoc (Meta {unMeta = fromList [("author",MetaInlines [Str "Author"]),("date",MetaInlines [Str "Dezember",Space,Str "2018"]),("title",MetaInlines [Str "Test"])]})
[Header 1 ("heading",[],[]) [Str "Heading"]
,Div ("",[],[("custom-style","FirstParagraph")])
[Para [Str "Test",Space,Str "Test",Space,Str "Test"]]]
```

Now, after making one small edit in Word, converting the docx to pandoc's native format (`pandoc test.docx -f docx+styles -t native`) gives me:

```
Pandoc (Meta {unMeta = fromList []})
[Div ("",[],[("custom-style","Titel")])
[Para [Str "Test"]]
,Div ("",[],[("custom-style","Author")])
[Para [Str "Author"]]
,Div ("",[],[("custom-style","Datum")])
[Para [Str "Dezember",Space,Str "2018"]]
,Header 1 ("heading",[],[]) [Str "Heading"]
,Div ("",[],[("custom-style","FirstParagraph")])
[Para [Str "Test",Space,Str "Test",Space,Str "Test.",Space,Str "Another",Space,Str "Test."]]]
```

What is going wrong here? As you can see, my change was trivial and did not occur in the metadata. Nevertheless, we end up with different styles.
-- You received this message because you are subscribed to the Google Groups "pandoc-discuss" group. To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/de0093bf-6f3a-45cc-be5b-02764dd126bd%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
* Re: md-docx-md round-tripping not working
From: BP Jonsson @ 2018-12-24 11:32 UTC
To: pandoc-discuss

I think this is to be expected, since Pandoc inserts the title, author and date as *text* elements at the top of the docx document, which is usually what you want. It's the same with HTML. In an HTML template you can arrange for the title heading, author, etc. to be marked with a class, so that you can use a filter to turn them back into metadata when converting back to markdown. I don't know whether the docx writer applies any special styles to these elements, but if it does, you could use the `+styles` extension and catch them with a filter.

/bpj
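The "catch them with a filter" idea could look roughly like the following minimal sketch: a pandoc JSON filter in plain Python (standard library only) that turns single-paragraph Divs carrying a title/author/date `custom-style` back into metadata. The script name `promote_meta.py`, the `STYLE_TO_FIELD` mapping (which includes the German `Titel` and `Datum` seen in Denis's output), and the JSON AST layout it relies on (pandoc 2.x encoding of `Div` and `Para`) are assumptions for illustration, not something pandoc ships.

```python
#!/usr/bin/env python3
"""Hypothetical pandoc JSON filter: promote styled paragraphs to metadata.

Assumes the docx was read with `-f docx+styles`, so each paragraph is wrapped
in a Div with a custom-style attribute. STYLE_TO_FIELD is an assumed mapping;
adjust it to the style names your Word installation actually produces.
"""
import json
import sys

# custom-style value -> metadata field (assumed, not exhaustive)
STYLE_TO_FIELD = {
    "Title": "title", "Titel": "title",
    "Author": "author",
    "Date": "date", "Datum": "date",
}


def custom_style(block):
    """Return the custom-style attribute of a Div block, or None."""
    if block.get("t") != "Div":
        return None
    (_ident, _classes, attrs), _content = block["c"]
    return dict(attrs).get("custom-style")


def promote(doc):
    """Move single-paragraph styled Divs from doc['blocks'] into doc['meta']."""
    kept = []
    for block in doc["blocks"]:
        field = STYLE_TO_FIELD.get(custom_style(block))
        content = block["c"][1] if field else None
        if field and len(content) == 1 and content[0]["t"] in ("Para", "Plain"):
            # Keep an existing value rather than overwriting it.
            doc["meta"].setdefault(field, {"t": "MetaInlines", "c": content[0]["c"]})
            continue
        kept.append(block)
    doc["blocks"] = kept
    return doc


# pandoc passes the target format as argv[1] when running a --filter.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(promote(json.load(sys.stdin)), sys.stdout)
```

Possible usage, after `chmod +x promote_meta.py`: `pandoc test.docx -f docx+styles --filter ./promote_meta.py -s -o test.md`. Paragraphs with any other style (e.g. `FirstParagraph`) are left untouched.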
* Re: md-docx-md round-tripping not working
From: Denis Maier @ 2018-12-25 15:38 UTC
To: pandoc-discuss

Well, the point is that I can convert an unmodified pandoc-produced docx back to markdown, and the metadata will end up in a YAML metadata block. The problem comes up only after I modify the docx in Word. As you can see, Word changes the names of the styles, perhaps because my default system language is German. I guess the problem is related to this.

Denis
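To check the guess that Word renamed the styles, one could compare the paragraph styles in the docx before and after editing. A minimal sketch in standard-library Python, assuming the usual Office Open XML layout (a zip archive containing `word/styles.xml` and `word/document.xml`); the script and function names are hypothetical. It lists each paragraph style's ID and display name and marks with `*` the IDs actually referenced in the document body.

```python
#!/usr/bin/env python3
"""Hypothetical diagnostic: which paragraph styles does a .docx define and use?"""
import sys
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace, in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_styles(docx_file):
    """Return (styleId, name) pairs for paragraph styles in word/styles.xml."""
    with zipfile.ZipFile(docx_file) as docx:
        root = ET.fromstring(docx.read("word/styles.xml"))
    pairs = []
    for style in root.iter(W + "style"):
        if style.get(W + "type") != "paragraph":
            continue
        name = style.find(W + "name")
        pairs.append((style.get(W + "styleId"),
                      None if name is None else name.get(W + "val")))
    return pairs


def used_styles(docx_file):
    """Return the set of paragraph style IDs referenced in word/document.xml."""
    with zipfile.ZipFile(docx_file) as docx:
        root = ET.fromstring(docx.read("word/document.xml"))
    return {p.get(W + "val") for p in root.iter(W + "pStyle")}


if __name__ == "__main__" and len(sys.argv) > 1:
    in_use = used_styles(sys.argv[1])
    for style_id, name in list_styles(sys.argv[1]):
        marker = "*" if style_id in in_use else " "
        print(f"{marker} {style_id}\t{name}")
```

Running it as `python3 list_docx_styles.py test.docx` on both the pandoc-produced file and the Word-edited copy would show, for example, whether the title paragraph's style ID is still `Title` or has become `Titel`.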
* Re: md-docx-md round-tripping not working
From: BP Jonsson @ 2018-12-25 23:59 UTC
To: pandoc-discuss, Denis Maier

On 2018-12-25 at 16:38, Denis Maier wrote:
> Well, the point is that I can convert an unmodified pandoc-produced docx
> back to markdown, and the metadata will end up in a YAML metadata block.
> The problem comes up only after I modify the docx in Word. As you can see,
> Word changes the names of the styles, perhaps because my default system
> language is German. I guess the problem is related to this.

I see. I did some exploration and experimentation along these lines. For better or worse, I don't have any OS where Word runs available at the moment, but I created a docx file with author/title/date metadata fields and changed it in LibreOffice, including changing the document language to Swedish, and nothing similar happened. I did note, however, that the title/author/date paragraphs inserted by Pandoc have the named paragraph styles Title, Author and Date respectively, of which the last two are listed as custom styles and so are probably defined by Pandoc. When I change the paragraph style of any of those paragraphs, the metadata fields disappear when I convert with `pandoc -so output.md input.docx`. (Note the `-s` (aka `--standalone`) option: without it no metadata is included at all!) When I changed the paragraph styles back, the metadata fields reappeared.

In fact, Pandoc seems to honor any paragraph that uses one of these paragraph styles but to ignore the document properties, so quite possibly all you need to do is check that those named paragraph styles are properly applied in your modified docx file.

/bpj