public inbox archive for pandoc-discuss@googlegroups.com
* problems with docx to markdown conversion
@ 2020-04-16 20:59 Don Raikes
  2020-04-16 23:16 ` John MacFarlane
  0 siblings, 1 reply; 5+ messages in thread
From: Don Raikes @ 2020-04-16 20:59 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hello,

Maybe the problem lies in my understanding or in my process, but I am struggling to use pandoc effectively in a writing project.

My process is:

I write my documents in markdown.

The documents include a lot of fenced code blocks, both for HTML and JavaScript, as well as examples of interactive sessions in the terminal.

I convert my markdown files into docx files and send them to my editor.

She makes changes and sends the docx files back to me.

I convert the docx files back to markdown, but then I have to go in and fix all the fenced code blocks because they are no longer marked as fenced blocks. I also have to fix any embedded links and graphics, because the markup for those has been split over two lines, and they will not convert back to docx until I have gone through and joined the lines back together.

Based on the articles I have read about using pandoc for writing projects, the process shouldn't be this difficult, so what am I doing wrong?

The commands I am using for conversion are:

Markdown to docx:

pandoc -s -f markdown -t docx -o project.docx project.md

Docx to markdown:

pandoc -s -f docx -t markdown --atx-headers -o project.md project.docx

Any pointers to make the process work better would be greatly appreciated.

-- 
Thanks, Donald

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/fa15044a-b72b-49be-85f4-d3d60bd58c6c%40default.

* Re: problems with docx to markdown conversion
  2020-04-16 20:59 problems with docx to markdown conversion Don Raikes
@ 2020-04-16 23:16 ` John MacFarlane
  0 siblings, 0 replies; 5+ messages in thread
From: John MacFarlane @ 2020-04-16 23:16 UTC (permalink / raw)
  To: Don Raikes, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Don Raikes <DON.RAIKES-MouhYhfBpPxXrIkS9f7CXA@public.gmane.org> writes:

> I convert the docx files back to markdown, but then I have to go in and fix all the fenced code blocks because they are not marked as fenced blocks any more.

You mean they come out as indented code blocks?  Sorry, the
writer isn't configurable to adjust that. Indented code blocks
are used unless you explicitly specify a language, and I don't
think that information can round-trip through docx writer and
reader currently. It's not clear how it would be represented in
docx anyway.
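
Since the writer cannot be told to emit fences, one workaround is to re-fence the indented blocks after the conversion. The following is only a rough Python sketch and is not something suggested in this thread: fence_indented_blocks and is_indented are hypothetical helper names, and the logic assumes that every line indented by four spaces after a blank line starts a code block. That assumption fails for indented paragraphs inside lists and for code that itself contains a line of three backticks.

```python
FENCE = "`" * 3  # three backticks


def is_indented(line):
    """True for a non-blank line indented by at least four spaces."""
    return line.startswith("    ") and line.strip() != ""


def fence_indented_blocks(markdown):
    """Rewrite top-level indented code blocks as fenced code blocks."""
    lines = markdown.split("\n")
    out = []
    i = 0
    prev_blank = True  # the start of the document counts as a blank line
    while i < len(lines):
        line = lines[i]
        if prev_blank and is_indented(line):
            block = []
            while i < len(lines):
                if is_indented(lines[i]):
                    block.append(lines[i][4:])  # drop the four-space indent
                elif (lines[i].strip() == "" and i + 1 < len(lines)
                        and is_indented(lines[i + 1])):
                    block.append("")  # blank line inside the code block
                else:
                    break
                i += 1
            out.append(FENCE)
            out.extend(block)
            out.append(FENCE)
            prev_blank = False
            continue
        out.append(line)
        prev_blank = line.strip() == ""
        i += 1
    return "\n".join(out)
```

The language of each block is not recovered, so the fences carry no language tag; any tag would have to be added by hand afterwards.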

> Plus I have to fix any embedded links and graphics because the
> code for those have gotten split over two lines and they will
> not convert back to docx unless I have gone through and joined
> the lines back together.

You'd have to give an example.


* RE: problems with docx to markdown conversion
       [not found]   ` <m2tv0v3p1s.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2020-05-05 17:12     ` Don Raikes
  0 siblings, 0 replies; 5+ messages in thread
From: Don Raikes @ 2020-05-05 17:12 UTC (permalink / raw)
  To: John MacFarlane, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Thank you. Those extra parameters made a whole world of difference in the conversion process.


* Re: problems with docx to markdown conversion
  2020-05-04 18:48 Don Raikes
@ 2020-05-05  5:53 ` John MacFarlane
       [not found]   ` <m2tv0v3p1s.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: John MacFarlane @ 2020-05-05  5:53 UTC (permalink / raw)
  To: Don Raikes, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


There are two issues:

1. line breaks -- if you don't want these, you can use --wrap=none.
By default pandoc wraps markdown output to fit within 72 columns
(adjustable with --columns).

2. Image attributes -- if you don't want these, you can use
-t markdown-link_attributes-raw_html.  Or you can write a small
lua filter to remove them.
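
For reference, the two options can be combined in one docx-to-markdown invocation. Below is a minimal Python sketch of such a call; the function names and file names are illustrative only, and the heading-style option from the original command is left out to keep the example short. Only --wrap=none and the markdown-link_attributes-raw_html output format are taken from the points above.

```python
import subprocess


def docx_to_markdown_args(src, dst):
    """Build a pandoc command line for docx -> markdown (hypothetical helper)."""
    return [
        "pandoc",
        "-s",
        "-f", "docx",
        # Disable the link_attributes and raw_html extensions so that image
        # size attributes are not written into the markdown output.
        "-t", "markdown-link_attributes-raw_html",
        # Do not hard-wrap the output, so links and images stay on one line.
        "--wrap=none",
        "-o", dst,
        src,
    ]


def convert(src, dst):
    """Run the conversion; raises CalledProcessError if pandoc fails."""
    subprocess.run(docx_to_markdown_args(src, dst), check=True)
```

Calling convert("test.docx", "test.md") would then run the equivalent of the command line built above, assuming pandoc is on the PATH.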




* problems with docx to markdown conversion
@ 2020-05-04 18:48 Don Raikes
  2020-05-05  5:53 ` John MacFarlane
  0 siblings, 1 reply; 5+ messages in thread
From: Don Raikes @ 2020-05-04 18:48 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Hello,

I am having some problems with docx to markdown conversion using pandoc 2.9.2 on Windows.

My workflow is:

1. I write/edit the doc in markdown format.

2. I convert the markdown to docx format for my editor (first command below).

3. After making edits, my editor returns the docx file to me and I convert it back to markdown for source code control and to review the edits.

4. When I get the docx file back and convert it to markdown, many of the links and image links are really a mess; see img-before.txt and img-after.txt for an example.

img-before.txt is the image link before conversion into docx format.

img-after.txt is the same image link after converting back from docx to markdown.

Regular links have a similar line-break issue: they are split across multiple lines in the generated markdown.

Commands used:

Markdown to docx:

C:\> pandoc -s -f markdown -t docx -o test.docx test.md

Docx to markdown:

C:\> pandoc -s --atx-headers -f docx -t markdown -o test.md test.docx

Am I missing a parameter that would keep the links intact?

[-- Attachment #2: img-after.txt --]
[-- Type: text/plain, Size: 185 bytes --]

![Organization Tab with Employee Details Form
Shown](media/image2.png){width="5.833333333333333in"
height="3.521266404199475in"}

Organization Tab with Employee Details Form Shown

[-- Attachment #3: img-before.txt --]
[-- Type: text/plain, Size: 75 bytes --]

![Organization Tab with Employee Details Form Shown](img/empdet-form.png)
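
Comparing the two attachments, the round trip did three things to the image: it changed the path, wrapped the alt text across lines, and appended a {width=... height=...} attribute block. The last two can also be undone after the fact when re-running pandoc is not convenient. The Python sketch below is one possible way to do that, with a hypothetical function name and a deliberately narrow regular expression: it only handles inline images whose alt text contains no square brackets, whose target contains no closing parenthesis, and whose attribute block contains no nested braces.

```python
import re

# An inline image, optionally followed by a pandoc attribute block such as
# {width="5.8in" height="3.5in"}. Alt text and attributes may span lines.
IMAGE = re.compile(
    r"!\[(?P<alt>[^\]]*)\]"      # ![alt text]
    r"\((?P<target>[^)]*)\)"     # (path/to/image.png)
    r"(?:\{[^}]*\})?"            # optional {key="value" ...} block
)


def tidy_images(markdown):
    """Join wrapped image markup onto one line and drop its attributes."""
    def rebuild(match):
        # Collapse newlines and repeated spaces in the alt text.
        alt = " ".join(match.group("alt").split())
        target = match.group("target").strip()
        return "![{}]({})".format(alt, target)
    return IMAGE.sub(rebuild, markdown)
```

The image path is left exactly as pandoc wrote it (media/image2.png in img-after.txt), and the caption line that follows the image is not touched.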
