public inbox archive for pandoc-discuss@googlegroups.com
* convert Word docx file containing Chinese and English  to Markdown
@ 2018-01-10  6:17 Philip Lee
From: Philip Lee @ 2018-01-10  6:17 UTC (permalink / raw)
  To: pandoc-discuss



I want to convert a docx file to Markdown.
The file *a.docx* contains the following content:

这条性质是由德国数学家戴德金(Richard Dedekind
> )提出的,他认为这条性质是一个明显的事实,无需也无法被证明,它能够刻画直线的连续性,它是直线之所以连续的本质表现,应将其看作一条公理,
> 可称其为直线连续性公理(line continuity axiom)。



After conversion, I got the following in the Markdown source:

这条性质是由德国数学家戴德金(Richard
>
> Dedekind)提出的,他认为这条性质是一个明显的事实,无需也无法被证明,它能够刻画直线的连续性,它是直线之所以连续的本质表现,应将其看作一条公理,可称其为直线连续性公理(line
> continuity axiom)。


However, I don't want line breaks between English words; that is, I
expect the following result:

这条性质是由德国数学家戴德金(Richard 
> Dedekind)提出的,他认为这条性质是一个明显的事实,无需也无法被证明,它能够刻画直线的连续性,它是直线之所以连续的本质表现,应将其看作一条公理,可称其为直线连续性公理(line continuity 
> axiom)。


*Can anyone help resolve this issue?*
I know that the extensions ignore_line_breaks and east_asian_line_breaks
might help, but I cannot figure out how to use them. I tried
pandoc -s a.docx -t markdown+east_asian_line_breaks -o a.md
but it didn't work.


-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe@googlegroups.com.
To post to this group, send email to pandoc-discuss@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/f3990958-7004-4c1c-809e-7076c65aaaee%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: convert Word docx file containing Chinese and English  to Markdown
@ 2018-01-10 12:23   ` Jesse Rosenthal
From: Jesse Rosenthal @ 2018-01-10 12:23 UTC (permalink / raw)
  To: Philip Lee, pandoc-discuss

Can you post the a.docx file you were using as input? I could try to
take a look at this.



* Re: convert Word docx file containing Chinese and English  to Markdown
@ 2018-01-10 13:09       ` Philip Lee
From: Philip Lee @ 2018-01-10 13:09 UTC (permalink / raw)
  To: pandoc-discuss



Thanks. Please download it from
https://drive.google.com/file/d/1L0zd1xlw9Vk8JL46feYqZ4x28sHhW8kF/view?usp=sharing



* Re: convert Word docx file containing Chinese and English  to Markdown
@ 2018-01-10 13:29           ` Jesse Rosenthal
From: Jesse Rosenthal @ 2018-01-10 13:29 UTC (permalink / raw)
  To: Philip Lee, pandoc-discuss

Try it without line-wrapping:

pandoc a.docx -t markdown --wrap=none
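
With --wrap=none pandoc does not wrap the output at all, so each paragraph comes out on a single line and no break can fall between two English words. If the conversion has to be scripted, the same command can be assembled from Python. This is only a rough sketch, not anything pandoc ships: the names pandoc_args and convert are made up for illustration, and convert assumes pandoc is installed and on PATH. The only pandoc options used are -t, -o and --wrap (which accepts auto, none or preserve).

```python
import shutil
import subprocess
from typing import List, Optional

# Values accepted by pandoc's --wrap option.
WRAP_MODES = ("auto", "none", "preserve")


def pandoc_args(src: str, dest: Optional[str] = None, wrap: str = "none") -> List[str]:
    """Build the argument list for a docx-to-Markdown conversion.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    if wrap not in WRAP_MODES:
        raise ValueError(f"wrap must be one of {WRAP_MODES}, got {wrap!r}")
    args = ["pandoc", src, "-t", "markdown", f"--wrap={wrap}"]
    if dest is not None:
        # Without -o, pandoc writes the result to standard output.
        args += ["-o", dest]
    return args


def convert(src: str, dest: str, wrap: str = "none") -> None:
    """Run pandoc on src and write Markdown to dest (hypothetical helper)."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(pandoc_args(src, dest, wrap), check=True)
```

Calling convert("a.docx", "a.md") would then run the command above with -o a.md appended.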



* Re: convert Word docx file containing Chinese and English  to Markdown
@ 2018-01-10 17:08               ` Philip Lee
From: Philip Lee @ 2018-01-10 17:08 UTC (permalink / raw)
  To: pandoc-discuss



Awesome! Thanks very much!


