public inbox archive for pandoc-discuss@googlegroups.com
* Conversion from docx with numbered sections
From: jgran @ 2022-04-20 10:26 UTC (permalink / raw)
  To: pandoc-discuss


Hi, 

How can I correctly convert Word documents (.docx) that have numbered
sections while keeping the numbering?
When converted to other formats (Markdown or PDF, for instance), the
numbers become separate paragraphs and end up on a line above the heading
text.
a.docx: 
1.    Header 1 
 1.1.         Header 2
2.    Header 1

a.md:
1.  # Header 1

    1.  ## Header 2

2.  # Header 1

What's the best way to deal with this situation? Should I somehow remove
the numbering and then use --number-sections in a second step?

Thanks
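
The two-step approach asked about above could be sketched roughly as follows. This is only a minimal illustration, not a pandoc feature: the function name strip_list_markers and the regular expression are assumptions made for this example, and the sketch only handles ordered-list markers such as "1." or "1)" placed directly in front of an ATX heading, as in the a.md output shown above.

```python
import re

# Assumed pattern: optional indentation, an ordered-list marker ("1." or "1)"),
# then an ATX heading ("#" to "######" followed by the heading text).
_LIST_HEADING = re.compile(r"^[ \t]*\d+[.)][ \t]+(#{1,6}[ \t]+.*)$")


def strip_list_markers(markdown: str) -> str:
    """Drop ordered-list markers that precede ATX headings.

    Lines that are not "marker + heading" are returned unchanged.
    """
    lines = []
    for line in markdown.splitlines():
        match = _LIST_HEADING.match(line)
        # Keep only the heading part when the line matches.
        lines.append(match.group(1) if match else line)
    return "\n".join(lines) + "\n"


# The a.md output quoted in the message above.
sample = "1.  # Header 1\n\n    1.  ## Header 2\n\n2.  # Header 1\n"
cleaned = strip_list_markers(sample)
print(cleaned)
```

The cleaned Markdown could then be passed to pandoc with --number-sections so that pandoc generates the numbers itself. Body text that was nested inside the list items keeps its indentation in this sketch and would need separate handling.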

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/73852e23-81fd-4c2f-9846-7670ebdde004n%40googlegroups.com.


* Re: Conversion from docx with numbered sections
From: Johan Bergquist @ 2022-08-18  3:52 UTC (permalink / raw)
  To: pandoc-discuss


Hi jgran,
I'm using Pandoc 2.18 for Windows for docx-to-html5 conversions; I do the
docx-to-PDF conversions directly by other means (Acrobat, Word's Save as
PDF). I have no problems with the html5 conversions and the
--number-sections option, except that section headers that are unnumbered
in Word get numbered too. I solved this by creating a separate Word
paragraph style, "Headline", without numbering, enabling the styles
extension (i.e. -f docx+styles), and adding a
"div[data-custom-style="Headline"] p" rule to the CSS file to select and
format that paragraph style in the html5 output.

However, I have not been able to get figure, table, and equation numbering
rendered consistently in html5. I want the chapter and section numbers to
appear in front of those numbers, so I'm using fields like "{ STYLEREF 2 \s
}.{ SEQ Equation \* ARABIC \s 2 }". I also tried Word's "Insert caption"
command, but that didn't work either. Typically, the html shows "1.."
instead of "1.1.1", i.e. only the chapter number is included.

Best regards,

* Re: Conversion from docx with numbered sections
From: Johan Bergquist @ 2022-08-18  7:59 UTC (permalink / raw)
  To: pandoc-discuss


In some places, the numbering is rendered correctly in html5 if I change
the second part of the field from "{ SEQ Equation \* Arabic \s 1 \*
MERGEFORMAT }" to "{ SEQ Equation \* Arabic \s 1 }".
In other places it works without this change, so the behaviour is still
not consistent.
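
For illustration, the manual field edit described above could be expressed as a small text transformation. This is a hedged sketch only: the function name strip_mergeformat is made up for this example, it operates on the field instruction as a plain string, and it makes no claim that removing the switch fixes the rendering everywhere (as noted, the results were inconsistent).

```python
import re

# The "\* MERGEFORMAT" switch plus any whitespace in front of it,
# as it appears inside a Word field instruction.
_MERGEFORMAT = re.compile(r"\s*\\\*\s*MERGEFORMAT", re.IGNORECASE)


def strip_mergeformat(instruction: str) -> str:
    """Return the field instruction without its MERGEFORMAT switch."""
    return _MERGEFORMAT.sub("", instruction)


# Field instruction from the message above (text between the field braces).
field = r" SEQ Equation \* Arabic \s 1 \* MERGEFORMAT "
print(repr(strip_mergeformat(field)))
```

Other switches such as "\* Arabic" are left alone because the pattern requires the literal word MERGEFORMAT. Applying this to a real .docx would additionally require locating the field instructions inside the document XML, which this sketch does not attempt.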