public inbox archive for pandoc-discuss@googlegroups.com
* How to improve .docx output?
@ 2019-08-22 12:24 'Nick Bart' via pandoc-discuss
From: 'Nick Bart' via pandoc-discuss @ 2019-08-22 12:24 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

For a certain project, I am forced to generate .docx documents.

While trying this out, I noticed that pandoc's default output still
feels somewhat rough around the edges in a number of aspects.

There are some issues that I have been able to fix using a
custom-reference.docx file (labelled as "can fix" below).

I have had trouble with others; some problems might of course be due to
the fact that I do not own Word (and have no inclination whatsoever to
buy it), so I have been editing custom-reference.docx with LibreOffice
(currently 6.2.6.2 on MacOS 10.13).

All in all, I wonder whether it would be feasible to generate .docx
documents that are, out of the box, more correct, feature-rich, and
aesthetically pleasing (ok, I'll admit I'm biased a bit here: more
LaTeX-like) than they currently are.

What I've noticed so far regarding the default output, in no particular
order:

-   no page numbers (can fix)

-   no numbered section headings (I was able to fix the first level, but
    not levels 2 and up; note that my results did not look right to
    begin with in LibreOffice, menu "Styles" → "Manage Styles")

-   With a custom-reference.docx modified to number section headings,
    `# References {-}` came out numbered, too; i.e., `{-}` was ignored.
    (A proper fix would probably require pandoc to attach a docx style
    like "Heading (unnumbered) 1" to such headings.)

-   uses default Microsoft fonts (this could probably be fixed easily
    using a custom-reference.docx, but why doesn't pandoc use something
    more neutral, such as a Times/Helvetica combo, or at least something
    else that doesn't scream "I'm a Word document" straight away?)

-   LibreOffice displays some pandoc docx tables too wide (reported
    before: https://github.com/jgm/pandoc/issues/2576). For me, this
    happens when using simple tables (multiline tables come out ok).

-   With `--toc`, the table of contents comes out empty (ok, this can be
    fixed in LO via "Tools" → "Update" → "Update All"); and with a
    custom-reference.docx modified to number section headings, the TOC
    title, again, is numbered.

So, my questions:

-   Has anyone been able to come up with a custom-reference.docx that
    solves some or all of these issues, and would be willing to share
    it, or would be interested in collaborating in order to develop one?

-   Any tips or tricks on editing a custom-reference.docx with LO (or
    with a plain text editor, for that matter)?

-   Any ideas on how to generate/update the table of contents without
    user interaction?

    -   I've seen this: https://ask.libreoffice.org/en/question/46586/how-to-automatically-update-indices-in-headless-mode/,
        and spent about an hour on it, but couldn't get it to work.

-   Any ideas on how to generate lists of tables and figures?

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/RC5rhmsP_qtblfskRIcPZHhdVlJt76IbH1X0a1IN_iFZ_-Q1fwRtQUS16-5M0zfTI0J54Wi9vk5Z8xzTRfAyYrOwpmlUzosE-rH9C1UcK7c%3D%40protonmail.com.



* Re: How to improve .docx output?
From: K4zuki @ 2019-08-22 13:04 UTC
  To: pandoc-discuss



Hello,

I have also been working on docx output.

[...]
-   With a custom-reference.docx modified to number section headings, 
    `# References {-}` came out numbered, too; i.e., `{-}` was ignored. 
    (A proper fix would probably require pandoc to attach a docx style 
    like "Heading (unnumbered) 1" to such headings.) 
[...]

This Lua filter might help you:
https://github.com/pandocker/pandocker-lua-filters/blob/master/lua/docx-unnumberedheadings.lua

Your guess is correct; you have to prepare unnumbered custom heading
styles (up to the 4th level) in your reference.docx, and you also need
to point to them in the YAML front matter (or a dedicated extra YAML file).
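The same idea can be sketched as a stdlib-only Python JSON filter. This is an illustration, not a port of the linked Lua filter: it assumes hypothetical paragraph styles named "Heading Unnumbered 1" to "Heading Unnumbered 4" exist in the reference DOCX, and it replaces each heading marked `{-}` (pandoc's shorthand for the `unnumbered` class) with a Div carrying a `custom-style` attribute around a plain paragraph. Because the result is no longer a real heading, it may drop out of a generated table of contents, so check that against your own reference DOCX.

```python
#!/usr/bin/env python3
"""Sketch: restyle unnumbered headings for DOCX output (stdlib only).

Assumption: the reference DOCX defines paragraph styles named
"Heading Unnumbered 1" .. "Heading Unnumbered 4" (hypothetical names).
"""
import json
import sys

# Hypothetical style names; they must match styles in your reference DOCX.
STYLE_TEMPLATE = "Heading Unnumbered {level}"
MAX_LEVEL = 4  # custom styles are prepared up to the 4th level


def restyle_header(node):
    """Turn an unnumbered Header into a Div with a custom-style attribute."""
    level, (ident, classes, _kvs), inlines = node["c"]
    if "unnumbered" not in classes or level > MAX_LEVEL:
        return node  # leave numbered (or too deep) headings untouched
    style = STYLE_TEMPLATE.format(level=level)
    # Keep the identifier on the Div; the heading text becomes a plain
    # paragraph that picks up the Div's custom style in the docx writer.
    div_attr = [ident, [], [["custom-style", style]]]
    return {"t": "Div", "c": [div_attr, [{"t": "Para", "c": inlines}]]}


def walk(obj):
    """Recursively rebuild the JSON AST, replacing Header nodes."""
    if isinstance(obj, list):
        return [walk(item) for item in obj]
    if isinstance(obj, dict):
        if obj.get("t") == "Header":
            return restyle_header(obj)
        return {key: walk(value) for key, value in obj.items()}
    return obj


def main(target_format):
    doc = json.load(sys.stdin)
    if target_format == "docx":  # only rewrite headings for DOCX output
        doc["blocks"] = walk(doc["blocks"])
    json.dump(doc, sys.stdout)


# pandoc passes the output format as the first argument to --filter scripts.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage (the script name is made up): `pandoc input.md --filter ./docx_unnumbered.py --reference-doc custom-reference.docx -o output.docx`, with the script made executable. Any other output format passes through unchanged.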



* Re: How to improve .docx output?
From: BP Jonsson @ 2019-08-22 13:44 UTC
  To: pandoc-discuss


I'm AFK (on my phone) at the moment, so I can't check the things I talk
about below, but hopefully they will be helpful.

Like you, I use LibreOffice rather than Word. I'm exclusively on Linux these
days, so I couldn't use Word even if I wanted to, and I couldn't afford
Mac, Windows or Office anyway. I use PDF produced with LaTeX (usually
Pandoc+LaTeX) whenever I can, but for some reason some people want me to
send them DOCX files! ;-)

I think that the section numbering issue can be fixed, and IIRC I have done
so in the past:

<https://help.libreoffice.org/Writer/Outline_Numbering_1>

As for the TOC and References headings getting numbered, that should be
possible to fix, although perhaps in the output document rather than in the
reference document. Hopefully it will become possible to use a
`custom-style` attribute with headings.

See the discussion at <https://github.com/jgm/pandoc/issues/4697> for
several issues regarding custom-style for different kinds of content. It
doesn't seem like headings have been on the table yet. Perhaps it would be
worthwhile to create a meta-issue (is that a thing?) listing the various
things which can have named styles attached to them, where the corresponding
Pandoc element either takes attributes itself or can be meaningfully
wrapped in a Div or Span. (Page styles would probably be out!)

As for default fonts, it's hard to find any which everyone can be expected
to have across various Linux distros, Mac and Windows, so sticking to the
DOCX defaults may be the best thing to do ATM. For example, fonts called
"Times" and "Helvetica" may or may not exist on a given system, or they may
be 256-character Type1 fonts with different charsets depending on the OS
rather than Unicode OpenType fonts. It's pretty easy to change in a
reference DOCX, but perhaps it could be made easier e.g. by having all
headings inherit their font from the same style, if they don't already.

It would be worthwhile to go through the custom styles defined in a DOCX
produced by Pandoc and inspect which styles they inherit from, or make
them inherit from the styles you want, and use the result as your master
reference DOCX so that you can easily make global changes in the future.
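As a rough aid for that inspection, which also works without opening LibreOffice's style dialogs, here is a stdlib-only Python sketch. It relies on the fact that a DOCX file is a ZIP archive whose style definitions live in `word/styles.xml`, where each `w:style` can name its parent style through `w:basedOn` (by style ID, not by display name). The script name and the output layout are illustrative choices, not part of any tool.

```python
#!/usr/bin/env python3
"""Sketch: list the styles in a DOCX and the style each one is based on."""
import sys
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def style_parents(docx_path):
    """Map each style's display name to (style type, parent display name or None)."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))
    styles = root.findall(f"{W}style")
    # w:basedOn refers to a styleId, so first map IDs to display names.
    id_to_name = {}
    for style in styles:
        sid = style.get(f"{W}styleId")
        name_el = style.find(f"{W}name")
        id_to_name[sid] = name_el.get(f"{W}val") if name_el is not None else sid
    result = {}
    for style in styles:
        sid = style.get(f"{W}styleId")
        based_on = style.find(f"{W}basedOn")
        parent = None
        if based_on is not None:
            pid = based_on.get(f"{W}val")
            parent = id_to_name.get(pid, pid)  # fall back to the raw ID
        result[id_to_name[sid]] = (style.get(f"{W}type"), parent)
    return result


def main(path):
    for name, (kind, parent) in sorted(style_parents(path).items()):
        print(f"{kind or '?':10} {name:30} <- {parent or '(none)'}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage (hypothetical file name): `python3 list_styles.py custom-reference.docx`. Styles that show `(none)` as parent are the roots; making all heading styles share one common parent would let you change their font in a single place.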

As for a more LaTeX-like appearance, some of the answers to this SX question
seem useful:

<https://tex.stackexchange.com/questions/8308/make-ms-word-document-look-like-it-has-been-typeset-in-latex?rq=1>

This answer has some links to actual templates:

<https://tex.stackexchange.com/a/8394/93534>

One problem with making Pandoc's default DOCX more like anything other than
the Word defaults is that presumably most people producing DOCX with Pandoc
do so because someone higher up the food chain expects to get DOCX files
rather than PDF or whatever, and if they don't provide their own templates,
chances are they want the Word defaults, sadly.

This page on the GitHub wiki (originally written by me) may also be helpful:

<https://github.com/jgm/pandoc/wiki/Defining-custom-DOCX-styles-in-LibreOffice-(and-Word)>

