docx to pdf conversion error

public inbox archive for pandoc-discuss@googlegroups.com

* docx to pdf conversion error
@ 2017-11-09 23:32 sclarke-DpHT0TjK6O80n/F98K4Iww
       [not found] ` <9256e740-7530-4dc2-9acd-5d14eca6fc07-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: sclarke-DpHT0TjK6O80n/F98K4Iww @ 2017-11-09 23:32 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1.1: Type: text/plain, Size: 2040 bytes --]

OK not sure if this is by design or a bug ;)

Relatively new to pandoc and conversion, and am coming from the Windows 
world.

We have a variety of OSs and editors in use and want to ensure a 
commonality of output format with standardised front pages etc.

Version: pandoc 2.0.1.1 Compiled with pandoc-types 1.17.2, texmath 0.10, 
skylighting 0.4.3.2

We have created our own style file for formatting in latex, and our own 
pandoc.latex to generate the coverpage etc.
We have been extending into using a yaml metadata file to include variables 
and elements (email address, multiple authors etc) for our reports.

We have a problem with the yaml metadata file causing failures with the 
conversion from docx to pdf.
The file works correctly when used against a markdown file to produce a pdf.

Behaviour:

This fails -

>pandoc --standalone --toc --number-sections --pdf-engine=xelatex 
--template pandoc.latex test.docx metadata.yaml -o test10.pdf
couldn't parse docx file

This works without issue -
>pandoc --standalone --toc --number-sections --pdf-engine=xelatex 
--template pandoc.latex test.docx -o test10.pdf

Now either I'm doing something I'm not supposed to (and it will be very 
annoying to have to go through the intermediary step of converting to 
markdown file first then to pdf), or something is wrong with my syntax.

The same command when used for a markdown file works without issue and 
draws the variables we want from the markdown.yaml.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/9256e740-7530-4dc2-9acd-5d14eca6fc07%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: docx to pdf conversion error
       [not found] ` <9256e740-7530-4dc2-9acd-5d14eca6fc07-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-11-10  3:58   ` John Muccigrosso
       [not found]     ` <ac6b1f81-80bf-4e92-a65c-db089def2c19-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2017-11-10  4:44   ` John MacFarlane
  1 sibling, 1 reply; 16+ messages in thread
From: John Muccigrosso @ 2017-11-10  3:58 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 852 bytes --]

That looks good, as long as your yaml meets the requirements. But a quick 
test I just whipped up with existing files failed in the same way. Hmmm.

PS You don't need --standalone for PDFs; PDF output is always standalone. 
See http://pandoc.org/MANUAL.html#general-writer-options.


* Re: docx to pdf conversion error
       [not found]     ` <ac6b1f81-80bf-4e92-a65c-db089def2c19-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-11-10  4:02       ` sclarke-DpHT0TjK6O80n/F98K4Iww
       [not found]         ` <d1935dae-097e-447c-bd10-a8265ba169db-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: sclarke-DpHT0TjK6O80n/F98K4Iww @ 2017-11-10  4:02 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1177 bytes --]

the yaml works for a .md with the same content.

On Friday, 10 November 2017 14:28:38 UTC+10:30, John Muccigrosso wrote:
>
> That looks good, as long as your yaml meets the requirements. But a quick 
> test I just whipped up with existing files failed in the same way. Hmmm.
>
> PS You don't need --standalone for PDFs; PDF output is always standalone. See 
> http://pandoc.org/MANUAL.html#general-writer-options.
>

Ayup - we are using a makefile called with the option for the output type 
either pdf or html (or both) so that's just the command line code that 
would be run by the make process 


* Re: docx to pdf conversion error
       [not found]         ` <d1935dae-097e-447c-bd10-a8265ba169db-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-11-10  4:02           ` sclarke-DpHT0TjK6O80n/F98K4Iww
  0 siblings, 0 replies; 16+ messages in thread
From: sclarke-DpHT0TjK6O80n/F98K4Iww @ 2017-11-10  4:02 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1367 bytes --]

edit - forgot to say that I had the same issue with 1.19


* Re: docx to pdf conversion error
       [not found] ` <9256e740-7530-4dc2-9acd-5d14eca6fc07-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2017-11-10  3:58   ` John Muccigrosso
@ 2017-11-10  4:44   ` John MacFarlane
       [not found]     ` <20171110044412.GI70590-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
  1 sibling, 1 reply; 16+ messages in thread
From: John MacFarlane @ 2017-11-10  4:44 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

YAML metadata only works with Markdown.

When you specify multiple files on the command line,
pandoc concatenates them together and treats them as
if they were one big file.

That works when one is yaml and the other is Markdown
(as long as the yaml has proper begin and end delimiters).

It doesn't work when one is yaml and the other is docx.

Related issue:
https://github.com/jgm/pandoc/issues/1960
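
A minimal sketch of that concatenation, assuming hypothetical file names and contents, and using plain `cat` to approximate what the Markdown reader ends up seeing (pandoc itself is not invoked here):

```shell
# Scratch directory with a hypothetical metadata file and Markdown file.
tmp=$(mktemp -d)
# The YAML must be wrapped in --- delimiters to be read as a metadata block.
printf -- '---\ntitle: Quarterly report\nauthor: A. Writer\n---\n' > "$tmp/metadata.yaml"
printf '# Introduction\n\nBody text.\n' > "$tmp/test.md"

# Roughly the single text stream the Markdown reader receives for
#   pandoc metadata.yaml test.md -o test.pdf
combined=$(cat "$tmp/metadata.yaml" "$tmp/test.md")
printf '%s\n' "$combined"

rm -r "$tmp"
```

A .docx file is a zip container rather than plain text, so there is no comparable joined stream that the docx reader could parse.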



* Re: docx to pdf conversion error
       [not found]     ` <20171110044412.GI70590-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
@ 2017-11-10  5:10       ` sclarke-DpHT0TjK6O80n/F98K4Iww
       [not found]         ` <e741d764-809b-422b-aee1-75b4e4351569-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: sclarke-DpHT0TjK6O80n/F98K4Iww @ 2017-11-10  5:10 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 4143 bytes --]

bugger so by design then :(

Looks like I'll have to edit the makefile to put in an intermediate step to 
cope with Word users.
Really wanted to be editor agnostic and allow for most options in the 
conversion.

Cheers


* Re: docx to pdf conversion error
       [not found]         ` <e741d764-809b-422b-aee1-75b4e4351569-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-11-10 22:02           ` Kolen Cheung
  2017-11-10 22:06           ` Kolen Cheung
  1 sibling, 0 replies; 16+ messages in thread
From: Kolen Cheung @ 2017-11-10 22:02 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1: Type: text/plain, Size: 1631 bytes --]

I think there’s an issue about allowing YAML front matter in any of the formats that pandoc supports. You might want to search for it on GitHub. (But even in that case, what would YAML with docx mean? As said, the files are simply concatenated.) I recall one complication is deciding what format should be allowed within the YAML, because currently Markdown is allowed in the YAML block.

Adding `.INTERMEDIATE` is simple in make, and make will automatically delete the intermediate file once it has finished with it. You can also use a few different tricks to avoid intermediate files, e.g. `pandoc -f docx -t markdown ... | pandoc -f markdown meta.yml - ...`. The `-` makes stdin behave like a file (well-written CLI tools support this). Another option is to use bash-only syntax: `pandoc -f markdown ... meta.yml <(pandoc -f docx -t markdown ...) ...`. The `<()` is similar to `$()` but acts like a file instead. If you write your script like this, the intermediate data passes through a pipe in memory rather than a file on disk, so it should be faster than an intermediate file declared with `.INTERMEDIATE`.
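
The two shell mechanisms can be sketched without pandoc at all. In the sketch below `cat` stands in for the second pandoc call and `printf` for the first; the file name and contents are made up for illustration:

```shell
# Scratch directory with a hypothetical metadata file.
tmp=$(mktemp -d)
printf -- '---\ntitle: Report\n---\n' > "$tmp/meta.yml"

# 1. Pipe: "-" tells the reading program to take stdin at that position.
piped=$(printf 'Body text\n' | cat "$tmp/meta.yml" -)

# 2. Process substitution: <(...) expands to a file name (such as
#    /dev/fd/63) that yields the inner command's output. It is not
#    POSIX sh, so bash is invoked explicitly here.
substituted=$(bash -c 'cat "$1" <(printf "Body text\n")' _ "$tmp/meta.yml")

printf '%s\n' "$piped"
rm -r "$tmp"
```

Both variables end up holding the metadata block followed by the body, which is the ordering the Markdown reader needs.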


* Re: docx to pdf conversion error
       [not found]         ` <e741d764-809b-422b-aee1-75b4e4351569-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2017-11-10 22:02           ` Kolen Cheung
@ 2017-11-10 22:06           ` Kolen Cheung
       [not found]             ` <4710734d-f71c-45c9-b65f-27ebf81e29e1-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  1 sibling, 1 reply; 16+ messages in thread
From: Kolen Cheung @ 2017-11-10 22:06 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1: Type: text/plain, Size: 1103 bytes --]

By the way, I would suggest not using Markdown as the intermediate format. You can convert both (the metadata and the docx) to native first and cat them together. A caveat is that the YAML in native would have a `[]` line at the end (because it has an empty body), so you’d want to remove it with `head -n -1`.

The reason is that you want to avoid the round trip that pandoc would otherwise be doing (first native to Markdown, then Markdown back to native). Since a pandoc round trip doesn’t always reproduce the original exactly, this is not only faster but also safer.
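
One portability note on that last step: a negative count for `head -n` is a GNU extension and is not guaranteed by POSIX, while `sed '$d'` drops the final line everywhere. A small sketch on made-up input (not real pandoc native output):

```shell
# Stand-in stream whose last line is the unwanted "[]".
trimmed=$(printf 'line one\nline two\n[]\n' | sed '$d')
printf '%s\n' "$trimmed"
```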


* Re: docx to pdf conversion error
       [not found]             ` <4710734d-f71c-45c9-b65f-27ebf81e29e1-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-11-12 21:34               ` sclarke-DpHT0TjK6O80n/F98K4Iww
       [not found]                 ` <85643608-a189-4407-8a21-37e5d75074fd-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: sclarke-DpHT0TjK6O80n/F98K4Iww @ 2017-11-12 21:34 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1114 bytes --]

Hmm ok - I'll look into that :)

The workflow is probably more of an edge case compared with the standard 
use of pandoc for academic writing.

What we want to achieve:

User -> editor of choice -> save in native format
Apply YAML metadata with user details and extracted customer/client details 
from CRM
Apply customised style to ensure consistency
Apply template to ensure consistent cover page, TOC format, 
copyright/confidentiality notice, commercial in confidence footer

Export to PDF for all the good reasons you use PDF for official documents ;)



* Re: docx to pdf conversion error
       [not found]                 ` <85643608-a189-4407-8a21-37e5d75074fd-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-11-13  9:19                   ` albert.krewinkel-stqabkCVF6SGlKaCpJGLJw
       [not found]                     ` <323e407c-d9ae-429a-b951-1dc5b87a624b-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: albert.krewinkel-stqabkCVF6SGlKaCpJGLJw @ 2017-11-13  9:19 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1792 bytes --]

Lua filters are my hammer which makes everything look like a nail, so here 
is a suggestion on using those to get what you want:

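-- Replace the document's metadata with the contents of the YAML file
-- named by the `metadata_file` metadata field.
function Meta(meta)
  -- assert turns a missing or unreadable file into a clear error.
  local f = assert(io.open(meta.metadata_file, 'r'))
  local content = f:read('*a')
  f:close()
  -- Parse the YAML block as Markdown and keep only its metadata.
  return pandoc.read(content).meta
end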


Save the above to a file and call pandoc with --lua-filter=<that-file>.lua 
and --metadata=metadata_file:<your-yaml-file>. It will overwrite all 
metadata using the contents of the YAML file.
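
An end-to-end sketch of that setup, under a few assumptions: the file names are hypothetical, the filter is the one above with an `assert` added so a missing file fails loudly, and HTML is used as the output so no LaTeX install is needed. The pandoc steps only run if pandoc is on the PATH:

```shell
tmp=$(mktemp -d)

# The Lua filter, written out via a quoted heredoc so nothing is expanded.
cat > "$tmp/meta.lua" <<'EOF'
function Meta(meta)
  local f = assert(io.open(meta.metadata_file, 'r'))
  local content = f:read('*a')
  f:close()
  return pandoc.read(content).meta
end
EOF

# Hypothetical metadata; the --- delimiters are required.
printf -- '---\ntitle: Report\n---\n' > "$tmp/metadata.yaml"

if command -v pandoc >/dev/null 2>&1; then
  # Build a small docx to act as the Word user's input.
  printf 'Body text.\n' | pandoc -o "$tmp/test.docx"
  # Convert it, letting the filter pull in the YAML metadata.
  pandoc "$tmp/test.docx" --standalone \
    --lua-filter="$tmp/meta.lua" \
    --metadata=metadata_file:"$tmp/metadata.yaml" \
    -o "$tmp/test.html"
fi
```

Swapping the output file for a .pdf and adding the template and PDF engine options gives the reporting workflow described earlier in the thread.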

See also this issue: https://github.com/jgm/pandoc/issues/3115


* Re: docx to pdf conversion error
       [not found]                     ` <323e407c-d9ae-429a-b951-1dc5b87a624b-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-11-13 23:37                       ` Kolen Cheung
       [not found]                         ` <2b1dba89-0544-49a0-954d-cad4c269fc78-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Kolen Cheung @ 2017-11-13 23:37 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 2087 bytes --]

I think you nailed it ;)

This is safer and more reliable. Maybe put it somewhere as an example 
(e.g. the Lua filter docs)?


* Re: docx to pdf conversion error
       [not found]                         ` <2b1dba89-0544-49a0-954d-cad4c269fc78-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-11-13 23:53                           ` sclarke-DpHT0TjK6O80n/F98K4Iww
  2017-11-14  7:50                           ` albert.krewinkel-stqabkCVF6SGlKaCpJGLJw
  1 sibling, 0 replies; 16+ messages in thread
From: sclarke-DpHT0TjK6O80n/F98K4Iww @ 2017-11-13 23:53 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 2573 bytes --]

That's done it :)

pandoc --standalone --toc --number-sections --pdf-engine=xelatex --template 
pandoc.latex test.docx -o test99.pdf --lua-filter=meta.lua 
--metadata=metadata_file:metadata.yaml

output PDF looks exactly the same as 

pandoc --standalone --toc --number-sections --pdf-engine=xelatex --template 
pandoc.latex metadata.yaml test4.md -o test4.pdf


* Re: docx to pdf conversion error
       [not found]                         ` <2b1dba89-0544-49a0-954d-cad4c269fc78-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2017-11-13 23:53                           ` sclarke-DpHT0TjK6O80n/F98K4Iww
@ 2017-11-14  7:50                           ` albert.krewinkel-stqabkCVF6SGlKaCpJGLJw
       [not found]                             ` <600cb80b-0753-4823-acd3-a05cae7577e8-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  1 sibling, 1 reply; 16+ messages in thread
From: albert.krewinkel-stqabkCVF6SGlKaCpJGLJw @ 2017-11-14  7:50 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1011 bytes --]

On Tuesday, November 14, 2017 at 12:37:53 AM UTC+1, Kolen Cheung wrote:
>
> I think you nailed it ;)
>

Made my morning :D

> Maybe put it somewhere as an example (e.g. the Lua filter docs)?
>

There is something related, a filter to add default values if a meta value 
is unset: http://pandoc.org/lua-filters.html#default-metadata-file

I seem to remember that you maintained a collection of pandoc filters. Is 
there a repo somewhere?
 


* Re: docx to pdf conversion error
       [not found]                             ` <600cb80b-0753-4823-acd3-a05cae7577e8-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-11-14 12:12                               ` Kolen Cheung
       [not found]                                 ` <17d771db-6489-402c-afa6-d948ca8e9cac-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Kolen Cheung @ 2017-11-14 12:12 UTC (permalink / raw)
  To: pandoc-discuss

Essentially, pandocpm is supposed to do that; it lives under 
https://github.com/pandoc-extras

Development of pandocpm has stalled for various reasons. The primary one is 
a change of funding in January: I no longer work on a project that uses 
pandoc extensively, and it was that project that gave me lots of time for 
pandoc-related work in the Fall semester last year.

While pandocpm sort of works, it still needs some major changes. The new 
Lua filter system in pandoc 2.0 has certainly revived my interest in 
completing this tool. For example, we used to have an excuse for not having 
our own package manager: we could rely on the package manager of whatever 
language a filter was written in. But since pandoc now embeds a Lua 
interpreter, the only thing a Lua filter needs is the filter file itself 
(so relying on some sort of Lua package manager doesn't make sense in this 
case). There is still one imperfection, though: pandocpm is written in 
Python. Do you think it is possible to rewrite pandocpm in Lua, using only 
the embedded Lua interpreter? I haven't used Lua at all, but I've heard 
that it has only a minimal "standard library", so I'm not sure how much 
functionality the embedded interpreter offers. Judging from some examples 
you and @jgm wrote, IO is already there. Pandoc certainly handles YAML (but 
is a YAML library available in the embedded Lua interpreter? That might 
make things easier.) Beyond that, pandocpm only needs access to the 
DATADIR, and I imagine pandoc could pass it to the script?


* Re: docx to pdf conversion error
       [not found]                                 ` <17d771db-6489-402c-afa6-d948ca8e9cac-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-11-15 13:38                                   ` albert.krewinkel-stqabkCVF6SGlKaCpJGLJw
       [not found]                                     ` <b8cdcc3a-105a-405f-b2a2-188cdc053abb-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: albert.krewinkel-stqabkCVF6SGlKaCpJGLJw @ 2017-11-15 13:38 UTC (permalink / raw)
  To: pandoc-discuss


One possibility would be to pack all collected Lua filters into a single 
rock and to use the LuaRocks package manager to handle downloads and 
updates. This would allow filters to declare dependencies on other Lua 
packages while keeping things reasonably simple. The disadvantage is that 
one of the main benefits of Lua filters, namely independence from other 
software, would be weakened.

I believe collecting filters in a central place is a very good start, 
either way.



* Re: docx to pdf conversion error
       [not found]                                     ` <b8cdcc3a-105a-405f-b2a2-188cdc053abb-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-11-15 17:43                                       ` Kolen Cheung
  0 siblings, 0 replies; 16+ messages in thread
From: Kolen Cheung @ 2017-11-15 17:43 UTC (permalink / raw)
  To: pandoc-discuss


Another idea is to rely on a plain `git clone ...` for the bare-bones case. 
Purists who don't want or need to install yet another dependency stack can 
just copy and paste a one-liner that `git clone`s the collection into the 
pandoc data-dir. People who don't mind installing one more thing can use 
pandocpm to get more control (pandocpm is designed to be agnostic to the 
language a filter is written in; it just copies a single self-contained 
file to the data-dir).
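
A minimal sketch of the clone-into-data-dir idea, written in Python so the 
command can be inspected before it runs. Several things here are 
assumptions rather than anything settled in this thread: the repository URL 
is a placeholder, the `filters` subdirectory is just one possible layout, 
and `~/.pandoc` is the user data directory on Linux and macOS for pandoc 
2.0 (Windows uses a different location). The function names are 
hypothetical.

```python
import os
import subprocess

# Placeholder URL; no such shared filter repository is named in the thread.
FILTERS_REPO = "https://example.org/pandoc-filters.git"


def default_data_dir(home=None):
    """Return ~/.pandoc, the assumed user data dir on Linux/macOS."""
    home = home or os.path.expanduser("~")
    return os.path.join(home, ".pandoc")


def clone_command(repo_url, data_dir):
    """Build the git command that clones the filters into the data dir."""
    target = os.path.join(data_dir, "filters")
    return ["git", "clone", "--depth", "1", repo_url, target]


def install_filters(repo_url=FILTERS_REPO, data_dir=None):
    """Create the data dir if needed and run the clone (requires git)."""
    data_dir = data_dir or default_data_dir()
    os.makedirs(data_dir, exist_ok=True)
    subprocess.run(clone_command(repo_url, data_dir), check=True)
```

Calling `install_filters()` is the equivalent of the copy-and-paste 
one-liner; it still fails on a machine without git, which is the concern 
raised in the next paragraph.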

One thing that worries me about using `git clone`, though, is that git is 
also an external dependency. On Linux that isn't a problem. macOS does not 
ship git by default; it requires at least the Xcode command line tools. A 
quick search suggests git is not available by default on Windows either.

The situation is quite similar to installing pandoc templates (pandocpm can 
help with that too), except that installing templates is optional if the 
defaults embedded in pandoc are fine.


end of thread, other threads:[~2017-11-15 17:43 UTC | newest]

Thread overview: 16+ messages
2017-11-09 23:32 docx to pdf conversion error sclarke-DpHT0TjK6O80n/F98K4Iww
     [not found] ` <9256e740-7530-4dc2-9acd-5d14eca6fc07-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2017-11-10  3:58   ` John Muccigrosso
     [not found]     ` <ac6b1f81-80bf-4e92-a65c-db089def2c19-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2017-11-10  4:02       ` sclarke-DpHT0TjK6O80n/F98K4Iww
     [not found]         ` <d1935dae-097e-447c-bd10-a8265ba169db-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2017-11-10  4:02           ` sclarke-DpHT0TjK6O80n/F98K4Iww
2017-11-10  4:44   ` John MacFarlane
     [not found]     ` <20171110044412.GI70590-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
2017-11-10  5:10       ` sclarke-DpHT0TjK6O80n/F98K4Iww
     [not found]         ` <e741d764-809b-422b-aee1-75b4e4351569-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2017-11-10 22:02           ` Kolen Cheung
2017-11-10 22:06           ` Kolen Cheung
     [not found]             ` <4710734d-f71c-45c9-b65f-27ebf81e29e1-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2017-11-12 21:34               ` sclarke-DpHT0TjK6O80n/F98K4Iww
     [not found]                 ` <85643608-a189-4407-8a21-37e5d75074fd-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2017-11-13  9:19                   ` albert.krewinkel-stqabkCVF6SGlKaCpJGLJw
     [not found]                     ` <323e407c-d9ae-429a-b951-1dc5b87a624b-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2017-11-13 23:37                       ` Kolen Cheung
     [not found]                         ` <2b1dba89-0544-49a0-954d-cad4c269fc78-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2017-11-13 23:53                           ` sclarke-DpHT0TjK6O80n/F98K4Iww
2017-11-14  7:50                           ` albert.krewinkel-stqabkCVF6SGlKaCpJGLJw
     [not found]                             ` <600cb80b-0753-4823-acd3-a05cae7577e8-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2017-11-14 12:12                               ` Kolen Cheung
     [not found]                                 ` <17d771db-6489-402c-afa6-d948ca8e9cac-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2017-11-15 13:38                                   ` albert.krewinkel-stqabkCVF6SGlKaCpJGLJw
     [not found]                                     ` <b8cdcc3a-105a-405f-b2a2-188cdc053abb-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2017-11-15 17:43                                       ` Kolen Cheung
