public inbox archive for pandoc-discuss@googlegroups.com
* MS Word document always differs
@ 2022-01-26  9:40 Nandakumar Chandrasekhar
       [not found] ` <a3d1fad4-91f6-4495-aa4c-874f6ca5bb6en-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Nandakumar Chandrasekhar @ 2022-01-26  9:40 UTC (permalink / raw)
  To: pandoc-discuss


Dear Folks,

I am writing a Lua filter that modifies the font of Font Awesome icons
added to a Word/DOCX document.

I find that if I generate a file called expected.docx with the command:

pandoc -f markdown -t docx --reference-doc sample-reference.docx \
  -o expected.docx sample.md

and then generate a second docx file under a different name, using the same
source file and reference docx:

pandoc -f markdown -t docx --reference-doc sample-reference.docx \
  -o sample.docx sample.md

the two files always differ when I try:

diff expected.docx sample.docx

I do not see why they should differ when they were created from the same
source file with the same parameters, with only the output name changing.

What can I do to stop the files from differing?

I need to write tests for my Lua filter before it can be accepted into
pandoc/lua-filters.

Therefore, I need to make sure that the expected output and the generated 
output are exactly the same to pass the test.

I hope someone can lend some insight.

Many thanks.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/a3d1fad4-91f6-4495-aa4c-874f6ca5bb6en%40googlegroups.com.


* Re: MS Word document always differs
       [not found] ` <a3d1fad4-91f6-4495-aa4c-874f6ca5bb6en-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-01-26  9:49   ` William Lupton
       [not found]     ` <CAEe_xxjoHA0sta+4=eRu4xuYS44e2tpu4_74mmKMjc49e5=fow-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: William Lupton @ 2022-01-26  9:49 UTC (permalink / raw)
  To: pandoc-discuss

Just commenting on this:

> The two files always differ when I try: diff expected.docx sample.docx. I
> do not see why they should differ when they were created with the same
> parameters and source file with only the name changing.

A docx file is a ZIP that contains many resources including a file
called docProps/core.xml that includes "created" and "modified" properties
that indicate when the document was created and modified.

Therefore I don't think you can assume that two docx files with identical
content are byte-for-byte identical.

Perhaps the word/document.xml files (also in the ZIP) will be identical, or
perhaps it would be better to convert to a different format for comparison
purposes?

William
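
A minimal sketch of the first suggestion above, assuming that docProps/core.xml is the only part that changes between runs (the thread does not confirm this, so treat it as something to check). The helper name docx_parts_equal and its ignore parameter are hypothetical. It compares the decompressed contents of every other ZIP member, so timestamps stored at the ZIP level do not affect the result.

```python
import zipfile


def docx_parts_equal(path_a, path_b, ignore=("docProps/core.xml",)):
    """Return True if two docx files hold the same parts with the same bytes.

    Members named in `ignore` are skipped. The default skips the metadata part
    that carries the "created" and "modified" properties (an assumption about
    which part differs; adjust the tuple if other parts turn out to vary).
    """
    with zipfile.ZipFile(path_a) as a, zipfile.ZipFile(path_b) as b:
        names_a = sorted(n for n in a.namelist() if n not in ignore)
        names_b = sorted(n for n in b.namelist() if n not in ignore)
        if names_a != names_b:
            return False
        # Compare decompressed bytes member by member.
        return all(a.read(n) == b.read(n) for n in names_a)
```

A test could then call docx_parts_equal("expected.docx", "sample.docx") in place of a plain diff on the two archives.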


* Re: MS Word document always differs
       [not found]     ` <CAEe_xxjoHA0sta+4=eRu4xuYS44e2tpu4_74mmKMjc49e5=fow-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2022-01-26 10:49       ` Nandakumar Chandrasekhar
       [not found]         ` <b8ce8324-2f09-4448-a38a-702e2a0ea3e7n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2022-01-26 18:55       ` John MacFarlane
  1 sibling, 1 reply; 7+ messages in thread
From: Nandakumar Chandrasekhar @ 2022-01-26 10:49 UTC (permalink / raw)
  To: pandoc-discuss


Thanks for the explanation, William. I now understand.

After further research, I arrived at the approach you suggested: converting
the generated files back to Markdown and then running diff on those.

Thank you for confirming this methodology.

Cheers.
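
The round trip described above could be sketched roughly as follows, assuming pandoc is on the PATH. The function names are hypothetical. Note that formatting with no Markdown equivalent may not survive the conversion, so this checks document content rather than every detail of the docx.

```python
import difflib
import subprocess


def docx_to_markdown(path):
    """Run `pandoc -f docx -t markdown PATH` and return the Markdown text."""
    result = subprocess.run(
        ["pandoc", "-f", "docx", "-t", "markdown", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def markdown_diff(expected_text, actual_text):
    """Return unified-diff lines; an empty list means the texts are equal."""
    return list(
        difflib.unified_diff(
            expected_text.splitlines(),
            actual_text.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
```

For the files in this thread, the check would be markdown_diff(docx_to_markdown("expected.docx"), docx_to_markdown("sample.docx")), with an empty list counting as a pass.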


* Re: MS Word document always differs
       [not found]         ` <b8ce8324-2f09-4448-a38a-702e2a0ea3e7n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-01-26 12:14           ` Bastien DUMONT
  2022-01-26 17:02             ` Nandakumar Chandrasekhar
  0 siblings, 1 reply; 7+ messages in thread
From: Bastien DUMONT @ 2022-01-26 12:14 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Alternatively, you can export to native for testing purposes. It preserves the raw OOXML code added by your filter, if any, and will make your test files independent of potential changes in pandoc's writers.
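
One way to set this up, as a sketch only: run the filter with native output and compare the result against a stored expected file. The file names filter.lua and expected.native and both function names are hypothetical. The pandoc options used (-f, -t native, --lua-filter) are the standard ones.

```python
import subprocess


def native_command(source, lua_filter=None):
    """Build the argv for `pandoc -f markdown -t native [--lua-filter F] SOURCE`."""
    argv = ["pandoc", "-f", "markdown", "-t", "native"]
    if lua_filter is not None:
        argv += ["--lua-filter", lua_filter]
    return argv + [source]


def matches_expected(source, expected_path, lua_filter=None):
    """Run pandoc and compare its native output with the stored expected text."""
    result = subprocess.run(
        native_command(source, lua_filter),
        check=True,
        capture_output=True,
        text=True,
    )
    with open(expected_path, encoding="utf-8") as fh:
        return result.stdout == fh.read()
```

A call such as matches_expected("sample.md", "expected.native", "filter.lua") would then serve as the test. Since native output is plain text, a failing comparison can also be inspected with an ordinary diff.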


* Re: MS Word document always differs
  2022-01-26 12:14           ` Bastien DUMONT
@ 2022-01-26 17:02             ` Nandakumar Chandrasekhar
  0 siblings, 0 replies; 7+ messages in thread
From: Nandakumar Chandrasekhar @ 2022-01-26 17:02 UTC (permalink / raw)
  To: pandoc-discuss


@Bastien Dumont I did not know about the native format. It makes a lot
more sense to compare the AST than a particular output format which, as you
said, is subject to change.

Cheers


* Re: MS Word document always differs
       [not found]     ` <CAEe_xxjoHA0sta+4=eRu4xuYS44e2tpu4_74mmKMjc49e5=fow-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2022-01-26 10:49       ` Nandakumar Chandrasekhar
@ 2022-01-26 18:55       ` John MacFarlane
       [not found]         ` <yh480kzgnie38m.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  1 sibling, 1 reply; 7+ messages in thread
From: John MacFarlane @ 2022-01-26 18:55 UTC (permalink / raw)
  To: William Lupton, pandoc-discuss


See the manual:
https://pandoc.org/MANUAL.html#reproducible-builds


* Re: MS Word document always differs
       [not found]         ` <yh480kzgnie38m.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2022-01-27  3:28           ` Nandakumar Chandrasekhar
  0 siblings, 0 replies; 7+ messages in thread
From: Nandakumar Chandrasekhar @ 2022-01-27  3:28 UTC (permalink / raw)
  To: pandoc-discuss


Thank you so much, @jgm, for pointing out how I can do this just by diffing
the expected and generated docx documents.

I will set the SOURCE_DATE_EPOCH environment variable in the Makefile 
before creating the document.

Many thanks.
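
Outside a Makefile, the same idea might look like the sketch below. It assumes pandoc is on the PATH. The helper names and the default epoch of 0 are hypothetical: per the manual section linked above, any fixed integer Unix timestamp should do, as long as expected.docx was generated with the same value. Using the same pandoc version for both files is presumably also necessary.

```python
import os
import subprocess


def reproducible_env(epoch=0, base=None):
    """Return a copy of the environment with SOURCE_DATE_EPOCH set to `epoch`."""
    env = dict(os.environ if base is None else base)
    env["SOURCE_DATE_EPOCH"] = str(epoch)
    return env


def build_docx(source, output, reference_doc, epoch=0):
    """Run the pandoc command from this thread with a fixed build timestamp."""
    subprocess.run(
        [
            "pandoc", "-f", "markdown", "-t", "docx",
            "--reference-doc", reference_doc,
            "-o", output, source,
        ],
        check=True,
        env=reproducible_env(epoch),
    )
```

Calling build_docx("sample.md", "sample.docx", "sample-reference.docx") and then comparing sample.docx with expected.docx byte for byte corresponds to the diff in the original question.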


end of thread, other threads:[~2022-01-27  3:28 UTC | newest]
