When Pandoc creates an ODT file from HTML containing SVG images, it embeds
the images losslessly, as-is. That is a problem for me, because my pipeline
applies some automatic transformations to the document through the
LibreOffice UNO API and ultimately saves it as DOCX. When LibreOffice saves
an ODT file containing SVG images as DOCX, it rasterizes them at a very low
resolution, which, according to people on the LibreOffice forum, cannot be
controlled. But there is a trick: I can use Inkscape to convert the SVG
images into EMF. LibreOffice does not rasterize EMF files when it saves the
document as DOCX.
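
The conversion step could be scripted roughly as follows. This is only a sketch: it assumes Inkscape 1.x is installed and on the PATH (it relies on the `--export-filename` option, which infers the output format from the `.emf` extension), and the function names `inkscape_emf_command` and `convert_svg_to_emf` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def inkscape_emf_command(svg_path, emf_path, inkscape="inkscape"):
    """Build the Inkscape command line that exports one SVG file as EMF."""
    # Assumption: Inkscape 1.x, where --export-filename selects the output
    # file and the format is inferred from the ".emf" extension.
    return [inkscape, f"--export-filename={emf_path}", str(svg_path)]


def convert_svg_to_emf(svg_path, inkscape="inkscape"):
    """Convert svg_path to an EMF file next to it and return the new path."""
    svg_path = Path(svg_path)
    emf_path = svg_path.with_suffix(".emf")
    if shutil.which(inkscape) is None:
        raise FileNotFoundError(f"{inkscape} not found on PATH")
    # check=True raises CalledProcessError if Inkscape exits with an error.
    subprocess.run(inkscape_emf_command(svg_path, emf_path, inkscape), check=True)
    return emf_path
```

Calling `convert_svg_to_emf("Pictures/figure.svg")` would then leave `Pictures/figure.emf` next to the original, provided Inkscape accepts the input.
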

The problem is that the EMF files obviously have different binary content
than the SVG originals. When I replace them in the "Pictures/" folder inside
the ODT, LibreOffice notices that the file names of the EMF pictures do not
match their hashes, claims that the "image is corrupted", and offers to
repair it. Unfortunately, that repair dialog cannot be automated in a
headless environment, which means I need to know how to produce a
"non-broken" ODT document in the first place. For that I need to know the
hashing scheme.
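
For reference, the replacement step described above can be sketched in Python as below. The naming scheme is exactly the open question, so `hashed_name` is a pure guess: it assumes the name is a hex digest of the file contents plus the extension, with SHA-1 as a placeholder algorithm that nothing in this thread confirms. `replace_picture` and its parameters are likewise hypothetical names. The sketch keeps the archive order and per-entry compression (so `mimetype` stays first and uncompressed) and rewrites references to the old path in `content.xml`, `styles.xml` and `META-INF/manifest.xml`; it does not touch the `media-type` attribute in the manifest.

```python
import hashlib
import zipfile


def hashed_name(data, ext, algorithm="sha1"):
    """Guess a Pictures/ entry name from the file contents.

    ASSUMPTION: hex digest of the contents plus extension. The real scheme
    is unknown here; swap `algorithm` once it is established.
    """
    digest = hashlib.new(algorithm, data).hexdigest()
    return f"Pictures/{digest}.{ext}"


def replace_picture(src_odt, dst_odt, old_name, new_data, new_name):
    """Copy src_odt to dst_odt, swapping one picture and its references."""
    xml_members = {"content.xml", "styles.xml", "META-INF/manifest.xml"}
    with zipfile.ZipFile(src_odt) as src, zipfile.ZipFile(dst_odt, "w") as dst:
        # Iterating in the original order keeps "mimetype" as the first entry.
        for info in src.infolist():
            name = info.filename
            data = src.read(name)
            if name == old_name:
                # Store the new image under its new name.
                name, data = new_name, new_data
            elif name in xml_members:
                # Point every reference to the old path at the new one.
                text = data.decode("utf-8").replace(old_name, new_name)
                data = text.encode("utf-8")
            # Reuse each entry's original compression method.
            dst.writestr(name, data, compress_type=info.compress_type)
```

A call would look like `replace_picture("in.odt", "out.odt", "Pictures/abc.svg", emf_bytes, hashed_name(emf_bytes, "emf"))`, where `emf_bytes` holds the contents of the converted file.
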

I tried to read the Pandoc sources to find the answer myself, but my
complete lack of Haskell knowledge is a major obstacle.

My gut feeling is that the answer is somewhere in
`pandoc/src/Text/Pandoc/Writers/OpenDocument.hs`.

On Saturday, December 2, 2023 at 9:05:42 PM UTC John MacFarlane wrote:
> Why is it necessary to do this? Docx can handle svgs, can't it?
>
> > On Dec 2, 2023, at 6:16 AM, Adam Ryczkowski wrote:
> >
> > Hi!
> >
> > I am writing a script that replaces "svg" images with "emf" in the odt in
> > order to allow lossless conversion to "docx" format using LibreOffice.
> >
> > The problem is that merely replacing the files and fixing the
> > `content.xml` does not suffice. The image file name is some form of hash
> > of its contents. If the contents do not match, LibreOffice reports the
> > document to be "broken" (but offers to repair it). Alas, this repair
> > cannot be automated.
> >
> > I tried to get that from the Pandoc source code, but Haskell's syntax
> > seems too alien to me.
> >
> > What is the naming convention for the files in the Pictures/ folder?
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "pandoc-discuss" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/pandoc-discuss/88f8c8dc-b9b6-4e6e-91ce-75e08412e466n%40googlegroups.com.
>
>