public inbox archive for pandoc-discuss@googlegroups.com
From: BP Jonsson <bpjonsson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org,
	Sean Winslow <mrspot-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: Pandoc selectively transfers glyphs from LuaLaTeX to DOCX
Date: Mon, 24 Jul 2017 21:27:28 +0200	[thread overview]
Message-ID: <07b8ac66-75e5-04b8-b39c-d60157171baf@gmail.com> (raw)
In-Reply-To: <261e84b1-9891-465a-a21e-80a61b9e98c0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>

On 2017-07-24 at 17:01, Sean Winslow wrote:

 > BPJ,
 >
 > I know nothing about filters in pandoc--what would you suggest as
 > a starting place to learn more? Would these potentially help me
 > with any of the issues above?
 >

Actually, the main problem with your LaTeX is that you are using
the legacy LaTeX accent commands instead of actual Unicode
characters. For one thing you shouldn't do that anyway, because
the main reason for using XeTeX or LuaTeX is that they handle
Unicode natively. Secondly, it is exactly the legacy accent
commands that trip up Pandoc in your MWE. Once I had converted
them to their Unicode equivalents, Pandoc converted your MWE to
DOCX just fine. (As I don't have Word I've checked it in
LibreOffice, where it looks OK.) Luckily you don't need to convert
all those legacy commands by hand. There is a Perl module,
LaTeX::Decode, which does that for you. Unfortunately there is a
bug in the command-line script that comes with the module, but I
have written my own CLI script which doesn't have that bug. :-)
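To illustrate what such a conversion does, here is a minimal
Python sketch. It is neither LaTeX::Decode nor my ltx2utf8.pl
script; the table and the decode_accents name are hypothetical and
only cover common accent commands with a single-letter argument
(\'e, \'{e}, \"o, \c{c} and the like). Each command becomes the
base letter plus the matching Unicode combining character, and NFC
normalization then folds the pair into one precomposed character.

```python
import re
import unicodedata

# Hypothetical, deliberately small table: LaTeX accent command -> Unicode
# combining character. LaTeX::Decode knows many more commands than this.
ACCENTS = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
    "=": "\u0304",  # macron
    ".": "\u0307",  # dot above
    "u": "\u0306",  # breve
    "v": "\u030c",  # caron
    "H": "\u030b",  # double acute
    "c": "\u0327",  # cedilla
    "d": "\u0323",  # dot below
}

# Symbol accents may be written \'e or \'{e}; letter accents (\c, \v, ...)
# are only matched in the braced form \c{c}, so \cite, \vspace etc. survive.
PATTERN = re.compile(
    r"""\\(?:([`'^"~=.])(?:\{([A-Za-z])\}|([A-Za-z]))|([uvHcd])\{([A-Za-z])\})"""
)


def decode_accents(text):
    """Replace simple LaTeX accent commands with precomposed Unicode letters."""

    def replace(m):
        accent = m.group(1) or m.group(4)
        base = m.group(2) or m.group(3) or m.group(5)
        # base letter + combining mark, then NFC to get a single code point
        return unicodedata.normalize("NFC", base + ACCENTS[accent])

    return PATTERN.sub(replace, text)


print(decode_accents(r"Caf\'e, na\"ive, \c{c}a, \v{S}koda"))
# prints: Café, naïve, ça, Škoda
```

Anything it doesn't recognise (e.g. \'{\i} or \ss) is left
untouched, which is one reason to use the Perl module for the real
job.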

Since you are on a Mac you should already have a recent enough
version of Perl installed. All you should need to do is download
my script from <https://git.io/v7to6>, unpack the contents into
the same directory (aka folder) as your original LaTeX file, and
run the following commands:

     cpan App::cpanminus

     cpanm LaTeX::Decode Encode Unicode::Normalize Getopt::Long Pod::Usage

     perl ltx2utf8.pl nameofyourlatexfile.tex | pandoc -r latex -o nameofyourdocxfile.docx

That will at least take care of the diacritics. Other fancy things
you have used, like TikZ, will need to be addressed separately. I
have a somewhat working script that extracts the tikzpictures from
a LaTeX file, compiles each one to a PDF, and prints out the LaTeX
file with each `\begin{tikzpicture}...\end{tikzpicture}` replaced
by an `\includegraphics{...}` pointing to the corresponding PDF
file. I just tried converting a LaTeX file processed this way to
DOCX. It worked, but for some reason the fonts were lost in the
DOCX. In any case, if I'm not mistaken, your publisher will want
the image files supplied separately. This latter script still
lacks some necessary documentation, which I have no time to write
today. Let me know if you are interested.
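The tikzpicture step described above could look roughly like this
in Python. This is only a sketch of the idea, not my actual
script: the function names, the tikz-N.pdf naming and the
standalone wrapper are assumptions. The main document must load
graphicx for \includegraphics to work, any \usetikzlibrary lines
your pictures need have to be passed in as the preamble, and
compiling needs a TeX installation with lualatex on the PATH.

```python
import re
import subprocess
from pathlib import Path

# Non-greedy match of one whole tikzpicture environment (no nesting assumed).
TIKZ_RE = re.compile(r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}", re.DOTALL)

# Hypothetical wrapper: each picture becomes its own standalone document.
STANDALONE = (
    "\\documentclass[tikz]{{standalone}}\n"
    "{preamble}\n"
    "\\begin{{document}}\n{body}\n\\end{{document}}\n"
)


def extract_tikz(latex, stem="tikz", preamble=""):
    """Return (rewritten LaTeX, {pdf_name: standalone_source})."""
    figures = {}

    def swap(match):
        name = f"{stem}-{len(figures) + 1}"
        figures[name + ".pdf"] = STANDALONE.format(preamble=preamble, body=match.group(0))
        # The picture is replaced by a reference to the PDF compiled later.
        return f"\\includegraphics{{{name}.pdf}}"

    return TIKZ_RE.sub(swap, latex), figures


def compile_figures(figures, workdir=".", engine="lualatex"):
    """Write each standalone .tex file and compile it to PDF (needs TeX)."""
    for pdf_name, source in figures.items():
        tex_path = Path(workdir) / Path(pdf_name).with_suffix(".tex")
        tex_path.write_text(source, encoding="utf-8")
        subprocess.run(
            [engine, "-interaction=nonstopmode", tex_path.name],
            cwd=workdir,
            check=True,
        )


src = "Text\n\\begin{tikzpicture}\\draw (0,0) -- (1,1);\\end{tikzpicture}\nMore"
new_src, figs = extract_tikz(src, preamble="\\usetikzlibrary{arrows}")
print(new_src)
```

The rewritten source can then be piped to pandoc as before, once
compile_figures has produced the PDFs next to it.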

/bpj
