From: BP Jonsson
Newsgroups: gmane.text.pandoc
Subject: Re: Pandoc selectively transfers glyphs from LuaLaTeX to DOCX
Date: Mon, 24 Jul 2017 21:27:28 +0200
Message-ID: <07b8ac66-75e5-04b8-b39c-d60157171baf@gmail.com>
References: <261e84b1-9891-465a-a21e-80a61b9e98c0@googlegroups.com>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org, Sean Winslow

On 2017-07-24 at 17:01, Sean Winslow wrote:
> BPJ,
>
> I know nothing about filters in pandoc--what would you suggest as
> a starting place to learn more? Would these potentially help me
> with any of the issues above?
>

Actually the main problem with your LaTeX is that you are using the legacy LaTeX accent commands instead of actual Unicode characters. For one thing, you shouldn't do that, because the main reason for using XeTeX or LuaTeX is that they handle Unicode natively. Secondly, it is exactly the legacy accent commands that throw Pandoc off in your MWE. Once I had converted the legacy commands to their Unicode equivalents, Pandoc converted your MWE to DOCX just fine. (As I don't have Word, I checked it in LibreOffice, where it looks OK.)

Luckily you don't need to convert all those legacy commands by hand. There is a Perl module, LaTeX::Decode, which does that for you. Unfortunately there is a bug in the command-line script that comes with the module, but I have written my own CLI script which doesn't have that bug. :-)

Since you are on a Mac you should have a new enough version of perl installed already.
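
To make concrete what "converting the legacy commands to their Unicode equivalents" means, here is a minimal sketch in Python. It is not LaTeX::Decode and not the script mentioned in this message; the names `COMBINING`, `ACCENT_RE` and `decode_accents` are made up for illustration, and only a handful of accents on plain ASCII letters are handled. The idea is to append the matching combining character to the base letter and let NFC normalization compose the pair into a single code point.

```python
import re
import unicodedata

# A few legacy LaTeX accent commands mapped to Unicode combining characters.
# Illustrative only: a real converter covers many more commands.
COMBINING = {
    "'": "\u0301",  # acute       \'e
    "`": "\u0300",  # grave       \`e
    "^": "\u0302",  # circumflex  \^o
    '"': "\u0308",  # diaeresis   \"u
    "~": "\u0303",  # tilde       \~n
    "=": "\u0304",  # macron      \=a
    ".": "\u0307",  # dot above   \.z
}

# Backslash, one accent symbol, then a single ASCII letter, braced or not.
ACCENT_RE = re.compile(r"""\\(['`^"~=.])(?:\{([A-Za-z])\}|([A-Za-z]))""")


def decode_accents(text: str) -> str:
    """Replace simple accent commands such as \\'e or \\"{u} with Unicode."""

    def repl(match):
        base = match.group(2) or match.group(3)
        # Base letter + combining mark, composed into one code point by NFC.
        return unicodedata.normalize("NFC", base + COMBINING[match.group(1)])

    return ACCENT_RE.sub(repl, text)


if __name__ == "__main__":
    print(decode_accents(r"caf\'e, M\"{u}ller, r\^ole"))
```

Commands that take a letter name as the accent (`\c{c}`, `\v{s}`), dotless `\i`, and accents inside verbatim material are deliberately left alone here, which is one reason to prefer an existing module for real documents.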
All you should need to do is download my script from [URL missing], unpack the contents into the same directory (aka folder) as your original LaTeX file, and run the following commands:

    cpan App::cpanminus

    cpanm LaTeX::Decode Encode Unicode::Normalize Getopt::Long Pod::Usage

    perl ltx2utf8.pl nameofyourlatexfile.tex | pandoc -r latex -o nameofyourdocxfile.docx

That will at least take care of the diacritics. Other fancy things you have used, like tikz, will need to be addressed separately. I have a somewhat working script that extracts tikzpictures from a LaTeX file, compiles each to a PDF, and prints out the LaTeX file with each `\begin{tikzpicture}...\end{tikzpicture}` replaced with an `\includegraphics{...}` pointing to the right PDF file. I just tried converting a LaTeX file processed this way to DOCX. It worked, but for some reason the fonts were lost in the DOCX. Your publisher will want the image files separately anyway, if I'm not mistaken.

This latter script lacks some necessary documentation, which I have no time to write today. Let me know if you are interested.

/bpj
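
The tikz extraction described above could look roughly like the following Python sketch. This is not the script mentioned in the message, only an illustration of the idea; `extract_tikz`, `TIKZ_RE`, `STANDALONE_TEMPLATE` and the `figure-N` naming are all assumptions. It pulls each `tikzpicture` environment out into its own standalone document and leaves an `\includegraphics` in its place. Compiling the generated files to PDF (for example with `lualatex`) is left as a separate step.

```python
import re

# Non-greedy match of one whole tikzpicture environment, across newlines.
# Nested tikzpicture environments are not handled in this sketch.
TIKZ_RE = re.compile(
    r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}", re.DOTALL
)

# Wrapper using the standalone class; any \usetikzlibrary lines or macros
# from the original preamble would have to be added by hand.
STANDALONE_TEMPLATE = (
    "\\documentclass[tikz]{standalone}\n"
    "\\begin{document}\n"
    "%s\n"
    "\\end{document}\n"
)


def extract_tikz(source, stem="figure"):
    """Return (rewritten_source, pictures).

    pictures maps a hypothetical file name such as 'figure-1.tex' to the
    standalone LaTeX source for that picture.
    """
    pictures = {}

    def repl(match):
        name = "%s-%d" % (stem, len(pictures) + 1)
        pictures[name + ".tex"] = STANDALONE_TEMPLATE % match.group(0)
        # Point at the PDF that compiling name.tex is expected to produce.
        return "\\includegraphics{%s.pdf}" % name

    return TIKZ_RE.sub(repl, source), pictures


if __name__ == "__main__":
    demo = (
        "Before.\n"
        "\\begin{tikzpicture}\\draw (0,0) -- (1,1);\\end{tikzpicture}\n"
        "After.\n"
    )
    rewritten, pictures = extract_tikz(demo)
    print(rewritten)
    for filename in pictures:
        print(filename)
```

Writing each entry of `pictures` to disk and compiling it gives one PDF per picture, which also yields the separate image files a publisher is likely to ask for.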