public inbox archive for pandoc-discuss@googlegroups.com
* Pandoc selectively transfers glyphs from LuaLaTeX to DOCX
@ 2017-07-22 16:37 Sean Winslow
       [not found] ` <b4abf81b-74e7-490a-8cb9-f6a313c651e0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Sean Winslow @ 2017-07-22 16:37 UTC (permalink / raw)
  To: pandoc-discuss


I am trying to convert a dissertation from LaTeX to Word, in order to 
comply with publisher requirements. Part of why I used LaTeX is my need for 
complicated diacritics in transcriptions, which XeLaTeX/LuaLaTeX and the 
dblaccnt package made easy. Now, when I use pandoc to output to docx, 
certain glyphs are missing. See, for example, \b{q} in Maqala and \d{\v{C}} 
in Chelaqwot:


LuaLaTeX (or XeLaTeX) produces this:

<https://lh3.googleusercontent.com/-Nto3OG7FE8c/WXN9zv4_J3I/AAAAAAAAB3k/D5NTrxzrD2cjsdAQ9cWQUUkFTZzQYoYuwCLcBGAs/s1600/screenshot_latex.png>

But this is what I see in Word:

<https://lh3.googleusercontent.com/-Rjn4Lnkaxx8/WXN9_oO77FI/AAAAAAAAB3o/aa-KxIii0jwM3zt0OO1PqI9RkIDu1TfQgCLcBGAs/s1600/screenshot_word.png>


Here is my MWE: 

%!TEX TS-program = lualatex
%!TEX encoding = UTF-8 Unicode

\documentclass[a4paper]{memoir}

%packages
\usepackage{fontspec}
\usepackage{dblaccnt}

\usepackage{savesym}
\savesymbol{U}
\savesymbol{T}
\usepackage{semtrans}

%newcommands
\newcommand{\schwa}{ǝ}
\newcommand{\mekele}{M\"{a}\b{q}\"{a}l\"{a}}
\newcommand{\chelekot}{\d{\v{C}}el\={a}qwot S\schwa{}lasse}

\defaultfontfeatures{Mapping=tex-text}
\setromanfont[Mapping=tex-text]{Brill}

\begin{document}

The two research locations visited were \mekele{} and \chelekot{}.\par

\end{document}

and the pandoc command I am using to convert it:

pandoc test.tex \
    --from=latex \
    --to=docx \
    --output=test.docx \
    --latex-engine=lualatex \
    --reference-docx=test_ref.docx \
    -S \
    -R

The reference-docx is just the output, but changed to use Brill as the font.

Is there any way to have pandoc pass along the special diacritics I need? 
Re-doing all of them by hand will be a nightmare, and is a lot of the 
reason I am learning pandoc.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/b4abf81b-74e7-490a-8cb9-f6a313c651e0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: Pandoc selectively transfers glyphs from LuaLaTeX to DOCX
       [not found] ` <b4abf81b-74e7-490a-8cb9-f6a313c651e0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-07-23  7:47   ` John MacFarlane
  2017-07-23 23:20   ` Sean Winslow
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: John MacFarlane @ 2017-07-23  7:47 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Pandoc's LaTeX reader doesn't know anything about dblaccnt.
But it can understand \newcommand definitions, so you may
have some success simply adding the necessary macros to
the beginning of your document.  (Note that pandoc only
understands LaTeX \newcommand macros, not plain TeX
macros, so you can't just use dblaccnt.sty itself.  So
I'm not sure this helps much.)


* Re: Pandoc selectively transfers glyphs from LuaLaTeX to DOCX
       [not found] ` <b4abf81b-74e7-490a-8cb9-f6a313c651e0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2017-07-23  7:47   ` John MacFarlane
@ 2017-07-23 23:20   ` Sean Winslow
       [not found]     ` <94be1e1e-c49f-4fe6-92fe-4aaf13c083f3-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2017-07-24 11:34   ` Melroch
  2017-07-24 15:01   ` Sean Winslow
  3 siblings, 1 reply; 11+ messages in thread
From: Sean Winslow @ 2017-07-23 23:20 UTC (permalink / raw)
  To: pandoc-discuss



What is the preferred way to use an if-conditional when using pandoc to 
convert the file? I can make the following code work with etoolbox in 
LuaLaTeX, but after going through pandoc the first line of the document is 
missing the special Unicode characters. I assume this means I cannot use 
etoolbox conditionals?

%!TEX TS-program = LuaLaTeX
%!TEX encoding = UTF-8 Unicode

\documentclass[a4paper, 12pt] {article}%

\usepackage{fontspec}%
\usepackage{etoolbox}%
\setmainfont{Brill}

\renewcommand{\d}[1]{%
  \ifstrequal{#1}{h}{ḥ}{%
    \ifstrequal{#1}{\v{C}}{Č̣}{}}}
  
\renewcommand{\=}[1]{%
  \ifstrequal{#1}{a}{ä}{}%
  }

\begin{document}

The research locations visited were Ba\d{h}ǝr Dar and \d{\v{C}}el\={a}qwot 
Sǝlasse.\par
The research locations visited were Baḥǝr Dar and Č̣eläqwot Sǝlasse.\par

\end{document}




* Re: Pandoc selectively transfers glyphs from LuaLaTeX to DOCX
       [not found]     ` <94be1e1e-c49f-4fe6-92fe-4aaf13c083f3-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-07-24  8:58       ` John MacFarlane
       [not found]         ` <20170724085825.GA4877-l/d5Ua9yGnxXsXJlQylH7w@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: John MacFarlane @ 2017-07-24  8:58 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Can you say more about what you're doing?  Are you
converting this latex to some other format?  If so,
which?

If you're converting this to another format, then
the answer is that pandoc's LaTeX reader doesn't
know about \ifstrequal. This could be added; feel free
to submit an issue on the GitHub tracker.


* Re: Pandoc selectively transfers glyphs from LuaLaTeX to DOCX
       [not found]         ` <20170724085825.GA4877-l/d5Ua9yGnxXsXJlQylH7w@public.gmane.org>
@ 2017-07-24  9:17           ` John MacFarlane
  0 siblings, 0 replies; 11+ messages in thread
From: John MacFarlane @ 2017-07-24  9:17 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I just added support for \ifstrequal in the master branch,
if you want to test.


* Re: Pandoc selectively transfers glyphs from LuaLaTeX to DOCX
       [not found] ` <b4abf81b-74e7-490a-8cb9-f6a313c651e0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2017-07-23  7:47   ` John MacFarlane
  2017-07-23 23:20   ` Sean Winslow
@ 2017-07-24 11:34   ` Melroch
  2017-07-24 15:01   ` Sean Winslow
  3 siblings, 0 replies; 11+ messages in thread
From: Melroch @ 2017-07-24 11:34 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Word was quite good at stacking diacritics already about a decade ago when
I last looked, so if you just succeed in writing a newcommand which unpacks
the LaTeX diacritics into the proper Unicode diacritics you should be good.
It should be possible at least to combine some newcommands with a filter
which translates the accents. Feel free to contact me offlist and I'll try
to work something out.

/bpj
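
The idea of unpacking LaTeX accent commands into Unicode combining diacritics can be sketched roughly as follows. This is only an illustration under stated assumptions: it is written in Python, it is not part of pandoc, and the names COMBINING, ACCENT and decode_accents are hypothetical. It covers only the five accent commands that appear in this thread and only the braced form such as \b{q}.

```python
import re
import unicodedata

# Legacy LaTeX accent command -> Unicode combining character.
# Assumption: only the accents used in this thread are listed.
COMBINING = {
    '"': "\u0308",  # combining diaeresis
    "=": "\u0304",  # combining macron
    "b": "\u0331",  # combining macron below
    "d": "\u0323",  # combining dot below
    "v": "\u030C",  # combining caron
}

# Matches an innermost accent command such as \b{q} or \"{a}:
# the braced argument may not contain braces or backslashes.
ACCENT = re.compile(r'\\(["=bdv])\{([^{}\\]*)\}')


def decode_accents(text: str) -> str:
    """Rewrite accent commands from the inside out, then NFC-normalize."""
    previous = None
    while previous != text:
        previous = text
        # Append the combining mark after the base text of the argument.
        text = ACCENT.sub(lambda m: m.group(2) + COMBINING[m.group(1)], text)
    return unicodedata.normalize("NFC", text)
```

Calling decode_accents(r'\d{\v{C}}el\={a}qwot') first rewrites the inner \v{C}, then the outer \d{...}, so nested accents end up as one base letter followed by stacked combining marks. Whether the result displays well still depends on the font used in Word.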


* Re: Pandoc selectively transfers glyphs from LuaLaTeX to DOCX
       [not found] ` <b4abf81b-74e7-490a-8cb9-f6a313c651e0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
                     ` (2 preceding siblings ...)
  2017-07-24 11:34   ` Melroch
@ 2017-07-24 15:01   ` Sean Winslow
       [not found]     ` <261e84b1-9891-465a-a21e-80a61b9e98c0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  3 siblings, 1 reply; 11+ messages in thread
From: Sean Winslow @ 2017-07-24 15:01 UTC (permalink / raw)
  To: pandoc-discuss



John,

Thank you for the quick response, and for adding that! I currently have the 
release branch of pandoc installed from homebrew, but I will remove it and 
compile from the master branch late this evening to test out the solution.

> Can you say more about what you're doing?  Are you
> converting this latex to some other format?  If so,
> which?


I wrote a dissertation on Ethiopian scribal practices which uses a lot of 
LaTeX features (fig, subfig, pdfparcols, tikz, datatool, special 
diacritics, font-switching for Ethiopic, Arabic, Greek). It has been 
accepted (with revisions) for publication, but I need to get the file into 
docx for the publisher, so that it fits the workflow they have for 
InDesign. Luckily, they do not want the images in, so I am going to write a 
macro that changes figures to the figure name and caption. I realize 
all the parcolumns/tikz/datatool stuff is probably a complete loss and 
needs to be redone by hand. But there is so much Ethiopic text and 
transcribed Ethiopic that it would be a nightmare to replace it all by 
hand, so I am very keen to transfer that over automatically. After I 
recompile from the master branch, I will still be trying to figure out 
these issues:

1. As written, it is also heavily cross-referenced, but labels do not seem to be 
transferring over. Is there a procedure for making \label and \ref work, or 
do I need to fix every one by hand?

2. In LaTeX, I have a 
\renewcommand{\includegraphics}[2][]{%
    {(((\url{#2})))}% print file name in a small box with triple parens
}
which lists the name of the file and the caption. In the pandoc-created 
docx, the caption and the optional table of figures caption print twice, 
without the filename. Is there something wrong with the syntax of my 
renewcommand?

3. The Ethiopic text transfers over correctly, but since my main font 
(Brill) does not contain Ethiopic glyphs, I have a
\newfontfamily\ethiopicfont[Script=Ethiopic]{Abyssinica SIL}
set up. In the docx, I see blocks, which when I change the font by hand to 
Abyssinica, render correctly. What command do I need to pass to pandoc to 
get it to set the ethiopicfont in a different font?

BPJ,

I know nothing about filters in pandoc--what would you suggest as a 
starting place to learn more? Would these potentially help me with any of 
the issues above?

Thanks,

-Sean


* Re: Pandoc selectively transfers glyphs from LuaLaTeX to DOCX
       [not found]     ` <261e84b1-9891-465a-a21e-80a61b9e98c0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-07-24 17:25       ` John MacFarlane
       [not found]         ` <20170724172502.GA26245-l/d5Ua9yGnxXsXJlQylH7w@public.gmane.org>
  2017-07-24 19:27       ` BP Jonsson
  1 sibling, 1 reply; 11+ messages in thread
From: John MacFarlane @ 2017-07-24 17:25 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Sean Winslow [Jul 24 17 08:01 ]:
>   John,
>   Thank you for the quick response, and for adding that! I currently have
>   the release branch of pandoc installed from homebrew, but I will remove
>   it and compile from the master branch late this evening to test out the
>   solution.

If you wait a bit, there should be a nightly on
the pandoc-nightly repository.

>   Can you say more about what you're doing?  Are you
>   converting this latex to some other format?  If so,
>   which?
>
>   I wrote a dissertation on Ethiopian scribal practices which uses a lot
>   of LaTeX features (fig, subfig, pdfparcols, tikz, datatool, special
>   diacritics, font-switching for Ethiopic, Arabic, greek). It has been
>   accepted (with revisions) for publication, but I need to get the file
>   into docx for the publisher, so that it fits the workflow they have for
>   inDesign. Luckily, they do not want the images in, so I am going to
>   write a macro that changes figures to the figure name and caption, and
>   I realize all the parcolumns/tikz/datatool stuff is probably a complete
>   loss and needs to be redone by hand, but there is so much Ethiopic text

Not necessarily.  You can use a filter to do tikz images.
See https://github.com/sergiocorreia/panflute/blob/master/examples/panflute/tikz.py

>   and transcribed Ethiopic that it would be a nightmare to replace it all
>   by hand, so I am very keen to transfer that over automatically. After I
>   recompile from the master branch, I will still be trying to figure out
>   these issues:
>   1. As written, it is also highly-referenced, but labels do not seem to
>   be transferring over--is there a procedure for making \label and \ref
>   work, or do I need to fix every one by hand?

These are not supported at this time.  We'd have to
reimplement the whole latex numbering/label system. That's
on the TODO list.

>   2. In LaTeX, I have a
>   \renewcommand{\includegraphics}[2][]{%
>       {(((\url{#2})))}% print file name in a small box with triple parens
>   }
>   which lists the name of the file and the caption. In the pandoc-created
>   docx, the caption and the optional table of figures caption print
>   twice, without the filename. Is there something wrong with the syntax
>   of my renewcommand?

See if the new version does better.  It parses macros
better.

>   3. The Ethiopic text transfers over correctly, but since my main font
>   (Brill) does not contain Ethiopic glyphs, I have a
>   \newfontfamily\ethiopicfont[Script=Ethiopic]{Abyssinica SIL}
>   set up. In the docx, I see blocks, which when I change the font by hand
>   to Abyssinica, render correctly. What command do I need to pass to
>   pandoc to get it to set the ethiopicfont in a different font?

Pandoc only handles structural elements; it won't change the
font in the docx.



* Re: Pandoc selectively transfers glyphs from LuaLaTeX to DOCX
       [not found]     ` <261e84b1-9891-465a-a21e-80a61b9e98c0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2017-07-24 17:25       ` John MacFarlane
@ 2017-07-24 19:27       ` BP Jonsson
  1 sibling, 0 replies; 11+ messages in thread
From: BP Jonsson @ 2017-07-24 19:27 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw, Sean Winslow

Den 2017-07-24 kl. 17:01, skrev Sean Winslow:

 > BPJ,
 >
 > I know nothing about filters in pandoc--what would you suggest as
 > a starting place to learn more? Would these potentially help me
 > with any of the issues above?
 >

Actually the main problem with your LaTeX is that you are using 
the legacy LaTeX accent commands instead of actual Unicode characters.
For one thing, you shouldn't do that, because the main reason for 
using XeTeX or LuaTeX is that they handle Unicode natively.
Secondly, it is exactly the legacy accent commands which trip up 
Pandoc in your MWE. Once I had converted the legacy commands to 
their Unicode equivalents, Pandoc converted your MWE to DOCX 
just fine. (As I don't have Word, I checked it in LibreOffice, 
where it looks OK.) Luckily you don't need to convert all those 
legacy commands by hand. There is a Perl module, LaTeX::Decode, 
which does that for you. Unfortunately there is a bug in the 
command line script coming with the module, but I have written my 
own CLI script which doesn't have that bug. :-)

Since you are on a Mac, you should already have a new enough version of 
perl installed.  All you should need to do is download 
my script from <https://git.io/v7to6>, unpack the contents into the 
same directory (aka folder) as your original LaTeX file, and run 
the following commands:

     cpan App::cpanminus

     cpanm LaTeX::Decode Encode Unicode::Normalize Getopt::Long Pod::Usage

     perl ltx2utf8.pl nameofyourlatexfile.tex | pandoc -r latex -o nameofyourdocxfile.docx

That will at least take care of the diacritics. Other fancy things 
you have used, like tikz, will need to be addressed separately. I 
have a somewhat working script which extracts the tikzpictures from a 
LaTeX file, compiles each one to a PDF and prints out the LaTeX file 
with each `\begin{tikzpicture}...\end{tikzpicture}` replaced with 
an `\includegraphics{...}` pointing to the right PDF file. I just 
tried converting a LaTeX file processed this way to DOCX. It worked, 
but for some reason the fonts were lost in the DOCX. Your 
publisher will want the image files separately anyway, 
if I'm not mistaken. This latter script lacks some necessary 
documentation, which I have no time to write today. Let me know if 
you are interested.
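
The text-rewriting half of that approach could look roughly like the 
following Python sketch. It is not the script mentioned above: 
extract_tikz, the figure-N file names and the standalone wrapper are 
assumptions made for illustration. It does not copy your preamble 
(tikz libraries, fonts) into the wrapper, does not handle nested 
tikzpicture environments, and does not run the compiler.

```python
import re

# Non-greedy match of one complete tikzpicture environment.
TIKZ = re.compile(r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}", re.DOTALL)


def extract_tikz(tex, stem="figure"):
    """Return (rewritten_tex, pictures).

    pictures maps a hypothetical file name such as "figure-1.tex" to a
    standalone LaTeX document containing one tikzpicture.
    """
    pictures = {}

    def replace(match):
        name = f"{stem}-{len(pictures) + 1}"
        pictures[name + ".tex"] = (
            "\\documentclass[tikz]{standalone}\n"
            "\\begin{document}\n"
            + match.group(0)
            + "\n\\end{document}\n"
        )
        # Point at the PDF that compiling the standalone file would produce.
        return "\\includegraphics{" + name + ".pdf}"

    return TIKZ.sub(replace, tex), pictures


if __name__ == "__main__":
    source = (
        "Before.\n"
        "\\begin{tikzpicture}\n\\draw (0,0) -- (1,1);\n\\end{tikzpicture}\n"
        "After.\n"
    )
    rewritten, pictures = extract_tikz(source)
    print(rewritten)
    for filename in pictures:
        print(filename)
```

Each generated file would then have to be written to disk and compiled 
with the same engine as the main document before the rewritten LaTeX 
is passed to pandoc.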

/bpj



* Re: Pandoc selectively transfers glyphs from LuaLaTeX to DOCX
       [not found]         ` <20170724172502.GA26245-l/d5Ua9yGnxXsXJlQylH7w@public.gmane.org>
@ 2017-07-25 16:30           ` Sean Winslow
       [not found]             ` <6ac7783a-acbb-4d7f-8ed4-0fcf150d3422-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Sean Winslow @ 2017-07-25 16:30 UTC (permalink / raw)
  To: pandoc-discuss


On Monday, July 24, 2017 at 1:25:19 PM UTC-4, John MacFarlane wrote:
>
> If you wait a bit, there should be a nightly on 
> the pandoc-nightly repository. 
>

John, when I try to use the nightly, with the command:

$ /Users/MyUserNameHere/Downloads/pandoc-osx-862d92f/pandoc test2.tex \
   --from=latex \
   --to=docx \
   --output=test2.docx \
   --latex-engine=lualatex

I get these warnings:

[warning] Could not load include file 'fontspec.sty' at line 6 column 22

[warning] Could not load include file 'etoolbox.sty' at line 7 column 22

These warnings do not appear when I use the non-nightly version, and I 
believe that both of these packages are still needed. Am I 
misunderstanding how to use the nightly build?



* Re: Pandoc selectively transfers glyphs from LuaLaTeX to DOCX
       [not found]             ` <6ac7783a-acbb-4d7f-8ed4-0fcf150d3422-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-07-25 16:46               ` John MacFarlane
  0 siblings, 0 replies; 11+ messages in thread
From: John MacFarlane @ 2017-07-25 16:46 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Sean Winslow [Jul 25 17 09:30 ]:
>   [warning] Could not load include file 'fontspec.sty' at line 6 column 22
>   [warning] Could not load include file 'etoolbox.sty' at line 7 column 22

>   These warnings do not come if I use the non-nightly version, and I
>   believe that both of these packages are still needed? Am I
>   misunderstanding how to use the nightly build?

The same thing is happening; it's just that the nightly
actually tells you that it's not loading these sty files.
If you put them in the working directory, pandoc would
try to load them, but it wouldn't be able to make much
of either, since they use plain TeX macros.


