public inbox archive for pandoc-discuss@googlegroups.com
* Converting HTML (or Dokuwiki) to LaTeX while preserving $...$ and backslash environments
@ 2013-10-20 19:14 TSofM
From: TSofM @ 2013-10-20 19:14 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


I would like to convert from Dokuwiki format (in a .txt file) to LaTeX. The 
guide <http://donaldmerand.com/code/2012/07/20/how-i-actually-convert-dokuwiki-to-latex.html> 
recommends using Dokuwiki's parser to first convert the .txt file to HTML. 
Pandoc can then be used to convert the HTML to LaTeX. This works well for 
most of Dokuwiki's syntax.

However, this does not work if MathJax is used (MathJax input is 
essentially ordinary LaTeX math syntax). I need certain constructs to 
reach the final .tex file unchanged: dollar-sign math ($...$) and 
backslash commands and environments. Here is an example Dokuwiki .txt 
file we want to convert:

====== Mathjax test ======

Here is a test for an in-line equation $x^2 + y^2 = z^2$, and also a 
displayed equation 
\begin{equation} \label{myeq}
f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} x^n.
\end{equation}

I can then reference the equation via \eqref{myeq}. I can also do 
standalone mathematics like 
\[
a^2 + b^2 = c^2.
\]

There are other equation environments. Here is one that involves cases:
\begin{multline}
f(x) = \begin{cases}a \\ b\end{cases} \\ 
g(x).
\end{multline}


Once you run this through the Dokuwiki parser using doku2html test.txt > 
test.html, the following is returned:
<h1 class="sectionedit1" id="mathjax_test">Mathjax test</h1>
<div class="level1">

<p>
Here is a test for an in-line equation $x^2 + y^2 = z^2$, and also a 
displayed equation 
\begin{equation} \label{myeq}
f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} x^n.
\end{equation}
</p>

<p>
I can then reference the equation via \eqref{myeq}. I can also do 
standalone mathematics like 
\[
a^2 + b^2 = c^2.
\]
</p>

<p>
There are other equation environments. Here is one that involves cases:
\begin{multline}
f(x) = \begin{cases}a \\ b\end{cases} \\ 
g(x).
\end{multline}
</p>
</div>


Note that the LaTeX is preserved exactly in the HTML. Once we run this 
through Pandoc, though, the relevant section is changed to:
\section{Mathjax test}\label{mathjax_test}

Here is a test for an in-line equation \$x\^{}2 + y\^{}2 = z\^{}2\$, and
also a displayed equation \textbackslash{}begin\{equation\}
\textbackslash{}label\{myeq\} f(x) =
\textbackslash{}sum\_\{n=0\}\^{}\textbackslash{}infty
\textbackslash{}frac\{f\^{}\{(n)\}(0)\}\{n!\} x\^{}n.
\textbackslash{}end\{equation\}

I can then reference the equation via \textbackslash{}eqref\{myeq\}. I
can also do standalone mathematics like \textbackslash{}{[} a\^{}2 +
b\^{}2 = c\^{}2. \textbackslash{}{]}

There are other equation environments. Here is one that involves cases:
\textbackslash{}begin\{multline\} f(x) = \textbackslash{}begin\{cases\}a
\textbackslash{}\textbackslash{} b\textbackslash{}end\{cases\}
\textbackslash{}\textbackslash{} g(x). \textbackslash{}end\{multline\}


We want to stop Pandoc from processing the dollar-sign environments and 
also the backslash environments. How do we do that? 

*Haskell fix for dollar signs*

John MacFarlane recommended a Haskell fix for the dollar signs 
<http://stackoverflow.com/questions/11338049/how-to-convert-html-with-mathjax-into-latex-using-pandoc>. 
Note that his code compiles after some minor changes to syntax 
<http://stackoverflow.com/questions/19472828/unable-to-compile-haskell-program-couldnt-match-errors>:

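-- Pandoc JSON filter: turn HTML comments of the form <!--MATH...-->
-- back into raw TeX so that the LaTeX writer emits them unescaped.
import Text.Pandoc.JSON
import Text.Pandoc

main :: IO ()
main = toJSONFilter fixmath

-- Apply both rewrites everywhere inside each block.
fixmath :: Block -> Block
fixmath = bottomUp fixmathBlock . bottomUp fixmathInline

-- Inline comment: drop the "<!--MATH" prefix and the trailing "-->".
fixmathInline :: Inline -> Inline
fixmathInline (RawInline (Format "html") ('<':'!':'-':'-':'M':'A':'T':'H':xs)) =
  RawInline (Format "tex") $ take (length xs - 3) xs
fixmathInline x = x

-- Same rewrite for comments that were parsed as block-level raw HTML.
fixmathBlock :: Block -> Block
fixmathBlock (RawBlock (Format "html") ('<':'!':'-':'-':'M':'A':'T':'H':xs)) =
  RawBlock (Format "tex") $ take (length xs - 3) xs
fixmathBlock x = x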


This does a good job of preserving the dollar signs. Two questions remain: 

   1. Can someone help to add to the script so that the backslash 
   environments are preserved as well? I'm afraid I don't know much about 
   Haskell, so I try to follow along as best I can. 
   2. Is there an easier solution that you can spot to convert from 
   Dokuwiki to latex while preserving MathJax?
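
The filter above only rewrites raw HTML comments that begin with 
<!--MATH, so one possible direction for question 1 is to wrap the 
backslash constructs in the same kind of comment before Pandoc reads the 
HTML. The following is only a rough sketch under that assumption, not a 
tested solution: the names wrapMath, wrap and breakOn are made up for 
illustration, it uses nothing outside base, and it ignores HTML entities 
(such as &lt;) and TeX containing "--", which is not allowed inside an 
HTML comment.

```haskell
import Data.List (isPrefixOf, stripPrefix)

-- Split the input at the first occurrence of a (non-empty) delimiter.
-- Returns the text before and the text after the delimiter, or Nothing
-- if the delimiter does not occur.
breakOn :: String -> String -> Maybe (String, String)
breakOn delim = go []
  where
    go _ [] = Nothing
    go acc s@(c:cs)
      | delim `isPrefixOf` s = Just (reverse acc, drop (length delim) s)
      | otherwise            = go (c : acc) cs

-- Wrap a TeX span in the comment form that the fixmath filter matches.
wrap :: String -> String
wrap tex = "<!--MATH" ++ tex ++ "-->"

-- Scan the text once, wrapping \begin{env}...\end{env}, \[...\],
-- \eqref{...}, $$...$$ and $...$ spans; everything else is copied as-is.
-- An environment is closed by the first \end with the same name, so
-- nesting of identically named environments is not handled.
wrapMath :: String -> String
wrapMath [] = []
wrapMath s@(c:cs)
  | Just rest <- stripPrefix "\\begin{" s
  , Just (name, rest') <- breakOn "}" rest
  , Just (body, after) <- breakOn ("\\end{" ++ name ++ "}") rest'
  = wrap ("\\begin{" ++ name ++ "}" ++ body ++ "\\end{" ++ name ++ "}")
      ++ wrapMath after
  | Just rest <- stripPrefix "\\[" s
  , Just (body, after) <- breakOn "\\]" rest
  = wrap ("\\[" ++ body ++ "\\]") ++ wrapMath after
  | Just rest <- stripPrefix "\\eqref{" s
  , Just (label, after) <- breakOn "}" rest
  = wrap ("\\eqref{" ++ label ++ "}") ++ wrapMath after
  | Just rest <- stripPrefix "$$" s
  , Just (body, after) <- breakOn "$$" rest
  = wrap ("$$" ++ body ++ "$$") ++ wrapMath after
  | c == '$'
  , Just (body, after) <- breakOn "$" cs
  , not (null body)
  = wrap ("$" ++ body ++ "$") ++ wrapMath after
  | otherwise = c : wrapMath cs

-- Read HTML on stdin, write the wrapped HTML to stdout.
main :: IO ()
main = interact wrapMath
```

It could be run as, say, runghc wrapmath.hs < test.html > wrapped.html 
(file names hypothetical) before the result is passed to Pandoc and the 
filter. Any two dollar signs in ordinary text would also be treated as 
math, so the output should be checked by hand.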
   
