* Converting HTML (or Dokuwiki) to LaTeX while preserving $...$ and backslash environments
@ 2013-10-20 19:14 TSofM
From: TSofM @ 2013-10-20 19:14 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
I would like to convert from Dokuwiki format (a .txt file) to LaTeX. This
guide <http://donaldmerand.com/code/2012/07/20/how-i-actually-convert-dokuwiki-to-latex.html>
recommends first using Dokuwiki's parser to convert the .txt file to HTML;
Pandoc can then convert the HTML to LaTeX. This works well for most of
Dokuwiki's syntax.
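The two-step pipeline described above can be sketched as a small Haskell driver. This is illustrative only and rests on assumptions: that doku2html and pandoc are both on PATH, that doku2html takes the input file as an argument and writes HTML to stdout (as in doku2html test.txt > test.html), and that -f html -t latex are suitable Pandoc options. The names stages and runStages, and the program name dokutex, are hypothetical.

```haskell
import Control.Monad (foldM)
import System.Environment (getArgs)
import System.Process (readProcess)

-- Hypothetical pipeline: each stage is a command plus its arguments.
-- The stdout of one stage is passed as stdin to the next.
stages :: FilePath -> [(FilePath, [String])]
stages input =
  [ ("doku2html", [input])                    -- Dokuwiki .txt -> HTML
  , ("pandoc", ["-f", "html", "-t", "latex"]) -- HTML -> LaTeX
  ]

-- Run the stages in order, starting from empty stdin.
-- readProcess throws an exception if a command exits with an error.
runStages :: [(FilePath, [String])] -> IO String
runStages = foldM (\stdinText (cmd, args) -> readProcess cmd args stdinText) ""

main :: IO ()
main = do
  args <- getArgs
  case args of
    [input] -> runStages (stages input) >>= putStr
    _       -> putStrLn "usage: dokutex FILE.txt"
```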
However, this does not work if the page uses MathJax, whose input is
essentially ordinary LaTeX math syntax. I need certain constructs to pass
through unchanged to the final .tex file, namely dollar-sign environments
and backslash commands. Here is an example Dokuwiki .txt file we want to
convert:
====== Mathjax test ======
Here is a test for an in-line equation $x^2 + y^2 = z^2$, and also a
displayed equation
\begin{equation} \label{myeq}
f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} x^n.
\end{equation}
I can then reference the equation via \eqref{myeq}. I can also do standalone mathematics like
\[
a^2 + b^2 = c^2.
\]
There are other equation environments. Here is one that involves cases:
\begin{multline}
f(x) = \begin{cases}a \\ b\end{cases} \\
g(x).
\end{multline}
Running this through the Dokuwiki parser with doku2html test.txt >
test.html returns the following:
<h1 class="sectionedit1" id="mathjax_test">Mathjax test</h1>
<div class="level1">
<p>
Here is a test for an in-line equation $x^2 + y^2 = z^2$, and also a
displayed equation
\begin{equation} \label{myeq}
f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} x^n.
\end{equation}
</p>
<p>
I can then reference the equation via \eqref{myeq}. I can also do
standalone mathematics like
\[
a^2 + b^2 = c^2.
\]
</p>
<p>
There are other equation environments. Here is one that involves cases:
\begin{multline}
f(x) = \begin{cases}a \\ b\end{cases} \\
g(x).
\end{multline}
</p>
</div>
Note that the LaTeX is preserved perfectly. Once we run this through
Pandoc, however, the relevant section becomes:
\section{Mathjax test}\label{mathjax_test}
Here is a test for an in-line equation \$x\^{}2 + y\^{}2 = z\^{}2\$, and
also a displayed equation \textbackslash{}begin\{equation\}
\textbackslash{}label\{myeq\} f(x) =
\textbackslash{}sum\_\{n=0\}\^{}\textbackslash{}infty
\textbackslash{}frac\{f\^{}\{(n)\}(0)\}\{n!\} x\^{}n.
\textbackslash{}end\{equation\}
I can then reference the equation via \textbackslash{}eqref\{myeq\}. I
can also do standalone mathematics like \textbackslash{}{[} a\^{}2 +
b\^{}2 = c\^{}2. \textbackslash{}{]}
There are other equation environments. Here is one that involves cases:
\textbackslash{}begin\{multline\} f(x) = \textbackslash{}begin\{cases\}a
\textbackslash{}\textbackslash{} b\textbackslash{}end\{cases\}
\textbackslash{}\textbackslash{} g(x). \textbackslash{}end\{multline\}
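Roughly why this happens: once Pandoc's HTML reader has parsed the paragraphs, $, ^, backslashes and braces are just ordinary characters in the text, so the LaTeX writer escapes them. The mapping visible in the output above can be modelled in a few lines of Haskell. This is a simplified sketch covering only the characters that occur in this example, not Pandoc's actual writer code; escapeChar and escapeTeX are hypothetical names.

```haskell
-- Simplified model (NOT Pandoc's real LaTeX writer) of how the plain-text
-- characters seen in the output above are escaped.
escapeChar :: Char -> String
escapeChar c = case c of
  '\\' -> "\\textbackslash{}"
  '$'  -> "\\$"
  '^'  -> "\\^{}"
  '_'  -> "\\_"
  '{'  -> "\\{"
  '}'  -> "\\}"
  '['  -> "{[}"
  ']'  -> "{]}"
  _    -> [c]

-- Escape every character of a string.
escapeTeX :: String -> String
escapeTeX = concatMap escapeChar

main :: IO ()
main = putStrLn (escapeTeX "$x^2 + y^2 = z^2$")
```

Applied to $x^2 + y^2 = z^2$ it prints \$x\^{}2 + y\^{}2 = z\^{}2\$, which matches the first line of Pandoc's output.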
We want to stop Pandoc from escaping the dollar-sign environments and the
backslash environments. How do we do that?
*Haskell fix for dollar signs*
John MacFarlane recommended a Haskell fix for the dollar signs
<http://stackoverflow.com/questions/11338049/how-to-convert-html-with-mathjax-into-latex-using-pandoc>.
Note that his code compiles after some minor syntax changes
<http://stackoverflow.com/questions/19472828/unable-to-compile-haskell-program-couldnt-match-errors>:
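import Text.Pandoc.JSON
import Text.Pandoc
import Data.List (stripPrefix)

-- Pandoc JSON filter: turns raw HTML comments of the form
-- <!--MATH ... --> into raw TeX, so the LaTeX writer emits the
-- enclosed math verbatim instead of escaping it.
main :: IO ()
main = toJSONFilter fixmath

fixmath :: Block -> Block
fixmath = bottomUp fixmathBlock . bottomUp fixmathInline

-- Inline case: a MATH comment inside a paragraph.
fixmathInline :: Inline -> Inline
fixmathInline (RawInline (Format "html") s)
  | Just xs <- stripPrefix "<!--MATH" s =
      RawInline (Format "tex") (dropCommentEnd xs)
fixmathInline x = x

-- Block case: a MATH comment standing on its own between blocks.
fixmathBlock :: Block -> Block
fixmathBlock (RawBlock (Format "html") s)
  | Just xs <- stripPrefix "<!--MATH" s =
      RawBlock (Format "tex") (dropCommentEnd xs)
fixmathBlock x = x

-- Drop the trailing "-->" that closes the comment.
dropCommentEnd :: String -> String
dropCommentEnd xs = take (length xs - 3) xs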
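Judging from the patterns it matches, the filter only acts on raw HTML whose text begins with <!--MATH; math that reaches Pandoc as ordinary paragraph text is escaped as shown earlier and never passes through it. The string step the filter performs can be isolated and tried without Pandoc. In this sketch unwrapMath is a hypothetical name, and the check for a trailing --> is an addition that the original filter does not make (it simply drops the last three characters).

```haskell
import Data.List (isSuffixOf, stripPrefix)

-- The string step of the filter: "<!--MATH$x$-->" becomes Just "$x$";
-- anything that is not a MATH comment gives Nothing.
unwrapMath :: String -> Maybe String
unwrapMath s = do
  body <- stripPrefix "<!--MATH" s
  if "-->" `isSuffixOf` body
    then Just (take (length body - 3) body)  -- drop the closing "-->"
    else Nothing

main :: IO ()
main = do
  print (unwrapMath "<!--MATH$x^2$-->")          -- Just "$x^2$"
  print (unwrapMath "<!-- ordinary comment -->") -- Nothing
```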
This does a good job of preserving the dollar signs. Two questions remain:
1. Can someone help extend the script so that the backslash
environments are preserved as well? I don't know much Haskell, so I am
following along as best I can.
2. Is there an easier way to convert from Dokuwiki to LaTeX while
preserving MathJax?
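One possible direction for question 1, offered as an untested sketch: rather than teaching the filter to recognize backslash commands after Pandoc has already split them into ordinary text, preprocess the HTML from doku2html so that every math span, including \begin{...}...\end{...}, \[...\] and commands such as \eqref{...}, is wrapped in <!--MATH ... -->, the form the fixmath filter above unwraps. The names wrapMath, spanBetween, envSpan, cmdSpan and breakOn are hypothetical. Known limitations: a lone $ in ordinary text (a price, say) is treated as math, nested environments with the same name are not handled, math containing "--" produces an invalid HTML comment, HTML entities such as &amp; or &lt; inside math are not decoded, and the scan runs over tags as well as text.

```haskell
import Control.Monad (msum)
import Data.Char (isLetter)
import Data.List (stripPrefix)

-- Hypothetical preprocessor: wraps TeX found in the HTML produced by
-- doku2html in <!--MATH ... --> comments, so the fixmath filter can turn
-- each one back into raw TeX. Reads HTML on stdin, writes it to stdout.
main :: IO ()
main = interact wrapMath

wrapMath :: String -> String
wrapMath [] = []
wrapMath s@(c : cs) =
  case msum [ envSpan s                  -- \begin{env} ... \end{env}
            , spanBetween "$$" "$$" s    -- $$ ... $$
            , spanBetween "$" "$" s      -- $ ... $
            , spanBetween "\\[" "\\]" s  -- \[ ... \]
            , spanBetween "\\(" "\\)" s  -- \( ... \)
            , cmdSpan s                  -- \eqref{...} and similar
            ] of
    Just (tex, rest) -> "<!--MATH" ++ tex ++ "-->" ++ wrapMath rest
    Nothing          -> c : wrapMath cs

-- Match open ... close at the start of the input; return the whole span
-- (delimiters included) and the text after it.
spanBetween :: String -> String -> String -> Maybe (String, String)
spanBetween open close s = do
  rest <- stripPrefix open s
  (inner, after) <- breakOn close rest
  return (open ++ inner ++ close, after)

-- \begin{env} up to the first \end{env} with the same name
-- (nested environments of the same name are not handled).
envSpan :: String -> Maybe (String, String)
envSpan s = do
  rest <- stripPrefix "\\begin{" s
  let env = takeWhile (/= '}') rest
  spanBetween ("\\begin{" ++ env ++ "}") ("\\end{" ++ env ++ "}") s

-- A control word such as \eqref plus at most one {...} argument
-- without nested braces.
cmdSpan :: String -> Maybe (String, String)
cmdSpan ('\\' : s)
  | not (null name) = Just ('\\' : name ++ arg, after)
  where
    (name, rest) = span isLetter s
    (arg, after) = case rest of
      '{' : r | Just (a, r') <- breakOn "}" r -> ('{' : a ++ "}", r')
      _ -> ("", rest)
cmdSpan _ = Nothing

-- Split at the first occurrence of pat (dropping it); Nothing if absent.
breakOn :: String -> String -> Maybe (String, String)
breakOn pat = go []
  where
    go acc rest
      | Just after <- stripPrefix pat rest = Just (reverse acc, after)
    go _ [] = Nothing
    go acc (x : xs) = go (x : acc) xs
```

With Pandoc 1.12 the pieces might then be chained as doku2html test.txt | ./wrapmath | pandoc -f html -t latex --parse-raw --filter ./fixmath, where wrapmath and fixmath are the compiled programs. This is untested; --parse-raw is included on the assumption that without it the HTML reader drops comments instead of keeping them as raw HTML.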