public inbox archive for pandoc-discuss@googlegroups.com
* HTML to LaTeX: how do you define regions that aren't touched by Pandoc?
@ 2013-10-21 18:51 TSofM
       [not found] ` <b7d858ad-786c-4e68-9bd7-a481b5b9865d-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: TSofM @ 2013-10-21 18:51 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Consider the following Markdown/Dokuwiki code:

<notouch>
\begin{equation}
a = b = c
\end{equation}
</notouch>

<notex>
Don't tex me!
</notex>

I have defined two special environments: <notouch> (Pandoc should include 
the contents but should not parse them) and <notex> (Pandoc should not 
include the contents).

I'd like to parse this first using the Dokuwiki command-line interpreter <http://donaldmerand.com/code/2012/07/20/how-i-actually-convert-dokuwiki-to-latex.html>, 
which converts it to HTML. The result of the .txt-to-HTML conversion is:
<p>
&lt;notouch&gt;
\begin{equation}
f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} x^n.
\end{equation}
&lt;/notouch&gt;
</p>

<p>
&lt;notex&gt;
Don&#039;t tex me!
&lt;/notex&gt;
</p>


Basically, I want to pass this through Pandoc to convert it to LaTeX. 

   1. Whenever <notouch> ... </notouch> is encountered in the original 
   file, I want Pandoc to preserve everything inside the environment 
   *without* interpreting the code (so in the final LaTeX file, this section 
   is *not touched*). 
   2. Whenever <notex> ... </notex> is encountered in the original file, I 
   want Pandoc to remove it entirely, so that anything contained in this 
   environment is either commented out or removed from the LaTeX file. 
   

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/b7d858ad-786c-4e68-9bd7-a481b5b9865d%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


* Re: HTML to LaTeX: how do you define regions that aren't touched by Pandoc?
       [not found] ` <b7d858ad-786c-4e68-9bd7-a481b5b9865d-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2013-10-21 18:51   ` TSofM
  2013-10-21 19:17   ` John MacFarlane
  1 sibling, 0 replies; 5+ messages in thread
From: TSofM @ 2013-10-21 18:51 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Sorry! I forgot to ask the actual question: can someone guide me through 
how this might be done? 


* Re: HTML to LaTeX: how do you define regions that aren't touched by Pandoc?
       [not found] ` <b7d858ad-786c-4e68-9bd7-a481b5b9865d-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2013-10-21 18:51   ` TSofM
@ 2013-10-21 19:17   ` John MacFarlane
       [not found]     ` <20131021191718.GB32668-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  1 sibling, 1 reply; 5+ messages in thread
From: John MacFarlane @ 2013-10-21 19:17 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I think the easiest approach would be to first preprocess
the source, replacing `<notouch>` with `<pre class="rawtex">`
and `</notouch>` with `</pre>`, and replacing `<notex>` with
`<!--` and `</notex>` with `-->`.

You can use sed or perl for this.

Then, run the result through pandoc, and use a filter like
this:

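#!/usr/bin/env runhaskell
{-# LANGUAGE OverloadedStrings #-}
-- OverloadedStrings lets the literals below type-check whether
-- pandoc-types uses String (older releases) or Text (newer ones).
import Text.Pandoc.JSON

main :: IO ()
main = toJSONFilter rawtex

rawtex :: Block -> Block
rawtex (CodeBlock (_,["rawtex"],_) code) =
  RawBlock (Format "latex") code
rawtex x = x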


Save this as rawtex.hs, then

    chmod +x rawtex.hs

You can run it like this:

    pandoc --filter ./rawtex.hs -f html -t latex -s




* Re: HTML to LaTeX: how do you define regions that aren't touched by Pandoc?
       [not found]     ` <20131021191718.GB32668-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2013-10-21 22:13       ` TSofM
  2013-10-27 16:35       ` TSofM
  1 sibling, 0 replies; 5+ messages in thread
From: TSofM @ 2013-10-21 22:13 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Hi!

Thank you. I managed to come up with the following sed script (sed is also 
new to me):
#!/usr/bin/sed -f
s/<notouch>/<html><pre class="rawtex">/g
s/<\/notouch>/<\/pre><\/html>/g
s/<notex>/<html><!--/g
s/<\/notex>/--><\/html>/g

which seems to work. The <html> blocks are necessary so that Dokuwiki's 
parser returns the following HTML: 
<p>
This is a notouch environment
<pre class="rawtex">
\begin{equation}
f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} x^n.
\end{equation}
</pre>
And this is a notex environment
<!--
Don't tex me!
-->
</p>

I used your suggested .hs script and it returned the following:
This is a notouch environment

\begin{equation}
f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} x^n.
\end{equation}

And this is a notex environment

which is almost perfect. Can I ask you two things? 

(1) How do I modify the script so that it inserts a comment symbol % 
wherever the <pre> ... </pre> blocks are encountered? The aim is that LaTeX 
does not treat the empty lines above and below the displayed equation as 
paragraph breaks; ideally there would be no empty lines in the .tex output 
above. 

(2) Would you be able to write a few comments about how the rawtex function 
is defined and how CodeBlock, RawBlock, etc. work here:


#!/usr/bin/env runhaskell 
import Text.Pandoc.JSON 
 
main = toJSONFilter rawtex 
 
rawtex :: Block -> Block 
rawtex (CodeBlock (_,["rawtex"],_) code) = 
  RawBlock (Format "latex") code 
rawtex x = x 

I tried to wade through some introductory tutorials on Haskell just now, 
and did various searches on this syntax, but it is still foreign...
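
For readers following along, here is a commented sketch of the same filter. The comments describe how pandoc-types defines these constructors: CodeBlock carries an attribute triple (identifier, classes, key-value pairs) plus the literal text, and RawBlock carries a format name plus text that is emitted verbatim when writing that format. The OverloadedStrings pragma is an assumption added here so that the string literals type-check whether the installed pandoc-types uses String or Text; the behaviour is otherwise meant to match the script above.

```haskell
#!/usr/bin/env runhaskell
{-# LANGUAGE OverloadedStrings #-}
-- Annotated sketch of the rawtex filter (pragma added as an assumption).
import Text.Pandoc.JSON

-- toJSONFilter reads pandoc's JSON document from stdin, applies rawtex to
-- every Block in it (including nested ones), and writes the result to stdout.
main :: IO ()
main = toJSONFilter rawtex

rawtex :: Block -> Block
-- A CodeBlock holds (identifier, classes, key-value attributes) and its text.
-- This clause matches only code blocks whose class list is exactly ["rawtex"],
-- which is what the HTML reader produces for <pre class="rawtex">.
-- The underscores ignore the identifier and the key-value attributes.
rawtex (CodeBlock (_, ["rawtex"], _) code) =
  -- RawBlock hands `code` to the LaTeX writer untouched (no escaping).
  RawBlock (Format "latex") code
-- Any other block is returned unchanged.
rawtex x = x
```

Because a RawBlock is still a block-level element, the LaTeX writer separates it from its neighbours with blank lines, which is why the output above shows an empty line before and after the equation.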




* Re: HTML to LaTeX: how do you define regions that aren't touched by Pandoc?
       [not found]     ` <20131021191718.GB32668-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2013-10-21 22:13       ` TSofM
@ 2013-10-27 16:35       ` TSofM
  1 sibling, 0 replies; 5+ messages in thread
From: TSofM @ 2013-10-27 16:35 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


I've written the following script, which replaces statements like

Here is a test for an in-line equation $x^2 + y^2 = z^2$, and also a 
displayed equation 
\begin{equation} \label{myeq}
f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} x^n.
\end{equation}


with the <pre class="rawtex">..</pre> statement:

Here is a test for an in-line equation <html><pre class="rawtex">$x^2 + y^2 = z^2$</pre></html>, and also a 
displayed equation 
<html><pre class="rawtex">\begin{equation} \label{myeq}
f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} x^n.
\end{equation}</pre></html>

The script is here:

#!/bin/sh

# Replace equation 
sed 's/\\begin{equation}/<html><pre class="rawtex">\\begin{equation}/g' "$1.txt" > tmp.txt
sed -i.bak 's/\\end{equation}/\\end{equation}<\/pre><\/html>/g' tmp.txt

# Replace \[ ... \] 
sed -i.bak 's/\\\[/<html><pre class="rawtex">\\\[/g' tmp.txt
sed -i.bak 's/\\\]/\\\]<\/pre><\/html>/g' tmp.txt

# Replace <notouch> environments
sed -i.bak 's/<notouch>/<html><pre class="rawtex">/g' tmp.txt
sed -i.bak 's/<\/notouch>/<\/pre><\/html>/g' tmp.txt

# Comment out <notex> environments
sed -i.bak 's/<notex>/<html><!--/g' tmp.txt
sed -i.bak 's/<\/notex>/--><\/html>/g' tmp.txt

# Replace $ ... $ environments
perl -i -0pe 's/(\$\$?[^\$]+\$\$?)/\<html><pre class="rawtex">$1<\/pre><\/html>/gm' tmp.txt

The problem is that, using John's script:
#!/usr/bin/env runhaskell
import Text.Pandoc.JSON

main = toJSONFilter rawtex

rawtex :: Block -> Block
rawtex (CodeBlock (_,["rawtex"],_) code) =
  RawBlock (Format "latex") code
rawtex x = x


the replacement inserts empty lines. It gives a .tex file like this:
Here is a test for an in-line equation

$x^2 + y^2 = z^2$

, and also a displayed equation

\begin{equation} \label{myeq}
f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} x^n.
\end{equation}




Instead, we need something inline. I've tried changing RawBlock to 
RawInline and Block -> Block to Inline -> Inline, but the code does not 
compile. I'm not sure what the equivalent of the CodeBlock statement is.
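
One possible route, sketched here and not tested against this Dokuwiki setup: in pandoc-types the inline counterpart of CodeBlock is Code, and the inline counterpart of RawBlock is RawInline. The sketch assumes the preprocessing script is changed to wrap inline math in <code class="rawtex"> ... </code> instead of <pre class="rawtex">, and that the installed pandoc's HTML reader keeps the class attribute on inline code. The names rawtexDoc, rawtexBlock and rawtexInline are made up for this example, and the OverloadedStrings pragma is again an assumption so the literals work with either String or Text.

```haskell
#!/usr/bin/env runhaskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: handle block-level and inline "rawtex" regions in one filter.
import Text.Pandoc.JSON
import Text.Pandoc.Walk (walk)

main :: IO ()
main = toJSONFilter rawtexDoc

-- Apply the block rule first, then the inline rule, across the whole document.
rawtexDoc :: Pandoc -> Pandoc
rawtexDoc = walk rawtexInline . walk rawtexBlock

-- <pre class="rawtex"> ... </pre> becomes a raw LaTeX block.
rawtexBlock :: Block -> Block
rawtexBlock (CodeBlock (_, ["rawtex"], _) code) = RawBlock (Format "latex") code
rawtexBlock x = x

-- <code class="rawtex"> ... </code> becomes raw LaTeX inside the paragraph,
-- so it stays on the same line as the surrounding text.
rawtexInline :: Inline -> Inline
rawtexInline (Code (_, ["rawtex"], _) code) = RawInline (Format "latex") code
rawtexInline x = x
```

It would be run the same way as before, with pandoc --filter ./rawtex.hs -f html -t latex -s. Displayed equations can keep using <pre class="rawtex">, since blank lines around a display environment are usually less of a concern than in the middle of a sentence.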



