a better platform for DeTeX

public inbox archive for pandoc-discuss@googlegroups.com

* a better platform for DeTeX - a question.
@ 2016-03-08 19:40 Paulo Ney de Souza
       [not found] ` <CAFVhNZOiGeUoAEzJ=aJGecJhw+Yo8qSjSHdyP6iidYBV7DQgug-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Paulo Ney de Souza @ 2016-03-08 19:40 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

DeTeX is an old program (written in Lex in the early 1980s) that is
extremely useful to TeX users for spell checking, word counts and even
grammar analysis. It removes all TeX commands from a file and leaves behind
only the text. It is still distributed with TeX Live.

The Lex code generates C, which is then compiled. As you may imagine, it is
extremely difficult to maintain -- when you try to fix one thing you
inadvertently break 25 others -- and it badly needs to be updated for i18n
and other issues after so many years.

I have been thinking that Haskell may be a better language for doing
the work, and that "pandoc" may even be the right tool for it, by
implementing a sort of "pure-text" format that gets rid of all math,
\textbf, \emph, etc. and leaves only the text behind. Conversion to
"markdown" or "reStructuredText" is sort of half-way there.

I am wondering whether anyone has tried any of this, and which format would
be the closest one to start from.

Paulo Ney

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAFVhNZOiGeUoAEzJ%3DaJGecJhw%2BYo8qSjSHdyP6iidYBV7DQgug%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


* Re: a better platform for DeTeX - a question.
       [not found] ` <CAFVhNZOiGeUoAEzJ=aJGecJhw+Yo8qSjSHdyP6iidYBV7DQgug-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-09  5:05   ` John MacFarlane
       [not found]     ` <20160309050508.GD68594-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: John MacFarlane @ 2016-03-09  5:05 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

You could try

    pandoc -f latex -t plain

and see how close that gets you.
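
For use from a script, the command above can be wrapped in a few lines of Python. This is only a sketch: the function names and the file name `paper.tex` are made up for illustration, and it assumes a `pandoc` executable is on the PATH.

```python
import subprocess


def pandoc_plain_command(tex_path):
    """Build the argument list for the pandoc invocation suggested above."""
    return ["pandoc", "-f", "latex", "-t", "plain", tex_path]


def latex_to_plain(tex_path):
    """Run pandoc on a LaTeX file and return the plain-text output."""
    result = subprocess.run(
        pandoc_plain_command(tex_path),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode the output as str
        check=True,           # raise if pandoc exits with an error
    )
    return result.stdout
```

Calling `latex_to_plain("paper.tex")` would then hand the stripped text to a speller or word counter.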


* Re: a better platform for DeTeX - a question.
       [not found]     ` <20160309050508.GD68594-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
@ 2016-03-09  5:13       ` Paulo Ney de Souza
       [not found]         ` <CAFVhNZMLJKTmSuuj2GMDCi0xiay8HadM+uogKrq03K_sQQnGDw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Paulo Ney de Souza @ 2016-03-09  5:13 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

It gets awfully close, but certainly a few adjustments are needed... What
are the main uses of "plain"?

Paulo Ney


* Re: a better platform for DeTeX - a question.
       [not found]         ` <CAFVhNZMLJKTmSuuj2GMDCi0xiay8HadM+uogKrq03K_sQQnGDw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-09  8:23           ` BPJ
       [not found]             ` <CADAJKhAHFrRUKSUnyibgTSCLVBeyHcgQOKxO=qs=z4fVjwyo4g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: BPJ @ 2016-03-09  8:23 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

What are the remaining wrinkles? It is quite likely that they can be fixed
with a filter. Remaining inline markup certainly can.


* Re: a better platform for DeTeX - a question.
       [not found]             ` <CADAJKhAHFrRUKSUnyibgTSCLVBeyHcgQOKxO=qs=z4fVjwyo4g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-09 17:21               ` Paulo Ney de Souza
       [not found]                 ` <CAFVhNZPqAO2gVh8k3_ccG+4POUGD+JQ4ZtjAai4BeSMU6BLTmw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Paulo Ney de Souza @ 2016-03-09 17:21 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Pandoc does so many things wonderfully well, like converting ASCII accents
to UTF-8, that it would be a pity not to use it to implement the
replacement program.

Here are the 10 biggest issues in using it as a basis for spell checking:

1- Keep underlines out, like the ones in:

    Introduction
    ============

    What is Combinatorics?
    ----------------------

2- Remove labels from \ref{} commands, which currently produce things like:

   Some reference here to Figure [fig1.1].

3- Remove the images themselves, which currently leave things like:

    [image]
     [fig1.1]

4- But preserve the contents of certain commands like \caption{}:

\begin{figure}[!htb]
\includegraphics[scale=0.3]{Figura11.pdf}
\caption{This text I want to spell, but not this label~\ref{lable-in-ref}}
\label{fig1.1}
\end{figure}

5- Make it configurable which environments are discarded entirely, like:

{align}
{align*}
{eqnarray*}
{equation*}

6- Make it configurable which commands have their arguments preserved,
like \todo{}.

7- Get rid of things like:

    \itemsep=-1pt

which leave behind things like:
    =-1pt

8- Remove the column specification of a table like:

    \begin{tabular}{|c|c|c|c|}
    A & B
    \end{tabular}

which leaves behind:

    |c|c|c|c| A & B

9- Deal properly with some specific commands like \lettrine{}:

    \lettrine{T}{his is the dropcap} in a text...

    \lettrine{\textcolor{red}{T}}{his is the dropcap} in a text...

so that the text inside is preserved.

10- Redirect text in a particular language to a particular file,
according to directives in Babel and Polyglossia:

\begin{otherlanguage}{french}
Text en Francais ...
\end{otherlanguage}

\foreignlanguage{french}{Le text en Francais}

so that this part of the text would be directed to "main_fr.txt", and other
languages to their own files.

Paulo Ney



* Re: a better platform for DeTeX - a question.
       [not found]                 ` <CAFVhNZPqAO2gVh8k3_ccG+4POUGD+JQ4ZtjAai4BeSMU6BLTmw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-09 20:22                   ` BP Jonsson
       [not found]                     ` <56E0861F.6050608-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: BP Jonsson @ 2016-03-09 20:22 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 2016-03-09 at 18:21, Paulo Ney de Souza wrote:
> Pandoc does so many things wonderfully well, like converting ASCII accents
> to UTF-8, that it would be a pity not to use it to implement the
> replacement program.
>
> Here are the 10 biggest issues in using it as a basis for spell checking:
>
> 1- Keep underlines out, like the ones in:
>
>      Introduction
>      ============
>
>      What is Combinatorics?
>      ----------------------

The plain writer already omits the underlines by default. It turns
level-one headings into UPPERCASE. If that is a problem, they can be
pseudo-demoted to level 2 so that they look the same as other headings.

>
> 2- Remove labels from \ref{} commands, which currently produce things like:
>
>     Some reference here to Figure [fig1.1].

See the answer to 6. Unfortunately that technique doesn't work
unmodified because the latex reader seems to treat `\ref{}`
specially, but with a bit of prefiltering one could easily replace
every instance of the string `\ref` with something like
`\removeMe` and then define `\newcommand{\removeMe}[1]{}`, and
pandoc will take care of the rest! It would be easy to write a
script which reads in the latex file, does the prefiltering,
inserts the (re)definitions in an appropriate place and runs
pandoc, passing pandoc's output through to its own output stream,
so that users don't need to run two programs and pipe between
them themselves.
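
The prefiltering step described above could be sketched in Python roughly as follows. The function name is made up, and prepending the definition at the very top of the file is an assumption; a real script might need to insert it elsewhere, for instance after `\documentclass`.

```python
import re

# Placeholder macro and its do-nothing definition, as suggested above.
PLACEHOLDER = r"\removeMe"
DEFINITION = r"\newcommand{\removeMe}[1]{}"

# Match \ref only as a complete command name, so that \refname,
# \pageref and similar commands are left alone.
REF_COMMAND = re.compile(r"\\ref(?![A-Za-z])")


def prefilter_refs(latex_source):
    """Rename every \\ref to the placeholder and prepend its definition."""
    # A function is used as the replacement so that the backslash in
    # PLACEHOLDER is not interpreted as a regex escape sequence.
    renamed = REF_COMMAND.sub(lambda match: PLACEHOLDER, latex_source)
    return DEFINITION + "\n" + renamed
```

The result would then be fed to `pandoc -f latex -t plain` on standard input.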

>
> 3- Remove the images themselves, which currently leave things like:
>
>      [image]
>       [fig1.1]

That's easy. Technically you would replace the image element with
its text content, i.e. its caption.
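
As a rough illustration, here is how such a replacement might look on pandoc's JSON representation of a document. The sketch assumes that an image appears as an object with `"t": "Image"` whose `"c"` list holds the caption inlines as the second-to-last entry (with or without a leading attribute entry); that shape should be checked against the pandoc version in use. The function names are hypothetical.

```python
import json
import sys


def replace_images(node):
    """Recursively replace Image elements by their caption inlines."""
    if isinstance(node, list):
        result = []
        for item in node:
            if isinstance(item, dict) and item.get("t") == "Image":
                # Assumed layout: [..., caption_inlines, target]
                caption = item["c"][-2]
                result.extend(replace_images(caption))
            else:
                result.append(replace_images(item))
        return result
    if isinstance(node, dict):
        return {key: replace_images(value) for key, value in node.items()}
    return node  # strings, numbers, etc. are returned unchanged


def main():
    """Filter entry point: read the JSON AST on stdin, write it to stdout."""
    json.dump(replace_images(json.load(sys.stdin)), sys.stdout)
```

To use it as a filter, a script would call `main()` and be passed to pandoc with `--filter`.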

>
> 4- But preserve the contents of certain commands like \caption{}:
>
> \begin{figure}[!htb]
> \includegraphics[scale=0.3]{Figura11.pdf}
> \caption{This text I want to spell, but not this label~\ref{lable-in-ref}}
> \label{fig1.1}
> \end{figure}

That's relatively easy too. It's really just a combination of 2 and 3

>
> 5- Make it configurable which environments are discarded entirely, like:
>
> {align}
> {align*}
> {eqnarray*}
> {equation*}

If pandoc treats these specially the prefiltering technique from 2 
works.  The plain writer deletes all unknown environments already.

>
> 6- Make it configurable which commands have their arguments preserved,
> like \todo{}.

$ pandoc -r latex -w plain
\renewcommand{\todo}[1]{#1}
\todo{everything}
^D
everything

That technique works for everything which pandoc doesn't treat 
specially. I'm not sure it works with actual environments as 
opposed to commands but a prefilter could just remove the 
`\begin...` and `\end...` lines.
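
The fallback mentioned above, removing the `\begin...` and `\end...` lines so that the environment body survives, might be sketched like this. The function name and the idea of passing a list of environment names are assumptions made for illustration; note that the whole marker line is dropped, including anything else that happens to be on it.

```python
import re


def strip_environment_markers(latex_source, environments):
    """Delete the \\begin{...} and \\end{...} lines of the given environments,
    keeping the text between them."""
    names = "|".join(re.escape(name) for name in environments)
    marker_line = re.compile(
        r"^[ \t]*\\(?:begin|end)\{(?:" + names + r")\}.*\n?",
        re.MULTILINE,
    )
    return marker_line.sub("", latex_source)
```

For example, `strip_environment_markers(text, ["otherlanguage"])` keeps the foreign-language text but drops the lines that delimit it.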

>
> 7- Get rid of things like:
>
>      \itemsep=-1pt
>
> which leave behind things like:
>      =-1pt

Also a good job for a prefilter
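
A prefilter for this case could be a single regular expression. The sketch below only knows a handful of length parameters and units, chosen here as examples; the list would have to be extended for real documents, and stretchable lengths (`plus`/`minus`) are not handled.

```python
import re

# Length assignments such as \itemsep=-1pt or \parskip 0.5em.
LENGTH_ASSIGNMENT = re.compile(
    r"\\(?:itemsep|parskip|parindent)"   # example parameter names
    r"\s*=?\s*"                          # optional equals sign
    r"-?\d*\.?\d+\s*"                    # signed decimal number
    r"(?:pt|em|ex|mm|cm|in)\b"           # unit
)


def strip_length_assignments(latex_source):
    """Remove simple TeX length assignments before handing the text to pandoc."""
    return LENGTH_ASSIGNMENT.sub("", latex_source)
```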

>
> 8- Remove the column specification of a table like:
>
>      \begin{tabular}{|c|c|c|c|}
>      A & B
>      \end{tabular}
>
> which leaves behind:
>
>      |c|c|c|c| A & B

Can be solved.

>
> 9- Deal properly with some specific commands like \lettrine{}:
>
>      \lettrine{T}{his is the dropcap} in a text...
>
>      \lettrine{\textcolor{red}{T}}{his is the dropcap} in a text...

\renewcommand{\lettrine}[2]{#1#2}
\renewcommand{\textcolor}[2]{#2}

>
> so that the text inside is preserved.
>
> 10- Redirect text in a particular language to a particular file,
> according to directives in Babel and Polyglossia:
>
> \begin{otherlanguage}{french}
> Text en Francais ...
> \end{otherlanguage}
>
> \foreignlanguage{french}{Le text en Francais}
>
> so that this part of the text would be directed to "main_fr.txt", and other
> languages to their own files.

That would be trickier, but not impossible. Actually reading the 
mind of babel and polyglossia would be out of scope, but 
recognising the commands and doing something appropriate should be 
possible.
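
Recognising the commands could start out as simply as the following sketch, which collects the text of `\foreignlanguage` and `otherlanguage` into per-language buckets. It is deliberately naive: nested braces are not handled, the bucket key is the language name exactly as written in the source, and the function name and the `"main"` key are invented for the example.

```python
import re
from collections import defaultdict

# \begin{otherlanguage}{lang} ... \end{otherlanguage}
ENVIRONMENT = re.compile(
    r"\\begin\{otherlanguage\}\{([^{}]+)\}(.*?)\\end\{otherlanguage\}",
    re.DOTALL,
)
# \foreignlanguage{lang}{text}, without nested braces in the text
COMMAND = re.compile(r"\\foreignlanguage\{([^{}]+)\}\{([^{}]*)\}")


def split_by_language(latex_source, main_language="main"):
    """Return a dict mapping language names to lists of text fragments."""
    buckets = defaultdict(list)

    def collect(match):
        # Store the fragment under its language and cut it from the main text.
        buckets[match.group(1)].append(match.group(2).strip())
        return ""

    remainder = ENVIRONMENT.sub(collect, latex_source)
    remainder = COMMAND.sub(collect, remainder)
    buckets[main_language].append(remainder)
    return dict(buckets)
```

Each bucket could then be written to its own file and run through pandoc separately.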

I assume that you would also like to turn things like emphasis, 
strikeout and sub/superscripts into naked text, since the plain 
writer currently doesn't:

$ pandoc -r markdown -w plain
*foo* **bar** ~~quux~~ ~1~ ^2^
^D
_foo_ BAR ~~quux~~ ₁ ²

That's easy with a filter.
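
A minimal sketch of such a filter, written against pandoc's JSON output without any helper library: it assumes that emphasis-like elements are objects whose `"t"` is one of the names below and whose `"c"` is a list of inlines. The function names are hypothetical.

```python
import json
import sys

# Inline wrappers to dissolve, keeping only their contents.
WRAPPERS = {"Emph", "Strong", "Strikeout", "Subscript", "Superscript"}


def unwrap_inline_markup(node):
    """Recursively replace wrapper inlines by the inlines they contain."""
    if isinstance(node, list):
        result = []
        for item in node:
            if isinstance(item, dict) and item.get("t") in WRAPPERS:
                result.extend(unwrap_inline_markup(item["c"]))
            else:
                result.append(unwrap_inline_markup(item))
        return result
    if isinstance(node, dict):
        return {key: unwrap_inline_markup(value) for key, value in node.items()}
    return node  # strings, numbers, etc. are returned unchanged


def main():
    """Filter entry point: read the JSON AST on stdin, write it to stdout."""
    json.dump(unwrap_inline_markup(json.load(sys.stdin)), sys.stdout)
```

A script that calls `main()` can be passed to pandoc with `--filter`, so that the plain writer only ever sees the naked text.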

/bpj


* Re: a better platform for DeTeX - a question.
       [not found]                     ` <56E0861F.6050608-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2016-03-09 22:11                       ` Paulo Ney de Souza
       [not found]                         ` <CAFVhNZMV6Se68AWFthHgkh2xx2QWHoAAxMMzZXh_fjqU_+bWHA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Paulo Ney de Souza @ 2016-03-09 22:11 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

BPJ, are you proposing that everything should be taken care of with a
pre-filter and the plain writer, or is #10 a deal-breaker?

Paulo Ney

On Wed, Mar 9, 2016 at 12:22 PM, BP Jonsson <bpj-J3H7GcXPSITLoDKTGw+V6w@public.gmane.org> wrote:

> Den 2016-03-09 kl. 18:21, skrev Paulo Ney de Souza:
>
>> Pandoc does so many things wonderfully well, like the ascii-accents to
>> utf-8 ... that it is a pity not to use it to implement the replacement
>> program.
>>
>> Here I list the 10 biggest issues in using it as a base to spellers:
>>
>> 1- Keep underlines out, like the ones in:
>>
>>      Introduction
>>      ===========+
>>
>>      What is Combinatorics?
>>      ----------------------
>>
>
> The plain writer does that by default. It turns level one headings into
> UPPERCASE. If that is a problem they can be pseudo-demoted to level 2 so
> that they will look the same as other headings.
>
>
>> 2- Remove labels from \ref{} commands, that will produce things like:
>>
>>     Some reference here to Figure [fig1.1].
>>
>
> See the answer to 6. Unfortunately that technique doesn't work unmodified
> because the latex reader seems to treat `\ref{}` specially, but with a bit
> of prefiltering one could easily replace every instance of the string
> `\ref` with something like `\removeMe` and then define
> `\newcommand{\removeMe}[1]{}` and pandoc will take care of the rest! It
> would be easy to write a script which reads in the latex file, does the
> prefiltering, inserts the (re)definitions in an appropriate place and runs
> pandoc, redirecting pandoc's output to its own output stream,
> so that the user doesn't need to run two programs and pipe between them
> themself.
>
>
>> 3- Remove the images themselves, which leaves things like:
>>
>>      [image]
>>       [fig1.1]
>>
>
> That's easy. Technically you would replace the image element with its text
> content = caption
>
>
>> 4- but preserving the contents of certain commands like \caption{}.
>>
>> \begin{figure}[!htb]
>> \includegraphics[scale=0.3]{Figura11.pdf}
>> \caption{This text I want to spell, but not this label~\ref{lable-in-ref}}
>> \label{fig1.1}
>> \end{figure}
>>
>
> That's relatively easy too. It's really just a combination of 2 and 3
>
>
>> 5- To be able to configure so you can get rid of everything that is inside
>> certain environments like:
>>
>> {align}
>> {align*}
>> {eqnarray*}
>> {equation*}
>>
>
> If pandoc treats these specially the prefiltering technique from 2 works.
> The plain writer deletes all unknown environments already.
>
>
>> 6- To be able to configure so you can preserve the arguments that are
>> inside certain environments like: \todo{}
>>
>
> $ pandoc -r latex -w plain
> \renewcommand{\todo}[1]{#1}
> \todo{everything}
> ^D
> everything
>
> That technique works for everything which pandoc doesn't treat specially.
> I'm not sure it works with actual environments as opposed to commands but a
> prefilter could just remove the `\begin...` and `\end...` lines.
>
>
>
>> 7- To get rid of things like:
>>
>>      \itemsep=-1pt
>>
>> that leaves behind things like:
>>      =-1pt
>>
>
> Also a good job for a prefilter
>
>
>> 8- To remove the column specs of a table like:
>>
>>      \begin{tabular}{|c|c|c|c|}
>>      A & B
>>      \end{tabular}
>>
>> that leaves behind:
>>
>>      |c|c|c|c| A & B
>>
>
> Can be solved.
>
>
>> 9- To properly deal with some specific commands like \lettrine{}
>>
>>      \lettrine{T}{his is the dropcap} in a text...
>>
>>      \lettrine{\textcolor{red}{T}}{his is the dropcap} in a text...
>>
>
> \renewcommand{\lettrine}[2]{#1#2}
> \renewcommand{\textcolor}[2]{#2}
>
>
>
>> in preserving the text inside.
>>
>> 10- To be able to redirect text in a particular language to a particular
>> file according to directives in Babel and Polyglossia.
>>
>> \begin{otherlanguage}{french}
>> Text en Francais ...
>> \end{otherlanguage}
>>
>> \foreignlanguage{french}{Le text en Francais}
>>
>> So that part of the text would be directed to "main_fr.txt" and other
>> languages appropriately.
>>
>
> That would be trickier, but not impossible. Actually reading the mind of
> babel and polyglossia would be out of scope, but recognising the commands
> and doing something appropriate should be possible.
>
> I assume that you would also like to turn things like emphasis, strikeout
> and sub/superscripts into naked text, since the plain writer currently
> doesn't:
>
> $ pandoc -r markdown -w plain
> *foo* **bar** ~~quux~~ ~1~ ^2^
> ^D
> _foo_ BAR ~~quux~~ ₁ ²
>
> That's easy with a filter.
>
>
> /bpj
>

[-- Attachment #2: Type: text/html, Size: 8713 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: a better platform for DeTeX - a question.
       [not found]                         ` <CAFVhNZMV6Se68AWFthHgkh2xx2QWHoAAxMMzZXh_fjqU_+bWHA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-10  0:33                           ` BP Jonsson
       [not found]                             ` <CAFC_yuSBX6fPUr9fKjhXVRdBYt=GX5_ooNyBqJCuYZ-cezRUBw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: BP Jonsson @ 2016-03-10  0:33 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 10001 bytes --]

Yes: a pre-filter modifying the LaTeX input, another filter modifying the
pandoc AST, and finally pandoc's plain writer. In practice I would write a
single script which reads in the LaTeX, filters it, shells out to pandoc to
convert it to JSON, converts the JSON into a data structure, modifies the
structure, converts it back to JSON and shells out to pandoc once more to
feed the modified AST to the plain writer.
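
The wrapper script described above might look roughly like the following
Python sketch. It is not the script discussed in this thread: the function
names, the single \itemsep rule and the set of inline types to unwrap are
all assumptions made for illustration. It relies only on pandoc's JSON
output, where inline elements are objects with a "t" (type) key and a "c"
(content) key.

```python
import json
import re
import subprocess

# Inline types whose wrapper is dropped while their children are kept
# (an illustrative selection, not an exhaustive one).
UNWRAP = {"Emph", "Strong", "Strikeout", "Subscript", "Superscript"}


def prefilter(latex):
    """Step 1: textual clean-up of the LaTeX source before pandoc sees it."""
    # Hypothetical rule: drop assignments such as \itemsep=-1pt, which
    # would otherwise leave "=-1pt" behind in the text.
    return re.sub(r"\\itemsep\s*=\s*-?[\d.]+\s*[a-z]{2}", "", latex)


def unwrap_inlines(node):
    """Step 3: walk the decoded JSON and splice formatting inlines away."""
    if isinstance(node, list):
        out = []
        for child in node:
            if isinstance(child, dict) and child.get("t") in UNWRAP:
                # Keep the children, lose the Emph/Strong/... wrapper.
                out.extend(unwrap_inlines(child["c"]))
            else:
                out.append(unwrap_inlines(child))
        return out
    if isinstance(node, dict):
        return {key: unwrap_inlines(value) for key, value in node.items()}
    return node


def run_pandoc(text, *args):
    """Shell out to pandoc, feeding text on stdin and returning stdout."""
    result = subprocess.run(
        ["pandoc", *args],
        input=text,
        stdout=subprocess.PIPE,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


def latex_to_text(latex):
    """Steps 1-4: prefilter, LaTeX -> JSON, edit the AST, JSON -> plain."""
    ast = json.loads(run_pandoc(prefilter(latex), "-f", "latex", "-t", "json"))
    return run_pandoc(json.dumps(unwrap_inlines(ast)), "-f", "json", "-t", "plain")
```

Calling latex_to_text() on the contents of a .tex file needs pandoc on the
PATH; the two pure functions can be tried without it. Because the walker
treats lists and objects generically, it should not depend on the exact
top-level layout of the JSON document.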

I have written a pandoc filter which alters the pandoc AST so that I get
POD (Perl's documentation format) when running it through the plain writer,
and I have taken advantage of the fact that pandoc's LaTeX reader can
interpret simple macro definitions and thus interpret commands it otherwise
wouldn't know. I have also prefiltered LaTeX source to make it more
palatable to pandoc, notably by replacing square brackets with curlies. The
regular expression engine in recent versions of Perl can easily match
nested bracketed constructs. I already have a monster regex which matches
and captures up to nine curly or square bracket pairs and their content,
allowing nested bracketed material to an arbitrary depth in the content. I
have used it to remove arguments, change brackets and wrap argument content
in further commands. These techniques could be combined to do a suitable
LaTeX-to-text conversion.
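
For readers without the Perl regex at hand, the same idea can be sketched
in Python. Python's built-in re module has no recursive patterns, so this
sketch swaps in a small hand-written brace scanner instead of a regex. The
names read_group and replace_command are made up for the example, and
optional [..] arguments are deliberately not handled.

```python
def read_group(src, start):
    """Return (content, index after the closing brace) for the balanced
    {...} group that opens at src[start]."""
    if start >= len(src) or src[start] != "{":
        raise ValueError("expected '{'")
    depth = 0
    i = start
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            i += 2  # skip an escaped character such as \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return src[start + 1:i], i + 1
        i += 1
    raise ValueError("unbalanced braces")


def replace_command(src, name, nargs, keep=()):
    """Rewrite every \\name followed by nargs brace groups.

    keep lists the 0-based arguments whose content survives, in order;
    an empty keep removes the command together with all its arguments."""
    marker = "\\" + name
    out = []
    pos = 0
    while True:
        hit = src.find(marker, pos)
        if hit == -1:
            out.append(src[pos:])
            return "".join(out)
        end = hit + len(marker)
        # Leave longer names alone, e.g. \refname when looking for \ref.
        if end < len(src) and src[end].isalpha():
            out.append(src[pos:end])
            pos = end
            continue
        args = []
        try:
            for _ in range(nargs):
                content, end = read_group(src, end)
                args.append(content)
        except ValueError:
            # Not followed by the expected groups: copy it unchanged.
            out.append(src[pos:hit + len(marker)])
            pos = hit + len(marker)
            continue
        out.append(src[pos:hit])
        out.extend(args[i] for i in keep)
        pos = end
```

With this, replace_command(text, "ref", 1) drops \ref{...} entirely, while
replace_command(text, "textcolor", 2, keep=(1,)) followed by
replace_command(text, "lettrine", 2, keep=(0, 1)) mirrors the two
\renewcommand lines suggested for item 9.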

#10 isn't a deal breaker unless you expect actual interaction with a LaTeX
run. Carving out the arguments of certain commands and textifying them
separately is entirely feasible, but it would involve running pandoc twice
for each language.
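
A minimal sketch of the carving-out step, assuming that only
\foreignlanguage{lang}{text} needs to be recognised: the otherlanguage
environment, escaped braces and the mapping from "french" to a file name
such as main_fr.txt are all left out. The names balanced_end and
split_by_language are hypothetical.

```python
from collections import defaultdict

MARKER = "\\foreignlanguage"


def balanced_end(src, start):
    """Return the index just past the } that matches the { at src[start]."""
    if start >= len(src) or src[start] != "{":
        raise ValueError("expected '{' at position %d" % start)
    depth = 0
    for i in range(start, len(src)):
        if src[i] == "{":
            depth += 1
        elif src[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced braces")


def split_by_language(src, main="main"):
    """Split LaTeX source into one string per language.

    Text inside \\foreignlanguage{lang}{...} goes to the bucket for lang,
    everything else stays in the bucket named by main."""
    buckets = defaultdict(list)
    pos = 0
    while True:
        hit = src.find(MARKER + "{", pos)
        if hit == -1:
            buckets[main].append(src[pos:])
            break
        buckets[main].append(src[pos:hit])
        lang_open = hit + len(MARKER)              # the { of the language
        lang_close = balanced_end(src, lang_open)
        # Assumes the text argument follows the language argument directly.
        text_close = balanced_end(src, lang_close)
        lang = src[lang_open + 1:lang_close - 1]
        buckets[lang].append(src[lang_close + 1:text_close - 1])
        pos = text_close
    return {
        lang: ("" if lang == main else "\n\n").join(parts)
        for lang, parts in buckets.items()
    }
```

Each value of the returned dictionary is still LaTeX, so it would then be
converted by its own pandoc run and written to its own file.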

Anyway, it is 1:30 AM here, so we will have to continue this conversation
tomorrow!

/bpj

On Wednesday, 9 March 2016, Paulo Ney de Souza <pauloney-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> BPJ, are you proposing that everything should be taken care of with a
> pre-filter and the plain writer, or is #10 a deal-breaker?
>
> Paulo Ney

[-- Attachment #2: Type: text/html, Size: 13202 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: a better platform for DeTeX - a question.
       [not found]                             ` <CAFC_yuSBX6fPUr9fKjhXVRdBYt=GX5_ooNyBqJCuYZ-cezRUBw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-03-10  6:58                               ` Paulo Ney de Souza
  2016-03-10  8:30                                 ` BP Jonsson
  0 siblings, 1 reply; 10+ messages in thread
From: Paulo Ney de Souza @ 2016-03-10  6:58 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 11206 bytes --]

I hope you will pardon my ignorance: I am familiar with a few of the steps
you describe, but not with most!

My question then would be:

Why would the double level filtering + AST meddling + multiple pandoc runs
be simpler than writing another writer for pandoc?

I was hoping that preparing another writer would be easier because I could
follow "plain" as the example and even borrow a lot of the code that is
already there!

Paulo Ney

[-- Attachment #2: Type: text/html, Size: 14652 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: a better platform for DeTeX - a question.
  2016-03-10  6:58                               ` Paulo Ney de Souza
@ 2016-03-10  8:30                                 ` BP Jonsson
  0 siblings, 0 replies; 10+ messages in thread
From: BP Jonsson @ 2016-03-10  8:30 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 2016-03-10 at 07:58, Paulo Ney de Souza wrote:
> Why would the double level filtering + AST meddling + multiple pandoc runs
> be simpler than writing another writer for pandoc?

Because you wouldn't need to learn Haskell! ;-)

Actually I'm thinking that you would also need to make a new reader which
does the right thing with stuff which the existing LaTeX reader squashes,
turns into structures inappropriate for this purpose or at best can
preserve as raw TeX. In the first filtering step of my wrapper-script
method you wouldn't be constrained by pandoc's document model. Also note
that in the first filter you wouldn't need to do a full parse, just replace
or remove some commands and/or all or part of their arguments. I already
have the basic tools and would only need to combine and modify them. At the
very least it would give an idea of what a more sophisticated program would
need to do.
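
To show how little the first filter needs to understand, here is a rough
Python sketch of the rename-and-redefine trick from earlier in the thread.
The tables MACROS and RENAMES and the name inject_macros are assumptions
for illustration, and prepending the definitions to the very top of the
file is only one guess at "an appropriate place".

```python
import re

# Definitions handed to pandoc's LaTeX reader ahead of the document.
# They follow the \renewcommand examples given earlier in this thread.
MACROS = [
    r"\newcommand{\removeMe}[1]{}",
    r"\renewcommand{\todo}[1]{#1}",
    r"\renewcommand{\lettrine}[2]{#1#2}",
    r"\renewcommand{\textcolor}[2]{#2}",
]

# Commands which pandoc treats specially get a new, unknown name first,
# so that the definition above applies to them.
RENAMES = {"ref": "removeMe"}


def inject_macros(latex):
    """Rename the special-cased commands and prepend the definitions."""
    for old, new in RENAMES.items():
        # (?![A-Za-z]) leaves longer names such as \refname untouched.
        pattern = r"\\" + re.escape(old) + r"(?![A-Za-z])"
        latex = re.sub(pattern, lambda match, new=new: "\\" + new, latex)
    return "\n".join(MACROS) + "\n" + latex
```

The result would be piped into "pandoc -r latex -w plain" as in the
examples above. No parsing takes place: the filter only renames commands
and adds four lines of definitions.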

/bpj

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-03-10  8:30 UTC | newest]

Thread overview: 10+ messages
2016-03-08 19:40 a better platform for DeTeX - a question Paulo Ney de Souza
     [not found] ` <CAFVhNZOiGeUoAEzJ=aJGecJhw+Yo8qSjSHdyP6iidYBV7DQgug-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-09  5:05   ` John MacFarlane
     [not found]     ` <20160309050508.GD68594-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
2016-03-09  5:13       ` Paulo Ney de Souza
     [not found]         ` <CAFVhNZMLJKTmSuuj2GMDCi0xiay8HadM+uogKrq03K_sQQnGDw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-09  8:23           ` BPJ
     [not found]             ` <CADAJKhAHFrRUKSUnyibgTSCLVBeyHcgQOKxO=qs=z4fVjwyo4g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-09 17:21               ` Paulo Ney de Souza
     [not found]                 ` <CAFVhNZPqAO2gVh8k3_ccG+4POUGD+JQ4ZtjAai4BeSMU6BLTmw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-09 20:22                   ` BP Jonsson
     [not found]                     ` <56E0861F.6050608-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2016-03-09 22:11                       ` Paulo Ney de Souza
     [not found]                         ` <CAFVhNZMV6Se68AWFthHgkh2xx2QWHoAAxMMzZXh_fjqU_+bWHA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-10  0:33                           ` BP Jonsson
     [not found]                             ` <CAFC_yuSBX6fPUr9fKjhXVRdBYt=GX5_ooNyBqJCuYZ-cezRUBw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-03-10  6:58                               ` Paulo Ney de Souza
2016-03-10  8:30                                 ` BP Jonsson
