Approach to converting large, custom, LaTeX document to restructured text

public inbox archive for pandoc-discuss@googlegroups.com

* Approach to converting large, custom, LaTeX document to restructured text
@ 2020-09-10 21:43 Jeremy Conlin
       [not found] ` <9c40cd2c-9874-446b-8772-c8a99e377acan-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Jeremy Conlin @ 2020-09-10 21:43 UTC (permalink / raw)
  To: pandoc-discuss


I have a large (900-page) LaTeX document (split across several LaTeX 
files) that I want to convert to reStructuredText. I've already tried 
using pandoc to convert some of the files, and it failed for a few 
reasons. 

I'm a new pandoc user, but I figure I'm going to have to write my own 
converter. Before I do, I wanted to ask this forum what the right way to 
approach the conversion is. I was planning on reading everything into Python, 
doing my own search/replace, and then passing the result on to pandoc. I would 
then rinse and repeat until I have everything the way I want it. 
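
The search/replace-then-pandoc loop described above could be sketched roughly
as follows. This is only an illustration under stated assumptions: the rule
that rewrites \subexperiment{...} into \paragraph{...} is a made-up example
(the command name appears in the log later in this thread; the replacement is
an assumption), the names `RULES`, `preprocess`, and `convert_to_rst` are
hypothetical, and pandoc is assumed to be on the PATH.

```python
import re
import subprocess

# Hypothetical rewrite rules: (regex pattern, replacement template).
# Each rule turns a custom command into something pandoc already understands.
RULES = [
    (r"\\subexperiment\{([^}]*)\}", r"\\paragraph{\1}"),
]


def preprocess(tex, rules=RULES):
    """Apply every search/replace rule, in order, to the LaTeX source."""
    for pattern, replacement in rules:
        tex = re.sub(pattern, replacement, tex)
    return tex


def convert_to_rst(tex):
    """Pipe LaTeX source through pandoc and return the reStructuredText."""
    result = subprocess.run(
        ["pandoc", "--from", "latex", "--to", "rst"],
        input=tex,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pandoc exits non-zero
    )
    return result.stdout
```

A typical round would read a file, call `convert_to_rst(preprocess(text))`,
inspect the result, add or adjust a rule, and repeat.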

I know there are filters and such that I can write to customize things, but 
(as a beginner) I'm not sure if it would be easier to learn pandoc syntax 
and write my own filter, or just go at it in Python as I described above.

I don't mind doing it either way; I think it might be a fun side project to 
do when I'm procrastinating doing what I really should be doing. 

Please advise on the right approach. I'm sure there are other 
approaches that I'm not aware of, too. I'm open to suggestions.

Thanks,
Jeremy

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/9c40cd2c-9874-446b-8772-c8a99e377acan%40googlegroups.com.


* Re: Approach to converting large, custom, LaTeX document to restructured text
       [not found] ` <9c40cd2c-9874-446b-8772-c8a99e377acan-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-09-11  0:50   ` John MacFarlane
       [not found]     ` <m27dt1qgqk.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: John MacFarlane @ 2020-09-11  0:50 UTC (permalink / raw)
  To: Jeremy Conlin, pandoc-discuss


It really depends on the details of the document, but if
pandoc is struggling with certain commands and environments,
one approach is to define custom macros for those, which
convert them into something pandoc can handle.

(In a few cases you might get away with just putting the .sty
file in the working directory, so that pandoc tries to parse it,
but pandoc generally can't handle the lower-level TeX definitions
that style files contain, so this usually doesn't work.)

For example, if you have a \foobar command, just
add this to your document:

\renewcommand{\foobar}[2]{limit yourself
here to stuff pandoc can handle}

You can often get pretty far with this method.
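
When the document is split across many files, the macro approach above could
be automated by prepending the same redefinitions to each file before handing
it to pandoc. The following is a minimal sketch, not a recommended
implementation: the `SHIMS` table, the helper names, and the choice of
\textbf as a stand-in body for \subexperiment are all assumptions made for
illustration.

```python
# Hypothetical shim table: command name -> (number of arguments, simple body).
# The bodies should be limited to LaTeX that pandoc can handle.
SHIMS = {
    "subexperiment": (1, r"\textbf{#1}"),
}


def shim_line(name, nargs, body):
    """Build one \\renewcommand line for the given command."""
    return "\\renewcommand{\\%s}[%d]{%s}" % (name, nargs, body)


def with_shims(tex, shims=SHIMS):
    """Return the LaTeX source with all shim definitions prepended."""
    preamble = "\n".join(
        shim_line(name, nargs, body) for name, (nargs, body) in shims.items()
    )
    return preamble + "\n" + tex
```

Keeping the redefinitions in one table means every file in the project is
converted with the same set of simplified commands.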




* Re: Approach to converting large, custom, LaTeX document to restructured text
       [not found]     ` <m27dt1qgqk.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2020-09-11 13:31       ` Jeremy Conlin
       [not found]         ` <d8e598ff-e975-420d-baee-523f9ab38e35n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Jeremy Conlin @ 2020-09-11 13:31 UTC (permalink / raw)
  To: pandoc-discuss



Thank you for your response, John. 

Upon closer inspection, I think my initial assumptions were incorrect. I 
thought pandoc had found a command/environment that it didn't understand, 
but now it seems more obscure. 

I ran pandoc with this command: "pandoc File.tex -t json --verbose" and 
got the following output:

```
(lots of messages about Skipped and Parsing unescaped '&')
[INFO] Skipped '\bottomrule' at line 1849 column 16
[INFO] Skipped '\begin{tabular}' at line 1823 column 18
[INFO] Skipped '\end{tabular}' at line 1850 column 16
[INFO] Skipped '\subexperiment{SAP}' at line 1854 column 20

Error at "source" (line 1855, column 12):
unexpected [
Additional details are found in the following paragraphs.
           ^
```
The caret should point to the d in details.

So I'm not sure why pandoc found what it thought was an "unexpected [". I 
couldn't find a bracket in the preceding few dozen lines, but I did find 
one in the few lines afterwards. Does the message mean something obscure?
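
With this many "Skipped" messages, it can help to tally which commands pandoc
is dropping before hunting for the actual error. The sketch below parses log
text in the format shown above; it assumes the messages always follow the
pattern "[INFO] Skipped '...' at line N column M", and the function names are
hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   [INFO] Skipped '\bottomrule' at line 1849 column 16
SKIPPED = re.compile(
    r"\[INFO\] Skipped '(?P<token>.*)' at line (?P<line>\d+) column (?P<col>\d+)"
)


def parse_skipped(log_text):
    """Return a list of (token, line, column) for every 'Skipped' message."""
    return [
        (m["token"], int(m["line"]), int(m["col"]))
        for m in SKIPPED.finditer(log_text)
    ]


def count_skipped(log_text):
    """Count how often each command or environment was skipped."""
    return Counter(token for token, _, _ in parse_skipped(log_text))
```

Calling `count_skipped(log).most_common()` on the captured output lists the
skipped commands from most to least frequent, which suggests where a custom
macro definition would pay off first.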

Thanks for your help.
Jeremy

$ pandoc --version
pandoc 2.10
Compiled with pandoc-types 1.21, texmath 0.12.0.2, skylighting 0.8.5
Default user data directory: /Users/jlconlin/.local/share/pandoc or /Users/jlconlin/.pandoc
Copyright (C) 2006-2020 John MacFarlane
Web:  https://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.




* Re: Approach to converting large, custom, LaTeX document to restructured text
       [not found]         ` <d8e598ff-e975-420d-baee-523f9ab38e35n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-09-11 15:07           ` John MacFarlane
  0 siblings, 0 replies; 4+ messages in thread
From: John MacFarlane @ 2020-09-11 15:07 UTC (permalink / raw)
  To: Jeremy Conlin, pandoc-discuss


Sorry, the messages aren't always that helpful.

But you can try, e.g., creating a document with the part
up to, say, line 1800, and then adding stuff gradually;
this often tells you what the problem is.
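
The grow-the-document approach described above could be scripted along these
lines. This is a sketch under assumptions: the function names, the default
step of 50 lines, and the use of pandoc's exit status as the success test are
all illustrative choices, and cutting a file in the middle of an environment
can itself produce an error, so the reported range is only a hint.

```python
import subprocess


def pandoc_accepts(tex):
    """Return True if pandoc parses the given LaTeX without a fatal error."""
    result = subprocess.run(
        ["pandoc", "--from", "latex", "--to", "json"],
        input=tex,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def first_failing_chunk(lines, converts=pandoc_accepts, start=0, step=50):
    """Grow the document `step` lines at a time, beginning with lines[:start].

    Returns (good, bad) where lines[:good] converts and lines[:bad] does not,
    or None if the whole document converts. `lines` should keep their line
    endings, e.g. from text.splitlines(keepends=True).
    """
    good = start
    if not converts("".join(lines[:good])):
        raise ValueError("the starting prefix already fails to convert")
    while good < len(lines):
        end = min(good + step, len(lines))
        if not converts("".join(lines[:end])):
            return good, end
        good = end
    return None
```

For the file in this thread, one might call it with `start=1800` and then
rerun with a smaller `step` inside the reported range to narrow things down.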


