From: Jeremy Conlin
Newsgroups: gmane.text.pandoc
Subject: Re: Approach to converting large, custom, LaTeX document to restructured text
Date: Fri, 11 Sep 2020 06:31:44 -0700 (PDT)
To: pandoc-discuss

Thank you for your response, John.

Upon closer inspection, I think my initial assumptions were incorrect. I thought pandoc had found a command/environment that it didn't understand, but now the problem seems more obscure.

I ran pandoc with this command: "pandoc File.tex -t json --verbose" and got the following output:

```
(lots of messages about Skipped and Parsing unescaped '&')
[INFO] Skipped '\bottomrule' at line 1849 column 16
[INFO] Skipped '\begin{tabular}' at line 1823 column 18
[INFO] Skipped '\end{tabular}' at line 1850 column 16
[INFO] Skipped '\subexperiment{SAP}' at line 1854 column 20

Error at "source" (line 1855, column 12):
unexpected [
Additional details are found in the following paragraphs.
           ^
```
The caret should point to the "d" in "details".

So I'm not sure why pandoc found what it thought was an "unexpected [". I couldn't find a bracket in the preceding few dozen lines, but I did find one a few lines after it. Does the message mean something obscure?
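
For anyone chasing a similar report, one way to see where the nearest brackets really are is a small script along the following lines. This is only a sketch: the name `find_brackets_near` and the default radius of 5 lines are made up for illustration and are not part of pandoc. It just lists the 1-based line and column of every `[` close to the line named in the error; it does not explain why the parser stopped there.

```python
def find_brackets_near(lines, error_line, radius=5, char="["):
    """Return (line, column) pairs, both 1-based, for every `char`
    found within `radius` lines of `error_line`.

    `lines` is the LaTeX source already split into lines. This is a
    hypothetical helper for inspecting a file, not a pandoc API.
    """
    # Clamp the window to the lines that actually exist.
    first = max(1, error_line - radius)
    last = min(len(lines), error_line + radius)
    hits = []
    for lineno in range(first, last + 1):
        text = lines[lineno - 1]
        for col, ch in enumerate(text, start=1):
            if ch == char:
                hits.append((lineno, col))
    return hits
```

Reading the file with `open("File.tex", encoding="utf-8").read().splitlines()` and calling `find_brackets_near(lines, 1855)` would cover the position reported above (the encoding is an assumption).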

Thanks for your help.
Jeremy

$ pandoc --version
pandoc 2.10
Compiled with pandoc-types 1.21, texmath 0.12.0.2, skylighting 0.8.5
Default user data directory: /Users/jlconlin/.local/share/pandoc or /Users/jlconlin/.pandoc
Copyright (C) 2006-2020 John MacFarlane
Web:  https://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.


On Thursday, September 10, 2020 at 6:50:28 PM UTC-6 John MacFarlane wrote:

It really depends on the details of the document, but if
pandoc is struggling with certain commands and environments,
one approach is to define custom macros for those, which
convert them into something pandoc can handle.

(In a few cases you might get away with just putting the .sty
file in the working directory, so pandoc tries to parse it,
but pandoc usually can't handle the lower-level tex definitions
style files have, so this usually doesn't work.)

For example, if you have a foobar command, just
add this to your document

\renewcommand{\foobar}[2]{limit yourself
here to stuff pandoc can handle}

You can often get pretty far with this method.
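
The macro approach can also be combined with a Python preprocessing step. The sketch below is illustrative only: `add_macro_overrides` and `OVERRIDES` are hypothetical names, and the override assumes that `\subexperiment` takes a single argument and can be treated as a subsection heading, which the thread does not confirm.

```python
BEGIN_DOC = r"\begin{document}"

# Hypothetical override: assumes \subexperiment takes one argument and
# can be rendered as a subsection heading.
OVERRIDES = [r"\renewcommand{\subexperiment}[1]{\subsection{#1}}"]


def add_macro_overrides(source, overrides):
    """Insert LaTeX macro definitions into `source` and return the result.

    If the source contains the begin-document marker, the definitions go
    right after it, so they come later than any preamble definitions.
    Otherwise they are prepended, which suits chapter files that have no
    preamble of their own.
    """
    block = "\n".join(overrides) + "\n"
    idx = source.find(BEGIN_DOC)
    if idx == -1:
        return block + source
    cut = idx + len(BEGIN_DOC)
    return source[:cut] + "\n" + block + source[cut:]
```

The returned string could then be written to a temporary file or piped to something like `pandoc -f latex -t rst`, repeating the cycle as more troublesome commands turn up.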

Jeremy Conlin <jlco...@gmail.com> writes:

> I have a large (900 page) LaTeX document (broken up into several LaTeX
> files) that I want to convert into restructured text. I've already tried to
> use pandoc to convert some of the files and it has failed for a few
> reasons.
>
> I'm a new pandoc user, but I figure I'm going to have to write my own
> converter. Before I do, I wanted to ask this forum what the right way to
> approach the conversion is. I was planning on reading everything into Python,
> doing my own search/replace, and then passing the result on to pandoc. I would
> then rinse/repeat until I have everything the way I want it.
>
> I know there are filters and such that I can write to customize things, but
> (as a beginner) I'm not sure if it would be easier to learn pandoc syntax
> and write my own filter, or just go at it in Python as I described above.
>
> I don't mind doing it either way; I think it might be a fun side project to
> do when I'm procrastinating on what I really should be doing.
>
> Please advise on what the right approach is. I'm sure there are other
> approaches too that I'm not aware of. I'm open to suggestions.
>
> Thanks,
> Jeremy
>
> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/9c40cd2c-9874-446b-8772-c8a99e377acan%40googlegroups.com.
