Re: Summer project ideas

public inbox archive for pandoc-discuss@googlegroups.com

From: Julien Dutant <julien.dutant-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Summer project ideas
Date: Fri, 29 Jan 2021 10:59:21 -0800 (PST)
Message-ID: <6036a733-df54-4985-bbab-beb743c032adn@googlegroups.com>
In-Reply-To: <875z3fu9fc.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>

Speaking of projects, I'm working on a pandoc workflow to produce an 
academic journal (dialectica, John will know it 
http://dialectica.philosophie.ch/) that's flipping open access. It's 
complementary to what John, Albert and others have been doing on the 
authoring side of things. The aims are to make copy-editing simpler, more
accessible, more affordable, and independent of commercial tools, with the same
or better quality than what we got from Wiley-Blackwell. We convert all
manuscripts to markdown + bibtex, edit them in markdown, and produce JATS
XML, HTML and PDF (via LaTeX) from that source, using pandoc, citeproc and
custom lua filters. I should say that not only would this of course be
impossible without pandoc, but there are also many details in its design
that turned out to be very useful for the process (e.g. the defaults file).
I've been working with a small team of copyeditors on this. Most started 
out without special expertise beyond writing or having written a PhD and 
they're now fluent with markdown and editing rules, as we hoped. However,
there's still quite a bit of technical knowledge required here and there,
e.g. when LaTeX breaks down or a script doesn't work on Windows.
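
To make the "one source, three outputs" step concrete, here is a minimal Python sketch that only builds the pandoc command lines; it is not our actual build script. The file names (article.md, refs.bib, journal.yaml, fix.lua) and the choice of xelatex as PDF engine are assumptions for illustration. The pandoc options themselves (--citeproc, --bibliography, --defaults, --lua-filter, --pdf-engine) are standard ones.

```python
import shlex
from pathlib import Path


def pandoc_commands(source, bibliography, filters=(), defaults=None):
    """Build one pandoc command line per output format from one markdown source."""
    stem = Path(source).stem
    # Options shared by every output format.
    common = ["pandoc", source, "--from", "markdown", "--standalone",
              "--citeproc", "--bibliography", bibliography]
    if defaults is not None:
        common += ["--defaults", defaults]
    # pandoc runs filters in command-line order, so these run after citeproc.
    for lua_filter in filters:
        common += ["--lua-filter", lua_filter]
    targets = {
        "jats": ["--to", "jats", "--output", f"{stem}.xml"],
        "html": ["--to", "html", "--output", f"{stem}.html"],
        # For PDF, pandoc goes through LaTeX; the engine here is an assumption.
        "pdf": ["--pdf-engine", "xelatex", "--output", f"{stem}.pdf"],
    }
    return {name: common + extra for name, extra in targets.items()}


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    commands = pandoc_commands("article.md", "refs.bib",
                               filters=["fix.lua"], defaults="journal.yaml")
    for name, command in commands.items():
        print(name, "->", shlex.join(command))
```

Printing the commands instead of running them keeps the sketch usable on a machine without pandoc or LaTeX; passing each list to subprocess.run would execute it.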

From this perspective here are the things I'm currently thinking would help 
a project like ours: 

1. Cross-referencing would be good indeed. We now use pandoc-crossref. It'd
be good to handle cross-references to footnotes (see
https://github.com/lierdakil/pandoc-crossref/issues/215; at the moment we
use links with the footnote number manually entered).

2. Implementing the new table features as well (we had a few multi-row / 
multi-col tables, currently hard-coded in the relevant outputs).

3. Docx reader that picks up Zotero/Endnote/Mendeley *citations*. At the 
moment the most time-consuming copyediting task is to replace manuscript 
citations with markdown tags. If Pandoc was able to pick up 
Zotero/Endnote/Mendeley citations in docx sources, we could consider asking 
the authors to insert them.

Would John's "architectural improvement" for the docx reader/writer make 
developing that feature easier? If so that'd be a big plus I'd think. 

4. A good visual editor for pandoc's markdown. (Read on to see what that 
has to do with pandoc itself.) Copyeditors like it and it's needed if we 
want to involve authors. My dream would be to have an equivalent of Google 
Docs that would allow authors to do revisions online (something in the air, 
cf https://www.authorea.com/) but a desktop one would be great. IMHO the 
best at the moment is RStudio's 
(https://rstudio.github.io/visual-markdown-editing/#/ just released but 
we've been using the beta preview for a few months.) But one with an even 
better potential I think is LyX: it's faster (C++ rather than javascript), 
designed for LaTeX so covering a lot of academic writing features, with a 
module class+extension packages that could accommodate markdown files, 
custom markup and perhaps even lua filters. All it would take to enable 
this would be for pandoc to be able to read LyX's own format (requested
here https://github.com/jgm/pandoc/issues/5555 but without noting that this
would turn LyX into one of the best markdown visual editors available).

The good news here is that the LyX team looks set to make their 
(undocumented, custom) format into an XML format 
(https://wiki.lyx.org/Devel/XML). 

So, getting to the point: could John's "architectural improvement" for the 
docx reader/writer be something that is flexible enough to enable us to 
create new readers for other XML formats? Would it be worth having a
customizable XML reader that we could feed with custom XML->Pandoc JSON
maps, or is that better done by XSLT?

5. Generate custom XML for Indesign (and Scribus?). The biggest technical 
hurdles we've faced actually came from LaTeX: PDF generation crashes, the 
copyeditor gets an error report (gets confused by the line number, which is 
from .tex not from their .md), and we have to dig through to find the 
source (weird code in the .bib, people's LaTeX distribution needing an 
update, etc.). Final touches aren't easy (try to prevent a pagebreak
between paragraphs). That makes the workflow inaccessible to teams without
a house LaTeX expert and too onerous for those with design-heavy book /
journal projects. So it would be good to have an alternative that uses other
typesetting engines.

I haven't looked into this in detail, so perhaps Pandoc already has all 
that's needed to do this: I'm thinking of a combination of JATS XML + raw 
XML markup + custom JATS XML template + lua filters as needed. If not, 
perhaps that's another area where a customizable XML writer and/or John's 
"architectural improvement" could help. 
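
To illustrate what a "custom XML->Pandoc JSON map" (point 4) could look like, here is a rough Python sketch, not a proposal for pandoc's actual design. The XML element names (doc, para, emph, strong) are hypothetical, and the "pandoc-api-version" value is an assumption that would have to match the installed pandoc. The JSON shapes for Para, Emph, Strong, Str and Space follow pandoc's JSON AST.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical maps from XML element names to pandoc constructors.
INLINE_MAP = {"emph": "Emph", "strong": "Strong"}
BLOCK_MAP = {"para": "Para"}


def text_to_inlines(text):
    """Split plain text into pandoc Str and Space inlines."""
    if not text:
        return []
    inlines = []
    if text[0].isspace():
        inlines.append({"t": "Space"})
    words = text.split()
    for i, word in enumerate(words):
        if i > 0:
            inlines.append({"t": "Space"})
        inlines.append({"t": "Str", "c": word})
    if words and text[-1].isspace():
        inlines.append({"t": "Space"})
    return inlines


def element_to_inlines(elem):
    """Convert an element's mixed content (text, children, tails) to inlines."""
    inlines = text_to_inlines(elem.text)
    for child in elem:
        inner = element_to_inlines(child)
        constructor = INLINE_MAP.get(child.tag)
        if constructor is None:
            inlines.extend(inner)  # unmapped tag: keep the text, drop the markup
        else:
            inlines.append({"t": constructor, "c": inner})
        inlines.extend(text_to_inlines(child.tail))
    return inlines


def xml_to_pandoc(xml_string):
    """Map the children of the root element to pandoc blocks via BLOCK_MAP."""
    root = ET.fromstring(xml_string)
    blocks = []
    for child in root:
        constructor = BLOCK_MAP.get(child.tag)
        if constructor is not None:  # unmapped block elements are skipped
            blocks.append({"t": constructor, "c": element_to_inlines(child)})
    # The API version is an assumption; pandoc checks it when reading JSON.
    return {"pandoc-api-version": [1, 22], "meta": {}, "blocks": blocks}


if __name__ == "__main__":
    sample = "<doc><para>Hello <emph>big</emph> world</para></doc>"
    print(json.dumps(xml_to_pandoc(sample)))
```

The output could then be piped into `pandoc -f json -t markdown`. Whether such maps are easier to maintain than XSLT is exactly the open question; the sketch only shows that the mapping itself can be small.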

Best,
Julien
On Friday, January 29, 2021 at 2:04:02 PM UTC Albert Krewinkel wrote:

>
> John MacFarlane <j...-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> writes:
>
> > It would be an architectural improvement to create a Docx type
> > that represents a docx document (perhaps this could be in a
> > separate module); this could be an intermediate for both the
> > reader and the writer. (Cf. what we currently do with ipynb.)
> > A similar approach could make sense for EPUB as well.
>
> I like this. The project would be self-contained but still very useful
> for pandoc.
>
> > Replace the XML library: I assume you mean by using
> > another library instead of xml-light? Or do you mean creating
> > a new library?
>
> The library that I keep dreaming about would be similar to the current
> `xml` library, but built on DocLayout, with strictness improvements, and
> an extensive benchmarking suite. I'd be equally okay with using a
> different library like xml-conduit, but that's probably not as
> interesting a project.
>
> I'm unsure how much domain knowledge is required for our other ideas,
> maybe projects with a high ratio of Haskell are better suited.
>
> I'll draft something up and post here once I'm done.
>
>
> > Albert Krewinkel <albert...-9EawChwDxG8hFhg+JK9F0w@public.gmane.org> writes:
> >
> >> Google's Summer of Code project will run again this year, and
> >> Haskell.org started to collect ideas. I'd be interested to participate
> >> again, so I began to think about possible projects. My favorites so far:
> >>
> >> - Improve or replace the xml library; See also
> >> https://github.com/jgm/pandoc/issues/5854
> >>
> >> - Add full support for figures.
> >> https://github.com/jgm/pandoc/issues/3177
> >> https://github.com/jgm/pandoc-types/pull/83
> >>
> >> - Better table support for more formats.
> >>
> >> - OOXML writer, similar to the OpenDocument writer. This could make
> >> docx and pptx transformations simpler to debug and easier to test.
> >>
> >> Which of the above are most important and would be suitable for a GSoC
> >> project? Or maybe something else from the "List of projects" would be
> >> even better? https://github.com/jgm/pandoc/issues/5581
> >>
> >> --
> >> Albert Krewinkel
> >> GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
>
>
> --
> Albert Krewinkel
> GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
>
