From: Julien Dutant <julien.dutant-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Summer project ideas
Date: Fri, 29 Jan 2021 10:59:21 -0800 (PST)
Message-ID: <6036a733-df54-4985-bbab-beb743c032adn@googlegroups.com>
In-Reply-To: <875z3fu9fc.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
Speaking of projects, I'm working on a pandoc workflow to produce an
academic journal (dialectica, John will know it
http://dialectica.philosophie.ch/) that's flipping to open access. It's
complementary to what John, Albert and others have been doing on the
authoring side of things. The aims are to make copy-editing simpler, more
accessible, more affordable, and independent of commercial tools, with the
same or better quality than what we got from Wiley-Blackwell. We convert all
manuscripts to markdown + bibtex, edit them in markdown, and produce JATS
XML, HTML and PDF (via LaTeX) from that source, using pandoc, citeproc and
custom lua filters. I should say that not only would this of course be
impossible without pandoc, but there are also many details in its design
that turned out to be very useful for the process (e.g. the defaults file).
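To give a rough idea of the build step, here is a minimal sketch in Python of how the three outputs could be driven from one markdown source. The file names (`article.md`, `journal.yaml`, `fix-quotes.lua`) and the `pandoc_command` helper are hypothetical, made up for illustration; our actual scripts differ. It only assembles the command lines and does not run pandoc.

```python
from pathlib import Path

# Output format -> file extension. With "--to latex" and a .pdf output
# file, pandoc produces the PDF via LaTeX.
FORMATS = {
    "jats": ".xml",
    "html": ".html",
    "latex": ".pdf",
}


def pandoc_command(source, fmt, defaults="journal.yaml", filters=()):
    """Build the pandoc argument list for one output format.

    `defaults` and `filters` are hypothetical file names (assumptions).
    """
    out = Path(source).with_suffix(FORMATS[fmt])
    cmd = [
        "pandoc", str(source),
        "--from", "markdown",
        "--to", fmt,
        "--standalone",
        "--citeproc",            # resolve citations from the bibliography
        "--defaults", defaults,  # shared settings live in the defaults file
        "--output", str(out),
    ]
    for lua_filter in filters:
        cmd += ["--lua-filter", lua_filter]
    return cmd


if __name__ == "__main__":
    for fmt in FORMATS:
        print(" ".join(pandoc_command("article.md", fmt, filters=["fix-quotes.lua"])))
```

Each list could be passed to `subprocess.run` as is; keeping the shared options in the defaults file is what keeps the per-format differences this small.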
I've been working with a small team of copyeditors on this. Most started
out without special expertise beyond writing or having written a PhD and
they're now fluent with markdown and editing rules, as we hoped. However,
there's still quite a bit of technical knowledge required here and there,
e.g. when LaTeX breaks down or a script doesn't work on Windows.
From this perspective here are the things I'm currently thinking would help
a project like ours:
1. Cross-referencing would be good indeed. We now use pandoc-crossref. It'd
be good to handle cross-references to footnotes (see
https://github.com/lierdakil/pandoc-crossref/issues/215 , at the moment we
use links with the footnote number manually entered).
2. Implementing the new table features as well (we had a few multi-row /
multi-col tables, currently hard-coded in the relevant outputs).
3. Docx reader that picks up Zotero/Endnote/Mendeley *citations*. At the
moment the most time-consuming copyediting task is to replace manuscript
citations with markdown tags. If Pandoc was able to pick up
Zotero/Endnote/Mendeley citations in docx sources, we could consider asking
the authors to insert them.
Would John's "architectural improvement" for the docx reader/writer make
developing that feature easier? If so that'd be a big plus I'd think.
4. A good visual editor for pandoc's markdown. (Read on to see what that
has to do with pandoc itself.) Copyeditors like it and it's needed if we
want to involve authors. My dream would be to have an equivalent of Google
Docs that would allow authors to do revisions online (something in the air,
cf https://www.authorea.com/) but a desktop one would be great. IMHO the
best at the moment is RStudio's
(https://rstudio.github.io/visual-markdown-editing/#/ just released but
we've been using the beta preview for a few months.) But one with an even
better potential I think is LyX: it's faster (C++ rather than javascript),
designed for LaTeX so covering a lot of academic writing features, with a
module class+extension packages that could accommodate markdown files,
custom markup and perhaps even lua filters. All it would take to enable
this would be for pandoc to be able to read LyX's own format (requested
here https://github.com/jgm/pandoc/issues/5555 but without noting that this
would turn LyX into one of the best markdown visual editors available).
The good news here is that the LyX team looks set to make their
(undocumented, custom) format into an XML format
(https://wiki.lyx.org/Devel/XML).
So, getting to the point: could John's "architectural improvement" for the
docx reader/writer be something that is flexible enough to enable us to
create new readers for other XML formats? Would it be worth having a
customizable XML reader that we could feed with custom XML->Pandoc JSON
maps, or is that better done by XSLT?
5. Generate custom XML for Indesign (and Scribus?). The biggest technical
hurdles we've faced actually came from LaTeX: PDF generation crashes, the
copyeditor gets an error report (gets confused by the line number, which is
from .tex not from their .md), and we have to dig through to find the
source (weird code in the .bib, people's LaTeX distribution needing an
update, etc.). Final touches aren't easy (try to prevent a pagebreak
between paragraphs). That makes the workflow inaccessible to teams without
an in-house LaTeX expert and too onerous for those with design-heavy book /
journal projects. So it would be good to have an alternative that uses other
typesetting engines.
I haven't looked into this in detail, so perhaps Pandoc already has all
that's needed to do this: I'm thinking of a combination of JATS XML + raw
XML markup + custom JATS XML template + lua filters as needed. If not,
perhaps that's another area where a customizable XML writer and/or John's
"architectural improvement" could help.
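To make the "customizable XML writer" idea in point 5 a bit more concrete, here is a minimal sketch in Python, not an existing pandoc feature. It walks pandoc's JSON AST (what `pandoc -t json` emits) and maps node types to element names through a small table. The element names (`story`, `para`, `heading`, `italic`, `bold`) are hypothetical and are not InDesign's or Scribus's actual import schema; only a handful of node types are handled.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical mapping from pandoc AST node types to custom element names.
BLOCK_TAGS = {"Para": "para", "Header": "heading"}
INLINE_TAGS = {"Emph": "italic", "Strong": "bold"}

# A tiny document in pandoc's JSON AST form.
SAMPLE = """
{"pandoc-api-version": [1, 22], "meta": {}, "blocks": [
  {"t": "Header", "c": [1, ["intro", [], []], [{"t": "Str", "c": "Intro"}]]},
  {"t": "Para", "c": [{"t": "Str", "c": "Hello"}, {"t": "Space"},
                      {"t": "Emph", "c": [{"t": "Str", "c": "world"}]},
                      {"t": "Str", "c": "!"}]}
]}
"""


def add_text(parent, text):
    """Append text to an element, after its last child if it has one."""
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def write_inlines(parent, inlines):
    """Convert a list of pandoc inline nodes into children of `parent`."""
    for node in inlines:
        kind = node["t"]
        if kind == "Str":
            add_text(parent, node["c"])
        elif kind in ("Space", "SoftBreak"):
            add_text(parent, " ")
        elif kind in INLINE_TAGS:
            write_inlines(ET.SubElement(parent, INLINE_TAGS[kind]), node["c"])
        # Other inline types are silently skipped in this sketch.


def to_custom_xml(doc):
    """Render the supported blocks of a pandoc JSON document as custom XML."""
    root = ET.Element("story")
    for block in doc["blocks"]:
        kind = block["t"]
        if kind == "Para":
            write_inlines(ET.SubElement(root, BLOCK_TAGS[kind]), block["c"])
        elif kind == "Header":
            level, _attr, inlines = block["c"]
            element = ET.SubElement(root, BLOCK_TAGS[kind], level=str(level))
            write_inlines(element, inlines)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_custom_xml(json.loads(SAMPLE)))
```

The point of the table-driven design is that a copyeditor-facing project would only edit the two mapping dictionaries; whether this is better done here, in a lua filter, or with XSLT on the JATS output is exactly the open question.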
Best,
Julien
On Friday, January 29, 2021 at 2:04:02 PM UTC Albert Krewinkel wrote:
>
> John MacFarlane <j...-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> writes:
>
> > It would be an architectural improvement to create a Docx type
> > that represents a docx document (perhaps this could be in a
> > separate module); this could be an intermediate for both the
> > reader and the writer. (Cf. what we currently do with ipynb.)
> > A similar approach could make sense for EPUB as well.
>
> I like this. The project would be self-contained but still very useful
> for pandoc.
>
> > Replace the XML library: I assume you mean by using
> > another library instead of xml-light? Or do you mean creating
> > a new library?
>
> The library that I keep dreaming about would be similar to the current
> `xml` library, but built on DocLayout, with strictness improvements, and
> an extensive benchmarking suite. I'd be equally okay with using a
> different library like xml-conduit, but that's probably not as
> interesting a project.
>
> I'm unsure how much domain knowledge is required for our other ideas,
> maybe projects with a high ratio of Haskell are better suited.
>
> I'll draft something up and post here once I'm done.
>
>
> > Albert Krewinkel <albert...-9EawChwDxG8hFhg+JK9F0w@public.gmane.org> writes:
> >
> >> Google's Summer of Code project will run again this year, and
> >> Haskell.org started to collect ideas. I'd be interested to participate
> >> again, so I began to think about possible projects. My favorites so far:
> >>
> >> - Improve or replace the xml library; See also
> >> https://github.com/jgm/pandoc/issues/5854
> >>
> >> - Add full support for figures.
> >> https://github.com/jgm/pandoc/issues/3177
> >> https://github.com/jgm/pandoc-types/pull/83
> >>
> >> - Better table support for more formats.
> >>
> >> - OOXML writer, similar to the OpenDocument writer. This could make
> >> docx and pptx transformations simpler to debug and easier to test.
> >>
> >> Which of the above are most important and would be suitable for a GSoC
> >> project? Or maybe something else from the "List of projects" would be
> >> even better? https://github.com/jgm/pandoc/issues/5581
> >>
> >> --
> >> Albert Krewinkel
> >> GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
>
>
> --
> Albert Krewinkel
> GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
>