public inbox archive for pandoc-discuss@googlegroups.com
* Summer project ideas
From: Albert Krewinkel @ 2021-01-27 21:21 UTC
  To: pandoc-discuss

Google's Summer of Code will run again this year, and
Haskell.org has started to collect ideas. I'd be interested in participating
again, so I began to think about possible projects. My favorites so far:

- Improve or replace the xml library; see also
  https://github.com/jgm/pandoc/issues/5854

- Add full support for figures.
  https://github.com/jgm/pandoc/issues/3177
  https://github.com/jgm/pandoc-types/pull/83

- Better table support for more formats.

- OOXML writer, similar to the OpenDocument writer. This could make
  docx and pptx transformations simpler to debug and easier to test.

Which of the above are most important and would be suitable for a GSoC
project? Or maybe something else from the "List of projects" would be
even better? https://github.com/jgm/pandoc/issues/5581

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124



* Re: Summer project ideas
From: John MacFarlane @ 2021-01-28  1:45 UTC
  To: Albert Krewinkel, pandoc-discuss


I'd say that for academic writing, figures is a good one.
Also a flexible system of references (the kind of thing
pandoc-crossref is currently needed for). Unfortunately,
everything in this domain requires high-level decisions
about architecture which are sometimes hard to make.

Improving table support is a definite need, though maybe not the
most interesting project.

An OpenXML writer is difficult because of the way Word splits
things into different files. Currently we have an unexported
function writeOpenXML in the Docx writer; it produces three XML
documents: the main document, footnotes, and comments.

It would be an architectural improvement to create a Docx type
that represents a docx document (perhaps this could be in a
separate module); this could be an intermediate for both the
reader and the writer.  (Cf. what we currently do with ipynb.)
A similar approach could make sense for EPUB as well.
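
To illustrate the idea, here is a minimal sketch of such an intermediate
type, written in Python for brevity (pandoc itself is Haskell). All names
(`Docx`, `to_parts`, `from_parts`) are hypothetical and not part of pandoc;
the part paths follow the usual docx package layout, and the XML is kept as
opaque strings:

```python
from dataclasses import dataclass
from typing import Dict


# Hypothetical intermediate type: one value holding the three XML
# documents that the Docx writer currently produces.
@dataclass(frozen=True)
class Docx:
    document: str   # main document XML
    footnotes: str  # footnotes XML
    comments: str   # comments XML


# Field name -> path of the corresponding part inside the docx archive.
PART_PATHS = {
    "document": "word/document.xml",
    "footnotes": "word/footnotes.xml",
    "comments": "word/comments.xml",
}


def to_parts(doc: Docx) -> Dict[str, str]:
    """Writer side: map the intermediate value to archive entries."""
    return {path: getattr(doc, field) for field, path in PART_PATHS.items()}


def from_parts(parts: Dict[str, str]) -> Docx:
    """Reader side: rebuild the intermediate value from archive entries."""
    return Docx(**{field: parts[path] for field, path in PART_PATHS.items()})
```

With both reader and writer going through one such type, a round trip
(`from_parts(to_parts(doc)) == doc`) becomes straightforward to test.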

Replace the XML library:  I assume you mean by using
another library instead of xml-light?  Or do you mean creating
a new library?




* Re: Summer project ideas
From: denis.maier-FfwAq0itz3ofv37vnLkPlQ @ 2021-01-28 11:03 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw,
	albert+pandoc-9EawChwDxG8hFhg+JK9F0w

I don't know how feasible that would be, but a flexible system for references would be awesome!

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/1fedfa2a9a174255a781ef43c5b03912%40ub.unibe.ch.



* Re: Summer project ideas
From: BPJ @ 2021-01-28 17:46 UTC
  To: pandoc-discuss; +Cc: Albert Krewinkel

I guess that some of the pandoc-crossref code might be useful. At least I
hope that any built-in reference handling will stay as close to
pandoc-crossref's syntax and interface as possible so that old files
written with pandoc-crossref in mind will work with as little change as
possible (hopefully none).


* Re: Summer project ideas
From: John MacFarlane @ 2021-01-28 18:22 UTC
  To: BPJ, pandoc-discuss; +Cc: Albert Krewinkel


The thing I've always resisted about pandoc-crossref syntax is
the use of English-derived keywords for labels.  Perhaps there's
not a good way to avoid that, but it would be something I'd want
to look into.

BPJ <melroch-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> I guess that some of the pandoc-crossref code might be useful. At least I
> hope that any built-in reference handling will stay as close to
> pandoc-crossref's syntax and interface as possible so that old files
> written with pandoc-crossref in mind will work with as little change as
> possible (hopefully none).



* Re: Summer project ideas
From: BPJ @ 2021-01-28 19:55 UTC
  To: John MacFarlane, pandoc-discuss

I for one agree about that; one of the reasons that I don't like LaTeX is
that it forces me to use English-based markup with non-English text.
Perhaps, if textual labels can't be avoided, they could be made configurable,
just as the text that is actually inserted needs to be configurable with
templates. Something like:

``````yaml
labels:  # Icelandic
  sec: kafli
  fig: mynd
  tab: tafla
``````

(These are the actual Icelandic words for 'section', 'figure', 'table'. They
are so short that it makes little sense to truncate them.)
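
A minimal sketch of how such a mapping might be applied, in Python. The
`prefix:name` label form follows pandoc-crossref; the function `split_label`
and the fallback to English prefixes when nothing is configured are
assumptions made for illustration only:

```python
from typing import Dict, Optional, Tuple

# Built-in English prefixes, used for any kind that is not overridden
# (an assumption for this sketch).
DEFAULT_LABELS = {"sec": "sec", "fig": "fig", "tab": "tab"}


def split_label(label: str, labels: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (kind, name) for a label such as 'mynd:kort', or None."""
    prefix, sep, name = label.partition(":")
    if not sep or not name:
        return None
    # Invert the effective mapping: localized keyword -> internal kind.
    effective = {**DEFAULT_LABELS, **labels}
    by_keyword = {keyword: kind for kind, keyword in effective.items()}
    kind = by_keyword.get(prefix)
    return (kind, name) if kind else None


# The `labels` mapping from the YAML example above.
icelandic = {"sec": "kafli", "fig": "mynd", "tab": "tafla"}
```

Here `split_label("mynd:kort", icelandic)` yields `("fig", "kort")`, while
the English `fig:kort` is no longer recognized once `fig` has been
overridden; whether overrides should replace or merely supplement the
English keywords is a design choice left open.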

I guess the templates had better use the same library as pandoc's document
templates. pandoc-crossref's co-opting of math syntax is an expedient that
would no longer be needed.

FWIW my list-table filter for example has the same problem, but for now I
have abstained from making the div attributes it looks for configurable. It
would not be too complicated though:

``````yaml
# Swedish
lol2table_class: lista-till-tabell
table2lol_class: tabell-till-lista
list_table_align: justering
list_table_width: bredd
``````

--
Better --help|less than helpless


* Re: Summer project ideas
From: Albert Krewinkel @ 2021-01-29 14:03 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw; +Cc: Albert Krewinkel


John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> writes:

> It would be an architectural improvement to create a Docx type
> that represents a docx document (perhaps this could be in a
> separate module); this could be an intermediate for both the
> reader and the writer.  (Cf. what we currently do with ipynb.)
> A similar approach could make sense for EPUB as well.

I like this. The project would be self-contained but still very useful
for pandoc.

> Replace the XML library:  I assume you mean by using
> another library instead of xml-light?  Or do you mean creating
> a new library?

The library that I keep dreaming about would be similar to the current
`xml` library, but built on DocLayout, with strictness improvements, and
an extensive benchmarking suite. I'd be equally okay with using a
different library like xml-conduit, but that's probably not as
interesting a project.

I'm unsure how much domain knowledge is required for our other ideas;
maybe projects with a high proportion of Haskell work are better suited.

I'll draft something up and post here once I'm done.


--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124



* Re: Summer project ideas
From: John MacFarlane @ 2021-01-29 17:58 UTC
  To: Albert Krewinkel, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw; +Cc: Albert Krewinkel

Albert Krewinkel <albert+pandoc-9EawChwDxG8hFhg+JK9F0w@public.gmane.org> writes:

>> Replace the XML library:  I assume you mean by using
>> another library instead of xml-light?  Or do you mean creating
>> a new library?
>
> The library that I keep dreaming about would be similar to the current
> `xml` library, but built on DocLayout, with strictness improvements, and
> an extensive benchmarking suite. I'd be equally okay with using a
> different library like xml-conduit, but that's probably not as
> interesting a project.

Build on doclayout?  Why?  XML is not layout-heavy; you might
want indentation but doclayout is probably overkill for that.
The reason I'd hesitate about building it on doclayout is that
I don't know how efficient doclayout is...




* Re: Summer project ideas
From: Julien Dutant @ 2021-01-29 18:59 UTC
  To: pandoc-discuss


Speaking of projects, I'm working on a pandoc workflow to produce an
academic journal (dialectica, John will know it
http://dialectica.philosophie.ch/) that's flipping open access. It's
complementary to what John, Albert and others have been doing on the
authoring side of things. The aims are to make copy-editing simpler, more
accessible, more affordable, and independent of commercial tools, with the
same or better quality than what we got from Wiley-Blackwell. We convert all
manuscripts to markdown + bibtex, edit them in markdown, and produce JATS
XML, HTML and PDF (via LaTeX) from that source, using pandoc, citeproc and
custom lua filters. I should say that not only would this of course be
impossible without pandoc, but there are also many details in its design
that turned out to be very useful for the process (e.g. the defaults file).
I've been working with a small team of copyeditors on this. Most started
out without special expertise beyond writing or having written a PhD and
they're now fluent with markdown and editing rules, as we hoped. However,
there's still quite a bit of technical knowledge required here and there
when e.g. LaTeX breaks down or a script doesn't work on Windows.

From this perspective here are the things I'm currently thinking would help 
a project like ours: 

1. Cross-referencing would be good indeed. We now use pandoc-crossref. It'd
be good to handle cross-references to footnotes (see
https://github.com/lierdakil/pandoc-crossref/issues/215 , at the moment we
use links with the footnote number manually entered).

2. Implementing the new table features as well (we had a few multi-row / 
multi-col tables, currently hard-coded in the relevant outputs).

3. Docx reader that picks up Zotero/Endnote/Mendeley *citations*. At the 
moment the most time-consuming copyediting task is to replace manuscript 
citations with markdown tags. If Pandoc was able to pick up 
Zotero/Endnote/Mendeley citations in docx sources, we could consider asking 
the authors to insert them.

Would John's "architectural improvement" for the docx reader/writer make 
developing that feature easier? If so that'd be a big plus I'd think. 

4. A good visual editor for pandoc's markdown. (Read on to see what that 
has to do with pandoc itself.) Copyeditors like it and it's needed if we 
want to involve authors. My dream would be to have an equivalent of Google 
Docs that would allow authors to do revisions online (something in the air, 
cf https://www.authorea.com/) but a desktop one would be great. IMHO the 
best at the moment is RStudio's 
(https://rstudio.github.io/visual-markdown-editing/#/ just released but 
we've been using the beta preview for a few months.) But one with an even 
better potential I think is LyX: it's faster (C++ rather than javascript), 
designed for LaTeX so covering a lot of academic writing features, with a 
module class+extension packages that could accommodate markdown files, 
custom markup and perhaps even lua filters. All it would take to enable 
this would be for pandoc to be able to read LyX's own format (requested
here https://github.com/jgm/pandoc/issues/5555 but without noting that this
would turn LyX into one of the best markdown visual editors available).

The good news here is that the LyX team looks set to make their 
(undocumented, custom) format into an XML format 
(https://wiki.lyx.org/Devel/XML). 

So, getting to the point: could John's "architectural improvement" for the
docx reader/writer be something that is flexible enough to enable us to
create new readers for other XML formats? Would it be worth having a
customizable XML reader that we could feed with custom XML->Pandoc JSON
maps, or is that better done by XSLT?

5. Generate custom XML for Indesign (and Scribus?). The biggest technical 
hurdles we've faced actually came from LaTeX: PDF generation crashes, the 
copyeditor gets an error report (gets confused by the line number, which is 
from .tex not from their .md), and we have to dig through to find the 
source (weird code in the .bib, people's LaTeX distribution needing an 
update, etc.). Final touches aren't easy (try to prevent a pagebreak 
between paragraphs). That makes the workflow inaccessible to teams without
a house LaTeX expert and too onerous for those with design-heavy book /
journal projects. So it would be good to have the option of using other
typesetting engines.

I haven't looked into this in detail, so perhaps Pandoc already has all 
that's needed to do this: I'm thinking of a combination of JATS XML + raw 
XML markup + custom JATS XML template + lua filters as needed. If not, 
perhaps that's another area where a customizable XML writer and/or John's 
"architectural improvement" could help. 

Best,
Julien

* Re: Summer project ideas
From: John MacFarlane @ 2021-01-29 20:39 UTC
  To: Julien Dutant, pandoc-discuss

Julien Dutant <julien.dutant-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Speaking of projects, I'm working on a pandoc workflow to produce an 
> academic journal (dialectica, John will know it 
> http://dialectica.philosophie.ch/) that's flipping open access. It's 

That's great to hear!

> 4. A good visual editor for pandoc's markdown. (Read on to see what that 
> has to do with pandoc itself.) Copyeditors like it and it's needed if we 
> want to involve authors. My dream would be to have an equivalent of Google 
> Docs that would allow authors to do revisions online (something in the air, 
> cf https://www.authorea.com/) but a desktop one would be great. IMHO the 
> best at the moment is RStudio's 
> (https://rstudio.github.io/visual-markdown-editing/#/ just released but 
> we've been using the beta preview for a few months.) But one with an even 
> better potential I think is LyX: it's faster (C++ rather than javascript), 
> designed for LaTeX so covering a lot of academic writing features, with a 
> module class+extension packages that could accommodate markdown files, 
> custom markup and perhaps even lua filters. All it would take to enable 
> this would be for pandoc to be able to read LyX's own format (requested
> here https://github.com/jgm/pandoc/issues/5555 but without noting that this
> would turn LyX into one of the best markdown visual editors available).

Why don't you add a comment to issue 5555 noting this?

> The good news here is that the LyX team looks set to make their 
> (undocumented, custom) format into an XML format 
> (https://wiki.lyx.org/Devel/XML). 

That would be easier to parse.  But it suggests that 5555 should
wait until this happens?

> So, getting to the point: could John's "architectural improvement" for the 
> docx reader/writer be something that is flexible enough to enable us to 
> create new readers for other XML formats?

No, this would be specific to Docx format.

> Would it be worth having a
> customizable XML reader that we could feed with custom XML->Pandoc JSON
> maps, or is that better done by XSLT?

XSLT -> DocBook and then pandoc sounds like it could work.
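
A rough sketch of that two-step pipeline, in Python. It assumes `xsltproc`
and `pandoc` are installed; the function names and all file names (including
the stylesheet that maps the custom XML to DocBook, which would have to be
written separately) are hypothetical:

```python
import subprocess
from typing import List


def pipeline_commands(stylesheet: str, source: str,
                      docbook: str, output: str) -> List[List[str]]:
    """Commands for: custom XML --(XSLT)--> DocBook --(pandoc)--> output."""
    return [
        # Step 1: apply the stylesheet to the source XML, writing DocBook.
        ["xsltproc", "--output", docbook, stylesheet, source],
        # Step 2: let pandoc read the DocBook; the output format is
        # inferred from the extension of the output file.
        ["pandoc", "--from", "docbook", "--output", output, docbook],
    ]


def run_pipeline(stylesheet: str, source: str,
                 docbook: str, output: str) -> None:
    """Run both steps, stopping at the first one that fails."""
    for cmd in pipeline_commands(stylesheet, source, docbook, output):
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline("to-docbook.xsl", "input.xml", "tmp.dbk",
"output.md")` would produce markdown from the custom XML.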

> 5. Generate custom XML for Indesign (and Scribus?).

ICML is for InDesign, isn't it? We have an ICML writer.



* Re: Summer project ideas
From: Albert Krewinkel @ 2021-01-30  8:35 UTC
  To: John MacFarlane; +Cc: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> writes:

> Albert Krewinkel <albert+pandoc-9EawChwDxG8hFhg+JK9F0w@public.gmane.org> writes:
>
>>> Replace the XML library:  I assume you mean by using
>>> another library instead of xml-light?  Or do you mean creating
>>> a new library?
>>
>> The library that I keep dreaming about would be similar to the current
>> `xml` library, but build on DocLayout, with strictness improvements, and
>> an extensive benchmarking suite. I'd be equally okay with using a
>> different library like xml-conduit, but that's probably not as
>> interesting a project.
>
> Build on doclayout?  Why?  XML is not layout-heavy; you might
> want indentation but doclayout is probably overkill for that.
> The reason I'd hesitate about building it on doclayout is that
> I don't know how efficient doclayout is...

My rationale here is that we already have module Text.Pandoc.XML.
It's based on doclayout, does about half of what I'd like, and is
very convenient. You are probably right about doclayout being
overkill, so it would be a bit of an experiment.


--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124


* Re: Summer project ideas
       [not found]             ` <6036a733-df54-4985-bbab-beb743c032adn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2021-01-29 20:39               ` John MacFarlane
@ 2021-01-30  9:00               ` Albert Krewinkel
       [not found]                 ` <871re2u7dw.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  1 sibling, 1 reply; 20+ messages in thread
From: Albert Krewinkel @ 2021-01-30  9:00 UTC (permalink / raw)
  To: Julien Dutant; +Cc: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hello Julien,

Julien Dutant <julien.dutant-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Speaking of projects, I'm working on a pandoc workflow to produce an
> academic journal (dialectica, John will know it
> http://dialectica.philosophie.ch/) that's flipping open access. It's
> complementary to what John, Albert and others have been doing on the
> authoring side of things. The aims are to make copy-editing simpler, more
> accessible, more affordable, independent of commercial tools, with the same
> or better quality than what we got from Wiley-Blackwell. We convert all
> manuscripts to markdown + bibtex, edit them in markdown, and produce JATS
> XML, HTML and PDF (via LaTeX) from that source, using pandoc, citeproc and
> custom lua filters. I should say that not only would this of course be
> impossible without pandoc, but there are also many details in its design
> that turned out to be very useful for the process (e.g. the defaults file).
> I've been working with a small team of copyeditors on this. Most started
> out without special expertise beyond writing or having written a PhD and
> they're now fluent with markdown and editing rules, as we hoped. However
> there's still quite a bit of technical knowledge required here and there
> when e.g. LaTeX breaks down or a script doesn't work on Windows.

This sounds very interesting! I'm currently working with "Open
Journals", publishers of the GitHub-based journals [JOSS] and [JOSE], to
improve the publishing pipeline and outputs. Last year, I joined
"Hamburg Open Science" to build a journal publishing workflow combining
OJS, pandoc, and Gitlab (see [k&g], [modpub]).

We are probably facing similar issues, and I'd be delighted if we could
find time to exchange ideas and discuss them in depth.

In the Hamburg project we've been basing our workflow on the
editor [Zettlr]. The latest version comes bundled with pandoc and uses it
as the primary exporter. Hendrik, the author, is open to some changes
that would allow it to be used in an advanced publishing environment.

I can also recommend basing the workflow on Docker: it makes it easy to
generate PDFs in a reproducible fashion and allows papers to be
re-generated every time the git archive is updated. See, e.g., [JOSS paperdraft].
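
As a rough illustration of the Docker route, the sketch below only assembles the command line. It assumes the public `pandoc/latex` image (whose entrypoint is pandoc and whose working directory is `/data`); the directory and file names are hypothetical:

```python
import shlex


def docker_pandoc_command(workdir, source, output, image="pandoc/latex"):
    """Build a `docker run` command that renders `source` to `output`.

    `workdir` should be an absolute path; it is mounted at /data, the
    image's working directory. Pinning `image` to a specific tag keeps
    the build reproducible (which tag to use is left to the caller).
    """
    return [
        "docker", "run", "--rm",
        "--volume", f"{workdir}:/data",
        image,
        source, "--output", output,
    ]


if __name__ == "__main__":
    cmd = docker_pandoc_command("/home/me/paper", "paper.md", "paper.pdf")
    print(shlex.join(cmd))
```

The same command can be run from a CI job so that the PDF is rebuilt on every push.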

Cheers,
Albert


[JOSS]: https://joss.theoj.org
[JOSE]: https://jose.theoj.org
[k&g]: https://kommunikation-gesellschaft.de
[modpub]: https://oa-pub.hos.tuhh.de/en/
[Zettlr]: https://zettlr.com
[JOSS paperdraft]: https://joss.readthedocs.io/en/latest/submitting.html#github-action


--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124


* Re: Summer project ideas
       [not found]                 ` <871re2u7dw.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2021-01-31  0:37                   ` Julien Dutant
  0 siblings, 0 replies; 20+ messages in thread
From: Julien Dutant @ 2021-01-31  0:37 UTC (permalink / raw)
  To: pandoc-discuss


Thanks all for the feedback and tips!

John: thanks - I hadn't realized Pandoc had an ICML writer, this is great. I
had ruled out InDesign early on for various reasons, but perhaps this
should be reassessed in the future.

Albert: thanks for the tip, the journals look great. I'll get in touch; it would
be great to have a chat!

Best,
J 

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/a06ce878-588d-47b1-83b8-da6a038bd8dbn%40googlegroups.com.


* Re: Summer project ideas
       [not found]                 ` <8735yiu8jp.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2021-01-31  1:40                   ` John MacFarlane
  2021-02-17  1:01                   ` John MacFarlane
  1 sibling, 0 replies; 20+ messages in thread
From: John MacFarlane @ 2021-01-31  1:40 UTC (permalink / raw)
  To: Albert Krewinkel; +Cc: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Albert Krewinkel <albert+pandoc-9EawChwDxG8hFhg+JK9F0w@public.gmane.org> writes:

> John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> writes:
>
>> Albert Krewinkel <albert+pandoc-9EawChwDxG8hFhg+JK9F0w@public.gmane.org> writes:
>>
>>>> Replace the XML library:  I assume you mean by using
>>>> another library instead of xml-light?  Or do you mean creating
>>>> a new library?
>>>
>>> The library that I keep dreaming about would be similar to the current
>>> `xml` library, but build on DocLayout, with strictness improvements, and
>>> an extensive benchmarking suite. I'd be equally okay with using a
>>> different library like xml-conduit, but that's probably not as
>>> interesting a project.
>>
>> Build on doclayout?  Why?  XML is not layout-heavy; you might
>> want indentation but doclayout is probably overkill for that.
>> The reason I'd hesitate about building it on doclayout is that
>> I don't know how efficient doclayout is...
>
> My rationale here is that we already have module Text.Pandoc.XML.
> It's based on doclayout, does about half of what I'd like, and is
> very convenient. You are probably right about doclayout being
> overkill, so it would be a bit of an experiment.

Yes, I suppose it's true that *for use in pandoc* we really need
it to be doclayout-based, or we won't get automatic wrapping.
(We don't currently have it with HTML because we're using
blaze and not our own XML library to generate that.)


* Re: Summer project ideas
       [not found] ` <87bldavzy2.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  2021-01-28  1:45   ` John MacFarlane
@ 2021-01-31 18:08   ` John MacFarlane
  1 sibling, 0 replies; 20+ messages in thread
From: John MacFarlane @ 2021-01-31 18:08 UTC (permalink / raw)
  To: Albert Krewinkel, pandoc-discuss


I recently thought of this issue:

https://github.com/jgm/pandoc/issues/6611

making pandoc available as a shared library callable
from C and other languages.

The difficulty here is in creating the right interface.
But we might be able to make that pretty simple by using
the JSON representations of data structures.

The simplest possible library would just wrap convertWithOpts;
we already have JSON for Opts, so the calling library would
just need to pass in a JSON string.  But maybe it would be
worth creating something more sophisticated.
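
To make the string-in, string-out idea concrete, here is a minimal sketch of the calling side. Everything in it is an assumption for illustration: the request keys, the `convert_stub` function standing in for a library entry point, and the shape of the reply are hypothetical, and none of it is pandoc's actual `Opt` schema or API.

```python
import json


def build_request(text, source_format, target_format):
    """Serialize one conversion request as a single JSON string.

    The keys are illustrative only, not pandoc's real option names.
    """
    return json.dumps({"from": source_format, "to": target_format, "text": text})


def convert_stub(request_json):
    """Hypothetical stand-in for a C-callable entry point.

    A real shared library would run the conversion; this stub merely
    decodes the request and echoes the input back, to show that both
    directions of the interface are plain JSON strings.
    """
    request = json.loads(request_json)
    return json.dumps({"to": request["to"], "output": request["text"]})


if __name__ == "__main__":
    reply = convert_stub(build_request("# Hello", "markdown", "html"))
    print(json.loads(reply)["output"])
```

The appeal of this shape is that a foreign caller only has to marshal one string in and one string out, whatever the host language.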


* Re: Summer project ideas
       [not found]         ` <875z3fu9fc.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  2021-01-29 17:58           ` John MacFarlane
  2021-01-29 18:59           ` Julien Dutant
@ 2021-02-14  8:07           ` Albert Krewinkel
       [not found]             ` <87r1ljax8g.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  2 siblings, 1 reply; 20+ messages in thread
From: Albert Krewinkel @ 2021-02-14  8:07 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Albert Krewinkel <albert+pandoc-9EawChwDxG8hFhg+JK9F0w@public.gmane.org> writes:

> I'll draft something up and post here once I'm done.

I'm happy to report that together, Alison Hill, Christophe Dervieux
(both of RStudio) and I decided to proceed with a project proposal
that will result in better support for figures. The PR to the Summer
of Haskell repo is linked below.

https://github.com/haskell-org/summer-of-haskell/pull/134/files

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124


* Re: Summer project ideas
       [not found]             ` <87r1ljax8g.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2021-02-14 13:09               ` jcr
       [not found]                 ` <4c65af87-daee-4c11-9f03-f95ac9f69146n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: jcr @ 2021-02-14 13:09 UTC (permalink / raw)
  To: pandoc-discuss


Looks good, apart from the typo on line 25: "continiously".


* Re: Summer project ideas
       [not found]                 ` <4c65af87-daee-4c11-9f03-f95ac9f69146n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2021-02-14 14:44                   ` Albert Krewinkel
  0 siblings, 0 replies; 20+ messages in thread
From: Albert Krewinkel @ 2021-02-14 14:44 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

jcr <ffi.appdev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Looks good, apart from the typo on line 25: "continiously".

Thanks, fixed.


-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124


* Re: Summer project ideas
       [not found]                 ` <8735yiu8jp.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  2021-01-31  1:40                   ` John MacFarlane
@ 2021-02-17  1:01                   ` John MacFarlane
       [not found]                     ` <m21rdf8q2z.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
  1 sibling, 1 reply; 20+ messages in thread
From: John MacFarlane @ 2021-02-17  1:01 UTC (permalink / raw)
  To: Albert Krewinkel; +Cc: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


I've done something similar to this in my last few
commits to master.  (This was motivated by some processing
instructions that made xml-light's parser break.)

The new T.P.XML.Light module exports an interface more or less
identical to xml-light's, but using xml-conduit's faster and
more robust parser, and using Text instead of String. All
the readers that used xml-light have been changed to use
this.

Some benchmarks on the XML-based readers:

    | Reader  | old   | new   |
    | ------- | ----- | ----- |
    | docbook | 18 ms | 10 ms |
    | opml    | 65 ms | 35 ms |
    | jats    | 15 ms |  9 ms |
    | docx    | 72 ms | 44 ms |
    | odt     | 78 ms | 28 ms |
    | epub    | 64 ms | 56 ms |
    | fb2     | 14 ms |  4 ms |

Not bad!
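
For a quick read of the table, the speedup per reader is simply old time divided by new time. A small sketch, with the timings copied from the table and the helper name `speedups` chosen only for illustration:

```python
# Timings in milliseconds as (old, new), copied from the benchmark table.
TIMINGS = {
    "docbook": (18, 10),
    "opml": (65, 35),
    "jats": (15, 9),
    "docx": (72, 44),
    "odt": (78, 28),
    "epub": (64, 56),
    "fb2": (14, 4),
}


def speedups(timings):
    """Map each reader to its speedup factor (old time / new time)."""
    return {reader: old / new for reader, (old, new) in timings.items()}


if __name__ == "__main__":
    ranked = sorted(speedups(TIMINGS).items(), key=lambda kv: kv[1], reverse=True)
    for reader, factor in ranked:
        print(f"{reader:8s} {factor:.2f}x")
```

By this measure fb2 gains the most (3.5x) and epub the least (about 1.14x).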

This is a bit of a kludge, and it might still be good to
think about a more permanent change of XML libraries.


* Re: Summer project ideas
       [not found]                     ` <m21rdf8q2z.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
@ 2021-02-17 18:35                       ` John MacFarlane
  0 siblings, 0 replies; 20+ messages in thread
From: John MacFarlane @ 2021-02-17 18:35 UTC (permalink / raw)
  To: Albert Krewinkel; +Cc: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


By the way, I also tried using xeno, and that failed completely.
Weird "pipe errors" running the test suite, and the executable
fell over on just about everything I tried. I haven't looked into
it, but for one thing it looks like it has problems with
doctypes.

If anyone wants to explore further, there's a xeno branch in
the repository.


end of thread, other threads:[~2021-02-17 18:35 UTC | newest]

Thread overview: 20+ messages
2021-01-27 21:21 Summer project ideas Albert Krewinkel
     [not found] ` <87bldavzy2.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
2021-01-28  1:45   ` John MacFarlane
     [not found]     ` <m235ylal7f.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-01-28 11:03       ` AW: " denis.maier-FfwAq0itz3ofv37vnLkPlQ
     [not found]         ` <1fedfa2a9a174255a781ef43c5b03912-FfwAq0itz3ofv37vnLkPlQ@public.gmane.org>
2021-01-28 17:46           ` BPJ
     [not found]             ` <CADAJKhCT0mpp7Tvj4xMn9ToZUktvob-QtTcZK9Gf7DMRRetabA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2021-01-28 18:22               ` John MacFarlane
     [not found]                 ` <m2a6ss9b1w.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-01-28 19:55                   ` BPJ
2021-01-29 14:03       ` Albert Krewinkel
     [not found]         ` <875z3fu9fc.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
2021-01-29 17:58           ` John MacFarlane
     [not found]             ` <m2ft2j62wi.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-01-30  8:35               ` Albert Krewinkel
     [not found]                 ` <8735yiu8jp.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
2021-01-31  1:40                   ` John MacFarlane
2021-02-17  1:01                   ` John MacFarlane
     [not found]                     ` <m21rdf8q2z.fsf-jF64zX8BO08an7k8zZ43ob9bIa4KchGshsV+eolpW18@public.gmane.org>
2021-02-17 18:35                       ` John MacFarlane
2021-01-29 18:59           ` Julien Dutant
     [not found]             ` <6036a733-df54-4985-bbab-beb743c032adn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2021-01-29 20:39               ` John MacFarlane
2021-01-30  9:00               ` Albert Krewinkel
     [not found]                 ` <871re2u7dw.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
2021-01-31  0:37                   ` Julien Dutant
2021-02-14  8:07           ` Albert Krewinkel
     [not found]             ` <87r1ljax8g.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
2021-02-14 13:09               ` jcr
     [not found]                 ` <4c65af87-daee-4c11-9f03-f95ac9f69146n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2021-02-14 14:44                   ` Albert Krewinkel
2021-01-31 18:08   ` John MacFarlane
