public inbox archive for pandoc-discuss@googlegroups.com
* Deprecate standalone for readers?
@ 2020-12-15 17:20 Albert Krewinkel
From: Albert Krewinkel @ 2020-12-15 17:20 UTC (permalink / raw)
  To: pandoc-discuss

There are currently two readers which are influenced by the
`-s`/`--standalone` flag: rst and txt2tags. I propose to change this
behavior: reading should be independent of output generation; parsing of
metadata should be controllable with an extension, e.g. `-f
rst+metadata`.

Rationale: the current situation makes it difficult to handle metadata
with a filter. Consider reStructuredText metadata like this:

```rst
:author: John Doe
:institute: ACME Inc.

:author: Jane Doe
:institute: Federation of Planets
```

When using `-s`, pandoc folds these entries into metadata in a lossy
way, so a filter can no longer recover the original entries. Without
`-s`, filtering is possible, but creating output with a template is not
(as that would imply `-s`).
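To see why a flat key/value view of this field list loses information, here is a plain-Lua sketch of two naive folding strategies. The `{key, value}` pair representation and both strategies are illustrative assumptions only; they are not pandoc's actual reader logic.

```lua
-- The field list from the example, as ordered {key, value} pairs
-- (a hypothetical plain-Lua stand-in for pandoc's AST).
local fields = {
  {"author", "John Doe"}, {"institute", "ACME Inc."},
  {"author", "Jane Doe"}, {"institute", "Federation of Planets"},
}

-- Naive fold 1: one value per key; later entries overwrite earlier ones.
local last_wins = {}
for _, f in ipairs(fields) do last_wins[f[1]] = f[2] end

-- Naive fold 2: collect all values per key. Nothing is dropped, but the
-- link between an author and the institute that followed it is gone.
local per_key = {}
for _, f in ipairs(fields) do
  per_key[f[1]] = per_key[f[1]] or {}
  table.insert(per_key[f[1]], f[2])
end

print(last_wins.author)                      -- Jane Doe
print(table.concat(per_key.author, "; "))    -- John Doe; Jane Doe
print(table.concat(per_key.institute, "; ")) -- ACME Inc.; Federation of Planets
```

In the second variant, authors and institutes line up only by position, which breaks as soon as one author has no institute or has two.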

Could we safely alter this behavior, or would that create problems?

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124



* Re: Deprecate standalone for readers?
@ 2020-12-15 17:32 John MacFarlane
From: John MacFarlane @ 2020-12-15 17:32 UTC (permalink / raw)
  To: Albert Krewinkel, pandoc-discuss


This is something I've wondered about before; it is
indeed awkward that --standalone has these two separate
effects.

However, there's a reason why those two readers are influenced
by -s.  reST has a convention that a heading at the beginning
of the document becomes a title.  But obviously we don't want
that to happen when we're just generating fragments to
include in a larger document.  So we need to know whether
we're generating a fragment.

I guess what you're proposing is that people would have
to explicitly specify `-f rst-metadata` when they wanted
the fragment behavior?

Could the specific issue with reST metadata be solved by changing the
reader to accept multiple entries?





* Re: Deprecate standalone for readers?
@ 2020-12-15 18:50 Albert Krewinkel
From: Albert Krewinkel @ 2020-12-15 18:50 UTC (permalink / raw)
  To: pandoc-discuss


John MacFarlane writes:

> However, there's a reason why those two readers are influenced
> by -s.  reST has a convention that a heading at the beginning
> of the document becomes a title.  But obviously we don't want
> that to happen when we're just generating fragments to
> include in a larger document.  So we need to know whether
> we're generating a fragment.
>
> I guess what you're proposing is that people would have
> to explicitly specify `-f rst-metadata` when they wanted
> the fragment behavior?

Yes, that's what I have in mind.

> Could the specific issue with reST metadata be solved by changing the
> reader to accept multiple entries?

Maybe it's easiest if I describe my concrete use-case: the goal is to
determine if pandoc could replace docutils when processing files like
this one:
https://github.com/scipy-conference/scipy_proceedings/blob/2020/papers/pydra/paper.rst
The file is processed with a writer that is defined here:
https://github.com/scipy-conference/scipy_proceedings/blob/2020/publisher/writer/__init__.py
The writer processes the metadata, treating each metadata key as an
event that induces writer-internal state changes. I don't think there is
a one-size-fits-all method for pandoc to handle metadata in a compatible
fashion without producing very ugly metadata.
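The event-style processing described above can be sketched in plain Lua. This is a hypothetical illustration, not the docutils writer's code: it assumes the field list arrives as an ordered sequence of `{key, value}` string pairs, and the `group_authors` helper is invented for the example. An `author` key opens a new record, and each following `institute` key attaches to it.

```lua
-- Hypothetical input: the field list as ordered {key, value} pairs.
-- Plain strings stand in for pandoc AST values.
local fields = {
  {"author", "John Doe"},
  {"institute", "ACME Inc."},
  {"author", "Jane Doe"},
  {"institute", "Federation of Planets"},
}

-- Treat each key as an event: "author" opens a new record, and the
-- following "institute" entries attach to the most recent author.
local function group_authors(entries)
  local authors, other = {}, {}
  local current = nil
  for _, entry in ipairs(entries) do
    local key, value = entry[1], entry[2]
    if key == "author" then
      current = {name = value, institutes = {}}
      table.insert(authors, current)
    elseif key == "institute" and current then
      table.insert(current.institutes, value)
    else
      -- any other key ends the current author block
      current = nil
      other[key] = value
    end
  end
  return authors, other
end

local authors = group_authors(fields)
for _, a in ipairs(authors) do
  print(a.name .. ": " .. table.concat(a.institutes, ", "))
end
-- John Doe: ACME Inc.
-- Jane Doe: Federation of Planets
```

Because the grouping depends on the order of the entries, it can only be done while the original sequence is still available, i.e., before the entries are folded into a key/value map.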

It is not too difficult to reproduce the writer's behavior with a
filter, but because `--standalone` leaks into the reader, I have to
write an intermediate JSON file first (parsing without `-s`) and only
then produce the PDF from it (with `-s`).

--
Albert Krewinkel



* Re: Deprecate standalone for readers?
@ 2022-03-14 14:34 Albert Krewinkel
From: Albert Krewinkel @ 2022-03-14 14:34 UTC (permalink / raw)
  To: pandoc-discuss

Just to bring closure to this thread: The new custom reader
feature makes the effect of `-s` on readers a non-issue. While
still awkward, it is easy to work around via a custom reader. See
below for an example.

Albert Krewinkel writes:

> Maybe it's easiest if I describe my concrete use-case: the goal is to
> determine if pandoc could replace docutils when processing files like
> this one:
> https://github.com/scipy-conference/scipy_proceedings/blob/2020/papers/pydra/paper.rst

Here's the custom reader code I wrote to handle the above use-case:

    -- Convert a definition list to metadata. Each `author` entry starts
    -- a new author record; subsequent `email`, `orcid`, and
    -- `institution` entries are attached to that author.
    local function deflist_to_meta (items)
      local meta = pandoc.Meta{
        authors = pandoc.List{}
      }
      local author
      for _, item in ipairs(items) do
        -- item[1] is the term, item[2] the list of definitions
        local key = pandoc.utils.stringify(item[1])
        local value = pandoc.utils.blocks_to_inlines(item[2][1] or {})
        if key == 'author' then
          author = {name = value, institution = pandoc.List()}
          meta.authors:insert(author)
        elseif key == 'email' and author then
          author.email = value
        elseif key == 'orcid' and author then
          author.orcid = pandoc.utils.stringify(value)
        elseif key == 'institution' and author then
          author.institution:insert(value)
        else
          -- any other key ends the current author block
          author = nil
          meta[key] = value
        end
      end
      return meta
    end

    function Reader (input, opts)
      -- parse as a fragment, so that `-s` does not make the rst reader
      -- fold the leading field list into metadata itself
      opts.standalone = false
      local doc = pandoc.read(tostring(input), 'rst', opts)
      -- treat an initial definition list as metadata
      -- (guard against empty documents, where blocks[1] is nil)
      local first = doc.blocks[1]
      if first and first.t == 'DefinitionList' then
        doc.meta = deflist_to_meta(first.content)
        doc.blocks:remove(1)
      end
      return doc
    end

-- 
Albert Krewinkel

