public inbox archive for pandoc-discuss@googlegroups.com
* Lua-Filter, Span-text to metadata-text: How to get rid of linebreaks in metadata
@ 2019-10-09 16:56 Jonas Zohren
       [not found] ` <a2955f67-6668-c9a6-3f46-ff100088759d-ncST9ati83jjhi9iKp3Nug@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Jonas Zohren @ 2019-10-09 16:56 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 1839 bytes --]

Dear list!

Setup:
A Pandoc Markdown transcript of a meeting with specially tagged spans, e.g.:

```md
[Let's declare war on those other guys over there which we don't want to
live.]{.resolution}

[Let's buy a tank.]{.resolution}
```

I want to extract those resolutions from the document and store them
as metadata for further processing. I managed to do so with a Lua filter
using `pandoc.utils.stringify(span)`. As a result, the strings get stored
in the metadata:
```yaml
date: 0000-00-00
resolutions:
- text: |
      Let's declare war on those other guys over there which
      we don't want to live on.

```

And here is my problem: the text gets split across multiple lines, even
though the original text did not have line breaks there. In Markdown this
wouldn't be a big problem, since the breaks are ignored, but when I output
the metadata as JSON with the template

```md
$meta-json$
```

and `pandoc -t markdown -s`, the resulting JSON contains those line
breaks, which originally weren't there:

```json
{"text": "Let's declare war on those other guys over there which\nwe don't want to live on."}
```

AFAIK Pandoc Markdown treats the YAML metadata strings as regular
Markdown strings and might re-wrap them as a result, but why
does this leak into the JSON output? The raw JSON AST (`pandoc -t json`)
does not contain those line breaks.

How can I avoid that and export my metadata as JSON with _clean_ strings?


Kind Regards

Jonas


-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/a2955f67-6668-c9a6-3f46-ff100088759d%40tu-dortmund.de.

[-- Attachment #2: 0xD8879970EF182C4B.asc --]
[-- Type: application/pgp-keys, Size: 4841 bytes --]


* Re: Lua-Filter, Span-text to metadata-text: How to get rid of linebreaks in metadata
       [not found] ` <a2955f67-6668-c9a6-3f46-ff100088759d-ncST9ati83jjhi9iKp3Nug@public.gmane.org>
@ 2019-10-10  4:20   ` John MacFarlane
       [not found]     ` <m2a7a9nse8.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  2019-10-10  8:43   ` BPJ
  1 sibling, 1 reply; 5+ messages in thread
From: John MacFarlane @ 2019-10-10  4:20 UTC (permalink / raw)
  To: Jonas Zohren, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


If you don't want any line wrapping behavior, just use
--wrap=none on the command line.

Alternatively, --wrap=preserve will preserve newlines
in your source file.

Does that help or have I misunderstood the problem?




* Re: Lua-Filter, Span-text to metadata-text: How to get rid of linebreaks in metadata
       [not found] ` <a2955f67-6668-c9a6-3f46-ff100088759d-ncST9ati83jjhi9iKp3Nug@public.gmane.org>
  2019-10-10  4:20   ` John MacFarlane
@ 2019-10-10  8:43   ` BPJ
  1 sibling, 0 replies; 5+ messages in thread
From: BPJ @ 2019-10-10  8:43 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 3497 bytes --]

If you don't want to disable wrapping globally, or if John's solution
doesn't work for you, here is another:

My first question would be where these line breaks are introduced. Most
probably they existed in the Markdown as SoftBreak elements and were
preserved as line breaks by the stringify function. If you want only space
(U+0020) characters in the JSON, the easiest way is probably to replace all
runs of whitespace with a single space character in the stringified text,
like so:

````lua
local str = pandoc.utils.stringify(span)
-- Collapse each run of whitespace (spaces, tabs, newlines) into one space.
str = str:gsub('%s+', ' ')
````
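
One detail about the snippet above: `string.gsub` returns two values, the new string and the number of substitutions. Assigning to a single variable drops the count, but passing the call directly as a function argument or into a table constructor would carry the count along. The following is a minimal sketch of a helper that sidesteps this and also trims the ends; the name `normalize_ws` is made up for illustration and is not part of pandoc.

```lua
-- Hypothetical helper (not a pandoc function): collapse every run of
-- whitespace, including newlines, into one space and trim both ends.
local function normalize_ws(s)
  s = s:gsub("%s+", " ")  -- single-variable assignment keeps only the string
  s = s:gsub("^ ", "")    -- drop a leading space, if any
  s = s:gsub(" $", "")    -- drop a trailing space, if any
  return s
end

print(normalize_ws("Let's declare war on those other guys over there which\nwe don't want to live on."))
```

Because the function returns a plain variable, callers always get exactly one value back, so it is safe to use inside `table.insert(...)` or a table constructor.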

If that doesn't do it, you may want to try replacing SoftBreak and
LineBreak elements with Space elements before stringification:

````lua
-- Map both kinds of break to a plain Space element.
local space_filter = {
  SoftBreak = pandoc.Space,
  LineBreak = pandoc.Space,
}

local function stringify_span (span)
  span = pandoc.walk_inline(span, space_filter)
  return pandoc.utils.stringify(span)
end
````
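
For context, here is a rough sketch of how the whole filter might look. The original filter was not posted, so this is an assumption throughout: it supposes the spans carry the class `resolution` and that the results go into a `resolutions` metadata list with a `text` field, as in the YAML shown in the question. The variable name `collected` is illustrative only. It relies on pandoc calling inline filter functions before `Meta` within a single filter.

```lua
-- Sketch only; the original poster's filter is not shown in the thread.
-- `collected` is an illustrative name.
local collected = {}

-- Called for every Span. Returning nothing leaves the span unchanged.
function Span(span)
  if span.classes:includes("resolution") then
    -- Assigning to one variable discards gsub's second return value.
    local text = pandoc.utils.stringify(span):gsub("%s+", " ")
    table.insert(collected, { text = text })
  end
end

-- Called after the inline elements have been visited.
function Meta(meta)
  if #collected > 0 then
    meta.resolutions = collected
  end
  return meta
end
```

Normalizing inside the filter does not by itself stop the Markdown writer from re-wrapping long values on output; as noted earlier in the thread, `--wrap=none` is what addresses that.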

I'd try post-processing with string.gsub first, though, since that has
less overhead.

I hope this helps!

-- 
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CADAJKhBzur51TULQSiGesXfH%2B3331sHKWzyrdSb5qbA1cVb3zA%40mail.gmail.com.

[-- Attachment #2: Type: text/html, Size: 5203 bytes --]


* Re: Lua-Filter, Span-text to metadata-text: How to get rid of linebreaks in metadata
       [not found]     ` <m2a7a9nse8.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2019-10-10  9:59       ` Jonas Zohren
       [not found]         ` <7538f86a-e1ac-d7b5-cee7-f67eb34f5127-ncST9ati83jjhi9iKp3Nug@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Jonas Zohren @ 2019-10-10  9:59 UTC (permalink / raw)
  To: John MacFarlane; +Cc: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


[-- Attachment #1.1.1: Type: text/plain, Size: 2520 bytes --]

Yes, this solves the problem, thanks for that.

But why do those metadata values get wrapped in the JSON output in the
first place? Is there a specific rationale behind it?

-- 
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/7538f86a-e1ac-d7b5-cee7-f67eb34f5127%40tu-dortmund.de.

[-- Attachment #1.1.2: 0xD8879970EF182C4B.asc --]
[-- Type: application/pgp-keys, Size: 4919 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: Lua-Filter, Span-text to metadata-text: How to get rid of linebreaks in metadata
       [not found]         ` <7538f86a-e1ac-d7b5-cee7-f67eb34f5127-ncST9ati83jjhi9iKp3Nug@public.gmane.org>
@ 2019-10-12  7:27           ` BPJ
  0 siblings, 0 replies; 5+ messages in thread
From: BPJ @ 2019-10-12  7:27 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 3848 bytes --]

It's because metadata values are parsed by the ordinary Markdown parser,
which represents existing line breaks in the input as SoftBreak elements so
that they can be preserved in the output. This is useful for ordinary
text: many people like to have one sentence per physical line, for example,
and want that preserved, at least when converting Markdown to
Markdown for cleanup purposes. Apparently the stringify function also
restores them, which is not so useful in your case. You can also try
--wrap=auto, which may remove most breaks in the metadata values but still
insert line breaks in appropriate places in the document body.


-- 
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CADAJKhDPfHKWCRrPEj6BPi-yRGCCn%3DvRM7txXZVpctHPd-KzTg%40mail.gmail.com.

[-- Attachment #2: Type: text/html, Size: 5439 bytes --]
