public inbox archive for pandoc-discuss@googlegroups.com
* Pandoc Citeproc doesn't work on HTML format
@ 2022-11-07 13:26 Mladen Babic
       [not found] ` <8e24d40c-5977-4912-9e1b-6cfa0f66d5e5n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Mladen Babic @ 2022-11-07 13:26 UTC (permalink / raw)
  To: pandoc-discuss



Hi all, 

I'm trying to cite entries from a .bib file in an HTML document, but without
success. The same setup works perfectly for Markdown, so my question is:
does citeproc work with input formats other than Markdown?

Here is the example I'm using:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="en">
<body>
Test [@test1]
</body>
</html>

Command: pandoc --bibliography=test.bib --citeproc test.html -o test.html -s --metadata-file=test.yaml

The .bib file contains the following:

@article{test1,
  author  = {Rathod, N and Kulawik, P and Ozogul, Y and Ozogul, F and Bekhit, A},
  title   = {Recent developments in non-thermal processing for seafood and seafood products: cold plasma, pulsed electric field and high hydrostatic pressure},
  journal = {International Journal of Food Science & Technology},
  date    = {2022},
  year    = {2022},
  pages   = {774--790},
  volume  = {57},
  number  = {2},
  doi     = {10.1111/ijfs.15392},
  raw     = {Rathod, N. B., Kulawik, P., Ozogul, Y., Ozogul, F., & Bekhit, A. E. D. A. (2022). Recent developments in non-thermal processing for seafood and seafood products: cold plasma, pulsed electric field and high hydrostatic pressure. International Journal of Food Science & Technology, 57(2), 774-790. https://doi.org/10.1111/ijfs.15392}
}

I have written a Lua filter, but it covers only some of the cases. I'm new
to Lua and can't yet write a filter as complete as the citation handling
available for Markdown.

Thank you.





-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/8e24d40c-5977-4912-9e1b-6cfa0f66d5e5n%40googlegroups.com.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Pandoc Citeproc doesn't work on HTML format
       [not found] ` <8e24d40c-5977-4912-9e1b-6cfa0f66d5e5n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-11-07 14:21   ` Albert Krewinkel
       [not found]     ` <87v8nqon26.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Albert Krewinkel @ 2022-11-07 14:21 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hi Mladen,

Mladen Babic <mladen.babic-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> I'm trying to reference cites from the .bib file in the HTML but
> without success. The function perfectly works for Markdown, so my
> question is does the citeproc work on other formats except for MD?
>
> Here are some examples which I use:
>
> <!DOCTYPE html>
> <html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="en">
> <body>
> Test [@test1]
> </body>
> </html>

The file is parsed as HTML, but the body above uses Markdown citation
syntax. The HTML reader does not interpret that syntax, so `[@test1]` is
read as plain text. However, Markdown can contain HTML, so you could try
with `pandoc --from=markdown ...`.

Note, however, that pandoc conversions are lossy in general. Going from
HTML to HTML might not do what you expect.


-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124



* Re: Pandoc Citeproc doesn't work on HTML format
       [not found]     ` <87v8nqon26.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2022-11-07 15:12       ` Mladen Babic
       [not found]         ` <b67f836a-8d65-4124-bb6c-900d9933d2d2n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Mladen Babic @ 2022-11-07 15:12 UTC (permalink / raw)
  To: pandoc-discuss



Hi Albert,

Thanks a lot for the quick reply. 
OK, I probably missed it, but the pandoc citeproc documentation doesn't
mention that it supports only Markdown, so I thought the @test pattern
would work for all formats.

What I actually want to do is this: when a user uploads a DOCX file, pandoc
converts it to HTML and shows it in an HTML editor for additional editing,
then converts the result back to DOCX.
After the conversion to HTML, the system (my app) will replace each existing
citation in the HTML, e.g. [1], with the key from the .bib file (in my case
[@test1]) so that citeproc knows how to process it.

I guess I need to convert DOCX to Markdown and then Markdown to HTML, but
I'm afraid the file will lose some of its styling during the conversion
process.

Any tips/hints will be appreciated. 

Thank you. 


* Re: Pandoc Citeproc doesn't work on HTML format
       [not found]         ` <b67f836a-8d65-4124-bb6c-900d9933d2d2n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-11-08  8:07           ` Albert Krewinkel
       [not found]             ` <87r0ydoo0n.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Albert Krewinkel @ 2022-11-08  8:07 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw; +Cc: Frederik Eichler


Mladen Babic <mladen.babic-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> What I actually want to do is when the user uploads the DOCX file,
> Pandoc converts the file to HTML and shows it to the HTML editor for
> additional editing by the user and converts it back to DOCX.
> After converting to Html, the system (my app) will replace current
> cites in HTML cite i.e. [1] with the key from the .bib file (like in
> my case [@test1] so the citeproc will know how to process it.

That's an interesting use case. I don't have any immediate ideas; going
via Markdown might be the best option.

But please make sure to also check out [OS-APS], an open-source
project that uses pandoc for some of its document conversions. Going
by your description, it sounds like it could be exactly what you need.
I've added Frederik from that org to CC; he may be able to give more info.

[OS-APS]: https://os-aps.de

-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124



* Re: Pandoc Citeproc doesn't work on HTML format
       [not found]             ` <87r0ydoo0n.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2022-11-08  9:21               ` 'William Lupton' via pandoc-discuss
       [not found]                 ` <CAEe_xxizCtYTk_m5ROjitBB9WPxivF3rKdmk2vOFqEdZBtLX0Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: 'William Lupton' via pandoc-discuss @ 2022-11-08  9:21 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw; +Cc: Frederik Eichler


Re this:

> Ok, I probably missed in the Pandoc citeproc doc that doesn't mention
> that supports only MD, so I thought it would work for all formats with
> pattern @test.

The @test citation syntax is defined under the citations extension
<https://pandoc.org/MANUAL.html#extension-citations> (with target
'extension-citations'). This is within the 'Pandoc's Markdown' section and
so perhaps applies only to markdown.

However, there's another citations extension
<https://pandoc.org/MANUAL.html#org-citations> (with target
'org-citations') in the 'Extensions -> Other extensions' section, and this
describes its usage within org and docx documents.

This little shell script illustrates that the 'citations' extension is
supported for docx, ipynb, jats, markdown (+variants), opml and org, and is
enabled by default for markdown, opml and org.

% for i in $(pandoc --list-input-formats); do
    echo -n "$i:"
    pandoc --list-extensions="$i" | grep citations || echo
  done | grep ':.citations'
docx:-citations
ipynb:-citations
markdown:+citations
markdown_github:-citations
markdown_mmd:-citations
markdown_phpextra:-citations
markdown_strict:-citations
opml:+citations
org:+citations

So I think that (not surprisingly?) the 'citations' syntax supported by a
given input format (if supported) is a function of that input format. The
supported syntax is clear for markdown (+variants?), org and docx but
perhaps not for ipynb and opml.

I think that it might be useful to clarify some of this in the man page?
Please let me know if I should create an issue.


* Re: Pandoc Citeproc doesn't work on HTML format
       [not found]                 ` <CAEe_xxizCtYTk_m5ROjitBB9WPxivF3rKdmk2vOFqEdZBtLX0Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2022-11-10 14:09                   ` Mladen Babic
  0 siblings, 0 replies; 6+ messages in thread
From: Mladen Babic @ 2022-11-10 14:09 UTC (permalink / raw)
  To: pandoc-discuss



Thanks all for the feedback. 

It would be nice to have citeproc for HTML too; I'd guess it wouldn't take
too much effort. In the meantime, I would like to write some Lua filters
that cover several cases, but I'm new to Lua.
I handled the first case, [@test1], but I can't work out how to handle,
e.g., [@test1; @test2]. How can I return a list of citations?

This is my Lua filter:

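-- Turn a Str such as "[@test1]" into a Cite element for citeproc.
-- Only handles a single key contained in one Str.
function Str(el)
  -- %[ and %] match literal brackets; %w+ captures an alphanumeric key.
  local citekey = el.text:match("%[@(%w+)%]")
  if citekey then
    local citation = pandoc.Citation(citekey, 'NormalCitation')
    return pandoc.Cite({pandoc.Str(citekey)}, {citation})
  end
end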

Any help will be appreciated. 

Thanks



