public inbox archive for pandoc-discuss@googlegroups.com
From: John McCorkle <jmco67-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Getting Citations in Wikipedia page to convert over to HTML, Docx, LaTeX.
Date: Wed, 3 Jun 2020 06:05:55 -0700 (PDT)
Message-ID: <ad675bd9-ffc9-42b3-abb8-b78713b1b2e5@googlegroups.com>
In-Reply-To: <6ac2c977-59b8-159c-93e2-c0a8bf9599fe-T1oY19WcHSwdnm+yROfE0A@public.gmane.org>

Joseph, Thank you very much. I was not aware of the List-defined style in 
Wikimedia.
I use Zotero, and all the references in the article are in my Zotero 
library. I could easily put them all in a Zotero collection, which makes 
exporting the whole batch to BibTeX, Wikimedia, or any other format Zotero 
supports easy. BibTeX or BibLaTeX would be nice, since these 
automatically create the short REF name. The whole list can also be 
exported as Wikimedia citations, but that does not create the <ref></ref> 
container, so there is no automatically generated short REF name.
Is there a way to put the whole BibTeX list at the end of my Wikimedia 
source file and then reference those entries in the article text? Is that 
what you are suggesting?
It does seem that if the source were set up that way, converting it to 
LaTeX or docx would go better.
I'm also thinking that the HTML I grab from my browser when viewing the 
Wikipedia page would be cleaner, and perhaps that HTML would convert to 
LaTeX or docx better.
If you have never used Zotero, you might check it out. It is an absolutely 
fabulous tool. Great grabber and great database.
Thanks again.

On Friday, May 29, 2020 at 10:53:19 AM UTC-4, Joseph wrote:
>
>
> Hello John, as someone who authors a lot of citation-heavy content in 
> markdown and Wikitext, I know it'd be nice if there was an easy way to 
> convert between the two. 
>
> However, on Wikipedia, citations are templates (appearing between '{{' and 
> '}}'). A specific template is not actually part of Wikitext; it is instead 
> a dynamic and arbitrarily customizable extension. Pandoc, obviously, 
> doesn't support that. I suppose someone could write a filter to do some of 
> the work, but they'd need to decide which templates to support: {{cite}}, 
> {{citation}}, {{sfn}}, ... . And then when it comes to the bibliography, 
> there's <references/>, {{reflist}}, ... And then they'd have to deal with 
> all of the parameters, converting their semantics, and bugs. 
>
> Wikitext, and especially templates, is a god-awful mess; it's often not 
> even well-formed. I tried running a citation bot on your article and it 
> found many errors, which would make conversion difficult. (Feel free to 
> revert that edit.) 
>
>   
> https://en.wikipedia.org/w/index.php?title=User:JohnM7190/John%27s_Noise_Figure_Page&action=history 
>
> If you do actually want to do a proper semantic conversion of your 
> citations, I think the thing to do would be: 
>
> 1. Convert your article into List-defined style, so that each citation is 
> a short reference (<REF NAME=FOO/>) to a longer one (<REF 
> NAME=FOO>{{citation ...}}</REF>) at the bottom of your page. 
>
>         https://en.wikipedia.org/wiki/Help:List-defined_references 
>
> This is how LaTeX and pandoc markdown structure things. 
>
> 2. You'll then need to turn your references (in the prose) and citations 
> (at the bottom) into the appropriate pandoc/YAML -- you could use BibTeX 
> for the latter. Some regexes might get you part of the way, but given the 
> sloppiness in the citations, it would be a very manual process. For some of 
> them, perhaps you could use a DOI or ISSN to get BibTeX-formatted citations 
> from an API, which you could use with pandoc. 
>
> There are tools that can output Wikipedia citations given a well-formed 
> and defined input (bibtex or YAML), but I'm not aware of anything that goes 
> the other way. 
>
> Good luck! 
>
>
>
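
As a rough illustration of the regex approach mentioned in step 2 of the quoted message, here is a minimal Python sketch. It assumes the article is already in list-defined style, that ref names contain no quotes or slashes, and that pandoc-style [@key] citations are the target. The names SHORT_REF, FULL_REF, to_citekey and split_list_defined are made up for this example; nothing here is part of pandoc, and it does not try to parse the {{citation ...}} templates themselves.

```python
import re

# Hypothetical patterns for this sketch (not part of pandoc or MediaWiki).
# Short reference in the prose, e.g. <ref name=foo/> or <REF NAME="foo" />.
SHORT_REF = re.compile(r'<ref\s+name\s*=\s*"?([^"/>]+?)"?\s*/>', re.IGNORECASE)
# Full definition, e.g. <ref name=foo>{{citation ...}}</ref>.
FULL_REF = re.compile(
    r'<ref\s+name\s*=\s*"?([^"/>]+?)"?\s*>(.*?)</ref>',
    re.IGNORECASE | re.DOTALL,
)


def to_citekey(name):
    """Replace characters that are awkward in a citation key with '_'."""
    return re.sub(r'[^A-Za-z0-9_:.-]+', '_', name.strip())


def split_list_defined(wikitext):
    """Return (prose with [@key] citations, {key: raw template text})."""
    definitions = {}

    def collect(match):
        # Remember the raw template text and drop the definition from the prose.
        definitions[to_citekey(match.group(1))] = match.group(2).strip()
        return ''

    prose = FULL_REF.sub(collect, wikitext)
    # Turn each remaining short reference into a pandoc-style citation.
    prose = SHORT_REF.sub(lambda m: '[@' + to_citekey(m.group(1)) + ']', prose)
    return prose, definitions


if __name__ == '__main__':
    sample = ('Some claim.<ref name=foo/>\n'
              '<ref name=foo>{{citation |title=Example}}</ref>')
    prose, definitions = split_list_defined(sample)
    print(prose)
    print(definitions)
```

The collected template text would still need to be turned into BibTeX or YAML by hand or by another tool, which is the manual part described above.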

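For the suggestion of using a DOI to get BibTeX-formatted citations from an API, one possible route is DOI content negotiation: requesting https://doi.org/<DOI> with an Accept header of application/x-bibtex. The sketch below assumes the DOI's registration agency supports that media type (not all do) and uses only the Python standard library. The function names bibtex_request and fetch_bibtex are hypothetical, and the DOI in the usage lines is a placeholder rather than a real reference from the article.

```python
import urllib.parse
import urllib.request


def bibtex_request(doi):
    """Build a request asking the DOI resolver for a BibTeX record.

    Assumption: the registration agency behind the DOI honours the
    'application/x-bibtex' media type via content negotiation.
    """
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi, timeout=10):
    """Perform the request and return the response body as text."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder DOI for illustration; substitute one from the article.
    request = bibtex_request("10.1000/example")
    print(request.full_url)
    print(request.get_header("Accept"))
```

Calling fetch_bibtex with a real DOI needs network access; the entries it returns could be collected into a .bib file for pandoc to use as a bibliography.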