From: John McCorkle
Newsgroups: gmane.text.pandoc
To: pandoc-discuss
Subject: Re: Getting Citations in Wikipedia page to convert over to HTML, Docx, LaTeX.
Date: Wed, 3 Jun 2020 06:05:55 -0700 (PDT)

Joseph,

Thank you very much. I was not aware of the List-defined style in Wikimedia.

I use Zotero, and all the references in the article are in my Zotero library. I could easily put them all in a Zotero collection so that exporting the whole batch to BibTeX, Wikimedia, or any other format Zotero supports is easy. BibTeX or BibLaTeX would be nice, since these automatically create the short REF-Name. The whole list can also be exported as Wikimedia citations, but that does not create the <ref></ref> container, so there is no automatically generated short REF-Name.

Is there a way to put the whole BibTeX list at the end of my Wikimedia source file and then reference those entries in the article text? Is that what you are suggesting?

It does seem that if the source were set up that way, converting it to LaTeX or docx would go better.

I'm also thinking that the HTML I grab from my browser when viewing the Wikipedia page would be cleaner too, and perhaps that HTML would convert to LaTeX or docx better.

If you have never used Zotero, you might check it out. It is an absolutely fabulous tool.
Great grabber and great database.

Thanks again.

On Friday, May 29, 2020 at 10:53:19 AM UTC-4, Joseph wrote:
>
> Hello John, as someone who authors a lot of citation-heavy content in
> markdown and Wikitext, I know it'd be nice if there was an easy way to
> convert between the two.
>
> However, on Wikipedia, citations are templates (appearing between '{{' and
> '}}'). Any specific template is not actually part of Wikitext; it is instead
> a dynamic and arbitrarily customizable extension. Pandoc, obviously,
> doesn't support that. I suppose someone could write a filter to do some of
> the work, but they'd need to decide which templates to support: {{cite}},
> {{citation}}, {{sfn}}, ... . And then when it comes to the bibliography,
> there's <references/>, {{reflist}}, ... And then deal with all of the
> parameters, converting their semantics, and bugs.
>
> Wikitext, and especially templates, is a god-awful mess; it's often not
> even well-formed. I tried running a citation bot on your article and it
> found many errors, which would make conversion difficult. (Feel free to
> revert that edit.)
>
>   https://en.wikipedia.org/w/index.php?title=User:JohnM7190/John%27s_Noise_Figure_Page&action=history
>
> If you do actually want to do a proper semantic conversion of your
> citations, I think the thing to do would be:
>
> 1. Convert your article into List-defined style, so that each citation is
> a short reference (<REF NAME=FOO/>) to a longer one
> (<REF NAME=FOO>{{citation ...}}</REF>) at the bottom of your page.
>
>   https://en.wikipedia.org/wiki/Help:List-defined_references
>
> This is how LaTeX and pandoc-markdown structure things.
>
> 2. You'll then need to turn your references (in the prose) and citations
> (at the bottom) into the appropriate pandoc/YAML -- you could use BibTeX
> for the latter. Some regexes might get you part of the way, but given the
> sloppiness in the citations, it would be a very manual process. For some of
> them, perhaps you could use a DOI or ISSN to get BibTeX-formatted citations
> from an API, which you could use with pandoc.
>
> There are tools that can output Wikipedia citations given a well-formed
> and defined input (BibTeX or YAML), but I'm not aware of anything that goes
> the other way.
>
> Good luck!
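The two steps Joseph outlines (short named references in the prose, full definitions at the bottom, then conversion to pandoc citations) could be roughed out in Python along the following lines. This is only a sketch under several assumptions, none of which come from the thread: the article is already in list-defined style, every reference name is a single token usable as a pandoc citation key, and the function names `to_pandoc_citations`, `collect_definitions`, and `fetch_bibtex` are made up for illustration. The last function assumes the DOI's registrar answers content negotiation for `application/x-bibtex` at doi.org, which may not hold for every DOI. Malformed wikitext will slip past these regexes, so the output would still need checking by hand.

```python
import re
import urllib.request

# Short reference, e.g. <REF NAME=FOO/> or <ref name="FOO" />.
SHORT_REF = re.compile(r'<ref\s+name\s*=\s*"?([^"/>]+?)"?\s*/>', re.IGNORECASE)

# Full definition, e.g. <ref name=FOO>{{citation ...}}</ref>.
FULL_REF = re.compile(
    r'<ref\s+name\s*=\s*"?([^"/>]+?)"?\s*>(.*?)</ref>',
    re.IGNORECASE | re.DOTALL,
)


def to_pandoc_citations(wikitext):
    """Replace each short reference with a pandoc citation such as [@FOO].

    Assumption: reference names contain no spaces; names that do would
    have to be renamed before they can serve as citation keys.
    """
    return SHORT_REF.sub(lambda m: "[@" + m.group(1).strip() + "]", wikitext)


def collect_definitions(wikitext):
    """Return {name: template text} for every named full definition."""
    return {
        m.group(1).strip(): m.group(2).strip()
        for m in FULL_REF.finditer(wikitext)
    }


def fetch_bibtex(doi):
    """Request a BibTeX record for a DOI (needs network access).

    Assumption: doi.org content negotiation is available for this DOI.
    """
    request = urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": "application/x-bibtex"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With this, `to_pandoc_citations` rewrites the prose, while `collect_definitions` gathers the `{{citation ...}}` templates that still have to be turned into BibTeX entries, either by hand, from a Zotero export, or via `fetch_bibtex` for entries that carry a DOI.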