From: Joseph Reagle
Newsgroups: gmane.text.pandoc
Subject: Re: Getting Citations in Wikipedia page to convert over to HTML, Docx, LaTeX.
Date: Fri, 29 May 2020 10:53:11 -0400
Message-ID: <6ac2c977-59b8-159c-93e2-c0a8bf9599fe@reagle.org>
References: <52683ae4-6dc6-45cd-8e2f-66b1226d6b08@googlegroups.com>

Hello John, as someone who authors a lot of citation-heavy content in markdown and Wikitext, I know it'd be nice if there were an easy way to convert between the two.

However, on Wikipedia, citations are templates (appearing between '{{' and '}}'). Any specific template is not actually part of Wikitext; it is instead a dynamic and arbitrarily customizable extension. Pandoc, obviously, doesn't support that.
I suppose someone could write a filter to do some of the work, but they'd need to decide which templates to support: {{cite}}, {{citation}}, {{sfn}}, ... And then when it comes to the bibliography, there's <references/>, {{reflist}}, ... And then they'd have to deal with all of the parameters, converting their semantics, and bugs.

Wikitext, and especially templates, is a god-awful mess; it's often not even well-formed. I tried running a citation bot on your article and it found many errors, which would make conversion difficult. (Feel free to revert that edit.)

https://en.wikipedia.org/w/index.php?title=User:JohnM7190/John%27s_Noise_Figure_Page&action=history

If you do actually want to do a proper semantic conversion of your citations, I think the thing to do would be:

1. Convert your article into list-defined style, so that each citation is a short reference () to a longer one ({{citation ...}}) at the bottom of your page.

https://en.wikipedia.org/wiki/Help:List-defined_references

This is how LaTeX and pandoc-markdown structure things.

2. You'll then need to turn your references (in the prose) and citations (at the bottom) into the appropriate pandoc/YAML; you could use BibTeX for the latter. Some regexes might get you part of the way, but given the sloppiness in the citations, it would be a very manual process. For some of them, perhaps you could use a DOI or ISSN to get BibTeX-formatted citations from an API, which you could use with pandoc.

There are tools that can output Wikipedia citations given a well-formed and defined input (BibTeX or YAML), but I'm not aware of anything that goes the other way.

Good luck!
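
P.S. To give a feel for the "some regexes might get you part of the way" idea in step 2, here is a minimal Python sketch, not a tested converter. It assumes each citation is a single flat {{citation ...}} or {{cite ...}} template with named parameters, no nested templates, and no '|' inside values. The parameter-to-field mapping, the function names, and the sample citation are all hypothetical; real templates accept many more parameters and aliases, so anything sloppy would still need fixing by hand.

```python
import re

# Hypothetical mapping from Wikipedia template parameters to BibTeX fields;
# real templates accept many more parameters and aliases.
FIELD_MAP = {
    "title": "title",
    "journal": "journal",
    "year": "year",
    "volume": "volume",
    "pages": "pages",
    "doi": "doi",
}

TEMPLATE_RE = re.compile(
    r"\s*\{\{\s*(cite \w+|citation)\s*\|(.*)\}\}\s*",
    re.S | re.I,
)


def parse_template(wikitext):
    """Split one flat citation template into a dict of named parameters.

    Assumes no nested templates and no '|' inside parameter values.
    """
    match = TEMPLATE_RE.fullmatch(wikitext)
    if match is None:
        raise ValueError("not a citation template")
    params = {}
    for part in match.group(2).split("|"):
        key, sep, value = part.partition("=")
        if sep:  # skip positional (unnamed) parameters
            params[key.strip().lower()] = value.strip()
    return params


def to_bibtex(key, params):
    """Render the parsed parameters as a BibTeX @article entry."""
    fields = {}
    if "last" in params:
        author = params["last"]
        if "first" in params:
            author += ", " + params["first"]
        fields["author"] = author
    for wiki_name, bib_name in FIELD_MAP.items():
        if wiki_name in params:
            fields[bib_name] = params[wiki_name]
    lines = ["@article{" + key + ","]
    lines += ["  {} = {{{}}},".format(name, value) for name, value in fields.items()]
    lines.append("}")
    return "\n".join(lines)


# Made-up citation, purely for illustration.
sample = ("{{citation |last=Doe |first=Jane |title=An Example Title "
          "|journal=Journal of Examples |year=2020}}")
print(to_bibtex("doe2020", parse_template(sample)))
```

The resulting entries could be collected into a .bib file and handed to pandoc with its --bibliography option. Every entry is emitted as @article here only to keep the sketch short; books and web pages would need their own entry types.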