From: John McCorkle
Newsgroups: gmane.text.pandoc
Subject: Re: Getting Citations in Wikipedia page to convert over to HTML, Docx, LaTeX.
Date: Fri, 29 May 2020 06:47:44 -0700 (PDT)
References: <52683ae4-6dc6-45cd-8e2f-66b1226d6b08@googlegroups.com>
To: pandoc-discuss

I never thought about grabbing the HTML. It works great for my HTML needs. Many thanks!!
Unfortunately, besides not working on the source MediaWiki file, Pandoc does not convert the HTML to TeX or Docx successfully either.
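
For the source.wiki route, one possible regex-based workaround is to flatten each {{Cite ...}} template into plain text before running Pandoc, on the assumption (not confirmed in this thread) that Pandoc drops templates it does not understand. The Python sketch below is only an illustration: the function names and the output style are made up here, and it does not handle nested templates or wiki links that contain "|".

```python
import re

# Matches one {{Cite <kind>|...}} template that has no nested templates.
CITE_RE = re.compile(r"\{\{\s*Cite\s+(\w+)\s*\|(.*?)\}\}",
                     re.DOTALL | re.IGNORECASE)


def parse_fields(body):
    """Split 'k1=v1|k2=v2' into a dict, dropping fields with empty values."""
    fields = {}
    for part in body.split("|"):
        # Split on the first "=" only, so "=" inside a URL is preserved.
        key, sep, value = part.partition("=")
        value = " ".join(value.split())  # collapse line breaks and runs of spaces
        if sep and value:
            fields[key.strip().lower()] = value
    return fields


def format_citation(fields):
    """Render the fields as one plain-text line (hypothetical, arbitrary style)."""
    author = ", ".join(x for x in (fields.get("last"), fields.get("first")) if x)
    pieces = [
        author,
        fields.get("date"),
        fields.get("title"),
        fields.get("journal") or fields.get("publisher"),
        ("pages " + fields["pages"]) if "pages" in fields else None,
    ]
    text = ". ".join(p.rstrip(".") for p in pieces if p) + "."
    url = fields.get("url")
    return text + " " + url if url else text


def flatten_cites(wikitext):
    """Replace every {{Cite ...}} template in wikitext with plain text."""
    return CITE_RE.sub(lambda m: format_citation(parse_fields(m.group(2))),
                       wikitext)
```

The result would be written back to source.wiki before conversion; whether the flattened text then survives into the LaTeX and Docx output would still need to be checked against Pandoc itself.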

On Thursday, May 7, 2020 at 5:41:54 PM UTC-4, John MacFarlane wrote:

You might have better luck converting the HTML version of the
Wikipedia page. See
https://groups.google.com/d/msg/pandoc-discuss/ptiLha5vJ2I/bPJvyLw0BAAJ


John McCorkle <jmc...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> I need to convert a Wikipedia page I wrote to HTML and to either LaTeX or
> Docx.
> I go to the page here
> (https://en.wikipedia.org/wiki/User:JohnM7190/John%27s_Noise_Figure_Page),
> click on the "edit source" tab, select and copy the source text, and then
> click on the "read" tab so I don't risk actually editing anything. I paste
> that text into Notepad++ and use several regular expression search/replace
> operations to eliminate the <NumBlk blah blah /NumBlk> styles (since Pandoc
> does not recognize them) while keeping the equation and the equation reference
> number they contain, and to fix the {{EquationNote|x}} references to those
> equations. That gets saved, UTF-8 encoded, as my source.wiki file. Pandoc
> converts my source.wiki file to all three output formats pretty well, except
> that the citations don't come across.
>
> Can someone please tell me how to modify the citations in my source.wiki
> file so the citations get converted properly (i.e. both first use of the
> citation, and additional references to the same citation), and end up
> listed at the end of the article the same way they do on the Wikipedia page?
>
> For example, on first use, one of my citations is:
> <ref name="Peebles457">{{Cite
> book|url=https://cds.cern.ch/record/105963|title=Communication system
> principles|last=Peebles|first=Peyton
> Z.|date=1976|publisher=Addison-Wesley|year=|isbn=|location=Reading,
> MA|pages=457}}</ref>
>
> and then other references to it are:
> <ref name="Peebles457" />
>
> There are several types of references, like
>
> <ref name=":2">{{Cite journal|last=Friis|first=H. T.|date=July
> 1944|title=Noise Figures of Radio Receivers|url=|journal=Proceedings of the
> IRE|volume=32|issue=7|pages=419–422|doi=10.1109/JRPROC.1944.232049|issn=0096-8390|via=}}[https://ieeexplore.ieee.org/abstract/document/1695024]</ref>
>
> <ref name="IEC_Spot_NF">{{Cite
> web|url=http://www.electropedia.org/iev/iev.nsf/display?openform&ievref=702-08-57|title=IEC
> 60050 - International Electrotechnical Vocabulary - IEV number 702-08-57:
> "spot noise factor (of a linear two-port device); spot noise figure (of a
> linear two-port device)"|last=|first=|date=September
> 2018|website=|url-status=live|archive-url=|archive-date=|accessdate=2019-12-29}}</ref>
>
> <ref name="Fisk">{{Cite journal|last=Fisk|first=James R.|date=Oct
> 1975|title=Receiver Noise Figure Sensitivity and Dynamic Range - What The
> Numbers
> Mean|url=http://www.electronicsandbooks.com/eab3/manual/Magazine/H/Ham%20Radio%20Magazine%20US/Ham%20Radio%20Magazine%201975/10%20October%201975.pdf|journal=Ham
> Radio|volume=|pages=8-25, pg. 12|via=}}</ref>
>
> Then MediaWiki automatically numbers these and puts them all at the end of
> the article with the command:
> {{Reflist}}
>
> Is there some format I could convert these citations to, e.g. using regular
> expressions, so that Pandoc would convert them properly? And is there
> something I can use to replace the {{Reflist}} command?
>
> Thanks in advance for any help!
>
>
> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/52683ae4-6dc6-45cd-8e2f-66b1226d6b08%40googlegroups.com.
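
As for the reused <ref name="..." /> tags and the {{Reflist}} command in the question above, one regex-based possibility is to copy the text of the first definition into each reuse and to drop {{Reflist}}, on the assumption that Pandoc then treats every <ref> as an ordinary footnote and places the notes itself. The Python sketch below has not been checked against Pandoc and its function name is hypothetical; note that a repeated citation becomes a separate note instead of sharing one number as it does on Wikipedia.

```python
import re

# A named definition: <ref name="X">...</ref> (self-closing tags do not match).
REF_DEF_RE = re.compile(r'<ref\s+name\s*=\s*"([^"]+)"\s*>(.*?)</ref>', re.DOTALL)
# A reuse of an earlier definition: <ref name="X" />
REF_USE_RE = re.compile(r'<ref\s+name\s*=\s*"([^"]+)"\s*/>')
# The template that asks MediaWiki to print the reference list.
REFLIST_RE = re.compile(r"\{\{\s*Reflist\s*\}\}", re.IGNORECASE)


def expand_named_refs(wikitext):
    """Inline the first definition of each named ref into its reuses."""
    definitions = {}
    for match in REF_DEF_RE.finditer(wikitext):
        # Keep the first definition if a name is defined more than once.
        definitions.setdefault(match.group(1), match.group(2))

    def replace_use(match):
        body = definitions.get(match.group(1))
        # Leave the tag untouched when no definition with that name exists.
        return match.group(0) if body is None else "<ref>" + body + "</ref>"

    text = REF_USE_RE.sub(replace_use, wikitext)
    # Assumption: the converter lists footnotes on its own, so drop {{Reflist}}.
    return REFLIST_RE.sub("", text)
```

Any {{Cite ...}} templates inside the definitions are copied verbatim by this step, so they would still need separate handling.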
