From: Philipp Gesang
Newsgroups: gmane.comp.tex.context
Subject: Re: idea: Module to automatically extract and insert information from Wikipedia
Date: Sat, 12 Nov 2011 17:31:23 +0100
Message-ID: <20111112163123.GA1225@orcus.urz.uni-heidelberg.de>
References: <1321111170.3557.31.camel@mattotaupa>
To: mailing list for ConTeXt users

Hi Paul,

On 2011-11-12 16:19, Paul Menzel wrote:
> A macro `\infofromwikipedia{Donald Knuth}` would be nice which gets the
> first sentence of the article and puts an item into the bibliography.
>
> There is even an API to access articles [2]. Besides coding that up I
> see the following problems.
>
> 1. The output [3] needs to be converted to ConTeXt.
> 2. An Internet connection would be necessary. But that is just a note
>    and not a problem.

you could take this as a starting point: and implement a function
that ignores everything but the first text paragraph.
Autodownload should work for the English WP. (I’m sorry I have no
time to do this myself atm.)

Btw., as “sentence” is not a markup category of wikitext, there
is no sentence recognition built in ... ymmv.

(Beware that processing wikitext from WP is extremely
complicated because WP relies on special plugins (“templates” and
stuff). So the only way to make sure that a parser accepts any
well-formed WP page would be to include all those plugins, which
would entail rewriting the PHP code in Lua for use as a ConTeXt
script. And then you’d have to decide for every plugin what its
output should look like in ConTeXt.[0] If you have the time ...)
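
To make the "ignore everything but the first text paragraph" idea a bit more concrete, here is a minimal sketch. It is written in Python for brevity rather than as a Lua/ConTeXt script, and it is not the starting-point code mentioned above. The names `strip_templates`, `plain_text`, `first_paragraph` and `first_sentence` are made up for illustration, and `SAMPLE` is invented wikitext, not the real article source. Templates are simply thrown away instead of being expanded, which sidesteps the plugin problem at the price of losing whatever text they would have produced.

```python
import re

# Invented sample wikitext for illustration only (not the actual article source).
SAMPLE = """{{Infobox scientist
| name = Donald Knuth
| known_for = {{nowrap|TeX}}
}}
'''Donald Ervin Knuth''' is an American [[computer scientist]] and [[Stanford University|Stanford]] professor.<ref>Some source</ref> He created [[TeX]].

== Early life ==
Knuth was born in [[Milwaukee]].
"""


def strip_templates(wikitext: str) -> str:
    """Drop {{...}} template calls, including nested ones, by tracking brace depth."""
    out = []
    depth = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:  # keep only text outside of any template
                out.append(wikitext[i])
            i += 1
    return "".join(out)


def plain_text(block: str) -> str:
    """Reduce a few common inline constructs to plain text (far from complete)."""
    block = re.sub(r"<ref[^>]*/>", "", block)                       # <ref name="x"/>
    block = re.sub(r"<ref[^>]*>.*?</ref>", "", block, flags=re.S)   # <ref>...</ref>
    block = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", block)  # [[a|b]] -> b
    block = re.sub(r"'{2,}", "", block)                             # ''italic'', '''bold'''
    return " ".join(block.split())                                  # normalise whitespace


def first_paragraph(wikitext: str) -> str:
    """Return the first block that looks like running text, or '' if none is found."""
    text = strip_templates(wikitext)
    for block in re.split(r"\n\s*\n", text):
        block = block.strip()
        # Skip headings, lists, indents, tables and file/category links.
        if not block or block[0] in "=*#:;|{!":
            continue
        if block.startswith(("[[File:", "[[Image:", "[[Category:")):
            continue
        return plain_text(block)
    return ""


def first_sentence(text: str) -> str:
    """Naive split at the first '.', '!' or '?' that is followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text, maxsplit=1)[0]


if __name__ == "__main__":
    print(first_sentence(first_paragraph(SAMPLE)))
```

The sentence split is deliberately naive: since wikitext has no notion of a sentence, it just cuts at the first `.`, `!` or `?` followed by whitespace, so an abbreviation like "Dr. " will end the sentence too early. Converting the resulting string to ConTeXt (escaping special characters, building the bibliography item) is left out entirely.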

Good luck
Philipp

[0] Get an impression of how much work this can be at
    http://en.wikipedia.org/wiki/Wikipedia:List_of_templates
    The more important ones are at
    http://en.wikipedia.org/wiki/Category:Infobox_templates

> Thanks,
>
> Paul
>
>
> [1] https://en.wikipedia.org/wiki/Donald_Knuth
> [2] http://www.mediawiki.org/wiki/API
> [3] http://www.mediawiki.org/wiki/API:Data_formats#Output
> ___________________________________________________________________________________
> If your question is of interest to others as well, please add an entry to the Wiki!
>
> maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
> archive  : http://foundry.supelec.fr/projects/contextrev/
> wiki     : http://contextgarden.net
> ___________________________________________________________________________________