From: Khaled Hosny
Newsgroups: gmane.comp.tex.context
Subject: Re: idea: Module to automatically extract and insert information from Wikipedia
Date: Sat, 12 Nov 2011 18:40:08 +0200
Message-ID: <20111112164008.GA5922@khaled-laptop>
References: <1321111170.3557.31.camel@mattotaupa> <20111112163123.GA1225@orcus.urz.uni-heidelberg.de>
In-Reply-To: <20111112163123.GA1225@orcus.urz.uni-heidelberg.de>
To: mailing list for ConTeXt users

On Sat, Nov 12, 2011 at 05:31:23PM +0100, Philipp Gesang wrote:
> (Beware that processing wiki text from WP is extremely
> complicated due to WP’s using special plugins (“templates” and
> stuff). So the only way to make sure that a parser accept any
> well formed WP page would be to include all those plugins. Which
> would entail rewriting the PHP code in Lua for use as a context
> script. And then you’d have to decide for every plugin what its
> output should look like in Context.[0] If you have the time ...)

I think scraping the MediaWiki-generated HTML would be simpler.

Regards,
Khaled

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________