From: Hans Hagen
Newsgroups: gmane.comp.tex.context
Subject: Re: idea: Module to automatically extract and insert information from Wikipedia
Date: Sat, 12 Nov 2011 18:11:24 +0100
Message-ID: <4EBEA8BC.4080106@wxs.nl>
In-Reply-To: <20111112164008.GA5922@khaled-laptop>
To: mailing list for ConTeXt users

On 12-11-2011 17:40, Khaled Hosny wrote:
> On Sat, Nov 12, 2011 at 05:31:23PM +0100, Philipp Gesang wrote:
>> (Beware that processing wiki text from WP is extremely
>> complicated due to WP’s using special plugins (“templates” and
>> stuff). So the only way to make sure that a parser accept any
>> well formed WP page would be to include all those plugins. Which
>> would entail rewriting the PHP code in Lua for use as a context
>> script. And then you’d have to decide for every plugin what its
>> output should look like in Context.[0] If you have the time ...)
>
> I think scraping the MediaWiki-generated HTML would be simpler.

Doesn't it also depend on the first line being recognizable as such?

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
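
To make the question about a recognizable first line concrete, here is a minimal sketch in Python (not ConTeXt or Lua, and not anything proposed in the thread) of the HTML-scraping approach. It assumes the lead text is the first non-empty <p> that sits outside any <table>; that is only a heuristic, and the sample markup, the class names in it, and the helper names FirstParagraph and first_paragraph are all hypothetical.

```python
from html.parser import HTMLParser


class FirstParagraph(HTMLParser):
    """Collect the text of the first non-empty <p> that is not inside a <table>."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0   # > 0 while inside an infobox or any other table
        self.in_p = False      # True while inside a candidate paragraph
        self.chunks = []       # text pieces of the current candidate
        self.result = None     # set once a non-empty paragraph has been closed

    def handle_starttag(self, tag, attrs):
        if self.result is not None:
            return
        if tag == "table":
            self.table_depth += 1
        elif tag == "p" and self.table_depth == 0:
            self.in_p = True
            self.chunks = []

    def handle_endtag(self, tag):
        if self.result is not None:
            return
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
        elif tag == "p" and self.in_p:
            self.in_p = False
            # Normalise whitespace; empty placeholder paragraphs are skipped.
            text = " ".join("".join(self.chunks).split())
            if text:
                self.result = text

    def handle_data(self, data):
        if self.in_p and self.result is None:
            self.chunks.append(data)


def first_paragraph(html):
    """Return the assumed lead paragraph as plain text, or None if none is found."""
    parser = FirstParagraph()
    parser.feed(html)
    parser.close()
    return parser.result


# Hypothetical sample that imitates rendered wiki output; not fetched from anywhere.
SAMPLE = """
<div class="mw-parser-output">
<table class="infobox"><tr><td><p>Infobox text</p></td></tr></table>
<p class="mw-empty-elt">
</p>
<p><b>ConTeXt</b> is a document processor based on TeX.</p>
<p>Second paragraph.</p>
</div>
"""

print(first_paragraph(SAMPLE))
```

Even in this small sample the parser has to skip an infobox paragraph and an empty placeholder before it reaches the intended text, so the result depends on the page following the assumed layout.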