From: Aditya Mahajan
Newsgroups: gmane.comp.tex.context
Subject: Re: idea: Module to automatically extract and insert information from Wikipedia
Date: Sun, 13 Nov 2011 01:07:32 -0500 (EST)

On Sat, 12 Nov 2011, Paul Menzel wrote:

> just now I thought of the following and I am wondering if there exists
> already a solution.

Not exactly for Wikipedia, but I have an experimental module that pulls
information from the web. I use it to get images from sites like yuml.me
and websequencediagrams.com.

https://github.com/adityam/context-webfilter

See the test/ directory for examples.

> Writing a text which includes people I want to add information about
> these peoples as footnotes. The first sentence in a Wikipedia article is
> most of the time good enough for that.
>
> A macro `\infofromwikipedia{Donald Knuth}` would be nice which gets the
> first sentence of the article and puts an item into the bibliography.

This actually requires a more detailed spec. What happens if there is
more than one person with the same name? For example:

http://en.wikipedia.org/wiki/Wolfgang_Schuster

> There is even an API to access articles [2]. Besides coding that up I
> see the following problems.
>
> 1. The output [3] needs to be converted to ConTeXt.

I don't see anything in the API specs that returns the contents of the
page. My guess is that simply downloading the HTML page and scraping the
main paragraph might be easier. Once the data is retrieved, using ConTeXt
to typeset HTML is fairly easy.

Another option is to use one of the existing scripts that scrape the
first paragraph or first line from Wikipedia, e.g.,

http://stackoverflow.com/questions/1565347/get-first-lines-of-wikipedia-article
http://query7.com/scrape-the-first-paragraph-image-from-a-wikipedia-entry

and use the filter module to call them.

Aditya

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
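
A minimal Python sketch of the "download the HTML page and scrape the first paragraph" approach: build the article URL from a name, fetch the page, keep the first non-empty <p> element, and cut it at the first sentence. It assumes that the article text sits in ordinary <p> elements and that footnote markers are inside <sup>, which is how Wikipedia pages are commonly laid out but may change. The file name wikifirst.py and the names article_url, FirstParagraph and first_sentence are hypothetical. This is not one of the linked scripts and not part of the filter module.

```python
# Hypothetical helper script (wikifirst.py): print the first sentence of a
# Wikipedia article. A sketch only; it scrapes HTML and assumes the article
# text lives in ordinary <p> elements, which may change.
import re
import sys
import urllib.parse
import urllib.request
from html.parser import HTMLParser


def article_url(title, lang="en"):
    """Turn 'Wolfgang Schuster' into .../wiki/Wolfgang_Schuster."""
    slug = title.strip().replace(" ", "_")
    return "https://%s.wikipedia.org/wiki/%s" % (lang, urllib.parse.quote(slug))


class FirstParagraph(HTMLParser):
    """Collect the text of the first non-empty <p>, skipping <sup> footnote marks."""

    SKIP = ("sup", "style")

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.skip = 0        # nesting depth inside skipped tags
        self.buffer = []
        self.result = None   # set once a non-empty paragraph is found

    def handle_starttag(self, tag, attrs):
        if tag == "p" and self.result is None:
            self.in_p = True
            self.buffer = []
        elif tag in self.SKIP:
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip:
            self.skip -= 1
        elif tag == "p" and self.in_p:
            self.in_p = False
            # Collapse whitespace left over from the HTML source.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.result = text

    def handle_data(self, data):
        if self.in_p and not self.skip:
            self.buffer.append(data)


def first_sentence(paragraph):
    """Naive split at '.', '!' or '?' followed by a space and a capital letter.

    Abbreviations such as 'Jr. Smith' will cut the sentence short.
    """
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", paragraph, maxsplit=1)[0]


def main(words):
    request = urllib.request.Request(
        article_url(" ".join(words)),
        # The User-Agent string is an arbitrary placeholder.
        headers={"User-Agent": "wikifirst-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        html = response.read().decode("utf-8")
    parser = FirstParagraph()
    parser.feed(html)
    print(first_sentence(parser.result or ""))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Usage would be something like `python wikifirst.py Donald Knuth`. The sketch does not address the ambiguity raised above: for a name such as Wolfgang Schuster it simply returns the first sentence of whatever page that URL resolves to. It also prints plain text without escaping TeX special characters, so a wrapper would need to handle that before the result is typeset.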