ntg-context - mailing list for ConTeXt users
From: Aditya Mahajan <adityam@umich.edu>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: idea: Module to automatically extract and insert information from Wikipedia
Date: Sun, 13 Nov 2011 01:07:32 -0500 (EST)
Message-ID: <alpine.LNX.2.00.1111130043170.31866@ybpnyubfg.ybpnyqbznva>
In-Reply-To: <1321111170.3557.31.camel@mattotaupa>

On Sat, 12 Nov 2011, Paul Menzel wrote:

> just now I thought of the following and I am wondering if there exists
> already a solution.

Not exactly for Wikipedia, but I have an experimental module that pulls 
information from the web. I use it to get images from sites like yuml.me and 
websequencediagrams.com.

https://github.com/adityam/context-webfilter

See test/ directory for examples.

> Writing a text which includes people I want to add information about
> these people as footnotes. The first sentence in a Wikipedia article is
> most of the time good enough for that.
>
> A macro `\infofromwikipedia{Donald Knuth}` would be nice which gets the
> first sentence of the article and puts an item into the bibliography.
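To make the proposed macro concrete, here is a minimal Python sketch of its two simplest steps: turning a name such as "Donald Knuth" into an article URL, and cutting the first sentence out of a paragraph of plain text. The function names `article_url` and `first_sentence` are hypothetical, and the sentence splitting is deliberately naive (an abbreviation like "Jr." will end the "sentence" too early).

```python
import re
from urllib.parse import quote


def article_url(title: str, lang: str = "en") -> str:
    """Hypothetical helper: build a Wikipedia article URL from a page title.

    Wikipedia article paths use underscores instead of spaces; other
    characters are percent-encoded.
    """
    slug = quote(title.strip().replace(" ", "_"))
    return f"https://{lang}.wikipedia.org/wiki/{slug}"


def first_sentence(text: str) -> str:
    """Hypothetical helper: naive first-sentence extraction.

    Returns everything up to the first '.', '!' or '?' that is followed by
    whitespace or the end of the text; returns the whole text if none is found.
    """
    match = re.match(r"(.+?[.!?])(?=\s|$)", text.strip(), flags=re.DOTALL)
    return match.group(1) if match else text.strip()
```

For example, `article_url("Donald Knuth")` gives `https://en.wikipedia.org/wiki/Donald_Knuth`.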

This actually requires a more detailed spec. What happens if there is 
more than one person with the same name? For example:

http://en.wikipedia.org/wiki/Wolfgang_Schuster
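One possible rule for the spec is to refuse to guess: detect a disambiguation page and ask the author for a more specific title. A minimal Python sketch, assuming (this is a heuristic, not an API guarantee) that English Wikipedia disambiguation pages open with a sentence like "<Title> may refer to:"; the function name is hypothetical.

```python
import re


def looks_like_disambiguation(first_paragraph: str) -> bool:
    """Hypothetical heuristic: True if the opening text reads like an
    English Wikipedia disambiguation page ("X may refer to:").

    This relies on a wording convention, so it can miss pages or
    misfire; a real module would want a more reliable signal.
    """
    pattern = r"\bmay (?:also )?refer to\b"
    return re.search(pattern, first_paragraph, flags=re.IGNORECASE) is not None
```

A macro like `\infofromwikipedia` could then stop with an error instead of silently picking one of the namesakes.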

> There is even an API to access articles [2]. Besides coding that up I
> see the following problems.
>
> 1. The output [3] needs to be converted to ConTeXt.

I don't see anything in the API specs that returns the contents of the 
page. My guess is that simply downloading the HTML page and scraping the 
main paragraph might be easier. Once the data is retrieved, using ConTeXt 
to typeset HTML is fairly easy.
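The scraping step could look roughly like the following Python sketch, which uses only the standard library `html.parser` to pull the text of the first non-empty `<p>` element out of a downloaded page. It assumes (not verified against current Wikipedia markup) that the lead paragraph is an ordinary `<p>` and that citation markers such as "[1]" sit inside `<sup>`; the class and function names are hypothetical. Fetching the page itself (for instance with `urllib.request.urlopen`) is left out.

```python
from html.parser import HTMLParser
from typing import List, Optional


class FirstParagraphParser(HTMLParser):
    """Hypothetical parser: collects the text of the first non-empty <p>."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so entities arrive decoded
        self.in_p = False   # True while inside a <p> element
        self.sup_depth = 0  # > 0 inside <sup>, assumed to hold citation markers
        self.chunks: List[str] = []
        self.result: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.result is not None:
            return  # first paragraph already found; ignore the rest
        if tag == "p":
            self.in_p = True
            self.chunks = []
        elif tag == "sup" and self.in_p:
            self.sup_depth += 1

    def handle_endtag(self, tag):
        if self.result is not None:
            return
        if tag == "sup" and self.sup_depth:
            self.sup_depth -= 1
        elif tag == "p" and self.in_p:
            self.in_p = False
            text = " ".join("".join(self.chunks).split())  # normalize whitespace
            if text:  # skip empty placeholder paragraphs
                self.result = text

    def handle_data(self, data):
        if self.result is None and self.in_p and not self.sup_depth:
            self.chunks.append(data)


def first_paragraph(html: str) -> Optional[str]:
    """Return the text of the first non-empty <p> element, or None."""
    parser = FirstParagraphParser()
    parser.feed(html)
    parser.close()
    return parser.result
```

Emitting plain text rather than HTML keeps the hand-off to ConTeXt simple; keeping the inline markup (bold, links) would instead mean passing the HTML fragment through to ConTeXt's HTML/XML handling.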

Another option is to just use one of the existing scripts to scrape the 
first paragraph/first line from Wikipedia, e.g.,

http://stackoverflow.com/questions/1565347/get-first-lines-of-wikipedia-article
http://query7.com/scrape-the-first-paragraph-image-from-a-wikipedia-entry

and use the filter module to call them.

Aditya
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________



Thread overview: 5+ messages
2011-11-12 15:19 Paul Menzel
2011-11-12 16:31 ` Philipp Gesang
2011-11-12 16:40   ` Khaled Hosny
2011-11-12 17:11     ` Hans Hagen
2011-11-13  6:07 ` Aditya Mahajan [this message]
