ntg-context - mailing list for ConTeXt users
From: Philipp Gesang <gesang@stud.uni-heidelberg.de>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: idea: Module to automatically extract and insert information from Wikipedia
Date: Sat, 12 Nov 2011 17:31:23 +0100
Message-ID: <20111112163123.GA1225@orcus.urz.uni-heidelberg.de>
In-Reply-To: <1321111170.3557.31.camel@mattotaupa>



Hi Paul,

On 2011-11-12 16:19, Paul Menzel wrote:
> A macro `\infofromwikipedia{Donald Knuth}` would be nice which gets the
> first sentence of the article and puts an item into the bibliography.
> 
> There is even an API to access articles [2]. Besides coding that up I
> see the following problems.
> 
> 1. The output [3] needs to be converted to ConTeXt.
> 2. An Internet connection would be necessary. But that is just a note
> and not a problem.

You could take this as a starting point:
  <https://bitbucket.org/phg/context-acceptor/>
and implement a function that ignores everything but the first
text paragraph. Automatic download should work for the English WP.
(I’m sorry, I have no time to do this myself at the moment.)
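
To make the "first text paragraph" idea concrete, here is a rough
sketch in Python. It is not code from context-acceptor, and the names
`strip_templates` and `first_paragraph` are made up for illustration.
It assumes that paragraphs are separated by blank lines and that
anything starting with typical wikitext markup (headings, lists,
tables, file links) is not running text:

```python
def strip_templates(wikitext: str) -> str:
    """Drop every {{...}} span, including nested ones, by tracking brace depth."""
    out = []
    depth = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            # Keep characters only while we are outside of any template.
            if depth == 0:
                out.append(wikitext[i])
            i += 1
    return "".join(out)


def first_paragraph(wikitext: str) -> str:
    """Return the first block that looks like running text, or "" if none."""
    # Assumption: these prefixes mark headings, lists, tables and media links.
    skip_prefixes = ("=", "*", "#", ":", ";", "{|", "|", "!",
                     "[[File:", "[[Image:", "__")
    for block in strip_templates(wikitext).split("\n\n"):
        text = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if text and not text.startswith(skip_prefixes):
            return text
    return ""
```

The returned paragraph still contains inline markup such as '''bold'''
and [[links]], which would have to be converted separately.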

Btw., since “sentence” is not a markup category of wikitext, there is
no sentence recognition built in ... ymmv.
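
To illustrate why sentence recognition is a "ymmv" affair, here is a
naive heuristic sketched in Python. The function name `first_sentence`
and the rule itself (a sentence ends at ".", "!" or "?" followed by
whitespace and an uppercase ASCII letter) are assumptions for the sake
of the example, not something wikitext or any existing parser provides:

```python
import re

# Assumption: a sentence ends at . ! or ? followed by whitespace
# and an uppercase ASCII letter.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def first_sentence(paragraph: str) -> str:
    """Return the text up to the first guessed sentence boundary."""
    return _SENTENCE_END.split(paragraph.strip(), maxsplit=1)[0]
```

For "Donald Knuth is a computer scientist. He wrote TAOCP." this
yields the first sentence, but for "Donald E. Knuth is a computer
scientist." it stops after "Donald E.", because the middle initial
looks like a sentence end.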

(Beware that processing wikitext from WP is extremely
complicated because WP uses special plugins (“templates” and
the like). So the only way to make sure that a parser accepts any
well-formed WP page would be to include all those plugins, which
would entail rewriting the PHP code in Lua for use as a ConTeXt
script. And then you’d have to decide for every plugin what its
output should look like in ConTeXt.[0] If you have the time ...)
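
The per-plugin decision could be organised as a lookup table from
template name to a small rendering function. The sketch below is in
Python only to show the shape of the problem; a real ConTeXt script
would be written in Lua. The handler names, the two chosen templates
and the ConTeXt output picked for them are all assumptions, and nested
templates or named parameters are not handled:

```python
from typing import Callable, Dict, List

Handler = Callable[[List[str]], str]


def render_nowrap(args: List[str]) -> str:
    # Assumed mapping: keep the text on one line by boxing it.
    return r"\hbox{" + (args[0] if args else "") + "}"


def render_lang(args: List[str]) -> str:
    # Assumed mapping: {{lang|de|Buch}} switches the language inside a group.
    if len(args) < 2:
        return ""
    return r"{\language[" + args[0] + "] " + args[1] + "}"


# Hypothetical table; every supported template needs its own entry.
HANDLERS: Dict[str, Handler] = {
    "nowrap": render_nowrap,
    "lang": render_lang,
}


def render_template(call: str) -> str:
    """Render one non-nested template call such as '{{lang|de|Buch}}'."""
    inner = call.strip()[2:-2]
    name, *args = [part.strip() for part in inner.split("|")]
    handler = HANDLERS.get(name.lower())
    # Unknown templates are silently dropped in this sketch.
    return handler(args) if handler else ""
```

With this table, `render_template("{{lang|de|Buch}}")` gives
`{\language[de] Buch}`, while an infobox call produces nothing because
no handler exists for it.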

Good luck
Philipp

[0] Get an impression of how much work this can be at
    http://en.wikipedia.org/wiki/Wikipedia:List_of_templates
    The more important ones are at
    http://en.wikipedia.org/wiki/Category:Infobox_templates

> Thanks,
> 
> Paul
> 
> 
> [1] https://en.wikipedia.org/wiki/Donald_Knuth
> [2] http://www.mediawiki.org/wiki/API
> [3] http://www.mediawiki.org/wiki/API:Data_formats#Output



> ___________________________________________________________________________________
> If your question is of interest to others as well, please add an entry to the Wiki!
> 
> maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
> archive  : http://foundry.supelec.fr/projects/contextrev/
> wiki     : http://contextgarden.net
> ___________________________________________________________________________________



Thread overview (5+ messages):
2011-11-12 15:19 Paul Menzel
2011-11-12 16:31 ` Philipp Gesang [this message]
2011-11-12 16:40   ` Khaled Hosny
2011-11-12 17:11     ` Hans Hagen
2011-11-13  6:07 ` Aditya Mahajan
