caml-list - the Caml user's mailing list
* [Caml-list] [ANNOUNCE] ODisco, for large-scale data processing in OCaml
@ 2011-05-11 18:16 Prashanth Mundkur
From: Prashanth Mundkur @ 2011-05-11 18:16 UTC
  To: caml-list

Hello,

The Disco team is pleased to announce that large-scale data analysis
(à la map-reduce) is now possible in OCaml.

Disco [1] is an open-source distributed computing framework inspired
by the map-reduce paradigm.  It includes a distributed replicating
tag-based filesystem that allows you to store your datasets in a
fault-tolerant manner.  Disco comes with additional tools: DiscoDB [2]
for implementing efficient mapping objects and Discodex [3] for
distributed indices for querying large datasets.

Disco has been in production use at Nokia for two years, and is used
to process terabytes of data daily [4].

The core job scheduling, cluster monitoring and filesystem logic of
Disco is written in Erlang, leveraging the strengths of Erlang in
concurrency and distribution.  The primary language for writing
compute jobs is currently Python; however, the latest Disco 0.4
release [5] has opened up the Disco worker interface, allowing jobs
to be written in any language.

ODisco is the first available non-Python implementation of this Disco
worker interface, and allows distributed processing of large-scale
datasets in OCaml.  The computation is not restricted to a
record-oriented key-value style interface; the OCaml task directly
gets access to the input data source and writes the output data in
whatever format it chooses.  The overall computation, however,
currently still follows the traditional map-reduce dataflow, with
map/shuffle/reduce stages.
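
For readers unfamiliar with the map/shuffle/reduce dataflow, here is a
minimal single-process sketch of the three stages as a word count.  It
is purely illustrative and does not use the ODisco API: the names
map_line, shuffle, reduce and word_count are hypothetical, and it
assumes a recent OCaml (4.04 or later) for String.split_on_char and
the |> operator.

```ocaml
(* Illustrative only: a single-process word count arranged as
   map -> shuffle -> reduce.  Nothing here is part of the ODisco API;
   every name below is hypothetical. *)

(* Map stage: turn one input line into (word, 1) pairs. *)
let map_line (line : string) : (string * int) list =
  String.split_on_char ' ' line
  |> List.filter (fun w -> w <> "")
  |> List.map (fun w -> (w, 1))

(* Shuffle stage: group the emitted values by key. *)
let shuffle (pairs : (string * int) list) : (string, int list) Hashtbl.t =
  let groups = Hashtbl.create 16 in
  List.iter
    (fun (k, v) ->
      let existing = try Hashtbl.find groups k with Not_found -> [] in
      Hashtbl.replace groups k (v :: existing))
    pairs;
  groups

(* Reduce stage: collapse each group to one value (here, a sum),
   returning the results sorted by key. *)
let reduce (groups : (string, int list) Hashtbl.t) : (string * int) list =
  Hashtbl.fold
    (fun k vs acc -> (k, List.fold_left ( + ) 0 vs) :: acc)
    groups []
  |> List.sort compare

(* The whole pipeline over a list of input lines. *)
let word_count (lines : string list) : (string * int) list =
  List.concat (List.map map_line lines) |> shuffle |> reduce
```

In a distributed setting the map and reduce stages would run as
separate tasks on different nodes, with the shuffle moving grouped data
between them; the sketch only shows the shape of the dataflow.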

ODisco is available at https://github.com/pmundkur/odisco and also in
the 3.12 section of Godi as the godi-odisco package.

Please let us know if you have any issues with either ODisco or Disco
on the Disco mailing list.

Happy hacking!

[1] Disco Project, http://discoproject.org
[2] DiscoDB, http://discoproject.org/doc/contrib/discodb/discodb.html
[3] Discodex, http://discoproject.org/doc/contrib/discodex/discodex.html#discodex
[4] Disco at Nokia, http://www.erlang-factory.com/conference/SFBay2011/speakers/VilleTuulos
[5] Disco 0.4 release, http://disco.posterous.com/disco-04

--prashanth
