From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.4 required=5.0 tests=AWL,NO_REAL_NAME,SPF_FAIL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id A5099BBAF for ; Tue, 2 Jun 2009 05:08:10 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjEDAAI0JErAEKcge2dsb2JhbACOJQGIeXMBARYiBbQPhAwF X-IronPort-AV: E=Sophos;i="4.41,288,1241388000"; d="scan'208";a="28664500" Received: from selenite.metnet.navy.mil ([192.16.167.32]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 02 Jun 2009 05:08:09 +0200 Received: (qmail 28467 invoked by uid 107); 2 Jun 2009 03:08:05 -0000 Received: from unknown (HELO Adric.metnet.navy.mil) (10.100.105.184) by selenite.metnet.navy.mil with SMTP; 2 Jun 2009 03:08:05 -0000 Received: by Adric.metnet.navy.mil (Postfix, from userid 760) id CA73017832; Mon, 1 Jun 2009 20:06:57 -0700 (PDT) From: oleg@okmij.org To: caml-list@inria.fr Subject: ANN: Probabilistic programming in OCaml Message-Id: <20090602030657.CA73017832@Adric.metnet.navy.mil> Date: Mon, 1 Jun 2009 20:06:57 -0700 (PDT) X-Spam: no; 0.00; oleg:01 ocaml:01 ocaml:01 inference:01 model:01 model:01 bool:01 bool:01 booleans:01 memoization:01 ocaml's:01 ocaml's:01 inference:01 ifip:01 0.3:98 Chung-chieh Shan and I would like to announce the OCaml library HANSEI, to express probabilistic models and perform probabilistic inference. OCaml thus becomes a probabilistic programming language. The canonical example of Bayesian net, the grass model, looks as follows open ProbM;; let grass_model () = (* unit -> bool *) let rain = flip 0.3 and sprinkler = flip 0.5 in let grass_is_wet = (flip 0.9 && rain) || (flip 0.8 && sprinkler) || flip 0.1 in if not grass_is_wet then fail (); rain;; The model first defines the prior distributions of two events: of raining and of the sprinkler being on. We then specify the Bayesian: grass may be wet because it rained or because the sprinkler was on, or -- with the probability 10% -- for some other reason. We also consider that there is 10% chance rain did not wet the grass. We observe that the grass is wet. What are the chances it rained? To find out, we execute exact_reify grass_model;; - : bool pV = [(0.322, V false); (0.2838, V true)] which after normalization tells the posterior probability of raining, about 7/15. The probabilistic model is the regular OCaml function; the independent random variables rain and sprinkler and the dependent random variable grass_is_wet are regular OCaml boolean variables. We can pass the values of these random variables (which are just booleans) to regular OCaml functions such as 'not' and use the result in the regular if statement. HANSEI can handle models that are far more complex than the grass model, supporting variable (or bucket) elimination, on-demand evaluation of probabilistic expressions, memoization of stochastic functions, and importance sampling. Here is an example of on-demand evaluation: let lazy_pair () = let x = letlazy (fun () -> flip 0.5) in (x (), x ());; exact_reify lazy_pair;; - : (bool * bool) pV = [(0.5, V (true, true)); (0.5, V (false, false))] We do not observe the pair (true, false). Evaluating the expression x () several times gives the same result -- in the same possible world. That result may be different in another possible world. For that reason, we cannot use OCaml's own 'lazy' evaluation: OCaml's lazy is not thread-safe. A particular feature of HANSEI is that it permits calls to inference procedures (e.g., exact_reify) appear in models. After all, both are OCaml expressions. Distributions thus can be parameterized over distributions and inference procedures can reason about their own accuracy. The HANSEI code is available at http://okmij.org/ftp/kakuritu/ The web page also presents HANSEI code for sample probabilistic models and standard benchmarks (HMM, noisy-or, population estimation, belief networks). The current documentation includes two complementary papers http://okmij.org/ftp/kakuritu/dsl-paper.pdf to be presented at the IFIP working conference on domain-specific languages and http://okmij.org/ftp/kakuritu/embedpp.pdf to be presented at `Uncertainty in AI'. The papers are written for different audiences. The first paper explains the implementation of probabilistic primitives whereas the second describes the applications of the library.