Re: [Caml-list] speeding up matrix multiplication (newbie question)

From: Martin Jambon <martin.jambon@ens-lyon.org>
To: Erick Matsen <matsen@berkeley.edu>
Cc: Xavier Leroy <Xavier.Leroy@inria.fr>, caml-list@inria.fr
Subject: Re: [Caml-list] speeding up matrix multiplication (newbie question)
Date: Fri, 20 Feb 2009 22:37:53 +0100
Message-ID: <499F22B1.6040600@ens-lyon.org>
In-Reply-To: <243054520902201153q4efbe1f0j686bd7212b515c03@mail.gmail.com>

Erick Matsen wrote:
> Wow, once again I am amazed by the vitality of this list. Thank you
> for your suggestions.
> 
> Here is the context: we are interested in calculating the likelihood
> of taxonomic placement of short "metagenomics" sequence fragments from
> unknown organisms in the ocean. We start by assuming a model of
> sequence evolution, which is a reversible Markov chain. The taxonomy
> is represented as a tree, and the sequence information is a collection
> of likelihoods of sequence identities. As we move up the tree, these
> sequences "evolve" by getting multiplied by the exponentiated
> instantaneous Markov matrix.
> 
> The matrices are of the size of the sequence model: 4x4 when looking
> at nucleotides, and 20x20 when looking at proteins.
> 
> The bottleneck is (I mis-spoke before) that we are multiplying many
> length-4 or length-20 vectors by a collection of matrices which
> represent the time evolution of those sequences as follows.
> 
> Outer loop:
>   modify the amount of time each Markov process runs
>   exponentiate the rate matrices to get transition matrices
> 
>   Recur over the tree, starting at the leaves:
>     at a node, multiply all of the daughter likelihood vectors together
>     return the multiplication of that product by the transition matrix
>       (bottleneck!)
> 
> The trees are on the order of 50 leaves, and there are about 500
> likelihood vectors done at once.
> 
> All of this gets run on a big cluster of Xeons. It's not worth
> parallelizing because we are running many instances of this process
> already, which fills up the cluster nodes.
> 
> So far I have been doing the simplest thing possible, which is just to
> multiply the matrices out like \sum_j a_ij v_j. Um, this is a bit
> embarrassing.
> 
> let mul_vec m v =
>     if Array.length v <> n_cols m then
>       failwith "mul_vec: matrix size and vector size don't match!";
>     let result = Array.make (n_rows m) N.zero in
>     for i=0 to (n_rows m)-1 do
>       for j=0 to (n_cols m)-1 do
>         (* result_i <- result_i + a_ij * v_j *)
>         result.(i) <- N.add result.(i) (N.mul (get m i j) v.(j))
>       done;
>     done;
>     result
> 
> I have implemented it in a functorial way for flexibility. N is the
> number class. How much improvement might I hope for if I make a
> dedicated float vector multiplication function? I'm sorry, I know
> nothing about "boxing." Where can I read about that?

How much time it is worth spending on optimizing this depends on how much you
expect to save. Here are some simple things to consider:

On the OCaml side, first try getting rid of the functor (or use a
defunctorizer; ocamldefun?).
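
As a rough sketch of what dropping the functor could look like: the function
below is a hypothetical float-only counterpart of mul_vec. The name
mul_vec_float and the float array array representation are assumptions, not
taken from the original code. With concrete float operations the native
compiler can usually emit +. and *. inline, whereas calls through an abstract
N.add / N.mul typically allocate a boxed float for each result.

```ocaml
(* Hypothetical monomorphic version of mul_vec. Assumption: the matrix is a
   plain float array array with one inner array per row. *)
let mul_vec_float (m : float array array) (v : float array) : float array =
  let n_rows = Array.length m in
  let n_cols = Array.length v in
  let result = Array.make n_rows 0.0 in
  for i = 0 to n_rows - 1 do
    if Array.length m.(i) <> n_cols then
      invalid_arg "mul_vec_float: matrix size and vector size don't match";
    (* Accumulate row i in a local reference, then store it once. *)
    let acc = ref 0.0 in
    for j = 0 to n_cols - 1 do
      acc := !acc +. m.(i).(j) *. v.(j)
    done;
    result.(i) <- !acc
  done;
  result
```

Whether this is faster, and by how much, has to be measured on the real
workload.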

Limit memory accesses by doing something like the following, where a is the
m-by-n matrix:

for i = 0 to m - 1 do
  let a_i = a.(i) in (* fetch row i once, outside the inner loop *)
  for j = 0 to n - 1 do
    let a_ij = a_i.(j) in (* instead of a.(i).(j) *)
    ...
  done
done

You can also use Array.unsafe_get, which skips the bounds check, where it
really matters.
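
Putting the last two suggestions together, a minimal sketch might look like
this. It assumes the matrix is a float array array; the name mul_vec_unsafe is
made up for illustration. All sizes are checked once before the loops, so the
unchecked accesses inside them stay in bounds.

```ocaml
(* Sketch: row hoisting plus unchecked array access.
   Assumption: m is a float array array, one inner array per row. *)
let mul_vec_unsafe (m : float array array) (v : float array) : float array =
  let n_rows = Array.length m in
  let n_cols = Array.length v in
  (* Validate every row length up front; the loops below do no checks. *)
  Array.iter
    (fun row ->
       if Array.length row <> n_cols then
         invalid_arg "mul_vec_unsafe: matrix size and vector size don't match")
    m;
  let result = Array.make n_rows 0.0 in
  for i = 0 to n_rows - 1 do
    let a_i = Array.unsafe_get m i in (* fetch the row once per i *)
    let acc = ref 0.0 in
    for j = 0 to n_cols - 1 do
      acc := !acc +. Array.unsafe_get a_i j *. Array.unsafe_get v j
    done;
    Array.unsafe_set result i !acc
  done;
  result
```

An out-of-range index passed to Array.unsafe_get is undefined behaviour rather
than an exception, which is why the size check comes first.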


You can also use bigarrays and implement the loop in C. It could be fun. I'm
not sure how much it saves.
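
For reference, here is a sketch of the bigarray side only, with the loop still
written in OCaml; the C version is not shown. It assumes OCaml 4.07 or later
(where Bigarray is part of the standard library), float64 elements and C
(row-major) layout, which is the layout a C function would see. The names mat,
vec and mul_vec_ba are hypothetical.

```ocaml
open Bigarray

(* Assumed storage: float64 elements, C (row-major) layout. *)
type mat = (float, float64_elt, c_layout) Array2.t
type vec = (float, float64_elt, c_layout) Array1.t

let mul_vec_ba (m : mat) (v : vec) : vec =
  let n_rows = Array2.dim1 m and n_cols = Array2.dim2 m in
  if Array1.dim v <> n_cols then
    invalid_arg "mul_vec_ba: matrix size and vector size don't match";
  let result = Array1.create float64 c_layout n_rows in
  for i = 0 to n_rows - 1 do
    let acc = ref 0.0 in
    for j = 0 to n_cols - 1 do
      acc := !acc +. m.{i, j} *. v.{j}
    done;
    result.{i} <- !acc
  done;
  result
```

A matrix can be built for a quick check with
Array2.of_array float64 c_layout [| [| 1.0; 2.0 |]; [| 3.0; 4.0 |] |].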


Martin
