caml-list - the Caml user's mailing list
From: Jon Kleiser <jon.kleiser@ceres.no>
To: "caml-list@inria.fr" <caml-list@inria.fr>
Subject: Re: [Caml-list] Create Array of floats from string
Date: Fri, 28 Apr 2017 12:19:57 +0000
Message-ID: <B2580A9B-F8A8-4DA7-9FA7-7913E1581FAF@mail.uio.no>
In-Reply-To: <E7AA81E4-D690-4F8F-8C22-87F5CF9575D8@mail.uio.no>

In case anybody wants to take a look, I have put my two program versions, the fast one and the slow one, here:

<http://folk.uio.no/jkleiser/ocaml/read_vec.ml>
<http://folk.uio.no/jkleiser/ocaml/scan_vec.ml>

The fast one, which uses ‘String.split_on_char’ and ‘List.iteri’, reads a 1.35 GB file in about 18 seconds on my Mac, while the slower one, which uses ‘Scanf.bscanf’, reads the same file in about 43 seconds.
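To make the comparison concrete, here is a minimal sketch of the two parsing styles applied to one line of the input (a word followed by space-separated floats). It is not the code in read_vec.ml or scan_vec.ml: the names parse_line_split and parse_line_scanf are hypothetical, and the Scanf variant assumes the vector dimension is already known.

```ocaml
(* Split-based parsing of one line: "word f1 f2 ... fn".
   [parse_line_split] is a hypothetical name, not taken from read_vec.ml. *)
let parse_line_split (line : string) : string * float array =
  match String.split_on_char ' ' line with
  | [] ->
    (* Unreachable in practice: split_on_char never returns the empty list. *)
    invalid_arg "parse_line_split"
  | word :: rest ->
    (* Drop empty fields, e.g. the one produced by a trailing space. *)
    let fields = List.filter (fun s -> s <> "") rest in
    let vec = Array.make (List.length fields) 0.0 in
    List.iteri (fun i s -> vec.(i) <- float_of_string s) fields;
    (word, vec)

(* Scanf-based parsing of the next line from a buffered input channel,
   assuming the vector dimension [dim] is already known.
   [parse_line_scanf] is likewise a hypothetical name. *)
let parse_line_scanf (ib : Scanf.Scanning.in_channel) (dim : int) :
    string * float array =
  (* " %s" skips leading whitespace (including the previous newline)
     and reads the word up to the next whitespace character. *)
  let word = Scanf.bscanf ib " %s" (fun w -> w) in
  (* Array.init applies its function to indices 0 .. dim-1 in order,
     so the floats are read left to right. *)
  let vec = Array.init dim (fun _ -> Scanf.bscanf ib " %f" (fun x -> x)) in
  (word, vec)
```

The split-based version builds an intermediate list of strings per line before converting them; the Scanf version pulls each field straight from the buffered channel through the format interpreter.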

If I have done something stupid that makes the slower one so slow, then I’d be glad to hear how to fix it, just to learn a bit more OCaml.

The file I use as input is wiki.no.vec, which you can find here:
<https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.no.vec>

If you would like to play with other files in the same format, you can find them here:
<https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md>
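Files in this format are plain text: the first line holds the number of words and the vector dimension, and every following line holds a word and its vector components separated by spaces. Reading that header takes one Scanf.sscanf call; read_header and header_of_channel below are hypothetical helpers, shown only as a sketch.

```ocaml
(* Parse the header line of a .vec file, assumed to be "<word count> <dimension>".
   [read_header] is a hypothetical helper. *)
let read_header (line : string) : int * int =
  Scanf.sscanf line " %d %d" (fun n_words dim -> (n_words, dim))

(* Read the header from an input channel, e.g. one obtained with
   [open_in "wiki.no.vec"]. [header_of_channel] is also hypothetical. *)
let header_of_channel (ic : in_channel) : int * int =
  read_header (input_line ic)
```

The dimension returned here is what a Scanf-based reader needs in order to know how many floats to read per line.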

/Jon

Thread overview: 10+ messages
2017-04-26 10:48 Jon Kleiser
2017-04-26 11:02 ` rixed
2017-04-26 13:36   ` Francois BERENGER
     [not found] ` <CAPFanBGh0q2AaF7ROWJJF81o=8+79sn-q4-CxqCKGQ__Oa5SEw@mail.gmail.com>
2017-04-26 14:05   ` Jon Kleiser
2017-04-26 15:26     ` Gabriel Scherer
2017-04-27 14:00       ` [Caml-list] Create Array of floats from string, surprise Jon Kleiser
2017-04-26 15:27 ` [Caml-list] Create Array of floats from string Alain Frisch
2017-04-27  8:36   ` Jon Kleiser
2017-04-27  9:15   ` Jon Kleiser
2017-04-28 12:19     ` Jon Kleiser [this message]
