caml-list - the Caml user's mailing list
From: Oliver Bandel <oliver@first.in-berlin.de>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Now it's faster (addendum to "Performance-question")
Date: Sat,  9 Feb 2008 11:18:11 +0100
Message-ID: <1202552291.47ad7de31aa31@webmail.in-berlin.de>
In-Reply-To: <20080206120403.GA5335@snarc.org>

Hello Vincent,

Quoting Vincent Hanquez <tab@snarc.org>:

> On Wed, Feb 06, 2008 at 12:55:04PM +0100, Oliver Bandel wrote:
> > Hello,
> >
> > I should have changed the Subject to: "Shocking Performance!!!"
> >
> > but then possibly the spam-filter would become active ;-)
> >
> >
> > The performance dramatically increased now!
> >
> > I first had about 3min34  on my dataset.
> > After throwing out some of the "^"-using
> > functions, the time was about 1min55.
> >
> > Now, after I threw out the rest of that "^"-stuff
> > (which btw. did more of the concatenations than
> > the functions thrown out first, but was not called
> > as often as the other functions) I'm under 20 seconds!
> > (17..18 seconds!)
> >
> > That's amazing! :-)
>
>
> well i'm pretty sure you could go down even further with your own
> implementation of a buffer library.
[...]

Possibly, but I have no reason to start such an implementation,
if the current possibilities are fast enough.
IMHO optimization comes at the end. When things are working
well and fast enough, optimization is wasted time.
If the software needs optimization, it can be done then.

This is from a practical perspective.
The academic perspective might be different.
And when I have some time, I may change the
data structures again to make them faster and cleaner.
But that is not really necessary for the program that
made me ask here. It would be nice to do it better,
but the program can also be used as it is now.


>
> the buffer library is actually pretty bad since it's actually just a
> simple string.

IMHO it works differently, but I haven't looked at the code.


> each time the buffer needs to grow, the string is
> reallocated and the previous one is copied to the new string.

Are you talking about the Buffer module or the "^" operator?

> and you got the 16mb limit (max_string_length) on 32bit.

For me that limit would be ok.
The strings I use are not that big, but bigger than expected.
And there are a lot of strings that I concatenated.
I think that is why there was so much allocation/deallocation
work to do.
With the Buffer module it was much faster.
And even the current implementation could be made more efficient,
because I call Buffer.create locally. I could create the buffer
once at module level and use Buffer.clear or Buffer.reset.
But when performance is not an issue, I choose the cleaner way,
which means: no module-global state, if possible.
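
To illustrate the trade-off, here is a minimal sketch. It is not the
code of the program discussed here; the function names and the initial
buffer size of 256 are made up for the example.

```ocaml
(* Join fields with "^": every step allocates a new string and copies
   everything accumulated so far. *)
let join_concat (fields : string list) : string =
  List.fold_left (fun acc s -> acc ^ s) "" fields

(* Same result with a locally created Buffer: appends are amortized,
   and only Buffer.contents makes one final copy. *)
let join_buffer (fields : string list) : string =
  let buf = Buffer.create 256 in
  List.iter (Buffer.add_string buf) fields;
  Buffer.contents buf

(* Variant that reuses one module-global buffer. Buffer.clear empties it
   but keeps the internal storage; Buffer.reset would also shrink the
   storage back to its initial size. This version is not reentrant. *)
let shared = Buffer.create 256

let join_shared (fields : string list) : string =
  Buffer.clear shared;
  List.iter (Buffer.add_string shared) fields;
  Buffer.contents shared
```

The shared variant saves one allocation per call, at the price of
hidden global state.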

In a library the decision would be different.




>
> if you implement a growing array of fixed-size strings (4K for
> example), you just don't need to copy data each time your buffer
> needs to grow. I suspect it might be even faster than the normal
> buffer in your case (lots of data appending), but it depends on what
> you do with your buffer afterwards.
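
As far as I understand the proposal, it could be sketched roughly like
this. This is only an illustration under assumptions: it uses the Bytes
module of a current OCaml (strings were still mutable in 2008), a list
instead of an array for the chunks, and all names are hypothetical.

```ocaml
(* Hypothetical chunked buffer: fixed-size chunks, newest first.
   Growing allocates a new chunk; data already stored is never copied. *)
let chunk_size = 4096

type chunked = {
  mutable full : Bytes.t list;  (* completely filled chunks, newest first *)
  mutable cur : Bytes.t;        (* chunk currently being filled *)
  mutable pos : int;            (* number of bytes used in [cur] *)
}

let create () = { full = []; cur = Bytes.create chunk_size; pos = 0 }

let add_string b s =
  let len = String.length s in
  let off = ref 0 in
  while !off < len do
    (* Current chunk is full: archive it and start a fresh one. *)
    if b.pos = chunk_size then begin
      b.full <- b.cur :: b.full;
      b.cur <- Bytes.create chunk_size;
      b.pos <- 0
    end;
    let n = min (len - !off) (chunk_size - b.pos) in
    Bytes.blit_string s !off b.cur b.pos n;
    b.pos <- b.pos + n;
    off := !off + n
  done

let length b = (List.length b.full * chunk_size) + b.pos

(* Flattening into a single string copies everything once, so the
   benefit depends on whether one contiguous string is needed. *)
let contents b =
  let out = Buffer.create (length b) in
  List.iter (Buffer.add_bytes out) (List.rev b.full);
  Buffer.add_subbytes out b.cur 0 b.pos;
  Buffer.contents out
```
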

I only use that string to write it to a dbm database.
I need a certain layout of the strings, because more
than one data item must be stored for each key.
It's not a complicated format, but the strings must be concatenated.
I did this with "^" first, because I didn't expect
the string handling to need that much time. I thought my
mathematical operations (statistical things) would need the most time,
but my expectation was wrong. The calculations were done very fast.
So using the Buffer module instead of "^" for the concatenations
brought a good performance boost.
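
A minimal sketch of the idea, packing several data items into one value
string with the Buffer module. The layout shown here (items as
"name=value", separated by ';') and the function name are purely
hypothetical; the real format of my program is a different one.

```ocaml
(* Hypothetical layout: "name=value" items joined by ';'.
   The resulting string is what would be stored under one key. *)
let encode_items (items : (string * float) list) : string =
  let buf = Buffer.create 64 in
  List.iteri
    (fun i (name, value) ->
      if i > 0 then Buffer.add_char buf ';';
      Buffer.add_string buf name;
      Buffer.add_char buf '=';
      Buffer.add_string buf (Printf.sprintf "%.3f" value))
    items;
  Buffer.contents buf
```
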

Ciao,
   Oliver

P.S.:

===============================================
# Sys.max_string_length;;
- : int = 144115188075855863
#
===============================================

