From: Oliver Bandel
To: caml-list@inria.fr
Date: Sat, 9 Feb 2008 11:18:11 +0100
Subject: Re: [Caml-list] Now it's faster (addendum to "Performance-question")

Hello Vincent,

Quoting Vincent Hanquez:
> On Wed, Feb 06, 2008 at 12:55:04PM +0100, Oliver Bandel wrote:
> > Hello,
> >
> > I should have changed the Subject to: "Shocking Performance!!!"
> > but then possibly the spam-filter would become active ;-)
> >
> > The performance dramatically increased now!
> >
> > I first had about 3min34 on my dataset.
> > After throwing out some of the "^"-using
> > functions, the time was about 1min55.
> >
> > Now, after I threw out the rest of that "^"-stuff
> > (which btw. made more of the concatenations than
> > the first thrown out functions, but was not called
> > as often as the other functions) I'm under 20 seconds!
> > (17..18 seconds!)
> >
> > That's amazing! :-)
>
> well i'm pretty sure you could go down even further with your own
> implementation of a buffer library.
[...]

Possibly, but I have no reason to start such an implementation
if the current possibilities are fast enough.
IMHO optimization comes at the end.
When things are working well and fast enough, optimization is wasted time.
If the software needs optimization, it can be done then.
This is from a practical perspective. The academic perspective
might be different.

And when I have some time to do it, I may change the data structures
again, to be faster and cleaner. But that is not really necessary
for the program that was the reason to ask here. It would be nice to
do it better, but it can also be used as it is now.

>
> the buffer library is actually pretty bad since it's actually just a
> simple string.

IMHO it's different, but I haven't looked at the code.

> each time the buffer need to grow, the string is
> reallocated and the previous one is copied to the new string.

Are you talking about the Buffer module or the "^" operator?

> and you got the 16mb limit (max_string_length) on 32bit.

For me that limit would be OK.
The strings I use are not that big, but bigger than expected.
And there are a lot of strings that I concatenated. I think that is
why there was so much allocation/deallocation work to do.
With the Buffer module it was much faster.

And even the current implementation could be made more efficient,
because I call Buffer.create locally. I could create the buffer once
at module level and use Buffer.clear or Buffer.reset between uses.
But when performance is not an issue, I choose the cleaner way,
which means: no module-global state, if possible.
In a library the decision would be different.

>
> if you implement a growing array of fixed sized string (4K for
> example),
> you just don't need to copy data each time your buffer need to grow.
> I suspect it might be even faster than the normal buffer in your case
> (lots of data appending), but depends on what you do with your buffer
> afterwards.

I only use that string to write it to a dbm database.
I need a certain layout of the strings, because more than one
data item must be stored for each key. It's not a complicated format,
but the strings must be concatenated.
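
A minimal sketch of the two approaches, assuming a hypothetical record
layout (data items joined by a '|' separator); the real layout and the
function names below are illustrative only, not the code of the program
discussed here:

```ocaml
(* Hypothetical layout: the data items for one key, joined by '|'. *)

(* Naive version: every "^" allocates a fresh string and copies both
   operands, so the accumulated prefix is copied again for each item. *)
let join_concat = function
  | [] -> ""
  | first :: rest ->
      List.fold_left (fun acc item -> acc ^ "|" ^ item) first rest

(* Shared helper: append all items to an existing buffer. *)
let fill buf items =
  List.iteri
    (fun i item ->
      if i > 0 then Buffer.add_char buf '|';
      Buffer.add_string buf item)
    items

(* Buffer version with a locally created buffer (the "clean" variant). *)
let join_buffer items =
  let buf = Buffer.create 256 in
  fill buf items;
  Buffer.contents buf

(* Variant with one module-level buffer that is emptied before each use.
   Buffer.clear keeps the internal storage, so later calls can reuse it. *)
let shared = Buffer.create 256

let join_shared items =
  Buffer.clear shared;
  fill shared items;
  Buffer.contents shared

let () =
  let items = [ "mean=1.5"; "var=0.25"; "n=100" ] in
  print_endline (join_concat items);
  print_endline (join_buffer items);
  print_endline (join_shared items)
```

All three functions build the same string; they differ only in how much
allocation and copying happens on the way. The module-level variant saves
the repeated Buffer.create, at the price of shared mutable state.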
I did this with "^" first, because I didn't expect the string
handling to need that much time. I thought my mathematical operations
(statistical things) would need the most time, but my expectation was
wrong. The calculations were done very fast.

So using the Buffer module instead of "^" for the concatenations
brought a good performance boost.

Ciao,
   Oliver

P.S.:
===============================================
# Sys.max_string_length;;
- : int = 144115188075855863
#
===============================================
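
The value above is what a 64-bit system reports; the limit depends on
the word size of the OCaml installation, and the 16 MB figure from the
quote applies to 32-bit systems. A small sketch to check this on a given
machine (fits_in_string is a hypothetical helper, not part of the
standard library):

```ocaml
(* Report the word size and the string length limit of this OCaml build. *)
let describe_limit () =
  Printf.sprintf "%d-bit words, max string length %d bytes"
    Sys.word_size Sys.max_string_length

(* Hypothetical helper: would a string of this many bytes be allowed? *)
let fits_in_string total_bytes =
  total_bytes >= 0 && total_bytes <= Sys.max_string_length

let () = print_endline (describe_limit ())
```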