caml-list - the Caml user's mailing list
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: "Alexander V. Voinov" <avv@quasar.ipa.nw.ru>
Cc: "caml-list@inria.fr" <caml-list@inria.fr>
Subject: Re: [Caml-list] ocaml-3.05: a performance experience
Date: Sat, 3 Aug 2002 14:33:11 +0200
Message-ID: <20020803123311.GA631@ice.gerd-stolpmann.de>
In-Reply-To: <3D49FD72.68388864@quasar.ipa.nw.ru>; from avv@quasar.ipa.nw.ru on Fri, Aug 02, 2002 at 05:33:06 +0200


On 2002.08.02 05:33 Alexander V. Voinov wrote:
> Hi All,
> 
> I have an application, which parses a huge XML file and stores resulting
> records to a database.
> 
> The file is parsed using PXP, but in a 'pulldom' manner, by extracting
> (to a Buffer) first level tags manually with pcre, then an array insert
> of 30000 recognized and accumulated records is performed. DB access
> takes a small fraction of the run time.
> 
> Compiled with ocaml-3.04 it took 1h40m+-5m of 'user' process time and
> occupied about 340M in RAM. With 3.05 it took 2h40m+-5m and occupied
> 250M. 
> 
> Is this the consequence of the new GC strategy? Actually I'd tolerate
> large footprint for the sake of more speed.
> 
> It's also interesting to note that in the case of 3.04 the footprint of
> the application starts from 330M and slowly expands to 350M. With 3.05
> it starts with 250M and then almost does not expand till the end.
> 
> Sparc Solaris 2.7, gcc 3.0.4.
> 
> A previous version of this app, written in Python with PyXML, runs 3-4
> times slower than the 3.04 version and takes 20M in RAM.

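I think you are observing GC compaction. You can turn it off by setting
OCAMLRUNPARAM="O=1000000" (or by the equivalent Gc.set call). The O
parameter is max_overhead; at 1000000 or above, compaction is never
triggered.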
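The Gc.set variant could look roughly like the sketch below. It assumes a
runtime with automatic heap compaction (as in the 3.x series discussed
here); the function name disable_compaction is only illustrative.

```ocaml
(* Minimal sketch: ask the runtime never to trigger automatic heap
   compaction, which is what OCAMLRUNPARAM="O=1000000" requests.
   [disable_compaction] is an illustrative name, not a library function. *)
let disable_compaction () =
  Gc.set { (Gc.get ()) with Gc.max_overhead = 1_000_000 }

let () =
  disable_compaction ();
  (* ... run the parsing and database inserts here ... *)
  (* Report how many compactions the runtime has performed so far. *)
  Printf.printf "heap compactions so far: %d\n"
    (Gc.quick_stat ()).Gc.compactions
```

Calling it once at program start, before the large allocations begin,
should be enough; the environment variable has the advantage that no
recompilation is needed.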

If XML validation is not needed, you could also rewrite your program
to use the new event-based parsing in PXP-1.1.90. That would avoid
representing the XML tree in memory at all (and increase the speed,
because garbage-collecting a large heap is expensive).
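To illustrate the idea only: the sketch below uses a hypothetical event
type and a hypothetical event source, NOT PXP's actual API. It shows how
first-level elements can be turned into records one at a time, so that
only the current record is ever held in memory.

```ocaml
(* Hypothetical event type for illustration; PXP's real event type and
   parser entry points differ. *)
type event =
  | Start_tag of string * (string * string) list  (* name, attributes *)
  | Data of string
  | End_tag of string

(* Pull events from [next] and call [emit] once per first-level element
   (a direct child of the root) with its name and collected text.
   No tree is built; one buffer is reused for every record. *)
let process_events (next : unit -> event option)
    (emit : string * string -> unit) =
  let buf = Buffer.create 256 in
  let rec loop depth current =
    match next () with
    | None -> ()
    | Some (Start_tag (name, _)) ->
        if depth = 1 then begin
          (* A new first-level element starts: begin a fresh record. *)
          Buffer.clear buf;
          loop 2 (Some name)
        end else loop (depth + 1) current
    | Some (Data s) ->
        if depth >= 2 then Buffer.add_string buf s;
        loop depth current
    | Some (End_tag _) ->
        if depth = 2 then begin
          (* The first-level element is complete: hand it over. *)
          (match current with
           | Some name -> emit (name, Buffer.contents buf)
           | None -> ());
          loop 1 None
        end else loop (depth - 1) current
  in
  loop 0 None

(* Stand-in event source backed by a list, for trying the sketch out. *)
let events_of_list l =
  let rest = ref l in
  fun () ->
    match !rest with
    | [] -> None
    | e :: tl -> rest := tl; Some e
```

In the real program, emit would append the record to the batch that is
later inserted into the database, and next would be supplied by the
parser.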

Gerd
-- 
----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 45             
64293 Darmstadt     EMail:   gerd@gerd-stolpmann.de
Germany                     
----------------------------------------------------------------------------

