From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id OAA16450; Sat, 3 Aug 2002 14:34:17 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id OAA16242 for ; Sat, 3 Aug 2002 14:34:16 +0200 (MET DST) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.184]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id g73CXR504527 for ; Sat, 3 Aug 2002 14:33:29 +0200 (MET DST) Received: from [212.227.126.160] (helo=mrelayng0.kundenserver.de) by moutng6.kundenserver.de with esmtp (Exim 3.35 #2) id 17ay62-00089M-00; Sat, 03 Aug 2002 14:33:26 +0200 Received: from [80.129.103.239] (helo=gate.gerd-stolpmann.de) by mrelayng0.kundenserver.de with asmtp (Exim 3.35 #2) id 17ay62-0001a7-00; Sat, 03 Aug 2002 14:33:26 +0200 Received: from ice.gerd-stolpmann.de (ice.gerd-stolpmann.de [192.168.0.13]) by gate.gerd-stolpmann.de (Postfix) with ESMTP id 897ABCCDD; Sat, 3 Aug 2002 14:33:19 +0200 (CEST) Received: from ice (localhost [127.0.0.1]) by ice.gerd-stolpmann.de (Postfix) with ESMTP id D2B891AD93; Sat, 3 Aug 2002 14:33:12 +0200 (MEST) Date: Sat, 3 Aug 2002 14:33:11 +0200 From: Gerd Stolpmann To: "Alexander V. Voinov" Cc: "caml-list@inria.fr" Subject: Re: [Caml-list] ocaml-3.05: a performance experience Message-ID: <20020803123311.GA631@ice.gerd-stolpmann.de> References: <3D49FD72.68388864@quasar.ipa.nw.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <3D49FD72.68388864@quasar.ipa.nw.ru>; from avv@quasar.ipa.nw.ru on Fri, Aug 02, 2002 at 05:33:06 +0200 X-Mailer: Balsa 1.3.6 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On 2002.08.02 05:33 Alexander V. Voinov wrote: > Hi All, > > I have an application, which parses a huge XML file and stores resulting > records to a database. > > The file is parsed using PXP, but in a 'pulldom' manner, by extracting > (to a Buffer) first level tags manually with pcre, then an array insert > of 30000 recognized and accumulated records is performed. DB access > takes a small fraction of the run time. > > Compiled with ocaml-3.04 it took 1h40m+-5m of 'user' process time and > occupied about 340M in RAM. With 3.05 it took 2h40m+-5m and occupied > 250M. > > Is this the consequence of the new GC strategy? Actually I'd tolerate > large footprint for the sake of more speed. > > It's also interesting to note, than in the case of 3.04 the footprint of > the application starts from 330M and slowly expands to 350M. With 3.05 > it starts with 250M and then almost does not expand till the end. > > Sparc Solaris 2.7, gcc 3.0.4. > > A previous version of this app, written in Python with PyXML, runs 3-4 > times slower than the 3.04 version and takes 20M in RAM. I think you observe GC compaction. You can turn it off: OCAMLRUNPARAM="O=1000000" (or Gc.set). If XML validation is not needed, you could also rewrite your program to use the new event-based parsing in PXP-1.1.90. That would completely avoid to represent the XML tree in memory (and increase the speed, because GC of large memory footprints is expensive). Gerd -- ---------------------------------------------------------------------------- Gerd Stolpmann Telefon: +49 6151 997705 (privat) Viktoriastr. 45 64293 Darmstadt EMail: gerd@gerd-stolpmann.de Germany ---------------------------------------------------------------------------- ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners