Date: Wed, 8 Oct 2003 23:22:23 +0200
From: Michal Moskal
To: Pierre Weis
Cc: caml-list@pauillac.inria.fr
Subject: scanf (Re: [Caml-list] what is the functional way to solve this problem?)
Message-ID: <20031008212223.GA23554@roke.freak>
In-Reply-To: <200310082034.WAA29610@pauillac.inria.fr>

On Wed, Oct 08, 2003 at 10:34:40PM +0200, Pierre Weis wrote:
> [...]
> > Hm, with your version it's only 3 times slower:
> [...]
> > However, your code is not correct:
> > > Scanf.bscanf Scanf.Scanning.stdib " %d %d %s"
> > ----------------------------------------------------^^
> >
> > It should be [^\n], as filenames can contain spaces.
> > In this case:
>
> I did not know the complete specification :)

Neither did I ;-) I just generated filesystem data from my /usr/share, and
there were a few files that contained spaces in their names. It is simply
more reasonable to assume filenames can contain spaces than to assume they
can't. OTOH filenames can contain \n as well... But we are talking about
scanf efficiency, so the original problem becomes somewhat irrelevant :-)

> However, your proposed patch is rather inefficient:
>
> > It should be [^\n], as filenames can contain spaces. In this case:
> ---------------^^^^^
>
> You should prefer %s@\n, if efficiency is a concern (which since to be the
> case :)

I admit I didn't go through the scanf manual that thoroughly, and there
seems to be no indication there about the efficiency of either solution.
But as I understand it, the [] thing would involve creating some flag array
for all 256 characters, for each and every scanf call, which has to be slow.
Maybe some caching (for cases where [] really needs to be used)?

> > Well, it's 15 times slower.
>
> Could you please let me have access to the data to be able to test the
> code without bothering you ?

Sure, but along with stats.ml I posted mkd.ml (from make-data). You can run
it on your /usr or /usr/share, redirecting input to a file. If you need *my*
test case, please drop me an email.

> Thank you very much indeed for your testbed case, since you exhibit
> situations where there is room for improvement in the implementation
> of scanf (or at least there is the necessity to explain what are the
> efficient pattern constructs for Scanf).

I generally don't use Scanf. This is because I use OCaml mostly for
compilers, preprocessors etc. that need more sophisticated lexing. It is,
however, nice when using OCaml for simple and/or scripting tasks. I have
used Scanf mostly for programming contests, where the input data is
formatted as a sequence of numbers or some simple strings.
And I found it to be fast enough (you know, this is a programming contest,
they time you, so you have to be fast). But I used only %d and maybe %s,
mainly through:

  let get_int () = Scanf.scanf " %d" (fun x -> x)

Which was more natural to me (I still haven't converted myself to
all-functional-style-I'm-using-only-Haskell :-)

--
: Michal Moskal :: http://www.kernel.pl/~malekith : GCS {C,UL}++++$ a? !tv
: When in doubt, use brute force. -- Ken Thompson : {E-,w}-- {b++,e}>+++ h
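
For reference, the two format strings compared in this thread can be tried
side by side. This is a minimal sketch, not code from the thread: the helper
names parse_range and parse_delim, the (int * int * string) result and the
sample line are assumptions made for illustration, and the sketch says
nothing about the relative speed of the two formats.

```ocaml
(* Hypothetical helpers: both parse a line assumed to hold two integers
   followed by a name that may contain spaces, ended by a newline. *)

(* Range conversion: read every character that is not a newline. *)
let parse_range line =
  Scanf.sscanf line " %d %d %[^\n]" (fun a b name -> (a, b, name))

(* Scanning indication: the string token ends just before the next
   newline (or at end of input). *)
let parse_delim line =
  Scanf.sscanf line " %d %d %s@\n" (fun a b name -> (a, b, name))

let () =
  (* Sample line invented for this sketch. *)
  let line = "4096 17 my file.txt\n" in
  let (a, b, name) = parse_range line in
  Printf.printf "range: %d %d %s\n" a b name;
  let (a, b, name) = parse_delim line in
  Printf.printf "delim: %d %d %s\n" a b name
```

Both helpers should return the same triple for a line of this shape, so any
timing difference between them would come from how Scanf handles the two
conversions, which has to be measured on real data.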