From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.7 required=5.0 tests=AWL,SPF_SOFTFAIL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id 4CE45BB84 for ; Sat, 17 Jan 2009 10:28:47 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApEBAKszcUnVhjEYkWdsb2JhbACUAwEBAQEJCwoHEQO2boQ0gT4 X-IronPort-AV: E=Sophos;i="4.37,280,1231110000"; d="scan'208";a="19744048" Received: from ihsmtp02voda.lis.interhost.com (HELO ihsmtp02cons.lis.interhost.com) ([213.134.49.24]) by mail2-smtp-roc.national.inria.fr with ESMTP; 17 Jan 2009 10:28:46 +0100 Received: from [192.168.1.64] ([77.54.249.136]) by ihsmtp02cons.lis.interhost.com with Microsoft SMTPSVC(6.0.3790.3959); Sat, 17 Jan 2009 09:26:10 +0000 Message-ID: <4971A4CD.9070708@inescporto.pt> Date: Sat, 17 Jan 2009 09:28:45 +0000 From: Hugo Ferreira User-Agent: Thunderbird 2.0.0.19 (X11/20090105) MIME-Version: 1.0 To: Andrej Bauer Cc: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Optimizing symbolic processing code References: <4970488C.9080104@inescporto.pt> <200901161341.53632.jon@ffconsultancy.com> <49709693.50201@inescporto.pt> <4970B398.5010100@inescporto.pt> <7d8707de0901161109y2de73536oc7a454f4f0e1ad91@mail.gmail.com> <7d8707de0901161248t39960316v46c6fa64e8001531@mail.gmail.com> In-Reply-To: <7d8707de0901161248t39960316v46c6fa64e8001531@mail.gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 17 Jan 2009 09:26:10.0089 (UTC) FILETIME=[A2037990:01C97885] X-Spam: no; 0.00; andrej:01 andrej:01 unification:01 integers:01 parser:01 arrays:01 unification:01 lri:01 filliatr:01 experimented:01 mutable:01 unifier:01 u-f:01 ocaml:01 variants:01 Andrej, First and foremost thanks for taking the time to answer. Andrej Bauer wrote: > After being so bad spirited in my last message, I decided to make it > up by doing something positive. I have added to the PL Zoo a mini > prolog interpreter, see http://andrej.com/plzoo/ . Interesting. I had visited this page for a quick look and never realized (or don't remember) seeing any Prolog interpreter. > It is very slow and Maybe be slow, but it is very clear and concise. > I am sure a decent implementation would speed it up by an order of > magnitude (at least a 100 fold). Indeed this may be possible. > I wonder how your implementation > compares to mine. > I will go through your code in the order it is available: 1) First thing I noticed is that you have very few type of terms. In my case I add parsing and unification of integers, strings, lists, negation (not), etc. I also added some extras because I use the same parser to manipulate first order logic sentences used by a AI planner. 2) Your parsing doesn't construct a symbol table. So this means that all comparisons are string based. In my case all comparisons are integer based i.e: everything from a predicate symbol to a constant is an integer. 3) Your variable bank is based on a list. This means that any look-up requires linear time. I use arrays for this. This has effects on unification. 4) Related to the above I use a Union-Find implementation (See http://www.lri.fr/~filliatr/ for the implementation I use) to bind variables. I have also experimented with another data structure for this, but this implementation is simple and fast although mutable. 5) Your unification algorithm looks like a standard (at-worst) quadric order unifier, which is not too bad. However you use your linear list of substitutions (3+4). What is more your occurs check is done on every variable - term binding. I on the other hand, use a near-linear algorithm using fast union-find and do a occurs check only at the end for only those terms bound to variable and again using the U-F data structure. 6) Ok, this one is the one that seems to be killing my application. You use a very simple database, its basically a list of assertions. Any look-up is linear in respect to the number of assertions, which means that resolution of a goal is exponential in respect to the number of assertions and number of goals. I use a discriminant trie whose search is linear in respect to the number of goal (as opposed to the assertions). 7) Before unification you take care of performing variable renaming. This has to be repeated every time you use a predicate. I use clauses represented in a canonical form. Renaming for me is simply a matter of bumping a counter of variables in the variable bank and attaching this offset to the terms in question. I also "reuse" variables because of the way the variable bank is implemented. 8) Your algorithm is a very clean implementation of SLDNF. You keep a stack of sorts and allow one to continue search for the next goal. One of my implementations did this however keeping track of the position in a trie resulted in complicated code. I now use simple folds over the data structure. A function is invoked whenever a solution is found. If only one solution is required then a quick exit is performed via an exception. In your case your implementation is simpler because it uses only lists. Ok, in respect to your first response: > Judging from what your responses, the most probable explanation for > inefficiency is that you implemented your prolog interpreter badly. Possibly yes, but from what I have explained above you can see I have taken pains to have a decent intepreter. > It would help a lot if you just showed us your code. True. But the problem is I did not know what was killing performance, hence the request for more information on how to diagnose the problem. Only then would I analyse further and ask for help with more details. I have compiled and executed the code with profiling. "grpof" shows that about 8-10 % of the time is spent on folding over the trie (I use map folds and finds, why are map folds taking so long?). In other words it is not an issue on unification or the resolution function but the search in the trie. I also find calls to caml currying functions. This seems to point me to [1] for solutions. I am going to make some additional experiments in order to diagnose the problem further. One simple question: is Ocaml matching fast enough that I need not worry with: a) the number of variants in a type b) comparisons of the sort: and bind_all_var_to_any f h ps t delay acc = (* k a b -> b *) let scan k e acc = match e with | Leaf _ -> acc | Node(Pred.Rel _,ps',vs',jps',jvs') -> (* jump over ptedicate *) fold_all f jps' jvs' t delay acc | Node(_,ps',vs',jps',jvs') -> fold_all f ps' vs' t delay acc in Node_map.fold (scan) ps acc Once again, appreciate any comments on the above. Regards, Hugo F. [1] http://ocaml.janestreet.com/?q=node/30 > Best regards, > > Andrej > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs >