From: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
Newsgroups: gmane.emacs.gnus.general
Subject: Re: nnir/freeWAIS-sf
Date: Fri, 21 Jul 2000 19:27:41 +0200
Original-To: Harry Putnam
Cc: ding@gnus.org
In-Reply-To: Harry Putnam's message of "20 Jul 2000 09:33:13 -0700"
User-Agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/21.0.90

On 20 Jul 2000, Harry Putnam wrote:

> bsd > waissearch -d mail 'global resounding and silence'
>
> Search Response:
> NumberOfRecordsReturned: 1
> 1: Score: 2113, lines: 54 '2177 /home/reader/Mail/ding2/'
>
> Seems to work... but wait.. that message contains neither
> `resounding' nor silence.

Whee. Hm. Maybe fwsf implements `and' in a fuzzy way. This is
useful for people who issue queries like `term1 and term2 and ... and
term10'. If there are no documents with all ten terms, chances are
that people will be happy with a document containing only nine of
them.

I'm not sure about this one, though.

> Message-ID: (on ding)
>
> What's worse
>
> `grep -rl 'resounding.*silence' ~/Mail' easily finds 5 that actually
> contain the strings.
>
> /home/reader/Mail/ding2/2771
> /home/reader/Mail/ding2/2790
> <14703.1528.800308.229816@klortho.stepstone.ie>
> /home/reader/Mail/bbdb/460
> /home/reader/Mail/bbdb/463
> <14703.1528.800308.229816@klortho.stepstone.ie>
> /home/reader/Mail/bbdb/472
>
> It's looking more and more like freewais is just not a sturdy tool
> like glimpse. It has to be mollycoddled every inch of the way, every
> phase is as painful as pulling teeth. Then the end result is flaky
> and not dependable. It lacks precision in searching. And on the
> command line fails to show the hits in some fashion. Only full
> documents.

Yes, it appears so. FWIW, I'm quite interested in reading all this.
From an Information Retrieval point of view, fwsf is doing the right
things.
Yet it is obviously not easy to use at all! Quite amazing.

But thanks a lot for persevering; this sure helps me to learn things,
and I can only hope that my feeble attempts at getting some of this
into DesIRe (the successor of fwsf) will bear some fruit.

> The little awk based search tool I made, is much sturdier and can be
> used on any unix like platform. It is excruciatingly slow, but
> because it is fully regexp based it finds strings with great
> precision. Needs some handy way to insert the search string, in an
> easy one step manner very badly too.

Maybe it could be integrated into nnir.el. Hm. Do you think that, in
principle, producing a summary buffer containing the search results is
a good idea? If so, it might be worth trying to integrate the two.

Basically, nnir.el needs a list of article identifiers as a result.
Each article identifier needs to contain the group name (in some form)
and the article number. So if your tool just spits out the file
names, this should be sufficient for searching nnml groups.

> Sometimes `ranking' or `heuristics' of some kind aren't what is
> needed.

:-)

> I'm thinking of how to integrate that into gnus. It can report the
> group and file name, message id or whatever. I wasn't able to see
> in nnir how the lisp code grabs that info from glimpse or wais. But
> surely if the tool can pass the article number, filename, message
> ID, then gnus can assemble the hits.

Yes. Hm. You may wish to have a look at the nnir-run-glimpse
function. This function contains two parts. The first part invokes
glimpse with the right options. The second part expects glimpse to
produce a list of file names, which is then massaged in an appropriate
way. It seems that you can reuse the second part (if your tool just
prints file names), but have to change the first part a bit.
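
To make the "massaging" step concrete, here is a minimal sketch of
turning one file name from the search output into a group name and an
article number. It is not the code from nnir-run-glimpse. The names
harrys-tool-mail-dir and harrys-tool-file-to-article are hypothetical,
the (GROUP . NUMBER) return value is only an illustration (check
nnir-run-glimpse for the result format nnir.el really expects), and
the mapping from directory foo/bar to group foo.bar assumes the usual
nnml layout.

```elisp
;; Hypothetical helper; none of these names exist in nnir.el.
(defvar harrys-tool-mail-dir "/home/reader/Mail/"
  "Assumed root directory of the nnml mail spool.")

(defun harrys-tool-file-to-article (file)
  "Turn FILE, one line of search output, into (GROUP . ARTICLE-NUMBER).
Return nil if FILE is not below `harrys-tool-mail-dir' or its last
component is not a plain article number."
  (let ((prefix (file-name-as-directory harrys-tool-mail-dir)))
    (when (and (> (length file) (length prefix))
               (string= prefix (substring file 0 (length prefix))))
      ;; Cut off the spool prefix, leaving e.g. "ding2/2771".
      (let* ((relative (substring file (length prefix)))
             (dir (file-name-directory relative))
             (name (file-name-nondirectory relative)))
        (when (and dir (string-match "\\`[0-9]+\\'" name))
          ;; Assumption: directory foo/bar holds nnml group foo.bar.
          (cons (subst-char-in-string ?/ ?. (directory-file-name dir))
                (string-to-number name)))))))
```

Applied to every line the tool prints, with the nil results dropped,
this gives the kind of list described above. Lines that are not file
names, such as the Message-IDs in the grep output, are skipped.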

When you have written your nnir-run-harrys-tool function, you can hook
it into nnir.el by adding an entry into nnir-engines, like this:

    (add-to-list 'nnir-engines
                 '(harrys-tool nnir-run-harrys-tool nil))

And then you (setq nnir-search-engine 'harrys-tool), and that's it!

(You might need a couple of variables, for example a variable for your
home dir, so that you can cut off the right prefix from the file
names. Can you understand the code in nnir-run-glimpse that does
this?)

kai
-- 
I like BOTH kinds of music.