From: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
Newsgroups: gmane.emacs.gnus.general
Subject: Re: example queries for nnir
Date: Sun, 18 Jun 2000 21:27:28 +0200
Cc: ding@gnus.org
Original-To: Harry Putnam
In-Reply-To: Harry Putnam's message of "17 Jun 2000 11:21:13 -0700"

Harry Putnam writes:

> A small request for nnir users with some experience, to post a few
> sample queries using the `glimpse' or `wais' engines.
>
> Very unfamiliar with wais here, so any examples would be very useful.

Using the example format file given in nnir.el, there are four fields:
to, from, subject, and the global field. The to field contains the To
and Cc mail headers, the from field contains the From header, and the
subject field contains the Subject header. The global field contains
all of the above headers plus the body of each message.

Thus, the query `foo' searches for foo occurring anywhere in the body
or in any of the header fields mentioned above. `from=miller' searches
for `miller' in the From header, and `to=smith' searches for `smith'
in the To and Cc headers. I think you can see what `subject=bla'
does...

If you want to search for several words, you can say
`to=(smith and miller)' to find messages that were sent to both people
at the same time. `to=(smith or miller)' finds messages sent to smith,
messages sent to miller, and messages sent to both smith and miller.
Because results are ranked, messages sent to both smith and miller get
a higher score than messages sent to only one of them. The score is
the four-digit number at the beginning of the subject header.

Since `or' is the default, you can say `to=(smith miller)' instead of
`to=(smith or miller)'.
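For anyone who, like Harry, is new to WAIS, the query semantics described above (field restriction, `and`, `or` as the default, and ranking by how much of the query matches) can be imitated in a few lines of Python. This is a toy sketch, not how the WAIS engine or nnir actually evaluates queries: only the field-to-header mapping comes from the example format file; the `Message` record, the `parse` and `score` functions, the one-clause grammar, and the "count of matched terms" score are hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical message record; "frm" stands in for the From header
    # because "from" is a Python keyword.
    to: str
    cc: str
    frm: str
    subject: str
    body: str

    def field(self, name):
        """Text searched by a field name from the example format file."""
        if name == "to":
            return self.to + " " + self.cc  # To and Cc headers
        if name == "from":
            return self.frm
        if name == "subject":
            return self.subject
        # Anything else is treated as the global field: all of the
        # headers above plus the body.
        return " ".join([self.to, self.cc, self.frm, self.subject, self.body])


def words(text):
    """Lower-cased set of words, roughly what a word index would store."""
    return set(re.findall(r"\w+", text.lower()))


def parse(query):
    """Parse one clause such as 'foo', 'from=miller' or 'to=(smith and miller)'.

    Returns (field, terms, operator). Multi-field queries like
    'from=miller and subject=hungry' are not supported by this sketch.
    """
    m = re.fullmatch(r"\s*(?:(\w+)=)?\(?([^()]*)\)?\s*", query)
    if m is None or "=" in m.group(2):
        raise ValueError("unsupported query: " + query)
    field = m.group(1) or "global"
    tokens = m.group(2).lower().split()
    op = "and" if "and" in tokens else "or"  # `or' is the default
    terms = [t for t in tokens if t not in ("and", "or")]
    return field, terms, op


def score(msg, query):
    """Return 0 if msg does not match, else the number of query terms found."""
    field, terms, op = parse(query)
    present = words(msg.field(field))
    hits = sum(t in present for t in terms)
    if op == "and" and hits < len(terms):
        return 0
    return hits
```

For example, `score(m, "to=(smith miller)")` is 2 for a message To smith and Cc miller, and 1 for a message sent to smith alone, so sorting on it puts the messages sent to both people first, mirroring the ranking described above.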
You can also combine searches in several fields, such as

    from=miller and subject=hungry and blarfl

Here, a matching message must be from miller, `hungry' must be in the
Subject header, and `blarfl' must appear somewhere in the body (or in
the From, To, Cc, or Subject header).

> Glimpse queries can be difficult since glimpse doesn't allow full
> regexp in the actual query string.
>
> This can make something like searching for a subject where you don't
> know the full string difficult since ^Subject:.*KEYWORDS won't work.

It doesn't? The documentation seems to say it should. Does
`^Subject:#KEYWORD' do anything useful?

Oh, now I see that the documentation doesn't mention `^', so that's
why it doesn't work. But does `Subject:#KEYWORD' work? `#' seems to be
an abbreviation for `.*'.

> An example that can be confusing is (from nnir.el):
>
> ;; . . . . . . . . . . . . . . . . . . . . . . . . . . . . The
> ;; second variable to set is `nnir-search-engine'. Choose one of the
> ;; engines listed in `nnir-engines'.

I have now included an explicit pointer to the variable documentation,
and I have also tried to beef up the documentation for nnir-engines.
Did it help? The new version is on the ftp server:
ftp://ls6-ftp.cs.uni-dortmund.de/pub/src/emacs/

> Showing some example settings would be a little less confusing.

Right. Hm. I'll do that.

> A final query. Does anyone know of a search tool that uses a database
> but also allows full regexp use in the queries? This seems to be
> mutually exclusive, or something.

Normal search engines build a word-based index, and when they see a
query for the word foo, they look up all documents containing that
word in the index. This approach does not really play ball with
full-blown regular expressions: for a regular expression, you would
have to scan each document in turn, and with multi-gigabyte databases
this is just much too slow.
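The difference between the two approaches can be sketched in Python. This is a toy illustration, not how WAIS or Glimpse are implemented: `build_index`, `word_query`, `regex_query` and `glimpse_to_regex` are hypothetical names, and `glimpse_to_regex` encodes only the guess made earlier in this message that `#` behaves like `.*` in a Glimpse pattern.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Word-based index: map each lower-cased word to the ids of the docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def word_query(index, word):
    """A single dictionary lookup; no document text is read at query time."""
    return set(index.get(word.lower(), set()))


def regex_query(docs, pattern):
    """A word index cannot answer an arbitrary regexp, so every document is scanned."""
    rx = re.compile(pattern)
    return {doc_id for doc_id, text in docs.items() if rx.search(text)}


def glimpse_to_regex(pattern):
    """Assumed translation: '#' becomes '.*', everything else is matched literally."""
    return ".*".join(re.escape(part) for part in pattern.split("#"))
```

With `docs = {1: "Subject: nnir queries", 2: "Subject: lunch"}`, both `word_query(build_index(docs), "nnir")` and `regex_query(docs, glimpse_to_regex("Subject:#nnir"))` return `{1}`, but only the second one has to read the text of every document to get there.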
But it does appear that Glimpse tries to strike a balance between the
efficiency gained from indexes and the power of regular expressions.

kai
-- 
I like BOTH kinds of music.