From: Harry Putnam
Newsgroups: gmane.emacs.gnus.general
Subject: Re: example queries for nnir
Date: 18 Jul 2000 21:11:20 -0700
Cc: ding@gnus.org

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> Once you have such a region, you can either put all the words in it
> into the global field, or into some other field. In both cases, if
> you want to be able to restrict the search to the body, you should NOT
> specify another region which puts the words into the global field, or
> into the other field.
>
> My suggestion meant that you put all the words from the body into the
> global field, and no other words into the global field. Hence
> searching the global field is the same as searching the body.

Is this really possible? I've experimented at some length with the
GLOBAL and LOCAL keywords.

I started from the nnir example *.fmt file, leaving the regexps as they
are and changing only the index specs. In short, I replaced every
occurrence of BOTH with LOCAL and, for simplicity, set every field spec
to TEXT rather than SOUNDEX, so that they read `TEXT LOCAL'. Only the
`body' (/^$/) regexp is left GLOBAL.

I'm posting the file in case I've got something else wrong in there.
But this *.fmt file causes waisindex to dump core, and there are only
about 1000 messages in the database.

# Each mail is in a file, much like the MH format.

# Document separator should never match -- each file is a document.
record-sep: /^@this regex should never match@$/

# Searchable fields specification.

region: /^[sS]ubject:/ /^[sS]ubject: */
        subject "Subject header" stemming TEXT LOCAL
end: /^[^ \t]/

region: /^([tT][oO]|[cC][cC]):/ /^([tT][oO]|[cC][cC]): */
        to "To and Cc headers" TEXT LOCAL
end: /^[^ \t]/

region: /^[fF][rR][oO][mM]:/ /^[fF][rR][oO][mM]: */
        from "From header" TEXT LOCAL
end: /^[^ \t]/

region: /^$/
        stemming TEXT GLOBAL
end: /^@this regex should never match@$/

With this command line:

The last few lines of output:

2433: 7305: Jul 18 21:07:23 2000: 100: Total word count for dictionary is: 0
2433: 7306: Jul 18 21:07:23 2000: -1: error finding total_word_count in dictionary ./mail
2433: 7307: Jul 18 21:07:23 2000: -1: Could not read the dictionary block 139980800, length 1000
Segmentation fault (core dumped)
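To make the intent behind Kai's suggestion concrete, here is a toy model of how the index specs are supposed to route words. It assumes the usual reading of the keywords: LOCAL indexes a region's words only under its own field name, GLOBAL only in the unnamed global index, and BOTH in both. The function names and the `<global>` key are hypothetical; this sketch only illustrates the intended semantics and says nothing about why waisindex crashes.

```python
"""Toy model of LOCAL / GLOBAL / BOTH index routing (hypothetical).

Assumption: LOCAL -> field index only, GLOBAL -> global index only,
BOTH -> both. This is an illustration, not waisindex's actual code.
"""
from collections import defaultdict

GLOBAL_FIELD = "<global>"  # hypothetical key standing in for the global index


def build_index(regions):
    """regions: list of (field_name_or_None, scope, text) tuples."""
    index = defaultdict(set)
    for field, scope, text in regions:
        words = set(text.lower().split())
        # LOCAL and BOTH put the words under the region's own field name.
        if scope in ("LOCAL", "BOTH") and field is not None:
            index[field] |= words
        # GLOBAL and BOTH put the words into the global index.
        if scope in ("GLOBAL", "BOTH"):
            index[GLOBAL_FIELD] |= words
    return index


def search(index, field, word):
    """True if `word` was indexed under `field`."""
    return word.lower() in index.get(field, set())


# One message, mirroring the posted .fmt: headers LOCAL, body GLOBAL.
message = [
    ("subject", "LOCAL", "Re: example queries for nnir"),
    ("from", "LOCAL", "Harry Putnam"),
    (None, "GLOBAL", "Is this really possible"),
]
idx = build_index(message)
print(search(idx, GLOBAL_FIELD, "nnir"))      # False: subject words stay local
print(search(idx, GLOBAL_FIELD, "possible"))  # True: body words are global
print(search(idx, "subject", "nnir"))         # True
```

Under these assumptions, the global index contains only body words, so a global search behaves as a body-only search, which is exactly what the quoted suggestion relies on.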