From: Harry Putnam <reader@newsguy.com>
Cc: ding@gnus.org
Subject: Re: example queries for nnir
Date: 18 Jul 2000 21:11:20 -0700
Message-ID: <m2lmyyn4if.fsf@reader.ptw.com>
In-Reply-To: Kai.Grossjohann@CS.Uni-Dortmund.DE's message of "Fri, 23 Jun 2000 14:33:19 +0200"
Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
> Once you have such a region, you can either put all the words in it
> into the global field, or into some other field. In both cases, if
> you want to be able to restrict the search to the body, you should NOT
> specify another region which puts the words into the global field, or
> into the other field.
>
> My suggestion meant that you put all the words from the body into the
> global field, and no other words into the global field. Hence
> searching the global field is the same as searching the body.
Is this really possible? I've experimented at some length with the
GLOBAL and LOCAL keywords.
I'm using the nnir example *.fmt file, leaving the regexps as they are
and changing only the index specs. In short, I replaced every occurrence
of BOTH with LOCAL and, for simplicity, set every field spec to TEXT
rather than SOUNDEX, so that they read `TEXT LOCAL'.
Only the `body' (/^$/) RE is left GLOBAL.
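The substitution described above could be scripted rather than done by hand. The following is only a rough sketch: the function name `localize_index_specs` is made up for illustration, and it assumes that BOTH and SOUNDEX appear in the file only as index-spec keywords (a description string or regexp that happened to contain either word would be rewritten too).

```python
import re


def localize_index_specs(fmt_text: str) -> str:
    """Rewrite index specs: BOTH becomes LOCAL, SOUNDEX becomes TEXT.

    Comment lines (starting with '#') are copied unchanged. GLOBAL is
    never touched, so a body region marked GLOBAL stays GLOBAL.
    """
    out = []
    for line in fmt_text.splitlines(keepends=True):
        if not line.lstrip().startswith("#"):
            # \b keeps the match to whole words only.
            line = re.sub(r"\bBOTH\b", "LOCAL", line)
            line = re.sub(r"\bSOUNDEX\b", "TEXT", line)
        out.append(line)
    return "".join(out)


# Hypothetical input line, not copied from the real nnir example file.
sample = 'from "From header" SOUNDEX BOTH\n'
print(localize_index_specs(sample), end="")
```

This only edits the text of the file; it says nothing about whether waisindex accepts the result.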
I'm posting the file in case I've got something else wrong in there,
but this *.fmt file causes waisindex to dump core. There are only about
1000 messages in the database.
# Each mail is in a file, much like the MH format.
# Document separator should never match -- each file is a document.
record-sep: /^@this regex should never match@$/
# Searchable fields specification.
region: /^[sS]ubject:/ /^[sS]ubject: */
subject "Subject header" stemming TEXT LOCAL
end: /^[^ \t]/
region: /^([tT][oO]|[cC][cC]):/ /^([tT][oO]|[cC][cC]): */
to "To and Cc headers" TEXT LOCAL
end: /^[^ \t]/
region: /^[fF][rR][oO][mM]:/ /^[fF][rR][oO][mM]: */
from "From header" TEXT LOCAL
end: /^[^ \t]/
region: /^$/
stemming TEXT GLOBAL
end: /^@this regex should never match@$/
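One way to double-check that only the body region feeds the global field is to list every non-comment line that mentions GLOBAL or BOTH. This is a naive line scan, not a parser for the freeWAIS-sf format; the helper name `global_index_lines` is invented for this sketch, and the sample string is an abbreviated copy of the spec above.

```python
import re


def global_index_lines(fmt_text):
    """Return (line_number, line) for each non-comment line that
    contains the keyword GLOBAL or BOTH as a whole word."""
    hits = []
    for number, line in enumerate(fmt_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue
        if re.search(r"\b(GLOBAL|BOTH)\b", stripped):
            hits.append((number, stripped))
    return hits


# Abbreviated copy of the spec posted above (raw string because of \t).
FMT = r'''record-sep: /^@this regex should never match@$/
region: /^[sS]ubject:/ /^[sS]ubject: */
subject "Subject header" stemming TEXT LOCAL
end: /^[^ \t]/
region: /^$/
stemming TEXT GLOBAL
end: /^@this regex should never match@$/
'''

for number, line in global_index_lines(FMT):
    print(number, line)
```

If the scan reports exactly one line, and it is the one under the /^$/ region, then the spec at least matches the intent of putting only body words into the global field.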
With this command line:
The last few lines of output:
2433: 7305: Jul 18 21:07:23 2000: 100: Total word count for dictionary is: 0
2433: 7306: Jul 18 21:07:23 2000: -1: error finding total_word_count in dictionary ./mail
2433: 7307: Jul 18 21:07:23 2000: -1: Could not read the dictionary block 139980800, length 1000
Segmentation fault (core dumped)