From: Harry Putnam <reader@newsguy.com>
Cc: ding@gnus.org,
Norbert Gövert <goevert@amaunet.cs.uni-dortmund.de>,
Ulrich Pfeifer <upf@de.uu.net>
Subject: Re: nnir/freeWAIS-sf
Date: 16 Jul 2000 09:17:41 -0700
Message-ID: <m2em4uaw2i.fsf@reader.ptw.com>
In-Reply-To: Kai.Grossjohann@CS.Uni-Dortmund.DE's message of "Sun, 16 Jul 2000 14:25:42 +0200"
NOTE: If you want the punch line first, skip straight to the section below
preceded by:
"*NOW THE GOOD PART!*"
Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
> I have now found out something with respect to searching in `from' and
> `to' fields.
>
> (For Norbert & Uli: I'm indexing my mail with freeWAIS-sf, and
> specifying `SOUNDEX BOTH' in the format file for the `to' and `from'
> fields.)
Above, are you referring to the example *.fmt file in nnir-1.57 as it stands?
>
> Running `dictionary mail_field_from' tells me that the dictionary
> contains a lot of soundex codes (which is the right thing, since we
> are specifying soundex in the format file). And indeed, searching for
> the soundex codes works:
Ditto here if soundex codes look like:
term occurances pointer
m300 83886080 5687214
m320 536870912 1810235
[...]
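For anyone wondering what those terms are, here is a minimal Python sketch of the classic Soundex scheme. It assumes (unverified) that the SOUNDEX option in freeWAIS-sf produces codes along the same lines; the function name `soundex` is only illustrative and is not taken from the freeWAIS-sf sources. Lowercase output is chosen to match the dictionary listing above.

```python
def soundex(name: str) -> str:
    """Classic Soundex: first letter plus three digits, padded with zeros."""
    # Map each consonant to its digit group; vowels, h, w and y have no digit.
    codes = {}
    for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""

    out = [letters[0]]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            out.append(digit)
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return ("".join(out) + "000")[:4]


print(soundex("Ronan"))   # r550
print(soundex("Matt"))    # m300
```

Under this scheme a plain name such as Ronan becomes r550, so a dictionary that holds only soundex codes has no entry for the literal word.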
>
> /----
> | $ waissearch -d mail from=m230
> |
> | Search Response:
> | NumberOfRecordsReturned: 16
> | 1: Score: 3471, lines: 55 '445 /home-local/grossjoh/Mail/auto/linux-utf8/'
> | 2: Score: 3471, lines: 111 '491 /home-local/grossjoh/Mail/auto/linux-utf8/'
> | 3: Score: 3098, lines: 79 '3416 /home-local/grossjoh/Mail/auto/dbworld/'
> | [...]
> \----
> But searching for normal terms does _not_ work.
I see nearly the opposite behavior: From and To searches find nothing,
but free-text and Subject searches work.
Using your example *.fmt file and this indexing command:
waisindex -r -d mail -stem -t fields ~/Mail
Free-text search (this database contains my ding-list and bbdb-list nnml
directories):
waissearch -d mail agent
Search Response:
NumberOfRecordsReturned: 6
1: Score: 1546, lines: 229 '2591 /home/reader/Mail/ding2/'
2: Score: 1434, lines: 60 '2592 /home/reader/Mail/ding2/'
3: Score: 1402, lines: 83 '2783 /home/reader/Mail/ding2/'
4: Score: 1378, lines: 82 '1964 /home/reader/Mail/ding2/'
5: Score: 1349, lines: 99 '2611 /home/reader/Mail/ding2/'
6: Score: 1123, lines: 55 '2514 /home/reader/Mail/ding2/'
Subject search:
waissearch -d mail subject=give
Search Response:
NumberOfRecordsReturned: 6
1: Score: 2484, lines: 87 '2771 /home/reader/Mail/ding2/'
2: Score: 2222, lines: 153 '2791 /home/reader/Mail/ding2/'
3: Score: 2222, lines: 69 '2793 /home/reader/Mail/ding2/'
4: Score: 2222, lines: 160 '2 /home/reader/Mail/bbdb/'
5: Score: 2222, lines: 76 '3 /home/reader/Mail/bbdb/'
6: Score: 2222, lines: 174 '11 /home/reader/Mail/bbdb/'
From search:
waissearch -d mail from=Ronan
Search Response:
NumberOfRecordsReturned: 1
1: Score: 0, lines:3457 'Search produced no result. Here's the Catalog for database: mail'
However, after changing SOUNDEX BOTH to TEXT BOTH, I find that the From
search works:
From search with edited *.fmt file:
waissearch -d mail from=Ronan
Search Response:
NumberOfRecordsReturned: 4
1: Score: 2403, lines: 153 '2791 /home/reader/Mail/ding2/'
2: Score: 2403, lines: 69 '2793 /home/reader/Mail/ding2/'
3: Score: 2403, lines: 160 '2 /home/reader/Mail/bbdb/'
4: Score: 2403, lines: 76 '3 /home/reader/Mail/bbdb/'
At first I thought BOTH meant LOCAL and GLOBAL, but now I think it doesn't,
because I found that if I set the field-specific parts to LOCAL, a
free-text search fails and reports that it cannot find the dataindex.
The last two lines of indexing output with the *.fmt fields set to TEXT LOCAL
tell the story:
1731: 3481: Jul 16 08:37:55 2000: 100: Total word count for dictionary is: 0
1731: 3482: Jul 16 08:37:55 2000: -1: error finding total_word_count
in dictionary ./mail
No dictionary is built.
*NOW THE GOOD PART!*
Poring through the info file, I found a passage that might finally
explain what is needed. To corroborate it, I tried
setting `To' and `From' to SOUNDEX LOCAL TEXT BOTH, like this:
to "To and Cc headers" SOUNDEX LOCAL TEXT BOTH
This tells the indexer to "put the word in the default and the `to'[ed -hp]
category and its soundex code only in the `to'[ed -hp] category."
So BOTH here means the default (which is LOCAL) and the current category.
And all types of searches now work. Whoopee!
The dictionary listed by `dictionary mail_field_from mail' is now more than
twice its previous size and contains both SOUNDEX and TEXT entries.
From the INFO file:
Consider the following example:
region: /^AU: /
au "author names" SOUNDEX LOCAL TEXT BOTH
end: /^[A-Z][A-Z]:/
To the indexer this means:
For all words starting with `AU: ' at the beginning of a line up to a
line which starts with two capital letters followed by a colon and a
blank, put the word in the default and the `au' category and its soundex
code only in the `au' category.
Thus an author name can be found in the created database in the default
category or the `au' category if the exact spelling is known. If the
name is misspelled, it might be found using the query `au=(soundex
MISSPELLED-NAME)'. *Note Sample Format::, *Note Query Syntax::.
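The passage above can be read as a small rule table. The following Python sketch is a toy model of that reading, not freeWAIS-sf code: `index_field`, `stand_in_code` and the `default` key are hypothetical names, and the handling of GLOBAL (default category only) is an assumption that the passage does not spell out.

```python
from collections import defaultdict


def stand_in_code(word):
    """Placeholder for a real soundex routine (hypothetical)."""
    return "sdx:" + word.lower()


def index_field(words, field, spec, to_code=stand_in_code):
    """Return {category: set of terms} for one field region.

    spec maps an index type ("TEXT" or "SOUNDEX") to a scope
    ("LOCAL", "GLOBAL" or "BOTH"), e.g. SOUNDEX LOCAL TEXT BOTH.
    """
    dictionaries = defaultdict(set)
    for word in words:
        for index_type, scope in spec.items():
            term = to_code(word) if index_type == "SOUNDEX" else word.lower()
            if scope in ("LOCAL", "BOTH"):    # the field category, e.g. `from'
                dictionaries[field].add(term)
            if scope in ("GLOBAL", "BOTH"):   # default category (assumed for GLOBAL)
                dictionaries["default"].add(term)
    return dict(dictionaries)


# SOUNDEX BOTH: only codes are stored, so from=Ronan has nothing to match.
old = index_field(["Ronan"], "from", {"SOUNDEX": "BOTH"})
# SOUNDEX LOCAL TEXT BOTH: word in both categories, code only in `from'.
new = index_field(["Ronan"], "from", {"SOUNDEX": "LOCAL", "TEXT": "BOTH"})
print(old)
print(new)
```

In this model, setting a field to TEXT LOCAL alone leaves the default category empty, which would be consistent with the zero word count the indexer reported earlier.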
Included working *.fmt file:
# Harry's rendition of Kai's format file for freeWAIS-sf for indexing mails.
# Each mail is in a file, much like the MH format.
# Document separator should never match -- each file is a document.
record-sep: /^@this regex should never match@$/
# Searchable fields specification.
region: /^[sS]ubject:/ /^[sS]ubject: */
subject "Subject header" stemming TEXT BOTH
end: /^[^ \t]/
region: /^([tT][oO]|[cC][cC]):/ /^([tT][oO]|[cC][cC]): */
to "To and Cc headers" SOUNDEX LOCAL TEXT BOTH
end: /^[^ \t]/
region: /^[fF][rR][oO][mM]:/ /^[fF][rR][oO][mM]: */
from "From header" SOUNDEX LOCAL TEXT BOTH
end: /^[^ \t]/
region: /^$/
stemming TEXT GLOBAL
end: /^@this regex should never match@$/