* nnir/freeWAIS-sf
@ 2000-07-15 13:53 Harry Putnam
  2000-07-15 18:04 ` nnir/freeWAIS-sf Norman Walsh
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Harry Putnam @ 2000-07-15 13:53 UTC

FreeWAIS seems to be a difficult beast to deal with...  I finally got
version 2.2.13 installed successfully on a FreeBSD OS.

Still having poor luck getting freeWAIS to co-operate with nnir.

In brief, the problem areas I encounter are:

1) The C-u G G (allow group selection) option does not work with freeWAIS
2) Queries aimed at `from' or `to' fields fail, although queries to
   `subject' field or global queries, work

First a few facts about the basis of this report:

1) I've created a ~/Mail directory containing only two sub directories.
   My collections of messages to ding list and bbdb list
   So ~/Mail/
         ding
         bbdb

2) Using the example *.fmt file from nnir-1.57.el and giving it the
   title mail.fmt:

# Kai's format file for freeWAIS-sf for indexing mails.
# Each mail is in a file, much like the MH format.

# Document separator should never match -- each file is a document.
record-sep: /^@this regex should never match@$/

# Searchable fields specification.

region: /^[sS]ubject:/ /^[sS]ubject: */
        subject "Subject header" stemming TEXT BOTH
end: /^[^ \t]/

region: /^([tT][oO]|[cC][cC]):/ /^([tT][oO]|[cC][cC]): */
        to "To and Cc headers" SOUNDEX BOTH
end: /^[^ \t]/

region: /^[fF][rR][oO][mM]:/ /^[fF][rR][oO][mM]: */
        from "From header" SOUNDEX BOTH
end: /^[^ \t]/

region: /^$/
        stemming TEXT GLOBAL
end: /^@this regex should never match@$/

3) Wais command line used to create the index:
     waisindex -r -d mail -stem -t fields ~/Mail
   (also tried without -stem, getting the same results:
     waisindex -r -d mail -t fields ~/Mail )

4) These settings in .gnus:
   (load "nnir-1.57.elc")
   (setq nnir-wais-database "/home/reader/.wais/mail")
   (setq nnir-search-engine `wais)

In gnus Group buffer pressing `C-u G G':

Query: nnir <RET>
(No prompt to select group spec appears during any of what follows)
Gives almost instantaneous results of two messages from ding list

Query: from=Kai <RET>
Gives:
  Couldn't request group: Search produced empty results
(but we all know better..:
  grep -r '^From:.*Kai' Mail/ding2|wc -l
  150)

I read somewhere that freeWAIS has trouble with words containing both
upper and lower case so:

Query: from=rossjohann <RET>
Gives:
  Couldn't request group: Search produced empty results
(But again we know better:
  grep -r 'From:.*rossjohann' ~/Mail|wc -l
  150)

Further:
Query: to=ding or to=bbdb both give the no results message

There are *NO* stop words in the index

Subject queries work:

Query: subject=nnir
Gives instant results from ding group
Query: subject=postal
Gives instant results from bbdb group
Query: subject=give
Gives instant results from both groups
* Re: nnir/freeWAIS-sf
From: Norman Walsh @ 2000-07-15 18:04 UTC

/ Harry Putnam <reader@newsguy.com> was heard to say:
| FreeWAIS seems to be a difficult beast to deal with...  I finally got
| version 2.2.13 installed successfully on a FreeBSD OS.

I couldn't get FreeWAIS working at all on my Linux box. And I can't
use glimpse because there's a significant chunk of my email that's
"corporate" to be sure.

However, I did finally get the Remembrance Agent working
(http://rhodes.www.media.mit.edu/people/rhodes/RA/) and it's tres
cool. But not related to nnir at all.

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | I have seen the truth and it makes no
http://nwalsh.com/            | sense.
* Re: nnir/freeWAIS-sf
From: Francisco Solsona @ 2000-07-15 18:10 UTC

Harry Putnam <reader@newsguy.com> writes:

> 1) The C-u G G (allow group selection) option does not work with freeWAIS

It wasn't supposed to work in the first place.  Take a look at
`nnir-engines', you will notice that only Glimpse supports group
selection.  So doing `C-u G G' and simply `G G' yields the exact same
results if 'wais is your search engine.

> 2) Queries aimed at `from' or `to' fields fail, although queries to
>    `subject' field or global queries, work

For me (using the mail.fmt file suggested by Kai) it was even worse,
because only `subject' queries used to work.  I'm not sure (at all)
that this is the Right Thing(TM) to do, but I'm using:

,--------------------
| record-sep: /^@this regex should never match@$/
|
| # Searchable fields specification.
| region: /^[fF][rR][oO][mM]:/ /^[fF][rR][oO][mM]: */
|         from "From header" stemming TEXT BOTH
| end: /^[^ \t]/
|
| region: /^[sS]ubject:/ /^[sS]ubject: */
|         subject "Subject header" stemming TEXT BOTH
| end: /^[^ \t]/
|
| region: /^([tT][oO]|[cC][cC]):/ /^([tT][oO]|[cC][cC]): */
|         to "To and Cc headers" stemming TEXT BOTH
| end: /^[^ \t]/
|
| region: /^$/
|         stemming TEXT GLOBAL
| end: /^@this regex should never match@$/
`--------------------

and it works almost as you would expect it to.  I used to use glimpse,
but it was way too slow compared to freeWAIS-sf.

Give it a try, replace your mail.fmt with the one above, and then run
`makedb -clean mail', and `makedb -update mail', if you're using the
makedb.conf file suggested by Kai, that is.

> 4) These settings in .gnus:
>    (load "nnir-1.57.elc")
>    (setq nnir-wais-database "/home/reader/.wais/mail")
>    (setq nnir-search-engine `wais)

I use:

,--------------------
| (require 'nnir)
| (setq nnir-mail-backend '(nnml "private")
|       nnir-search-engine 'wais)
`--------------------

and since nnir is `provided', it is better to `require' it than to
`load' it, even though it won't make much of a difference.

Does this help?
Francisco
-- 
Famous Last Words: It's perfectly safe. Let me show you.
(contributed by Frank v Waveren)
* Re: nnir/freeWAIS-sf
From: Harry Putnam @ 2000-07-15 21:22 UTC

Francisco Solsona <flsc@hp.fciencias.unam.mx> writes:

> Harry Putnam <reader@newsguy.com> writes:
>
> > 1) The C-u G G (allow group selection) option does not work with freeWAIS
>
> It wasn't supposed to work in the first place.  Take a look at
> `nnir-engines', you will notice that only Glimpse supports group
> selection.  So doing `C-u G G' and simply `G G' yields the exact same
> results if 'wais is your search engine.

Here I go again making bug reports that are really feature
requests.... he he.  I overlooked that information.  But coming to
think of it, wouldn't putting a stanza in *.fmt for Xref: field do the
same thing?

> > 2) Queries aimed at `from' or `to' fields fail, although queries to
> >    `subject' field or global queries, work
>
> For me (using the mail.fmt file suggested by Kai) it was even worse,
> because only `subject' queries used to work.  I'm not sure (at all)
> that this is the Right Thing(TM) to do, but I'm using:
>
> ,--------------------
> | record-sep: /^@this regex should never match@$/
> |
> | # Searchable fields specification.
> | region: /^[fF][rR][oO][mM]:/ /^[fF][rR][oO][mM]: */
> |         from "From header" stemming TEXT BOTH
> | end: /^[^ \t]/
> |
> | region: /^[sS]ubject:/ /^[sS]ubject: */
> |         subject "Subject header" stemming TEXT BOTH
> | end: /^[^ \t]/
> |
> | region: /^([tT][oO]|[cC][cC]):/ /^([tT][oO]|[cC][cC]): */
> |         to "To and Cc headers" stemming TEXT BOTH
> | end: /^[^ \t]/
> |
> | region: /^$/
> |         stemming TEXT GLOBAL
> | end: /^@this regex should never match@$/
> `--------------------
>
> and it works almost as you would expect it to.  I used to use glimpse,
> but it was way too slow compared to freeWAIS-sf.

Not working here...
searches on `from' or `to' fail while `subject' and global work.

> Give it a try, replace your mail.fmt with the one above, and then run
> `makedb -clean mail', and `makedb -update mail', if you're using the
> makedb.conf file suggested by Kai, that is.

I don't get anything but `ignoring line <N>' using the makedb.conf from
nnir.el (more on this at the end), so I'm making the index with:
  waisindex -r -d mail -stem ~/Mail

Using your *.fmt file produces the same results as reported in
previous post.  Only with a further complication.

This query:
Query: subject=give
Turns up these messages:
3 12-Jul+[ 40: -> bbdb-info@xemacs.] [2485: ding2/2771] [bbdb]Give me a break..
1 14-Jul [ 27: Adrian Aichner ] [2223: bbdb/10] Re: Postal codes
1 13-Jul [ 47: John F. Whitehead ] [2223: bbdb/4] Re: Postal codes

Some have no instance of `give' in the subject line.

> > 4) These settings in .gnus:
> >    (load "nnir-1.57.elc")
> >    (setq nnir-wais-database "/home/reader/.wais/mail")
> >    (setq nnir-search-engine `wais)
> I use:
> ,--------------------
> | (require 'nnir)
> | (setq nnir-mail-backend '(nnml "private")
> |       nnir-search-engine 'wais)
> `--------------------

How does freeWAIS know where the data base is?  Is there a built-in
default?

> and since nnir is `provided' is better to `require' it than `load' it,
> even though it won't make much of a difference.

Oh yeah, that is a holdover from when nnir wasn't included in gnus...
better catch up with the times eh?

> Does this help?

It helps that I'm getting some input on this but so far I see no
improvement in the query results.

Are you able to search on `from' and `to'?

Just for good measure I'm including the *.fmt file in case I've
butchered it somehow:

mail.fmt

record-sep: /^@this regex should never match@$/

# Searchable fields specification.
region: /^[fF][rR][oO][mM]:/ /^[fF][rR][oO][mM]: */
        from "From header" stemming TEXT BOTH
end: /^[^ \t]/

region: /^[sS]ubject:/ /^[sS]ubject: */
        subject "Subject header" stemming TEXT BOTH
end: /^[^ \t]/

region: /^([tT][oO]|[cC][cC]):/ /^([tT][oO]|[cC][cC]): */
        to "To and Cc headers" stemming TEXT BOTH
end: /^[^ \t]/

region: /^$/
        stemming TEXT GLOBAL
end: /^@this regex should never match@$/

Results of trying to use the makedb approach with Kai's makedb.conf

reader@satellite /home/reader/.wais
$ makedb -clean mail
Ignoring line 1:
Ignoring line 2:
Ignoring line 3:
Ignoring line 4: waisindex = /usr/local/bin/waisindex
Ignoring line 5: wais_opt = -stem -t fields
Ignoring line 6:
Ignoring line 7:
Ignoring line 8:
Ignoring line 9:
Ignoring line 10: homedir = /home/reader
Ignoring line 11:
Ignoring line 12:
Ignoring line 13: database = mail
Ignoring line 14: files = `find $homedir/Mail -name \*[0-9] -print`
Ignoring line 15: dbdir = $homedir/.wais
Ignoring line 16: limit = 100
Working on database ### mail ###
Unknown database 'mail'

So deleting all mail* files except mail.fmt and running:
  makedb -verbose -clean mail
Gives the identical results as above ..ending with "Unknown database 'mail'"
* Re: nnir/freeWAIS-sf
From: Francisco Solsona @ 2000-07-17 13:51 UTC

Harry Putnam <reader@newsguy.com> writes:

> Here I go again making bug reports that are really feature
> requests.... he he.  I overlooked that information.  But coming to
> think of it, wouldn't putting a stanza in *.fmt for Xref: field do the
> same thing?

I guess you have already proved that in one of your mails.  I use `/ /'
(gnus-summary-limit-to-subject) which is good enough for me, because
normally I don't remember the group in which what I'm looking for was
posted. e.g.

`G G' Query: `from=Putnam' gives:

,--------------------
|   [ 171: Harry Putnam ] [2968: ding/12649] Re: nnir/freeWAIS-sf
|   [   8: Harry Putnam ] [2968: rcp/963] Tramp name
|   [  17: Harry Putnam ] [2968: abacus/141] Re: [Abacus] MailAlert.pl
|   [  23: Harry Putnam ] [2968: rcp/965] Re: Tramp name
| M [  72: Harry Putnam ] [2968: ding/9378] Re: searching mail
| M [  15: Harry Putnam ] [2968: ding/3675] Re: ftp sites for RFC's
|   [  59: Harry Putnam ] [2968: ding/12663] Re: nnir/freeWAIS-sf
| M [  57: Harry Putnam ] [2968: ding/12661] Re: nnir/freeWAIS-sf
| M [ 173: Harry Putnam ] [2968: ding/12657] Re: nnir/freeWAIS-sf
`--------------------

`/ /' limit to subject (regexp): `: ding' gives:

,--------------------
|   [ 171: Harry Putnam ] [2968: ding/12649] Re: nnir/freeWAIS-sf
| M [  72: Harry Putnam ] [2968: ding/9378] Re: searching mail
| M [  15: Harry Putnam ] [2968: ding/3675] Re: ftp sites for RFC's
|   [  59: Harry Putnam ] [2968: ding/12663] Re: nnir/freeWAIS-sf
| M [  57: Harry Putnam ] [2968: ding/12661] Re: nnir/freeWAIS-sf
| M [ 173: Harry Putnam ] [2968: ding/12657] Re: nnir/freeWAIS-sf
`--------------------

it would be cool to be able to search for `from=Putnam and group=ding'
using freeWais, though.

[...]

> Using your *.fmt file produces the same results as reported in
> previous post.  Only with a further complication.
>
> This query:
> Query: subject=give
> Turns up these messages:
> 3 12-Jul+[ 40: -> bbdb-info@xemacs.] [2485: ding2/2771] [bbdb]Give me a break..
> 1 14-Jul [ 27: Adrian Aichner ] [2223: bbdb/10] Re: Postal codes
> 1 13-Jul [ 47: John F. Whitehead ] [2223: bbdb/4] Re: Postal codes

The few times I've seen this is when, for instance, the message is one
of those nasty replies that include the full message under a line
reading: -----Original Message-----, and then a few headers from the
previous message (To, From, Subject, etc.)  Are you sure those last
two messages don't include, in the body of the message, a line
starting with "Subject:" and having "give" on the same line?

> > ,--------------------
> > | (require 'nnir)
> > | (setq nnir-mail-backend '(nnml "private")
> > |       nnir-search-engine 'wais)
> > `--------------------
>
> How does freeWAIS know where the data base is. Is there a built in default?

That's correct, nnir-wais-database defaults to:
`(expand-file-name "~/.wais/mail")', and nnir-search-engine defaults
to 'wais.  So I only need to set nnir-mail-backend really (whose default
value is '(nnml "")).

> Are you able to search on `from' and `to'?

Yes, I am.

> Results of trying to use the makedb approach with Kai's makedb.conf
> [...] Unknown database 'mail'

This sounds like you have a broken makedb script on your system.  I
installed the freeWais FreeBSD port, and it worked out of the box.  It
was a nightmare to try it on Linux, though.

Francisco
-- 
If you keep thinking about what you want to do or what you hope will
happen, you don't do it, and it won't happen. -Joe Dimaggio
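
The quoted-header explanation above can be checked with a small Python sketch. It assumes that the `subject' region's start regexp is tested against every line of a message file, body included; the sample message and the helper name `subject_region_starts` are made up for illustration and are not part of nnir or freeWAIS-sf.

```python
import re

# Start regexp of the `subject' region, copied from the mail.fmt in this thread.
SUBJECT_START = re.compile(r"^[sS]ubject:")

def subject_region_starts(message: str) -> list[int]:
    """Return the 1-based line numbers where a `subject' region would start."""
    return [
        number
        for number, line in enumerate(message.splitlines(), 1)
        if SUBJECT_START.match(line)
    ]

# Hypothetical reply that quotes the headers of the previous message in its body.
message = """From: someone@example.org
Subject: Re: Postal codes

-----Original Message-----
From: other@example.org
Subject: Give me a break
"""

print(subject_region_starts(message))  # → [2, 6]
```

Line 6 sits in the body, yet it matches the same start regexp as the real header on line 2, which would explain a `subject=give' hit on a message whose own subject never mentions `give'.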
* Re: nnir/freeWAIS-sf
From: Harry Putnam @ 2000-07-18 1:03 UTC

Francisco Solsona <flsc@hp.fciencias.unam.mx> writes:

> Harry Putnam <reader@newsguy.com> writes:
>
> > Here I go again making bug reports that are really feature
> > requests.... he he.  I overlooked that information.  But coming to
> > think of it, wouldn't putting a stanza in *.fmt for Xref: field do the
> > same thing?
>
> I guess you have already proved that in one of your mails.  I use `/ /'
> (gnus-summary-limit-to-subject) which is good enough for me, because
> normally I don't remember the group in which what I'm looking for was
> posted. e.g.

[...]

I have lots of stuff on my nnml server.  Several thousand Newsgroup
postings among them.  Several thousand more messages from freebsd-list
groups that are mail to news gateway messages.

What I was after works something like this. . . .

Let's say I'm looking for specific freebsd info.  Not very likely to
find the good stuff in most of my groups, however all the freebsd
groups contain an Xref header indicating which group they are in.

I've also added a `newsgroups' stanza to *.fmt, because of the
thousands of Usenet posts I have on my nnml server.  The newsgroups
field can be used in the same way.

These queries are probably inaccurate and definitely untested
(mostly).  I'm having trouble authoring queries that work consistently,
but you'll get the idea:

Query: xref=(freebsd) and kernel and compile

That syntax doesn't actually work but I think it will if I get it
right.

The objective above is to search only messages from freebsd groups
with `kernel' and `compile' in the body..
I don't want a bunch of hits from my redhat Linux lists about
compiling the linux kernel in this particular search, or from
nnml:comp.os.linux.misc

Another example might be, if looking for some examples of shell scripts
under the Korn shell.  A likely place to find that is
comp.unix.questions and comp.unix.shells:

Query: newsgroups=(comp.unix) and Korn and script

> it would be cool to be able to search for `from=Putnam and
> group=ding' using freeWais, though.

The *.fmt file I posted containing the xref stanza will do exactly
that.  I have tested this to some extent.

Query: xref=ding and from=Putnam

I think you could be even more specific but not sure of the syntax:

Query: (xref=ding) and (from=Putnam) and .fmt
(NOTE: syntax is incorrect)

Should find only messages from ding that are from Putnam and contain
`.fmt' in the body.

Query: (xref=ding) and (from=larsi) and unplugged.

ding only .. from lars .. containing `unplugged' in the body

[...]

> The few times I've seen this is when, for instance, the message is one
> of those nasty replies that include the full message under a line
> reading: -----Original Message-----, and then a few headers from the
> previous message (To, From, Subject, etc.)  Are you sure, those last
> two messages don't include a line starting with: "Subject:" and having
> "give" on the same lines on the body of the message?

Can't be sure now.  I've changed the database extensively, so
recreating the condition isn't likely.  However, running that command
now finds only the ones in bbdb group that say give in subject line,
as it should, so maybe I did something wrong.

[...]

> > Results of trying to use the makedb approach with Kai's makedb.conf
> > [...] Unknown database 'mail'
>
> This sounds like you have a broken makedb script on your system, I
> installed the freeWais FreeBSD port, and it worked out of the box.  It
> was a nightmare to try it on Linux, though.

I'm using the makedb that was generated when installing
freeWAIS-2.2.13, also via a FreeBSD port.  The only makedb.conf I have
is the one from the comments in nnir.el
* Re: nnir/freeWAIS-sf
From: Kai Großjohann @ 2000-07-18 9:06 UTC
Cc: ding

On 17 Jul 2000, Harry Putnam wrote:

> Query: (xref=ding) and (from=Putnam) and .fmt
> (NOTE: syntax is incorrect)

Why?  The syntax looks good to me.

Well, there is a problem with the dot in `.fmt' -- freeWAIS-sf just
ignores non-letters, so this is the same as `fmt'.

kai
-- 
I like BOTH kinds of music.
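
To illustrate the query shape discussed here, a small Python sketch that assembles a fielded query and drops non-letters from free-text terms. The helpers `normalize_term` and `build_query` are hypothetical names, not part of nnir or freeWAIS-sf, and treating only ASCII letters as "letters" is an assumption; the thread does not say how freeWAIS-sf handles digits or accented characters.

```python
import re

def normalize_term(term: str) -> str:
    """Drop every character that is not an ASCII letter (assumed behavior)."""
    return re.sub(r"[^A-Za-z]", "", term)

def build_query(fields: dict[str, str], free_terms: list[str]) -> str:
    """Join parenthesized field=value pairs and free-text terms with `and'."""
    parts = [f"({name}={value})" for name, value in fields.items()]
    parts += [normalize_term(term) for term in free_terms]
    # Skip terms that became empty after normalization.
    return " and ".join(part for part in parts if part)

print(build_query({"xref": "ding", "from": "Putnam"}, [".fmt"]))
# → (xref=ding) and (from=Putnam) and fmt
```

Under that assumption `.fmt' and `fmt' produce the same query string, so a failing `.fmt' search cannot be blamed on the leading dot alone.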
* Re: nnir/freeWAIS-sf
From: Harry Putnam @ 2000-07-19 0:57 UTC
Cc: ding

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> On 17 Jul 2000, Harry Putnam wrote:
>
> > Query: (xref=ding) and (from=Putnam) and .fmt
> > (NOTE: syntax is incorrect)
>
> Why?  The syntax looks good to me.
>
> Well, there is a problem with the dot in `.fmt' -- freeWAIS-sf just
> ignores non-letters, so this is the same as `fmt'.

Seems to be something more here.  I just assumed it was my syntax
because I wasn't that sure of it.  It turns out to be the `fmt' spec
itself.

This query:
Query: (xref=ding2) and (from=Putnam)

Works and produces this message in *Messages*

Doing WAIS query ((query . (xref=ding2) and (from=Putnam)))...
Massaging waissearch output...
Massaging waissearch output...done
Retrieving newsgroup: nnir:((query . "(xref=ding2) and (from=Putnam)"))...
Fetching headers for nnir:((query . "(xref=ding2) and (from=Putnam)"))...
Fetching headers for nnir:((query . "(xref=ding2) and (from=Putnam)"))...done

This query:
Query: (xref=ding2) and (from=Putnam) and fmt

Does not work and produces this in *Messages*:

Doing WAIS query ((query . (xref=ding2) and (from=Putnam) and fmt))...
Massaging waissearch output...
Massaging waissearch output...done
Search produced empty results.
Couldn't request group: Search produced empty results.
(NOTE: Backtrace included at the end)

A query just for plain `fmt' by itself also fails:
Query: fmt

Produces this in *Messages*

Doing WAIS query ((query . fmt))...
Massaging waissearch output...
Massaging waissearch output...done
Search produced empty results.
Couldn't request group: Search produced empty results.

bsd > waissearch -d mail fmt
From the command line also fails

Search Response:
  NumberOfRecordsReturned: 1
   1: Score: 0, lines:7085 'Search produced no result. Here's the Catalog for database: mail'

Whereas this query works (so I guess I lucked out and got the right
syntax after all).  But it only finds messages with `file' in the
subject line, and there are some that fulfill the spec but contain
`file' in the body:

Query: (xref=ding2) and (from=Putnam) and file

Shows:
1 03-Jul+[ 51: -> ding@gnus.org ] [5: bbdb.ding2/2646] Re: Suggestion for "file" mail-backend (was Re: nov and procmail compatability?)
1 03-Jul+[ 36: -> ding@gnus.org ] [5: bbdb.ding2/2645] Re: Suggestion for "
[...]
(NOTE: all are from=Putnam but you don't see it because I have that
header squelched)

Data base was indexed with this command:
  waisindex -r -d mail -t fields ~/Mail

This awk script shows what is actually available to waissearch in the
freshly indexed data base:

awk '/X-From-Line: / {a=1}
     /^From:.*Putnam/ {from=$0;fr=1}
     /^Xref:.*ding2/ {xref=$0;xr=1}
     /\.fmt/ {fmt=$0;fm=1}
     /^\d/ && 4 == (a+fr+xr+fm){print FILENAME,from"\n"\
     FILENAME,xref"\n"\
     FILENAME,fmt"\n-- "}
     /^\d/ {a=fr=xr=fm=0}' ~/Mail/ding2/[0-9]*

(only 2 shown of 4 found)
/home/reader/Mail/ding2/2831 From: Harry Putnam <reader@newsguy.com>
/home/reader/Mail/ding2/2831 Xref: satellite.local.lan ding2:2831
/home/reader/Mail/ding2/2831 > Using your *.fmt file produces the same results as reported in
-- 
/home/reader/Mail/ding2/2839 From: Harry Putnam <reader@newsguy.com>
/home/reader/Mail/ding2/2839 Xref: satellite.local.lan ding2:2839
/home/reader/Mail/ding2/2839 `.fmt' in the body.
-- 
[...]

By changing the search spec from `fmt' to `file' in the awk script I
find a number of files fitting the spec.
Only two are displayed:

[ed -hp] (subject line is: Subject: Re: example queries for nnir)
/home/reader/Mail/ding2/2486 From: Harry Putnam <reader@newsguy.com>
/home/reader/Mail/ding2/2486 Xref: reader.ptw.com ding2:2486
/home/reader/Mail/ding2/2486 > Using the example format file given in nnir.el, there are four fields:
-- 
[ed -hp] (subject line is: Subject: Re: (provide 'nnmaildir))
/home/reader/Mail/ding2/2804 From: Harry Putnam <reader@newsguy.com>
/home/reader/Mail/ding2/2804 Xref: satellite.local.lan ding2:2804
/home/reader/Mail/ding2/2804 list of file names from the NOV cache, and compare. This would help
[...]

waissearch seems to only display a hit on file if it is in the subject
line even though the spec does not specify that.

I suspect it has something to do with the *.fmt file's liberal use of
the `BOTH' specifier...  So experimenting with that.

As a side note, I don't really understand why the `body' regexp, the
one beginning with:

region: /^$/

Needs a non-matching regexp.
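
The cross-check done with awk in this message can also be sketched in Python, which avoids the `/^\d/` pattern (`\d` is not a POSIX awk regexp escape, so its behavior may vary between awk implementations). This is a sketch under two assumptions: each file holds exactly one message, and headers end at the first empty line. Unlike the awk version it looks for `.fmt' in the body only and ignores the X-From-Line check. The function names are hypothetical.

```python
import re
from pathlib import Path

FROM_RE = re.compile(r"^From:.*Putnam")
XREF_RE = re.compile(r"^Xref:.*ding2")
BODY_RE = re.compile(r"\.fmt")

def matching_lines(text: str):
    """Return (from_line, xref_line, body_line) if all three match, else None."""
    # Headers and body are separated by the first empty line.
    headers, _, body = text.partition("\n\n")
    header_lines = headers.splitlines()
    from_line = next((l for l in header_lines if FROM_RE.match(l)), None)
    xref_line = next((l for l in header_lines if XREF_RE.match(l)), None)
    body_line = next((l for l in body.splitlines() if BODY_RE.search(l)), None)
    if from_line and xref_line and body_line:
        return from_line, xref_line, body_line
    return None

def scan(directory: str):
    """Yield (path, hit) for every numerically named message file that matches."""
    for path in sorted(Path(directory).expanduser().glob("[0-9]*")):
        if path.is_file():
            hit = matching_lines(path.read_text(errors="replace"))
            if hit:
                yield path, hit

# Made-up message modeled on the output shown above.
sample = (
    "From: Harry Putnam <reader@newsguy.com>\n"
    "Xref: satellite.local.lan ding2:2839\n"
    "Subject: Re: nnir/freeWAIS-sf\n"
    "\n"
    "`.fmt' in the body.\n"
)
print(matching_lines(sample))
```

Calling `scan("~/Mail/ding2")` would list the files that a body search for `fmt' restricted to `xref=ding2' and `from=Putnam' ought to return, which gives a baseline to compare waissearch against.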
* Re: nnir/freeWAIS-sf
From: Kai Großjohann @ 2000-07-20 14:34 UTC
Cc: ding

On 18 Jul 2000, Harry Putnam wrote:

> waissearch seems to only display a hit on file if it is in the subject
> line even though the spec does not specify that.
>
> I suspect it has something to do with the *.fmt file's liberal use of
> the `BOTH' specifier...  So experimenting with that.

Hm.  Right.  My database also fails to contain `fmt' in the dictionary
for the global field.  Hm.

I wish there was a debugging switch to waisindex where it told me what
it thought about the document.  This way, debugging would be much
easier.

I still think it's a problem with the indexing process.  I.e., once we
get the format file right, Bob will be our uncle.

> As a side note, I don't really understand why the `body' regexp, the
> one beginning with:
>
> region: /^$/ Needs a non-matching regexp.

Well, I wanted to make sure that all the remaining lines in each file
will be indexed in the body field, and choosing a non-matching regexp
is a sure way to have waisindex go through till the end of the file.

Maybe I need to say a few words about how waisindex works.  Here's
what it does: for each region defined in the fmt file, it goes through
the whole file to be indexed, line by line.  It starts indexing on the
first line that matches the start regexp.  It then reads the next line
and also adds it to the field, until it reads a line which matches the
end regexp (or reaches end of file).

This process is repeated for each region defined in the fmt file, and
for each file.  (I'm not sure if it goes through all files for the
first region, then goes through all files for the second region, or if
it goes through the first file several times for each region, then
goes through the second file several times.)

kai
-- 
I like BOTH kinds of music.
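
The region scan described in this message can be sketched in Python. This is a model of the description only, not of the waisindex source: it assumes that the line matching the end regexp is not part of the region, that the end regexp is not tested against the start line itself, and that a region may open again later in the same file. The function name `extract_region` and the sample message are made up.

```python
import re

def extract_region(lines, start_re, end_re):
    """Collect the lines that fall inside a region, per the description above."""
    collected, inside = [], False
    for line in lines:
        if inside and re.search(end_re, line):
            inside = False   # a line matching the end regexp closes the region
        if not inside and re.search(start_re, line):
            inside = True    # a line matching the start regexp opens it
        if inside:
            collected.append(line)
    return collected

# Regexps copied from the mail.fmt in this thread.
NEVER = r"^@this regex should never match@$"
message = [
    "From: Harry Putnam <reader@newsguy.com>",
    "Subject: Re: nnir/freeWAIS-sf",
    " (folded continuation)",
    "To: ding@gnus.org",
    "",
    "Body line one.",
    "Body line two.",
]

print(extract_region(message, r"^[sS]ubject:", r"^[^ \t]"))
# → ['Subject: Re: nnir/freeWAIS-sf', ' (folded continuation)']
print(extract_region(message, r"^$", NEVER))
# → ['', 'Body line one.', 'Body line two.']
```

The header regions stop at the next line that does not begin with whitespace, so folded continuation lines stay in the field, while the `/^$/` region with a never-matching end regexp runs from the first empty line to the end of the file.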
* Re: nnir/freeWAIS-sf
From: Harry Putnam @ 2000-07-20 18:13 UTC

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> On 18 Jul 2000, Harry Putnam wrote:
>
> > waissearch seems to only display a hit on file if it is in the subject
> > line even though the spec does not specify that.
> >
> > I suspect it has something to do with the *.fmt file's liberal use of
> > the `BOTH' specifier...  So experimenting with that.
>
> Hm.  Right.  My database also fails to contain `fmt' in the dictionary
> for the global field.  Hm.
>
> I wish there was a debugging switch to waisindex where it told me what
> it thought about the document.  This way, debugging would be much
> easier.
>
> I still think it's a problem with the indexing process.  Ie, once we
> get the format file right, Bob will be our uncle.

That sounds right too but current results are not very encouraging
here.  Have you noticed that there doesn't seem to be a way to do a
strictly body search?  You explained it as simply not putting data
from any field sources into GLOBAL, but if you set all field specs
like:
  SOUNDEX LOCAL TEXT LOCAL
and leave only the /^$/ spec as GLOBAL, wais caves in and drops core.

> > As a side note, I don't really understand why the `body' regexp, the
> > one beginning with:
> >
> > region: /^$/ Needs a non-matching regexp.
>
> Well, I wanted to make sure that all the remaining lines in each file
> will be indexed in the body field, and choosing a non-matching regexp
> is a sure way to have waisindex go through till the end of the file.

I understood the reasoning but thought there might be some regexp that
fit the end of a file.  Guess not.
* Re: nnir/freeWAIS-sf
From: Kai Großjohann @ 2000-07-21 17:31 UTC
Cc: ding

On 20 Jul 2000, Harry Putnam wrote:

> That sounds right too but current results are not very encouraging
> here.  Have you noticed that there doesn't seem to be a way to do a
> strictly body search?  You explained it as simply not putting data
> from any field sources into GLOBAL, but if you set all field specs
> like: SOUNDEX LOCAL TEXT LOCAL and leave only the /^$/ spec as
> GLOBAL wais caves in and drops core.

Arf.

Workaround: define a `body' field which gets /^$/ as the start regexp
and the non-matching regexp as the end regexp, and otherwise looks
like the Subject field.

Like this, maybe:

region: /^$/
        body "Message body" stemming TEXT BOTH
end: /^@this regex should never match@$/

Then, you can say `body=foo' to search in the body.

I have now added the above change to my fmt file, will try running
waisindex next.

kai
-- 
I like BOTH kinds of music.
* Re: nnir/freeWAIS-sf
From: Harry Putnam @ 2000-07-21 22:35 UTC

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> On 20 Jul 2000, Harry Putnam wrote:
>
> > That sounds right too but current results are not very encouraging
> > here.  Have you noticed that there doesn't seem to be a way to do a
> > strictly body search?  You explained it as simply not putting data
> > from any field sources into GLOBAL, but if you set all field specs
> > like: SOUNDEX LOCAL TEXT LOCAL and leave only the /^$/ spec as
> > GLOBAL wais caves in and drops core.
>
> Arf.
>
> Workaround: define a `body' field which gets /^$/ as the start regexp
> and the non-matching regexp as the end regexp, and otherwise looks
> like the Subject field.
>
> Like this, maybe:
>
> region: /^$/
>         body "Message body" stemming TEXT BOTH
> end: /^@this regex should never match@$/
>
> Then, you can say `body=foo' to search in the body.

It's not helping here..  Still can't get results in a body search.

Wasn't sure if you meant to use the above instead of the old GLOBAL
entry or to use them both, so I tried both ways.  Either way I get:

  Total word count for dictionary of field body is: 0

in indexing output.

bsd > waissearch -d mail body=\(resounding and silence\)

Search Response:
  NumberOfRecordsReturned: 2
  Code: S1, field unexists: body
   1: Score: 0, lines:13517 'Search produced no result. Here's the Catalog for database: mail'

From, To and subject are all that work for me.

Index command:
  waisindex -r -d mail -stem -t fields ~/Mail

mail.fmt:

record-sep: /^@this regex should never match@$/

# Searchable fields specification.

region: /^[sS]ubject:/ /^[sS]ubject: */
        subject "Subject header" stemming TEXT BOTH
end: /^[^ \t]/

region: /^([tT][oO]|[cC][cC]):/ /^([tT][oO]|[cC][cC]): */
        to "To and Cc headers" SOUNDEX LOCAL TEXT BOTH
end: /^[^ \t]/

region: /^[fF][rR][oO][mM]:/ /^[fF][rR][oO][mM]: */
        from "From header" SOUNDEX LOCAL TEXT BOTH
end: /^[^ \t]/

region: /^$/
        body "Message body" stemming TEXT BOTH
end: /^@this regex should never match@$/

region: /^$/
        stemming TEXT GLOBAL
end: /^@this regex should never match@$/

I guess you are already sure that these infinite regexps are palatable
to wais.
* Re: nnir/freeWAIS-sf
From: Kai Großjohann @ 2000-07-16 12:25 UTC
Cc: ding, Norbert Gövert, Ulrich Pfeifer

I have now found out something with respect to searching in `from' and
`to' fields.

(For Norbert & Uli: I'm indexing my mail with freeWAIS-sf, and
specifying `SOUNDEX BOTH' in the format file for the `to' and `from'
fields.)

Running `dictionary mail_field_from' tells me that the dictionary
contains a lot of soundex codes (which is the right thing, since we
are specifying soundex in the format file).  And indeed, searching for
the soundex codes works:

/----
| $ waissearch -d mail from=m230
|
| Search Response:
|   NumberOfRecordsReturned: 16
|    1: Score: 3471, lines: 55 '445 /home-local/grossjoh/Mail/auto/linux-utf8/'
|    2: Score: 3471, lines: 111 '491 /home-local/grossjoh/Mail/auto/linux-utf8/'
|    3: Score: 3098, lines: 79 '3416 /home-local/grossjoh/Mail/auto/dbworld/'
| [...]
\----

But searching for normal terms does _not_ work.  Apparently,
freeWAIS-sf is not applying soundex to the query.  Why could this be?

kai
-- 
I like BOTH kinds of music.
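
For reference, a Python sketch of the classic American Soundex coding, lowercased to resemble the `m230'-style dictionary entries. Whether freeWAIS-sf implements exactly this variant is an assumption; the sketch only shows the kind of encoding a plain query term would need before it could match a dictionary that stores soundex codes.

```python
# Letter-to-digit table of classic American Soundex.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        CODES[ch] = digit

def soundex(word: str) -> str:
    """Return the four-character Soundex code of `word`, in lower case."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue             # h and w do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code              # a vowel resets prev, so repeats count again
    return (first + "".join(digits) + "000")[:4]

print(soundex("Putnam"))  # → p355
print(soundex("Ronan"))   # → r550
print(soundex("Kai"))     # → k000
```

If the index holds entries of this form while the query string is looked up unencoded, a search such as `from=Putnam' would find nothing even though the corresponding code is in the dictionary, which is consistent with the behavior reported here.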
* Re: nnir/freeWAIS-sf
From: Harry Putnam @ 2000-07-16 16:17 UTC
Cc: ding, Norbert Gövert, Ulrich Pfeifer

NOTE: If you want the punch line first, skip straight to the section
below preceded by "*NOW THE GOOD PART!*"

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> I have now found out something with respect to searching in `from' and
> `to' fields.
>
> (For Norbert & Uli: I'm indexing my mail with freeWAIS-sf, and
> specifying `SOUNDEX BOTH' in the format file for the `to' and `from'
> fields.)

Above, are you referring to the example *.fmt file in nnir-1.57 as it
stands?

> Running `dictionary mail_field_from' tells me that the dictionary
> contains a lot of soundex codes (which is the right thing, since we
> are specifying soundex in the format file).  And indeed, searching for
> the soundex codes works:

Ditto here, if soundex codes look like:

  term    occurances    pointer
  m300    83886080      5687214
  m320    536870912     1810235
  [...]

> /----
> | $ waissearch -d mail from=m230
> |
> | Search Response:
> |   NumberOfRecordsReturned: 16
> |     1: Score: 3471, lines:  55 '445 /home-local/grossjoh/Mail/auto/linux-utf8/'
> |     2: Score: 3471, lines: 111 '491 /home-local/grossjoh/Mail/auto/linux-utf8/'
> |     3: Score: 3098, lines:  79 '3416 /home-local/grossjoh/Mail/auto/dbworld/'
> |     [...]
> \----
> But searching for normal terms does _not_ work.

I see nearly the opposite behavior: From and To searches find
nothing, but free text and Subject searches work.
Using your example *.fmt file and this indexing command:

  waisindex -r -d mail -stem -t fields ~/Mail

Free text search (this database contains my ding-list and bbdb-list
nnml directories):

  waissearch -d mail agent

  Search Response:
    NumberOfRecordsReturned: 6
      1: Score: 1546, lines: 229 '2591 /home/reader/Mail/ding2/'
      2: Score: 1434, lines:  60 '2592 /home/reader/Mail/ding2/'
      3: Score: 1402, lines:  83 '2783 /home/reader/Mail/ding2/'
      4: Score: 1378, lines:  82 '1964 /home/reader/Mail/ding2/'
      5: Score: 1349, lines:  99 '2611 /home/reader/Mail/ding2/'
      6: Score: 1123, lines:  55 '2514 /home/reader/Mail/ding2/'

Subject search:

  waissearch -d mail subject=give

  Search Response:
    NumberOfRecordsReturned: 6
      1: Score: 2484, lines:  87 '2771 /home/reader/Mail/ding2/'
      2: Score: 2222, lines: 153 '2791 /home/reader/Mail/ding2/'
      3: Score: 2222, lines:  69 '2793 /home/reader/Mail/ding2/'
      4: Score: 2222, lines: 160 '2 /home/reader/Mail/bbdb/'
      5: Score: 2222, lines:  76 '3 /home/reader/Mail/bbdb/'
      6: Score: 2222, lines: 174 '11 /home/reader/Mail/bbdb/'

From search:

  waissearch -d mail from=Ronan

  Search Response:
    NumberOfRecordsReturned: 1
      1: Score: 0, lines:3457 'Search produced no result. Here's the Catalog for database: mail'

However, by changing SOUNDEX BOTH to TEXT BOTH I find that the from
search then works.

From search with edited *.fmt file:

  waissearch -d mail from=Ronan

  Search Response:
    NumberOfRecordsReturned: 4
      1: Score: 2403, lines: 153 '2791 /home/reader/Mail/ding2/'
      2: Score: 2403, lines:  69 '2793 /home/reader/Mail/ding2/'
      3: Score: 2403, lines: 160 '2 /home/reader/Mail/bbdb/'
      4: Score: 2403, lines:  76 '3 /home/reader/Mail/bbdb/'

At first I thought BOTH meant LOCAL and GLOBAL, but I now think it
doesn't, because I found that if I set the field-specific parts to
LOCAL then a free text search fails and reports that it cannot find
the dataindex.
The last two lines of indexing output with the *.fmt fields set to
TEXT LOCAL tell the story:

  1731: 3481: Jul 16 08:37:55 2000: 100: Total word count for dictionary is: 0
  1731: 3482: Jul 16 08:37:55 2000: -1: error finding total_word_count in dictionary ./mail

No dictionary is built.

*NOW THE GOOD PART!*

Poring through the info file I found a passage (quoted below) that
finally might be an explanation of what is needed.  And to corroborate
this I tried setting `To' and `From' to SOUNDEX LOCAL TEXT BOTH like
this:

  to "To and Cc headers" SOUNDEX LOCAL TEXT BOTH

Telling the indexer to "put the word in the default and the 'to'[ed -hp]
category and its soundex code only in the `to'[ed -hp] category."

So BOTH here means .. the default (which is LOCAL) and the current
category.

And all types of searches now work....  Whoopee

  dictionary mail_field_from mail

is now more than twice its previous size and contains both SOUNDEX and
TEXT code.

From the INFO file:

     Consider the following example:

          region: /^AU: /
                  au "author names" SOUNDEX LOCAL TEXT BOTH
          end: /^[A-Z][A-Z]:/

     To the indexer this means: For all words starting with `AU: ' at
     the beginning of a line up to a line which starts with two capital
     letters followed by a colon and a blank, put the word in the
     default and the `au' category and its soundex code only in the
     `au' category.  Thus an author name can be found in the created
     database in the default category or the `au' category if the exact
     spelling is known.  If the name is misspelled, it might be found
     using the query `au=(soundex MISSPELLED-NAME)'.  *Note Sample
     Format::, *Note Query Syntax::.

Included working *.fmt file:

# Harry's rendition of Kai's format file for freeWAIS-sf for indexing mails.
# Each mail is in a file, much like the MH format.

# Document separator should never match -- each file is a document.
record-sep: /^@this regex should never match@$/

# Searchable fields specification.
region: /^[sS]ubject:/ /^[sS]ubject: */
        subject "Subject header" stemming TEXT BOTH
end: /^[^ \t]/

region: /^([tT][oO]|[cC][cC]):/ /^([tT][oO]|[cC][cC]): */
        to "To and Cc headers" SOUNDEX LOCAL TEXT BOTH
end: /^[^ \t]/

region: /^[fF][rR][oO][mM]:/ /^[fF][rR][oO][mM]: */
        from "From header" SOUNDEX LOCAL TEXT BOTH
end: /^[^ \t]/

region: /^$/
        stemming TEXT GLOBAL
end: /^@this regex should never match@$/
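To make the difference between `SOUNDEX BOTH' and `SOUNDEX LOCAL TEXT
BOTH' concrete, here is a minimal Python sketch of where terms would
end up.  It is a toy model built only from the info-file passage quoted
above, not freeWAIS-sf code: the names `index_region', `GLOBAL' and
`placeholder_code' are invented for this illustration, and
`placeholder_code' merely marks where a real soundex code would be
computed.

```python
from collections import defaultdict

GLOBAL = "*global*"  # name of the default category, chosen for this sketch only


def targets(scope, field):
    """Return the dictionaries a term is written to for a scope keyword."""
    return {"LOCAL": [field], "GLOBAL": [GLOBAL], "BOTH": [field, GLOBAL]}[scope]


def placeholder_code(word):
    # NOT real soundex: a visible marker standing in for a phonetic code.
    return "sx:" + word.lower()


def index_region(index, field, words, spec, coder=placeholder_code):
    """Index words under spec, a list of (type, scope) pairs.

    Example spec for `SOUNDEX LOCAL TEXT BOTH':
    [("SOUNDEX", "LOCAL"), ("TEXT", "BOTH")]
    """
    for word in words:
        for kind, scope in spec:
            term = coder(word) if kind == "SOUNDEX" else word.lower()
            for name in targets(scope, field):
                index[name].add(term)


# The original spec: only codes are stored, so a plain word cannot match.
old = defaultdict(set)
index_region(old, "from", ["Ronan"], [("SOUNDEX", "BOTH")])

# The edited spec: codes in the field only, plain words in field and default.
new = defaultdict(set)
index_region(new, "from", ["Ronan"], [("SOUNDEX", "LOCAL"), ("TEXT", "BOTH")])

print(sorted(old["from"]), sorted(old[GLOBAL]))
print(sorted(new["from"]), sorted(new[GLOBAL]))
```

In this model the `from' dictionary built with `SOUNDEX BOTH' holds no
plain words at all, which is consistent with `from=Ronan' finding
nothing until the spec was changed.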
* Re: nnir/freeWAIS-sf
From: Kai Großjohann @ 2000-07-16 21:43 UTC
Cc: ding, Norbert Gövert, Ulrich Pfeifer

On 16 Jul 2000, Harry Putnam wrote:

>> But searching for normal terms does _not_ work.
>
> I see nearly the opposite behavior: From and To searches find
> nothing, but free text and Subject searches work.

Note the strange search: from=m320
This searches for a soundex code, and works.  Searching for normal
words does NOT work.

I'm now trying the `SOUNDEX LOCAL TEXT BOTH' suggestion.  We'll see
what happens...

kai
-- 
I like BOTH kinds of music.
* Re: nnir/freeWAIS-sf
From: Harry Putnam @ 2000-07-16 22:22 UTC
Cc: Norbert Gövert, Ulrich Pfeifer

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> On 16 Jul 2000, Harry Putnam wrote:
>
> >> But searching for normal terms does _not_ work.
> >
> > I see nearly the opposite behavior: From and To searches find
> > nothing, but free text and Subject searches work.
>
> Note the strange search: from=m320
> This searches for a soundex code, and works.  Searching for normal

Yes, I saw that my statement about opposite should have said `nearly
identical'.

> words does NOT work.

Maybe because `normal' (i.e. TEXT) words have not been included in
that category (from) by saying TEXT BOTH.

> I'm now trying the `SOUNDEX LOCAL TEXT BOTH' suggestion.  We'll see
> what happens...

PS -- I'm now a happy camper with the edited mail.fmt and learned a
bit more about the making of *.fmt files.

My big problem now is that I'd like to go on and get SFgate working,
but it requires the bothersome Wais.pm.  My attempts to compile it
have failed miserably at `make test': literally every test fails and
the `make' process bails out.  There seems to be major confusion as
to where to put wais.h and libwais.a (from the freeWAIS-sf build).

I just ended up putting them in the top directory of Wais.pm and
editing the Makefile.PL to find them there.
The compilation seems to go OK, but when the tests come up it's all
downhill from there:

  bsd # make test

  PERL_DL_NONLAZY=1 /usr/bin/perl -Iblib/arch -Iblib/lib -I/usr/libdata/perl/5.00503/mach -I/usr/libdata/perl/5.00503 -e 'use Test::Harness qw(&runtests $verbose); $verbose=0; runtests @ARGV;' t/*.t

  t/a_preop...........ok

  t/basic.............Can't load 'blib/arch/auto/Wais/Wais.so' for module Wais: blib/arch/auto/Wais/Wais.so: Undefined symbol "SvPV_nolen" at /usr/libdata/perl/5.00503/DynaLoader.pm line 169.

Wais.so is a binary file so I couldn't really tell much about it.

  grep 'SvPV' /usr/libdata/perl/5.00503/DynaLoader.p = nothing
* Re: nnir/freeWAIS-sf
From: Kai Großjohann @ 2000-07-20 14:44 UTC
Cc: ding, Norbert Gövert, Ulrich Pfeifer

On 16 Jul 2000, Harry Putnam wrote:

> Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
>
>> Note the strange search: from=m320
>> This searches for a soundex code, and works.  Searching for normal
>
> Yes, I saw that my statement about opposite should have said `nearly
> identical'.

But the small difference is crucial.

>> words does NOT work.
>
> Maybe because `normal' (i.e. TEXT) words have not been included in
> that category (from) by saying TEXT BOTH.

When waisindex runs, it looks at the fmt file.  And if the fmt file
says to use soundex, it converts each word into its soundex code and
enters the soundex codes into the inverted index (the *.inv file).

So, when you use waissearch, it should also look at the fmt file, and
if the query searches in a field which was indexed using soundex, the
search term should also be converted into the soundex code, and then
waissearch should look for the soundex code in the index.

So, let's say the word is `Miller'.  When indexing, waisindex turns it
into the soundex code (M320, say), and puts that into the *.inv file.
And when you then search for Miller, waissearch converts that into
M320 and looks in the index for M320.

Different, but similar-sounding, words get the same soundex code.  For
example, Smith and Smithee and Smythe will get the same soundex code.
This way, searching for soundex codes will find similar-sounding
words.

> My big problem now is that I'd like to go on and get SFgate working,
> but it requires the bothersome Wais.pm.  My attempts to compile it
> have failed miserably at `make test': literally every test fails and
> the `make' process bails out.  There seems to be major confusion as
> to where to put wais.h and libwais.a (from the freeWAIS-sf build).
>
> I just ended up putting them in the top directory of Wais.pm and
> editing the Makefile.PL to find them there.
>
> The compilation seems to go OK, but when the tests come up it's all
> downhill from there:
>
> bsd # make test
>
> PERL_DL_NONLAZY=1 /usr/bin/perl -Iblib/arch -Iblib/lib
> -I/usr/libdata/perl/5.00503/mach -I/usr/libdata/perl/5.00503 -e
> 'use Test::Harness qw(&runtests $verbose); $verbose=0; runtests
> @ARGV;' t/*.t
>
> t/a_preop...........ok
>
> t/basic.............Can't load 'blib/arch/auto/Wais/Wais.so' for
> module Wais: blib/arch/auto/Wais/Wais.so: Undefined symbol
> "SvPV_nolen" at /usr/libdata/perl/5.00503/DynaLoader.pm line 169.
>
> Wais.so is a binary file so I couldn't really tell much about it.
>
> grep 'SvPV' /usr/libdata/perl/5.00503/DynaLoader.p = nothing

Hm.  I'm not sure where the problem is.

kai
-- 
I like BOTH kinds of music.
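The soundex idea described in the message above (similar-sounding
words collapse to one code, and both indexer and searcher must apply
the same conversion) can be sketched in a few lines of Python.  This
assumes the classic American Soundex rules; the thread does not
confirm which variant freeWAIS-sf implements, and the `M320' given for
Miller was only a "say" placeholder, so treat the codes below as
illustrative rather than as what the `dictionary' tool would print.

```python
# Classic American Soundex, written out for illustration only.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(word):
    """Return the four-character soundex code of word ('' if no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = CODES.get(first, "")
    for letter in letters[1:]:
        if letter in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(letter, "")  # vowels and y map to ""
        if code and code != previous:
            digits.append(code)
        previous = code
    return (first.upper() + "".join(digits) + "000")[:4]


for name in ["Smith", "Smithee", "Smythe", "Miller", "Ronan", "Rinan", "Rona", "Conan"]:
    print(name, soundex(name))
```

Under these rules Smith, Smithee and Smythe all map to S530, while
Ronan and Rinan share a code that differs from both Rona and Conan.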
* Re: nnir/freeWAIS-sf
From: Harry Putnam @ 2000-07-16 23:08 UTC
Cc: ding

Here is a `mail.fmt' including xref field.  With that defined it gives
a semi reliable way to search one or several groups.  Assuming you
don't move messages around too much once gnus has split them.

These examples are command line examples, hence the escaping.

  waissearch -d mail xref=ding and subject=give

Will find only messages in ding with the word `give' in the title.

  waissearch -d mail xref=\(ding or bbdb\) and subject=give

Finds messages in either group with the word `give' in the subject.

Putting that Xref field in there really opens up a lot of
possibilities.

The mail.fmt file:

# An edited version of Kai's format file for freeWAIS-sf for
# indexing mails.  Each mail is in a file, much like the MH format.

# Document separator should never match -- each file is a document.
record-sep: /^@this regex should never match@$/

# Searchable fields specification.

region: /^[sS]ubject:/ /^[sS]ubject: */
        subject "Subject header" stemming TEXT BOTH
end: /^[^ \t]/

region: /^([tT][oO]|[cC][cC]):/ /^([tT][oO]|[cC][cC]): */
        to "To and Cc headers" SOUNDEX LOCAL TEXT BOTH
end: /^[^ \t]/

region: /^Xref: / /^Xref */
        xref "Xref headers" SOUNDEX LOCAL TEXT BOTH
end: /^[^ \t]/

region: /^[fF][rR][oO][mM]:/ /^[fF][rR][oO][mM]: */
        from "From header" SOUNDEX LOCAL TEXT BOTH
end: /^[^ \t]/

region: /^$/
        stemming TEXT GLOBAL
end: /^@this regex should never match@$/
* Re: nnir/freeWAIS-sf
From: Kai Großjohann @ 2000-07-20 14:48 UTC
Cc: ding, Norbert Gövert, Ulrich Pfeifer

On 15 Jul 2000, Harry Putnam wrote:

> 2) Queries aimed at `from' or `to' fields fail, although queries to
>    `subject' field or global queries, work

I have now, finally, found the solution.  Stupid me :-/

It seems formulating fwsf queries isn't as simple as I thought it
would be...

  waissearch -d ~/.wais/mail 'from=(soundex zeimetz)'

This is the magic incantation.  Okay.  So, if you choose `SOUNDEX LOCAL
TEXT BOTH' for a field, then you can do the following:

  waissearch -d ~/.wais/mail 'from=(soundex zeimetz)'
      This searches for similar-sounding names.

  waissearch -d ~/.wais/mail 'from=zeimetz'
      This applies the normal term-search routines.

Does it now work for you if you use fmt specs like the following?

  region: /^([tT][oO]|[cC][cC]):/ /^([tT][oO]|[cC][cC]): */
          to "To and Cc headers" SOUNDEX LOCAL TEXT BOTH
  end: /^[^ \t]/

This way, you should be able to find names by saying `to=(soundex
NAME)' in the query.

kai
-- 
I like BOTH kinds of music.
* Re: nnir/freeWAIS-sf
From: Harry Putnam @ 2000-07-20 16:33 UTC

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

[...]

> It seems formulating fwsf queries isn't as simple as I thought it
> would be...
>
>   waissearch -d ~/.wais/mail 'from=(soundex zeimetz)'

Ahaa, now I remember seeing that in the info file too.

> This is the magic incantation.  Okay.  So, if you choose `SOUNDEX LOCAL
> TEXT BOTH' for a field, then you can do the following:
>
>   waissearch -d ~/.wais/mail 'from=(soundex zeimetz)'
>       This searches for similar-sounding names.
>
>   waissearch -d ~/.wais/mail 'from=zeimetz'
>       This applies the normal term-search routines.

OK, that all makes sense.

> Does it now work for you use fmt specs like the following:
>   region: /^([tT][oO]|[cC][cC]):/ /^([tT][oO]|[cC][cC]): */
>           to "To and Cc headers" SOUNDEX LOCAL TEXT BOTH
>   end: /^[^ \t]/

The above approach works (the database consists of only the ding2 and
bbdb groups):

  waissearch -d mail 'from=soundex Rinan'    total 18
  waissearch -d mail 'from=soundex Rona'     total 25
  waissearch -d mail 'from=soundex Ronan'    total 18
  waissearch -d mail 'from=soundex Conan'    total 0   <==he he

Something like what you'd expect from `soundex'.

Whereas the free text way:

  waissearch -d mail 'from=Ronan'    total 8
  waissearch -d mail 'from=Rona'     total 0
  [...]

Again, looks very much like what one would expect.  Glad that got
cleared up.

*BUT* now the body queries don't work.

Body query spec:

  region: /^$/
          stemming TEXT GLOBAL
  end: /^@this regex should never match@$/

So applying the same reasoning to that spec we have:

  waissearch -d mail 'global resounding and silence'

  bsd > waissearch -d mail 'global resounding and silence'

  Search Response:
    NumberOfRecordsReturned: 1
      1: Score: 2113, lines:  54 '2177 /home/reader/Mail/ding2/'

Seems to work...
but wait.. that message contains neither `resounding' nor `silence'.

  Message-ID: <vxku2fqhjin.fsf@mesquite.charcoal.com> (on ding)

What's worse,

  `grep -rl 'resounding.*silence' ~/Mail'

easily finds 5 that actually contain the strings.

  /home/reader/Mail/ding2/2771  <m2vgybe81z.fsf@reader.ptw.com>
  /home/reader/Mail/ding2/2790  <14703.1528.800308.229816@klortho.stepstone.ie>
  /home/reader/Mail/bbdb/460    <m2vgybe81z.fsf@reader.ptw.com>
  /home/reader/Mail/bbdb/463    <14703.1528.800308.229816@klortho.stepstone.ie>
  /home/reader/Mail/bbdb/472    <m2n1jkgpsp.fsf@reader.ptw.com>

It's looking more and more like freewais is just not a sturdy tool
like glimpse.  It has to be mollycoddled every inch of the way; every
phase is as painful as pulling teeth.  Then the end result is flaky
and not dependable.  It lacks precision in searching.  And on the
command line it fails to show the hits in some fashion.  Only full
documents.

Even someone as well grounded in computer science, syntax etc. as
yourself has problems with it.  One virtue it has is speed, but I'm
one who needs a sturdy tool I can abuse to some extent, and still have
it work.  Comes from a lifetime of heavy construction work maybe...
Tools that can't stand up to hard use and some abuse are quickly
discarded there.

I haven't `thrown in the towel' yet, but it's seeming like all the
effort I've put into learning and using FreeWAIS over the past few
weeks will not lead to a sturdy, well-oiled tool I can apply in many
situations.  Learning gnus itself did lead to a working and sturdy
tool that can do lots of different stuff well.

The little awk based search tool I made is much sturdier and can be
used on any unix-like platform.  It is excruciatingly slow, but
because it is fully regexp based it finds strings with great
precision.  It also very badly needs some handy way to insert the
search string in an easy one-step manner.

Sometimes `ranking' or `heuristics' of some kind aren't what is
needed.

I'm thinking of how to integrate that into gnus.
It can report the group and file name, message id or whatever.  I
wasn't able to see in nnir how the lisp code grabs that info from
glimpse or wais.  But surely if the tool can pass the article number,
filename, message ID, then gnus can assemble the hits.

Probably a faster similar tool can be fashioned from perl.  Wais is
proving to be an absolute pain in the *A..*.
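The awk tool itself is not shown in the thread, so the following is
only a hedged Python sketch of the same idea: walk a mail directory,
apply a regular expression line by line (as `grep -rl' does), and
report the names of the files that match.  The function name
`matching_files' and the tiny demo tree are invented for this example.

```python
import os
import re
import tempfile


def matching_files(root, pattern):
    """Return sorted paths under root having a line that matches pattern."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # latin-1 decodes any byte sequence, so odd mail encodings
                # cannot raise a decoding error here.
                with open(path, encoding="latin-1") as handle:
                    if any(regex.search(line) for line in handle):
                        hits.append(path)
            except OSError:
                continue  # unreadable file: skip it, as a sturdy tool should
    return sorted(hits)


# Demo on a throwaway directory laid out like nnml groups.
with tempfile.TemporaryDirectory() as root:
    samples = {
        "ding2/2771": "Subject: test\n\na resounding silence followed\n",
        "bbdb/2": "Subject: other\n\nnothing to see here\n",
    }
    for relative, text in samples.items():
        path = os.path.join(root, *relative.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="latin-1") as handle:
            handle.write(text)
    demo_hits = [os.path.relpath(p, root).replace(os.sep, "/")
                 for p in matching_files(root, r"resounding.*silence")]

print(demo_hits)
```

Like the awk approach it reads every file on every query, so it trades
speed for exact regexp matching; there is no index and no ranking.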
* Re: nnir/freeWAIS-sf
From: Kai Großjohann @ 2000-07-21 17:27 UTC
Cc: ding

On 20 Jul 2000, Harry Putnam wrote:

> bsd > waissearch -d mail 'global resounding and silence'
>
> Search Response:
>   NumberOfRecordsReturned: 1
>     1: Score: 2113, lines:  54 '2177 /home/reader/Mail/ding2/'
>
> Seems to work... but wait.. that message contains neither
> `resounding' nor `silence'.

Whee.  Hm.  Maybe fwsf implements `and' in a fuzzy way.  This is
useful for people who issue queries like `term1 and term2 and ... and
term10'.  If there are no documents with all ten terms, chances are
that people will be happy with a document containing only nine of
them.

I'm not sure about this one, though.

> Message-ID: <vxku2fqhjin.fsf@mesquite.charcoal.com> (on ding)
>
> What's worse,
>
> `grep -rl 'resounding.*silence' ~/Mail' easily finds 5 that actually
> contain the strings.
>
> /home/reader/Mail/ding2/2771 <m2vgybe81z.fsf@reader.ptw.com>
> /home/reader/Mail/ding2/2790
> <14703.1528.800308.229816@klortho.stepstone.ie>
> /home/reader/Mail/bbdb/460 <m2vgybe81z.fsf@reader.ptw.com>
> /home/reader/Mail/bbdb/463
> <14703.1528.800308.229816@klortho.stepstone.ie>
> /home/reader/Mail/bbdb/472 <m2n1jkgpsp.fsf@reader.ptw.com>
>
> It's looking more and more like freewais is just not a sturdy tool
> like glimpse.  It has to be mollycoddled every inch of the way; every
> phase is as painful as pulling teeth.  Then the end result is flaky
> and not dependable.  It lacks precision in searching.  And on the
> command line it fails to show the hits in some fashion.  Only full
> documents.

Yes, it appears so.  FWIW, I'm quite interested in reading all this.
From an Information Retrieval point of view, fwsf is doing the right
things.  Yet it is obviously not easy to use at all!  Quite amazing.
But thanks a lot for persevering, this sure helps me to learn things,
and I can only hope that my feeble attempts at getting some of this
into DesIRe (the successor of fwsf) will bear some fruits.

> The little awk based search tool I made is much sturdier and can be
> used on any unix-like platform.  It is excruciatingly slow, but
> because it is fully regexp based it finds strings with great
> precision.  It also very badly needs some handy way to insert the
> search string in an easy one-step manner.

Maybe it could be integrated into nnir.el.  Hm.  Do you think that in
principle the idea of producing a summary buffer containing the search
results is a good idea?  If so, it might be worth it to try to
integrate the two.

Basically, nnir.el needs a list of article identifiers as a result.
The article identifier needs to contain the group name (in some form)
and the article number.  So if your tool just spits out the file
names, this should be sufficient for searching nnml groups.

> Sometimes `ranking' or `heuristics' of some kind aren't what is
> needed.

:-)

> I'm thinking of how to integrate that into gnus.  It can report the
> group and file name, message id or whatever.  I wasn't able to see
> in nnir how the lisp code grabs that info from glimpse or wais.  But
> surely if the tool can pass the article number, filename, message
> ID, then gnus can assemble the hits.

Yes.  Hm.  You may wish to have a look at the nnir-run-glimpse
function.  This function contains two parts.  The first part invokes
glimpse with the right options.  The second part expects glimpse to
produce a list of file names, which is then massaged in an appropriate
way.

It seems that you can reuse the second part (if your tool just prints
file names), but have to change the first part a bit.
When you have written your nnir-run-harrys-tool function, you can hook
it into nnir.el by adding an entry into nnir-engines, like this:

  (add-to-list 'nnir-engines
               '(harrys-tool nnir-run-harrys-tool nil))

And then you (setq nnir-search-engine 'harrys-tool), and that's it!

(You might need a couple of variables, for example a variable for your
home dir, so that you can cut off the right prefix from the file
names.  Can you understand the code in nnir-run-glimpse that does
this?)

kai
-- 
I like BOTH kinds of music.
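The "massaging" step described above, turning a file name printed by
a search tool into a group name plus an article number after cutting
off a prefix, can be illustrated in Python.  This is not the code from
nnir-run-glimpse: the function name `article_id' is made up, and
mapping nested directories to dotted group names is an assumption
about how nnml lays out groups.

```python
def article_id(path, prefix):
    """Map an nnml-style file path to (group, article number).

    Returns None for paths outside prefix or files whose name is not an
    article number (for example overview files).
    """
    if not path.startswith(prefix):
        return None
    rest = path[len(prefix):].strip("/")
    group_dir, _, number = rest.rpartition("/")
    if not group_dir or not number.isdigit():
        return None
    # Assumption for this sketch: nested directories become dotted groups.
    return (group_dir.replace("/", "."), int(number))


paths = [
    "/home/reader/Mail/ding2/2771",
    "/home/reader/Mail/bbdb/460",
    "/home/reader/Mail/bbdb/.overview",
    "/tmp/elsewhere/12",
]
for path in paths:
    print(path, "->", article_id(path, "/home/reader/Mail/"))
```

The prefix argument plays the role of the "variable for your home dir"
mentioned above; everything after it is treated as group directory
plus article file.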
* Re: nnir/freeWAIS-sf
From: Harry Putnam @ 2000-07-21 22:04 UTC

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> On 20 Jul 2000, Harry Putnam wrote:
>
> > bsd > waissearch -d mail 'global resounding and silence'
> >
> > Search Response:
> >   NumberOfRecordsReturned: 1
> >     1: Score: 2113, lines:  54 '2177 /home/reader/Mail/ding2/'
> >
> > Seems to work... but wait.. that message contains neither
> > `resounding' nor `silence'.
>
> Whee.  Hm.  Maybe fwsf implements `and' in a fuzzy way.  This is
> useful for people who issue queries like `term1 and term2 and ... and
> term10'.  If there are no documents with all ten terms, chances are
> that people will be happy with a document containing only nine of
> them.

It turns out it was finding `global', which, looking at the query, is
legal enough.  I should have had an `=' in there.  But with either of
these:

  waissearch -d mail global='resounding and silence'
  waissearch -d mail resounding and silence

No hits.

What exactly does `GLOBAL' mean and how is that index accessed?
Can one make a `free text' query using the nnir example *.fmt?  It
appears not.  Which index holds `GLOBAL' data?

[...] snipped tips on integrating tools into nnir ... Thanks

> Maybe it could be integrated into nnir.el.  Hm.  Do you think that in
> principle the idea of producing a summary buffer containing the search
> results is a good idea?  If so, it might be worth it to try to
> integrate the two.

Do you mean a buffer containing the string matches, not full messages?

What would really be cool would be having both available.  The
ephemeral group assembled and a buffer similar to the emacs M-x grep
buffer with hypertext that takes one to the exact hit.
Since all gnus needs is the file names to generate the group, maybe
glimpse could be run without the `-l' flag and let gnus snatch them
out by regexp, allowing the other output to be available for such a
buffer.

[...]

> (You might need a couple of variables, for example a variable for your
> home dir, so that you can cut off the right prefix from the file
> names.  Can you understand the code in nnir-run-glimpse that does this?)

Thanks for the tips on this, but in truth it's not likely I'll attempt
it any time too soon.  Fact is I'm nowhere near competent in any
programming language.. let alone lisp.

The only thing I can say I know even a small amount about is `awk',
and it's not a real programming language.
* Re: nnir/freeWAIS-sf
From: Kai Großjohann @ 2000-07-21 22:34 UTC
Cc: ding

On 21 Jul 2000, Harry Putnam wrote:

> What exactly does `GLOBAL' mean and how is that index accessed?
> Can one make a `free text' query using the nnir example *.fmt?  It
> appears not.  Which index holds `GLOBAL' data?

If you say `foo=xyzzy', then the foo field is searched for xyzzy.
If you say `xyzzy', then the global field is searched.

Parens can be used for grouping, so that `foo=a b' searches for `a' in
the foo field and for `b' in the global field, but `foo=(a b)'
searches for a in the foo field and for b in the foo field.  (The last
query is the same as `foo=a foo=b'.)

(I said `and', but the actual operator used is `or'.  This means, it's
sufficient if one condition is true for a document to come out.  But
the documents that fulfill both conditions come out first in the
ranking list, they get a higher score.  nnir displays the score in the
subject line in square brackets.)

In the *.fmt file, the two regexes after `region:' and `end:' define a
region in each document.  And if the field definition says `GLOBAL' or
`BOTH', the terms found in the document region are put in the global
field.  If the field definition says `LOCAL', the terms from that
document region are NOT put in the global field.

> Do you mean a buffer containing the string matches, not full
> messages?

nnir.el only needs the file names.  Not the matching lines, not the
whole message.  Just the file names.

> What would really be cool would be having both available.  The
> ephemeral group assembled and a buffer similar to the emacs M-x grep
> buffer with hypertext that takes one to the exact hit.

Ah, yes.  It's not clear how to implement this, but it would be a cool
feature.  SFgate provides what's known as `search term highlighting'.
This means if you issue a query `foo bar' then all occurrences of
`foo' and of `bar' will be red, or bold, or blinking, or whatever.

> Since all gnus needs is the file names to generate the group, maybe
> glimpse could be run without the `-l' flag and let gnus snatch them
> out by regexp, allowing the other output to be available for such a
> buffer.

That's what (setq nnir-search-engine 'glimpse) does.  See the
nnir-run-glimpse function.

> Thanks for the tips on this, but in truth it's not likely I'll attempt
> it any time too soon.  Fact is I'm nowhere near competent in any
> programming language.. let alone lisp.
>
> The only thing I can say I know even a small amount about is `awk',
> and it's not a real programming language.

From a theoretical point of view, awk is just as much a `real'
programming language as C or Lisp or whatever.  But awk allows you a
slow start into programming.  I'm sure that before you know it you'll
be a programmer.

kai
-- 
I like BOTH kinds of music.
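The query rules explained above (`field=term' searches a field, a bare
term searches the global field, parens group terms under one field,
and conditions are OR-ed with better matches ranked first) can be
modelled in a short Python sketch.  It is a toy: the names `parse' and
`search' are invented, scores are simply the number of satisfied
conditions rather than fwsf's weighted scores, and operators such as
`and' or `soundex' are not handled.

```python
import re

GLOBAL = "global"  # label for the default field, chosen for this sketch

# field=(a b ...)  |  field=term  |  bare term
TOKEN = re.compile(r"(\w+)=\(([^)]*)\)|(\w+)=(\w+)|(\w+)")


def parse(query):
    """Turn a query string into a list of (field, term) conditions."""
    conditions = []
    for grouped_field, group, field, term, bare in TOKEN.findall(query):
        if grouped_field:
            conditions += [(grouped_field, t.lower()) for t in group.split()]
        elif field:
            conditions.append((field, term.lower()))
        else:
            conditions.append((GLOBAL, bare.lower()))
    return conditions


def search(docs, query):
    """Return (name, score) pairs: OR semantics, best matches first."""
    conditions = parse(query)
    hits = []
    for name, fields in docs.items():
        score = sum(1 for field, term in conditions
                    if term in fields.get(field, set()))
        if score:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


docs = {
    "ding2/2771": {"from": {"ronan"}, GLOBAL: {"ronan", "give"}},
    "bbdb/2": {"from": {"kai"}, GLOBAL: {"kai", "give"}},
    "bbdb/3": {"from": {"kai"}, GLOBAL: {"kai"}},
}
print(parse("foo=a b"))
print(parse("foo=(a b)"))
print(search(docs, "from=ronan give"))
```

Here the document satisfying both conditions is listed first and the
one satisfying only the global condition still appears, which is the
"or with ranking" behaviour described in the message.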
* Re: nnir/freeWAIS-sf
From: Harry Putnam @ 2000-07-21 23:12 UTC

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> On 21 Jul 2000, Harry Putnam wrote:
>
> > What exactly does `GLOBAL' mean and how is that index accessed?
> > Can one make a `free text' query using the nnir example *.fmt?  It
> > appears not.  Which index holds `GLOBAL' data?
>
> If you say `foo=xyzzy', then the foo field is searched for xyzzy.
> If you say `xyzzy', then the global field is searched.

[...]

> > What would really be cool would be having both available.  The
> > ephemeral group assembled and a buffer similar to the emacs M-x grep
> > buffer with hypertext that takes one to the exact hit.
>
> Ah, yes.  It's not clear how to implement this, but it would be a cool
> feature.  SFgate provides what's known as `search term highlighting'.
> This means if you issue a query `foo bar' then all occurrences of
> `foo' and of `bar' will be red, or bold, or blinking, or whatever.
>
> > Since all gnus needs is the file names to generate the group, maybe
> > glimpse could be run without the `-l' flag and let gnus snatch them
> > out by regexp, allowing the other output to be available for such a
> > buffer.
>
> That's what (setq nnir-search-engine 'glimpse) does.  See the
> nnir-run-glimpse function.

No, it runs glimpse *with* the `-l' flag.  I'm saying if glimpse was
run *without* that flag then the string match data would be available
to be snatched into a buffer somehow.

Have you seen Uli's CPAN-WAIT, a wais server app for the .cpan
interface?  You type `perl -MCPAN -e shell' and then, at the prompt,
`wq QUERY' or `wq des=QUERY' to search www.perl.com for packages.

Uli's Wais part seems good, but the cpan stuff seems out of date and
fairly brain dead.
* Re: nnir/freeWAIS-sf
From: Kai Großjohann @ 2000-07-22 11:59 UTC
Cc: ding

On 21 Jul 2000, Harry Putnam wrote:

> No, it runs glimpse *with* the `-l' flag.  I'm saying if glimpse was
> run *without* that flag then the string match data would be
> available to be snatched into a buffer somehow.

Sorry.  I misread your sentence.  Yes, if glimpse was run without the
`-l' flag, then the matching line would be available, but how to
display it?  The idea of nnir.el is to display a summary buffer of the
query results, and I thought the subject header is the obvious thing
to display there.  Hm.  But maybe we could display the matching line
instead of the subject header.

kai
-- 
I like BOTH kinds of music.
* Re: nnir/freeWAIS-sf
From: Harry Putnam @ 2000-07-22 13:40 UTC

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> On 21 Jul 2000, Harry Putnam wrote:
>
> > No, it runs glimpse *with* the `-l' flag, I'm saying if glimpse was
> > run *without* that flag then the string match data would be
> > available to be snatched into a buffer somehow.
>
> Sorry.  I misread your sentence.  Yes, if glimpse was run without the
> `-l' flag, then the matching line would be available, but how to
> display it?  The idea of nnir.el is to display a summary buffer of the
> query results, and I thought the subject header is the obvious thing
> to display there.  Hm.  But maybe we could display the matching line
> instead of the subject header.

Er no, that probably wouldn't be appropriate.  I was thinking a
separate buffer like some of the `<space>*' buffers, like ` *nntpd' or
` *gnus article copy'.

Still harping on a way to have both a summary buffer as we now have
*plus* a buffer containing the string match hits.

Probably too complicated.  Especially if trying to have that extra
buffer be similar to `M-x grep' buffers with the added hypertext.
However, even without the hypertext, having those line matches
available would be handy at times.
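The "both buffers" idea in this message can be sketched outside Emacs. The Python below assumes the search tool, when run without `-l', prints grep-style records of the form `FILE: matching line'; that output shape is an assumption made here, not something confirmed in the thread. The sample paths and the names `parse_match_output` and `render_hits` are hypothetical. The keys of the result are the file names a summary would be built from, and the values are the match lines that could fill a second, grep-like buffer.

```python
from collections import OrderedDict


def parse_match_output(output):
    """Group 'FILE: matching line' records by file, in first-seen order."""
    hits = OrderedDict()
    for raw in output.splitlines():
        # Split on the first colon only; the matched text may contain colons.
        path, sep, text = raw.partition(":")
        if not sep or not path.strip():
            continue  # skip lines that do not look like 'FILE: text'
        hits.setdefault(path.strip(), []).append(text.strip())
    return hits


def render_hits(hits):
    """Format the grouped hits as plain 'FILE: line' text, one hit per line."""
    lines = []
    for path, matches in hits.items():
        for text in matches:
            lines.append("%s: %s" % (path, text))
    return "\n".join(lines)


# Hypothetical sample output; real paths and lines will differ.
sample = (
    "/home/reader/Mail/ding/12: From: Kai Grossjohann\n"
    "/home/reader/Mail/ding/12: Subject: Re: nnir\n"
    "/home/reader/Mail/bbdb/7: Subject: nnir and bbdb\n"
)
hits = parse_match_output(sample)
print(list(hits))         # file names, for the summary side
print(render_hits(hits))  # match lines, for the extra buffer
```

A file name that itself contains a colon would be split in the wrong place by this sketch, so a real implementation would need a stricter record format.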