From: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
Newsgroups: gmane.emacs.gnus.general
Subject: Re: nnir/freeWAIS-sf
Date: Thu, 20 Jul 2000 16:34:39 +0200
References: <86g0pb4646.fsf@beta.fciencias.unam.mx> <861z0sx3uh.fsf@beta.fciencias.unam.mx>
Cc: ding@gnus.org
To: Harry Putnam
In-Reply-To: Harry Putnam's message of "18 Jul 2000 17:57:00 -0700"
User-Agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/21.0.90

On 18 Jul 2000, Harry Putnam wrote:

> waissearch seems to only display a hit on a file if it is in the subject
> line even though the spec does not specify that.
>
> I suspect it has something to do with the *.fmt file's liberal use of
> the `BOTH' specifier... So experimenting with that.

Hm. Right. My database also fails to contain `fmt' in the dictionary
for the global field.

Hm. I wish waisindex had a debugging switch that made it report what it
thought of each document. That would make debugging much easier.

I still think it's a problem with the indexing process. I.e., once we
get the format file right, Bob will be our uncle.

> As a side note, I don't really understand why the `body' regexp, the
> one beginning with:
>
>   region: /^$/
>
> needs a non-matching regexp.

Well, I wanted to make sure that all the remaining lines in each file
would be indexed in the body field, and choosing an end regexp that never
matches is a sure way to make waisindex keep going until the end of the
file.

Maybe I need to say a few words about how waisindex works. Here's what
it does: for each region defined in the fmt file, it goes through the
whole file to be indexed, line by line. It starts indexing on the first
line that matches the start regexp. It then reads the next line and also
adds it to the field, and keeps doing so until it reads a line that
matches the end regexp (or reaches the end of the file).
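A rough Python model of that per-region scan may make the role of the non-matching end regexp clearer. It is only a sketch under stated assumptions, not waisindex's actual code: the names `Region`, `scan_region`, `index_file` and `NEVER` are hypothetical; the line matching the start regexp is assumed to be indexed while the line matching the end regexp is assumed not to be; only the first occurrence of each region per file is considered; and `(?!)` is merely a Python regexp that can never match, standing in for whatever non-matching regexp the fmt file uses.

```python
import re
from typing import Dict, List


class Region:
    """Hypothetical stand-in for one region of a .fmt file:
    a field name plus a start regexp and an end regexp."""

    def __init__(self, field: str, start: str, end: str) -> None:
        self.field = field
        self.start = re.compile(start)
        self.end = re.compile(end)


# A Python regexp that can never match (an empty negative lookahead).
NEVER = r"(?!)"


def scan_region(lines: List[str], region: Region) -> List[str]:
    """Walk the file line by line and return what one region would index.

    Assumed: the line matching the start regexp is indexed, the line
    matching the end regexp is not, and only the first match counts.
    """
    collected: List[str] = []
    inside = False
    for line in lines:
        if not inside:
            # Skip lines until the start regexp matches.
            if region.start.search(line):
                inside = True
                collected.append(line)
            continue
        # Stop at the first later line matching the end regexp...
        if region.end.search(line):
            break
        collected.append(line)
    # ...or simply run off the end of the file.
    return collected


def index_file(lines: List[str], regions: List[Region]) -> Dict[str, List[str]]:
    """Repeat the scan once per region. In this sketch the regions are
    independent, so the file/region loop order does not change results."""
    return {r.field: scan_region(lines, r) for r in regions}


message = [
    "From: someone@example.org",
    "Subject: test",
    "",
    "para one",
    "",
    "para two",
]

regions = [
    Region("subject", r"^Subject:", r"^$"),  # hypothetical header region
    Region("body", r"^$", NEVER),            # never-matching end: runs to EOF
]
fields = index_file(message, regions)
# fields["body"] holds both paragraphs, blank lines included.

# With /^$/ as the end regexp too, the body would stop at the first
# blank line after the start, and "para two" would never be indexed.
short_body = scan_region(message, Region("body", r"^$", r"^$"))
```

In this model the body region starts at the blank line separating headers from text; only an end regexp that never matches lets it continue past later blank lines to the end of the file.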
This process is repeated for each region defined in the fmt file, and for
each file. (I'm not sure whether it goes through all files for the first
region, then all files for the second region, and so on, or whether it
goes through the first file once per region, then the second file once
per region, and so on.)

kai
-- 
I like BOTH kinds of music.