From: Harry Putnam
Newsgroups: gmane.emacs.gnus.general
Subject: Seeking suggestions for pet project
Date: 07 Oct 2000 07:09:47 -0700
Original-To: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
Cc: ding@gnus.org
In-Reply-To: Kai.Grossjohann@CS.Uni-Dortmund.DE's message of "03 Oct 2000 15:53:26 +0200"

[cc'ed to ding list]

Kai,

I've written a shell/awk script search engine that one might call a targeted multi-grep type of search. As you know, I'm not very experienced in programming, so it is quite unsophisticated. What sophistication it has is mainly due to help from posters on comp.lang.awk.

With your knowledge of `information retrieval' you'll probably think this flies in the face of current IR wisdom. Even so, I find this approach more exacting, and it produces better results than the other techniques I've tried. It does require some familiarity with regexps, but not necessarily thorough knowledge.

The code is aimed at `one message per file' mail or `usenet' messages, so it works well against nnml or nntp messages.

Currently the code is too cumbersome and inflexible for popular usage, so I am planning two major revamps.

Currently the user is required to give three RE and a filename on the command line, like this (line wrapped for clarity):

    search.sh "^From: someone" "^Newsgroups: comp\.lang\.awk" \
              "if?\\\(" /home/reader/news-archive

That might find a bunch of examples of author `someone' using `if' clauses in awk.

$1, $2 and $3 are used to hardwire the RE and nail down their targets. $4 is used to nail down the files to search. All three RE must be present to produce a hit. The script expects the first two to be aimed at the headers and will only find them there. The third is expected to be in the body and will only be found there.

The script is a small state machine that knows where it is in a message and the state of hits on the RE so far.
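
As a rough illustration of the kind of state machine just described, here is a minimal sketch. It is not the actual search.sh. The function name `search_messages', the environment variable names, and the grep-style `file:line:text' output are assumptions made for the example, and folded (continuation) header lines are not handled.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the original search.sh.
# usage: search_messages HEADER_RE1 HEADER_RE2 BODY_RE FILE...
# Assumes one message per file, headers ended by the first blank line.
search_messages() {
    hdr1=$1 hdr2=$2 body=$3
    shift 3
    # The patterns travel through the environment so that awk does not
    # reinterpret backslashes, as it would for -v assignments.
    HDR1="$hdr1" HDR2="$hdr2" BODY="$body" awk '
    # Print the saved body hits only if both header patterns were seen.
    function report() {
        if (seen1 && seen2) printf "%s", hits
    }
    # First line of each file: flush the previous message, reset state.
    FNR == 1 {
        if (NR > 1) report()
        in_body = 0; seen1 = 0; seen2 = 0; hits = ""
    }
    # The first blank line switches from header state to body state.
    !in_body && /^$/ { in_body = 1; next }
    # Header state: only the two header patterns are tested.
    !in_body {
        if ($0 ~ ENVIRON["HDR1"]) seen1 = 1
        if ($0 ~ ENVIRON["HDR2"]) seen2 = 1
        next
    }
    # Body state: only the body pattern is tested; hits are saved
    # as file:line:text until the header result is known.
    $0 ~ ENVIRON["BODY"] { hits = hits FILENAME ":" FNR ":" $0 "\n" }
    # Flush the last message.
    END { report() }
    ' "$@"
}
```

In this sketch a call might look like `search_messages '^From: someone' '^Newsgroups: comp\.lang\.awk' 'if *\(' msg1 msg2'. Because the patterns reach awk through ENVIRON, a single backslash is enough to escape a character in the RE.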

I've stripped the lengthy documentation comments from the code below, but for further `in depth' commented documentation please go here:

http://www.ptw.com/~reader/exp/awk-search.html

But be warned that I have made multiple revisions to the script, and the comments may be inaccurate or wrong in places, since I haven't cleaned that part up thoroughly yet.

The procedure described can be tedious, so I am working on revisions that will query the user for `Header RE' and `Body RE' and will work with any combination of up to two header RE and two body RE. That includes no header RE but one body RE, one header RE but no body RE, and anything in between. So the command line would only be `search.sh filenames' and the script would query the user for the rest.

If you have time to try this script out, you'll see that the output has all the ingredients necessary for gnus to display the results in an nnir buffer. Once I get the script cleaned up and revamped, I'll probably want to add it as an nnir engine.

In addition to having gnus display the found messages, I'd really like to figure out how to include the grep style hits (the single lines containing the RE) in a separate buffer, in the style of `M-x occur' or `M-x grep' output. My feeling is that often the single line is all that's needed. So having that information in a separate buffer, with hypertext links to the files, would be a very nice addition, I think.
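
The planned revision, where any combination of up to two header RE and two body RE is allowed, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the revised script: the function name `search_flex' and the slot names H1, H2, B1 and B2 are made up for the example, an empty string stands for `no RE in this slot', and the interactive prompting is left out so that the patterns are simply passed as arguments. It prints the name of each matching file.

```shell
#!/bin/sh
# Hypothetical sketch of the planned "any combination" matching.
# usage: search_flex HEADER_RE1 HEADER_RE2 BODY_RE1 BODY_RE2 FILE...
# An empty string means "no pattern in this slot".
search_flex() {
    h1=$1 h2=$2 b1=$3 b2=$4
    shift 4
    H1="$h1" H2="$h2" B1="$b1" B2="$b2" awk '
    BEGIN { n = split("H1 H2 B1 B2", slot, " ") }
    # True when every slot is either empty or has been matched.
    function satisfied(   i) {
        for (i = 1; i <= n; i++) if (!ok[slot[i]]) return 0
        return 1
    }
    # Mark a slot as matched if the current line matches its pattern.
    function check(key) {
        if (!ok[key] && $0 ~ ENVIRON[key]) ok[key] = 1
    }
    # First line of each file: report the previous message, reset state.
    FNR == 1 {
        if (NR > 1 && satisfied()) print name
        name = FILENAME; in_body = 0
        # An empty pattern counts as already satisfied.
        for (i = 1; i <= n; i++) ok[slot[i]] = (ENVIRON[slot[i]] == "")
    }
    # The first blank line ends the headers.
    !in_body && /^$/ { in_body = 1; next }
    # Header patterns are only tried in the headers ...
    !in_body { check("H1"); check("H2"); next }
    # ... and body patterns only in the body.
    { check("B1"); check("B2") }
    END { if (NR > 0 && satisfied()) print name }
    ' "$@"
}
```

A prompting front end could fill the four slots with `read' before calling the function. A list of file names is used as output here only because it is the simplest thing for another program to consume.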