From: jpranav@cisco.com (Pranav K. Tiwari)
Newsgroups: gmane.emacs.gnus.user
Subject: Re: nnml article filenames
Date: Thu, 20 Apr 2006 14:29:16 +0530
Organization: Cisco Systems Inc.
User-Agent: Gnus/5.110003 (No Gnus v0.3) Emacs/21.3 (windows-nt)

jpranav@cisco.com (Pranav K. Tiwari) writes:

> Steve Youngs writes:
>
>> * Pranav K Tiwari writes:
>>
>> > Steve Youngs writes:
>> >> * Pranav K Tiwari writes:
>> >>
>> >> > To allow desktop search programs to go through nnml articles, I would
>> >> > like to give an extension like .xyz, and tell these programs to
>> >> > treat these files like email.
>> >>
>> >> I think this is the wrong approach. Instead of modifying the
>> >> filenames to suit the search program, find a way to make the search
>> >> program work properly.
>> >>
>> >> It's really not that difficult, see...
>> >>
>> >> $ find -type f -regex '^.*[0-9]+$'
>> >>
>>
>> > The question is not about 'finding' these files, but about
>> > associating a 'type' with the file.
>>
>> But if you can find them, there's really no point in associating a
>> "type" to them.
>>
>> $ find -type f -regex '^.*[0-9]+$' | \
>> xargs some_app_needing_mail_files_as_input
>>
>> > Most indexing programs (google/yahoo/microsoft desktop search
>> > engines, X1) rely on file extensions to determine the filetype,
>> > and then index the contents of the file accordingly. It'll be good
>> > if they could deal with files with no extensions, but they don't
>> > (afaik).
>>
>> Yes they do. For example:
>>
>>
>>
>> > So - with that in mind, the easiest way would be to change the way gnus
>> > nnml stores files, or write another backend that allows changing
>> > filenames.
>>
>> Maybe you should say what it is exactly that you want to do with your
>> nnml files.
>>
>
> swish is fine - that's what I've used till now. I've been unable to use
> it to index all of my email periodically. I would like to say, here's
> the top directory under which all my nnml mail is, and this should be
> indexed periodically. But swish runs out of memory (even with -e option,
> on my 512Meg Win2k machine) in trying to index my mails (some, 35-40
> nnml folders, each with 2000-5000 emails). So, the way I use swish is to
> have one index file per nnml folder, and I have modified the swish
> search function to search a list of index files.
>
> It works, but as you can see, it's not optimal. Maybe, my usage of swish
> is not correct - and if so, I'll be glad to be corrected.
>
> desktop search programs that I mentioned, all support a 'crawl' type of
> indexing where they can keep track of what has changed, and update their
> indices appropriately. And I have never had any trouble with memory with
> them. That's why I'll like to use any of those to index my mail, instead
> of swish that I'm using at present.
>
> -p

I've had some success with it by modifying nnml.el to store articles
with an extension. So, instead of storing articles as group/N, I store
them as group/N.nnml, and then configure the search engine to treat
.nnml files as text files. Works well - much better than swish-e for
the 50k emails that I have.

Diffs attached, in case anyone else cares.

regards,
-p

---------------------------------------------------------------------------
Index: lisp/nnml.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/nnml.el,v
retrieving revision 7.8
diff -r7.8 nnml.el
512a513,517
> (defvar pkt:nnml-txt-ext ".nnml"
>   "*extension for nnml files")
> (defvar pkt:nnml-use-txt-extension t
>   "should text extension be used?")
>
513a519,526
>   (let (file)
>     (setq file (nnml-article-to-file-original article))
>     (if (file-exists-p file)
>         file
>       (if pkt:nnml-use-txt-extension
>           (concat file pkt:nnml-txt-ext)))))
>
> (defun nnml-article-to-file-original (article)
621a635,637
>         (setq text-ext
>               (if pkt:nnml-use-txt-extension
>                   pkt:nnml-txt-ext))
640a657
>         text-ext
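
A note on the lookup in the diff: nnml-article-to-file returns the plain name when that file already exists and appends the extension only otherwise, so articles stored before the change presumably keep their extensionless names and would still be skipped by an indexer that keys on .nnml. Below is a minimal Python sketch, not part of the patch, that lists such files under a top directory together with the name they would have with the extension. It assumes an article file is any file whose name is all digits (group/N); plan_extension_names and ARTICLE_NAME are hypothetical names made up for this illustration. It is a dry run and renames nothing, since whether the rest of nnml copes with renamed articles is not established in this thread.

```python
import os
import re

# Assumption: an nnml article is a file whose whole name is digits (group/N).
ARTICLE_NAME = re.compile(r"[0-9]+")


def plan_extension_names(top_dir, ext=".nnml"):
    """Return (current_path, path_with_ext) pairs for extensionless articles.

    Hypothetical helper: it only reports, nothing on disk is changed.
    """
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(top_dir):
        # Keep all-digit names only and order them by article number.
        numeric = sorted(
            (n for n in filenames if ARTICLE_NAME.fullmatch(n)), key=int
        )
        for name in numeric:
            current = os.path.join(dirpath, name)
            pairs.append((current, current + ext))
    return pairs
```

Calling plan_extension_names on the top nnml directory and printing each pair gives a quick view of how many articles still lack the extension; files such as .overview or names that already end in .nnml are left out of the list.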