From: Steinar Bang
Newsgroups: gmane.emacs.gnus.general
Subject: Re: naked URLs -- a little data (Re: The .. rule)
Date: 15 May 2000 09:06:57 +0200
References: <00May12.111709edt.115683@gateway.intersys.com> <200005121547.RAA12153@marcy.cs.uni-dortmund.de>
Original-To: ding@gnus.org
In-Reply-To: Karl Kleinpaste's message of "14 May 2000 17:30:53 -0400"
User-Agent: Gnus/5.0807 (Gnus v5.8.7) XEmacs/20.4 (Emerald)

>>>>> Karl Kleinpaste:

> As far as I can see, naked URL highlighting was correct on only 7
> distinct strings (out of a total of 49), for an error rate of 86%.

Running grep "@" | wc -l on the list gave me 24, so if we could
eliminate all message ids and email addresses (no, I don't feel a
particular urge to make these mailto: URLs), we would be down to an
error rate of about 50% (assuming that 49 samples is statistically
significant).

Would that be acceptable?  If so, then these would be the matches:

   4   8 imap.akamai.com
   7   6 abcselect.tar.gz
  16   3 halfdome.holdit.com
  17   3 group.name.here
  18   3 dodrt.dod.no
  21   2 mail.company.com
  25   2 gnu.emacs.gnus
  27   1 www.cs.utwente.nl/~hoepman
         (I'm not sure why the domain name of this one matches, but
         it does)
  28   1 site.config.m4
  31   1 mail-source.el.old
  32   1 mail-source.el.new
  35   1 ftp.gnus.org
  37   1 foo.bar.tar.gz
  38   1 foo.bar.com
  40   1 every.second.dot
  42   1 bold.bold.bold
  46   1 5.0.2.34.XXX
  47   1 21.2.b31
  48   1 0.9.5a
  49   1 /home/justin/News/agent/nnimap/imap.akamai.com/.imap/inbox

If we require that a naked URL is preceded by white space, then these
would be the matches:

   4   8 imap.akamai.com
   7   6 abcselect.tar.gz
  16   3 halfdome.holdit.com
  17   3 group.name.here
  18   3 dodrt.dod.no
  21   2 mail.company.com
  25   2 gnu.emacs.gnus
  27   1 www.cs.utwente.nl/~hoepman
  28   1 site.config.m4
  35   1 ftp.gnus.org
  38   1 foo.bar.com
  40   1 every.second.dot
  42   1 bold.bold.bold

(note that the current match disallows "-" as a word character)

If we require that the last word is one of the top level domains
(".com", ".edu" etc.), or a two-letter combination, we would get the
following matches:

   4   8 imap.akamai.com
   7   6 abcselect.tar.gz
  16   3 halfdome.holdit.com
  18   3 dodrt.dod.no
  21   2 mail.company.com
  35   1 ftp.gnus.org
  38   1 foo.bar.com

Of these, only two are legal URLs (dodrt.dod.no and
halfdome.holdit.com).

The question is whether these narrowings make the error rate low
enough to be acceptable.
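
For anyone who wants to experiment with the last narrowing, here is a
rough sketch in Python rather than in the actual Gnus regexp, which is
not shown in this message. It applies the "last word is a top level
domain or a two-letter combination" rule to the strings that survived
the white space rule. The set of generic top level domains and the
names GENERIC_TLDS, LAST_WORD_RE and has_plausible_tld are my own
assumptions for illustration; the message only says '".com", ".edu"
etc.'.

```python
import re

# Assumed set of generic top level domains; the message only says
# '".com", ".edu" etc.', so this list is a guess for illustration.
GENERIC_TLDS = ("com", "edu", "org", "net", "gov", "mil", "int")

# The last dot-separated word must be a generic TLD or exactly two letters.
LAST_WORD_RE = re.compile(
    r"\.(?:%s|[a-z]{2})$" % "|".join(GENERIC_TLDS), re.IGNORECASE
)


def has_plausible_tld(candidate):
    """Return True if the candidate ends in a generic TLD or two letters."""
    return LAST_WORD_RE.search(candidate) is not None


# Strings that survive the "preceded by white space" rule, copied from
# the second list in the message.
AFTER_WHITESPACE_RULE = [
    "imap.akamai.com",
    "abcselect.tar.gz",
    "halfdome.holdit.com",
    "group.name.here",
    "dodrt.dod.no",
    "mail.company.com",
    "gnu.emacs.gnus",
    "www.cs.utwente.nl/~hoepman",
    "site.config.m4",
    "ftp.gnus.org",
    "foo.bar.com",
    "every.second.dot",
    "bold.bold.bold",
]

# Apply the last-word narrowing.
SURVIVORS = [s for s in AFTER_WHITESPACE_RULE if has_plausible_tld(s)]

if __name__ == "__main__":
    for s in SURVIVORS:
        print(s)
```

Under these assumptions the sketch keeps the same seven strings as the
last list above. It also shows why abcselect.tar.gz slips through:
"gz" is a two-letter combination, so the rule cannot tell it apart from
a country code.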