From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 865 invoked from network); 20 Apr 2001 07:23:47 -0000
Received: from sunsite.dk (130.225.51.30)
  by ns1.primenet.com.au with SMTP; 20 Apr 2001 07:23:47 -0000
Received: (qmail 16402 invoked by alias); 20 Apr 2001 07:23:41 -0000
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
X-Seq: 14050
Received: (qmail 16388 invoked from network); 20 Apr 2001 07:23:40 -0000
From: "Bart Schaefer"
Message-Id: <1010420041158.ZM17254@candle.brasslantern.com>
Date: Fri, 20 Apr 2001 04:11:58 +0000
In-Reply-To: <3ADF2DF6.BE358FD5@u.genie.co.uk>
Comments: In reply to Oliver Kiddle
        "Re: completion for MUAs" (Apr 19, 7:27pm)
References: <3AD7461F.A6B691FC@u.genie.co.uk>
        <1010414180144.ZM3683@candle.brasslantern.com>
        <3ADF2DF6.BE358FD5@u.genie.co.uk>
X-Mailer: Z-Mail (5.0.0 30July97)
To: zsh-workers@sunsite.dk
Subject: Re: completion for MUAs
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On Apr 19, 7:27pm, Oliver Kiddle wrote:
} Subject: Re: completion for MUAs
}
} Bart Schaefer wrote:
}
} > I suggest working out some kind of a plugin model -- paste up a
} > function name using $service or the like, and call it if it exists.
}
} That's certainly a possibility but I want to allow, for example the
} standard UNIX mail command to complete addresses from the user's
} normal MUA's addressbook.

The problem is that there is no "normal MUA addressbook." The really
old, found-on-every-Unix-system MUAs don't really have an "addressbook"
at all, just aliases in e.g. the ~/.mailrc file. Sometimes the location
of that file is stored in the MAILRC environment variable, but sometimes
not.

} What do you think this separate function achieves mainly - simplicity,
} speed?

Simplicity more than speed.
Make it possible for a casual user to write a function to provide his
address book to the completion system without having to understand much
more than the format of the address book file.

} > Then there's also completion of user names @ the local machine, and
} > maybe even /etc{/mail,}/aliases names or the equivalent for other MTAs.
}
} Yes. So we need handling for different MTAs too. I also suspect that in
} some contexts (URLs maybe) these might need @localhost appended.

It's almost always better to append @$HOST I think.

} > Some MUAs also are happy with comma-separated lists of addresses in one
} > or more command-line arguments while others make each argument an address
} > and don't attempt to parse it further (which can lead to strange problems
} > later when calling the MTA or having an SMTP conversation).
}
} Is the situation that some MUA's aren't fussed and rely on the MTA to
} complain if it wants to with MTAs varying on whether they accept comma
} separated lists?

Not exactly, though that certainly applies in some cases. There are a
number of ways in which it can go wrong. (I used to know all of them,
but I haven't been in the MUA business for a while now.) Generally it's
just that parsing is applied to things like a "To:" line read from a
file (where multiple addresses are always expected) and skipped when the
command line is involved. The most common failure mode is that the SMTP
conversation gets `RCPT TO:' (note placement of comma) and results in a
bounce message for an unparsable address.

} > By "distribution lists" do you mean `@groupname:addr,addr,addr;' syntax,
} > or are you talking about e.g. completing individual addresses that appear
} > in the definition of a group or alias in the addressbook, or ...?
}
} [...] I've not seen syntax you mention. How does that work - is it
} some specific MTA's shorthand or something understood by MTAs and
} defined in an rfc?

It's defined in the standard email RFCs as far back as RFC822, and all
MTAs should understand it ... actually, I typed it wrong in the excerpt
above, it's just `groupname: addresses ;' with no leading @. With a
leading @, then it's got to be inside < > like `<@route:address>' and
has nothing to do with groups.

} I also seem to remember some syntax, possibly using exclamation marks
} where you can specify the route through specific mail servers the
} e-mail takes

That's very old and dates from the days before the Internet when most
mail was transmitted by UUCP (unix-to-unix copy). It's the same syntax
that used to be used in the "Path:" header of usenet articles. It's
just `host1!host2!host3!host4!user', where the intended recipient is
`user@host4' and `host1' is the "first hop" from your host towards the
eventual destination. Yes, one used to have to know the names of all
the machines in the entire store-and-forward path between machines in
order to get mail delivered.

This was replaced by the <@route:address> syntax, which would look like
`<@host1,@host2,@host3:user@host4>' to mean the same thing. That syntax
is strongly discouraged now, and most MTAs simply throw away everything
up to the `:' and decide on their own how to route to `host4'.

} The crux of it is that what I'm concerned about is how does one safely
} parse e-mail addresses. I've read enough of the mail rfcs and sendmail
} configs to know that e-mail addresses and correct parsing of them is
} extremely complicated but I've not retained enough details in my head
} on this to feel confident of doing something which is absolutely
} correct.

The grammar is actually pretty regular (if you ignore things like X.400
addresses [if you don't know, don't ask]) and can be parsed with a regex
with the exception of comments, which may use nested levels of parens
and therefore require a stack machine of some sort.
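As a rough illustration of that stack machine, here is a minimal sketch
in portable shell (it should also run under zsh). It assumes a plain
depth counter is enough to stand in for the stack, since parentheses are
the only nesting construct, and it treats quoted strings and backslash
pairs as opaque. The name strip_comments is hypothetical, not part of
zsh or its completion system.

```shell
# strip_comments (hypothetical helper): print $1 with RFC822 comments,
# including nested ones, removed.  The ( ... ) function body runs in a
# subshell, so the working variables stay local.
strip_comments() (
    rest=$1 out='' depth=0 inq=0
    while [ -n "$rest" ]; do
        c=${rest%"${rest#?}"}          # first character of $rest
        rest=${rest#?}
        case $c in
        '\')
            # quoted-pair: the backslash and the next character travel
            # together and never open or close anything
            n=${rest%"${rest#?}"}
            rest=${rest#?}
            if [ "$depth" -eq 0 ]; then out=$out$c$n; fi
            ;;
        '"')
            # quotes only count outside comments
            if [ "$depth" -eq 0 ]; then
                inq=$((1 - inq))
                out=$out$c
            fi
            ;;
        '(')
            if [ "$inq" -eq 1 ]; then
                out=$out$c             # literal paren in a quoted string
            else
                depth=$((depth + 1))   # "push"
            fi
            ;;
        ')')
            if [ "$inq" -eq 1 ] || [ "$depth" -eq 0 ]; then
                out=$out$c             # literal or unbalanced paren
            else
                depth=$((depth - 1))   # "pop"
            fi
            ;;
        *)
            if [ "$depth" -eq 0 ]; then out=$out$c; fi
            ;;
        esac
    done
    printf '%s\n' "$out"
)
```

Something like `strip_comments 'user@example.com (Joe (the boss) User)'`
should leave just the address (plus the space that preceded the
comment). Strictly a comment counts as whitespace, so a more careful
version might substitute a single space instead of deleting it outright.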

Here's the whole RFC822 grammar expressed as extendedglob patterns; I
converted this from a set of regex definitions I had, so I may have
missed turning a `+' into a `##' or the like but it should be mostly
correct:

    __specialx='][()<>@,;:\\".'
    __spacex=$' \t'                         # Space, tab
    __specials="[$__specialx]"
    __atom="[^$__specialx$__spacex]##"
    __space="[$__spacex]#"                  # Really, space or comment
    __qtext='[^"\\]'
    __qpair='\\?'
    __beginq='"'
    __endq='(|[^\\])"'
    __dot="$__space.$__space"
    __comma="$__space,$__space"
    __domainref="$__atom"
    __domainlit='\[([^]]|'"$__qpair"')#(|[^\\])\]'
    __quotedstring="$__beginq($__qtext|$__qpair)#$__endq"
    __word="($__atom|$__quotedstring)"
    __phrase="($__space$__word$__space)#"   # Strictly, should use `##'
    __localpart="$__word($__dot$__word)#"
    __subdomain="($__domainref|$__domainlit)"
    __domain="$__subdomain($__dot$__subdomain)#"
    __addrspec="$__localpart$__space@$__space$__domain"
    __domainlist="@$__domain($__comma@$__domain)#"
    __route="($__space$__domainlist${__space}:$__space)"
    __routeaddr="<$__route#$__addrspec$__space>"
    __mailbox="($__addrspec|$__phrase$__space$__routeaddr)"

Note that a comment is legal anywhere a space is legal. That's the part
that drives everyone bonkers, and I've left it out of the grammar above.
It might be better to rewrite it for zregexparse.

With the above, given a string with the comments stripped out, you
should be able to tell if it is an address with something like:

    [[ $string = $~__mailbox ]] && echo "That's an email address"

} In particular, how do you separate one address from the next one
} safely.

In a strict RFC822 grammar, addresses can be separated only by unquoted
commas. However, many MUAs fudge and try to guess what the user means
if he uses spaces between the addresses instead. All bets are off in an
addressbook, though, because the RFC822 rules only apply to addresses
"on the wire".
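
The strict rule (split only at unquoted commas) can be sketched in
portable shell along the following lines. This is only an illustration
under some assumptions: comments have already been stripped, a comma
inside double quotes or inside < > (a route's domain list) is not a
separator, and `groupname: addresses ;' groups are not treated
specially. The names split_addresses and _emit_addr are hypothetical.

```shell
# _emit_addr (hypothetical helper): trim surrounding spaces from $1
# and print it on its own line, unless nothing is left.
_emit_addr() {
    s=$1
    s=${s#"${s%%[! ]*}"}               # drop leading spaces
    s=${s%"${s##*[! ]}"}               # drop trailing spaces
    if [ -n "$s" ]; then printf '%s\n' "$s"; fi
}

# split_addresses (hypothetical helper): print each address in the
# comma-separated list $1 on a separate line.  The ( ... ) body runs
# in a subshell, so the working variables stay local.
split_addresses() (
    rest=$1 cur='' inq=0 ina=0
    while [ -n "$rest" ]; do
        c=${rest%"${rest#?}"}          # first character of $rest
        rest=${rest#?}
        case $c in
        '\')
            # quoted-pair: keep the backslash and the next character
            n=${rest%"${rest#?}"}
            rest=${rest#?}
            cur=$cur$c$n
            continue
            ;;
        '"') inq=$((1 - inq)) ;;
        '<') if [ "$inq" -eq 0 ]; then ina=1; fi ;;
        '>') if [ "$inq" -eq 0 ]; then ina=0; fi ;;
        ',')
            # only a comma outside quotes and outside < > separates
            if [ "$inq" -eq 0 ] && [ "$ina" -eq 0 ]; then
                _emit_addr "$cur"
                cur=''
                continue
            fi
            ;;
        esac
        cur=$cur$c
    done
    _emit_addr "$cur"
)
```

Given `"Doe, John" <jd@example.com>, <@h1,@h2:user@h3>, plain@example.com'
this should print three lines, one per address. It makes no attempt at
the space-separated guessing that some MUAs do, and an addressbook file
would still need its own format-specific reader.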

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com

Zsh: http://www.zsh.org | PHPerl Project: http://phperl.sourceforge.net