From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zsh-workers-return-14050-mason-zsh=primenet.com.au@sunsite.dk>
Received: (qmail 865 invoked from network); 20 Apr 2001 07:23:47 -0000
Received: from sunsite.dk (130.225.51.30)
  by ns1.primenet.com.au with SMTP; 20 Apr 2001 07:23:47 -0000
Received: (qmail 16402 invoked by alias); 20 Apr 2001 07:23:41 -0000
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
X-Seq: 14050
Received: (qmail 16388 invoked from network); 20 Apr 2001 07:23:40 -0000
From: "Bart Schaefer" <schaefer@candle.brasslantern.com>
Message-Id: <1010420041158.ZM17254@candle.brasslantern.com>
Date: Fri, 20 Apr 2001 04:11:58 +0000
In-Reply-To: <3ADF2DF6.BE358FD5@u.genie.co.uk>
Comments: In reply to Oliver Kiddle <opk@u.genie.co.uk>
        "Re: completion for MUAs" (Apr 19,  7:27pm)
References: <3AD7461F.A6B691FC@u.genie.co.uk> 
	<1010414180144.ZM3683@candle.brasslantern.com> 
	<3ADF2DF6.BE358FD5@u.genie.co.uk>
X-Mailer: Z-Mail (5.0.0 30July97)
To: zsh-workers@sunsite.dk
Subject: Re: completion for MUAs
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On Apr 19,  7:27pm, Oliver Kiddle wrote:
} Subject: Re: completion for MUAs
}
} Bart Schaefer wrote:
} 
} > I suggest working out some kind of a plugin model -- paste up a
} > function name using $service or the like, and call it if it exists.
} 
} That's certainly a possibility but I want to allow, for example the
} standard UNIX mail command to complete addresses from the user's
} normal MUA's addressbook.

The problem is that there is no "normal MUA addressbook."  The really old,
found-on-every-Unix-system MUAs don't really have an "addressbook" at all,
just aliases in e.g. the ~/.mailrc file.  Sometimes the location of that
file is stored in the MAILRC environment variable, but sometimes not.

} What do you think this separate function achieves mainly - simplicity,
} speed?

Simplicity more than speed.  Make it possible for a casual user to write
a function to provide his address book to the completion system without
having to understand much more than the format of the address book file.
 
} > Then there's also completion of user names @ the local machine, and
} > maybe even /etc{/mail,}/aliases names or the equivalent for other MTAs.
} 
} Yes. So we need handling for different MTAs too. I also suspect that in
} some contexts (URLs maybe) these might need @localhost appended.

It's almost always better to append @$HOST I think.
 
} > Some MUAs also are happy with comma-separated lists of addresses in one
} > or more command-line arguments while others make each argument an address
} > and don't attempt to parse it further (which can lead to strange problems
} > later when calling the MTA or having an SMTP conversation).
} 
} Is the situation that some MUA's aren't fussed and rely on the MTA to
} complain if it wants to with MTAs varying on whether they accept comma
} separated lists?

Not exactly, though that certainly applies in some cases.  There are a
number of ways in which it can go wrong.  (I used to know all of them,
but I haven't been in the MUA business for a while now.)  Generally it's
just that parsing is applied to things like a "To:" line read from a
file (where multiple addresses are always expected) and skipped when the
command line is involved.  The most common failure mode is that the
SMTP conversation gets `RCPT TO:<user@host,>' (note placement of comma)
and results in a bounce message for an unparsable address.

} > By "distribution lists" do you mean `@groupname:addr,addr,addr;' syntax,
} > or are you talking about e.g. completing individual addresses that appear
} > in the definition of a group or alias in the addressbook, or ...?
} 
} [...] I've not seen syntax you mention. How does that work - is it
} some specific MTA's shorthand or something understood by MTAs and
} defined in an rfc?

It's defined in the standard email RFCs as far back as RFC822, and all
MTAs should understand it ... actually, I typed it wrong in the excerpt
above, it's just `groupname: addresses ;' with no leading @.  With a
leading @, then it's got to be inside < > like `<@route:address>' and
has nothing to do with groups.

} I also seem to remember some syntax, possibly using exclamation marks
} where you can specify the route through specific mail servers the
} e-mail takes

That's very old and dates from the days before the Internet when most
mail was transmitted by UUCP (unix-to-unix copy).  It's the same syntax
that used to be used in the "Path:" header of usenet articles.  It's
just `host1!host2!host3!host4!user', where the intended recipient is
`user@host4' and `host1' is the "first hop" from your host towards the
eventual destination.  Yes, one used to have to know the names of all
the machines in the entire store-and-forward path between machines in
order to get mail delivered.

This was replaced by the <@route:address> syntax, which would look like
`<@host1,@host2,@host3:user@host4>' to mean the same thing.  That syntax
is strongly discouraged now, and most MTAs simply throw away everthing
up to the `:' and decide on their own how to route to `host4'.

} The crux of it is that what I'm concerned about is how does one safely
} parse e-mail addresses. I've read enough of the mail rfcs and sendmail
} configs to know that e-mail addresses and correct parsing of them is
} extremely complicated but I've not retained enough details in my head
} on this to feel confident of doing something which is absolutely
} correct.

The grammar is actually pretty regular (if you ignore things like X.400
addresses [if you don't know, don't ask]) and can be parsed with a regex
with the exception of comments, which may use nested levels of parens
and therefore require a stack machine of some sort.

Here's the whole RFC822 grammar expressed as extendedglob patterns; I
converted this from a set of regex definitions I had, so I may have
missed turning a `+' into a `##' or the like but it should be mostly
correct:

__specialx='][()<>@,;:\\".'
__spacex=" 	"				# Space, tab
__specials="[$__specialx]"
__atom="[^$__specialx$__spacex]##"
__space="[$__spacex]#"				# Really, space or comment
__qtext='[^"\\]'
__qpair='\\?'
__beginq='"'
__endq='(|[^\\])"'
__dot="$__space.$__space"
__comma="$__space,$__space"

__domainref="$__atom"
__domainlit='\[([^]]|'"$__qpair"')#(|[^\\])\]'
__quotedstring="$__beginq($__qtext|$__qpair)#$__endq"
__word="($__atom|$__quotedstring)"
__phrase="($__space$__word$__space)#"		# Strictly, should use `##'
__localpart="$__word($__dot$__word)#"

__subdomain="($__domainref|$__domainlit)"
__domain="$__subdomain($__dot$__subdomain)#"
__addrspec="$__localpart$__space@$__space$__domain"
__domainlist="@$__domain($__comma@$__domain)#"
__route="($__space$__domainlist${__space}:$__space)"
__routeaddr="<$__route#$__addrspec$__space>"
__mailbox="($__addrspec|$__phrase$__space$__routeaddr)"

Note that a comment is legal anywhere a space is legal.  That's the part
that drives everyone bonkers, and I've left it out of the grammar above.

It might be better to rewrite it for zregexparse.

With the above, given a string with the comments stripped out, you should
be able to tell if it is an address with something like:

    [[ $string = $~__mailbox ]] && echo "That's an email address"

} In particular, how do you separate one address from the next one
} safely.

In a strict RFC822 grammar, addresses can be separated only by unquoted
commas.  However, many MUAs fudge and try to guess what the user means
if he uses spaces between the addresses instead.  All bets are off in
an addressbook, though, because the RFC822 rules only apply to addresses
"on the wire".

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com

Zsh: http://www.zsh.org | PHPerl Project: http://phperl.sourceforge.net