From: "Steven E. Harris"
Newsgroups: gmane.emacs.gnus.general
Subject: Re: SMTP question (not quite Gnus-related)
Date: 09 Feb 2001 09:26:06 -0800
Organization: Tenzing Communications Inc.
Message-ID: <87u263hioh.fsf@torus.tenzing.com>
To: ding@gnus.org

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> I don't think you should treat the <CRLF> differently for empty
> messages.  Here's how to send an empty message:
>
> DATA<CRLF>
> .<CRLF>
>
> Here's a nonempty message:
>
> DATA<CRLF>
> foo<CRLF>
> .<CRLF>
>
> The message content is one line, containing the characters "foo" plus
> the EOL that comes after it.

That's what I was trying to say, but I'm stuck seeing it from the
point of view of my problem space. For now we're not sending messages;
we're only consuming them. I'm writing an SMTP "DATA" stream filter
that only gets its first byte *after* the "DATA<CRLF>" is consumed by
another part of the system. One part dispatches on "command lines", so
after seeing the "DATA" command on a line, it fires off the "DATA
handler" part, which is what I'm refining. As you can see, I never get
the <CRLF> after the "DATA" command. That's okay, though, with our
latest consensus on RFC821.

My solution (based upon this interpretation) uses a state machine
that's sensitive to the "beginning of line" state. Only there do we
consider whether we're encountering a terminator. The saving grace is
that the first <CRLF> in the "<CRLF>.<CRLF>" terminator can be left as
part of the body. That means that the only role the first <CRLF> plays
is to return the state machine to the "beginning of line" state.

I had two other interpretations of RFC821 and two other corresponding
state machine implementations.
Just for completeness, I'll explain the differences from our latest
consensus.

One of them insisted upon "<CRLF>.<CRLF>" as the complete terminator,
even from the start state. The first <CRLF> is considered part of the
terminator, not the body, so we had a four-byte lookahead scheme. It
only requires caching one byte for unwinding. The bad part of this
scheme is that it doesn't tolerate "DATA<CRLF>.<CRLF>", because it's
really looking for "DATA<CRLF><CRLF>.<CRLF>".

The second one was similar to the last one described, but tolerated a
three-byte terminator (".<CRLF>") only from the start state. It could
therefore accept "DATA<CRLF>.<CRLF>" as a valid stream. The state
machine for this version was an augmented version of the prior, with a
few extra states for the potential initial three-byte terminator. It
was tractable, but a little ugly.

It's apparent now that all of this discussion would have been much
simpler if:

1. I wasn't faced with the problem that the first <CRLF> after the
   "DATA" string is clipped from my view.

2. I hadn't assumed that the first <CRLF> in "<CRLF>.<CRLF>" is part
   of the terminator rather than the body. This assumption required
   unwinding the first <CRLF> in an incomplete terminator.

One poster suggested simply "starting out as though you've just seen
the first <CRLF>". The problem with that approach is unwinding. If you
saw ".<CR>a" as the first three bytes of the stream, you'd need to
unwind just the <CR>, then emit the "a". As an alternate scenario,
consider that you're mid-way through a more complete body and you
encounter "<CRLF>.<CR>a". You then need to unwind "<CRLF>.<CR>", then
emit the "a". The unwinding involves four steps rather than two in the
first scenario. It's a different path through the state machine, so
you either need different states or some transition control flags to
guide you through the different paths.

I can share this code (it's Java) if anyone is interested in seeing
some of these variations.

-- 
Steven E. Harris        :: steven.harris@tenzing.com
Tenzing                 :: http://www.tenzing.com
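
For readers who want something concrete, here is a minimal sketch of
the "beginning of line" state machine described above. It is not the
code offered in the message. The class name SmtpDataFilter, the state
names, and the helper bodyOf are all hypothetical. The sketch assumes
the handler's first byte is the one right after "DATA<CRLF>", so it
starts in the beginning-of-line state, and it leaves the <CRLF> that
precedes the terminating dot in the body. Dot-unstuffing (RFC 821
section 4.5.2) is deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: consumes the bytes that follow "DATA<CRLF>" and
// stops at a line consisting of a single dot.
final class SmtpDataFilter {
    private enum State { BOL, DOT, DOT_CR, MID, MID_CR, DONE }

    // The stream starts right after "DATA<CRLF>", i.e. at beginning of line.
    private State state = State.BOL;
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();

    /** Feeds one byte; returns true once the terminator has been seen. */
    boolean feed(int b) {
        switch (state) {
            case BOL:
                if (b == '.') { state = State.DOT; }   // hold the dot back
                else { emit(b); }
                break;
            case DOT:
                if (b == '\r') { state = State.DOT_CR; }
                else { body.write('.'); emit(b); }     // unwind "."
                break;
            case DOT_CR:
                if (b == '\n') { state = State.DONE; } // saw ".<CRLF>" at BOL
                else { body.write('.'); body.write('\r'); emit(b); } // unwind ".<CR>"
                break;
            case MID:
                emit(b);
                break;
            case MID_CR:
                if (b == '\n') { body.write(b); state = State.BOL; }
                else { emit(b); }
                break;
            case DONE:
                throw new IllegalStateException("terminator already seen");
        }
        return state == State.DONE;
    }

    // Writes a body byte and tracks whether a <CR> may be starting a <CRLF>.
    private void emit(int b) {
        body.write(b);
        state = (b == '\r') ? State.MID_CR : State.MID;
    }

    byte[] body() { return body.toByteArray(); }

    /** Convenience for experiments: returns the body, or null if unterminated. */
    static String bodyOf(String stream) {
        SmtpDataFilter f = new SmtpDataFilter();
        for (byte b : stream.getBytes(StandardCharsets.ISO_8859_1)) {
            if (f.feed(b)) {
                return new String(f.body(), StandardCharsets.ISO_8859_1);
            }
        }
        return null;
    }
}
```

Because the dot is only examined in the BOL state, the most that ever
has to be unwound is ".<CR>", and there is a single path for it whether
the dot arrives at the very start of the stream or after a body line.
Calling SmtpDataFilter.bodyOf(".\r\n") models the empty message, and
bodyOf("foo\r\n.\r\n") models the one-line message from the quoted text.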