Gnus development mailing list
From: "Steven E. Harris" <steven.harris@tenzing.com>
Subject: Re: SMTP question (not quite Gnus-related)
Date: 09 Feb 2001 09:26:06 -0800
Message-ID: <87u263hioh.fsf@torus.tenzing.com>
In-Reply-To: Kai.Grossjohann@CS.Uni-Dortmund.DE's message of "09 Feb 2001 13:11:42 +0100"

Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> I don't think you should treat the <CRLF> differently for empty
> messages.  Here's how to send an empty message:
> 
> DATA<CRLF>
> .<CRLF>
> 
> Here's a nonempty message:
> 
> DATA<CRLF>
> foo<CRLF>
> .<CRLF>
> 
> The message content is one line, containing the characters "foo" plus
> the EOL that comes after it.

That's what I was trying to say, but I'm stuck seeing it from the
point of view of my problem space. For now we're not sending messages;
we're only consuming them. I'm writing an SMTP "DATA" stream filter
that only gets its first byte *after* the "DATA<CRLF>" is consumed by
another part of the system. One part dispatches on "command lines," so
after seeing the "DATA" command on a line, it fires off the "DATA
handler" part, which is what I'm refining. As you can see, I never get
the <CRLF> after the "DATA" command. That's okay, though, with our
latest consensus on RFC 821.

My solution (based upon this interpretation) uses a state machine
that's sensitive to the "beginning of line" state. Only there do we
consider whether we're encountering a terminator. The saving grace is
that the first <CRLF> in the "<CRLF>.<CRLF>" terminator can be left as
part of the body. That means that the only role the first <CRLF> plays
is to return the state machine to the "beginning of line" state.
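To make the description above concrete, here is a minimal sketch of
such a state machine. It is not the code I mentioned; the class name,
the state names, and the whole-buffer interface are made up for
illustration. It assumes the input starts right after "DATA<CRLF>",
that only <CRLF> (not a bare <LF>) ends a line, and that a leading "."
on a non-terminator line is dropped, as RFC 821's transparency rule
suggests.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch; none of these names come from the real code. */
final class DataBodyFilter {
    private enum State { BOL, DOT, DOT_CR, MID, CR }

    /**
     * Returns the body that precedes the ".<CRLF>" terminator, or null
     * if the input ends before a terminator is seen.
     */
    static byte[] body(byte[] stream) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // "DATA<CRLF>" was consumed upstream, so start at beginning of line.
        State state = State.BOL;
        for (byte b : stream) {
            switch (state) {
                case BOL:
                    // Only here can a terminator begin; hold the dot back.
                    if (b == '.') { state = State.DOT; break; }
                    out.write(b);
                    state = (b == '\r') ? State.CR : State.MID;
                    break;
                case DOT:
                    if (b == '\r') { state = State.DOT_CR; break; }
                    // Not a terminator: drop the held dot (assumption).
                    out.write(b);
                    state = State.MID;
                    break;
                case DOT_CR:
                    if (b == '\n') return out.toByteArray(); // ".<CRLF>"
                    // Unwind the held <CR>, then emit the current byte.
                    out.write('\r');
                    out.write(b);
                    state = (b == '\r') ? State.CR : State.MID;
                    break;
                case MID:
                case CR:
                    // Every body byte, including <CRLF>, is emitted as-is.
                    out.write(b);
                    if (b == '\r') state = State.CR;
                    else if (b == '\n' && state == State.CR) state = State.BOL;
                    else state = State.MID;
                    break;
            }
        }
        return null; // ran out of input without seeing a terminator
    }
}
```

With this sketch, ".<CRLF>" yields an empty body and
"foo<CRLF>.<CRLF>" yields "foo<CRLF>". The <CRLF> before the dot stays
in the body, and at most one <CR> is ever held back for unwinding.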

I had two other interpretations of RFC 821 and two other corresponding
state machine implementations. Just for completeness, I'll explain the
differences from our latest consensus.

One of them insisted upon "<CRLF>.<CRLF>" as the complete terminator,
even from the start state. The first <CRLF> was considered part of the
terminator, not the body, so we had a four-byte lookahead scheme. It
only required caching one byte for unwinding. The bad part of this
scheme is that it doesn't tolerate "DATA<CRLF>.<CRLF>", because it's
really looking for "DATA<CRLF><CRLF>.<CRLF>".

The second one was similar to the last one described, but tolerated a
three-byte terminator (".<CRLF>") only from the start state. It could
therefore accept "DATA<CRLF>.<CRLF>" as a valid stream. The state
machine for this version was an augmented version of the prior, with a
few extra states for the potential initial three-byte terminator. It
was tractable, but a little ugly.
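The difference between these two variations can be sketched as
follows. This is not the state machine code itself: it is a simplified
whole-buffer search with hypothetical names, it ignores dot-stuffing,
and it only shows which streams each interpretation accepts and where
the body ends.

```java
/** Hypothetical sketch; a buffer search standing in for the state machines. */
final class TerminatorVariants {
    // The full five-byte terminator "<CRLF>.<CRLF>".
    private static final byte[] FULL = {'\r', '\n', '.', '\r', '\n'};

    /** Index of the first full terminator in s, or -1 if there is none. */
    private static int indexOfFull(byte[] s) {
        for (int i = 0; i + FULL.length <= s.length; i++) {
            int j = 0;
            while (j < FULL.length && s[i + j] == FULL[j]) j++;
            if (j == FULL.length) return i;
        }
        return -1;
    }

    /** First variation: only "<CRLF>.<CRLF>" terminates, even at the start. */
    static int strictBodyLength(byte[] s) {
        return indexOfFull(s);
    }

    /** Second variation: also accept ".<CRLF>" as the first three bytes. */
    static int lenientBodyLength(byte[] s) {
        if (s.length >= 3 && s[0] == '.' && s[1] == '\r' && s[2] == '\n') {
            return 0; // empty message
        }
        return indexOfFull(s);
    }
}
```

Both methods take the bytes that follow "DATA<CRLF>" and return the
body length, or -1 when no terminator is found. For ".<CRLF>" the
strict version returns -1 and the lenient one returns 0. For
"foo<CRLF>.<CRLF>" both return 3, because under these interpretations
the <CRLF> before the dot belongs to the terminator and not the body.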

It's apparent now that all of this discussion would have been much
simpler if:

1. I weren't faced with the problem that the first <CRLF> after the
   "DATA" string is clipped from my view.

2. I hadn't assumed that the first <CRLF> in "<CRLF>.<CRLF>" is part
   of the terminator rather than the body. This assumption required
   unwinding the first <CRLF> in an incomplete terminator.

One poster suggested simply "starting out as though you've just seen
the first <CRLF>". The problem with that approach is unwinding. If you
saw ".<CR>a" as the first three bytes of the stream, you'd need to
unwind just the <CR>, then emit the "a". As an alternate scenario,
consider that you're midway through a more complete body and you
encounter "<CRLF>.<CR>a". You then need to unwind "<CRLF><CR>", then
emit the "a". The unwinding involves four steps rather than the two in
the first scenario. It's a different path through the state machine, so
you either need different states or some transition control flags to
guide you through the different paths.

I can share this code (it's Java) if anyone is interested in seeing
some of these variations.

-- 
Steven E. Harris        :: steven.harris@tenzing.com
Tenzing                 :: http://www.tenzing.com


