From: "Steven E. Harris" <steven.harris@tenzing.com>
Subject: Re: SMTP question (not quite Gnus-related)
Date: 09 Feb 2001 09:26:06 -0800
Message-ID: <87u263hioh.fsf@torus.tenzing.com>
In-Reply-To: Kai.Grossjohann@CS.Uni-Dortmund.DE's message of "09 Feb 2001 13:11:42 +0100"
Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
> I don't think you should treat the <CRLF> differently for empty
> messages. Here's how to send an empty message:
>
> DATA<CRLF>
> .<CRLF>
>
> Here's a nonempty message:
>
> DATA<CRLF>
> foo<CRLF>
> .<CRLF>
>
> The message content is one line, containing the characters "foo" plus
> the EOL that comes after it.
That's what I was trying to say, but I'm stuck seeing it from the
point of view of my problem space. For now we're not sending messages;
we're only consuming them. I'm writing an SMTP "DATA" stream filter
that only gets its first byte *after* the "DATA<CRLF>" is consumed by
another part of the system. One part dispatches on "command lines," so
after seeing the "DATA" command on a line, it fires off the "DATA
handler" part, which is what I'm refining. As you can see, I never get
the <CRLF> after the "DATA" command. That's okay, though, with our
latest consensus on RFC 821.
My solution (based upon this interpretation) uses a state machine
that's sensitive to the "beginning of line" state. Only there do we
consider whether we're encountering a terminator. The saving grace is
that the first <CRLF> in the "<CRLF>.<CRLF>" terminator can be left as
part of the body. That means that the only role the first <CRLF> plays
is to return the state machine to the "beginning of line" state.
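A minimal sketch of such a line-start-sensitive filter, for illustration
only and not the code referred to in this message. The class name, state
names, and the filter() helper are hypothetical. It assumes the input
begins right after "DATA<CRLF>" has been consumed elsewhere, that only a
strict <CRLF> ends a line, and that a leading dot on a non-terminator
line is dropped, as the transparency rule in RFC 821 section 4.5.2
describes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class DataBodyFilter {
    private enum State { LINE_START, LINE_START_DOT, LINE_START_DOT_CR, MID_LINE, MID_LINE_CR }

    /** Returns the message body, or null if the input ends before the terminator line. */
    static String filter(String wire) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        State state = State.LINE_START; // "DATA<CRLF>" was consumed upstream
        for (byte b : wire.getBytes(StandardCharsets.ISO_8859_1)) {
            switch (state) {
                case LINE_START:
                    // Only at the beginning of a line can a terminator start.
                    if (b == '.') { state = State.LINE_START_DOT; continue; }
                    break;
                case LINE_START_DOT:
                    if (b == '\r') { state = State.LINE_START_DOT_CR; continue; }
                    break; // dot-stuffed line: the leading dot is dropped
                case LINE_START_DOT_CR:
                    if (b == '\n') {
                        // ".<CRLF>" on its own line: end of data.
                        return new String(body.toByteArray(), StandardCharsets.ISO_8859_1);
                    }
                    body.write('\r'); // unwind the single held <CR>
                    break;
                case MID_LINE_CR:
                    // A completed <CRLF> stays in the body and only resets the state.
                    if (b == '\n') { body.write(b); state = State.LINE_START; continue; }
                    break;
                case MID_LINE:
                    break;
            }
            // Ordinary byte: emit it and remember whether it was a <CR>.
            body.write(b);
            state = (b == '\r') ? State.MID_LINE_CR : State.MID_LINE;
        }
        return null;
    }
}
```

With this arrangement, DataBodyFilter.filter(".\r\n") yields an empty
body and DataBodyFilter.filter("foo\r\n.\r\n") yields "foo\r\n". The
only byte ever held back is the <CR> that follows a line-initial dot.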
I had two other interpretations of RFC 821 and two other corresponding
state machine implementations. Just for completeness, I'll explain the
differences from our latest consensus.
One of them insisted upon "<CRLF>.<CRLF>" as the complete terminator,
even from the start state. The first <CRLF> was considered part of the
terminator, not the body, so we had a four-byte lookahead scheme. It
only required caching one byte for unwinding. The bad part of this
scheme is that it doesn't tolerate "DATA<CRLF>.<CRLF>", because it's
really looking for "DATA<CRLF><CRLF>.<CRLF>".
The second one was similar to the last one described, but tolerated a
three-byte terminator (".<CRLF>") only from the start state. It could
therefore accept "DATA<CRLF>.<CRLF>" as a valid stream. The state
machine for this version was an augmented version of the prior, with a
few extra states for the potential initial three-byte terminator. It
was tractable, but a little ugly.
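The acceptance behavior of these two earlier interpretations can be
sketched as follows. This is a simplification under stated assumptions:
it scans a whole string rather than running an incremental state
machine, it ignores dot-stuffing, and the class name, method name, and
flag are hypothetical. The input is what the filter sees, that is,
everything after "DATA<CRLF>".

```java
final class FullTerminatorScan {
    private static final String TERMINATOR = "\r\n.\r\n";

    /**
     * Returns the body preceding the terminator, or null if no terminator is found.
     * With allowShortFormAtStart set, ".<CRLF>" is also accepted at the very start.
     */
    static String body(String afterData, boolean allowShortFormAtStart) {
        if (allowShortFormAtStart && afterData.startsWith(".\r\n")) {
            return ""; // second interpretation: three-byte terminator from the start state
        }
        int end = afterData.indexOf(TERMINATOR);
        // The leading <CRLF> belongs to the terminator here, so it is cut from the body.
        return end < 0 ? null : afterData.substring(0, end);
    }
}
```

Under the first interpretation, body(".\r\n", false) finds no terminator
and returns null, while body(".\r\n", true) returns an empty body. Both
return "foo" rather than "foo\r\n" for "foo\r\n.\r\n".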
It's apparent now that all of this discussion would have been much
simpler if:
1. I weren't faced with the problem that the first <CRLF> after the
"DATA" string is clipped from my view.
2. I hadn't assumed that the first <CRLF> in "<CRLF>.<CRLF>" is part
of the terminator rather than the body. This assumption required
unwinding the first <CRLF> in an incomplete terminator.
One poster suggested simply "starting out as though you've just seen
the first <CRLF>." The problem with that approach is unwinding. If you
saw ".<CR>a" as the first three bytes of the stream, you'd need to
unwind just the <CR>, then emit the "a". As an alternate scenario,
consider that you're midway through a more complete body and you
encounter "<CRLF>.<CR>a". You then need to unwind "<CRLF><CR>", then
emit the "a". That takes four steps rather than the two of the
first scenario. It's a different path through the state machine, so
you either need different states or some transition control flags to
guide you through the different paths.
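The two unwinding scenarios can be illustrated with a sketch that treats
the first <CRLF> as part of the terminator and starts out as though that
<CRLF> had just been seen. It is one possible arrangement rather than
the code referred to in this message: a buffer of withheld characters
stands in for the extra states or control flags, and the class name, the
state names, and the unwindSizes parameter are hypothetical. It works on
a String for brevity and accepts only a strict <CRLF> as a line ending.

```java
import java.util.ArrayList;
import java.util.List;

final class UnwindingFilter {
    private enum State { TEXT, CR, CRLF, CRLF_DOT, CRLF_DOT_CR }

    /**
     * Returns the body without the terminator's leading CRLF, or null if the
     * input ends first. The length of every unwind is appended to unwindSizes.
     */
    static String filter(String wire, List<Integer> unwindSizes) {
        StringBuilder body = new StringBuilder();
        StringBuilder held = new StringBuilder(); // withheld chars that may belong to a terminator
        State state = State.CRLF;                 // pretend the first <CRLF> was just seen
        for (char c : wire.toCharArray()) {
            switch (state) {
                case CR:
                    if (c == '\n') { held.append(c); state = State.CRLF; continue; }
                    break;
                case CRLF:
                    if (c == '.') { state = State.CRLF_DOT; continue; } // the dot is never emitted
                    break;
                case CRLF_DOT:
                    if (c == '\r') { held.append(c); state = State.CRLF_DOT_CR; continue; }
                    break;
                case CRLF_DOT_CR:
                    if (c == '\n') return body.toString(); // complete terminator: discard held chars
                    break;
                case TEXT:
                    break;
            }
            // Not a terminator after all: unwind whatever was withheld.
            if (held.length() > 0) {
                unwindSizes.add(held.length());
                body.append(held);
                held.setLength(0);
            }
            if (c == '\r') { held.append(c); state = State.CR; }
            else { body.append(c); state = State.TEXT; }
        }
        return null;
    }
}
```

Fed ".\ra\r\n.\r\n", the sketch unwinds one character before emitting the
"a". Fed "x\r\n.\ra\r\n.\r\n", it unwinds three. Ordinary line endings are
recorded as unwinds too, since every <CRLF> is withheld until the next
character rules out a terminator.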
I can share this code (it's Java) if anyone is interested in seeing
some of these variations.
--
Steven E. Harris :: steven.harris@tenzing.com
Tenzing :: http://www.tenzing.com