Gnus development mailing list
* SMTP question (not quite Gnus-related)
@ 2001-01-29 18:04 Steven E. Harris
  2001-02-07 17:56 ` Kai Großjohann
  0 siblings, 1 reply; 21+ messages in thread
From: Steven E. Harris @ 2001-01-29 18:04 UTC (permalink / raw)


I'm not sure if this question is appropriate for this forum, but I
trust that avid Gnus contributors and users may know the answer.

Regarding the SMTP DATA command: must a *completely blank* e-mail
message be submitted as:

  <CRLF>.<CRLF>

or simply as

  .<CRLF>

RFC821 is somewhat ambiguous - or contradictory - about the point. In
section 4.1.1 (Command Semantics), under DATA, we find:

,----[ RFC821 4.1.1 ]
| The mail data is terminated by a line containing only a period, that
| is the character sequence "<CRLF>.<CRLF>" (see Section 4.5.2 on
| Transparency).  This is the end of mail data indication
`----

This suggests that the *first* <CRLF> - the terminator for the line
before - is actually part of the terminating sequence, and hence is
not actually part of the mail body.

We know that the <CRLF> at the end of the DATA command is mandatory,
for section 4.1.2 (Command Syntax) shows it to be so. Therefore, we
can't quite argue that an empty body is specified by

  DATA<CRLF>.<CRLF>

because we'd still be missing a <CRLF>.

Section 4.5.2 (Transparency) contains the contradiction:

,----[ RFC821 4.5.2 ]
| Without some provision for data transparency the character sequence
| "<CRLF>.<CRLF>" ends the mail text and cannot be sent by the user.
| 
| [...]
| 
| 2. When a line of mail text is received by the receiver-SMTP
|    it checks the line.  If the line is composed of a single
|    period it is the end of mail.  If the first character is a
|    period and there are other characters on the line, the first
|    character is deleted
`----

The second point suggests that all it takes is a period and a <CRLF>
on a line of its own to terminate the message. If that's the case,
then an empty, terminated message would be missing the first <CRLF> in
the first paragraph's required termination sequence. The fragment
"When a line of mail text is received" implies that the only way you
know you're at the beginning of a line is if you've just seen a
<CRLF>, or if you're at the start of the data. In the former case,
you'd have the first <CRLF> before the period. In the second case -
the "blank message" case - you wouldn't have the first <CRLF>.

I've tried connecting to two SMTP servers (sendmail and exim) by
telnet and manually submitting a blank message. In each case, the
server accepted a period and newline as an acceptably-terminated
body. That is, I didn't have to hit <return period return>. Existing
implementations therefore seem to tolerate a terminator that's not
quite what RFC821 requires.
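
To make the second rule of 4.5.2 concrete, here is a minimal Python
sketch of the receiver-side check, assuming the input has already been
split into lines on <CRLF>. The name receive_mail_text is made up for
illustration and is not taken from sendmail, exim, or any other MTA.

```python
def receive_mail_text(lines):
    """Apply RFC821 4.5.2 rule 2 to lines already split on <CRLF>.

    `lines` holds each line's content without its trailing <CRLF>.
    Returns the body lines; raises ValueError if no terminator is seen.
    """
    body = []
    for line in lines:
        if line == ".":
            # A line composed of a single period: end of mail.
            return body
        if line.startswith("."):
            # Leading period plus other characters: delete the period.
            line = line[1:]
        body.append(line)
    raise ValueError("no end-of-mail line seen")


# "DATA<CRLF>.<CRLF>": the first line after DATA is the lone period.
print(receive_mail_text(["."]))                  # []
print(receive_mail_text(["foo", "..bar", "."]))  # ['foo', '.bar']
```

Read line by line like this, a lone period on the first line after
DATA already counts as end of mail, with no extra <CRLF> inside the
data. That would be consistent with what the two servers accepted.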

I would like to know both whether the first <CRLF> is required and if
the first <CRLF> is part of the body text or part of the
terminator. The distinction is relevant as I'm writing a state
machine-based stream filter to process input from the DATA
command. (Yes, it's an SMTP server.)

Any insight would be greatly appreciated. (Or, if I should ask
elsewhere, please advise.)

-- 
Steven E. Harris        :: steven.harris@tenzing.com
Tenzing                 :: http://www.tenzing.com




* Re: SMTP question (not quite Gnus-related)
  2001-01-29 18:04 SMTP question (not quite Gnus-related) Steven E. Harris
@ 2001-02-07 17:56 ` Kai Großjohann
  2001-02-08  1:00   ` Daniel Pittman
  0 siblings, 1 reply; 21+ messages in thread
From: Kai Großjohann @ 2001-02-07 17:56 UTC (permalink / raw)
  Cc: ding

On 29 Jan 2001, Steven E. Harris wrote:

> ,----[ RFC821 4.1.1 ]
>| The mail data is terminated by a line containing only a period,
>| that is the character sequence "<CRLF>.<CRLF>" (see Section 4.5.2
>| on Transparency).  This is the end of mail data indication
> `----

Actually, there is a contradiction right here, because `a line
containing only a period' and the character sequence "<CRLF>.<CRLF>"
are not the same thing.

I presume that the first part is the relevant part and the
"<CRLF>.<CRLF>" thing is just an explanation.  Maybe the explanation
makes it clear that there is no whitespace either before or after the
period.  If they had said the character sequence ".<CRLF>", then
"foo.<CRLF>" might be a valid suffix for a message, which is not what
the standard intends.

kai
-- 
Be indiscrete.  Do it continuously.




* Re: SMTP question (not quite Gnus-related)
  2001-02-07 17:56 ` Kai Großjohann
@ 2001-02-08  1:00   ` Daniel Pittman
  2001-02-08  1:18     ` Steven E. Harris
  2001-02-08 13:22     ` Kai Großjohann
  0 siblings, 2 replies; 21+ messages in thread
From: Daniel Pittman @ 2001-02-08  1:00 UTC (permalink / raw)


On 07 Feb 2001, Kai Großjohann wrote:
> On 29 Jan 2001, Steven E. Harris wrote:
> 
>> ,----[ RFC821 4.1.1 ]
>>| The mail data is terminated by a line containing only a period,
>>| that is the character sequence "<CRLF>.<CRLF>" (see Section 4.5.2
>>| on Transparency).  This is the end of mail data indication
>> `----
> 
> Actually, there is a contradiction right here, because `a line
> containing only a period' and the character sequence "<CRLF>.<CRLF>"
> are not the same thing.

Actually, in SMTP, there isn't. The sequence of characters described
*anywhere* in the SMTP stream is an unambiguous end-of-data marker.

Any other sequence of bytes, including "<LF>.<LF>" and the like, is
not, because in SMTP lines end with (and only with) "<CR><LF>".

> I presume that the first part is the relevant part and the
> "<CRLF>.<CRLF>" thing is just an explanation.  

Actually, it's a detailed, byte-by-byte explanation of exactly what a
"line containing only a period" looks like, including the framing bytes.

> Maybe the explanation makes it clear that there is no whitespace
> either before or after the period. If they had said the character
> sequence ".<CRLF>", then "foo.<CRLF>" might be a valid suffix for a
> message, which is not what the standard intends.

Specifically, the explanation makes it possible to write a
stream-oriented SMTP `DATA' reader easily. To be strictly conforming,
you read bytes and scan for the sequence described. When you see it and
it alone, terminate the reading of the `DATA' command.

Your guess, though, is right. The intention of the example is to make
certain that there is no ambiguity about exactly what "a line containing
only a period" is.
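
A byte-scanning reader of that kind can be sketched in a few lines of
Python. This is only an illustration: find_end_of_data is a
hypothetical helper, the whole stream is assumed to be in memory, and
dot-unstuffing is left out.

```python
TERMINATOR = b"\r\n.\r\n"   # <CRLF>.<CRLF>, framing bytes included


def find_end_of_data(stream: bytes) -> int:
    """Return the index just past the first <CRLF>.<CRLF>, or -1."""
    pos = stream.find(TERMINATOR)
    return -1 if pos < 0 else pos + len(TERMINATOR)


session = b"DATA\r\ntext\r\n.\r\nQUIT\r\n"
end = find_end_of_data(session)
print(session[:end])                     # b'DATA\r\ntext\r\n.\r\n'

# Bare <LF> line endings never match, since only <CR><LF> ends a line.
print(find_end_of_data(b"text\n.\n"))    # -1
```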

        Daniel

-- 
Language screens reality as a filter on a camera lens screens light waves.
        -- Casey Miller and Kate Swift, _Words and Women_ (1976)




* Re: SMTP question (not quite Gnus-related)
  2001-02-08  1:00   ` Daniel Pittman
@ 2001-02-08  1:18     ` Steven E. Harris
  2001-02-08  2:03       ` Daniel Pittman
  2001-02-08 13:22     ` Kai Großjohann
  1 sibling, 1 reply; 21+ messages in thread
From: Steven E. Harris @ 2001-02-08  1:18 UTC (permalink / raw)


Daniel Pittman <daniel@rimspace.net> writes:

> Specifically, the explanation makes it possible to write a
> stream-oriented SMTP `DATA' reader easily. To be strictly conforming,
> you read bytes and scan for the sequence described. When you see it and
> it alone, terminate the reading of the `DATA' command.

That's exactly what I've written - two of them, in fact. Each is based
on a different state machine, depending upon differing interpretations
of RFC821. My question still stands, though:

  What about an "empty" message?

Is this supposed to be okay?

,----
| DATA<CRLF>
| .<CRLF>
`----

If so, then where is the terminator for the "DATA" command string?
That first <CRLF> either belongs to the "DATA" as a command
terminator, or to the period as the start of the message
terminator. It can't be both. Most MTAs accept it, though.

I'm now thinking that a completely empty message like the one above
isn't a valid RFC822 message anyway, but it still seems like there's a
hole in RFC821 on this.

-- 
Steven E. Harris        :: steven.harris@tenzing.com
Tenzing                 :: http://www.tenzing.com




* Re: SMTP question (not quite Gnus-related)
  2001-02-08  1:18     ` Steven E. Harris
@ 2001-02-08  2:03       ` Daniel Pittman
  2001-02-08 13:24         ` Kai Großjohann
  0 siblings, 1 reply; 21+ messages in thread
From: Daniel Pittman @ 2001-02-08  2:03 UTC (permalink / raw)


On 07 Feb 2001, Steven E. Harris wrote:
> Daniel Pittman <daniel@rimspace.net> writes:
> 
>> Specifically, the explanation makes it possible to write a
>> stream-oriented SMTP `DATA' reader easily. To be strictly conforming,
>> you read bytes and scan for the sequence described. When you see it
>> and it alone, terminate the reading of the `DATA' command.
> 
> That's exactly what I've written - two of them, in fact. Each is based
> on a different state machine, depending upon differing interpretations
> of RFC821. My question still stands, though:
> 
>   What about an "empty" message?
> 
> Is this supposed to be okay?

It is fine by the fragment you posted. By my reading of it, anyway. 

> ,----
> | DATA<CRLF>
> | .<CRLF>
> `----

That final line is a period on an otherwise empty line, the defined
terminator. Remember that this is legal:

,----
| DATA<crlf>
| text<crlf>
| .<crlf>
`----

So, clearly the initial <crlf> does not need to *exclusively* signal the
end-of-text. As such, making it do double duty for opening and closing
the DATA command seems legal to me.

> If so, then where is the terminator for the "DATA" command string?
> That first <CRLF> either belongs to the "DATA" as a command
> terminator, or to the period as the start of the message
> terminator. It can't be both. 

I don't see any indication in the RFC that the <crlf> sequence at the
end-of-line cannot serve double duty as the start of the indicated octet
sequence for the end-of-data indicator.

> Most MTAs accept it, though.
> 
> I'm now thinking that a completely empty message like the one above
> isn't a valid RFC822 message anyway, but it still seems like there's a
> hole in RFC821 on this.

*shrug*  I don't think so, you do. That makes it a hole. ;)

        Daniel

-- 
We must put aside the weapons of our minds, 
cross the no man's land come with empty hands. 
Overcome the fear to drop the last defense, 
speak forbidden words reach out beyond ourselves.
        -- Covenant, _Wall Of Sound_




* Re: SMTP question (not quite Gnus-related)
  2001-02-08  1:00   ` Daniel Pittman
  2001-02-08  1:18     ` Steven E. Harris
@ 2001-02-08 13:22     ` Kai Großjohann
  2001-02-08 17:18       ` Steven E. Harris
  1 sibling, 1 reply; 21+ messages in thread
From: Kai Großjohann @ 2001-02-08 13:22 UTC (permalink / raw)
  Cc: ding

On 08 Feb 2001, Daniel Pittman wrote:

> On 07 Feb 2001, Kai Großjohann wrote:
>> 
>> Actually, there is a contradiction right here, because `a line
>> containing only a period' and the character sequence
>> "<CRLF>.<CRLF>" are not the same thing.
> 
> Actually, in SMTP, there isn't. The sequence of characters described
> *anywhere* in the SMTP stream is an unambiguous end-of-data marker.

In my understanding, a line contains some characters (possibly no
characters) and then an end of line indicator.  By this reading,
"<CRLF>.<CRLF>" is *two* lines: an empty line and a line containing
only a dot.

The dot-stuffing algorithm talks about lines in this way: some
characters (or octets) followed by an EOL indicator.

In "DATA<CRLF>foo<CRLF>.<CRLF>", does the message contain "foo" or
"foo<CRLF>"?  Or even "<CRLF>foo<CRLF>"?

kai
-- 
Be indiscrete.  Do it continuously.




* Re: SMTP question (not quite Gnus-related)
  2001-02-08  2:03       ` Daniel Pittman
@ 2001-02-08 13:24         ` Kai Großjohann
  2001-02-08 17:11           ` Steven E. Harris
  2001-02-09  0:42           ` Daniel Pittman
  0 siblings, 2 replies; 21+ messages in thread
From: Kai Großjohann @ 2001-02-08 13:24 UTC (permalink / raw)
  Cc: ding

On 08 Feb 2001, Daniel Pittman wrote:

> It is fine by the fragment you posted. By my reading of it, anyway. 
> 
>> ,----
>> | DATA<CRLF>
>> | .<CRLF>
>> `----
> 
> That final line is a period on an otherwise empty line, the defined
> terminator.

But one message earlier, you said that "<CRLF>.<CRLF>" is the
terminator.  Now you are saying ".<CRLF>" is the terminator.  Which
one is it?  (I vote for ".<CRLF>", opposing the example in the RFC.)

Presumably, an SMTP server reading the DATA command will also read the
<CRLF> that follows it.  Hence, the <CRLF> is not available for the
subsequent end of data indication anymore.

kai
-- 
Be indiscrete.  Do it continuously.




* Re: SMTP question (not quite Gnus-related)
  2001-02-08 13:24         ` Kai Großjohann
@ 2001-02-08 17:11           ` Steven E. Harris
  2001-02-08 17:25             ` Paul Jarc
  2001-02-08 17:33             ` Kai Großjohann
  2001-02-09  0:42           ` Daniel Pittman
  1 sibling, 2 replies; 21+ messages in thread
From: Steven E. Harris @ 2001-02-08 17:11 UTC (permalink / raw)


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> Presumably, an SMTP server reading the DATA command will also read the
> <CRLF> that follows it.  Hence, the <CRLF> is not available for the
> subsequent end of data indication anymore.

Yes! That's exactly my point! The <CRLF> following DATA is already
"gone" by the time my stream starts eating bytes, so I have to either
start assuming I've just seen it, or start waiting for it.
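
Those two options can be compared in a minimal Python sketch. The
helper names are hypothetical, and dot-unstuffing is ignored; the only
point is what each option does with an empty message.

```python
TERMINATOR = b"\r\n.\r\n"   # the five-byte sequence from RFC821 4.1.1


def ends_waiting_for_crlf(after_data: bytes) -> bool:
    """Search only the bytes that arrive after "DATA<CRLF>" is consumed."""
    return TERMINATOR in after_data


def ends_assuming_crlf_seen(after_data: bytes) -> bool:
    """Pretend the command's <CRLF> was just seen by prepending it."""
    return TERMINATOR in b"\r\n" + after_data


empty = b".\r\n"            # what the filter sees for DATA<CRLF>.<CRLF>
print(ends_waiting_for_crlf(empty))     # False
print(ends_assuming_crlf_seen(empty))   # True
```

Both functions agree on a non-empty body such as b"foo\r\n.\r\n"; they
differ only when the period is the very first byte of the data.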

If, as Kai suggests, we should take the terminator to be only
".<CRLF>", then that leaves the question about mail messages that
don't end with a <CRLF>. If the protocol requires an insertion of an
extra <CRLF> to get that period on a line of its own, then really that
extra <CRLF> should be stripped as part of the protocol decoding.

It makes for a messy state machine if you're trying to catch a
five-byte terminator, but want to permit a three-byte terminator right
from your start state. I have it worked out, but it's ugly. RFC821 has
been around long enough that I figured there must have been some
well-known discussion about this.

-- 
Steven E. Harris        :: steven.harris@tenzing.com
Tenzing                 :: http://www.tenzing.com




* Re: SMTP question (not quite Gnus-related)
  2001-02-08 13:22     ` Kai Großjohann
@ 2001-02-08 17:18       ` Steven E. Harris
  2001-02-08 17:37         ` Kai Großjohann
  0 siblings, 1 reply; 21+ messages in thread
From: Steven E. Harris @ 2001-02-08 17:18 UTC (permalink / raw)


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

[...]

> By this reading, "<CRLF>.<CRLF>" is *two* lines: an empty line and a
> line containing only a dot.

Well, I think that there can be line content before the first <CRLF>
and still have the <CRLF>.<CRLF> be a valid terminator.

> In "DATA<CRLF>foo<CRLF>.<CRLF>", does the message contain "foo" or
> "foo<CRLF>"?  Or even "<CRLF>foo<CRLF>"?

This is the same question I asked in my previous post a few minutes
ago. I think that the message just contains "foo", because there must
be some way for you to send "foo" and get "foo" back out on the other
side. If you send "foo" and you get back "foo<CRLF>", that would seem
like an intolerable asymmetry in a "transparency" mechanism.

That would then raise another question: If you meant to send
"foo<CRLF>", then would the SMTP encoding be "foo<CRLF><CRLF>.<CRLF>"
or just "foo<CRLF>.<CRLF>"? Who owns that first <CRLF>?!?

I hope it's clear that I'm not trying to nitpick. I must have defined
behavior for this stream, and it's supposed to work "transparently."

-- 
Steven E. Harris        :: steven.harris@tenzing.com
Tenzing                 :: http://www.tenzing.com




* Re: SMTP question (not quite Gnus-related)
  2001-02-08 17:11           ` Steven E. Harris
@ 2001-02-08 17:25             ` Paul Jarc
  2001-02-08 17:30               ` Paul Jarc
  2001-02-08 18:02               ` Steven E. Harris
  2001-02-08 17:33             ` Kai Großjohann
  1 sibling, 2 replies; 21+ messages in thread
From: Paul Jarc @ 2001-02-08 17:25 UTC (permalink / raw)


"Steven E. Harris" <steven.harris@tenzing.com> writes:
> It makes for a messy state machine if you're trying to catch a
> five-byte terminator, but want to permit a three-byte terminator right
> from your start state.

Use a different start state.  I.e., start in the state you're normally
in after seeing <CRLF>.  The first <CRLF> in <CRLF>.<CRLF> is, I
believe, considered part of the message.  (Thus, SMTP makes it
impossible to send a message that ends with text other than a line
ending.)


paul




* Re: SMTP question (not quite Gnus-related)
  2001-02-08 17:25             ` Paul Jarc
@ 2001-02-08 17:30               ` Paul Jarc
  2001-02-08 18:02               ` Steven E. Harris
  1 sibling, 0 replies; 21+ messages in thread
From: Paul Jarc @ 2001-02-08 17:30 UTC (permalink / raw)


I wrote:
> The first <CRLF> in <CRLF>.<CRLF> is, I believe, considered part of
> the message.

Unless it's the one immediately following "DATA", that is.  And yes,
this does mean that SMTP is "broken" in that it can't encode arbitrary
data, but remember that it was designed simply for plain,
human-readable text, and not for all the things we use it for today.


paul




* Re: SMTP question (not quite Gnus-related)
  2001-02-08 17:11           ` Steven E. Harris
  2001-02-08 17:25             ` Paul Jarc
@ 2001-02-08 17:33             ` Kai Großjohann
  2001-02-08 18:07               ` Steven E. Harris
  1 sibling, 1 reply; 21+ messages in thread
From: Kai Großjohann @ 2001-02-08 17:33 UTC (permalink / raw)
  Cc: ding

On 08 Feb 2001, Steven E. Harris wrote:

> If, as Kai suggests, we should take the terminator to only be
> ".<CRLF>," then that leaves the question about mail messages that
> don't end with a <CRLF>.

They don't exist.  Here's part of RFC822:

/----
|           A message consists of header fields and, optionally, a body.
|      The  body  is simply a sequence of lines containing ASCII charac-
|      ters.  It is separated from the headers by a null line  (i.e.,  a
|      line with nothing preceding the CRLF).
\----

A line always includes the EOL indicator, and the body is a sequence
of lines, so the last line always includes the EOL indicator.

No?

kai
-- 
Be indiscrete.  Do it continuously.




* Re: SMTP question (not quite Gnus-related)
  2001-02-08 17:18       ` Steven E. Harris
@ 2001-02-08 17:37         ` Kai Großjohann
  0 siblings, 0 replies; 21+ messages in thread
From: Kai Großjohann @ 2001-02-08 17:37 UTC (permalink / raw)
  Cc: ding

On 08 Feb 2001, Steven E. Harris wrote:
> Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
> 
> [...]
> 
>> By this reading, "<CRLF>.<CRLF>" is *two* lines: an empty line and
>> a line containing only a dot.
> 
> Well, I think that there can be line content before the first <CRLF>
> and still have the <CRLF>.<CRLF> be a valid terminator.

Yes, yes.  This misunderstanding is the reason why they included the
first <CRLF>.  If you write "foo<CRLF>", do you have a line which
contains the 3 characters "foo" plus an EOL?  Well, this depends on
what comes before that.  If there is an "x" before this, the line
contains 4 characters plus EOL.

But I hope you understood what I meant: "<CRLF>.<CRLF>" is an empty
line plus a 1-char line only if it is the start of the whole thing, or
follows a line.

>> In "DATA<CRLF>foo<CRLF>.<CRLF>", does the message contain "foo" or
>> "foo<CRLF>"?  Or even "<CRLF>foo<CRLF>"?
> 
> This is the same question I asked in my previous post a few minutes
> ago. I think that the message just contains "foo," because there
> must be some way for you to send "foo" and get "foo" back out on the
> other side. If you send "foo" and you get back "foo<CRLF>," that
> would seem like an intolerable asymmetry in a "transparency"
> mechanism.
> 
> That would then raise another question: If you meant to send
> "foo<CRLF>," then would the SMTP encoding be
> "foo<CRLF><CRLF>.<CRLF>" or just "foo<CRLF>.<CRLF>"? Who owns that
> first <CRLF>?!?
> 
> I hope it's clear that I'm not trying to nitpick. I must have
> defined behavior for this stream, and it's supposed to work
> "transparently."

Well, RFC 822 says that the body is a sequence of lines, so the body
always ends with a line, and a line ends with an EOL.  So the body
ends with an EOL.
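
As a small illustration of that reading, here is a Python sketch that
splits a body into lines, each carrying its own <CRLF>, and rejects
anything that is not a sequence of lines. split_lines is a made-up
name, not something from RFC 822.

```python
def split_lines(body: bytes):
    """Split a body into lines that each include their <CRLF>.

    Raises ValueError when the last line has no <CRLF>, i.e. when the
    body is not a sequence of lines in the sense described above.
    """
    parts = body.split(b"\r\n")
    if parts[-1] != b"":
        raise ValueError("last line has no <CRLF>")
    return [part + b"\r\n" for part in parts[:-1]]


print(split_lines(b""))               # []
print(split_lines(b"foo\r\nbar\r\n")) # [b'foo\r\n', b'bar\r\n']
```

An empty body is the empty sequence of lines, so it passes; b"foo"
without a trailing <CRLF> raises.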

If you want to transmit something which is not a sequence of lines,
you have to add something on top of SMTP.  Maybe MIME is your friend?

kai
-- 
Be indiscrete.  Do it continuously.




* Re: SMTP question (not quite Gnus-related)
  2001-02-08 17:25             ` Paul Jarc
  2001-02-08 17:30               ` Paul Jarc
@ 2001-02-08 18:02               ` Steven E. Harris
  2001-02-08 18:20                 ` Paul Jarc
  2001-02-09 12:09                 ` Kai Großjohann
  1 sibling, 2 replies; 21+ messages in thread
From: Steven E. Harris @ 2001-02-08 18:02 UTC (permalink / raw)


prj@po.cwru.edu (Paul Jarc) writes:

> Use a different start state.  I.e., start in the state you're normally
> in after seeing <CRLF>.  The first <CRLF> in <CRLF>.<CRLF> is, I
> believe, considered part of the message.

This is maddening! If the first <CRLF> is part of the body (don't get
me wrong - I like the idea), then they should not define the
*terminator* to be "<CRLF>.<CRLF>". They should say something more
like, "The terminator is .<CRLF>, but only if immediately preceded by
<CRLF> or as the first three bytes of the stream."

> (Thus, SMTP makes it impossible to send a message that ends with
> text other than a line ending.)

Okay, so then any sending MUA effectively augments your message if
it's lacking the final <CRLF>? I didn't know that, but it could make
sense.

-- 
Steven E. Harris        :: steven.harris@tenzing.com
Tenzing                 :: http://www.tenzing.com




* Re: SMTP question (not quite Gnus-related)
  2001-02-08 17:33             ` Kai Großjohann
@ 2001-02-08 18:07               ` Steven E. Harris
  2001-02-09 12:11                 ` Kai Großjohann
  0 siblings, 1 reply; 21+ messages in thread
From: Steven E. Harris @ 2001-02-08 18:07 UTC (permalink / raw)


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> A line always includes the EOL indicator, and the body is a sequence
> of lines, so the last line always includes the EOL indicator.
> 
> No?

Okay, that sounds reasonable. So an empty body doesn't *need*
"<CRLF>.<CRLF>" to terminate. It only needs ".<CRLF>" to terminate. If
the body isn't empty, we leave the first <CRLF> as part of the body
and discard the ".<CRLF>". That would match the behavior I've seen on
MTAs I've tested with. This interpretation makes sense, but it's not
immediately apparent from reading RFC821.

-- 
Steven E. Harris        :: steven.harris@tenzing.com
Tenzing                 :: http://www.tenzing.com




* Re: SMTP question (not quite Gnus-related)
  2001-02-08 18:02               ` Steven E. Harris
@ 2001-02-08 18:20                 ` Paul Jarc
  2001-02-09 12:09                 ` Kai Großjohann
  1 sibling, 0 replies; 21+ messages in thread
From: Paul Jarc @ 2001-02-08 18:20 UTC (permalink / raw)


"Steven E. Harris" <steven.harris@tenzing.com> writes:
> prj@po.cwru.edu (Paul Jarc) writes:
> > Use a different start state.  I.e., start in the state you're normally
> > in after seeing <CRLF>.  The first <CRLF> in <CRLF>.<CRLF> is, I
> > believe, considered part of the message.
> 
> This is maddening! If the first <CRLF> is part of the body (don't get
> me wrong - I like the idea), then they should not define the
> *terminator* to be "<CRLF>.<CRLF>."

I don't think they intended to say that <CRLF>.<CRLF> is itself the
terminator.  I think the RFC was written by people who already knew
what SMTP looked like, and who didn't try to get inside the heads of
people who didn't.  So it doesn't quite tell you what SMTP is unless
you already know.  If you said that SMTP was badly designed and its
specification confusingly worded, I would agree with you.

> They should say something more like, "The terminator is .<CRLF>, but
> only if immediately preceded by <CRLF> or as the first three bytes
> of the stream."

Yes, that would be clearer, and would match (I think) the *intended*
meaning of the existing words.

> Okay, so then any sending MUA effectively augments your message if
> it's lacking the final <CRLF>? I didn't know that, but it could make
> sense.

Yes.  I don't know that the RFC says a sender must do this, but this
(or perhaps rejection) is the only sane way to handle such a message.
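
For the "augment" option, a sender-side encoder might look like the
following Python sketch. The name encode_data is hypothetical. The
period doubling is rule 1 of RFC821 4.5.2; appending a missing final
<CRLF> is the assumption discussed here, not something the RFC is
known to require.

```python
def encode_data(body: bytes) -> bytes:
    """Return the bytes to send after the DATA command's <CRLF>."""
    # Assumption from this thread: complete an unterminated last line.
    if body and not body.endswith(b"\r\n"):
        body += b"\r\n"
    out = bytearray()
    # split() leaves one empty trailing element after the final <CRLF>.
    for line in body.split(b"\r\n")[:-1]:
        if line.startswith(b"."):
            out += b"."              # dot-stuffing (4.5.2 rule 1)
        out += line + b"\r\n"
    out += b".\r\n"                  # the line containing only a period
    return bytes(out)


print(encode_data(b""))          # b'.\r\n'
print(encode_data(b"foo"))       # b'foo\r\n.\r\n'
print(encode_data(b".x\r\ny"))   # b'..x\r\ny\r\n.\r\n'
```

Note that b"foo" and b"foo\r\n" encode to the same bytes, which is the
loss of information mentioned earlier in the thread.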


paul




* Re: SMTP question (not quite Gnus-related)
  2001-02-08 13:24         ` Kai Großjohann
  2001-02-08 17:11           ` Steven E. Harris
@ 2001-02-09  0:42           ` Daniel Pittman
  1 sibling, 0 replies; 21+ messages in thread
From: Daniel Pittman @ 2001-02-09  0:42 UTC (permalink / raw)
  Cc: ding

On 08 Feb 2001, Kai Großjohann wrote:
> On 08 Feb 2001, Daniel Pittman wrote:
> 
>> It is fine by the fragment you posted. By my reading of it, anyway. 
>> 
>>> ,----
>>> | DATA<CRLF>
>>> | .<CRLF>
>>> `----
>> 
>> That final line is a period on an otherwise empty line, the defined
>> terminator.
> 
> But one message earlier, you said that "<CRLF>.<CRLF>" is the
> terminator.  Now you are saying ".<CRLF>" is the terminator.  Which
> one is it?  (I vote for ".<CRLF>", opposing the example in the RFC.)

Ah. I have the same unclearness as the RFC. So:

DATA <CR> <LF> . <CR> <LF>
--------------                  This is the DATA command.
     ---------------------      This is the end-of-data marker.

> Presumably, an SMTP server reading the DATA command will also read the
> <CRLF> that follows it.  Hence, the <CRLF> is not available for the
> subsequent end of data indication anymore.

That's not how I read the RFC. Specifically, I see the notes on
'<crlf>.<crlf>' as being a hint about what to look for.

Anyway, if you are writing an SMTP sender, avoid sending this (just in
case) and if you write an SMTP receiver, accept it.

Because there *is* ambiguity in the RFC - if there wasn't, we wouldn't
be debating it. ;)

        Daniel

-- 
There is no happiness in having or in getting, but only in giving.
        -- Henry Drummond




* Re: SMTP question (not quite Gnus-related)
  2001-02-08 18:02               ` Steven E. Harris
  2001-02-08 18:20                 ` Paul Jarc
@ 2001-02-09 12:09                 ` Kai Großjohann
  2001-02-09 17:33                   ` Steven E. Harris
  1 sibling, 1 reply; 21+ messages in thread
From: Kai Großjohann @ 2001-02-09 12:09 UTC (permalink / raw)
  Cc: ding

On 08 Feb 2001, Steven E. Harris wrote:

> This is maddening! If the first <CRLF> is part of the body (don't
> get me wrong - I like the idea), then they should not define the
> *terminator* to be "<CRLF>.<CRLF>." They should say something more
> like, "The terminator is .<CRLF>, but only if immediately preceded
> by <CRLF> or as the first three bytes of the stream."

I think they did.  Note that they said that the terminator is a line
containing only one character, a period.  Only as an example did they
provide the "<CRLF>.<CRLF>" character sequence.

kai
-- 
Be indiscrete.  Do it continuously.




* Re: SMTP question (not quite Gnus-related)
  2001-02-08 18:07               ` Steven E. Harris
@ 2001-02-09 12:11                 ` Kai Großjohann
  2001-02-09 17:26                   ` Steven E. Harris
  0 siblings, 1 reply; 21+ messages in thread
From: Kai Großjohann @ 2001-02-09 12:11 UTC (permalink / raw)
  Cc: ding

On 08 Feb 2001, Steven E. Harris wrote:

> Okay, that sounds reasonable. So an empty body doesn't *need*
> "<CRLF>.<CRLF>" to terminate. It only needs ".<CRLF>" to
> terminate. If the body isn't empty, we leave the first <CRLF> as
> part of the body and discard the ".<CRLF>". That would match the
> behavior I've seen on MTAs I've tested with. This interpretation
> makes sense, but it's not immediately apparent from reading RFC821.

I don't think you should treat the <CRLF> differently for empty
messages.  Here's how to send an empty message:

DATA<CRLF>
.<CRLF>

Here's a nonempty message:

DATA<CRLF>
foo<CRLF>
.<CRLF>

The message content is one line, containing the characters "foo" plus
the EOL that comes after it.
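
Both transcripts can be run through a line-oriented reader; here is a
minimal Python sketch. read_body is a hypothetical name, and an
io.BytesIO object stands in for the connection. One simplification to
be aware of: readline() also stops at a bare <LF>, which a strict
receiver would not treat as a line ending.

```python
import io


def read_body(conn):
    """Collect body lines until a line that is exactly ".<CRLF>"."""
    body = []
    for line in iter(conn.readline, b""):
        if line == b".\r\n":
            return b"".join(body)
        if line.startswith(b"."):
            line = line[1:]          # undo dot-stuffing
        body.append(line)
    raise ValueError("connection closed before the terminator")


for session in (b"DATA\r\n.\r\n", b"DATA\r\nfoo\r\n.\r\n"):
    conn = io.BytesIO(session)
    command = conn.readline()        # the command reader eats DATA<CRLF>
    print(command, read_body(conn))
```

The empty message yields b'' and the second one yields b'foo\r\n', so
the <CRLF> is handled the same way in both cases.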

kai
-- 
Be indiscrete.  Do it continuously.




* Re: SMTP question (not quite Gnus-related)
  2001-02-09 12:11                 ` Kai Großjohann
@ 2001-02-09 17:26                   ` Steven E. Harris
  0 siblings, 0 replies; 21+ messages in thread
From: Steven E. Harris @ 2001-02-09 17:26 UTC (permalink / raw)


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> I don't think you should treat the <CRLF> differently for empty
> messages.  Here's how to send an empty message:
> 
> DATA<CRLF>
> .<CRLF>
> 
> Here's a nonempty message:
> 
> DATA<CRLF>
> foo<CRLF>
> .<CRLF>
> 
> The message content is one line, containing the characters "foo" plus
> the EOL that comes after it.

That's what I was trying to say, but I'm stuck seeing it from the
point of view of my problem space. For now we're not sending messages;
we're only consuming them. I'm writing an SMTP "DATA" stream filter
that only gets its first byte *after* the "DATA<CRLF>" is consumed by
another part of the system. One part dispatches on "command lines," so
after seeing the "DATA" command on a line, it fires off the "DATA
handler" part, which is what I'm refining. As you can see, I never get
the <CRLF> after the "DATA" command. That's okay, though, with our
latest consensus on RFC821.

My solution (based upon this interpretation) uses a state machine
that's sensitive to the "beginning of line" state. Only there do we
consider whether we're encountering a terminator. The saving grace is
that the first <CRLF> in the "<CRLF>.<CRLF>" terminator can be left as
part of the body. That means that the only role the first <CRLF> plays
is to return the state machine to the "beginning of line" state.
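
A rough Python rendering of such a machine follows. It is a sketch
under the interpretation above, not the Java code mentioned below:
the state names and the decode_data function are invented for
illustration, and the whole stream is assumed to be in memory.

```python
BOL, TEXT, CR, DOT, DOT_CR = range(5)   # BOL = beginning of line


def decode_data(stream: bytes):
    """Decode the bytes that follow "DATA<CRLF>".

    Returns (body, rest): the body with dot-stuffing removed, and any
    bytes after the terminating ".<CRLF>". The machine starts in BOL,
    as if a <CRLF> had just been seen.
    """
    state = BOL
    body = bytearray()
    for i in range(len(stream)):
        ch = stream[i:i + 1]
        if state == DOT_CR:
            if ch == b"\n":
                return bytes(body), stream[i + 1:]
            body += b"\r"            # unwind the held <CR>; dot is dropped
            state = TEXT
        elif state == DOT:
            if ch == b"\r":
                state = DOT_CR       # hold the <CR>, might be the end
                continue
            state = TEXT             # leading dot dropped, keep ch
        elif state == BOL:
            if ch == b".":
                state = DOT          # hold the dot until we know more
                continue
            state = TEXT
        body += ch                   # ordinary byte, including <CRLF>
        if ch == b"\r":
            state = CR
        elif ch == b"\n" and state == CR:
            state = BOL              # <CRLF> stays in the body
        else:
            state = TEXT
    raise ValueError("stream ended before the terminator")


print(decode_data(b".\r\n"))                   # (b'', b'')
print(decode_data(b"foo\r\n.\r\nQUIT\r\n"))    # (b'foo\r\n', b'QUIT\r\n')
```

Because the <CRLF> before the period has already been emitted as body
text, the only byte ever held back for unwinding is the <CR> after a
leading period.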

I had two other interpretations of RFC821 and two other corresponding
state machine implementations. Just for completeness, I'll explain the
differences from our latest consensus.

One of them insisted upon "<CRLF>.<CRLF>" as the complete terminator,
even from the start state. The first <CRLF> is considered part of the
terminator, not the body, so we had a four-byte lookahead scheme. It
only requires caching one byte for unwinding. The bad part of this
scheme is that it doesn't tolerate "DATA<CRLF>.<CRLF>", because it's
really looking for "DATA<CRLF><CRLF>.<CRLF>".

The second one was similar to the last one described, but tolerated a
three-byte terminator (".<CRLF>") only from the start state. It could
therefore accept "DATA<CRLF>.<CRLF>" as a valid stream. The state
machine for this version was an augmented version of the prior, with a
few extra states for the potential initial three-byte terminator. It
was tractable, but a little ugly.

It's apparent now that all of this discussion would have been much
simpler if:

1. I wasn't faced with the problem that the first <CRLF> after the
   "DATA" string is clipped from my view.

2. I hadn't assumed that the first <CRLF> in "<CRLF>.<CRLF>" is part
   of the terminator rather than the body. This assumption required
   unwinding the first <CRLF> in an incomplete terminator.

One poster suggested simply "starting out as though you've just seen
the first <CRLF>". With the first <CRLF> held back as part of a
possible terminator, the problem with that approach is unwinding. If
you saw ".<CR>a" as the first three bytes of the stream, you'd need to
unwind just the <CR>, then emit the "a". As an alternate scenario,
consider that you're midway through a non-empty body and you encounter
"<CRLF>.<CR>a". You then need to unwind "<CRLF><CR>", then emit the
"a". That unwinding involves four steps rather than the two in the
first scenario. It's a different path through the state machine, so
you need either different states or some transition control flags to
guide you through the different paths.

I can share this code (it's Java) if anyone is interested in seeing
some of these variations.

-- 
Steven E. Harris        :: steven.harris@tenzing.com
Tenzing                 :: http://www.tenzing.com




* Re: SMTP question (not quite Gnus-related)
  2001-02-09 12:09                 ` Kai Großjohann
@ 2001-02-09 17:33                   ` Steven E. Harris
  0 siblings, 0 replies; 21+ messages in thread
From: Steven E. Harris @ 2001-02-09 17:33 UTC (permalink / raw)


Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:

> Only as an example did they provide the "<CRLF>.<CRLF>" character
> sequence.

Re-reading it with this point of view, I can see that. The one
"example" that I found frustrating was the last one in 3.1:

,----[ RFC821 (section 3.1) ]
|  S: MAIL FROM:<Smith@Alpha.ARPA>
|  R: 250 OK
| 
|  S: RCPT TO:<Jones@Beta.ARPA>
|  R: 250 OK
| 
|  S: RCPT TO:<Green@Beta.ARPA>
|  R: 550 No such user here
| 
|  S: RCPT TO:<Brown@Beta.ARPA>
|  R: 250 OK
| 
|  S: DATA
|  R: 354 Start mail input; end with <CRLF>.<CRLF>
|  S: Blah blah blah...
|  S: ...etc. etc. etc.
|  S: <CRLF>.<CRLF>
|  R: 250 OK
`----

See that "S: <CRLF>.<CRLF>" near the end? I kept thinking, "Isn't the
first <CRLF> implicit in the fact that it's a new line after the
'etc.' line? Hmm, maybe they really do mean that you need all five
bytes as the terminator." Our discussion now makes me see this example
as an error, perhaps due to being overly verbose.

-- 
Steven E. Harris        :: steven.harris@tenzing.com
Tenzing                 :: http://www.tenzing.com



