The Unix Heritage Society mailing list
* [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.)
@ 2024-05-20 13:06 Douglas McIlroy
  2024-05-20 13:14 ` [TUHS] " arnold
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Douglas McIlroy @ 2024-05-20 13:06 UTC (permalink / raw)
  To: TUHS main list

I'm surprised by nonchalance about bad inputs evoking bad program behavior.
That attitude may have been excusable 50 years ago. By now, though, we have
seen so much malicious exploitation of open avenues of "undefined behavior"
that we can no longer ignore bugs that "can't happen when using the tool
correctly". Mature software should not brook incorrect usage.

"Bailing out near line 1" is a sign of defensive precautions. Crashes and
unjustified output betray their absence.

I commend attention to the LangSec movement, which advocates for rigorously
enforced separation between legal and illegal inputs.

Doug


* [TUHS] Re: A fuzzy awk. (Was: The 'usage: ...' message.)
  2024-05-20 13:06 [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) Douglas McIlroy
@ 2024-05-20 13:14 ` arnold
  2024-05-20 14:00   ` G. Branden Robinson
  2024-05-20 13:25 ` Chet Ramey
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: arnold @ 2024-05-20 13:14 UTC (permalink / raw)
  To: tuhs, douglas.mcilroy

Perhaps I should not respond to this immediately. But:

Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:

> I'm surprised by nonchalance about bad inputs evoking bad program behavior.
> That attitude may have been excusable 50 years ago. By now, though, we have
> seen so much malicious exploitation of open avenues of "undefined behavior"
> that we can no longer ignore bugs that "can't happen when using the tool
> correctly". Mature software should not brook incorrect usage.

It's not nonchalance, not at all!

The current behavior is to die on the first syntax error, instead of
trying to be "helpful" by continuing to try to parse the program in the
hope of reporting other errors.

> "Bailing out near line 1" is a sign of defensive precautions. Crashes and
> unjustified output betray their absence.

The crashes came because errors cascaded.  I don't see a reason to spend
valuable, *personal* time on adding defenses *where they aren't needed*.

A steel door on your bedroom closet does no good if your front door
is made of balsa wood. My change was to stop the badness at the
front door.

> I commend attention to the LangSec movement, which advocates for rigorously
> enforced separation between legal and illegal inputs.

Illegal input, in gawk, as far as I know, should always cause a syntax
error report and an immediate exit.

If it doesn't, that is a bug, and I'll be happy to try to fix it.

I hope that clarifies things.

Arnold


* [TUHS] Re: A fuzzy awk. (Was: The 'usage: ...' message.)
  2024-05-20 13:06 [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) Douglas McIlroy
  2024-05-20 13:14 ` [TUHS] " arnold
@ 2024-05-20 13:25 ` Chet Ramey
  2024-05-20 13:41   ` [TUHS] Re: A fuzzy awk Ralph Corderoy
  2024-05-20 13:54 ` Ralph Corderoy
  2024-05-20 16:06 ` [TUHS] Re: A fuzzy awk. (Was: The 'usage: ...' message.) Paul Winalski
  3 siblings, 1 reply; 18+ messages in thread
From: Chet Ramey @ 2024-05-20 13:25 UTC (permalink / raw)
  To: Douglas McIlroy, TUHS main list


On 5/20/24 9:06 AM, Douglas McIlroy wrote:
> I'm surprised by nonchalance about bad inputs evoking bad program behavior.

I think the claim is that it's better to stop immediately with an error
on invalid input rather than guess at the user's intent and try to go on.


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/



* [TUHS] Re: A fuzzy awk.
  2024-05-20 13:25 ` Chet Ramey
@ 2024-05-20 13:41   ` Ralph Corderoy
  2024-05-20 14:26     ` Chet Ramey
  2024-05-22 13:44     ` arnold
  0 siblings, 2 replies; 18+ messages in thread
From: Ralph Corderoy @ 2024-05-20 13:41 UTC (permalink / raw)
  To: TUHS main list

Hi Chet,

> Doug wrote:
> > I'm surprised by nonchalance about bad inputs evoking bad program
> > behavior.
>
> I think the claim is that it's better to stop immediately with an
> error on invalid input rather than guess at the user's intent and try
> to go on.

That aside, having made the decision to patch up the input so more
punched cards are consumed, the patch should be bug free.

Say it's inserting a semicolon token for pretence.  It should have
initialised source-file locations just as if it were real.  Not an
uninitialised pointer to a source filename so a later dereference
failed.

I can see an avalanche of errors in an earlier gawk caused problems, but
each time there would have been a first patch of the input which made
a mistake causing the pebble to start rolling.  My understanding is that
there were potentially a lot of these and rather than fix them it was
more productive of the limited time to stop patching the input.  Then
the code which patched could be deleted, getting rid of the buggy bits
along the way?

-- 
Cheers, Ralph.


* [TUHS] Re: A fuzzy awk.
  2024-05-20 13:06 [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) Douglas McIlroy
  2024-05-20 13:14 ` [TUHS] " arnold
  2024-05-20 13:25 ` Chet Ramey
@ 2024-05-20 13:54 ` Ralph Corderoy
  2024-05-20 15:39   ` [TUHS] OT: LangSec (Re: A fuzzy awk.) Åke Nordin
  2024-05-20 16:06 ` [TUHS] Re: A fuzzy awk. (Was: The 'usage: ...' message.) Paul Winalski
  3 siblings, 1 reply; 18+ messages in thread
From: Ralph Corderoy @ 2024-05-20 13:54 UTC (permalink / raw)
  To: TUHS main list

Hi,

Doug wrote:
> I commend attention to the LangSec movement, which advocates for
> rigorously enforced separation between legal and illegal inputs.

    https://langsec.org

   ‘The Language-theoretic approach (LangSec) regards the Internet
    insecurity epidemic as a consequence of ‘ad hoc’ programming of
    input handling at all layers of network stacks, and in other kinds
    of software stacks.  LangSec posits that the only path to
    trustworthy software that takes untrusted inputs is treating all
    valid or expected inputs as a formal language, and the respective
    input-handling routines as a ‘recognizer’ for that language.
    The recognition must be feasible, and the recognizer must match the
    language in required computation power.

   ‘When input handling is done in ad hoc way, the ‘de facto’
    recognizer, i.e. the input recognition and validation code ends up
    scattered throughout the program, does not match the programmers'
    assumptions about safety and validity of data, and thus provides
    ample opportunities for exploitation.  Moreover, for complex input
    languages the problem of full recognition of valid or expected
    inputs may be *undecidable*, in which case no amount of
    input-checking code or testing will suffice to secure the program.
    Many popular protocols and formats fell into this trap, the
    empirical fact with which security practitioners are all too
    familiar.

   ‘LangSec helps draw the boundary between protocols and API designs
    that can and cannot be secured and implemented securely, and charts
    a way to building truly trustworthy protocols and systems.  A longer
    summary of LangSec in this USENIX Security BoF hand-out, and in the
    talks, articles, and papers below.’

That does look interesting; I'd not heard of it.

-- 
Cheers, Ralph.
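
The idea of treating valid input as a formal language with a single
recognizer can be illustrated with a toy example.  The sketch below is in
Python and everything in it is hypothetical (the KEY=VALUE line format and
the names RECORD and parse_config); it is not taken from langsec.org.  The
whole input is recognized against one grammar before any of it is acted
upon, rather than being validated piecemeal throughout the program.

```python
import re

# The entire (hypothetical) input language: zero or more lines of the form
# KEY=VALUE, where KEY is uppercase letters/underscores and VALUE is 1-5
# ASCII digits.
RECORD = re.compile(r"[A-Z_]+=[0-9]{1,5}")


def parse_config(text: str) -> dict:
    """Recognize the whole input first; only then build a result."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()                      # allow one trailing newline
    for lineno, line in enumerate(lines, 1):
        if RECORD.fullmatch(line) is None:
            raise ValueError(f"line {lineno}: not in the input language")
    # Only reached when every line was accepted by the recognizer.
    return {k: int(v) for k, v in (line.split("=") for line in lines)}
```

Because the recognizer rejects the input as a whole, the code after the
loop never has to wonder whether a given line was checked.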


* [TUHS] Re: A fuzzy awk. (Was: The 'usage: ...' message.)
  2024-05-20 13:14 ` [TUHS] " arnold
@ 2024-05-20 14:00   ` G. Branden Robinson
  0 siblings, 0 replies; 18+ messages in thread
From: G. Branden Robinson @ 2024-05-20 14:00 UTC (permalink / raw)
  To: tuhs; +Cc: douglas.mcilroy, groff

Hi folks,

At 2024-05-20T07:14:07-0600, arnold@skeeve.com wrote:
> Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:
> > I'm surprised by nonchalance about bad inputs evoking bad program
> > behavior.  That attitude may have been excusable 50 years ago. By
> > now, though, we have seen so much malicious exploitation of open
> > avenues of "undefined behavior" that we can no longer ignore bugs
> > that "can't happen when using the tool correctly". Mature software
> > should not brook incorrect usage.
> 
> It's not nonchalance, not at all!
> 
> The current behavior is to die on the first syntax error, instead of
> trying to be "helpful" by continuing to try to parse the program in
> the hope of reporting other errors.
[...]
> The crashes came because errors cascaded.  I don't see a reason to
> spend valuable, *personal* time on adding defenses *where they aren't
> needed*.
> 
> A steel door on your bedroom closet does no good if your front door is
> made of balsa wood. My change was to stop the badness at the front
> door.
> 
> > I commend attention to the LangSec movement, which advocates for
> > rigorously enforced separation between legal and illegal inputs.
> 
> Illegal input, in gawk, as far as I know, should always cause a syntax
> error report and an immediate exit.
> 
> If it doesn't, that is a bug, and I'll be happy to try to fix it.
> 
> I hope that clarifies things.

For grins, and for a data point from elsewhere in GNU-land, GNU troff is
pretty robust to this sort of thing.  Much as I might like to boast of
having improved it in this area, it appears to have already come with
iron long johns courtesy of James Clark and/or Werner Lemberg.  I threw
troff its own ELF executable as a crude fuzz test some years ago, and I
don't recall needing to fix anything except unhelpfully vague diagnostic
messages (a phenomenon I am predisposed to observe anyway).
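
A crude fuzz run of the kind described above can be sketched as follows.
This is only an illustration in Python, not anything from the groff tree:
the helper name survives_input and the stand-in target command are
assumptions.  It feeds arbitrary bytes to a program on standard input and
reports whether the process was killed by a signal (on POSIX, subprocess
reports that as a negative return code).

```python
import subprocess
import sys


def survives_input(argv, data, timeout=30):
    """Run argv with data on stdin; return True unless a signal killed it.

    A nonzero exit status with diagnostics is fine here: the point is
    only to catch crashes such as a segmentation fault.
    """
    result = subprocess.run(
        argv,
        input=data,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    # On POSIX, a negative return code -N means "terminated by signal N".
    return result.returncode >= 0


if __name__ == "__main__":
    # Stand-in target (an assumption): a one-liner that just consumes stdin.
    target = [sys.executable, "-c", "import sys; sys.stdin.buffer.read()"]
    # Use this interpreter's own binary as the junk input.
    with open(sys.executable, "rb") as f:
        junk = f.read()
    print("survived" if survives_input(target, junk) else "crashed")
```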

I did notice today that in one case we were spewing back out unprintable
characters (newlines, character codes > 127) _in_ one (but only one) of
the diagnostic messages, and while that's ugly, it's not an obvious
exploitation vector to me.

Nevertheless I decided to fix it and it will be in my next push.

So here's the mess you get when feeding GNU troff to itself.  No GNU
troff since before 1.22.3 core dumps on this sort of unprepossessing
input.

$ ./build/test-groff -Ww -z /usr/bin/troff 2>&1 | sed 's/:[0-9]\+:/:/' | sort | uniq -c
     17 troff:/usr/bin/troff: error: a backspace character is not allowed in an escape sequence parameter
     10 troff:/usr/bin/troff: error: a space character is not allowed in an escape sequence parameter
      1 troff:/usr/bin/troff: error: a space is not allowed as a starting delimiter
      1 troff:/usr/bin/troff: error: a special character is not allowed in an identifier
      1 troff:/usr/bin/troff: error: character '-' is not allowed as a starting delimiter
      1 troff:/usr/bin/troff: error: invalid argument ')' to output suppression escape sequence
      1 troff:/usr/bin/troff: error: invalid argument 'c' to output suppression escape sequence
      1 troff:/usr/bin/troff: error: invalid argument 'l' to output suppression escape sequence
      1 troff:/usr/bin/troff: error: invalid argument 'm' to output suppression escape sequence
      1 troff:/usr/bin/troff: error: invalid positional argument number ','
      3 troff:/usr/bin/troff: error: invalid positional argument number '<'
      3 troff:/usr/bin/troff: error: invalid positional argument number 'D'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'E'
     10 troff:/usr/bin/troff: error: invalid positional argument number 'H'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'Hi'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'I'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'I9'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'L'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'LD'
      2 troff:/usr/bin/troff: error: invalid positional argument number 'LL'
      5 troff:/usr/bin/troff: error: invalid positional argument number 'LT'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'M'
      4 troff:/usr/bin/troff: error: invalid positional argument number 'P'
      5 troff:/usr/bin/troff: error: invalid positional argument number 'X'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'dH'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'h'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'l'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'p'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'x'
      3 troff:/usr/bin/troff: error: invalid positional argument number '|'
     35 troff:/usr/bin/troff: error: invalid positional argument number (unprintable)
      3 troff:/usr/bin/troff: error: unterminated transparent embedding escape sequence

The second to last (and most frequent) message in the list above is the
"new" one.  Here's the diff.

diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 8d828a01e..596ecf6f9 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -4556,10 +4556,21 @@ static void interpolate_arg(symbol nm)
   }
   else {
     const char *p;
-    for (p = s; *p && csdigit(*p); p++)
-      ;
-    if (*p)
-      copy_mode_error("invalid positional argument number '%1'", s);
+    bool is_valid = true;
+    bool is_printable = true;
+    for (p = s; *p != 0 /* nullptr */; p++) {
+      if (!csdigit(*p))
+       is_valid = false;
+      if (!csprint(*p))
+       is_printable = false;
+    }
+    if (!is_valid) {
+      const char msg[] = "invalid positional argument number";
+      if (is_printable)
+       copy_mode_error("%1 '%2'", msg, s);
+      else
+       copy_mode_error("%1 (unprintable)", msg);
+    }
     else
       input_stack::push(input_stack::get_arg(atoi(s)));
   }
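
For readers who do not follow C++, the logic of the patched check can be
paraphrased roughly as below.  This is a Python sketch, not groff code:
classify_argument is a made-up name, and the ASCII ranges are an assumed
approximation of what csdigit() and csprint() accept.

```python
def classify_argument(s: bytes):
    """Return None if s is all ASCII digits, else a diagnostic string."""
    is_valid = True
    is_printable = True
    for byte in s:
        if not (0x30 <= byte <= 0x39):   # assumed csdigit(): '0'..'9'
            is_valid = False
        if not (0x20 <= byte <= 0x7E):   # assumed csprint(): printable ASCII
            is_printable = False
    if is_valid:
        return None
    msg = "invalid positional argument number"
    if is_printable:
        # Safe to echo: every byte is printable ASCII at this point.
        return f"{msg} '{s.decode('ascii')}'"
    # Do not echo bytes that could garble a terminal or a log file.
    return f"{msg} (unprintable)"
```

As in the diff, the offending argument is echoed only when every byte of
it is printable.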

GNU troff may have started out with an easier task in this area than an
AWK or a shell had; its syntax is not block-structured in the same way,
so parser state recovery is easier, and it's _inherently_ a filter.

The only fruitful fuzz attack on groff I can recall was upon indexed
bibliographic database files, which are a binary format.  This went
unresolved for several years[1] but I fixed it for groff 1.23.0.

https://bugs.debian.org/716109

Regards,
Branden

[1] I think I understand the low triage priority.  Few groff users use
    the refer(1) preprocessor, and of those who do, even fewer find
    modern systems so poorly performant at text scanning that they
    desire the services of indxbib(1) to speed lookup of bibliographic
    entries.


* [TUHS] Re: A fuzzy awk.
  2024-05-20 13:41   ` [TUHS] Re: A fuzzy awk Ralph Corderoy
@ 2024-05-20 14:26     ` Chet Ramey
  2024-05-22 13:44     ` arnold
  1 sibling, 0 replies; 18+ messages in thread
From: Chet Ramey @ 2024-05-20 14:26 UTC (permalink / raw)
  To: Ralph Corderoy, TUHS main list


On 5/20/24 9:41 AM, Ralph Corderoy wrote:
> Hi Chet,
> 
>> Doug wrote:
>>> I'm surprised by nonchalance about bad inputs evoking bad program
>>> behavior.
>>
>> I think the claim is that it's better to stop immediately with an
>> error on invalid input rather than guess at the user's intent and try
>> to go on.
> 
> That aside, having made the decision to patch up the input so more
> punched cards are consumed, the patch should be bug free.
> 
> Say it's inserting a semicolon token for pretence.  It should have
> initialised source-file locations just as if it were real.  Not an
> uninitialised pointer to a source filename so a later dereference
> failed.
> 
> I can see an avalanche of errors in an earlier gawk caused problems, but
> each time there would have been a first patch of the input which made
> a mistake causing the pebble to start rolling.  My understanding is that
> there were potentially a lot of these and rather than fix them it was
> more productive of the limited time to stop patching the input.  Then
> the code which patched could be deleted, getting rid of the buggy bits
> along the way?

Maybe we're talking about the same thing. My impression is that at
each point there was more than one potential token to insert and go on,
and gawk chose one (probably the most common one), in the hopes that it
would be able to report as many errors as possible. There's always the
chance you'll be wrong there.

(I have no insight into the actual nature of these issues, or the actual
corruption that caused the crashes, so take the next with skepticism.)

And then rather than go back and modify other state after inserting
this token -- which gawk did not do -- for the sole purpose of making
this guessing more crash-resistant, Arnold chose a different approach:
exit on invalid input.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/



* [TUHS] OT: LangSec (Re: A fuzzy awk.)
  2024-05-20 13:54 ` Ralph Corderoy
@ 2024-05-20 15:39   ` Åke Nordin
  2024-05-20 16:09     ` [TUHS] " Ben Kallus
  0 siblings, 1 reply; 18+ messages in thread
From: Åke Nordin @ 2024-05-20 15:39 UTC (permalink / raw)
  To: tuhs

On 2024-05-20 15:54, Ralph Corderoy wrote:

> Doug wrote:
>> I commend attention to the LangSec movement, which advocates for
>> rigorously enforced separation between legal and illegal inputs.
>     https://langsec.org
>
>    ‘The Language-theoretic approach (LangSec) regards the Internet
>     insecurity epidemic as a consequence of ‘ad hoc’ programming of
>     input handling at all layers of network stacks, and in other kinds
>     of software stacks.  LangSec posits that the only path to
>     trustworthy software that takes untrusted inputs is treating all
>     valid or expected inputs as a formal language, and the respective
>     input-handling routines as a ‘recognizer’ for that language.

. . .

>    ‘LangSec helps draw the boundary between protocols and API designs
>     that can and cannot be secured and implemented securely, and charts
>     a way to building truly trustworthy protocols and systems.  A longer
>     summary of LangSec in this USENIX Security BoF hand-out, and in the
>     talks, articles, and papers below.’

Yes, it's an interesting concept. Those *n?x tools that have
lex/yacc frontends are probably closer to this than the average
hack.

It may become hard to reconcile this with the robustness principle 
(Be conservative in what you send, be liberal in what you accept)
that Jon Postel popularized. Maybe it becomes necessary, though.

-- 
Åke Nordin <ake.nordin@netia.se>, resident Net/Lunix/telecom geek.
Netia Data AB, Stockholm SWEDEN *46#7O466OI99#



* [TUHS] Re: A fuzzy awk. (Was: The 'usage: ...' message.)
  2024-05-20 13:06 [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) Douglas McIlroy
                   ` (2 preceding siblings ...)
  2024-05-20 13:54 ` Ralph Corderoy
@ 2024-05-20 16:06 ` Paul Winalski
  3 siblings, 0 replies; 18+ messages in thread
From: Paul Winalski @ 2024-05-20 16:06 UTC (permalink / raw)
  To: TUHS main list

On Mon, May 20, 2024 at 9:17 AM Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:

> I'm surprised by nonchalance about bad inputs evoking bad program
> behavior. That attitude may have been excusable 50 years ago. By now,
> though, we have seen so much malicious exploitation of open avenues of
> "undefined behavior" that we can no longer ignore bugs that "can't happen
> when using the tool correctly". Mature software should not brook incorrect
> usage.

Accepting bad inputs can also lead to security issues.  The data breaches
from SQL-based attacks are a modern case in point.

IMO, as a programmer you owe it to your users to do your best to detect bad
input and to handle it in a graceful fashion.  Nothing is more frustrating
to a user than to have a program blow up in their face with a seg fault, or
even worse, simply exit silently.

As the DEC compiler team's expert on object files, I was called on to add
object file support to a compiler back end originally targeted to VMS
only.  I inherited support of the object file generator for Unix COFF and
later wrote the support for Microsoft PE/COFF and ELF.  When our group was
bought by Intel I did the object file support for Apple OS X Mach-O in the
Intel compiler back end.

I found that the folks who write linkers are particularly lazy about error
checking and error handling.  They assume that the compiler always
generates clean object files.  That's OK I suppose if the compiler and
linker people are in the same organization.  If the linker falls over you
can just go down the hall and have the linker developer debug the issue and
tell you where you went wrong.  But that doesn't work when they work for
different companies and the compiler person doesn't have access to the
linker sources.  I ran into a lot of cases where my buggy object file
caused the linker to seg fault or, even worse, simply exit without an error
message.

I ended up writing a formatted dumper for each object file format that
checked thoroughly for proper syntax and for as many semantic errors
(e.g., symbol table index number out of range) as I could.

-Paul W.
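
The kind of semantic check mentioned above (a symbol table index out of
range) might look like the following.  This is a Python sketch over a
made-up in-memory representation; Relocation, check_relocations, and the
field names are illustrative assumptions, not any real object file format
or any DEC or Intel tool.

```python
from dataclasses import dataclass


@dataclass
class Relocation:
    offset: int        # where in the section the fix-up applies
    symbol_index: int  # index into the symbol table


def check_relocations(relocs, symbol_count, section_size):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    for i, r in enumerate(relocs):
        if not 0 <= r.symbol_index < symbol_count:
            problems.append(
                f"relocation {i}: symbol index {r.symbol_index} out of range "
                f"(table has {symbol_count} entries)"
            )
        if not 0 <= r.offset < section_size:
            problems.append(
                f"relocation {i}: offset {r.offset:#x} outside section "
                f"of size {section_size:#x}"
            )
    return problems
```

Reporting every problem with its entry number, instead of stopping
silently, is what makes such a dumper useful when the linker sources are
not available.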


* [TUHS] Re: OT: LangSec (Re: A fuzzy awk.)
  2024-05-20 15:39   ` [TUHS] OT: LangSec (Re: A fuzzy awk.) Åke Nordin
@ 2024-05-20 16:09     ` Ben Kallus
  2024-05-20 20:02       ` John Levine
  0 siblings, 1 reply; 18+ messages in thread
From: Ben Kallus @ 2024-05-20 16:09 UTC (permalink / raw)
  To: Åke Nordin; +Cc: tuhs

> It may become hard to reconcile this with the robustness principle
> (Be conservative in what you send, be liberal in what you accept)
> that Jon Postel popularized. Maybe it becomes necessary, though.

Yes; the LangSec people essentially reject the robustness principle.

See https://langsec.org/papers/postel-patch.pdf

-Ben


* [TUHS] Re: OT: LangSec (Re: A fuzzy awk.)
  2024-05-20 16:09     ` [TUHS] " Ben Kallus
@ 2024-05-20 20:02       ` John Levine
  2024-05-20 20:11         ` Larry McVoy
  0 siblings, 1 reply; 18+ messages in thread
From: John Levine @ 2024-05-20 20:02 UTC (permalink / raw)
  To: tuhs; +Cc: benjamin.p.kallus.gr

It appears that Ben Kallus <benjamin.p.kallus.gr@dartmouth.edu> said:
>> It may become hard to reconcile this with the robustness principle
>> (Be conservative in what you send, be liberal in what you accept)
>> that Jon Postel popularized. Maybe it becomes necessary, though.
>
>Yes; the LangSec people essentially reject the robustness principle.
>
>See https://langsec.org/papers/postel-patch.pdf

On the contrary, they actually understand it.

Postel was widely misunderstood to say that you should try to accept
arbitrary garbage. People who knew him tell me that he meant to be
liberal when the spec is ambiguous, not to allow stuff that is just
wrong. As their quote from RFC 1122 points out, he also said you
should be prepared for arbitrary garbage so you can reject it.

R's,
John


* [TUHS] Re: OT: LangSec (Re: A fuzzy awk.)
  2024-05-20 20:02       ` John Levine
@ 2024-05-20 20:11         ` Larry McVoy
  2024-05-20 21:00           ` Ben Kallus
  0 siblings, 1 reply; 18+ messages in thread
From: Larry McVoy @ 2024-05-20 20:11 UTC (permalink / raw)
  To: John Levine; +Cc: tuhs, benjamin.p.kallus.gr

On Mon, May 20, 2024 at 04:02:26PM -0400, John Levine wrote:
> It appears that Ben Kallus <benjamin.p.kallus.gr@dartmouth.edu> said:
> >> It may become hard to reconcile this with the robustness principle
> >> (Be conservative in what you send, be liberal in what you accept)
> >> that Jon Postel popularized. Maybe it becomes necessary, though.
> >
> >Yes; the LangSec people essentially reject the robustness principle.
> >
> >See https://langsec.org/papers/postel-patch.pdf
> 
> On the contrary, they actually understand it.
> 
> Postel was widely misunderstood to say that you should try to accept
> arbitrary garbage. People who knew him tell me that he meant to be
> liberal when the spec is ambiguous, not to allow stuff that is just
> wrong. As their quote from RFC 1122 points out, he also said you
> should be prepared for arbitrary garbage so you can reject it.

Yeah, I read the pdf and I took away the same thing as John.
-- 
---
Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat


* [TUHS] Re: OT: LangSec (Re: A fuzzy awk.)
  2024-05-20 20:11         ` Larry McVoy
@ 2024-05-20 21:00           ` Ben Kallus
  2024-05-20 21:03             ` John R Levine
  2024-05-20 21:14             ` Larry McVoy
  0 siblings, 2 replies; 18+ messages in thread
From: Ben Kallus @ 2024-05-20 21:00 UTC (permalink / raw)
  To: Larry McVoy; +Cc: John Levine, tuhs

What I meant was that the LangSec people reject the robustness
principle as it is commonly understood (i.e., make a "reasonable"
guess when receiving garbage), not necessarily that their view is
incompatible with Postel's original vision. This interpretation of the
principle is pretty widespread; take a look at the Nginx mailing list
if you have any doubt. I attribute this to the same phenomenon that
inverted the meaning of REST.

-Ben


* [TUHS] Re: OT: LangSec (Re: A fuzzy awk.)
  2024-05-20 21:00           ` Ben Kallus
@ 2024-05-20 21:03             ` John R Levine
  2024-05-20 21:14             ` Larry McVoy
  1 sibling, 0 replies; 18+ messages in thread
From: John R Levine @ 2024-05-20 21:03 UTC (permalink / raw)
  To: Ben Kallus; +Cc: tuhs

> What I meant was that the LangSec people reject the robustness
> principle as it is commonly understood (i.e., make a "reasonable"
> guess when receiving garbage), not necessarily that their view is
> incompatible with Postel's original vision. This interpretation of the
> principle is pretty widespread; take a look at the Nginx mailing list
> if you have any doubt. I attribute this to the same phenomenon that
> inverted the meaning of REST.

Oh, OK, no disagreement there.  I'm as tired as you are of people invoking 
Postel to excuse slovenly code.

Regards,
John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly


* [TUHS] Re: OT: LangSec (Re: A fuzzy awk.)
  2024-05-20 21:00           ` Ben Kallus
  2024-05-20 21:03             ` John R Levine
@ 2024-05-20 21:14             ` Larry McVoy
  2024-05-20 21:46               ` Ben Kallus
  1 sibling, 1 reply; 18+ messages in thread
From: Larry McVoy @ 2024-05-20 21:14 UTC (permalink / raw)
  To: Ben Kallus; +Cc: John Levine, tuhs

On Mon, May 20, 2024 at 05:00:40PM -0400, Ben Kallus wrote:
> What I meant was that the LangSec people reject the robustness
> principle as it is commonly understood (i.e., make a "reasonable"
> guess when receiving garbage)

That most certainly is not what I took from what Postel said.  And I
say that as someone who designed a distributed system that had client
and server sides and had to make that work across versions from last
week to 10-20 years ago.

I took it more as "Be more and more careful what you say, get that more
correct with each release, but tolerate the less correct stuff you might
get from earlier versions".  In no way did I think he meant ``make a
"reasonable" guess when receiving garbage''.  Garbage is garbage, you
error on that.  
-- 
---
Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat


* [TUHS] Re: OT: LangSec (Re: A fuzzy awk.)
  2024-05-20 21:14             ` Larry McVoy
@ 2024-05-20 21:46               ` Ben Kallus
  2024-05-20 21:57                 ` Larry McVoy
  0 siblings, 1 reply; 18+ messages in thread
From: Ben Kallus @ 2024-05-20 21:46 UTC (permalink / raw)
  To: Larry McVoy; +Cc: John Levine, tuhs

My point was that, regardless of Postel's original intent, many people
have interpreted his principle to mean that accepting garbage is good.
*This* interpretation is incompatible with LangSec.

See RFC 9413 for an exploration of the many interpretations of
Postel's principle.

-Ben


* [TUHS] Re: OT: LangSec (Re: A fuzzy awk.)
  2024-05-20 21:46               ` Ben Kallus
@ 2024-05-20 21:57                 ` Larry McVoy
  0 siblings, 0 replies; 18+ messages in thread
From: Larry McVoy @ 2024-05-20 21:57 UTC (permalink / raw)
  To: Ben Kallus; +Cc: John Levine, tuhs

Those would be the stupid people and you can't fix stupid.  Seriously,
people can twist anything into anything.  Just because dumb people
didn't understand his principle doesn't mean it was a bad principle.

On Mon, May 20, 2024 at 05:46:48PM -0400, Ben Kallus wrote:
> My point was that, regardless of Postel's original intent, many people
> have interpreted his principle to mean that accepting garbage is good.
> *This* interpretation is incompatible with LangSec.
> 
> See RFC 9413 for an exploration of the many interpretations of
> Postel's principle.
> 
> -Ben

-- 
---
Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat


* [TUHS] Re: A fuzzy awk.
  2024-05-20 13:41   ` [TUHS] Re: A fuzzy awk Ralph Corderoy
  2024-05-20 14:26     ` Chet Ramey
@ 2024-05-22 13:44     ` arnold
  1 sibling, 0 replies; 18+ messages in thread
From: arnold @ 2024-05-22 13:44 UTC (permalink / raw)
  To: tuhs, ralph

I've been travelling, so I haven't been able to answer
these mails until now.

Ralph Corderoy <ralph@inputplus.co.uk> wrote:

> I can see an avalanche of errors in an earlier gawk caused problems, but
> each time there would have been a first patch of the input which made
> a mistake causing the pebble to start rolling.  My understanding is that
> there were potentially a lot of these and rather than fix them it was
> more productive of the limited time to stop patching the input.  Then
> the code which patched could be deleted, getting rid of the buggy bits
> along the way?

That's not the case. Gawk didn't try to patch the input. It
simply set a flag saying "don't try to run" but kept on parsing
anyway, in the hope of finding more errors.

That was a bad idea, because the representation of the program
being built was then not in the correct state to have more
stuff parsed and converted into byte code.

Very early on, the first parse error caused an exit. I changed
it to keep going to try to be helpful. But when that became a source
for essentially specious bug reports and a time sink for me, it
became time to go back to exiting on the first problem.

HTH,

Arnold
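
The exit-on-first-error strategy described above can be illustrated with a
toy example.  The sketch below is Python, not gawk's grammar, and the
two-instruction mini-language and all names in it are invented for
illustration.  compile_program stops at the first syntax error, so the
instruction list it returns never holds entries produced after a parse
failure.

```python
class ParseError(Exception):
    """Raised for the first syntax error; nothing is parsed after it."""


def compile_program(lines):
    """Translate 'push N' / 'add' lines into a list of instructions."""
    code = []
    for lineno, line in enumerate(lines, 1):
        parts = line.split()
        if len(parts) == 2 and parts[0] == "push" and parts[1].isdecimal():
            code.append(("PUSH", int(parts[1])))
        elif parts == ["add"]:
            code.append(("ADD",))
        else:
            # Bail out here instead of setting a "don't run" flag and
            # continuing to build code from a possibly inconsistent state.
            raise ParseError(f"syntax error near line {lineno}")
    return code


def run(code):
    """Execute the instruction list on a simple integer stack."""
    stack = []
    for op in code:
        if op[0] == "PUSH":
            stack.append(op[1])
        else:  # "ADD"
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
    return stack
```

The cost is that only one error is reported per run; the benefit is that
no later stage ever sees a half-built program.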

