The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
To: tuhs@tuhs.org
Cc: douglas.mcilroy@dartmouth.edu, groff@gnu.org
Subject: [TUHS] Re: A fuzzy awk. (Was: The 'usage: ...' message.)
Date: Mon, 20 May 2024 09:00:47 -0500	[thread overview]
Message-ID: <20240520140047.4x4lwzs6wmo34uge@illithid> (raw)
In-Reply-To: <202405201314.44KDE7rq170661@freefriends.org>

[-- Attachment #1: Type: text/plain, Size: 6974 bytes --]

Hi folks,

At 2024-05-20T07:14:07-0600, arnold@skeeve.com wrote:
> Douglas McIlroy <douglas.mcilroy@dartmouth.edu> wrote:
> > I'm surprised by nonchalance about bad inputs evoking bad program
> > behavior.  That attitude may have been excusable 50 years ago. By
> > now, though, we have seen so much malicious exploitation of open
> > avenues of "undefined behavior" that we can no longer ignore bugs
> > that "can't happen when using the tool correctly". Mature software
> > should not brook incorrect usage.
> 
> It's not nonchalance, not at all!
> 
> The current behavior is to die on the first syntax error, instead of
> trying to be "helpful" by continuing to try to parse the program in
> the hope of reporting other errors.
[...]
> The crashes came because errors cascaded.  I don't see a reason to
> spend valuable, *personal* time on adding defenses *where they aren't
> needed*.
> 
> A steel door on your bedroom closet does no good if your front door is
> made of balsa wood. My change was to stop the badness at the front
> door.
> 
> > I commend attention to the LangSec movement, which advocates for
> > rigorously enforced separation between legal and illegal inputs.
> 
> Illegal input, in gawk, as far as I know, should always cause a syntax
> error report and an immediate exit.
> 
> If it doesn't, that is a bug, and I'll be happy to try to fix it.
> 
> I hope that clarifies things.

For grins, and for a data point from elsewhere in GNU-land, GNU troff is
pretty robust to this sort of thing.  Much as I might like to boast of
having improved it in this area, it appears to have already come with
iron long johns courtesy of James Clark and/or Werner Lemberg.  I threw
troff its own ELF executable as a crude fuzz test some years ago, and I
don't recall needing to fix anything except unhelpfully vague diagnostic
messages (a phenomenon I am predisposed to observe anyway).

I did notice today that in one case we were spewing back out unprintable
characters (newlines, character codes > 127) _in_ one (but only one) of
the diagnostic messages, and while that's ugly, it's not an obvious
exploitation vector to me.

Nevertheless I decided to fix it and it will be in my next push.

So here's the mess you get when feeding GNU troff to itself.  No GNU
troff since before 1.22.3 core dumps on this sort of unprepossessing
input.

$ ./build/test-groff -Ww -z /usr/bin/troff 2>&1 | sed 's/:[0-9]\+:/:/' | sort | uniq -c
     17 troff:/usr/bin/troff: error: a backspace character is not allowed in an escape sequence parameter
     10 troff:/usr/bin/troff: error: a space character is not allowed in an escape sequence parameter
      1 troff:/usr/bin/troff: error: a space is not allowed as a starting delimiter
      1 troff:/usr/bin/troff: error: a special character is not allowed in an identifier
      1 troff:/usr/bin/troff: error: character '-' is not allowed as a starting delimiter
      1 troff:/usr/bin/troff: error: invalid argument ')' to output suppression escape sequence
      1 troff:/usr/bin/troff: error: invalid argument 'c' to output suppression escape sequence
      1 troff:/usr/bin/troff: error: invalid argument 'l' to output suppression escape sequence
      1 troff:/usr/bin/troff: error: invalid argument 'm' to output suppression escape sequence
      1 troff:/usr/bin/troff: error: invalid positional argument number ','
      3 troff:/usr/bin/troff: error: invalid positional argument number '<'
      3 troff:/usr/bin/troff: error: invalid positional argument number 'D'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'E'
     10 troff:/usr/bin/troff: error: invalid positional argument number 'H'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'Hi'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'I'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'I9'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'L'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'LD'
      2 troff:/usr/bin/troff: error: invalid positional argument number 'LL'
      5 troff:/usr/bin/troff: error: invalid positional argument number 'LT'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'M'
      4 troff:/usr/bin/troff: error: invalid positional argument number 'P'
      5 troff:/usr/bin/troff: error: invalid positional argument number 'X'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'dH'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'h'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'l'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'p'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'x'
      3 troff:/usr/bin/troff: error: invalid positional argument number '|'
     35 troff:/usr/bin/troff: error: invalid positional argument number (unprintable)
      3 troff:/usr/bin/troff: error: unterminated transparent embedding escape sequence

The second to last (and most frequent) message in the list above is the
"new" one.  Here's the diff.

diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 8d828a01e..596ecf6f9 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -4556,10 +4556,21 @@ static void interpolate_arg(symbol nm)
   }
   else {
     const char *p;
-    for (p = s; *p && csdigit(*p); p++)
-      ;
-    if (*p)
-      copy_mode_error("invalid positional argument number '%1'", s);
+    bool is_valid = true;
+    bool is_printable = true;
+    for (p = s; *p != 0 /* nullptr */; p++) {
+      if (!csdigit(*p))
+       is_valid = false;
+      if (!csprint(*p))
+       is_printable = false;
+    }
+    if (!is_valid) {
+      const char msg[] = "invalid positional argument number";
+      if (is_printable)
+       copy_mode_error("%1 '%2'", msg, s);
+      else
+       copy_mode_error("%1 (unprintable)", msg);
+    }
     else
       input_stack::push(input_stack::get_arg(atoi(s)));
   }

GNU troff may have started out with an easier task in this area than an
AWK or a shell had; its syntax is not block-structured in the same way,
so parser state recovery is easier, and it's _inherently_ a filter.

The only fruitful fuzz attack on groff I can recall was upon indexed
bibliographic database files, which are a binary format.  This went
unresolved for several years[1] but I fixed it for groff 1.23.0.

https://bugs.debian.org/716109

Regards,
Branden

[1] I think I understand the low triage priority.  Few groff users use
    the refer(1) preprocessor, and of those who do, even fewer find
    modern systems so poorly performant at text scanning that they
    desire the services of indxbib(1) to speed lookup of bibliographic
    entries.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2024-05-20 14:01 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-20 13:06 [TUHS] " Douglas McIlroy
2024-05-20 13:14 ` [TUHS] " arnold
2024-05-20 14:00   ` G. Branden Robinson [this message]
2024-05-20 13:25 ` Chet Ramey
2024-05-20 13:41   ` [TUHS] Re: A fuzzy awk Ralph Corderoy
2024-05-20 14:26     ` Chet Ramey
2024-05-22 13:44     ` arnold
2024-05-20 13:54 ` Ralph Corderoy
2024-05-20 15:39   ` [TUHS] OT: LangSec (Re: A fuzzy awk.) Åke Nordin
2024-05-20 16:09     ` [TUHS] " Ben Kallus
2024-05-20 20:02       ` John Levine
2024-05-20 20:11         ` Larry McVoy
2024-05-20 21:00           ` Ben Kallus
2024-05-20 21:03             ` John R Levine
2024-05-20 21:14             ` Larry McVoy
2024-05-20 21:46               ` Ben Kallus
2024-05-20 21:57                 ` Larry McVoy
2024-05-20 16:06 ` [TUHS] Re: A fuzzy awk. (Was: The 'usage: ...' message.) Paul Winalski
  -- strict thread matches above, loose matches on Subject: below --
2024-05-19 23:08 [TUHS] The 'usage: ...' message. (Was: On Bloat...) Douglas McIlroy
2024-05-20  0:58 ` [TUHS] " Rob Pike
2024-05-20  3:19   ` arnold
2024-05-20  9:20     ` [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) Ralph Corderoy
2024-05-20 11:58       ` [TUHS] " arnold
2024-05-20 13:10       ` Chet Ramey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240520140047.4x4lwzs6wmo34uge@illithid \
    --to=g.branden.robinson@gmail.com \
    --cc=douglas.mcilroy@dartmouth.edu \
    --cc=groff@gnu.org \
    --cc=tuhs@tuhs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).