Hi folks, At 2024-05-20T07:14:07-0600, arnold@skeeve.com wrote: > Douglas McIlroy wrote: > > I'm surprised by nonchalance about bad inputs evoking bad program > > behavior. That attitude may have been excusable 50 years ago. By > > now, though, we have seen so much malicious exploitation of open > > avenues of "undefined behavior" that we can no longer ignore bugs > > that "can't happen when using the tool correctly". Mature software > > should not brook incorrect usage. > > It's not nonchalance, not at all! > > The current behavior is to die on the first syntax error, instead of > trying to be "helpful" by continuing to try to parse the program in > the hope of reporting other errors. [...] > The crashes came because errors cascaded. I don't see a reason to > spend valuable, *personal* time on adding defenses *where they aren't > needed*. > > A steel door on your bedroom closet does no good if your front door is > made of balsa wood. My change was to stop the badness at the front > door. > > > I commend attention to the LangSec movement, which advocates for > > rigorously enforced separation between legal and illegal inputs. > > Illegal input, in gawk, as far as I know, should always cause a syntax > error report and an immediate exit. > > If it doesn't, that is a bug, and I'll be happy to try to fix it. > > I hope that clarifies things. For grins, and for a data point from elsewhere in GNU-land, GNU troff is pretty robust to this sort of thing. Much as I might like to boast of having improved it in this area, it appears to have already come with iron long johns courtesy of James Clark and/or Werner Lemberg. I threw troff its own ELF executable as a crude fuzz test some years ago, and I don't recall needing to fix anything except unhelpfully vague diagnostic messages (a phenomenon I am predisposed to observe anyway). I did notice today that in one case we were spewing back out unprintable characters (newlines, character codes > 127) _in_ one (but only one) of the diagnostic messages, and while that's ugly, it's not an obvious exploitation vector to me. Nevertheless I decided to fix it and it will be in my next push. So here's the mess you get when feeding GNU troff to itself. No GNU troff since before 1.22.3 core dumps on this sort of unprepossessing input. $ ./build/test-groff -Ww -z /usr/bin/troff 2>&1 | sed 's/:[0-9]\+:/:/' | sort | uniq -c 17 troff:/usr/bin/troff: error: a backspace character is not allowed in an escape sequence parameter 10 troff:/usr/bin/troff: error: a space character is not allowed in an escape sequence parameter 1 troff:/usr/bin/troff: error: a space is not allowed as a starting delimiter 1 troff:/usr/bin/troff: error: a special character is not allowed in an identifier 1 troff:/usr/bin/troff: error: character '-' is not allowed as a starting delimiter 1 troff:/usr/bin/troff: error: invalid argument ')' to output suppression escape sequence 1 troff:/usr/bin/troff: error: invalid argument 'c' to output suppression escape sequence 1 troff:/usr/bin/troff: error: invalid argument 'l' to output suppression escape sequence 1 troff:/usr/bin/troff: error: invalid argument 'm' to output suppression escape sequence 1 troff:/usr/bin/troff: error: invalid positional argument number ',' 3 troff:/usr/bin/troff: error: invalid positional argument number '<' 3 troff:/usr/bin/troff: error: invalid positional argument number 'D' 1 troff:/usr/bin/troff: error: invalid positional argument number 'E' 10 troff:/usr/bin/troff: error: invalid positional argument number 'H' 1 troff:/usr/bin/troff: error: invalid positional argument number 'Hi' 1 troff:/usr/bin/troff: error: invalid positional argument number 'I' 1 troff:/usr/bin/troff: error: invalid positional argument number 'I9' 1 troff:/usr/bin/troff: error: invalid positional argument number 'L' 1 troff:/usr/bin/troff: error: invalid positional argument number 'LD' 2 troff:/usr/bin/troff: error: invalid positional argument number 'LL' 5 troff:/usr/bin/troff: error: invalid positional argument number 'LT' 1 troff:/usr/bin/troff: error: invalid positional argument number 'M' 4 troff:/usr/bin/troff: error: invalid positional argument number 'P' 5 troff:/usr/bin/troff: error: invalid positional argument number 'X' 1 troff:/usr/bin/troff: error: invalid positional argument number 'dH' 1 troff:/usr/bin/troff: error: invalid positional argument number 'h' 1 troff:/usr/bin/troff: error: invalid positional argument number 'l' 1 troff:/usr/bin/troff: error: invalid positional argument number 'p' 1 troff:/usr/bin/troff: error: invalid positional argument number 'x' 3 troff:/usr/bin/troff: error: invalid positional argument number '|' 35 troff:/usr/bin/troff: error: invalid positional argument number (unprintable) 3 troff:/usr/bin/troff: error: unterminated transparent embedding escape sequence The second to last (and most frequent) message in the list above is the "new" one. Here's the diff. diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp index 8d828a01e..596ecf6f9 100644 --- a/src/roff/troff/input.cpp +++ b/src/roff/troff/input.cpp @@ -4556,10 +4556,21 @@ static void interpolate_arg(symbol nm) } else { const char *p; - for (p = s; *p && csdigit(*p); p++) - ; - if (*p) - copy_mode_error("invalid positional argument number '%1'", s); + bool is_valid = true; + bool is_printable = true; + for (p = s; *p != 0 /* nullptr */; p++) { + if (!csdigit(*p)) + is_valid = false; + if (!csprint(*p)) + is_printable = false; + } + if (!is_valid) { + const char msg[] = "invalid positional argument number"; + if (is_printable) + copy_mode_error("%1 '%2'", msg, s); + else + copy_mode_error("%1 (unprintable)", msg); + } else input_stack::push(input_stack::get_arg(atoi(s))); } GNU troff may have started out with an easier task in this area than an AWK or a shell had; its syntax is not block-structured in the same way, so parser state recovery is easier, and it's _inherently_ a filter. The only fruitful fuzz attack on groff I can recall was upon indexed bibliographic database files, which are a binary format. This went unresolved for several years[1] but I fixed it for groff 1.23.0. https://bugs.debian.org/716109 Regards, Branden [1] I think I understand the low triage priority. Few groff users use the refer(1) preprocessor, and of those who do, even fewer find modern systems so poorly performant at text scanning that they desire the services of indxbib(1) to speed lookup of bibliographic entries.