The Unix Heritage Society mailing list
From: Dan Cross <crossd@gmail.com>
To: Alejandro Colomar <alx.manpages@gmail.com>
Cc: tuhs@tuhs.org
Subject: [TUHS] Re: printf (was: python)
Date: Fri, 4 Aug 2023 17:16:56 -0400
Message-ID: <CAEoi9W5WhmE+dOdzmeAn57wRLxA8460XufkBn6GR7iDcLQwTgA@mail.gmail.com>
In-Reply-To: <cd744926-b60f-d9e3-a959-6adaefe71b1a@gmail.org>

On Fri, Aug 4, 2023 at 12:57 PM Alejandro Colomar
<alx.manpages@gmail.com> wrote:
> On 2023-08-04 18:06, Dan Cross wrote:
> > On Thu, Aug 3, 2023 at 7:55 PM Alejandro Colomar <alx.manpages@gmail.com> wrote:
> >> On 2023-08-03 23:29, Dan Cross wrote:
> >>> On Thu, Aug 3, 2023 at 2:05 PM Alejandro Colomar <alx.manpages@gmail.com> wrote:
> >>>> -  It is type-safe, with the right tools.
> >>>
> >>> No it's not, and it really can't be. True, there are linters that can
> >>> try to match up types _if_ the format string is a constant and all the
> >>> arguments are known at e.g. compile time, but C permits one to
> >>> construct the format string at run time (or just select between a
> >>> bunch of variants); the language gives you no tools to enforce type
> >>> safety in a meaningful way once you do that.
> >>
> >> Isn't a variable format string a security vulnerability?  Where do you
> >> need it?
> >
> > It _can_ be a security vulnerability, but it doesn't necessarily
> > _need_ to be. If one is careful in how one constructs it, such things
> > can be very safe indeed.
> >
> > As to where one needs it, there are examples like `vsyslog()`,
>
> I guessed you'd mention v*() formatting functions, as that's the only
> case where a variable format string is indeed necessary (or kind of).

I think you are conflating "necessary" with "possible."

> I'll simplify your example to vwarnx(3), from the BSDs, which does less
> work, but has a similar API regarding our discussion.
>
> I'm not sure if you meant vsyslog() uses or its implementation, but
> I'll cover both (but for vwarnx(3)).
>
> Uses:
>
> This function (and all v*() functions) will be used to implement a
> wrapper variadic function, like for example warnx(3).  It's there, in
> the variadic function, where the string /must be/ a literal, and where

No, the format string does not need to be a literal at all: it can be
constructed at runtime. Is that a good idea? Perhaps not. Is it
possible? Yes. Can the compiler type-check it in that case? No, it
cannot (since it hasn't been constructed at compile time).  Consider
this program:

: chandra; cat warn.c
#include <err.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int
main(void)
{
        char buf[1024];

        strlcpy(buf, "%s ", sizeof(buf));
        strlcat(buf, "%s ", sizeof(buf));
        strlcat(buf, "%d", sizeof(buf));

        warnx(buf, "Hello", "World", 42);

        return EXIT_SUCCESS;
}
: chandra; cc -Wall -Werror -o warn warn.c
: chandra; ./warn
warn: Hello World 42
: chandra;

That's a perfectly legal C program, even if it is a silly one. "Don't
do that" isn't a statement about the language, it's a statement about
programmer practice, which is the point.

> the arguments are checked.  There's never a good reason to use a
> non-literal there (AFAIK),

I believe that you believe that. You may even be right. However,
that's not how the language works.

> and there are compiler warnings and linters
> to enforce that.  Since those args have been previously checked, you
> should just pass the va_list pristine to other formatting functions.

I'm afraid that this reasonable advice misses the point: there's
nothing in the language that says you _have_ to do it this way. Some
tools may _help_, but they cannot cover all (reasonable) situations.
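
To make the limits of the tooling concrete, here is a minimal sketch.
The names `say`, `pick_format`, and `PRINTFLIKE` are hypothetical and
exist only for illustration; the `format` attribute is a GCC/Clang
extension, not ISO C. A call with a literal format is checked against
its arguments, while a format chosen at run time is not.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* The format attribute is a compiler extension; define it away elsewhere. */
#if defined(__GNUC__)
#define PRINTFLIKE(f, a) __attribute__((format(printf, f, a)))
#else
#define PRINTFLIKE(f, a)
#endif

int say(char *buf, size_t bufsz, const char *fmt, ...) PRINTFLIKE(3, 4);

/* Hypothetical printf-like wrapper: formats into buf via vsnprintf. */
int
say(char *buf, size_t bufsz, const char *fmt, ...)
{
        va_list ap;
        int n;

        assert(buf != NULL && fmt != NULL);
        va_start(ap, fmt);
        n = vsnprintf(buf, bufsz, fmt, ap);
        va_end(ap);
        return n;
}

/*
 * Hypothetical: select one of several format variants at run time.
 * The caller gets back a plain `const char *`, so the compiler no
 * longer sees a literal at the call to say().
 */
const char *
pick_format(int verbose)
{
        return verbose ? "%s has %d items" : "%s: %d";
}
```

With these definitions, `say(buf, sizeof(buf), "%s: %d", "cart", 3)` is
checked by the compiler, but
`say(buf, sizeof(buf), pick_format(v), "cart", 3)` is accepted under
plain -Wall as far as I know. -Wformat-nonliteral flags the second
call, but it can only report that the argument types went unchecked.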

Here again `syslog()` is an interesting example, as it supports the
`%m` formatting verb. _An_ implementation of this may work by
interpreting the format string and constructing a new one,
substituting `strerror(errno)` whenever it hits "%m" and then using
`snprintf` (or equivalent) to create the final string that is sent to
`syslogd`. You may argue that programmers should only pass constant
strings (left deliberately vague since there are reasonable cases
where named string constants may be passed as a format string argument
in lieu of a literal) that can be checked by clang and gcc. But again,
nothing in the language _requires_ that, and the implementation of
`vsyslog` that actually implements that logic has no way of knowing
that its caller has done this correctly.
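
A minimal sketch of that kind of `%m` handling follows. It is not taken
from any real libc; `expand_m`, `my_vlog`, and `my_log` are hypothetical
names, the result goes into a caller-supplied buffer rather than to
`syslogd`, and the scan is deliberately simple (it does not handle
flags or widths between the '%' and the 'm').

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy fmt into out, replacing each "%m" with the
 * text of strerror(err). "%%" is copied through as-is so that "%%m" is
 * not misread, and any '%' in the error text is doubled so that it is
 * not taken for a conversion later. Returns 0, or -1 if out is too small.
 */
int
expand_m(char *out, size_t outsz, const char *fmt, int err)
{
        size_t n = 0;

        assert(out != NULL && outsz > 0);
        while (*fmt != '\0') {
                if (fmt[0] == '%' && fmt[1] == 'm') {
                        const char *e;

                        for (e = strerror(err); *e != '\0'; e++) {
                                if (*e == '%') {
                                        if (n + 1 >= outsz)
                                                return -1;
                                        out[n++] = '%';
                                }
                                if (n + 1 >= outsz)
                                        return -1;
                                out[n++] = *e;
                        }
                        fmt += 2;
                } else if (fmt[0] == '%' && fmt[1] == '%') {
                        if (n + 2 >= outsz)
                                return -1;
                        out[n++] = '%';
                        out[n++] = '%';
                        fmt += 2;
                } else {
                        if (n + 1 >= outsz)
                                return -1;
                        out[n++] = *fmt++;
                }
        }
        out[n] = '\0';
        return 0;
}

/* Hypothetical vsyslog-like function: builds a new format at run time. */
int
my_vlog(char *buf, size_t bufsz, const char *fmt, va_list ap)
{
        char newfmt[1024];
        int saved = errno;      /* capture errno before anything changes it */

        if (expand_m(newfmt, sizeof(newfmt), fmt, saved) != 0)
                return -1;
        /* newfmt exists only at run time, so no compiler can check it. */
        return vsnprintf(buf, bufsz, newfmt, ap);
}

/* Hypothetical variadic front end, in the style of syslog(). */
int
my_log(char *buf, size_t bufsz, const char *fmt, ...)
{
        va_list ap;
        int r;

        va_start(ap, fmt);
        r = my_vlog(buf, bufsz, fmt, ap);
        va_end(ap);
        return r;
}
```

The string handed to `vsnprintf` in `my_vlog` is not the one the caller
wrote, so whatever checking happened at the call site says nothing
about it directly; `my_vlog` simply has to trust that the rewrite
preserved the conversions and that the caller's arguments match them.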

Similarly, someone may choose to implement a templating language that
converts a custom format to a new format string, but assumes that the
arguments are in a `va_list` or similar. Bad idea? Probably. Legal in
C? Yes.
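
As a sketch of what such a templating layer might look like (the "{s}"
and "{d}" syntax and the names `tmpl_to_printf` and `tmpl_format` are
invented here purely for illustration):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical template syntax: "{s}" stands for a string argument and
 * "{d}" for an int. Everything else is literal text; a literal '%' is
 * doubled so that printf will not interpret it. Returns 0 on success,
 * or -1 if out is too small.
 */
int
tmpl_to_printf(char *out, size_t outsz, const char *tmpl)
{
        size_t n = 0;

        assert(out != NULL && outsz > 0);
        while (*tmpl != '\0') {
                if (strncmp(tmpl, "{s}", 3) == 0 ||
                    strncmp(tmpl, "{d}", 3) == 0) {
                        if (n + 2 >= outsz)
                                return -1;
                        out[n++] = '%';
                        out[n++] = tmpl[1];     /* 's' or 'd' */
                        tmpl += 3;
                } else if (*tmpl == '%') {
                        if (n + 2 >= outsz)
                                return -1;
                        out[n++] = '%';
                        out[n++] = '%';
                        tmpl++;
                } else {
                        if (n + 1 >= outsz)
                                return -1;
                        out[n++] = *tmpl++;
                }
        }
        out[n] = '\0';
        return 0;
}

/*
 * Hypothetical front end: the arguments arrive as varargs and are
 * simply assumed to match the template. Nothing verifies that.
 */
int
tmpl_format(char *buf, size_t bufsz, const char *tmpl, ...)
{
        char fmt[512];
        va_list ap;
        int r;

        if (tmpl_to_printf(fmt, sizeof(fmt), tmpl) != 0)
                return -1;
        va_start(ap, tmpl);
        r = vsnprintf(buf, bufsz, fmt, ap);
        va_end(ap);
        return r;
}
```

A call such as `tmpl_format(buf, sizeof(buf), "{s} is {d}% done", "job", 42)`
works, but swapping the two arguments compiles just as cleanly and is
undefined behavior at run time. A `format(printf)` annotation would not
help here, since the template is not in printf syntax.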

> Then, as long as libc doesn't have bugs, you're fine.

That's a tall order.

> In the implementation of a v*() function:
>
> Do /not/ touch the va_list.  Just pass it to the next function.  Of
> course, in the end, libc will have to iterate over it and do the job,
> but that's not the typical programmer's problem.  Here's the libbsd
> implementation of vwarnx(3), which does exactly that: no messing with
> the va_list.
>
> $ grepc vwarnx
> ./include/bsd/err.h:63:
> void vwarnx(const char *format, va_list ap)
>         __printflike(1, 0);
>
>
> ./src/err.c:97:
> void
> vwarnx(const char *format, va_list ap)
> {
>         fprintf(stderr, "%s: ", getprogname());
>         if (format)
>                 vfprintf(stderr, format, ap);
>         fprintf(stderr, "\n");
> }
>
>
> Just put a [[gnu::format(printf)]] in the outermost wrapper, which
> should be using a string literal, and you'll be fine.

Setting aside that this relies on a number of extensions, that's
just (sadly) not how the language works.

> > but
> > that's almost beside the point, which is that given that you _can_ do
> > things like that, the language can't really save you by type-checking
> > the arguments to printf; and once varargs are in the mix? Forget about
> > it.
>
> Not really.  You can do that _only_ if you really want.

Yes, that's the point: if we're talking about language-level
guarantees, the language can't help you here. It can try, and it can
hit a lot of really useful cases, but not all. By contrast, formatting
in Go and Rust is type-safe by construction.

> If you want to
> not be able, you can "drop privileges" by adding a few flags to your
> compiler, such as -Werror=format-security -Werror=format-nonliteral,
> and add a bunch of linters to your build system for more redundancy,
> and voila, your project is now safe.

Provided that you use a compiler that provides those options, or that
those linters are viable in your codebase. ;-)

        - Dan C.
