From: Jacob Moody
To: 9front@9front.org
Date: Sun, 22 May 2022 23:42:29 -0600
Message-ID: <4bffa657-6b9e-8069-ae45-e9969c3542c5@posixcafe.org>
Subject: Re: [9front] [PATCH] Unmount to remove sharp devices.
Reply-To: 9front@9front.org
Precedence: bulk

Another go at this. Bit of a refactor of the kernel changes, and I wrapped up the userspace work. Since some of the thoughts are scattered throughout the thread now, I wanted to provide an overview of how things ended up:

A process can eject devices from its namespace through writes to /dev/drivers. Ejected devices cannot be walked. A 'permit' command is available to allow processes to eject devices by whitelist rather than blacklist.

Support for these commands has been added to newns, allowing namespace files to use eject/permit commands. These new commands are used in /lib/namespace.ftp to restrict anonymous login.

aux/listen now allows individual namespace files to be used per service. This namespace is sourced once per listener. Each connection receives a copy of the namespace. An example /rc/bin/service/!tcp80.namespace is provided which builds an isolated webroot. In order to accomplish this without bringing in most of /root, a /rc folder was added to #/.

This is presented as one diff, but I plan to commit this in chunks.
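To make the eject/permit semantics above concrete, here is a minimal sketch in portable C rather than kernel code. The names deveject, devallowed and notallowed mirror the patch; the device table devchars is a made-up ASCII stand-in for the kernel's devtab, and the use of uint64_t shifts and the absence of locking are assumptions of this model, not a copy of the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device table: the index of a character is its device number. */
static const char devchars[] = "/|cdeMpsIlS";

typedef struct {
	uint64_t notallowed[4];	/* one bit per device, room for 256 */
} Pgrp;

/* Map a driver character to its index, or -1 if unknown. */
int
devno(int c)
{
	const char *p;

	if(c == 0)
		return -1;
	p = strchr(devchars, c);
	return p != NULL ? (int)(p - devchars) : -1;
}

/*
 * invert == 0: eject, block exactly the listed devices.
 * invert == 1: permit, block everything except the listed devices.
 * The mask is only ever OR-ed in, so access can shrink but never grow.
 */
void
deveject(Pgrp *pg, int invert, const char *devs)
{
	uint64_t mask[4];
	int i, t;

	assert(pg != NULL && devs != NULL);
	memset(mask, invert ? 0xFF : 0, sizeof mask);
	for(; *devs != '\0'; devs++){
		t = devno((unsigned char)*devs);
		if(t < 0)
			continue;	/* unknown characters are ignored */
		if(invert)
			mask[t/64] &= ~((uint64_t)1 << (t%64));
		else
			mask[t/64] |= (uint64_t)1 << (t%64);
	}
	for(i = 0; i < 4; i++)
		pg->notallowed[i] |= mask[i];
}

/* Returns 1 if the device may be attached, 0 if blocked or unknown. */
int
devallowed(const Pgrp *pg, int c)
{
	int t;

	t = devno(c);
	if(t < 0)
		return 0;
	return (pg->notallowed[t/64] & ((uint64_t)1 << (t%64))) == 0;
}
```

In this model, starting from an empty Pgrp, deveject(&pg, 1, "Mcde|pslI/") followed by deveject(&pg, 0, "Ms") (the sequence used by the example tcp80 namespace) leaves 'c' reachable while 's' and 'M' are blocked. Because masks are only OR-ed into notallowed, a later permit of "s" cannot bring 's' back; it only blocks everything else as well.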
Thanks, moody --- diff 6fbb1acc8fa0b6655b14e8c46240a4a8d2d8c672 uncommitted --- a/lib/namespace +++ b/lib/namespace @@ -22,6 +22,7 @@ # standard bin bind /$cputype/bin /bin +bind $rootdir'/rc' /rc bind -a /rc/bin /bin # internal networks --- a/lib/namespace.ftp +++ b/lib/namespace.ftp @@ -8,5 +8,6 @@ # bind a personal incoming directory below incoming bind -c /usr/none/incoming /usr/web/incoming/none +permit |MedIa/ # this cuts off everything not mounted below /usr/web bind /usr/web / --- /tmp/diff100000406351 +++ b/rc/bin/service/!tcp80.namespace @@ -1,0 +1,24 @@ +mount -aC #s/boot /root $rootspec + +# kernel devices +bind #c /dev +bind #d /fd +bind -c #e /env +bind #p /proc +bind -a #l /net +bind -a #I /net + +bind /root/$cputype/bin /bin +bind /root/rc /rc +bind -a /rc/bin /bin + +permit Mcde|pslI/ + +# grab just our webroot +bind /root/usr/web /srv + +# or bind in the actual root +# bind -a /root / + +unmount /root +eject Ms --- a/sys/man/3/cons +++ b/sys/man/3/cons @@ -90,10 +90,32 @@ .PP The .B drivers -file contains, one per line, a listing of the drivers configured in the kernel, in the format +file contains, one per line, a listing of available kernel drivers, in the format .IP .EX #c cons +.EE +.PP +A process can eject a driver from the current namespace through a write to +.B drivers. +A message is one of: +.IP "eject \f2drivers\fP" +block access to the listed +.I drivers. +.IP "permit \f2drivers\fP" +permit access to only the provided +.I drivers. +.PP +\f2Drivers\fP is a string of driver characters. Ejecting +.IR mnt (3) +prevents new mounts in to the current namespace. +The following blocks access to +.IR env (3) +and +.IR sd (3): +.IP +.EX +eject se .EE .PP The --- a/sys/man/3/root +++ b/sys/man/3/root @@ -10,6 +10,7 @@ .B /net .B /net.alt .B /proc +.B /rc .B /root .B /srv .fi --- a/sys/man/6/namespace +++ b/sys/man/6/namespace @@ -59,6 +59,17 @@ .I new is missing. .TP +.BI eject \ drivers +Ejects the listed kernel +.I drivers +from the namespace. 
+.I Drivers +is a string of driver characters. +.TP +.BI permit \ drivers +Permit access to only the listed kernel +.I drivers. +.TP .BR clear Clear the name space with .BR rfork(RFCNAMEG) . @@ -80,4 +91,5 @@ .SH "SEE ALSO" .IR bind (1), .IR namespace (4), -.IR init (8) +.IR init (8), +.IR cons (3) --- a/sys/man/8/listen +++ b/sys/man/8/listen @@ -96,6 +96,14 @@ an inbound call on the TCP network for port 565 executes service .BR tcp565 . .PP +Services may have individual +.IR namespace (6) +files specified within +.IR srvdir . +If provided, the namespace is used as the parent for each connection +to the corresponding service. Namespace files are found by appending a .namespace +suffix to the service name. +.PP At least the following services are available in .BR /bin/service . .TF \ tcp0000 --- a/sys/src/9/boot/boot.c +++ b/sys/src/9/boot/boot.c @@ -25,6 +25,7 @@ buf[1+read(open("/env/cputype", OREAD|OCEXEC), buf+1, sizeof buf - 6)] = '\0'; strcat(buf, bin); bind(buf, bin, MAFTER); + bind("/root/rc", "/rc", MREPL); bind("/rc/bin", bin, MAFTER); exec("/bin/bootrc", argv); --- a/sys/src/9/port/chan.c +++ b/sys/src/9/port/chan.c @@ -1272,7 +1272,7 @@ Chan* namec(char *aname, int amode, int omode, ulong perm) { - int len, n, t, nomount; + int len, n, t, nomount, devunmount; Chan *c; Chan *volatile cnew; Path *volatile path; @@ -1292,6 +1292,24 @@ name = aname; /* + * When unmounting, the name parameter must be accessed + * using Aopen in order to get the real chan from + * something like /srv/cs or /fd/0. However when sandboxing, + * unmounting a sharp from a union is a valid operation even + * if the device is blocked. + */ + devunmount = 0; + if(amode == Aunmount){ + /* + * Doing any walks down the device could leak information + * about the existence of files. 
+ */ + if(name[0] == '#' && utflen(name) == 2) + devunmount = 1; + amode = Aopen; + } + + /* * Find the starting off point (the current slash, the root of * a device tree, or the current dot) as well as the name to * evaluate starting there. @@ -1313,24 +1331,13 @@ up->genbuf[n++] = *name++; } up->genbuf[n] = '\0'; - /* - * noattach is sandboxing. - * - * the OK exceptions are: - * | it only gives access to pipes you create - * d this process's file descriptors - * e this process's environment - * the iffy exceptions are: - * c time and pid, but also cons and consctl - * p control of your own processes (and unfortunately - * any others left unprotected) - */ n = chartorune(&r, up->genbuf+1)+1; - if(up->pgrp->noattach && utfrune("|decp", r)==nil) - error(Enoattach); t = devno(r, 1); if(t == -1) error(Ebadsharp); + if(!devunmount && !devallowed(up->pgrp, r)) + error(Enoattach); + c = devtab[t]->attach(up->genbuf+n); break; --- a/sys/src/9/port/dev.c +++ b/sys/src/9/port/dev.c @@ -31,6 +31,63 @@ } void +deveject(Pgrp *pgrp, int invert, char *devs) +{ + int i, t, w; + char *p; + Rune r; + u64int mask[nelem(pgrp->notallowed)]; + + if(invert) + memset(mask, 0xFF, sizeof mask); + else + memset(mask, 0, sizeof mask); + + w = sizeof mask[0] * 8; + for(p = devs; *p != '\0';){ + p += chartorune(&r, p); + t = devno(r, 1); + if(t == -1) + continue; + if(invert) + mask[t/w] &= ~(1<ns); + for(i=0; i < nelem(pgrp->notallowed); i++) + pgrp->notallowed[i] |= mask[i]; + wunlock(&pgrp->ns); +} + +int +devallowed(Pgrp *pgrp, int r) +{ + int t, w, b; + + t = devno(r, 1); + if(t == -1) + return 0; + + w = sizeof(u64int) * 8; + rlock(&pgrp->ns); + b = !(pgrp->notallowed[t/w] & 1<ns); + return b; +} + +int +canmount(Pgrp *pgrp) +{ + /* + * Devmnt is not usable directly from user procs, so + * having it removed is interpreted to block any mounts. 
+ */ + return devallowed(pgrp, 'M'); +} + +void devdir(Chan *c, Qid qid, char *n, vlong length, char *user, long perm, Dir *db) { db->name = n; --- a/sys/src/9/port/devcons.c +++ b/sys/src/9/port/devcons.c @@ -39,6 +39,18 @@ CMrdb, "rdb", 0, }; +enum +{ + CMeject, + CMpermit, +}; + +Cmdtab drivermsg[] = +{ + CMeject, "eject", 0, + CMpermit, "permit", 0, +}; + void printinit(void) { @@ -332,7 +344,7 @@ "cons", {Qcons}, 0, 0660, "consctl", {Qconsctl}, 0, 0220, "cputime", {Qcputime}, 6*NUMSIZE, 0444, - "drivers", {Qdrivers}, 0, 0444, + "drivers", {Qdrivers}, 0, 0666, "hostdomain", {Qhostdomain}, DOMLEN, 0664, "hostowner", {Qhostowner}, 0, 0664, "kmesg", {Qkmesg}, 0, 0440, @@ -583,9 +595,15 @@ case Qdrivers: b = smalloc(READSTR); k = 0; - for(i = 0; devtab[i] != nil; i++) + + rlock(&up->pgrp->ns); + for(i = 0; devtab[i] != nil; i++){ + if(up->pgrp->notallowed[i/(sizeof(u64int)*8)] & 1<dc, devtab[i]->name); + } + runlock(&up->pgrp->ns); if(waserror()){ free(b); nexterror(); @@ -622,7 +640,7 @@ long l, bp; char *a; Mach *mp; - int id; + int id, i, invert; ulong offset; Cmdbuf *cb; Cmdtab *ct; @@ -674,6 +692,32 @@ case Qconfig: error(Eperm); + break; + + case Qdrivers: + cb = parsecmd(a, n); + + if(waserror()) { + free(cb); + nexterror(); + } + ct = lookupcmd(cb, drivermsg, nelem(drivermsg)); + invert = 0; + switch(ct->index) { + case CMeject: + invert = 0; + break; + case CMpermit: + invert = 1; + break; + default: + error(Ebadarg); + break; + } + for(i = 1; i < cb->nf; i++) + deveject(up->pgrp, invert, cb->f[i]); + poperror(); + free(cb); break; case Qreboot: --- a/sys/src/9/port/devroot.c +++ b/sys/src/9/port/devroot.c @@ -105,6 +105,7 @@ addrootdir("net"); addrootdir("net.alt"); addrootdir("proc"); + addrootdir("rc"); addrootdir("root"); addrootdir("srv"); addrootdir("shr"); --- a/sys/src/9/port/devshr.c +++ b/sys/src/9/port/devshr.c @@ -464,7 +464,7 @@ cclose(c); return nc; case Qcroot: - if(up->pgrp->noattach) + if(!canmount(up->pgrp)) error(Enoattach); if((perm & 
DMDIR) == 0 || mode != OREAD) error(Eperm); @@ -498,7 +498,7 @@ sch->shr = shr; break; case Qcshr: - if(up->pgrp->noattach) + if(!canmount(up->pgrp)) error(Enoattach); if((perm & DMDIR) != 0 || mode != OWRITE) error(Eperm); @@ -731,7 +731,7 @@ Mhead *h; Mount *m; - if(up->pgrp->noattach) + if(!canmount(up->pgrp)) error(Enoattach); sch = tosch(c); if(sch->level != Qcmpt) --- a/sys/src/9/port/mkdevc +++ b/sys/src/9/port/mkdevc @@ -78,6 +78,9 @@ if(ARGC < 2) exit "usage" + if(ndev >= 256) + exit "device count will overflow Pgrp.notallowed" + printf "#include \"u.h\"\n"; printf "#include \"../port/lib.h\"\n"; printf "#include \"mem.h\"\n"; --- a/sys/src/9/port/portdat.h +++ b/sys/src/9/port/portdat.h @@ -121,6 +121,7 @@ Amount, /* to be mounted or mounted upon */ Acreate, /* is to be created */ Aremove, /* will be removed by caller */ + Aunmount, /* unmount arg[0] */ COPEN = 0x0001, /* for i/o */ CMSG = 0x0002, /* the message channel for a mount */ @@ -484,7 +485,7 @@ { Ref; RWlock ns; /* Namespace n read/one write lock */ - int noattach; + u64int notallowed[4]; /* Room for 256 devices */ Mhead *mnthash[MNTHASH]; }; --- a/sys/src/9/port/portfns.h +++ b/sys/src/9/port/portfns.h @@ -413,6 +413,9 @@ ushort nhgets(void*); ulong µs(void); long lcycles(void); +void deveject(Pgrp*,int,char*); +int devallowed(Pgrp*, int); +int canmount(Pgrp*); #pragma varargck argpos iprint 1 #pragma varargck argpos panic 1 --- a/sys/src/9/port/sysfile.c +++ b/sys/src/9/port/sysfile.c @@ -1048,7 +1048,7 @@ nexterror(); } - if(up->pgrp->noattach) + if(!canmount(up->pgrp)) error(Enoattach); ac = nil; @@ -1160,14 +1160,8 @@ nexterror(); } if(name != nil) { - /* - * This has to be namec(..., Aopen, ...) because - * if arg[0] is something like /srv/cs or /fd/0, - * opening it is the only way to get at the real - * Chan underneath. 
- */ validaddr((uintptr)name, 1, 0); - cmounted = namec(name, Aopen, OREAD, 0); + cmounted = namec(name, Aunmount, OREAD, 0); } cunmount(cmount, cmounted); poperror(); --- a/sys/src/9/port/sysproc.c +++ b/sys/src/9/port/sysproc.c @@ -34,6 +34,7 @@ Egrp *oeg; ulong pid, flag; Mach *wm; + char *devs; flag = va_arg(list, ulong); /* Check flags before we commit */ @@ -44,6 +45,11 @@ if((flag & (RFENVG|RFCENVG)) == (RFENVG|RFCENVG)) error(Ebadarg); + /* + * Code using RFNOMNT expects to block all but + * the following devices. + */ + devs = "|decp"; if((flag&RFPROC) == 0) { if(flag & (RFMEM|RFNOWAIT)) error(Ebadarg); @@ -60,12 +66,12 @@ up->pgrp = newpgrp(); if(flag & RFNAMEG) pgrpcpy(up->pgrp, opg); - /* inherit noattach */ - up->pgrp->noattach = opg->noattach; + /* inherit notallowed */ + memmove(up->pgrp->notallowed, opg->notallowed, sizeof up->pgrp->notallowed); closepgrp(opg); } if(flag & RFNOMNT) - up->pgrp->noattach = 1; + deveject(up->pgrp, 1, devs); if(flag & RFREND) { org = up->rgrp; up->rgrp = newrgrp(); @@ -177,8 +183,8 @@ p->pgrp = newpgrp(); if(flag & RFNAMEG) pgrpcpy(p->pgrp, up->pgrp); - /* inherit noattach */ - p->pgrp->noattach = up->pgrp->noattach; + /* inherit notallowed */ + memmove(p->pgrp->notallowed, up->pgrp->notallowed, sizeof p->pgrp->notallowed); } else { p->pgrp = up->pgrp; @@ -185,7 +191,7 @@ incref(p->pgrp); } if(flag & RFNOMNT) - p->pgrp->noattach = 1; + deveject(p->pgrp, 1, devs); if(flag & RFREND) p->rgrp = newrgrp(); --- a/sys/src/cmd/aux/listen.c +++ b/sys/src/cmd/aux/listen.c @@ -136,6 +136,7 @@ { int ctl, pid, start; char dir[40], err[128], ds[128]; + char prog[Maxpath], serv[Maxserv], ns[Maxpath]; long childs; Announce *a; Waitmsg *wm; @@ -178,6 +179,10 @@ sleep((pid*10)%200); snprint(ds, sizeof ds, "%s!%s!%s", protodir, addr, a->a); + snprint(serv, sizeof serv, "%s%s", proto, a->a); + snprint(prog, sizeof prog, "%s/%s", srvdir, serv); + snprint(ns, sizeof ns, "%s.namespace", prog); + whined = a->whined; /* a process per service */ 
@@ -201,7 +206,11 @@ else exits("ctl"); } - dolisten(dir, ctl, srvdir, a->a, &childs); + procsetname("%s %s", dir, ds); + if(!trusted) + if(newns("none", ns) < 0) + syslog(0, listenlog, "can't build namespace %s: %r\n", ns); + dolisten(dir, ctl, serv, prog, &childs); close(ctl); } default: @@ -299,6 +308,8 @@ continue; if(strncmp(nm, proto, nlen) != 0) continue; + if(strstr(nm + nlen, ".namespace") != nil) + continue; addannounce(nm + nlen); } free(db); @@ -329,15 +340,10 @@ } void -dolisten(char *dir, int ctl, char *srvdir, char *port, long *pchilds) +dolisten(char *dir, int ctl, char *serv, char *prog, long *pchilds) { char ndir[40], wbuf[64]; - char prog[Maxpath], serv[Maxserv]; int nctl, data, wfd, nowait; - - procsetname("%s %s!%s!%s", dir, proto, addr, port); - snprint(serv, sizeof serv, "%s%s", proto, port); - snprint(prog, sizeof prog, "%s/%s", srvdir, serv); wfd = -1; nowait = RFNOWAIT; --- a/sys/src/libauth/newns.c +++ b/sys/src/libauth/newns.c @@ -14,8 +14,8 @@ static int setenv(char*, char*); static char *expandarg(char*, char*); static int splitargs(char*, char*[], char*, int); -static int nsfile(char*, Biobuf *, AuthRpc *); -static int nsop(char*, int, char*[], AuthRpc*); +static int nsfile(char*, Biobuf *, AuthRpc *, int); +static int nsop(char*, int, char*[], AuthRpc*, int); static int catch(void*, char*); int newnsdebug; @@ -35,7 +35,7 @@ { Biobuf *b; char home[4*ANAMELEN]; - int afd, cdroot; + int afd, cdroot, dfd; char *path; AuthRpc *rpc; @@ -51,6 +51,10 @@ } /* rpc != nil iff afd >= 0 */ + dfd = open("#c/drivers", OWRITE|OCEXEC); + if(dfd < 0 && newnsdebug) + fprint(2, "open #c/drivers: %r\n"); + if(file == nil){ if(!newns){ werrstr("no namespace file specified"); @@ -70,7 +74,8 @@ setenv("home", home); } - cdroot = nsfile(newns ? "newns" : "addns", b, rpc); + cdroot = nsfile(newns ? 
"newns" : "addns", b, rpc, dfd); + close(dfd); Bterm(b); freecloserpc(rpc); @@ -87,7 +92,7 @@ } static int -nsfile(char *fn, Biobuf *b, AuthRpc *rpc) +nsfile(char *fn, Biobuf *b, AuthRpc *rpc, int dfd) { int argc; char *cmd, *argv[NARG+1], argbuf[MAXARG*NARG]; @@ -103,7 +108,7 @@ continue; argc = splitargs(cmd, argv, argbuf, NARG); if(argc) - cdroot |= nsop(fn, argc, argv, rpc); + cdroot |= nsop(fn, argc, argv, rpc, dfd); } atnotify(catch, 0); return cdroot; @@ -143,7 +148,7 @@ } static int -nsop(char *fn, int argc, char *argv[], AuthRpc *rpc) +nsop(char *fn, int argc, char *argv[], AuthRpc *rpc, int dfd) { char *argv0; ulong flags; @@ -181,7 +186,7 @@ b = Bopen(argv[0], OREAD|OCEXEC); if(b == nil) return 0; - cdroot |= nsfile(fn, b, rpc); + cdroot |= nsfile(fn, b, rpc, dfd); Bterm(b); }else if(strcmp(argv0, "clear") == 0 && argc == 0){ rfork(RFCNAMEG); @@ -212,6 +217,14 @@ }else if(strcmp(argv0, "cd") == 0 && argc == 1){ if(chdir(argv[0]) == 0 && *argv[0] == '/') cdroot = 1; + }else if(argc >= 1 && (strcmp(argv0, "permit") == 0 || strcmp(argv0, "eject") == 0)){ + //We should not silently fail if we can not honor a permit/eject + //due to the parent namespace missing #c/drivers. + if(dfd <= 0) + sysfatal("%s requested, but could not open #c/drivers", argv0); + for(i=0; i < argc; i++) + if(fprint(dfd, "%s %s\n", argv0, argv[i]) < 0 && newnsdebug) + fprint(2, "%s: %s %s %r\n", fn, argv0, argv[i]); } return cdroot; }
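For readers skimming the aux/listen part of the diff, the per-service namespace lookup can be sketched in portable C as follows. This is an illustration only: nsfilepath and isnsfile are hypothetical helper names that do not exist in listen.c, and standard snprintf stands in for Plan 9's snprint.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<srvdir>/<proto><port>.namespace", the file consulted for
 * a service. Returns 0 on success, -1 if the result did not fit.
 */
int
nsfilepath(char *buf, size_t len, const char *srvdir, const char *proto, const char *port)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "%s/%s%s.namespace", srvdir, proto, port);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Mirrors the directory-scan filter in the diff: any entry whose
 * name contains ".namespace" is skipped rather than announced.
 */
int
isnsfile(const char *name)
{
	return strstr(name, ".namespace") != NULL;
}
```

With srvdir "/rc/bin/service", proto "tcp" and port "80", nsfilepath produces "/rc/bin/service/tcp80.namespace". Note that the filter is a substring match, as in the diff, not a strict suffix check.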