From mboxrd@z Thu Jan 1 00:00:00 1970
From: steffen at sdaoden.eu (Steffen Nurpmeso)
Date: Tue, 22 Sep 2020 16:54:41 +0200
Subject: [COFF] A little networking tool to reduce having to run emulators with privilege
In-Reply-To: <1dbc110c-8844-040d-a08d-07914094b47f@spamtrap.tnetconsulting.net>
References: <23BB3E13-7306-4BB6-9566-DF4C61DE9799@gmail.com>
 <20200921213834.KaMCt%steffen@sdaoden.eu>
 <1dbc110c-8844-040d-a08d-07914094b47f@spamtrap.tnetconsulting.net>
Message-ID: <20200922145441.zQkCA%steffen@sdaoden.eu>

Grant Taylor wrote in
 <1dbc110c-8844-040d-a08d-07914094b47f at spamtrap.tnetconsulting.net>:
 |On 9/21/20 3:38 PM, Steffen Nurpmeso wrote:
 |> Bridges usually do not work with wireless interfaces, it needs some
 |> v?eth.
 |
 |Is it the bridge that's the problem or is it the wireless interface
 |that's the problem?
 |
 |My understanding is that some (many?) wireless interfaces are funky
 |regarding multiple MAC addresses.  Mostly in that they don't work with
 |extra MAC addresses.

My understanding also is about MAC changing, but it seems there are
drivers which can do something about it.  Not my area, sorry.

 |What would you do with veth interfaces in this context?

This is exactly the nice thing about the approach i use: the one side
of the veth pair that is not in the namespace simply plugs into
whatever network environment there is on the host, as long as the
host has a route to it.  The other end, jailed in the namespace, can
regularly be used by a bridge device.  And it is constant and
self-sufficient.  I use fixed addresses, for example.  (Parsed from
a hosts.txt that is also read in by dnsmasq(8), so that the host and
the VMs find each other.  All i have to adjust is the hosts.txt.
Having said that, i could improve this by deriving the MAC address
from that file, too.)

 |> And those br* tools are not everywhere, too (grr).
 |
 |I've been able to use ip to do much of what I used to do with brctl.
 |
 |   ip link add bri0 type bridge
 |   ip link set eth1 master bri0
 |
 |(from memory)

Yes.  It is just that when you search the internet for Linux and
bridges you will find mostly brctl or systemd things.  (Generally
speaking the amount of, let me say, whisked-together crap is near a
hundred percent, in fact.)

 |> Have you ever considered network namespaces?
 |
 |What would network namespaces provide in this context?
 |
 |Would you run the $EMULATOR in the network namespace (~container)?

Yes.

 |What does that get you that running the $EMULATOR in the main / root /
 |unnamed network namespace does not get you?  -- If anything, I'd think
 |that running the $EMULATOR in the network namespace would add additional
 |networking complexity.

You have seen all the configuration there is.  It is isolated: it is
not affected by the firewall rules of the host, and the host's
firewall rules do not take care of this thing at all; the attack
surface is thus only kernel bugs, i think, and anything on the
inside can be hardwired.  No worries.

 |Don't get me wrong, I'm all for network namespaces and doing fun(ky)
 |things with them.  I've emulated entire corporate networks with network
 |namespaces.  I've currently got nine of them on the system I'm replying
 |from, with dynamic routing.

No worries.

 |> After over a year of using proxy_arp based pseudo bridging (cool!)
 |
 |I'll argue that Proxy ARP /is/ a form of routing.  ;-)  Your system
 |replies to ARP requests for the IP(s) behind it and the packets are sent
 |to it as if it's a router.  }:-)

Yes it is cool.
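(Just for the record, such a pseudo-bridge boils down to something
like the following sketch; eth0, tap0 and the address are only
examples, not my actual configuration:

   # forward, and answer ARP on the LAN for addresses we have a route to
   sysctl -w net.ipv4.ip_forward=1
   echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp
   echo 1 > /proc/sys/net/ipv4/conf/tap0/proxy_arp
   # host route so that the guest address goes out via its tap device
   ip route add 192.168.1.50 dev tap0

The guest simply takes an address from the LAN subnet and points its
default route at the host.)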
The "Linux Advanced Routing & Traffic Control HOWTO" (twenty years ago such great things were written and said "Welcome, gentle reader", unfortunately that all stopped when the billions came over Linux i think, but, of course, today the manual pages are great) says /proc/sys/net/ipv4/conf/DEV/proxy_arp If you set this to 1, this interface will respond to ARP requests for addresses the kernel has routes to. Can be very useful when building 'ip pseudo bridges'. Do take care that your netmasks are very correct before enabling this! Also be aware that the rp_filter, mentioned elsewhere, also operates on ARP queries. |> i finally wrapped my head around veth, | |I find veth to be quite helpful. {MAC,IP}{VLAN,VTAP} are also fun. | |Aside: {MAC,IP}{VLAN,VTAP} is slightly more difficult to get working. |In my experience, {MAC,IP}{VLAN,VTAP} don't support talking to the host |directly, instead you need to create an additional {MAC,IP}{VLAN,VTAP} |and have the hose use it as if it is it's own guest. | |> with it and Linux network namespaces i loose 40 percent ping response |> speed, but have a drastically reduced need for configuration. | |I've never noticed any sort of increased latency worth mentioning. Compared to proxy_arp it was 40 percent here. However, this was with kernel 4.19 and the network driver (R8822BE) was in staging, now with 5.8 it is not (RTW88) but terribly broken or buggy or whatever. It must be said this driver seems to be very complicated, the R8822BE had a megabyte of code infrastructure iirc. But it is terrible, and now i know that audio via bluetooth and wlan throughput are highly dependent. (Or can.) |> What i have is this, maybe you find it useful. It does not need |> any firewall rules. (Except allowing 10.0.0.0/8.) |> |> In my net-qos.sh (which is my shared-everywhere firewall and tc |> script) |> |> vm_ns_start() { |> #net.ipv4.conf.all.arp_ignore=0 |> sysctl -w \ |> net.ipv4.ip_forward=1 |> |> ${ip} link add v_n type veth peer name v_i |> ${ip} netns add v_ns |> ${ip} link set v_i netns v_ns | |If you create the netns first, then you can have the veth interface |created and moved into the network namespace in one command. | | ${ip} link add v_n type veth peer name v_i netns v_ns | |Note: This does create the v_i veth interface in the network namespace |that you're running the command in and then automatically move it for you. I did not know that, will try it out. |> ${ip} a add 10.0.0.1/8 dev v_n |> ${ip} link set v_n up |> ${ip} route add 10.0.0.1 dev v_n | |Why are you adding a (host) route to 10.0.0.1 when it's part of |10.0.0.0/8 which is going out the same interface? I assign an address to the interface, and make that interface routable from the host. |> ${ip} netns exec v_ns ${ip} link set lo up |> #if [ -z "$BR" ]; then |> # ${ip} netns exec v_ns ip addr add 10.1.0.1/8 dev v_i broadcast \ |> + |> # ${ip} netns exec v_ns ip link set v_i up |> # ${ip} netns exec v_ns ip route add default via 10.0.0.1 |> #else |> ${ip} netns exec v_ns ${ip} link set v_i up |> ${ip} netns exec v_ns ${ip} link add v_br type bridge | |Why are you adding a bridge inside the v_ns network namespace? This is where all the VMs plug into. |> ${ip} netns exec v_ns ${ip} addr add 10.1.0.1/8 dev v_br \ |> broadcast + |> ${ip} netns exec v_ns ${ip} link set v_br up |> ${ip} netns exec v_ns ${ip} link set v_i master v_br | |Why are you adding the v_i interface to the bridge /inside/ the network |namespace? That is the other side of the veth interface pair of course. 
 |>         ${ip} netns exec v_ns ${ip} route add default via 10.0.0.1
 |>      #fi
 |>   }
 |
 |What does creating a bridge with a single interface /inside/ of the
 |network namespace get you?
 |
 |I would have assumed that you were creating the bridge outside the
 |network namespace and adding the network namespace's outside veth to
 |said bridge.

No.  The purpose is to be able to create a network of any number of
VMs somewhere, so those go via the bridge, no?  This network is
self-sufficient:

  ANY HOST HOWEVER ONLINE <---> VETH -|- VETH IN NAMESPACE
                                              ^
                                              |
                   ANY NUMBER OF VMS <-> BRIDGE <

 |>   vm_ns_stop() {
 |>      ${ip} netns del v_ns
 |>
 |> ^ That easy it is!
 |
 |Yep.  I've done a LOT of things like that.  Though I have the bridge
 |outside.

Why would you want to have the bridge outside?  I do not understand.

 |>      #net.ipv4.conf.all.arp_ignore=1
 |
 |What was (historically, since it's commented out) the purpose for
 |setting arp_ignore to 1?

Yeah, this is a leftover from the proxy_arp based pseudo-bridge.
Kept so as not to forget it, maybe.  I should have removed it before
posting.  I am no network expert, ok, especially not Linux-specific.

 |>      sysctl -w \
 |>         net.ipv4.ip_forward=0
 |>   }
 |>
 |> And then, in my /x/vm directory the qemu .ifup.sh script
 |>
 |>   #!/bin/sh -
 |>
 |>   if [ "$VMNETMODE" = bridge ]; then
 |>      ip link set dev $1 master v_br
 |
 |This is more what I would expect.

This is just the startup of a VM, it registers with the bridge.

 |>      ip link set $1 up
 |>   elif [ "$VMNETMODE" = proxy_arp ]; then
 |>      echo 1 > /proc/sys/net/ipv4/conf/$1/proxy_arp
 |>      ip link set $1 up
 |>      ip route add $VMADDR dev $1
 |
 |I guess the route is because you're using Proxy ARP.

I am _not_ using proxy_arp anymore.  This is a leftover; at the
beginning i made it configurable and could switch between the
different approaches via a setting in /x/vm/.run.sh.  I should have
removed it before posting.

 |That makes me ask, is the 10.0.0.0/8 network also used on the outside
 |home LAN?

That network is isolated; i can reach the VMs from the host, and they
can talk to the host, to each other and to the internet.  I never
placed servers addressable from the outside in such a thing, this is
only a laptop without a fixed address.  I guess in order to allow a
publicly accessible server to live in the namespace i would maybe
need an ipfilter rule on the host.

 |>   else
 |>      echo >&2 Unknown VMNETMODE=$VMNETMODE
 |>   fi
 |
 |;-)
 |
 |> Of course qemu creates the actual device for me here.
 |> The .ifdown.sh script i omit, it is not used in this "vbridge"
 |> mode.  It would do nothing really, and it cannot be called because
 |> i now can chroot into /x/vm (needs dev/u?random due to libcrypt
 |> needing it though it would not need them, but i cannot help it).
 |
 |You can bind mount /dev into the chroot.  That way you could chroot in.
 |Much like Gentoo does during installation.

Yes, --bind mounting is cool also.  But i definitely do not want to
give the VM an entire /dev, it only needs u?random, and that _only_
because libcrypt (linked into qemu) needs it, even though it is not
actually used (the Linux getrandom system call is used instead).

 |> This then gets driven by a .run.sh script (which is called by the
 |> real per-VM scripts, like
 |>
 |>   #!/bin/sh -
 |>   # root.alp-2020, steffen: Sway
 |>
 |>   debug=
 |>   vmsys=x86_64
 |>   vmname=alp-2020
 |>   vmimg=.alp-2020-amd64.vmdk
 |>   vmpower=half
 |>   vmmac=52:54:45:01:00:12
 |>   vmcustom= #'-boot menu=on -cdrom /x/iso/alpine-virt-3.12.0-x86_64.iso'
 |>
 |>   . /x/vm/.run.sh
 |>   # s-sh-mode
 |>
 |> so, and finally invokes qemu like so
 |>
 |>   echo 'Monitor at '$0' monitor'
 |>   eval exec $sudo /bin/ip netns exec v_ns /usr/bin/qemu-system-$vmsys \
 |>      -name $VMNAME $runas $chroot \
 |>      $host $accel $vmdisp $net $usb $vmrng $vmcustom \
 |>      -monitor telnet:127.0.0.1:$monport,server,nowait \
 |>      -drive file=$vmimg,index=0,if=ide$drivecache \
 |>      $redir
 |>
 |> Users in the vm group may use that sudo, qemu is executed in the
 |> v_ns network namespace under runas='-runas vm' and jailed via
 |> chroot='-chroot .'.  It surely could be more sophisticated, more
 |> cgroups, whatever.  Good enough for me.
 |
 |:-)
 |
 |Have you spent any time looking at unshare and / or nsenter?
 |
 |   # Spawn the lab# NetNSs and set its hostname.
 |   unshare --mount=/run/mountns/${1} --net=/run/netns/${1} \
 |      --uts=/run/utsns/${1} /bin/hostname ${1}
 |   # Bring up the loopback interface.
 |   nsenter --mount=/run/mountns/${1} --net=/run/netns/${1} \
 |      --uts=/run/utsns/${1} /bin/ip link set dev lo up
 |
 |I use the mount, net, and uts namespaces.  RTFM for more details on
 |different combinations of namespaces.

Yes.  But no, i do not really need it, i use it only for qemu
instances.  Interesting would be imposing hard CPU and memory
restrictions, especially if i would use this approach for creating
servers, too.  This is a future task, however.

 |> That .run.sh does enter
 |>
 |>   if [ "$1" = monitor ]; then
 |>      echo 'Entering monitor of '$VMNAME' ('$VMADDR') at '$monport
 |>      eval exec $sudo /bin/ip netns exec v_ns telnet localhost $monport
 |>      exit 5
 |>
 |> and enters via ssh
 |>
 |>   elif [ "$1" = ssh ]; then
 |>      echo 'SSH into '$VMNAME' ('$VMADDR')'
 |>      doex=exec
 |>      if command -v tmux >/dev/null 2>&1 && [ -n "$TMUX_PANE" ]; then
 |>         tmux set window-active-style bg=colour231,fg=colour0
 |>         doex=
 |>      fi
 |>      ( eval $doex ssh $VMADDR )
 |>      exec tmux set window-active-style bg=default,fg=default
 |>      exit 5
 |>
 |> for me.  (I use VMs in Donald Knuth emacs colour scheme it seems,
 |> at least more or less.  VMs here, VM there.  Hm.)
 |
 |... VM everywhere.
 |
 |But ... are they /really/ VMs?
 |
 |You're running an /emulator/* in a (home grown) /container/.
 |
 |}:-)
 |
 |*Okay.  QEMU can be more of a VM than an emulator, depending on command
 |line options.
 |
 |> Overall this network namespace thing is pretty cool.  Especially
 |> since, compared to FreeBSD jails, for example, you simply can run
 |> a single command.  Unfair comparison though.  What i'd really
 |> wish would be a system which is totally embedded in that
 |> namespace/jail idea.  I.e., _one_ /, and then only moving targets
 |> mounted via overlayfs into "per-jail" directories.  Never found
 |> time nor motivation to truly try this out.
 |
 |I'm not that familiar with jails.  But I'm convinced that there are some
 |possibilities with namespaces (containers) that may come close.
 |
 |Network namespaces (ip netns ...) don't alter the mount namespace.
 |That's one of the advantages of unshare / nsenter.  You can create new
 |namespace (container) specific mount configurations.

Yeah, like i said, i could impose more restrictions on programs
running inside that namespace; qemu also shows some security flaws at
times, but this is really just for testing purposes etc.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
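P.S.: regarding the hard CPU and memory restrictions mentioned above,
with cgroup v2 i presume that future task could be as small as the
following sketch (the group name, the limits and $QEMUPID are only
examples, and it assumes cgroup2 is mounted at /sys/fs/cgroup):

   # make the cpu and memory controllers available to child groups
   echo '+cpu +memory' > /sys/fs/cgroup/cgroup.subtree_control
   mkdir /sys/fs/cgroup/vm_ns
   echo '200000 100000' > /sys/fs/cgroup/vm_ns/cpu.max   # at most two CPUs worth
   echo 2G > /sys/fs/cgroup/vm_ns/memory.max
   # move the already running qemu (PID in $QEMUPID) into the group
   echo $QEMUPID > /sys/fs/cgroup/vm_ns/cgroup.procs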