From mboxrd@z Thu Jan 1 00:00:00 1970
From: steffen at sdaoden.eu (Steffen Nurpmeso)
Date: Tue, 22 Sep 2020 23:53:09 +0200
Subject: [COFF] A little networking tool to reduce having to run emulators with privilege
In-Reply-To: <0f0083d2-1fc6-87dd-3001-6fa7e3752c79@tnetconsulting.net>
References: <23BB3E13-7306-4BB6-9566-DF4C61DE9799@gmail.com>
 <20200921213834.KaMCt%steffen@sdaoden.eu>
 <1dbc110c-8844-040d-a08d-07914094b47f@spamtrap.tnetconsulting.net>
 <20200922145441.zQkCA%steffen@sdaoden.eu>
 <0f0083d2-1fc6-87dd-3001-6fa7e3752c79@tnetconsulting.net>
Message-ID: <20200922215309.a4ey9%steffen@sdaoden.eu>

Hello.

Grant Taylor wrote in
<0f0083d2-1fc6-87dd-3001-6fa7e3752c79@tnetconsulting.net>:
|On 9/22/20 8:54 AM, Steffen Nurpmeso wrote:
|> My understanding also is about MAC changing, but it seems there are
|> drivers which can do something about it. Not my area, sorry.
|
|That matches my understanding.

And that is fuzzy on my side, the best i have heard was on some
FreeBSD not too long ago, where the same problem exists.
..Yes. This is exactly the message i remembered:

  https://marc.info/?l=freebsd-current&m=155924713003353&w=2

(The relevant part is

  >>> I think there's a (unknown?) problem that makes lagg(4)
  >>> incompatible with bridge(4). I've never been unable to make
  >>> a lagg interface work as a member of a bridge. Lacking the
  >>> time to pursue it, I've resorted to NATing instead.
  >>>
  >>> Also, wlan interfaces tend to break if you change their MAC
  >>> address. So in a lagg consisting of a wlan interface and a
  >>> ethernet interface (without a bridge), I always set the MAC
  >>> of the ethernet to match the native MAC of the wlan, and not
  >>> vice versa.
)

Not much.

|> This is exactly the nice thing about the approach i use, the one
|> side of the veth pair that is not in the namespace simply plugs into
|> whatever network environment there is on the host, as long as the
|> host as a route to it.
|
|Sorry, I was asking for more clarification on what you do with the host
|end of the veth to connect it to the rest of your environment.

Nothing. I only set the route.

|> The other end, jailed in the namespace, can regulary be used by a
|> bridge device.
|
|Yes.
|
|I'm wondering why you are attaching the NetNS end of the veth to a
|bridge instead of just using the veth directly.

Ha. That could very well be because i was desperately trying to
create a bridge all the time, but it just was not working at all.
So i turned to proxy_arp "pseudo-bridging", with just having
routes and the TAP devices for the VMs, nothing more. A wild
chaos universe thus.

You know, to me this is just a programmatic problem, i just do not
understand. Why does it matter whether you have eth0 or wlp1s0?
I can go into the internet with both, why can i create a bridge on
one but not the other? That does not make sense. Just do it?
It works when i inject a VETH pair, so may it be like this.
Having the bridge now makes things easy, i just make it the master
of anything which is plugged into it.
Yes, so easy are things if you never programmed a network hardware
driver!

|> And is constant and self-sufficient. I use fixed addresses, for
|> example. (Parsed from a hosts.txt that is also read in by dnsmasq(8)
|> so that the host and the VMs find each other. All i have to adjust
|> is the hosts.txt. Having said that, i can improve this by deriving
|> the MAC address from that file, too.)
|
|Sure.
|
|I'm about 98% certain that all of that applies equally as well to the
|veth interface inside of the network namespace as it does to the bridge
|inside of the network namespace.
|
|So ... why use a bridge inside of the network namespace?
|
|I completely get why you use a bridge on the host and the veth interface

I do not use a bridge on the host. This i cannot do.

|outside of the network namespace. I just don't understand why you are
|using a bridge /inside/ the network namespace.

But it makes things so easy now. You have seen the scripts, and
on the host i see

  #?0|kent:tmp$ ip rou
  default via 192.168.0.1 dev wlp1s0 proto dhcp src 192.168.0.153 metric 306
  10.0.0.0/8 dev v_n proto kernel scope link src 10.0.0.1
  10.0.0.1 dev v_n scope link
  192.168.0.0/24 dev wlp1s0 proto dhcp scope link src 192.168.0.153 metric 306
  #?0|kent:tmp$ ip a
  ...
  6: wlp1s0: ...
      inet 192.168.0.153/24 brd 192.168.0.255 scope global dynamic noprefixroute wlp1s0
         valid_lft 69766sec preferred_lft 58966sec
  8: v_n@if7: ...
      inet 10.0.0.1/8 scope global v_n
         valid_lft forever preferred_lft forever

and in the namespace

  #?0|kent:~# ip netns exec v_ns ip rou
  default via 10.0.0.1 dev v_br
  10.0.0.0/8 dev v_br proto kernel scope link src 10.1.0.1
  #?0|kent:~# ip netns exec v_ns ip a
  ...
  6: v_br: ..
      inet 10.1.0.1/8 brd 10.255.255.255 scope global v_br
         valid_lft forever preferred_lft forever
      inet6 fe80::dc81:96ff:fe0b:a229/64 scope link
         valid_lft forever preferred_lft forever
  7: v_i@if8: ..
      link/ether fe:7f:36:b0:04:97 brd ff:ff:ff:ff:ff:ff link-netnsid 0
      inet6 fe80::fc7f:36ff:feb0:497/64 scope link
         valid_lft forever preferred_lft forever

And i started a VM for you

  8: vm_ulinux-010204: ..
      link/ether ce:4a:41:5c:d6:61 brd ff:ff:ff:ff:ff:ff
      inet6 fe80::cc4a:41ff:fe5c:d661/64 scope link
         valid_lft forever preferred_lft forever

No address no route setup, and in the machine

  $ cat default/net
  TYPE=static
  DEV=eth0
  ADDR=10.0.1.15
  MASK=8
  GW=10.0.0.1

Except for using a DHCP server in the namespace this is as short
as it can get. Of course if i would use that it would be even
less work to do when setting up a VM. And even more flexible and
automatic.
But, you know, this is so overkill given that i use this only for
testing, and dhcpcd is now privilege-separated:

  #?0|kent:src$ pla|grep dhcpc
  dhcpcd     509     1 S  0.0  1748 dhcpcd  /sbin/dhcpcd -h kent -z wlp1s0
  root       510   509 S  0.0  2396 dhcpcd  /sbin/dhcpcd -h kent -z wlp1s0
  dhcpcd     511   509 S  0.0   268 dhcpcd  /sbin/dhcpcd -h kent -z wlp1s0
  dhcpcd     512   509 S  0.0   268 dhcpcd  /sbin/dhcpcd -h kent -z wlp1s0
  dhcpcd    7619   510 S  0.0   376 dhcpcd  /sbin/dhcpcd -h kent -z wlp1s0

And this twice all week long for in practice nothing? Ah, no.
And what if i move along, i have to have dhcpcd, or configure
dnsmasq to serve it for the namespace. And so i can use a simple
hosts.txt and have dnsmasq integrate it in its normal DNS service,
this would not be that easy if it would be dynamic, i had to look.
Anyhow, no network setup at all in the namespace, i have the
hosts.txt that i need anyway, and i configure the machine once
i install it. Done.

|> Yes. It is just that when you search the internet for Linux and
|> bridges you will find mostly brctl or systemd things. (Generally
|> spoken the amount of let me say whisked crap is near hundred percent
|> in fact.)
|
|The other thing that I find is how to configure bridging in distro init
|scripts.

It is terrible. Nothing HOWTO like, not "help the people to help
themselves", but everybody who understood a topic by himself is
quick in fooling others. This makes the FreeBSD handbook and the
BSDs and their manual portfolio in general outstanding.

|> Yes.
|>
|> You have seen all the configuration there is. It is isolated, it is
|> not affected by the firewall rules of the host, the firewall rules of
|> the host do not take care of this thing at all, attack surface is thus
|> only kernel bugs, i thing, and anything on the inside can be hardwired.
|> No worries.
|
|Depending on how you are connecting the host side veth to the network,
|there is a very real chance that the host firewall will influence what
|goes into / comes out of the emulator in the network namespace
|(~container). Particularly if you are routing. Less so if you are
|bridging. But bridging can still be effected by the firewall.

Don't confuse me please, i am .. not a network expert.
Surely you could use firewall stuff, but i do not. At least
nothing special, very restrictive indeed, but 10.0.0.0 is set free
early, and other... Ah yes, i have forgotten this line:

  if [ -n "$VM_NS" ]; then
     ${iptables} -t nat -A POSTROUTING -o ${what} -j MASQUERADE
     #echo 1 > /proc/sys/net/ipv4/conf/${what}/proxy_arp
  fi

Here $what is the device, wlp1s0 for example. True. Of course.
The comment is a leftover.

|> Yes it is cool. The "Linux Advanced Routing & Traffic Control HOWTO"
|> (twenty years ago such great things were written and said "Welcome,
|> gentle reader", unfortunately that all stopped when the billions
|> came over Linux i think, but, of course, today the manual pages are
|> great) says
|
|Ya. The Linux Documentation Project and their How-To's were (arguably
|still is) great. It's not /quite/ timeless. But much of the stuff
|there is still viable. Some of it is woefully out of date though.

Unfortunately yes.

 ...
|> Compared to proxy_arp it was 40 percent here. However, this was
|> with kernel 4.19 and the network driver (R8822BE) was in staging, now
|> with 5.8 it is not (RTW88) but terribly broken or buggy or whatever.
|> It must be said this driver seems to be very complicated, the R8822BE
|> had a megabyte of code infrastructure iirc. But it is terrible,
|> and now i know that audio via bluetooth and wlan throughput are
|> highly dependent. (Or can.)
|
|Sounds like you've got some issues that I typically don't run into with
|more traditional RTL 8129 / 8139 / 8169 drivers.

Actually yes.
In fact i have booted into 4.19 again today because RTW88 is
totally unusable here, at least after the first suspend/resume and
even though they added the rtw88_pci.disable_aspm=1 kernel command
line switch to work around power management problems.

In fact the driver messed the hardware so much that Linux was no
longer capable to access it, even booting 4.19 and using R8822BE
thus did not do it. By sheer luck the friendly salesman gave me
a 512 GB NVME SSD for the price of a 256 GB last year, so i kept
the maximally minimized 30 GB Windows partition just for win
(imagine that: 30 GB of space .. wasted!), and booted into it,
because of despair! Of course i had forgotten my password (i just
wanted to log into Windows to see how it looks once i bought the
laptop, my last Windows login before that was Windows 95 B), but
on the welcome screen you could select the network, and once that
dialog wanted the password of the network (!) i rebooted into
Linux 4.19 .. and was alive again. Sheer luck, dammit, otherwise
i would have been just dead. Terrible! This at least seems to be
avoidable by using the above command line switch. Today it could
be accessed after reboot.

|> I did not know that, will try it out.
|
|I figured that you would appreciate it.
|
|> I assign an address to the interface, and make that interface routable
|> from the host.
|
|The fact that the prefix is on a directly attached network should be
|sufficient to make it routable to the host.
|
|Unless you are also using the same 10.0.0.0/8 on the other
|{wired,wireless} network that your system is connected to.

No.

|> This is where all the VMs plug into.
|
|I disagree.
|
|The VMs plug into the host bridge.
|
|I'm asking about why you have a bridge /inside/ of each of the VMs.

But i do not?

|> That is the other side of the veth interface pair of course.
|
|Yes.
|
|But you can use the v_i interface /directly/. I'm not seeing any /need/
|for the bridge /inside/ the network namespace.

Yes i understood that.
I would need to assign addresses to the VMs once the VM starts,
whereas now i only do that inside the VM. Hm?.

|> No. The purpose is to be able to create a network of any number
|> of VMs somewhere, so those go via the bridge, no? This network is
|> self-sufficient
|>
|>   ANY HOST HOWEVER ONLINE <---> VETH -|- VETH IN NAMESPACE
|>                                              ^
|>               ANY NUMBER OF VMS <-> BRIDGE <
|>
|> Why would you want to do this? I do not understand.
|
|        +---------------------------------------+
|        |host   +------+           +-----------+|
|        |       |      +---v_ns1---+v_i   v_ns1||
|        |       |      |           +-----------+|
|        |       |      |           +-----------+|
|(LAN)---+eth0---+ bri0 +---v_ns2---+v_i   v_ns2||
|  |     |       |      |           +-----------+|
|  |     |       |      |           +-----------+|
|  |     |       |      +---v_ns3---+v_i   v_ns3||
|  |     |       +------+           +-----------+|
|  |     +---------------------------------------+
|  |
|  |     +---------------+
|(LAN)---+eth0   notebook|
|        +---------------+
|
|Each network namespace (v_ns#) has it's own vEth pair. The host side of
|each vEth pair is connected to the bridge on the host. The bridge on
|the host is connected to the host's eth0 interface. Thus, each of the
|network namespaces have a layer 2 network connection to the LAN.
|Meaning that each of the network namespaces are proper members of the
|LAN. No routing is needed. No proxy ARP is needed. Notebook, host,
|v_ns1, v_ns2, v_ns3 can all be on the same subnet without doing anything
|fancy.

There is only one network namespace here. One for all VMs.
You cannot create bridge devices on wireless interfaces, unless
you have a driver which does support that, or, i guess, you create
your own host access point, i dimly recall this could be
a solution too.

|> Yeah, this is a leftover from the proxy_arp based pseudo-bridge.
|> Not to forget it, maybe. I should have removed it before posting.
|> I am not a network expert ok, especially not so Linux-specific.
|
|*nod*
|
|I was just trying to confirm that's historic. Seeing as how it's
|commented out.
|
|Trying to deduce what, and more so why, can be non-trivial at times.
|
|> This is just the startup of a VM, it registers at the bridge.
|
|Yep. Adding the host end of the vEth pair to the host bridge.
|
|> I am _not_ using proxy_arp no more. This is a leftover, at the
|> beginning i made it configurable and could switch in between the
|> different approaches via a setting in /x/vm/.run.sh. I should have
|> removed it before posting.
|
|It's cool. Methods, scripts there of, evolve.
|
|> That one is isolated, i can reach it from the host, and they can talk
|> to the host, each other and the internet. I never placed servers
|> addressable from the outside in such a thing, this is only a laptop
|> without fixed address. I guess in order to allow a public accessible
|> server to live in the namespace i would maybe need an ipfilter rule
|> on the host.
|
|Or bridging, as depicted above. ;-)
|
|> Yes --bind mounting is cool also. But i definetely do not want to
|> give the VM an entire /dev, it only needs u?random and that _only_
|> because libcrypt (linked into qemu) needs it, even though it is not
|> actually used (the Linux getrandom system call is used instead).
|
|You can copy the /dev/urandom device, or make a new one, or bind mount
|the device inside the network namespace. ;-)
|
|Even if it's not used for anything other than to make the kernel happy,
|you are going to need it.

No no, only libgcrypt...

|> Yes. But no, i do not really need it, i use it only for qemu
|> instances. Interesting would be imposing hard CPU and memory
|> restrictions, especially if i would use this approach for creating
|> servers, too. This is a future task, however.
|
|Now you're hedging on cgroups.

Yes. That i have to do some time.

|> Yeah like i said, i could impose more restrictions on programs running
|> inside that namespace, qemu also shows some security flaws at times,
|> but this is really just for testing purposes etc.
|
|IMHO /everything/ has security flaws at one point or another. It's just
|a matter of when.

Unfortunately true.
The complexity of cgroups and the Linux kernel as such is however
very, very much intensified compared to a FreeBSD jail. At least
once jails appeared it often was nothing more than
a "if(process->jailed)" at the beginning of some kernel functions.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
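
For reference, the single-namespace layout described in this message (the
veth pair v_n/v_i, the namespace v_ns, the bridge v_br inside it, and NAT
on wlp1s0) could be built with iproute2 roughly as follows. This is only
a sketch under assumptions, not the author's actual script: the names
setup_vm_net, run and RUN are made up for illustration, and switching on
net.ipv4.ip_forward is assumed to be needed but is not mentioned in the
message. By default the commands are only printed, since they need root
to run for real.

```shell
#!/bin/sh
# Sketch only: one namespace for all VMs, bridge inside, veth to the host.
# Device names and addresses are taken from the ip(8) listings in the
# message; setup_vm_net, run and RUN are hypothetical helpers.

# Print each command unless RUN is set to the empty string, in which
# case the command is executed (requires root).
run() {
	${RUN-echo} "$@"
}

setup_vm_net() {
	ns=$1 host_if=$2 ns_if=$3 br=$4 uplink=$5

	# Namespace and the veth pair; one end moves into the namespace.
	run ip netns add "$ns"
	run ip link add "$host_if" type veth peer name "$ns_if"
	run ip link set "$ns_if" netns "$ns"

	# Host end: only an address; the 10.0.0.0/8 route follows from it.
	run ip addr add 10.0.0.1/8 dev "$host_if"
	run ip link set "$host_if" up

	# Inside the namespace: the bridge is master of the veth end (and,
	# later, of each VM's TAP device).
	run ip netns exec "$ns" ip link add "$br" type bridge
	run ip netns exec "$ns" ip link set "$ns_if" master "$br"
	run ip netns exec "$ns" ip link set "$ns_if" up
	run ip netns exec "$ns" ip addr add 10.1.0.1/8 dev "$br"
	run ip netns exec "$ns" ip link set "$br" up
	run ip netns exec "$ns" ip route add default via 10.0.0.1

	# Assumption: forwarding has to be on for the host to route at all.
	run sysctl -w net.ipv4.ip_forward=1
	# NAT towards the outside, as in the iptables line quoted earlier.
	run iptables -t nat -A POSTROUTING -o "$uplink" -j MASQUERADE
}

setup_vm_net v_ns v_n v_i v_br wlp1s0
```

Saved to a file and run as is, this prints the thirteen commands for
inspection; running it as root with RUN set to the empty string would
execute them instead.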