[COFF] A little networking tool to reduce having to run emulators with privilege

Computer Old Farts Forum

From: steffen at sdaoden.eu (Steffen Nurpmeso)
Subject: [COFF] A little networking tool to reduce having to run emulators with privilege
Date: Tue, 22 Sep 2020 23:53:09 +0200
Message-ID: <20200922215309.a4ey9%steffen@sdaoden.eu>
In-Reply-To: <0f0083d2-1fc6-87dd-3001-6fa7e3752c79@tnetconsulting.net>

Hello.

Grant Taylor wrote in
 <0f0083d2-1fc6-87dd-3001-6fa7e3752c79@tnetconsulting.net>:
 |On 9/22/20 8:54 AM, Steffen Nurpmeso wrote:
 |> My understanding also is about MAC changing, but it seems there are 
 |> drivers which can do something about it.  Not my area, sorry.
 |
 |That matches my understanding.

And that is fuzzy on my side; the best i have heard was on some
FreeBSD list not too long ago, where the same problem exists.
..Yes.  This is exactly the message i remembered:

  https://marc.info/?l=freebsd-current&m=155924713003353&w=2

(The relevant part is

  >>> I think there's a (unknown?) problem that makes lagg(4) incompatible with
  >>> bridge(4). I've never been unable to make a lagg interface work as a member
  >>> of a bridge. Lacking the time to pursue it, I've resorted to NATing instead.
  >>>
  >>> Also, wlan interfaces tend to break if you change their MAC address.  So in
  >>> a lagg consisting of a wlan interface and a ethernet interface (without a
  >>> bridge), I always set the MAC of the ethernet to match the native MAC of the
  >>> wlan, and not vice versa.
)

Not much.

 |> This is exactly the nice thing about the approach i use, the one
 |> side of the veth pair that is not in the namespace simply plugs into
 |> whatever network environment there is on the host, as long as the
 |> host has a route to it.
 |
 |Sorry, I was asking for more clarification on what you do with the host 
 |end of the veth to connect it to the rest of your environment.

Nothing.  I only set the route.

 |> The other end, jailed in the namespace, can regularly be used by a
 |> bridge device.
 |
 |Yes.
 |
 |I'm wondering why you are attaching the NetNS end of the veth to a 
 |bridge instead of just using the veth directly.

Ha.  That could very well be because i was desperately trying to
create a bridge all the time, but it just was not working at all.
So i turned to proxy_arp "pseudo-bridging", with just having
routes and the TAP devices for the VMs, nothing more.  A wild
chaos universe thus.

You know, to me this is just a programmatic problem, i just do not
understand.  Why does it matter whether you have eth0 or wlp1s0?
I can go into the internet with both, why can i create a bridge on
one but not the other?  That does not make sense.  Just do it?  It
works when i inject a VETH pair, so may it be like this.
Having the bridge now makes things easy, i just make it the master
of anything which is plugged into it.
Yes, so easy are things if you never programmed a network hardware
driver!

 |> And is constant and self-sufficient.  I use fixed addresses, for 
 |> example.  (Parsed from a hosts.txt that is also read in by dnsmasq(8) 
 |> so that the host and the VMs find each other.  All i have to adjust 
 |> is the hosts.txt.  Having said that, i can improve this by deriving 
 |> the MAC address from that file, too.)
 |
 |Sure.
 |
 |I'm about 98% certain that all of that applies equally as well to the 
 |veth interface inside of the network namespace as it does to the bridge 
 |inside of the network namespace.
 |
 |So ... why use a bridge inside of the network namespace?
 |
 |I completely get why you use a bridge on the host and the veth interface 

I do not use a bridge on the host.  This i cannot do.

 |outside of the network namespace.  I just don't understand why you are 
 |using a bridge /inside/ the network namespace.

But it makes things so easy now.  You have seen the scripts, and
on the host i see

  #?0|kent:tmp$ ip rou
  default via 192.168.0.1 dev wlp1s0 proto dhcp src 192.168.0.153 metric 306
  10.0.0.0/8 dev v_n proto kernel scope link src 10.0.0.1
  10.0.0.1 dev v_n scope link
  192.168.0.0/24 dev wlp1s0 proto dhcp scope link src 192.168.0.153 metric 306
  #?0|kent:tmp$ ip a
  ...
  6: wlp1s0: ...
      inet 192.168.0.153/24 brd 192.168.0.255 scope global dynamic noprefixroute wlp1s0
         valid_lft 69766sec preferred_lft 58966sec
  8: v_n@if7: ...
      inet 10.0.0.1/8 scope global v_n
         valid_lft forever preferred_lft forever

and in the namespace

  #?0|kent:~# ip netns exec v_ns ip rou
  default via 10.0.0.1 dev v_br
  10.0.0.0/8 dev v_br proto kernel scope link src 10.1.0.1
  #?0|kent:~# ip netns exec v_ns ip a
  ...
  6: v_br: ..
      inet 10.1.0.1/8 brd 10.255.255.255 scope global v_br
         valid_lft forever preferred_lft forever
      inet6 fe80::dc81:96ff:fe0b:a229/64 scope link
         valid_lft forever preferred_lft forever
  7: v_i@if8: ..
      link/ether fe:7f:36:b0:04:97 brd ff:ff:ff:ff:ff:ff link-netnsid 0
      inet6 fe80::fc7f:36ff:feb0:497/64 scope link
         valid_lft forever preferred_lft forever
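
(A minimal sketch of how the state shown above could be produced.
It is not the script from this thread: the run helper, the DRY_RUN
switch and the function name are made up for illustration, only the
device names and addresses are taken from the output above.)

```shell
#!/bin/sh
# Sketch only: a veth pair with one end on the host and the other end
# enslaved to a bridge inside a network namespace.

# Hypothetical helper: with DRY_RUN non-empty (the default here) the
# commands are only printed; with DRY_RUN empty they are executed.
: "${DRY_RUN=1}"
run() {
   if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

setup_ns() {
   run ip netns add v_ns
   run ip link add v_n type veth peer name v_i
   run ip link set v_i netns v_ns

   # Host side: only an address, the connected /8 route follows from it.
   run ip addr add 10.0.0.1/8 dev v_n
   run ip link set v_n up

   # Namespace side: the bridge owns the address, v_i is just a port.
   run ip netns exec v_ns ip link add v_br type bridge
   run ip netns exec v_ns ip link set v_i master v_br
   run ip netns exec v_ns ip addr add 10.1.0.1/8 dev v_br
   run ip netns exec v_ns ip link set v_i up
   run ip netns exec v_ns ip link set v_br up
   run ip netns exec v_ns ip route add default via 10.0.0.1 dev v_br
}
```

Calling setup_ns as is prints the ip(8) commands; executing them needs
root and an empty DRY_RUN.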

And i started a VM for you

  8: vm_ulinux-010204: ..
      link/ether ce:4a:41:5c:d6:61 brd ff:ff:ff:ff:ff:ff
      inet6 fe80::cc4a:41ff:fe5c:d661/64 scope link
       valid_lft forever preferred_lft forever

No address, no route setup.  And in the machine

  $ cat default/net
  TYPE=static
  DEV=eth0
  ADDR=10.0.1.15
  MASK=8
  GW=10.0.0.1
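
(How the TAP device of a VM might get onto v_br, as a sketch.  It
assumes the TAP is created inside the namespace before qemu starts;
the name vm_test0, the user argument and the helper functions are
illustrative and not taken from the scripts.)

```shell
#!/bin/sh
# Sketch only: plug a VM's TAP device into the bridge in the namespace.

# Hypothetical helper: print commands unless DRY_RUN is empty.
: "${DRY_RUN=1}"
run() {
   if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# attach_vm_tap TAP USER: create TAP inside v_ns, owned by USER so that
# an unprivileged qemu can open it, and make it a port of v_br.
# No address and no route are set on the TAP itself.
attach_vm_tap() {
   run ip netns exec v_ns ip tuntap add dev "$1" mode tap user "$2"
   run ip netns exec v_ns ip link set "$1" master v_br
   run ip netns exec v_ns ip link set "$1" up
}
```

A qemu started inside the same namespace could then pick the device up
with something like
-netdev tap,id=n0,ifname=vm_test0,script=no,downscript=no.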

Except for using a DHCP server in the namespace this is as short
as it can get.  Of course if i would use that it would be even
less work to do when setting up a VM.  And even more flexible and
automatic.  But, you know, this is so overkill given that i use
this only for testing, and dhcpcd is now privilege-separated:

  #?0|kent:src$ pla|grep dhcpc
  dhcpcd     509     1 S     0.0  1748 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0
  root       510   509 S     0.0  2396 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0
  dhcpcd     511   509 S     0.0   268 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0
  dhcpcd     512   509 S     0.0   268 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0
  dhcpcd    7619   510 S     0.0   376 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0

And this twice, all week long, for nothing in practice?  Ah, no.
And what if i move along: i would have to have dhcpcd, or configure
dnsmasq to serve DHCP for the namespace.  As it is i can use a simple
hosts.txt and have dnsmasq integrate it into its normal DNS service;
this would not be that easy if it were dynamic, i would have to look.

Anyhow, no network setup at all in the namespace, i have the
hosts.txt that i need anyway, and i configure the machine once
when i install it.  Done.
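
(The idea of deriving the MAC address from hosts.txt could look like
the following sketch.  It assumes the file uses the usual hosts(5)
layout "ADDRESS NAME" with IPv4 addresses; both function names and
the choice of the 52:54:00 prefix, which is the one qemu uses for its
default MAC, are assumptions of this sketch.)

```shell
#!/bin/sh
# Sketch only: derive a stable, locally administered MAC per host entry.

# mac_from_ip ADDR: 52:54:00 plus the last three octets of ADDR in hex.
# Runs in a subshell so that the changed IFS does not leak out.
mac_from_ip() (
   IFS=.
   set -- $1
   printf '52:54:00:%02x:%02x:%02x\n' "$2" "$3" "$4"
)

# macs_from_hosts: read hosts(5)-style lines on stdin, skip blank lines
# and comments, print "NAME MAC" for each entry.
macs_from_hosts() {
   while read -r addr name rest; do
      case $addr in ''|'#'*) continue;; esac
      printf '%s %s\n' "$name" "$(mac_from_ip "$addr")"
   done
}
```

For example, macs_from_hosts < hosts.txt would map an entry for
10.0.1.15 to 52:54:00:00:01:0f.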

 |> Yes.  It is just that when you search the internet for Linux and 
 |> bridges you will find mostly brctl or systemd things.  (Generally 
 |> spoken the amount of let me say whisked crap is near hundred percent 
 |> in fact.)
 |
 |The other thing that I find is how to configure bridging in distro init 
 |scripts.

It is terrible.  Nothing HOWTO like, not "help the people to help
themselves", but everybody who understood a topic by himself is
quick in fooling others.  This makes the FreeBSD handbook and the
BSDs and their manual portfolio in general outstanding.

 |> Yes.
 |> 
 |> You have seen all the configuration there is.  It is isolated, it is
 |> not affected by the firewall rules of the host, the firewall rules of
 |> the host do not take care of this thing at all, attack surface is thus
 |> only kernel bugs, i think, and anything on the inside can be hardwired.
 |> No worries.
 |
 |Depending on how you are connecting the host side veth to the network, 
 |there is a very real chance that the host firewall will influence what 
 |goes into / comes out of the emulator in the network namespace 
 |(~container).  Particularly if you are routing.  Less so if you are 
 |bridging.  But bridging can still be affected by the firewall.

Don't confuse me please, i am .. not a network expert.  Surely you
could use firewall stuff, but i do not.  At least nothing
special, very restrictive indeed, but 10.0.0.0 is set free early,
and other...  Ah yes, i have forgotten this line:

  if [ -n "$VM_NS" ]; then
     ${iptables} -t nat -A POSTROUTING -o ${what} -j MASQUERADE
     #echo 1 > /proc/sys/net/ipv4/conf/${what}/proxy_arp
  fi

Here $what is the device, wlp1s0 for example.  True.
Of course.  The comment is a leftover.
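
(Put together, the NAT side might look like this sketch.  IPv4
forwarding has to be enabled on the host so that packets arriving on
v_n can leave via the outgoing device.  The -s match is an addition
of this sketch that limits masquerading to the VM network; the
function and helper names are made up.)

```shell
#!/bin/sh
# Sketch only: let the 10.0.0.0/8 VM network reach the outside via NAT.

# Hypothetical helper: print commands unless DRY_RUN is empty.
: "${DRY_RUN=1}"
run() {
   if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# nat_out DEV: enable forwarding and masquerade VM traffic leaving DEV.
nat_out() {
   run sysctl -w net.ipv4.ip_forward=1
   run iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -o "$1" -j MASQUERADE
}
```

Here nat_out wlp1s0 would correspond to the line above with $what
being wlp1s0.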

 |> Yes it is cool.  The "Linux Advanced Routing & Traffic Control HOWTO" 
 |> (twenty years ago such great things were written and said "Welcome, 
 |> gentle reader", unfortunately that all stopped when the billions 
 |> came over Linux i think, but, of course, today the manual pages are 
 |> great) says
 |
 |Ya.  The Linux Documentation Project and their How-To's were (arguably 
 |still is) great.  It's not /quite/ timeless.  But much of the stuff 
 |there is still viable.  Some of it is woefully out of date though.

Unfortunately yes.

  ...
 |> Compared to proxy_arp it was 40 percent here.  However, this was 
 |> with kernel 4.19 and the network driver (R8822BE) was in staging, now 
 |> with 5.8 it is not (RTW88) but terribly broken or buggy or whatever. 
 |> It must be said this driver seems to be very complicated, the R8822BE 
 |> had a megabyte of code infrastructure iirc.  But it is terrible, 
 |> and now i know that audio via bluetooth and wlan throughput are 
 |> highly dependent.  (Or can.)
 |
 |Sounds like you've got some issues that I typically don't run into with 
 |more traditional RTL 8129 / 8139 / 8169 drivers.

Actually yes.  In fact i have booted into 4.19 again today because
RTW88 is totally unusable here, at least after the first
suspend/resume and even though they added the
rtw88_pci.disable_aspm=1 kernel command line switch to work around
power management problems.

In fact the driver messed up the hardware so much that Linux was no
longer able to access it, even booting 4.19 and using R8822BE
thus did not do it.  By sheer luck the friendly salesman gave me
a 512 GB NVME SSD for the price of a 256 GB last year, so i kept
the maximally minimized 30 GB Windows partition just for win
(imagine that: 30 GB of space .. wasted!), and booted into it,
because of despair!  Of course i had forgotten my password (i
just wanted to log into Windows to see how it looks once i bought
the laptop, my last Windows login before that was Windows 95 B),
but on the welcome screen you could select the network, and once
that dialog wanted the password of the network (!) i rebooted into
Linux 4.19 .. and was alive again.  Sheer luck, dammit, otherwise
i would have been just dead.  Terrible!

This at least seems to be avoidable by using the above command
line switch.  Today it could be accessed after reboot.

 |> I did not know that, will try it out.
 |
 |I figured that you would appreciate it.
 |
 |> I assign an address to the interface, and make that interface routable 
 |> from the host.
 |
 |The fact that the prefix is on a directly attached network should be 
 |sufficient to make it routable to the host.
 |
 |Unless you are also using the same 10.0.0.0/8 on the other 
 |{wired,wireless} network that your system is connected to.

No.

 |> This is where all the VMs plug into.
 |
 |I disagree.
 |
 |The VMs plug into the host bridge.
 |
 |I'm asking about why you have a bridge /inside/ of each of the VMs.

But i do not?

 |> That is the other side of the veth interface pair of course.
 |
 |Yes.
 |
 |But you can use the v_i interface /directly/.  I'm not seeing any /need/ 
 |for the bridge /inside/ the network namespace.

Yes i understood that.  I would need to assign addresses to the
VMs once the VM starts, whereas now i only do that inside the VM.
Hm?

 |> No.  The purpose is to be able to create a network of any number 
 |> of VMs somewhere, so those go via the bridge, no?  This network is 
 |> self-sufficient
 |> 
 |>    ANY HOST HOWEVER ONLINE  <---> VETH -|- VETH IN NAMESPACE
 |>                                         ^
 |>                      ANY NUMBER OF VMS <-> BRIDGE <
 |> 
 |> Why would you want to do this?  I do not understand.
 |
 |        +---------------------------------------+
 |        |host   +------+           +-----------+|
 |        |       |      +---v_ns1---+v_i   v_ns1||
 |        |       |      |           +-----------+|
 |        |       |      |           +-----------+|
 |(LAN)---+eth0---+ bri0 +---v_ns2---+v_i   v_ns2||
 |  |     |       |      |           +-----------+|
 |  |     |       |      |           +-----------+|
 |  |     |       |      +---v_ns3---+v_i   v_ns3||
 |  |     |       +------+           +-----------+|
 |  |     +---------------------------------------+
 |  |
 |  |     +---------------+
 |(LAN)---+eth0   notebook|
 |        +---------------+
 |
 |Each network namespace (v_ns#) has its own vEth pair.  The host side of
 |each vEth pair is connected to the bridge on the host.  The bridge on 
 |the host is connected to the host's eth0 interface.  Thus, each of the 
 |network namespaces have a layer 2 network connection to the LAN. 
 |Meaning that each of the network namespaces are proper members of the 
 |LAN.  No routing is needed.  No proxy ARP is needed.  Notebook, host, 
 |v_ns1, v_ns2, v_ns3 can all be on the same subnet without doing anything 
 |fancy.

There is only one network namespace here.  One for all VMs.
You cannot create bridge devices on wireless interfaces, unless
you have a driver which does support that, or, i guess, you create
your own host access point, i dimly recall this could be
a solution too.

 |> Yeah, this is a leftover from the proxy_arp based pseudo-bridge. 
 |> Not to forget it, maybe.  I should have removed it before posting. 
 |> I am not a network expert ok, especially not so Linux-specific.
 |
 |*nod*
 |
 |I was just trying to confirm that's historic.  Seeing as how it's 
 |commented out.
 |
 |Trying to deduce what, and more so why, can be non-trivial at times.
 |
 |> This is just the startup of a VM, it registers at the bridge.
 |
 |Yep.  Adding the host end of the vEth pair to the host bridge.
 |
 |> I am _not_ using proxy_arp no more.  This is a leftover, at the 
 |> beginning i made it configurable and could switch in between the 
 |> different approaches via a setting in /x/vm/.run.sh.  I should have 
 |> removed it before posting.
 |
 |It's cool.  Methods, scripts there of, evolve.
 |
 |> That one is isolated, i can reach it from the host, and they can talk 
 |> to the host, each other and the internet.  I never placed servers 
 |> addressable from the outside in such a thing, this is only a laptop 
 |> without fixed address.  I guess in order to allow a public accessible 
 |> server to live in the namespace i would maybe need an ipfilter rule 
 |> on the host.
 |
 |Or bridging, as depicted above.  ;-)
 |
 |> Yes --bind mounting is cool also.  But i definitely do not want to
 |> give the VM an entire /dev, it only needs u?random and that _only_
 |> because libgcrypt (linked into qemu) needs it, even though it is not
 |> actually used (the Linux getrandom system call is used instead).
 |
 |You can copy the /dev/urandom device, or make a new one, or bind mount 
 |the device inside the network namespace.  ;-)
 |
 |Even if it's not used for anything other than to make the kernel happy, 
 |you are going to need it.

No no, only libgcrypt...

 |> Yes.  But no, i do not really need it, i use it only for qemu 
 |> instances.  Interesting would be imposing hard CPU and memory 
 |> restrictions, especially if i would use this approach for creating 
 |> servers, too.  This is a future task, however.
 |
 |Now you're hedging on cgroups.

Yes.  That i have to do some time.
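
(What such hard limits could look like, as a sketch only.  It
assumes a cgroup v2 hierarchy mounted at /sys/fs/cgroup with the
memory and cpu controllers enabled for the parent group; the function
name, the group name and the CG_ROOT override are made up for
illustration.)

```shell
#!/bin/sh
# Sketch only: put a process under a memory and a CPU limit (cgroup v2).

# limit_vm GROUP MIB QUOTA_US PID
#   GROUP     name of the cgroup to create below the cgroup root
#   MIB       memory limit in MiB, written to memory.max in bytes
#   QUOTA_US  CPU time in microseconds allowed per 100000 us period
#   PID       process to move into the group
limit_vm() {
   cg=${CG_ROOT:-/sys/fs/cgroup}/$1
   mkdir -p "$cg" || return
   echo "$(( $2 * 1024 * 1024 ))" > "$cg/memory.max" || return
   echo "$3 100000" > "$cg/cpu.max" || return
   echo "$4" > "$cg/cgroup.procs"
}
```

For instance limit_vm vm_test 2048 50000 "$qemu_pid", run as root,
would cap the process at 2 GiB and half of one CPU.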

 |> Yeah like i said, i could impose more restrictions on programs running 
 |> inside that namespace, qemu also shows some security flaws at times, 
 |> but this is really just for testing purposes etc.
 |
 |IMHO /everything/ has security flaws at one point or another.  It's just 
 |a matter of when.

Unfortunately true.  The complexity of cgroups and the Linux
kernel as such is however very, very much higher than that of a
FreeBSD jail.  At least when jails appeared it often was nothing
more than a "if(process->jailed)" at the beginning of some kernel
functions.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
