[COFF] A little networking tool to reduce having to run emulators with privilege

Computer Old Farts Forum

* [COFF] A little networking tool to reduce having to run emulators with privilege
@ 2020-09-20 22:28 athornton
  2020-09-21 21:38 ` steffen
  0 siblings, 1 reply; 9+ messages in thread
From: athornton @ 2020-09-20 22:28 UTC (permalink / raw)



I finally got around to tidying up a little shell tool I wrote that turns a network interface you specify into a bridge, and then creates some tap devices with owning user and group you specify and attaches them to that bridge.

This gets around having to run emulated older systems under sudo if you want networking to work.
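A minimal sketch of the idea, using only standard iproute2 commands. This is not the brnet code itself; the function name `bridge_with_taps`, the `run` wrapper and the device names are assumptions made for illustration. By default it only prints the commands, and moving the interface's IP configuration over to the bridge is left out.

```shell
#!/bin/sh
# Hypothetical sketch, not the brnet implementation.
# Prints the iproute2 commands unless RUN=1 is set (which needs root).
run() {
   if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# bridge_with_taps IFACE BRIDGE USER GROUP COUNT
bridge_with_taps() {
   iface=$1 bridge=$2 user=$3 group=$4 count=$5

   # Create the bridge and enslave the physical interface to it.
   run ip link add "$bridge" type bridge
   run ip link set "$iface" master "$bridge"
   run ip link set "$bridge" up

   i=0
   while [ "$i" -lt "$count" ]; do
      # The tap device is owned by an unprivileged user and group,
      # so the emulator can open it without sudo.
      run ip tuntap add dev "tap$i" mode tap user "$user" group "$group"
      run ip link set "tap$i" master "$bridge"
      run ip link set "tap$i" up
      i=$((i + 1))
   done
}
```

Run as root with something like `RUN=1 bridge_with_taps eth0 br0 adam users 2`; without `RUN=1` it is a dry run.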

It’s mostly intended for the PiDP-11/simh, but it also works fine with klh10 and TOPS-20.

Maybe it will be useful to someone else.

https://github.com/athornton/brnet

Adam



* [COFF] A little networking tool to reduce having to run emulators with privilege
  2020-09-20 22:28 [COFF] A little networking tool to reduce having to run emulators with privilege athornton
@ 2020-09-21 21:38 ` steffen
  2020-09-22  0:19   ` gtaylor
  0 siblings, 1 reply; 9+ messages in thread
From: steffen @ 2020-09-21 21:38 UTC (permalink / raw)



Adam Thornton wrote in
 <23BB3E13-7306-4BB6-9566-DF4C61DE9799 at gmail.com>:
 |I finally got around to tidying up a little shell tool I wrote that \
 |turns a network interface you specify into a bridge, and then creates \
 |some tap devices with owning user and group you specify and attaches \
 |them to that bridge.
 |
 |This gets around having to run emulated older systems under sudo if \
 |you want networking to work.
 |
 |It’s mostly intended for the PiDP-11/simh, but it also works fine with \
 |klh10 and TOPS-20.
 |
 |Maybe it will be useful to someone else.
 |
 |https://github.com/athornton/brnet

Bridges usually do not work with wireless interfaces, they need some
v?eth.  And those br* tools are not everywhere, too (grr).
Have you ever considered network namespaces?

After over a year of using proxy_arp based pseudo bridging (cool!)
i finally wrapped my head around veth, and with it and Linux
network namespaces i lose 40 percent ping response speed, but
have a drastically reduced need for configuration.

What i have is this, maybe you find it useful.  It does not need
any firewall rules.  (Except allowing 10.0.0.0/8.)

In my net-qos.sh (which is my shared-everywhere firewall and tc
script)

  vm_ns_start() {
        #net.ipv4.conf.all.arp_ignore=0
     sysctl -w \
        net.ipv4.ip_forward=1

     ${ip} link add v_n type veth peer name v_i
     ${ip} netns add v_ns
     ${ip} link set v_i netns v_ns

     ${ip} a add 10.0.0.1/8 dev v_n
     ${ip} link set v_n up
     ${ip} route add 10.0.0.1 dev v_n

     ${ip} netns exec v_ns ${ip} link set lo up
     #if [ -z "$BR" ]; then
     #   ${ip} netns exec v_ns ip addr add 10.1.0.1/8 dev v_i broadcast +
     #   ${ip} netns exec v_ns ip link set v_i up
     #   ${ip} netns exec v_ns ip route add default via 10.0.0.1
     #else
        ${ip} netns exec v_ns ${ip} link set v_i up
        ${ip} netns exec v_ns ${ip} link add v_br type bridge
        ${ip} netns exec v_ns ${ip} addr add 10.1.0.1/8 dev v_br broadcast +
        ${ip} netns exec v_ns ${ip} link set v_br up
        ${ip} netns exec v_ns ${ip} link set v_i master v_br
        ${ip} netns exec v_ns ${ip} route add default via 10.0.0.1
     #fi
  }

  vm_ns_stop() {
     ${ip} netns del v_ns
     # ^ That easy it is!

        #net.ipv4.conf.all.arp_ignore=1
     sysctl -w \
        net.ipv4.ip_forward=0
  }

And then, in my /x/vm directory the qemu .ifup.sh script

  #!/bin/sh -

  if [ "$VMNETMODE" = bridge ]; then
     ip link set dev $1 master v_br
     ip link set $1 up
  elif [ "$VMNETMODE" = proxy_arp ]; then
     echo 1 > /proc/sys/net/ipv4/conf/$1/proxy_arp
     ip link set $1 up
     ip route add $VMADDR dev $1
  else
     echo >&2 Unknown VMNETMODE=$VMNETMODE
  fi

Of course qemu creates the actual device for me here.
The .ifdown.sh script i omit, it is not used in this "vbridge"
mode.  It would do nothing really, and it cannot be called because
i now can chroot into /x/vm (needs dev/u?random due to libcrypt
needing it though it would not need them, but i cannot help it).

This then gets driven by a .run.sh script (which is called by the
real per-VM scripts, like

  #!/bin/sh -
  # root.alp-2020, steffen: Sway

  debug=
  vmsys=x86_64
  vmname=alp-2020
  vmimg=.alp-2020-amd64.vmdk
  vmpower=half
  vmmac=52:54:45:01:00:12
  vmcustom= #'-boot menu=on -cdrom /x/iso/alpine-virt-3.12.0-x86_64.iso'

  . /x/vm/.run.sh
  # s-sh-mode

so, and finally invokes qemu like so

  echo 'Monitor at '$0' monitor'
  eval exec $sudo /bin/ip netns exec v_ns /usr/bin/qemu-system-$vmsys \
     -name $VMNAME $runas $chroot \
     $host $accel $vmdisp $net $usb $vmrng $vmcustom \
     -monitor telnet:127.0.0.1:$monport,server,nowait \
     -drive file=$vmimg,index=0,if=ide$drivecache \
     $redir

Users in the vm group may use that sudo, qemu is executed in the
v_ns network namespace under runas='-runas vm' and jailed via
chroot='-chroot .'.  It surely could be more sophisticated, more
cgroups, whatever.  Good enough for me.
That .run.sh does enter

   if [ "$1" = monitor ]; then
      echo 'Entering monitor of '$VMNAME' ('$VMADDR') at '$monport
      eval exec $sudo /bin/ip netns exec v_ns telnet localhost $monport
      exit 5

and enters via ssh

   elif [ "$1" = ssh ]; then
      echo 'SSH into '$VMNAME' ('$VMADDR')'
      doex=exec
      if command -v tmux >/dev/null 2>&1 && [ -n "$TMUX_PANE" ]; then
         tmux set window-active-style bg=colour231,fg=colour0
         doex=
      fi
      ( eval $doex ssh $VMADDR )
      exec tmux set window-active-style bg=default,fg=default
      exit 5

for me.  (I use VMs in Donald Knuth emacs colour scheme it seems,
at least more or less.  VMs here, VM there.  Hm.)

Overall this network namespace thing is pretty cool.  Especially
since, compared to FreeBSD jails, for example, you simply can run
a single command.  Unfair comparison though.  What i'd really
wish would be a system which is totally embedded in that
namespace/jail idea.  I.e., _one_ /, and then only moving targets
mounted via overlayfs into "per-jail" directories.  Never found
time nor motivation to truly try this out.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



* [COFF] A little networking tool to reduce having to run emulators with privilege
  2020-09-21 21:38 ` steffen
@ 2020-09-22  0:19   ` gtaylor
  2020-09-22 14:54     ` steffen
  0 siblings, 1 reply; 9+ messages in thread
From: gtaylor @ 2020-09-22  0:19 UTC (permalink / raw)

On 9/21/20 3:38 PM, Steffen Nurpmeso wrote:
> Bridges usually do not work with wireless interfaces, they need some
> v?eth.

Is it the bridge that's the problem or is it the wireless interface 
that's the problem?

My understanding is that some (many?) wireless interfaces are funky 
regarding multiple MAC addresses.  Mostly in that they don't work with 
extra MAC addresses.

What would you do with veth interfaces in this context?

> And those br* tools are not everywhere, too (grr).

I've been able to use ip to do much of what I used to do with brctl.

    ip link add bri0 type bridge
    ip link set eth1 master bri0

(from memory)
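For reference, a rough mapping of the common brctl subcommands onto iproute2. The helper name `brctl_to_ip` is made up for this sketch; it only prints the equivalent command and does not touch any interface.

```shell
#!/bin/sh
# Hypothetical helper: print the iproute2 equivalent of a brctl call.
# Usage: brctl_to_ip SUBCOMMAND [BRIDGE [INTERFACE]]
brctl_to_ip() {
   case $1 in
   addbr) echo "ip link add $2 type bridge" ;;
   delbr) echo "ip link delete $2" ;;
   addif) echo "ip link set $3 master $2" ;;
   delif) echo "ip link set $3 nomaster" ;;
   show)  echo "bridge link show" ;;
   *)     echo "no mapping known for: $1" >&2; return 1 ;;
   esac
}
```

For example, `brctl_to_ip addif bri0 eth1` prints the second command quoted above.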

> Have you ever considered network namespaces?

What would network namespaces provide in this context?

Would you run the $EMULATOR in the network namespace (~container)?

What does that get you that running the $EMULATOR in the main / root / 
unnamed network namespace does not get you?  --  If anything, I'd think 
that running the $EMULATOR in the network namespace would add additional 
networking complexity.

Don't get me wrong, I'm all for network namespaces and doing fun(ky) 
things with them.  I've emulated entire corporate networks with network 
namespaces.  I've currently got nine of them on the system I'm replying 
from, with dynamic routing.

> After over a year of using proxy_arp based pseudo bridging (cool!)

I'll argue that Proxy ARP /is/ a form of routing.  ;-)  Your system 
replies to ARP requests for the IP(s) behind it and the packets are sent 
to it as if it's a router.  }:-)

> i finally wrapped my head around veth,

I find veth to be quite helpful.  {MAC,IP}{VLAN,VTAP} are also fun.

Aside:  {MAC,IP}{VLAN,VTAP} is slightly more difficult to get working. 
In my experience, {MAC,IP}{VLAN,VTAP} don't support talking to the host 
directly, instead you need to create an additional {MAC,IP}{VLAN,VTAP} 
and have the host use it as if it is its own guest.

> with it and Linux network namespaces i lose 40 percent ping response
> speed, but have a drastically reduced need for configuration.

I've never noticed any sort of increased latency worth mentioning.

> What i have is this, maybe you find it useful.  It does not need
> any firewall rules.  (Except allowing 10.0.0.0/8.)
> 
> In my net-qos.sh (which is my shared-everywhere firewall and tc
> script)
> 
>    vm_ns_start() {
>          #net.ipv4.conf.all.arp_ignore=0
>       sysctl -w \
>          net.ipv4.ip_forward=1
> 
>       ${ip} link add v_n type veth peer name v_i
>       ${ip} netns add v_ns
>       ${ip} link set v_i netns v_ns

If you create the netns first, then you can have the veth interface 
created and moved into the network namespace in one command.

    ${ip} link add v_n type veth peer name v_i netns v_ns

Note:  This does create the v_i veth interface in the network namespace 
that you're running the command in and then automatically move it for you.

>       ${ip} a add 10.0.0.1/8 dev v_n
>       ${ip} link set v_n up
>       ${ip} route add 10.0.0.1 dev v_n

Why are you adding a (host) route to 10.0.0.1 when it's part of 
10.0.0.0/8 which is going out the same interface?
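The reasoning can be checked with a little arithmetic: assigning 10.0.0.1/8 to v_n normally makes the kernel install a connected route for 10.0.0.0/8, and that prefix already covers both 10.0.0.1 and the 10.1.0.1 used inside the namespace. A small sketch; the function names `ip_to_int` and `in_cidr` are made up here, and it handles plain dotted-quad IPv4 only.

```shell
#!/bin/sh
# Convert a dotted quad to a 32-bit integer (runs in a subshell so
# the IFS change does not leak into the caller).
ip_to_int() (
   IFS=.
   set -- $1
   echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# in_cidr ADDR NET/PREFIX: succeed if ADDR lies inside the prefix.
in_cidr() {
   addr=$(ip_to_int "$1")
   net=$(ip_to_int "${2%/*}")
   bits=${2#*/}
   # Build the netmask from the prefix length, clipped to 32 bits.
   mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
   [ $(( addr & mask )) -eq $(( net & mask )) ]
}
```

`in_cidr 10.0.0.1 10.0.0.0/8` and `in_cidr 10.1.0.1 10.0.0.0/8` both succeed, so a separate host route adds nothing unless some other, more specific route would otherwise win.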

>       ${ip} netns exec v_ns ${ip} link set lo up
>       #if [ -z "$BR" ]; then
>       #   ${ip} netns exec v_ns ip addr add 10.1.0.1/8 dev v_i broadcast +
>       #   ${ip} netns exec v_ns ip link set v_i up
>       #   ${ip} netns exec v_ns ip route add default via 10.0.0.1
>       #else
>          ${ip} netns exec v_ns ${ip} link set v_i up
>          ${ip} netns exec v_ns ${ip} link add v_br type bridge

Why are you adding a bridge inside the v_ns network namespace?

>          ${ip} netns exec v_ns ${ip} addr add 10.1.0.1/8 dev v_br broadcast +
>          ${ip} netns exec v_ns ${ip} link set v_br up
>          ${ip} netns exec v_ns ${ip} link set v_i master v_br

Why are you adding the v_i interface to the bridge /inside/ the network 
namespace?

>          ${ip} netns exec v_ns ${ip} route add default via 10.0.0.1
>       #fi
>    }

What does creating a bridge with a single interface /inside/ of the 
network namespace get you?

I would have assumed that you were creating the bridge outside the 
network namespace and adding the network namespace's outside veth to 
said bridge.

>    vm_ns_stop() {
>       ${ip} netns del v_ns
> 
> ^ That easy it is!

Yep.  I've done a LOT of things like that.  Though I have the bridge 
outside.

>          #net.ipv4.conf.all.arp_ignore=1

What was (historically, since it's commented out) the purpose for 
setting arp_ignore to 1?

>       sysctl -w \
>          net.ipv4.ip_forward=0
>    }
> 
> And then, in my /x/vm directory the qemu .ifup.sh script
> 
>    #!/bin/sh -
> 
>    if [ "$VMNETMODE" = bridge ]; then
>       ip link set dev $1 master v_br

This is more what I would expect.

>       ip link set $1 up
>    elif [ "$VMNETMODE" = proxy_arp ]; then
>       echo 1 > /proc/sys/net/ipv4/conf/$1/proxy_arp
>       ip link set $1 up
>       ip route add $VMADDR dev $1

I guess the route is because you're using Proxy ARP.

That makes me ask, is the 10.0.0.0/8 network also used on the outside 
home LAN?

>    else
>       echo >&2 Unknown VMNETMODE=$VMNETMODE
>    fi

;-)

> Of course qemu creates the actual device for me here.
> The .ifdown.sh script i omit, it is not used in this "vbridge"
> mode.  It would do nothing really, and it cannot be called because
> i now can chroot into /x/vm (needs dev/u?random due to libcrypt
> needing it though it would not need them, but i cannot help it).

You can bind mount /dev into the chroot.  That way you could chroot in. 
Much like Gentoo does during installation.

> This then gets driven by a .run.sh script (which is called by the
> real per-VM scripts, like
> 
>    #!/bin/sh -
>    # root.alp-2020, steffen: Sway
> 
>    debug=
>    vmsys=x86_64
>    vmname=alp-2020
>    vmimg=.alp-2020-amd64.vmdk
>    vmpower=half
>    vmmac=52:54:45:01:00:12
>    vmcustom= #'-boot menu=on -cdrom /x/iso/alpine-virt-3.12.0-x86_64.iso'
> 
>    . /x/vm/.run.sh
>    # s-sh-mode
> 
> so, and finally invokes qemu like so
> 
>    echo 'Monitor at '$0' monitor'
>    eval exec $sudo /bin/ip netns exec v_ns /usr/bin/qemu-system-$vmsys \
>       -name $VMNAME $runas $chroot \
>       $host $accel $vmdisp $net $usb $vmrng $vmcustom \
>       -monitor telnet:127.0.0.1:$monport,server,nowait \
>       -drive file=$vmimg,index=0,if=ide$drivecache \
>       $redir
> 
> Users in the vm group may use that sudo, qemu is executed in the
> v_ns network namespace under runas='-runas vm' and jailed via
> chroot='-chroot .'.  It surely could be more sophisticated, more
> cgroups, whatever.  Good enough for me.

:-)

Have you spent any time looking at unshare and / or nsenter?

    # Spawn the lab# NetNSs and set its hostname.
    unshare --mount=/run/mountns/${1} --net=/run/netns/${1} \
       --uts=/run/utsns/${1} /bin/hostname ${1}
    # Bring up the loopback interface.
    nsenter --mount=/run/mountns/${1} --net=/run/netns/${1} \
       --uts=/run/utsns/${1} /bin/ip link set dev lo up

I use the mount, net, and uts namespaces.  RTFM for more details on 
different combinations of namespaces.

> That .run.sh does enter
> 
>     if [ "$1" = monitor ]; then
>        echo 'Entering monitor of '$VMNAME' ('$VMADDR') at '$monport
>        eval exec $sudo /bin/ip netns exec v_ns telnet localhost $monport
>        exit 5
> 
> and enters via ssh
> 
>     elif [ "$1" = ssh ]; then
>        echo 'SSH into '$VMNAME' ('$VMADDR')'
>        doex=exec
>        if command -v tmux >/dev/null 2>&1 && [ -n "$TMUX_PANE" ]; then
>           tmux set window-active-style bg=colour231,fg=colour0
>           doex=
>        fi
>        ( eval $doex ssh $VMADDR )
>        exec tmux set window-active-style bg=default,fg=default
>        exit 5
> 
> for me.  (I use VMs in Donald Knuth emacs colour scheme it seems,
> at least more or less.  VMs here, VM there.  Hm.)

... VM everywhere.

But ... are they /really/ VMs?

You're running an /emulator/* in a (home grown) /container/.

}:-)

*Okay.  QEMU can be more of a VM than an emulator, depending on command 
line options.

> Overall this network namespace thing is pretty cool.  Especially
> since, compared to FreeBSD jails, for example, you simply can run
> a single command.  Unfair comparison though.  What i'd really
> wish would be a system which is totally embedded in that
> namespace/jail idea.  I.e., _one_ /, and then only moving targets
> mounted via overlayfs into "per-jail" directories.  Never found
> time nor motivation to truly try this out.

I'm not that familiar with jails.  But I'm convinced that there are some 
possibilities with namespaces (containers) that may come close.

Network namespaces (ip netns ...) don't alter the mount namespace. 
That's one of the advantages of unshare / nsenter.  You can create new 
namespace (container) specific mount configurations.

-- 
Grant. . . .
unix || die


* [COFF] A little networking tool to reduce having to run emulators with privilege
  2020-09-22  0:19   ` gtaylor
@ 2020-09-22 14:54     ` steffen
  2020-09-22 18:15       ` gtaylor
  0 siblings, 1 reply; 9+ messages in thread
From: steffen @ 2020-09-22 14:54 UTC (permalink / raw)


Grant Taylor wrote in
 <1dbc110c-8844-040d-a08d-07914094b47f at spamtrap.tnetconsulting.net>:
 |On 9/21/20 3:38 PM, Steffen Nurpmeso wrote:
 |> Bridges usually do not work with wireless interfaces, they need some
 |> v?eth.
 |
 |Is it the bridge that's the problem or is it the wireless interface 
 |that's the problem?
 |
 |My understanding is that some (many?) wireless interfaces are funky 
 |regarding multiple MAC addresses.  Mostly in that they don't work with 
 |extra MAC addresses.

My understanding also is about MAC changing, but it seems there
are drivers which can do something about it.  Not my area, sorry.

 |What would you do with veth interfaces in this context?

This is exactly the nice thing about the approach i use, the one
side of the veth pair that is not in the namespace simply plugs
into whatever network environment there is on the host, as long as
the host has a route to it.

The other end, jailed in the namespace, can regularly be used by
a bridge device.  And is constant and self-sufficient.  I use
fixed addresses, for example.  (Parsed from a hosts.txt that is
also read in by dnsmasq(8) so that the host and the VMs find each
other.  All i have to adjust is the hosts.txt.  Having said that,
i can improve this by deriving the MAC address from that file,
too.)
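A sketch of how deriving the MAC address from such a file could look. Everything here is an assumption for illustration: the `mac_for_host` name, a hosts.txt layout of "ADDRESS NAME" per line, and the scheme of appending the last three octets of the address in hex to a fixed 52:54:45 prefix.

```shell
#!/bin/sh
# mac_for_host NAME [HOSTSFILE]
# Look up NAME in HOSTSFILE (assumed format: "ADDRESS NAME") and print
# a MAC address built from the last three octets of its IPv4 address.
mac_for_host() {
   name=$1 file=${2:-hosts.txt}
   addr=$(awk -v n="$name" '$2 == n { print $1; exit }' "$file")
   [ -n "$addr" ] || return 1
   # Split the address on dots in a subshell; octets 2..4 become hex.
   ( IFS=.; set -- $addr; printf '52:54:45:%02x:%02x:%02x\n' "$2" "$3" "$4" )
}
```

With a hypothetical entry `10.1.0.18 alp-2020`, `mac_for_host alp-2020` prints `52:54:45:01:00:12`, so only hosts.txt would have to be edited when adding a VM.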

 |> And those br* tools are not everywhere, too (grr).
 |
 |I've been able to use ip to do much of what I used to do with brctl.
 |
 |    ip link add bri0 type bridge
 |    ip link set eth1 master bri0
 |
 |(from memory)

Yes.  It is just that when you search the internet for Linux and
bridges you will find mostly brctl or systemd things.  (Generally
spoken the amount of let me say whisked crap is near hundred
percent in fact.)

 |> Have you ever considered network namespaces?
 |
 |What would network namespaces provide in this context?
 |
 |Would you run the $EMULATOR in the network namespace (~container)?

Yes.

 |What does that get you that running the $EMULATOR in the main / root / 
 |unnamed network namespace does not get you?  --  If anything, I'd think 
 |that running the $EMULATOR in the network namespace would add additional 
 |networking complexity.

You have seen all the configuration there is.
It is isolated, it is not affected by the firewall rules of the
host, the firewall rules of the host do not take care of this
thing at all, attack surface is thus only kernel bugs, i think,
and anything on the inside can be hardwired.  No worries.

 |Don't get me wrong, I'm all for network namespaces and doing fun(ky) 
 |things with them.  I've emulated entire corporate networks with network 
 |namespaces.  I've currently got nine of them on the system I'm replying 
 |from, with dynamic routing.

No worries.

 |> After over a year of using proxy_arp based pseudo bridging (cool!)
 |
 |I'll argue that Proxy ARP /is/ a form of routing.  ;-)  Your system 
 |replies to ARP requests for the IP(s) behind it and the packets are sent 
 |to it as if it's a router.  }:-)

Yes it is cool.  The "Linux Advanced Routing & Traffic Control
HOWTO" (twenty years ago such great things were written and said
"Welcome, gentle reader", unfortunately that all stopped when the
billions came over Linux i think, but, of course, today the manual
pages are great) says

 /proc/sys/net/ipv4/conf/DEV/proxy_arp
        If you set this to 1, this interface will respond to ARP
        requests for addresses the kernel has routes to. Can be
        very useful when building 'ip pseudo bridges'. Do take
        care that your netmasks are very correct before enabling
        this! Also be aware that the rp_filter, mentioned
        elsewhere, also operates on ARP queries.

 |> i finally wrapped my head around veth,
 |
 |I find veth to be quite helpful.  {MAC,IP}{VLAN,VTAP} are also fun.
 |
 |Aside:  {MAC,IP}{VLAN,VTAP} is slightly more difficult to get working. 
 |In my experience, {MAC,IP}{VLAN,VTAP} don't support talking to the host 
 |directly, instead you need to create an additional {MAC,IP}{VLAN,VTAP} 
 |and have the host use it as if it is its own guest.
 |
 |> with it and Linux network namespaces i lose 40 percent ping response
 |> speed, but have a drastically reduced need for configuration.
 |
 |I've never noticed any sort of increased latency worth mentioning.

Compared to proxy_arp it was 40 percent here.
However, this was with kernel 4.19 and the network driver
(R8822BE) was in staging, now with 5.8 it is not (RTW88) but
terribly broken or buggy or whatever.
It must be said this driver seems to be very complicated, the
R8822BE had a megabyte of code infrastructure iirc.
But it is terrible, and now i know that audio via bluetooth and
wlan throughput are highly dependent.  (Or can be.)

 |> What i have is this, maybe you find it useful.  It does not need
 |> any firewall rules.  (Except allowing 10.0.0.0/8.)
 |> 
 |> In my net-qos.sh (which is my shared-everywhere firewall and tc
 |> script)
 |> 
 |>    vm_ns_start() {
 |>          #net.ipv4.conf.all.arp_ignore=0
 |>       sysctl -w \
 |>          net.ipv4.ip_forward=1
 |> 
 |>       ${ip} link add v_n type veth peer name v_i
 |>       ${ip} netns add v_ns
 |>       ${ip} link set v_i netns v_ns
 |
 |If you create the netns first, then you can have the veth interface 
 |created and moved into the network namespace in one command.
 |
 |    ${ip} link add v_n type veth peer name v_i netns v_ns
 |
 |Note:  This does create the v_i veth interface in the network namespace 
 |that you're running the command in and then automatically move it for you.

I did not know that, will try it out.

 |>       ${ip} a add 10.0.0.1/8 dev v_n
 |>       ${ip} link set v_n up
 |>       ${ip} route add 10.0.0.1 dev v_n
 |
 |Why are you adding a (host) route to 10.0.0.1 when it's part of 
 |10.0.0.0/8 which is going out the same interface?

I assign an address to the interface, and make that interface
routable from the host.

 |>       ${ip} netns exec v_ns ${ip} link set lo up
 |>       #if [ -z "$BR" ]; then
 |>       #   ${ip} netns exec v_ns ip addr add 10.1.0.1/8 dev v_i broadcast \
 |>       +
 |>       #   ${ip} netns exec v_ns ip link set v_i up
 |>       #   ${ip} netns exec v_ns ip route add default via 10.0.0.1
 |>       #else
 |>          ${ip} netns exec v_ns ${ip} link set v_i up
 |>          ${ip} netns exec v_ns ${ip} link add v_br type bridge
 |
 |Why are you adding a bridge inside the v_ns network namespace?

This is where all the VMs plug into.

 |>          ${ip} netns exec v_ns ${ip} addr add 10.1.0.1/8 dev v_br \
 |>          broadcast +
 |>          ${ip} netns exec v_ns ${ip} link set v_br up
 |>          ${ip} netns exec v_ns ${ip} link set v_i master v_br
 |
 |Why are you adding the v_i interface to the bridge /inside/ the network 
 |namespace?

That is the other side of the veth interface pair of course.

 |>          ${ip} netns exec v_ns ${ip} route add default via 10.0.0.1
 |>       #fi
 |>}
 |
 |What does creating a bridge with a single interface /inside/ of the 
 |network namespace get you?
 |
 |I would have assumed that you were creating the bridge outside the 
 |network namespace and adding the network namespace's outside veth to 
 |said bridge.

No.  The purpose is to be able to create a network of any number
of VMs somewhere, so those go via the bridge, no?
This network is self-sufficient

  ANY HOST HOWEVER ONLINE  <---> VETH -|- VETH IN NAMESPACE
                                                 ^
                                                 |
                    ANY NUMBER OF VMS <-> BRIDGE <

 |>    vm_ns_stop() {
 |>       ${ip} netns del v_ns
 |> 
 |> ^ That easy it is!
 |
 |Yep.  I've done a LOT of things like that.  Though I have the bridge 
 |outside.

Why would you want to do this?  I do not understand.

 |>          #net.ipv4.conf.all.arp_ignore=1
 |
 |What was (historically, since it's commented out) the purpose for 
 |setting arp_ignore to 1.

Yeah, this is a leftover from the proxy_arp based pseudo-bridge.
Not to forget it, maybe.  I should have removed it before posting.
I am not a network expert ok, especially not so Linux-specific.

 |>       sysctl -w \
 |>          net.ipv4.ip_forward=0
 |>}
 |> 
 |> And then, in my /x/vm directory the qemu .ifup.sh script
 |> 
 |>    #!/bin/sh -
 |> 
 |>    if [ "$VMNETMODE" = bridge ]; then
 |>       ip link set dev $1 master v_br
 |
 |This is more what I would expect.

This is just the startup of a VM, it registers at the bridge.

 |>       ip link set $1 up
 |>    elif [ "$VMNETMODE" = proxy_arp ]; then
 |>       echo 1 > /proc/sys/net/ipv4/conf/$1/proxy_arp
 |>       ip link set $1 up
 |>       ip route add $VMADDR dev $1
 |
 |I guess the route is because you're using Proxy ARP.

I am _not_ using proxy_arp any more.  This is a leftover, at the
beginning i made it configurable and could switch in between the
different approaches via a setting in /x/vm/.run.sh.  I should
have removed it before posting.

 |That makes me ask, is the 10.0.0.0/8 network also used on the outside 
 |home LAN?

That one is isolated, i can reach it from the host, and they can
talk to the host, each other and the internet.  I never placed
servers addressable from the outside in such a thing, this is only
a laptop without fixed address.  I guess in order to allow
a public accessible server to live in the namespace i would maybe
need an ipfilter rule on the host.
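Such a rule could be a destination NAT entry on the host, sketched here with iptables. The `forward_port` and `run` names, the interface and the addresses are hypothetical; the FORWARD chain would also have to let the traffic through, and the function only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Prints the command unless RUN=1 is set (which needs root).
run() {
   if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# forward_port EXT_IF EXT_PORT VM_ADDR VM_PORT
# Redirect TCP connections arriving on EXT_IF:EXT_PORT to a VM
# that lives behind the veth pair.
forward_port() {
   run iptables -t nat -A PREROUTING -i "$1" -p tcp --dport "$2" \
      -j DNAT --to-destination "$3:$4"
}
```

For instance `forward_port wlan0 8080 10.1.0.18 80` would expose a web server in the VM on port 8080 of the host; it relies on the ip_forward=1 that vm_ns_start already sets.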

 |>    else
 |>       echo >&2 Unknown VMNETMODE=$VMNETMODE
 |>    fi
 |
 |;-)
 |
 |> Of course qemu creates the actual device for me here.
 |> The .ifdown.sh script i omit, it is not used in this "vbridge"
 |> mode.  It would do nothing really, and it cannot be called because
 |> i now can chroot into /x/vm (needs dev/u?random due to libcrypt
 |> needing it though it would not need them, but i cannot help it).
 |
 |You can bind mount /dev into the chroot.  That way you could chroot in. 
 |Much like Gentoo does during installation.

Yes --bind mounting is cool also.  But i definitely do not want to
give the VM an entire /dev, it only needs u?random and that _only_
because libcrypt (linked into qemu) needs it, even though it is
not actually used (the Linux getrandom system call is used
instead).
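One way to provide just those two devices is to create the nodes by hand instead of bind mounting all of /dev. A sketch only: `chroot_random_devs` and `run` are made-up names, it prints the commands unless RUN=1 is set, and it assumes the filesystem holding the chroot is not mounted with nodev.

```shell
#!/bin/sh
# Prints the commands unless RUN=1 is set (which needs root).
run() {
   if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# chroot_random_devs ROOT: create only random and urandom below ROOT/dev.
chroot_random_devs() {
   root=$1
   run mkdir -p "$root/dev"
   # Linux character devices: major 1, minor 8 is random, minor 9 is urandom.
   run mknod -m 444 "$root/dev/random" c 1 8
   run mknod -m 444 "$root/dev/urandom" c 1 9
}
```

`RUN=1 chroot_random_devs /x/vm` would then be a one-time setup step for the chroot directory.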

 |> This then gets driven by a .run.sh script (which is called by the
 |> real per-VM scripts, like
 |> 
 |>    #!/bin/sh -
 |>    # root.alp-2020, steffen: Sway
 |> 
 |>    debug=
 |>    vmsys=x86_64
 |>    vmname=alp-2020
 |>    vmimg=.alp-2020-amd64.vmdk
 |>    vmpower=half
 |>    vmmac=52:54:45:01:00:12
 |>    vmcustom= #'-boot menu=on -cdrom /x/iso/alpine-virt-3.12.0-x86_64.iso'
 |> 
 |>    . /x/vm/.run.sh
 |>    # s-sh-mode
 |> 
 |> so, and finally invokes qemu like so
 |> 
 |>    echo 'Monitor at '$0' monitor'
 |>    eval exec $sudo /bin/ip netns exec v_ns /usr/bin/qemu-system-$vmsys \
 |>       -name $VMNAME $runas $chroot \
 |>       $host $accel $vmdisp $net $usb $vmrng $vmcustom \
 |>       -monitor telnet:127.0.0.1:$monport,server,nowait \
 |>       -drive file=$vmimg,index=0,if=ide$drivecache \
 |>       $redir
 |> 
 |> Users in the vm group may use that sudo, qemu is executed in the
 |> v_ns network namespace under runas='-runas vm' and jailed via
 |> chroot='-chroot .'.  It surely could be more sophisticated, more
 |> cgroups, whatever.  Good enough for me.
 |
 |:-)
 |
 |Have you spent any time looking at unshare and / or nsenter?
 |
 |    # Spawn the lab# NetNSs and set its hostname.
 |    unshare --mount=/run/mountns/${1} --net=/run/netns/${1} \
 |       --uts=/run/utsns/${1} /bin/hostname ${1}
 |    # Bring up the loopback interface.
 |    nsenter --mount=/run/mountns/${1} --net=/run/netns/${1} \
 |       --uts=/run/utsns/${1} /bin/ip link set dev lo up
 |
 |I use the mount, net, and uts namespaces.  RTFM for more details on 
 |different combinations of namespaces.

Yes.  But no, i do not really need it, i use it only for qemu
instances.  Interesting would be imposing hard CPU and memory
restrictions, especially if i would use this approach for creating
servers, too.  This is a future task, however.
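Hard CPU and memory limits of that kind could be imposed through the cgroup v2 interface files. A sketch under assumptions: cgroup v2 is mounted at /sys/fs/cgroup, the cpu and memory controllers are enabled for the parent group, and the `vm_cgroup_limits` name is invented here.

```shell
#!/bin/sh
# vm_cgroup_limits CGDIR CPU_PERCENT MEM_BYTES
# Create the cgroup directory CGDIR and write a CPU quota (percent of
# one CPU) and a hard memory limit (in bytes) into its control files.
vm_cgroup_limits() {
   cgdir=$1 percent=$2 mem=$3
   period=100000                       # scheduling period in microseconds
   quota=$(( period * percent / 100 ))
   mkdir -p "$cgdir" || return 1
   # cpu.max takes "QUOTA PERIOD", both in microseconds.
   echo "$quota $period" > "$cgdir/cpu.max" || return 1
   echo "$mem" > "$cgdir/memory.max" || return 1
}
```

A launcher would call, say, `vm_cgroup_limits /sys/fs/cgroup/vm 50 2147483648` as root and then write its own PID into `/sys/fs/cgroup/vm/cgroup.procs` before exec'ing qemu, so that the emulator inherits the limits.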

 |> That .run.sh does enter
 |> 
 |>     if [ "$1" = monitor ]; then
 |>        echo 'Entering monitor of '$VMNAME' ('$VMADDR') at '$monport
 |>        eval exec $sudo /bin/ip netns exec v_ns telnet localhost $monport
 |>        exit 5
 |> 
 |> and enters via ssh
 |> 
 |>     elif [ "$1" = ssh ]; then
 |>        echo 'SSH into '$VMNAME' ('$VMADDR')'
 |>        doex=exec
 |>        if command -v tmux >/dev/null 2>&1 && [ -n "$TMUX_PANE" ]; then
 |>           tmux set window-active-style bg=colour231,fg=colour0
 |>           doex=
 |>        fi
 |>        ( eval $doex ssh $VMADDR )
 |>        exec tmux set window-active-style bg=default,fg=default
 |>        exit 5
 |> 
 |> for me.  (I use VMs in Donald Knuth emacs colour scheme it seems,
 |> at least more or less.  VMs here, VM there.  Hm.)
 |
 |... VM everywhere.
 |
 |But ... are they /really/ VMs?
 |
 |You're running an /emulator/* in a (home grown) /container/.
 |
 |}:-)
 |
 |*Okay.  QEMU can be more of a VM than an emulator, depending on command 
 |line options.
 |
 |> Overall this network namespace thing is pretty cool.  Especially
 |> since, compared to FreeBSD jails, for example, you simply can run
 |> a single command.  Unfair comparison though.  What i'd really
 |> wish would be a system which is totally embedded in that
 |> namespace/jail idea.  I.e., _one_ /, and then only moving targets
 |> mounted via overlayfs into "per-jail" directories.  Never found
 |> time nor motivation to truly try this out.
 |
 |I'm not that familiar with jails.  But I'm convinced that there are some 
 |possibilities with namespaces (containers) that may come close.
 |
 |Network namespaces (ip netns ...) don't alter the mount namespace. 
 |That's one of the advantages of unshare / nsenter.  You can create new 
 |namespace (container) specific mount configurations.

Yeah like i said, i could impose more restrictions on programs
running inside that namespace, qemu also shows some security flaws
at times, but this is really just for testing purposes etc.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



* [COFF] A little networking tool to reduce having to run emulators with privilege
  2020-09-22 14:54     ` steffen
@ 2020-09-22 18:15       ` gtaylor
  2020-09-22 21:53         ` steffen
  0 siblings, 1 reply; 9+ messages in thread
From: gtaylor @ 2020-09-22 18:15 UTC (permalink / raw)

On 9/22/20 8:54 AM, Steffen Nurpmeso wrote:
> My understanding also is about MAC changing, but it seems there are 
> drivers which can do something about it.  Not my area, sorry.

That matches my understanding.

> This is exactly the nice thing about the approach i use, the one 
> side of the veth pair that is not in the namespace simply plugs into 
> whatever network environment there is on the host, as long as the 
> host has a route to it.

Sorry, I was asking for more clarification on what you do with the host 
end of the veth to connect it to the rest of your environment.

> The other end, jailed in the namespace, can regularly be used by a 
> bridge device.

Yes.

I'm wondering why you are attaching the NetNS end of the veth to a 
bridge instead of just using the veth directly.

> And is constant and self-sufficient.  I use fixed addresses, for 
> example.  (Parsed from a hosts.txt that is also read in by dnsmasq(8) 
> so that the host and the VMs find each other.  All i have to adjust 
> is the hosts.txt.  Having said that, i can improve this by deriving 
> the MAC address from that file, too.)

Sure.

I'm about 98% certain that all of that applies equally as well to the 
veth interface inside of the network namespace as it does to the bridge 
inside of the network namespace.

So ... why use a bridge inside of the network namespace?

I completely get why you use a bridge on the host and the veth interface 
outside of the network namespace.  I just don't understand why you are 
using a bridge /inside/ the network namespace.

> Yes.  It is just that when you search the internet for Linux and 
> bridges you will find mostly brctl or systemd things.  (Generally 
> spoken the amount of let me say whisked crap is near hundred percent 
> in fact.)

The other thing that I find is how to configure bridging in distro init 
scripts.

> Yes.
> 
> You have seen all the configuration there is.  It is isolated, it is 
> not affected by the firewall rules of the host, the firewall rules of 
> the host do not take care of this thing at all, attack surface is thus 
> only kernel bugs, i think, and anything on the inside can be hardwired. 
> No worries.

Depending on how you are connecting the host side veth to the network, 
there is a very real chance that the host firewall will influence what 
goes into / comes out of the emulator in the network namespace 
(~container).  Particularly if you are routing.  Less so if you are 
bridging.  But bridging can still be affected by the firewall.

> Yes it is cool.  The "Linux Advanced Routing & Traffic Control HOWTO" 
> (twenty years ago such great things were written and said "Welcome, 
> gentle reader", unfortunately that all stopped when the billions 
> came over Linux i think, but, of course, today the manual pages are 
> great) says

Ya.  The Linux Documentation Project and their How-To's were (arguably 
still are) great.  They're not /quite/ timeless.  But much of the stuff 
there is still viable.  Some of it is woefully out of date though.

> /proc/sys/net/ipv4/conf/DEV/proxy_arp If you set this to 1, this 
> interface will respond to ARP requests for addresses the kernel has 
> routes to. Can be very useful when building 'ip pseudo bridges'. Do 
> take care that your netmasks are very correct before enabling 
> this! Also be aware that the rp_filter, mentioned elsewhere, also 
> operates on ARP queries.

*nod*

> Compared to proxy_arp it was 40 percent here.  However, this was 
> with kernel 4.19 and the network driver (R8822BE) was in staging, now 
> with 5.8 it is not (RTW88) but terribly broken or buggy or whatever. 
> It must be said this driver seems to be very complicated, the R8822BE 
> had a megabyte of code infrastructure iirc.  But it is terrible, 
> and now i know that audio via bluetooth and wlan throughput are 
> highly dependent.  (Or can.)

Sounds like you've got some issues that I typically don't run into with 
more traditional RTL 8129 / 8139 / 8169 drivers.

> I did not know that, will try it out.

I figured that you would appreciate it.

> I assign an address to the interface, and make that interface routable 
> from the host.

The fact that the prefix is on a directly attached network should be 
sufficient to make it routable to the host.

Unless you are also using the same 10.0.0.0/8 on the other 
{wired,wireless} network that your system is connected to.

> This is where all the VMs plug into.

I disagree.

The VMs plug into the host bridge.

I'm asking about why you have a bridge /inside/ of each of the VMs.

> That is the other side of the veth interface pair of course.

Yes.

But you can use the v_i interface /directly/.  I'm not seeing any /need/ 
for the bridge /inside/ the network namespace.

> No.  The purpose is to be able to create a network of any number 
> of VMs somewhere, so those go via the bridge, no?  This network is 
> self-sufficient
> 
>    ANY HOST HOWEVER ONLINE  <---> VETH -|- VETH IN NAMESPACE
>                                                   ^
>                      ANY NUMBER OF VMS <-> BRIDGE <
> 
> 
> Why would you want to do this?  I do not understand.

         +---------------------------------------+
         |host   +------+           +-----------+|
         |       |      +---v_ns1---+v_i   v_ns1||
         |       |      |           +-----------+|
         |       |      |           +-----------+|
 (LAN)---+eth0---+ bri0 +---v_ns2---+v_i   v_ns2||
   |     |       |      |           +-----------+|
   |     |       |      |           +-----------+|
   |     |       |      +---v_ns3---+v_i   v_ns3||
   |     |       +------+           +-----------+|
   |     +---------------------------------------+
   |
   |     +---------------+
 (LAN)---+eth0   notebook|
         +---------------+

Each network namespace (v_ns#) has its own vEth pair.  The host side of 
each vEth pair is connected to the bridge on the host.  The bridge on 
the host is connected to the host's eth0 interface.  Thus, each of the 
network namespaces has a layer 2 network connection to the LAN, meaning 
that each of the network namespaces is a proper member of the LAN.  No 
routing is needed.  No proxy ARP is needed.  Notebook, host, v_ns1, 
v_ns2, v_ns3 can all be on the same subnet without doing anything fancy.
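
A minimal sketch of the bridged layout in the diagram, written as a dry run
that only prints the ip(8) commands (running them for real needs root).  The
names eth0, bri0, v_nsN and v_i are taken from the diagram; the helper name
plan_bridged and the namespace count are illustrative assumptions, not
anyone's actual script.

```shell
#!/bin/sh
# Dry run: print the commands for one host bridge plus one veth pair
# per network namespace.  Nothing is executed.
plan_bridged() {
    uplink=$1 bridge=$2 count=$3
    echo "ip link add name $bridge type bridge"
    echo "ip link set dev $uplink master $bridge"
    echo "ip link set dev $bridge up"
    i=1
    while [ "$i" -le "$count" ]; do
        ns="v_ns$i"
        echo "ip netns add $ns"
        # Host end is named after the namespace; the peer end v_i is
        # created directly inside the namespace.
        echo "ip link add name $ns type veth peer name v_i netns $ns"
        echo "ip link set dev $ns master $bridge"
        echo "ip link set dev $ns up"
        echo "ip netns exec $ns ip link set dev v_i up"
        i=$((i + 1))
    done
}

plan_bridged eth0 bri0 3
```

Once eth0 is enslaved to the bridge, the host's own IP configuration would
normally have to live on bri0 rather than on eth0.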

> Yeah, this is a leftover from the proxy_arp based pseudo-bridge. 
> Not to forget it, maybe.  I should have removed it before posting. 
> I am not a network expert ok, especially not so Linux-specific.

*nod*

I was just trying to confirm that's historic.  Seeing as how it's 
commented out.

Trying to deduce what, and more so why, can be non-trivial at times.

> This is just the startup of a VM, it registers at the bridge.

Yep.  Adding the host end of the vEth pair to the host bridge.

> I am _not_ using proxy_arp any more.  This is a leftover, at the 
> beginning i made it configurable and could switch in between the 
> different approaches via a setting in /x/vm/.run.sh.  I should have 
> removed it before posting.

It's cool.  Methods, scripts thereof, evolve.

> That one is isolated, i can reach it from the host, and they can talk 
> to the host, each other and the internet.  I never placed servers 
> addressable from the outside in such a thing, this is only a laptop 
> without fixed address.  I guess in order to allow a public accessible 
> server to live in the namespace i would maybe need an ipfilter rule 
> on the host.

Or bridging, as depicted above.  ;-)

> Yes --bind mounting is cool also.  But i definitely do not want to 
> give the VM an entire /dev, it only needs u?random and that _only_ 
> because libcrypt (linked into qemu) needs it, even though it is not 
> actually used (the Linux getrandom system call is used instead).

You can copy the /dev/urandom device, or make a new one, or bind mount 
the device inside the network namespace.  ;-)

Even if it's not used for anything other than to make the kernel happy, 
you are going to need it.

> Yes.  But no, i do not really need it, i use it only for qemu 
> instances.  Interesting would be imposing hard CPU and memory 
> restrictions, especially if i would use this approach for creating 
> servers, too.  This is a future task, however.

Now you're hedging on cgroups.
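
A rough sketch of what hard CPU and memory limits could look like with
cgroup v2, again as a dry run that only prints the writes.  The group path
/sys/fs/cgroup/vm, the PID and the limits are hypothetical, and the cpu and
memory controllers are assumed to be enabled in the parent's
cgroup.subtree_control.

```shell
#!/bin/sh
# cpu_max: value for cgroup v2 cpu.max ("QUOTA PERIOD", both in
# microseconds) for a limit given as a percentage of one CPU.
cpu_max() {
    period=100000
    echo "$(( $1 * period / 100 )) $period"
}

# Dry run: print the writes that would confine process $1 to $2 percent
# of one CPU and $3 bytes of memory.  Nothing is executed.
plan_cgroup() {
    cg=/sys/fs/cgroup/vm   # hypothetical group name
    echo "mkdir -p $cg"
    echo "echo '$(cpu_max "$2")' > $cg/cpu.max"
    echo "echo '$3' > $cg/memory.max"
    echo "echo '$1' > $cg/cgroup.procs"
}

plan_cgroup 1234 50 "$((512 * 1024 * 1024))"
```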

> Yeah like i said, i could impose more restrictions on programs running 
> inside that namespace, qemu also shows some security flaws at times, 
> but this is really just for testing purposes etc.

IMHO /everything/ has security flaws at one point or another.  It's just 
a matter of when.

-- 
Grant. . . .
unix || die



* [COFF] A little networking tool to reduce having to run emulators with privilege
  2020-09-22 18:15       ` gtaylor
@ 2020-09-22 21:53         ` steffen
  2020-09-23  1:54           ` gtaylor
  0 siblings, 1 reply; 9+ messages in thread
From: steffen @ 2020-09-22 21:53 UTC (permalink / raw)

Hello.

Grant Taylor wrote in
 <0f0083d2-1fc6-87dd-3001-6fa7e3752c79 at tnetconsulting.net>:
 |On 9/22/20 8:54 AM, Steffen Nurpmeso wrote:
 |> My understanding also is about MAC changing, but it seems there are 
 |> drivers which can do something about it.  Not my area, sorry.
 |
 |That matches my understanding.

And that is fuzzy on my side, the best i have heard was on some
FreeBSD list not too long ago, where the same problem exists.
..Yes.  This is exactly the message i remembered:

  https://marc.info/?l=freebsd-current&m=155924713003353&w=2

(The relevant part is

  >>> I think there's a (unknown?) problem that makes lagg(4) incompatible with
  >>> bridge(4). I've never been able to make a lagg interface work as a member
  >>> of a bridge. Lacking the time to pursue it, I've resorted to NATing instead.
  >>>
  >>> Also, wlan interfaces tend to break if you change their MAC address.  So in
  >>> a lagg consisting of a wlan interface and an ethernet interface (without a
  >>> bridge), I always set the MAC of the ethernet to match the native MAC of the
  >>> wlan, and not vice versa.
)

Not much.

 |> This is exactly the nice thing about the approach i use, the one 
 |> side of the veth pair that is not in the namespace simply plugs into 
 |> whatever network environment there is on the host, as long as the 
 |> host has a route to it.
 |
 |Sorry, I was asking for more clarification on what you do with the host 
 |end of the veth to connect it to the rest of your environment.

Nothing.  I only set the route.

 |> The other end, jailed in the namespace, can regularly be used by a 
 |> bridge device.
 |
 |Yes.
 |
 |I'm wondering why you are attaching the NetNS end of the veth to a 
 |bridge instead of just using the veth directly.

Ha.  That could very well be because i was desperately trying to
create a bridge all the time, but it just was not working at all.
So i turned to proxy_arp "pseudo-bridging", with just having
routes and the TAP devices for the VMs, nothing more.  A wild
chaos universe thus.

You know, to me this is just a programmatic problem, i just do not
understand.  Why does it matter whether you have eth0 or wlp1s0?
I can go into the internet with both, why can i create a bridge on
one but not the other?  That does not make sense.  Just do it?  It
works when i inject a VETH pair, so may it be like this.
Having the bridge now makes things easy, i just make it the master
of anything which is plugged into it.
Yes, so easy are things if you never programmed a network hardware
driver!

 |> And is constant and self-sufficient.  I use fixed addresses, for 
 |> example.  (Parsed from a hosts.txt that is also read in by dnsmasq(8) 
 |> so that the host and the VMs find each other.  All i have to adjust 
 |> is the hosts.txt.  Having said that, i can improve this by deriving 
 |> the MAC address from that file, too.)
 |
 |Sure.
 |
 |I'm about 98% certain that all of that applies equally as well to the 
 |veth interface inside of the network namespace as it does to the bridge 
 |inside of the network namespace.
 |
 |So ... why use a bridge inside of the network namespace?
 |
 |I completely get why you use a bridge on the host and the veth interface 

I do not use a bridge on the host.  This i cannot do.

 |outside of the network namespace.  I just don't understand why you are 
 |using a bridge /inside/ the network namespace.

But it makes things so easy now.  You have seen the scripts, and
on the host i see

  #?0|kent:tmp$ ip rou
  default via 192.168.0.1 dev wlp1s0 proto dhcp src 192.168.0.153 metric 306
  10.0.0.0/8 dev v_n proto kernel scope link src 10.0.0.1
  10.0.0.1 dev v_n scope link
  192.168.0.0/24 dev wlp1s0 proto dhcp scope link src 192.168.0.153 metric 306
  #?0|kent:tmp$ ip a
  ...
  6: wlp1s0: ...
      inet 192.168.0.153/24 brd 192.168.0.255 scope global dynamic noprefixroute wlp1s0
         valid_lft 69766sec preferred_lft 58966sec
  8: v_n@if7: ...
      inet 10.0.0.1/8 scope global v_n
         valid_lft forever preferred_lft forever

and in the namespace

  #?0|kent:~# ip netns exec v_ns ip rou
  default via 10.0.0.1 dev v_br
  10.0.0.0/8 dev v_br proto kernel scope link src 10.1.0.1
  #?0|kent:~# ip netns exec v_ns ip a
  ...
  6: v_br: ..
      inet 10.1.0.1/8 brd 10.255.255.255 scope global v_br
         valid_lft forever preferred_lft forever
      inet6 fe80::dc81:96ff:fe0b:a229/64 scope link
         valid_lft forever preferred_lft forever
  7: v_i@if8: ..
      link/ether fe:7f:36:b0:04:97 brd ff:ff:ff:ff:ff:ff link-netnsid 0
      inet6 fe80::fc7f:36ff:feb0:497/64 scope link
         valid_lft forever preferred_lft forever

And i started a VM for you

  8: vm_ulinux-010204: ..
      link/ether ce:4a:41:5c:d6:61 brd ff:ff:ff:ff:ff:ff
      inet6 fe80::cc4a:41ff:fe5c:d661/64 scope link
       valid_lft forever preferred_lft forever

No address no route setup, and in the machine

  $ cat default/net
  TYPE=static
  DEV=eth0
  ADDR=10.0.1.15
  MASK=8
  GW=10.0.0.1

Except for using a DHCP server in the namespace this is as short
as it can get.  Of course if i would use that it would be even
less work to do when setting up a VM.  And even more flexible and
automatic.  But, you know, this is so overkill given that i use
this only for testing, and dhcpcd is now privilege-separated:

  #?0|kent:src$ pla|grep dhcpc
  dhcpcd     509     1 S     0.0  1748 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0
  root       510   509 S     0.0  2396 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0
  dhcpcd     511   509 S     0.0   268 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0
  dhcpcd     512   509 S     0.0   268 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0
  dhcpcd    7619   510 S     0.0   376 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0

And this twice all week long for in practice nothing?  Ah, no.
And what if i move along, i have to have dhcpcd, or configure
dnsmasq to serve it for the namespace.  And so i can use a simple
hosts.txt and have dnsmasq integrate it in its normal DNS service,
this would not be that easy if it would be dynamic, i had to look.

Anyhow, no network setup at all in the namespace, i have the
hosts.txt that i need anyway, and i configure the machine once
i install it.  Done.
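
The idea of deriving the MAC address from hosts.txt as well could be
sketched like this.  The file format is an assumption (/etc/hosts style,
"ADDRESS NAME" per line), and the 02:00 prefix is just one possible choice:
it sets the locally administered bit and keeps the address unicast.

```shell
#!/bin/sh
# mac_from_ip: map a dotted-quad IPv4 address to a locally administered
# MAC address whose last four octets are the four bytes of the address.
mac_from_ip() {
    oldifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$oldifs
    printf '02:00:%02x:%02x:%02x:%02x\n' "$1" "$2" "$3" "$4"
}

# hosts_to_macs: read "ADDRESS NAME" lines on stdin and print
# "NAME ADDRESS MAC", skipping blank lines and comments.
hosts_to_macs() {
    while read -r addr name rest; do
        case $addr in
            ''|'#'*) continue ;;
        esac
        printf '%s %s %s\n' "$name" "$addr" "$(mac_from_ip "$addr")"
    done
}

printf '%s\n' '# example entry' '10.0.1.15 ulinux' | hosts_to_macs
```

The resulting MAC could then be handed to the emulator when the VM is
started, so that hosts.txt stays the only place that needs editing.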

 |> Yes.  It is just that when you search the internet for Linux and 
 |> bridges you will find mostly brctl or systemd things.  (Generally 
 |> spoken the amount of let me say whisked crap is near hundred percent 
 |> in fact.)
 |
 |The other thing that I find is how to configure bridging in distro init 
 |scripts.

It is terrible.  Nothing HOWTO like, not "help the people to help
themselves", but everybody who understood a topic by himself is
quick in fooling others.  This makes the FreeBSD handbook and the
BSDs and their manual portfolio in general outstanding.

 |> Yes.
 |> 
 |> You have seen all the configuration there is.  It is isolated, it is 
 |> not affected by the firewall rules of the host, the firewall rules of 
 |> the host do not take care of this thing at all, attack surface is thus 
 |> only kernel bugs, i think, and anything on the inside can be hardwired. 
 |> No worries.
 |
 |Depending on how you are connecting the host side veth to the network, 
 |there is a very real chance that the host firewall will influence what 
 |goes into / comes out of the emulator in the network namespace 
 |(~container).  Particularly if you are routing.  Less so if you are 
 |bridging.  But bridging can still be affected by the firewall.

Don't confuse me please, i am .. not a network expert.  Surely you
could use firewall stuff, but i do not.  At least nothing
special, very restrictive indeed, but 10.0.0.0 is set free early,
and other...  Ah yes, i have forgotten this line:

  if [ -n "$VM_NS" ]; then
     ${iptables} -t nat -A POSTROUTING -o ${what} -j MASQUERADE
     #echo 1 > /proc/sys/net/ipv4/conf/${what}/proxy_arp
  fi

Here $what is the device, wlp1s0 for example.  True.
Of course.  The comment is a leftover.
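
Put together, the routed setup can be sketched as a dry run that prints the
commands instead of executing them.  The interface names and addresses are
taken from the listings above; the ordering, the sysctl line, the tap name
and the user are assumptions, so this is a reconstruction and not the
actual script.

```shell
#!/bin/sh
# Dry run: print the commands for one namespace that holds a bridge,
# the inner veth end and one tap device, NATed via the host uplink.
plan_routed() {
    ns=$1 uplink=$2 tap=$3 user=$4
    in_ns="ip netns exec $ns"
    echo "ip netns add $ns"
    echo "ip link add name v_n type veth peer name v_i netns $ns"
    echo "ip addr add 10.0.0.1/8 dev v_n"
    echo "ip link set dev v_n up"
    echo "$in_ns ip link add name v_br type bridge"
    echo "$in_ns ip link set dev v_i master v_br"
    echo "$in_ns ip link set dev v_i up"
    echo "$in_ns ip addr add 10.1.0.1/8 dev v_br"
    echo "$in_ns ip link set dev v_br up"
    echo "$in_ns ip route add default via 10.0.0.1"
    # One tap per VM, owned by an unprivileged user (hypothetical names).
    echo "$in_ns ip tuntap add dev $tap mode tap user $user"
    echo "$in_ns ip link set dev $tap master v_br"
    echo "$in_ns ip link set dev $tap up"
    echo "sysctl -w net.ipv4.ip_forward=1"
    echo "iptables -t nat -A POSTROUTING -o $uplink -j MASQUERADE"
}

plan_routed v_ns wlp1s0 vm_tap0 steffen
```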

 |> Yes it is cool.  The "Linux Advanced Routing & Traffic Control HOWTO" 
 |> (twenty years ago such great things were written and said "Welcome, 
 |> gentle reader", unfortunately that all stopped when the billions 
 |> came over Linux i think, but, of course, today the manual pages are 
 |> great) says
 |
 |Ya.  The Linux Documentation Project and their How-To's were (arguably 
 |still are) great.  They're not /quite/ timeless.  But much of the stuff 
 |there is still viable.  Some of it is woefully out of date though.

Unfortunately yes.

  ...
 |> Compared to proxy_arp it was 40 percent here.  However, this was 
 |> with kernel 4.19 and the network driver (R8822BE) was in staging, now 
 |> with 5.8 it is not (RTW88) but terribly broken or buggy or whatever. 
 |> It must be said this driver seems to be very complicated, the R8822BE 
 |> had a megabyte of code infrastructure iirc.  But it is terrible, 
 |> and now i know that audio via bluetooth and wlan throughput are 
 |> highly dependent.  (Or can.)
 |
 |Sounds like you've got some issues that I typically don't run into with 
 |more traditional RTL 8129 / 8139 / 8169 drivers.

Actually yes.  In fact i have booted into 4.19 again today because
RTW88 is totally unusable here, at least after the first
suspend/resume and even though they added the
rtw88_pci.disable_aspm=1 kernel command line switch to work around
power management problems.

In fact the driver messed up the hardware so much that Linux was no
longer able to access it, even booting 4.19 and using R8822BE
thus did not do it.  By sheer luck the friendly salesman gave me
a 512 GB NVME SSD for the price of a 256 GB last year, so i kept
the maximally minimized 30 GB Windows partition just for win
(imagine that: 30 GB of space .. wasted!), and booted into it,
because of despair!  Of course i had forgotten my password (i
just wanted to log into Windows to see how it looks once i bought
the laptop, my last Windows login before that was Windows 95 B),
but on the welcome screen you could select the network, and once
that dialog wanted the password of the network (!) i rebooted into
Linux 4.19 .. and was alive again.  Sheer luck, dammit, otherwise
i would have been just dead.  Terrible!

This at least seems to be avoidable by using the above command
line switch.  Today it could be accessed after reboot.

 |> I did not know that, will try it out.
 |
 |I figured that you would appreciate it.
 |
 |> I assign an address to the interface, and make that interface routable 
 |> from the host.
 |
 |The fact that the prefix is on a directly attached network should be 
 |sufficient to make it routable to the host.
 |
 |Unless you are also using the same 10.0.0.0/8 on the other 
 |{wired,wireless} network that your system is connected to.

No.

 |> This is where all the VMs plug into.
 |
 |I disagree.
 |
 |The VMs plug into the host bridge.
 |
 |I'm asking about why you have a bridge /inside/ of each of the VMs.

But i do not?

 |> That is the other side of the veth interface pair of course.
 |
 |Yes.
 |
 |But you can use the v_i interface /directly/.  I'm not seeing any /need/ 
 |for the bridge /inside/ the network namespace.

Yes i understood that.  I would need to assign addresses to the
VMs once the VM starts, whereas now i only do that inside the VM.
Hm?.

 |> No.  The purpose is to be able to create a network of any number 
 |> of VMs somewhere, so those go via the bridge, no?  This network is 
 |> self-sufficient
 |> 
 |>    ANY HOST HOWEVER ONLINE  <---> VETH -|- VETH IN NAMESPACE
 |>                                                   ^
 |>                      ANY NUMBER OF VMS <-> BRIDGE <
 |> 
 |> Why would you want to do this?  I do not understand.
 |
 |         +---------------------------------------+
 |         |host   +------+           +-----------+|
 |         |       |      +---v_ns1---+v_i   v_ns1||
 |         |       |      |           +-----------+|
 |         |       |      |           +-----------+|
 | (LAN)---+eth0---+ bri0 +---v_ns2---+v_i   v_ns2||
 |   |     |       |      |           +-----------+|
 |   |     |       |      |           +-----------+|
 |   |     |       |      +---v_ns3---+v_i   v_ns3||
 |   |     |       +------+           +-----------+|
 |   |     +---------------------------------------+
 |   |
 |   |     +---------------+
 | (LAN)---+eth0   notebook|
 |         +---------------+
 |
 |Each network namespace (v_ns#) has its own vEth pair.  The host side of 
 |each vEth pair is connected to the bridge on the host.  The bridge on 
 |the host is connected to the host's eth0 interface.  Thus, each of the 
 |network namespaces has a layer 2 network connection to the LAN, meaning 
 |that each of the network namespaces is a proper member of the LAN.  No 
 |routing is needed.  No proxy ARP is needed.  Notebook, host, v_ns1, 
 |v_ns2, v_ns3 can all be on the same subnet without doing anything fancy.

There is only one network namespace here.  One for all VMs.
You cannot create bridge devices on wireless interfaces, unless
you have a driver which does support that, or, i guess, you create
your own host access point, i dimly recall this could be
a solution too.

 |> Yeah, this is a leftover from the proxy_arp based pseudo-bridge. 
 |> Not to forget it, maybe.  I should have removed it before posting. 
 |> I am not a network expert ok, especially not so Linux-specific.
 |
 |*nod*
 |
 |I was just trying to confirm that's historic.  Seeing as how it's 
 |commented out.
 |
 |Trying to deduce what, and more so why, can be non-trivial at times.
 |
 |> This is just the startup of a VM, it registers at the bridge.
 |
 |Yep.  Adding the host end of the vEth pair to the host bridge.
 |
 |> I am _not_ using proxy_arp any more.  This is a leftover, at the 
 |> beginning i made it configurable and could switch in between the 
 |> different approaches via a setting in /x/vm/.run.sh.  I should have 
 |> removed it before posting.
 |
 |It's cool.  Methods, scripts thereof, evolve.
 |
 |> That one is isolated, i can reach it from the host, and they can talk 
 |> to the host, each other and the internet.  I never placed servers 
 |> addressable from the outside in such a thing, this is only a laptop 
 |> without fixed address.  I guess in order to allow a public accessible 
 |> server to live in the namespace i would maybe need an ipfilter rule 
 |> on the host.
 |
 |Or bridging, as depicted above.  ;-)
 |
 |> Yes --bind mounting is cool also.  But i definitely do not want to 
 |> give the VM an entire /dev, it only needs u?random and that _only_ 
 |> because libcrypt (linked into qemu) needs it, even though it is not 
 |> actually used (the Linux getrandom system call is used instead).
 |
 |You can copy the /dev/urandom device, or make a new one, or bind mount 
 |the device inside the network namespace.  ;-)
 |
 |Even if it's not used for anything other than to make the kernel happy, 
 |you are going to need it.

No no, only libgcrypt...

 |> Yes.  But no, i do not really need it, i use it only for qemu 
 |> instances.  Interesting would be imposing hard CPU and memory 
 |> restrictions, especially if i would use this approach for creating 
 |> servers, too.  This is a future task, however.
 |
 |Now you're hedging on cgroups.

Yes.  That i have to do some time.

 |> Yeah like i said, i could impose more restrictions on programs running 
 |> inside that namespace, qemu also shows some security flaws at times, 
 |> but this is really just for testing purposes etc.
 |
 |IMHO /everything/ has security flaws at one point or another.  It's just 
 |a matter of when.

Unfortunately true.  The complexity of cgroups and the Linux
kernel as such is however very, very much greater than that of a
FreeBSD jail.  At least when jails first appeared it often was
nothing more than a "if(process->jailed)" at the beginning of some
kernel functions.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


* [COFF] A little networking tool to reduce having to run emulators with privilege
  2020-09-22 21:53         ` steffen
@ 2020-09-23  1:54           ` gtaylor
  2020-09-23 23:50             ` steffen
  0 siblings, 1 reply; 9+ messages in thread
From: gtaylor @ 2020-09-23  1:54 UTC (permalink / raw)

On 9/22/20 3:53 PM, Steffen Nurpmeso wrote:
> Hello.

Hi,

> And that is fuzzy on my side, the best i have heard was on some 
> FreeBSD list not too long ago, where the same problem exists.  ..Yes. 
> This is exactly the message i remembered:

Hum.

I've added an LACP Link Aggregation Group as a member to a bridge 
multiple times.

I would NEVER create an LACP Link Aggregation Group between a wired and 
a wireless connection.  I would expect that to NOT work.

> Nothing.  I only set the route.

Hum....

> Ha.  That could very well be because i was desperately trying to 
> create a bridge all the time, but it just was not working at all. 
> So i turned to proxy_arp "pseudo-bridging", with just having routes and 
> the TAP devices for the VMs, nothing more.  A wild chaos universe thus.

Were you trying to bridge things to a wireless NIC?

> You know, to me this is just a programmatic problem, i just do 
> not understand.  Why does it matter whether you have eth0 or wlp1s0?

IMHO the name of the device doesn't matter.

But the name does imply what type of device it is.  eth0 is almost 
always wired (but there is no guarantee).  After looking it up, wlp1s0 
seems to imply Wi-Fi.

Wi-Fi tends to imply other problems specific to Wi-Fi.

> I can go into the internet with both,

Yes.

> why can i create a bridge on one but not the other?

Because some Wi-Fi cards have problems related to multiple MAC 
addresses.  Problems like they simply refuse to allow them.

So the problem that it sounds like you're running into is that the Wi-Fi 
is refusing to do anything useful with the Ethernet frames that are 
being bridged to it.

Aside:  EBTables may be able to help resolve this problem.  But that's 
another kettle of fish.

> That does not make sense.

Once you understand some of Wi-Fi's inherent limitations, it should also 
make sense to you.

> Just do it?

It (Wi-Fi) probably can't or won't do anything useful with an Ethernet 
frame that has a different source MAC address.

Ergo, bridging with Wi-Fi typically is problematic or simply doesn't 
work.  It's a limitation of Wi-Fi, not bridging technology.

> It works when i inject a VETH pair, so may it be like this.

/How/ are you injecting with a vEth pair?

You say you aren't bridging.  You indicate that you have used Proxy ARP 
in the past and that you aren't doing so now.

So I'm not entirely sure what you're doing currently.  I'd have to 
re-read and scrutinize emails in this thread again.

> Having the bridge now makes things easy, i just make it the master 
> of anything which is plugged into it.  Yes, so easy are things if 
> you never programmed a network hardware driver!

What all are you plugging into it?

Are you referring to plugging the single vEth in the network namespace?

> I do not use a bridge on the host.  This i cannot do.

I trust that you believe what you are saying.  I question the deep 
technical merits of it.  Including using things like EBTables to NAT* 
the source MAC address to that of the Wi-Fi card.

*NAT typically applies to layer 3 IP addresses.  But the same concept is 
being applied to layer 2 Ethernet addresses.
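
A hedged sketch of what such MAC NAT rules could look like with ebtables(8),
printed as a dry run.  The interface name comes from the listings in this
thread; the Wi-Fi MAC, the guest IP and the helper name plan_mac_nat are
made-up examples.  Whether a given Wi-Fi driver and access point cooperate
with this is not guaranteed.

```shell
#!/bin/sh
# Dry run: print ebtables rules that make all frames leaving through the
# Wi-Fi interface carry the card's own MAC, and that steer replies for
# one guest IP back to the guest's real MAC.  Nothing is executed.
plan_mac_nat() {
    wlan=$1 wlan_mac=$2 guest_ip=$3 guest_mac=$4
    # Outbound: rewrite the source MAC, including the one inside ARP.
    echo "ebtables -t nat -A POSTROUTING -o $wlan -j snat --to-src $wlan_mac --snat-arp --snat-target ACCEPT"
    # Inbound: IPv4 and ARP traffic for the guest gets its MAC restored.
    echo "ebtables -t nat -A PREROUTING -p IPv4 -i $wlan --ip-dst $guest_ip -j dnat --to-dst $guest_mac --dnat-target ACCEPT"
    echo "ebtables -t nat -A PREROUTING -p ARP -i $wlan --arp-ip-dst $guest_ip -j dnat --to-dst $guest_mac --dnat-target ACCEPT"
}

plan_mac_nat wlp1s0 00:11:22:33:44:55 192.168.0.200 ce:4a:41:5c:d6:61
```

The inbound rules are needed once per guest address, which is part of why
this is another kettle of fish.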

> But it makes things so easy now.  You have seen the scripts, and on 
> the host i see
> 
>    #?0|kent:tmp$ ip rou
>    default via 192.168.0.1 dev wlp1s0 proto dhcp src 192.168.0.153 metric 306
>    10.0.0.0/8 dev v_n proto kernel scope link src 10.0.0.1
>    10.0.0.1 dev v_n scope link
>    192.168.0.0/24 dev wlp1s0 proto dhcp scope link src 192.168.0.153 metric 306
>    #?0|kent:tmp$ ip a
>    ...
>    6: wlp1s0: ...
>        inet 192.168.0.153/24 brd 192.168.0.255 scope global dynamic noprefixroute wlp1s0
>           valid_lft 69766sec preferred_lft 58966sec
>    8: v_n@if7: ...
>        inet 10.0.0.1/8 scope global v_n
>           valid_lft forever preferred_lft forever

That's the first clear picture that I've had of the host side.  (Perhaps 
I mis-read something before.)

Do other things on your network have any access to what's running in the 
network namespace?  Or is it only accessible by your host?

> and in the namespace
> 
>    #?0|kent:~# ip netns exec v_ns ip rou
>    default via 10.0.0.1 dev v_br
>    10.0.0.0/8 dev v_br proto kernel scope link src 10.1.0.1
>    #?0|kent:~# ip netns exec v_ns ip a
>    ...
>    6: v_br: ..
>        inet 10.1.0.1/8 brd 10.255.255.255 scope global v_br
>           valid_lft forever preferred_lft forever
>        inet6 fe80::dc81:96ff:fe0b:a229/64 scope link
>           valid_lft forever preferred_lft forever
>    7: v_i@if8: ..
>        link/ether fe:7f:36:b0:04:97 brd ff:ff:ff:ff:ff:ff link-netnsid 0
>        inet6 fe80::fc7f:36ff:feb0:497/64 scope link
>           valid_lft forever preferred_lft forever
> 
> And i started a VM for you
> 
>    8: vm_ulinux-010204: ..
>        link/ether ce:4a:41:5c:d6:61 brd ff:ff:ff:ff:ff:ff
>        inet6 fe80::cc4a:41ff:fe5c:d661/64 scope link
>         valid_lft forever preferred_lft forever
> 
> No address no route setup,

Why not add an IP address to the vm_ulinux-010204 interface?  Much like 
you've added to the v_br interface above.

> and in the machine
> 
>    $ cat default/net
>    TYPE=static
>    DEV=eth0
>    ADDR=10.0.1.15
>    MASK=8
>    GW=10.0.0.1
> 
> Except for using a DHCP server in the namespace this is as short as it 
> can get.  Of course if i would use that it would be even less work to 
> do when setting up a VM.  And even more flexible and automatic.  But, 
> you know, this is so overkill given that i use this only for testing, 
> and dhcpcd is now privilege-separated:

I get the lack of motivation for DHCP inside of network namespaces.

Especially when the IP address(es) used in the network namespace can be 
derived from the name of the network namespace.

>    #?0|kent:src$ pla|grep dhcpc
>    dhcpcd     509     1 S     0.0  1748 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0
>    root       510   509 S     0.0  2396 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0
>    dhcpcd     511   509 S     0.0   268 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0
>    dhcpcd     512   509 S     0.0   268 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0
>    dhcpcd    7619   510 S     0.0   376 dhcpcd /sbin/dhcpcd -h kent -z wlp1s0

Why do you have so many dhcpcd processes running against the same interface?

> And this twice all week long for in practice nothing?  Ah, no. 
> And what if i move along, i have to have dhcpcd, or configure dnsmasq 
> to serve it for the namespace.  And so i can use a simple hosts.txt 
> and have dnsmasq integrate it in its normal DNS service, this would 
> not be that easy if it would be dynamic, i had to look.

Hence why I prefer to not use DHCP inside of network namespaces.  ;-)

I think we're in agreement about how addresses are assigned.  Or at 
least that DHCP is an unnecessary complication.

> Anyhow, no network setup at all in the namespace, i have the 
> hosts.txt that i need anyway, and i configure the machine once i 
> install it.  Done.
> 
> It is terrible.  Nothing HOWTO like, not "help the people to help 
> themselves", but everybody who understood a topic by himself is quick 
> in fooling others.  This makes the FreeBSD handbook and the BSDs and 
> their manual portfolio in general outstanding.

Ya.  I think that a lot of the documentation for things that came after 
TLDP's heyday is lacking considerably.

I've had to dig through a lot of texts, scripts, watch a lot of videos, 
read man pages, and do lots of experimentation with network namespaces 
to get to where I am now.  I'm always happy to share what I know.

> Don't confuse me please, i am .. not a network expert.  Surely you 
> could use firewall stuff, but i do not.  At least nothing special, 
> very restrictive indeed, but 10.0.0.0 is set free early, and other...

I'm not trying to confuse you.  I'm just pointing out that the host 
firewall /can/ mess with things in some situations.  So, I advise you to 
not assume that it can't.

> Ah yes, i have forgotten this line:
> 
>    if [ -n "$VM_NS" ]; then
>       ${iptables} -t nat -A POSTROUTING -o ${what} -j MASQUERADE
>       #echo 1 > /proc/sys/net/ipv4/conf/${what}/proxy_arp
>    fi
> 
> Here $what is the device, wlp1s0 for example.  True.

Okay.

That gives your network namespace access to the rest of the network (and 
probably the Internet).

> Of course.  The comment is a leftover.

;-)

> Actually yes.  In fact i have booted into 4.19 again today because 
> RTW88 is totally unusable here, at least after the first suspend/resume 
> and even though they added the rtw88_pci.disable_aspm=1 kernel command 
> line switch to work around power management problems.

Power management can be a nightmare for networking (and other things).

I now see that RTW88 is a Realtek /Wireless/ NIC.  That makes a lot of 
(if not all of) the problems that you're describing make a lot more sense.

> In fact the driver messed the hardware so much that Linux was no 
> longer capable to access it, even booting 4.19 and using R8822BE thus 
> did not do it.  By sheer luck the friendly salesman gave me a 512 GB 
> NVME SSD for the price of a 256 GB last year, so i kept the maximally 
> minimized 30 GB Windows partition just for win (imagine that: 30 GB of 
> space .. wasted!), and booted into it, because of despair!  Of course 
> i had forgotten my password (i just wanted to log into Windows to see 
> how it looks once i bought the laptop, my last Windows login before 
> that was Windows 95 B), but on the welcome screen you could select 
> the network, and once that dialog wanted the password of the network 
> (!) i rebooted into Linux 4.19 .. and was alive again.  Sheer luck, 
> dammit, otherwise i would have been just dead.  Terrible!

It sounds like you partially wedged the wireless chip set.  That's not 
unheard of.  Sometimes a reboot will unwedge it.  Sometimes it requires 
a full power off and back on.  Sometimes you need to remove the battery.
A few times you need another utility that will try to initialize it in 
a different way (Windows qualifies here).

> This at least seems to be avoidable by using the above command 
> line switch.  Today it could be accessed after reboot.

Good.

> No.

I see that now.

> But i do not?

But you do.

   #?0|kent:~# ip netns exec v_ns ip a
   ...
   6: v_br: ..
       inet 10.1.0.1/8 brd 10.255.255.255 scope global v_br
          valid_lft forever preferred_lft forever
       inet6 fe80::dc81:96ff:fe0b:a229/64 scope link
          valid_lft forever preferred_lft forever
   7: v_i@if8: ..
       link/ether fe:7f:36:b0:04:97 brd ff:ff:ff:ff:ff:ff link-netnsid 0
       inet6 fe80::fc7f:36ff:feb0:497/64 scope link
          valid_lft forever preferred_lft forever

"ip netns exec v_ns ip a" runs "ip address" inside of the "v_ns" network 
namespace.  "v_br" looks like a bridge.  Your IP is bound to the v_br 
(bridge) interface.  The "v_i" interface is, as your previous email 
indicated, the vEth that is inside of the network namespace.

So it /really/ looks to me like you /do/ have a bridge /inside/ of the 
network namespace and that you /are/ using it as part of your 
communications path.

Please do an "ip link show" inside the network namespace.

    ip netns exec v_ns ip l

I halfway expect that the v_i interface will have a master of v_br.

For giggles, why don't you add a "-d" between ip and link.

    ip netns exec v_ns ip -d l

> Yes i understood that.  I would need to assign addresses to the VMs 
> once the VM starts, whereas now i only do that inside the VM.  Hm?.

That overall concept would not change.  But you would assign it to the 
v_i interface instead of the v_br interface.

> There is only one network namespace here.  One for all VMs.

Ah.  I had thought there were multiple network namespaces.  One per VM / 
emulator.

> You cannot create bridge devices on wireless interfaces, unless you 
> have a driver which does support that, or, i guess, you create your 
> own host access point, i dimly recall this could be a solution too.

I don't know if the driver that balks at multiple MACs will support 
being an access point.

Though I do wonder if it would be possible to leverage EBTables to play 
with MAC addresses to soothe the Wi-Fi NIC's heartburn at multiple MACs.  }:-)
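
A minimal sketch of that EBTables idea follows.  It assumes the
wireless NIC has been made a port of a host bridge, and it only prints
the rules instead of installing them, since they need root and may not
help with a given driver.  The function name mac_nat_rules and all
addresses are placeholders; the options used (snat --to-src, --snat-arp,
dnat --to-dst, --ip-dst, --arp-ip-dst) are the ones documented in
ebtables(8).

```shell
#!/bin/sh
# Print (not run) ebtables rules that rewrite the source MAC of every frame
# leaving via the Wi-Fi NIC to the NIC's own MAC, and that steer replies
# for one guest IP back to the guest's real MAC.
# Usage: mac_nat_rules WIFI_IF WIFI_MAC GUEST_IP GUEST_MAC
mac_nat_rules() {
    wifi_if=$1 wifi_mac=$2 guest_ip=$3 guest_mac=$4
    cat <<EOF
ebtables -t nat -A POSTROUTING -o $wifi_if -j snat --to-src $wifi_mac --snat-arp --snat-target ACCEPT
ebtables -t nat -A PREROUTING -p IPv4 -i $wifi_if --ip-dst $guest_ip -j dnat --to-dst $guest_mac --dnat-target ACCEPT
ebtables -t nat -A PREROUTING -p ARP -i $wifi_if --arp-ip-dst $guest_ip -j dnat --to-dst $guest_mac --dnat-target ACCEPT
EOF
}

# Example with made-up values; review the output before running it as root.
mac_nat_rules wlp1s0 02:00:00:00:00:01 10.0.1.15 52:54:00:12:34:56
```

The outgoing rule hides every guest behind the NIC's MAC, which is why
the two incoming rules are needed per guest: once all frames carry the
same MAC, only the IP address is left to tell the guests apart.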

> No no, only libgcrypt...

That's all that /you/ personally are using it for.

But I thought you indicated that something else was unhappy if there 
wasn't a (u)random device.

> Yes.  That i have to do some time.

I'll look into cgroups some day.  I've not had a need to do so yet.

> Unfortunately true.  The complexity of cgroups and the Linux kernel 
> as such is however very, very much greater than that of a FreeBSD 
> jail.  At least once jails appeared it often was nothing more than a 
> "if(process->jailed)" at the beginning of some kernel functions.

*nod*

I think that Linux still has some things to learn from BSD jails and 
Solaris zones.

-- 
Grant. . . .
unix || die


* [COFF] A little networking tool to reduce having to run emulators with privilege
  2020-09-23  1:54           ` gtaylor
@ 2020-09-23 23:50             ` steffen
  2020-09-24  2:58               ` gtaylor
  0 siblings, 1 reply; 9+ messages in thread
From: steffen @ 2020-09-23 23:50 UTC (permalink / raw)


Hello.

Sorry for the late reply, it was the very last summer day with cold and
rainy days to come, and then i really turned on the TV on a non-Sunday
for the first time in a long while, to watch the German film
"Gundermann", which i had on my list ever since i heard of it.

This thread also gives me the feeling of being twisted and
wrapped, .. but you know, feelings, pah.

Grant Taylor wrote in
 <b94c3a89-b6ca-b7d4-dd24-ddf7c00283c9 at spamtrap.tnetconsulting.net>:
 |On 9/22/20 3:53 PM, Steffen Nurpmeso wrote:
 ...
 |> Ha.  That could very well be because i was desperately trying to 
 |> create a bridge all the time, but it just was not working at all. 
 ...
 |Were you trying to bridge things to a wireless NIC?
 |
 |> You know, to me this is just a programmatic problem, i just do 
 |> not understand.  Why does it matter whether you have eth0 or wlp1s0?
 |
 |IMHO the name of the device doesn't matter.
 |
 |But the name does imply what type of device it is.  eth0 is almost 
 |always wired (but there is no guarantee).  After looking it up, wlp1s0 
 |seems to imply Wi-Fi.
 |
 |Wi-Fi tends to imply other problems specific to Wi-Fi.
 ...
 |> why can i create a bridge on one but not the other?
 |
 |Because some Wi-Fi cards have problems related to multiple MAC 
 |addresses.  Problems like they simply refuse to allow them.
 |
 |So the problem that it sounds like you're running into is that the Wi-Fi 
 |is refusing to do anything useful with the Ethernet frames that are 
 |being bridged to it.
 |
 |Aside:  EBTables may be able to help resolve this problem.  But that's 
 |another kettle of fish.

This i have never used, i do not have it installed.
But you mention it -- what i did not understand is why it is not
"simply" made a policy that when you want to create a software
bridge device, you can create it.  In the end i expect the problem
to manifest in only a few bytes of data?

 |> That does not make sense.
 |
 |Once you understand some of Wi-Fi's inherent limitations, it should also 
 |make sense to you.

Well i never had doubts there are some technical reasons, i just
did not know them, and there were no nice error messages nor
documentation of the problem.  But given that it _does_ work for
some drivers i was also sure that the actual problem is more of
a notational sort, so to say, and then i did not understand why
the creation of a software bridge does not take appropriate steps
to make this happen.

 |> Just do it?
 |
 |It (Wi-Fi) probably can't or won't do anything useful with an Ethernet 
 |frame that has a different source MAC address.
 |
 |Ergo, bridging with Wi-Fi typically is problematic or simply doesn't 
 |work.  It's a limitation of Wi-Fi, not bridging technology.

I could say "heck, but if ..

 |> It works when i inject a VETH pair[.]

then it can be done".

 |> It works when i inject a VETH pair, so may it be like this.

But i do not.

  ..
 |> I do not use a bridge on the host.  This i cannot do.
 |
 |I trust that you believe what you are saying.  I question the deep 
 |technical merits of it.  Including using things like EBTables to NAT* 
 |the source MAC address to that of the Wi-Fi card.
 |
 |*NAT typically applies to layer 3 IP addresses.  But the same concept is 
 |being done to layer 2 Ethernet addresses.

Well if i had the time and motivation i surely could dive into
kernel sources and look around, and maybe even get it (done).
But i have so much to do with what other people would call
livestock but i call friends, for example, so it seems i am too
old and hackneyed to get that job done.

  ...
 |I get the lack of motivation for DHCP inside of network namespaces.

yeah.

 |Especially when the IP address(es) used in the network namespace can be 
 |derived from the name of the network namespace.

Yes, this is all ssh/scp communication, and that the script
handles SSH came with that veth/bridge rewrite, before everything
simply had to be typed.  And so i need a name or a fixed IP in
.ssh/known_hosts.  Here i have both now, without much straining.

  ...
 |>    #?0|kent:src$ pla|grep dhcpc
 |>    dhcpcd     509     1 S     0.0  1748 dhcpcd /sbin/dhcpcd -h kent \

[5x]..

 |Why do you have so many things running dhcpcd against the same interface?

It has to be said that Linux should really offer some kind of
setproctitle(2), more and more BSD programs use it to give sense
to ps(1) output of privilege-separated programs, and the
compatibility code using PR_SET_MM_MAP of prctl(2) after reading
and parsing /proc/self/stat is, well, immense effort.

  ...
 |> It is terrible.  Nothing HOWTO like, not "help the people to help 
 |> themselves", but everybody who understood a topic by himself is quick 
 |> in fooling others.  This makes the FreeBSD handbook and the BSDs and 
 |> their manual portfolio in general outstanding.
 |
 |Ya.  I think that a lot of the documentation for things that came after 
 |TLDP's heyday is lacking considerably.
 |
 |I've had to dig through a lot of texts, scripts, watch a lot of videos, 
 |read man pages, and do lots of experimentation with network namespaces 
 |to get to where I am now.  I'm always happy to share what I know.

 ...
 |So it /really/ looks to me like you /do/ have a bridge /inside/ of the 
 |network namespace and that you /are/ using it as part of your 
 |communications path.

  ...
 |> There is only one network namespace here.  One for all VMs.
 |
 |Ah.  I had thought there were multiple network namespaces.  One per VM / 
 |emulator.

No.  No, that is really overkill, i am not a student with bad body
hygiene having fun with software or something.  It is annoying
enough that you ever and always again settle on a "that is a good
status quo" just to find out the next day that doing it all anew
would possibly improve the situation.  No.

 |> You cannot create bridge devices on wireless interfaces, unless you 
 |> have a driver which does support that, or, i guess, you create your 
 |> own host access point, i dimly recall this could be a solution too.
 |
 |I don't know if the driver that balks at multiple MACs will support 
 |being an access point.

That HostAP software or how it is named, doesn't it deal with this
the right way?  I seem to have read this is possible.
Never tried this.

 |Though I do wonder if it would be possible to leverage EBTables to play 
 |with MAC addresses to soothe the Wi-Fi NIC's heartburn at multiple MACs. 
 |}:-)

Imho "bridge" should do this by itself automatically.

  ...
 |> Yes.  That [cgroups] i have to do some time.
 |
 |I'll look into cgroups some day.  I've not had a need to do so yet.

Well you could look into software like "containers", you could
read

  If cgroup support, the memory controller and the pids controller
  are compiled into the kernel, a mounted cgroup2 filesystem can
  be used to apply memory and process-count limits to a container
  as it is started. For example, the shell script

    #!/bin/sh -e
    echo +memory +pids >/sys/fs/cgroup/cgroup.subtree_control
    mkdir /sys/fs/cgroup/mycontainer
    echo $$ >/sys/fs/cgroup/mycontainer/tasks
    echo 2G >/sys/fs/cgroup/mycontainer/memory.high
    echo 3G >/sys/fs/cgroup/mycontainer/memory.max
    echo 2G >/sys/fs/cgroup/mycontainer/memory.swap.max
    echo 256 >/sys/fs/cgroup/mycontainer/pids.max
    exec contain [...]

  ...
  See linux/kernel/Documentation/cgroup-v2.txt for detailed info
  on the available controllers and configuration parameters.

But most of it can be done with unshare and nsenter.
For example the super minimal ulinux project (a bit stale) has
a box script which does, among other things

  # shellcheck disable=SC2086
  unshare \
    --ipc \
    --uts \
    --pid \
    --user \
    --fork \
    --mount \
    --mount-proc \
    --map-root-user \
    /usr/sbin/chroot "$tmpc/root" \
      /usr/bin/env -i $BOX_ENV /bin/sh -c "source /init; $*"

It could do nice things like

  setup_tmpc() {
    mkdir -p "$tmpc/root" "$tmpc/storage" "$tmpc/work"
    mount -t overlay \
      -o upperdir="$tmpc/storage,lowerdir=/,workdir=$tmpc/work" \
      overlayfs "$tmpc/root"

I am really interested, but am too lazy to convert the scripts so
that this "distribution" (almost kernel-only) can be built without
docker etc.

 |> Unfortunately true.  The complexity of cgroups and the Linux kernel 
 |> as such is however very, very much greater than that of a FreeBSD 
 |> jail.  At least once jails appeared it often was nothing more than a 
 |> "if(process->jailed)" at the beginning of some kernel functions.
 |
 |*nod*
 |
 |I think that Linux still has some things to learn from BSD jails and 
 |Solaris zones.

I personally am always astonished when i have contact with Plan9.
I cannot really use it, i am too used to BSD/Linux, and some
things drive me insane (network configuration etc. is so
spread out).  But i am subscribed to the MLs ever since i have been
pointed to Plan9 and always wonder when problem solutions happen
to happen, how it is done.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



* [COFF] A little networking tool to reduce having to run emulators with privilege
  2020-09-23 23:50             ` steffen
@ 2020-09-24  2:58               ` gtaylor
  0 siblings, 0 replies; 9+ messages in thread
From: gtaylor @ 2020-09-24  2:58 UTC (permalink / raw)

On 9/23/20 5:50 PM, Steffen Nurpmeso wrote:
> Hello.

Hi,

> Sorry for the late reply, it was the very last summer day with cold 
> and rainy days to come, and then i really turned on the TV on a 
> non-Sunday for the first time in a long while, to watch the German 
> film "Gundermann", which i had on my list ever since i heard of it.

No apology necessary.

That's one of the wonderful things about email.  It's there waiting 
until it's convenient for you.  ;-)

> This thread also gives me the feeling of being twisted and wrapped, 
> .. but you know, feelings, pah.

~chuckle~

> This i have never used, i do not have it installed.  But you mention 
> it -- what i did not understand is why it is not "simply" made a 
> policy that when you want to create a software bridge device, you 
> can create it.

Either I'm having trouble unpacking that or otherwise not understanding it.

You can create the bridge any time that you want to.  (Presuming that 
the kernel dependencies have been met.)

Or are you suggesting that you shouldn't be able to create the bridge 
if it won't work?

> In the end i expect the problem to manifest in only a few bytes 
> of data?

How many bytes of data are you able to send from the system behind the 
bridge behind the WiFi NIC that doesn't want to play nicely?

> Well i never had doubts there are some technical reasons, i just did 
> not know them, and there were no nice error messages nor documentation 
> of the problem.  But given that it _does_ work for some drivers i was 
> also sure that the actual problem is more of a notational sort, so 
> to say, and then i did not understand why the creation of a software 
> bridge does not take appropriate steps to make this happen.

Because the behavior that I'm talking about is atypical, almost never 
needed, and requires additional kernel capabilities.

There is also no good way for a system to know if the hack that I'm 
talking about needs to be done or not.

Even more importantly, if this hack was automatically done, it would 
mess up the most common use case for bridges or at least cause 
unexpected behavior.

> I could say "heck, but if ..
> 
> then it can be done".

Bridging does work with wireless in so far as the bridged frame is 
handed to the wireless NIC.

It's the wireless NIC that fails to do things properly.

Is it the light switch or the light bulb's fault that the light does not 
come on when there is no power to the building?

> But i do not.

But in a way you did.  ;-)

> Well if i had the time and motivation i surely could dive into kernel 
> sources and look around, and maybe even get it (done).  But i have 
> so much to do with what other people would call livestock but i call 
> friends, for example, so it seems i am too old and hackneyed to get 
> that job done.

Fair enough.  To each their own.

Chances are that your friends are there for you when the power is out.

> yeah.
> 
> Yes, this is all ssh/scp communication, and that the script handles 
> SSH came with that veth/bridge rewrite, before everything simply had 
> to be typed.  And so i need a name or a fixed IP in .ssh/known_hosts. 
> Here i have both now, without much straining.
> 
> It has to be said that Linux should really offer some kind of 
> setproctitle(2), more and more BSD programs use it to give sense to 
> ps(1) output of privilege-separated programs, and the compatibility 
> code using PR_SET_MM_MAP of prctl(2) after reading and parsing 
> /proc/self/stat is, well, immense effort.

Okay.

I'm guessing we're running / ran different versions of dhcpcd.  I'm used 
to one process per instance.  Hence why I thought you were running 
multiple instances of it.  (Along with the thought that you were running 
multiple network namespaces.)

> No.  No, that is really overkill, i am not a student with bad body 
> hygiene having fun with software or something.  It is annoying enough 
> that you ever and always again settle on a "that is a good status quo" 
> just to find out the next day that doing it all anew would possibly 
> improve the situation.  No.

Indeed.  If it does what you want and you're satisfied with it, the more 
power to you.

> That HostAP software or how it is named, doesn't it deal with this the 
> right way?  I seem to have read this is possible.  Never tried this.

I have run HostAP.  It's fairly nice and feature complete.  However, 
HostAP has requirements of wireless cards that play nice with things. 
Even the nicest software can't realistically overcome hardware limitations.

> Imho "bridge" should do this by itself automatically.

See above.  What I'm suggesting with EBTables is way beyond what bridges 
should do.  In fact, any bridge that does it is not behaving like a 
bridge.  It's behaving like something completely different.

Bridges learn who's on what side and don't pass something across if they 
know the destination is on the side the message is from.  There's no 
point in sending it across.
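
The learning behavior described above can be modelled in a few lines
of shell.  This is only a toy illustration with made-up names
(bridge_frame, learn, lookup) and shortened fake MAC addresses; it says
nothing about how the kernel bridge is actually implemented.

```shell
#!/bin/sh
# Toy model of a learning bridge; an illustration, not kernel behavior.

table=""     # learned "mac port" pairs, one per line
verdict=""   # result of the most recent bridge_frame call

# Print the port a MAC address was last seen on (nothing if unknown).
lookup() {
    printf '%s\n' "$table" | awk -v m="$1" '$1 == m { print $2 }'
}

# Record that MAC $1 lives behind port $2, replacing any older entry.
learn() {
    table=$(printf '%s\n' "$table" | awk -v m="$1" 'NF && $1 != m')
    table="$table
$1 $2"
}

# Handle one frame: bridge_frame IN_PORT SRC_MAC DST_MAC
bridge_frame() {
    learn "$2" "$1"
    out_port=$(lookup "$3")
    if [ -z "$out_port" ]; then
        verdict="flood"       # destination unknown: send out all other ports
    elif [ "$out_port" = "$1" ]; then
        verdict="drop"        # destination is on the side the frame came from
    else
        verdict="forward to port $out_port"
    fi
}

bridge_frame 1 aa:aa bb:bb; echo "$verdict"   # flood
bridge_frame 2 bb:bb aa:aa; echo "$verdict"   # forward to port 1
bridge_frame 1 cc:cc aa:aa; echo "$verdict"   # drop
```

The third frame is the case from the paragraph above: both cc:cc and
aa:aa sit behind port 1, so the bridge has no reason to pass the frame
across.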

What I'm talking about doing is effectively the bridge lying and 
claiming that everybody on the other side is the bridge.

So, no, bridges should not automatically do what I'm pontificating about.

> Well you could look into software like "containers", you could read
> 
>    If cgroup support, the memory controller and the pids controller
>    are compiled into the kernel, a mounted cgroup2 filesystem can
>    be used to apply memory and process-count limits to a container
>    as it is started. For example, the shell script
> 
>      #!/bin/sh -e
>      echo +memory +pids >/sys/fs/cgroup/cgroup.subtree_control
>      mkdir /sys/fs/cgroup/mycontainer
>      echo $$ >/sys/fs/cgroup/mycontainer/tasks

Does this move the current running shell's PID ($$) into the mycontainer 
cgroup?
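
One way to check this empirically is to read /proc/<pid>/cgroup, where
the cgroup v2 membership shows up as a line of the form "0::<path>".
The helper name cgroup2_path below is made up for this sketch.  A
caveat, stated with some caution: on a pure cgroup2 mount the file that
accepts PIDs is, as far as the kernel documentation goes, named
cgroup.procs, while "tasks" is the cgroup v1 name, so the quoted line
may need adjusting on such a system.

```shell
#!/bin/sh
# Extract the cgroup v2 path from /proc/<pid>/cgroup-style input on stdin.
# Lines for v1 hierarchies ("<id>:<controllers>:<path>") are ignored.
cgroup2_path() {
    sed -n 's/^0:://p'
}

# Show which cgroup the current shell is in (prints nothing where the file
# is missing or no v2 hierarchy is mounted).
if [ -r "/proc/$$/cgroup" ]; then
    cgroup2_path < "/proc/$$/cgroup"
fi
```

Run before and after the write of $$ into the new cgroup, the printed
path should change from the parent's path to /mycontainer if the move
took place.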

>      echo 2G >/sys/fs/cgroup/mycontainer/memory.high
>      echo 3G >/sys/fs/cgroup/mycontainer/memory.max
>      echo 2G >/sys/fs/cgroup/mycontainer/memory.swap.max
>      echo 256 >/sys/fs/cgroup/mycontainer/pids.max

Setting parameters on the mycontainer cgroup.

Seems simple enough.

>      exec contain [...]

exec to replace the running shell with a new process while keeping the 
same PID.
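
That exec keeps the PID is easy to demonstrate without any cgroup
setup.  The function name exec_keeps_pid is just a label for this
sketch; it starts a child shell that prints its PID, execs a second
shell in its place, and prints the PID again.

```shell
#!/bin/sh
# Print the PID of a shell, then exec another shell from it and print the
# PID again.  exec replaces the process image without forking, so both
# lines show the same number.
exec_keeps_pid() {
    sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

exec_keeps_pid
```

This is also why the PID has to be written into the cgroup before the
exec: membership belongs to the process, so whatever program replaces
the shell inherits the limits that were just configured.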

>    See linux/kernel/Documentation/cgroup-v2.txt for detailed info
>    on the available controllers and configuration parameters.

I'll have to check that out.

That all seems simple enough.  It's definitely in line with the way that 
I do namespaces.

> But most of it can be done with unshare and nsenter.  For example the 
> super minimal ulinux project (a bit stale) has a box script which does, 
> among other things
> 
>    # shellcheck disable=SC2086
>    unshare \
>      --ipc \
>      --uts \
>      --pid \
>      --user \
>      --fork \
>      --mount \
>      --mount-proc \
>      --map-root-user \
>      /usr/sbin/chroot "$tmpc/root" \
>        /usr/bin/env -i $BOX_ENV /bin/sh -c "source /init; $*"

That all looks like typical namespace manipulation.  I don't see the 
relation to cgroups.

> It could do nice things like
> 
>    setup_tmpc() {
>      mkdir -p "$tmpc/root" "$tmpc/storage" "$tmpc/work"
>      mount -t overlay \
>        -o upperdir="$tmpc/storage,lowerdir=/,workdir=$tmpc/work" \
>        overlayfs "$tmpc/root"
> 
> I am really interested, but am too lazy to convert the scripts so 
> that this "distribution" (almost kernel-only) can be built without 
> docker etc.

Fair.

Though, I don't see any reason why you can't have a minimal version of 
the distribution that you're running things on now.  Thus there wouldn't 
be any porting.

> I personally am always astonished when i have contact with Plan9.

I've not done anything with Plan9.  Though I have been impressed by what 
I've heard others say and read what they write.

> I cannot really use it, i am too used to BSD/Linux, and some things 
> drive me insane (network configuration etc. is so spread out).  But i 
> am subscribed to the MLs ever since i have been pointed to Plan9 and 
> always wonder when problem solutions happen to happen, how it is done.

I'm not sure what "MLs" are (is?) in this context.

-- 
Grant. . . .
unix || die

