9front - general discussion about 9front
* [9front] Speeding up snoopy(8)
@ 2022-10-18  4:57 droyo
  2022-10-20 15:55 ` David Arroyo
  0 siblings, 1 reply; 10+ messages in thread
From: droyo @ 2022-10-18  4:57 UTC (permalink / raw)
  To: 9front

I am a plan 9 novice, especially when it comes to performance, so
thank you in advance for your patience.

I have been trying to diagnose some interesting behavior in 9front's
TCP, where it appears to send a handful of TCP segments out of order
every second or so at gigabit rates, and this is a bit of a detour
from that.

I set up a 9front VM on qemu with 2 cores and 2GB of memory.  I have
not tried to tune the VM parameters at all, and it's very possible
that I'd get different results on real hardware.

I did a trivial tcp benchmark from the 9front VM:

	aux/listen1 tcp!*!9999 cat /dev/zero

And connected to the VM from its host.  This easily achieves more than
gigabit speeds.  The client and server are using sub-interfaces on the
same physical NIC, so latency is very low.
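
For reference, the client was just a throwaway byte sink on the host
that counts what it reads.  A minimal sketch of one (the VM's address
below is made up; any tool that drains the socket would do):

	/* throwaway host-side client: drain the stream for 10 seconds
	 * and report throughput; address and port are assumptions */
	#include <stdio.h>
	#include <string.h>
	#include <time.h>
	#include <unistd.h>
	#include <sys/socket.h>
	#include <arpa/inet.h>

	int
	main(void)
	{
		char buf[1<<16];
		long long total = 0;
		struct sockaddr_in a;
		time_t start;
		ssize_t n;
		int fd;

		fd = socket(AF_INET, SOCK_STREAM, 0);
		memset(&a, 0, sizeof a);
		a.sin_family = AF_INET;
		a.sin_port = htons(9999);
		inet_pton(AF_INET, "10.0.0.2", &a.sin_addr);	/* the VM */
		if(connect(fd, (struct sockaddr*)&a, sizeof a) < 0)
			return 1;
		start = time(0);
		while((n = read(fd, buf, sizeof buf)) > 0){
			total += n;
			if(time(0) - start >= 10)
				break;
		}
		printf("%.0f Mbit/s\n", total*8/10/1e6);
		return 0;
	}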

However, when I attempted to capture traffic on the 9front side with

	ramfs; snoopy -D -f 'tcp(sd=9999)' /net/ether0 >/tmp/pcap

I found that about 50% of the packets were not recorded (I knew this
because I compared it with a capture I took at the client).  With help
from IRC, I could see soft overflows incrementing in the
/net/ether0/$index/stats file.  I patched the stats command so I could
watch it during my benchmarks.

While cinap offered some alternative filtering options like ipmux, I
would like to try to improve snoopy's performance.  I tried profiling
snoopy with

	pid=`{ psu | awk '/snoopy/ { print $2 }' }
	echo profile > /proc/$pid/ctl
	while() { tprof $pid ; sleep 5 }

Here is a sample during the benchmark, with snoopy's output redirected
to /dev/null to rule out disk or ramfs bottlenecks:

	total: 1150
	TEXT 00200000
	    ms      %   sym
	   360	 31.3	pread
	   230	 20.0	pwrite
	   120	 10.4	nsec
	   110	  9.5	_filterpkt
	    70	  6.0	tracepkt
	    60	  5.2	p_filter
	    60	  5.2	defaultframer
	    50	  4.3	p_filter
	    40	  3.4	be2vlong
	    20	  1.7	read
	    10	  0.8	write
	    10	  0.8	newfilter
	    10	  0.8	main

You can see that over half of the time is spent in pread/pwrite.  I
patched snoopy to dial /net/ether0!-2 instead of /net/ether0!-1, which
truncates each frame to its first 64 bytes, and the overflows went
away.

The first 64 bytes are generally the most useful ones, so an easy win
would be to use the "-2" dial option whenever snoopy's "-M" flag is
<=64.
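
For instance, assuming the device open were factored into a helper
like this (the names are mine, not snoopy's):

	#include <u.h>
	#include <libc.h>

	/*
	 * Pick the ether type on the dial string from the snap length:
	 * type -1 delivers whole frames, while on 9front type -2
	 * delivers every frame truncated to its first 64 bytes, so the
	 * kernel copies far less per packet.
	 */
	int
	opensnoop(char *dev, int snaplen)
	{
		char addr[64];

		snprint(addr, sizeof addr, "%s!%d", dev,
			snaplen <= 64 ? -2 : -1);
		return dial(addr, nil, nil, nil);
	}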

More generally, though, I think snoopy could be much faster if it
read multiple frames per pread() and wrote multiple records per
pwrite(). I am a bit surprised to see pwrite() so high even when
I am redirecting to /dev/null.
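
On the write side, what I have in mind is coalescing records through
bio(2).  A minimal sketch, with illustrative names (snoopy's real
trace-writing code is different):

	#include <u.h>
	#include <libc.h>
	#include <bio.h>

	/*
	 * Funnel each trace record through a Biobuf so that many
	 * small records become one large write(2), instead of one
	 * pwrite per packet.
	 */
	Biobuf out;

	void
	outinit(void)
	{
		Binit(&out, 1, OWRITE);	/* standard output */
	}

	void
	outrec(uchar *rec, long n)
	{
		if(Bwrite(&out, rec, n) != n)
			sysfatal("write: %r");
	}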

Buffering snoopy's output with bio(2) along those lines is trivial.
But I don't know of an existing interface that would let me read more
than one frame at a time. Is there one? The only other option I can
think of for reducing the effect of pread() would be to move the read
into a separate thread, roughly as sketched below. But I'm curious to
hear your thoughts.
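
That reader-thread idea might look something like this with thread(2)
channels (again, all names here are mine, and this is only a sketch):

	#include <u.h>
	#include <libc.h>
	#include <thread.h>

	enum { Nbuf = 32, Maxpkt = 64*1024 };

	typedef struct Pkt Pkt;
	struct Pkt {
		long	len;
		uchar	buf[Maxpkt];
	};

	Channel *full;		/* of Pkt*: frames ready to trace */
	Channel *empty;		/* of Pkt*: drained buffers for reuse */

	/*
	 * Reader proc: keep a read posted on the ether device while
	 * the main proc filters and writes the previous frame, so a
	 * slow write no longer backs up the kernel's packet queue.
	 */
	void
	reader(void *arg)
	{
		int fd;
		Pkt *p;

		fd = (int)(uintptr)arg;
		for(;;){
			p = recvp(empty);
			p->len = read(fd, p->buf, sizeof p->buf);
			if(p->len <= 0)
				break;
			sendp(full, p);
		}
		sendp(full, nil);	/* tell the consumer to stop */
	}

	void
	threadmain(int argc, char **argv)
	{
		int i, fd;
		Pkt *p;

		USED(argc); USED(argv);
		fd = dial("/net/ether0!-1", nil, nil, nil);
		if(fd < 0)
			sysfatal("dial: %r");
		full = chancreate(sizeof(Pkt*), Nbuf);
		empty = chancreate(sizeof(Pkt*), Nbuf);
		for(i = 0; i < Nbuf; i++)
			sendp(empty, mallocz(sizeof(Pkt), 1));
		proccreate(reader, (void*)(uintptr)fd, 8192);
		while((p = recvp(full)) != nil){
			/* filter and trace p->buf[0..p->len) here */
			sendp(empty, p);
		}
		threadexitsall(nil);
	}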

Thanks!
David



Thread overview: 10+ messages
2022-10-18  4:57 [9front] Speeding up snoopy(8) droyo
2022-10-20 15:55 ` David Arroyo
2022-10-20 16:03   ` Sigrid Solveig Haflínudóttir
2022-10-21 15:38     ` David Arroyo
2022-10-21 16:04       ` ori
2022-10-21 16:54         ` David Arroyo
2022-10-21 17:10           ` ori
2022-10-21 17:40       ` Lyndon Nerenberg (VE7TFX/VE6BBM)
2022-10-21 18:52         ` ori
2022-10-21 19:23           ` ori
