On multiple XenServer 5.x setups I’ve been seeing light to medium packet loss on busy DomUs. It only affects DomUs under heavy network load, never Dom0, and it doesn’t seem to matter which OS the guest is running; I’m seeing it in Debian, CentOS, and Windows. Since I manage streaming services, I have some heavy network loads, and the packet loss is causing issues for some clients. I also notice the following on the affected VIF as seen from Dom0 (this is just an example):
vif406.0 Link encap:Ethernet HWaddr FE:FF:FF:FF:FF:FF
UP BROADCAST RUNNING NOARP MTU:1500 Metric:1
RX packets:3793176632 errors:0 dropped:0 overruns:0 frame:0
TX packets:3083746066 errors:0 dropped:280 overruns:0 carrier:0
collisions:0 txqueuelen:32
RX bytes:1680119904 (1.5 GiB) TX bytes:2415406042 (2.2 GiB)
Notice the dropped TX packets. They only show up in Dom0, on the VIF. According to ifconfig, no packets are lost on the Dom0 PIF or inside the DomU itself.
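To check every VIF at once instead of reading ifconfig output by eye, something like the following could be run in Dom0. It is only a rough sketch: it assumes the Dom0 kernel exposes the per-interface counters under /sys/class/net (I have not confirmed this on every XenServer release), and SYSFS_NET and report_vif_drops are names I made up for the example.

```shell
#!/bin/sh
# List every VIF whose TX drop counter is non-zero, with its current queue length.
# ASSUMPTION: counters are available under /sys/class/net/<if>/statistics.
# SYSFS_NET is a made-up override so the function can be pointed at a fake tree.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

report_vif_drops() {
    for dir in "$SYSFS_NET"/vif*; do
        # Skip anything without a readable counter (including an unmatched glob).
        [ -r "$dir/statistics/tx_dropped" ] || continue
        dropped=$(cat "$dir/statistics/tx_dropped")
        qlen=$(cat "$dir/tx_queue_len")
        if [ "$dropped" -gt 0 ]; then
            echo "${dir##*/} tx_dropped=$dropped txqueuelen=$qlen"
        fi
    done
}

report_vif_drops
```

Each output line names one VIF that has dropped packets on transmit, which makes it easy to see whether the drops line up with the busiest guests.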
I tried disabling checksum offloading (as suggested for Windows 2003 issues) on both Dom0 and DomU, but it had no effect at all. I was almost ready to give up until I started wondering about the really small txqueuelen. 32 is tiny, much smaller than the Linux default of 1000 for Ethernet interfaces. I come from a Cisco background, and we could never run ports with buffers that small.
On a hunch I tried increasing it on Dom0 to a reasonable value for a busy network:
ifconfig vif406.0 txqueuelen 1500
To my surprise, it completely fixed the packet loss. Single-thread speeds went from bouncing all over the place to a steady 30+ MB/sec. It was really that simple, and I can’t believe more people haven’t been hit by this, especially anyone running DomUs backed by network-based storage.
So I wrote a script and put it into cron so that all VIFs are reset to 1500 on a regular basis:
ifconfig | grep -P '^vif\d+\.\d+' | awk '{system("ifconfig "$1" txqueuelen 1500")}'
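A slightly more defensive variant of the same idea is sketched below. It only touches VIFs that are still below the target and prints what it changed, so cron mail stays quiet when there is nothing to do. It assumes that writing /sys/class/net/<vif>/tx_queue_len as root has the same effect as the ifconfig call above; TARGET_QLEN, SYSFS_NET and raise_vif_queues are names invented for this example, not XenServer settings.

```shell
#!/bin/sh
# Raise txqueuelen on every vifN.M interface that is still below the target.
# ASSUMPTION: the sysfs attribute tx_queue_len is writable by root in Dom0.
# TARGET_QLEN and SYSFS_NET are illustrative names, overridable from the environment.
TARGET_QLEN="${TARGET_QLEN:-1500}"
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

raise_vif_queues() {
    for dir in "$SYSFS_NET"/vif[0-9]*.[0-9]*; do
        # Skip an unmatched glob or an interface that vanished mid-run.
        [ -e "$dir/tx_queue_len" ] || continue
        current=$(cat "$dir/tx_queue_len")
        if [ "$current" -lt "$TARGET_QLEN" ]; then
            # Only report the change if the write actually succeeded.
            echo "$TARGET_QLEN" > "$dir/tx_queue_len" &&
                echo "${dir##*/}: $current -> $TARGET_QLEN"
        fi
    done
}

raise_vif_queues
```

New VIFs still come up with the small default until the next cron run, so this is a workaround rather than a permanent setting.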
If anybody knows how to set the txqueuelen permanently through XE or XenStore I want to hear it. But for now I’ve found nothing in the manual or on the net to suggest how to do this.
I checked some older XenServer 4.x boxes and they don’t have this problem, even though their txqueuelen is also only 32. Only 5.x exhibits the problem from what I can see. All machines use standard Intel gigabit interfaces (82574L), nothing out of the ordinary.
Hopefully this helps someone else!