OpenVPN, Slow Download, VMware ESX

Tuesday, March 4th, 2008

I finally solved a horrible OpenVPN problem that I’ve been having for a while. The OpenVPN server is a VM running Fedora 8 under ESX Server 3.5.

When I downloaded a file over the VPN, it was very, very slow: about 128 kbit/s instead of the 16 Mbit/s it should have been. Interestingly enough, if I downloaded a file directly from the OpenVPN server, rather than from a server on the LAN that OpenVPN serves, it ran at full speed, suggesting that the problem was somewhere in the kernel’s NAT system. Further weirding things out, uploads worked at full speed.
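To put that gap in perspective, here is a quick back-of-the-envelope sketch in Python. The 100 MB file size is only an assumption for illustration, and the calculation ignores protocol overhead.

```python
def transfer_seconds(size_bytes: int, rate_bits_per_s: float) -> float:
    """Time to move size_bytes at a given line rate, ignoring protocol overhead."""
    return size_bytes * 8 / rate_bits_per_s


FILE_SIZE = 100 * 10**6   # hypothetical 100 MB file (assumption, not from the post)
SLOW_RATE = 128 * 10**3   # 128 kbit/s, the observed download rate
FAST_RATE = 16 * 10**6    # 16 Mbit/s, the expected download rate

slow_s = transfer_seconds(FILE_SIZE, SLOW_RATE)
fast_s = transfer_seconds(FILE_SIZE, FAST_RATE)

print(f"slow: {slow_s / 60:.0f} min, fast: {fast_s:.0f} s, ratio: {slow_s / fast_s:.0f}x")
```

In other words, the broken path was roughly 125 times slower than it should have been.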

I tried every OpenVPN config option I could think of. I tried changing MTUs on every interface. I looked at the iptables logs, tcpdump, even strace. Through all of this, iperf showed perfect numbers, which made me suspect an IP fragmentation problem. Finally, I tried one more thing, which should not have made a difference, but it did.
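For anyone chasing a similar MTU or fragmentation suspicion, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the 1400-byte MTU is a hypothetical tunnel value, not one measured on this setup, and the header sizes assume plain IPv4 with no options.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fragment_count(ip_payload: int, mtu: int) -> int:
    """Number of IPv4 packets needed to carry ip_payload bytes over this MTU."""
    if ip_payload <= mtu - IPV4_HEADER:
        return 1  # fits in a single packet, no fragmentation needed
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


# 1500 is the usual Ethernet MTU; 1400 is a hypothetical smaller tunnel MTU.
for mtu in (1500, 1400):
    print(mtu, max_ping_payload(mtu), fragment_count(1480, mtu))
```

A full-sized packet from a 1500-byte LAN has to be split in two to cross a link with the smaller MTU, which is the kind of mismatch that pings with the "don't fragment" bit set are meant to expose.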

I changed the virtual hardware for the Ethernet adapters in the VM from “enhanced vmxnet” to “flexible vmxnet”.

This should not have made a difference for a variety of reasons, the most notable being that the VMware tools were running on the VM. If you’ve got those tools running, the VM uses the virtual Ethernet adapter in ‘enhanced vmxnet’ mode automatically. If you’re not running the tools, the virtual Ethernet adapter appears as a regular card that every OS has drivers for, in case you can’t run the tools. So in effect, the virtual hardware is exactly the same. The original option forced the card to run in ‘enhanced vmxnet’ mode only; the new option gives it a choice, but the tools pick ‘enhanced vmxnet’ anyway.

VMware’s networking support can be very strange at times. There is a huge bug in ESX 3.5 right now with regard to a certain Ethernet chipset (“Intel Gigabit VT Quad Port Server Adapter”). You can’t run VLANs over an 802.3ad aggregated link (that’s when you bundle more than one Ethernet port on your switch into a single logical link, adding redundancy and speed). If you do, traffic is randomly dropped. I get about 85% packet loss when I do that. So, my install currently does not use 802.3ad. VMware told me that it’s a known bug in their driver for this card, and that a fix will be included in the next patch. I reported this bug back in December and they still have not released a patch. They only posted the bug to their knowledge base a few weeks ago.

It seems like VMware’s support, as far as patches and updates go, has gradually become slower and slower since EMC bought them. If you want to make your fast, resilient company become super slow, sell to a gigantic faceless corporation. It’ll work every time.