Docker, VXLANs, and MTU
2026-03-14
If you're running a Nomad cluster across hosts with mixed outbound networking, you may run into a fun one. We'll ignore that you shouldn't be running different networking setups within a cluster!
The issue was that some services that needed to make external requests were failing, although some requests from those services did work. Other requests would just time out without any warnings or errors. Running the same request with curl on the worker itself worked without issue, so we assumed there was no network issue. Moving a service to a different host sometimes let all of its requests through, while on other hosts the failures stayed sporadic.
After some more poking around it turned out some workers had public IPs for outbound traffic only, while others were going out the correct way via a bastion host (with no public IP). Initially this didn't feel like an issue, as curl requests worked on both kinds of host... alas, that was the issue.
For the SNAT to work we have to use a VXLAN overlay, which automatically lowers the MTU on the internal NIC to 1450. This explains why the manual requests worked on both machines: curl on the host picks up the NIC's MTU. Docker by default uses the standard 1500 MTU on its own interfaces, which is fine on hosts that don't use VXLAN. However, on the hosts using the proper network setup, containers would have packets dropped whenever they exceeded what the overlay could carry, due to the mismatch.
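The 1450 figure is not arbitrary. As a rough sketch, assuming an IPv4 underlay with a 1500-byte MTU and no VLAN tag on the inner frame, the VXLAN encapsulation costs 50 bytes per packet. The function names here are made up for illustration:

```python
# Sketch of the VXLAN overhead arithmetic (IPv4 underlay, no inner VLAN tag).
OUTER_IPV4_HEADER = 20      # outer IP header added by the tunnel endpoint
OUTER_UDP_HEADER = 8        # VXLAN is carried over UDP
VXLAN_HEADER = 8            # VXLAN header holding the network identifier
INNER_ETHERNET_HEADER = 14  # Ethernet header of the encapsulated frame

VXLAN_OVERHEAD = (
    OUTER_IPV4_HEADER + OUTER_UDP_HEADER + VXLAN_HEADER + INNER_ETHERNET_HEADER
)


def overlay_mtu(underlay_mtu: int) -> int:
    """Largest IP packet that fits inside one VXLAN-encapsulated frame."""
    return underlay_mtu - VXLAN_OVERHEAD


def fits(packet_size: int, underlay_mtu: int = 1500) -> bool:
    """True if a packet of this size fits through the overlay unfragmented."""
    return packet_size <= overlay_mtu(underlay_mtu)


print(overlay_mtu(1500))  # 1450
print(fits(1500))         # False: a full-size packet from a container
print(fits(1400))         # True: smaller packets still get through
```

This would also fit with the symptom of some requests working while others timed out: small packets squeeze through, full-size ones don't.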
Thankfully the fix is easy: just tell Docker to use a different MTU on its virtual interfaces. Update (or create) your /etc/docker/daemon.json configuration file:
{
  "mtu": 1450
}
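If the file already exists with other settings in it, you'll want to add the key rather than overwrite the whole thing. A minimal sketch of that merge, where set_docker_mtu is a hypothetical helper and the "log-driver" entry only stands in for whatever settings you already have:

```python
import json
import tempfile
from pathlib import Path


def set_docker_mtu(config_path, mtu: int) -> dict:
    """Set "mtu" in a daemon.json-style file, keeping any other keys."""
    path = Path(config_path)
    config = {}
    # A missing or empty file is treated as an empty configuration.
    if path.exists() and path.read_text().strip():
        config = json.loads(path.read_text())
    config["mtu"] = mtu
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file rather than the real /etc/docker/daemon.json.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "daemon.json"
    demo.write_text('{"log-driver": "json-file"}')
    print(set_docker_mtu(demo, 1450))
```

Pointing it at the real path needs root, and a config management tool is probably the better home for this on an actual cluster.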
After restarting the Docker service, any new containers will be created with the correct MTU. Existing containers will sadly need to be recreated.
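To check what an interface actually ended up with, Linux exposes each interface's MTU under /sys/class/net. The sketch below (helper names are hypothetical) lists them and flags anything above 1450. Run inside a freshly created container that has Python available, the expectation is that nothing gets flagged if the setting took effect:

```python
from pathlib import Path


def interface_mtus(sysfs_net: str = "/sys/class/net") -> dict:
    """Map each interface name to its MTU, as reported by Linux sysfs."""
    mtus = {}
    for iface in sorted(Path(sysfs_net).iterdir()):
        mtu_file = iface / "mtu"
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus


def oversized(mtus: dict, limit: int = 1450) -> list:
    """Interfaces whose MTU exceeds the limit, ignoring loopback."""
    return [name for name, mtu in mtus.items() if name != "lo" and mtu > limit]


# Only meaningful on Linux, where sysfs is mounted.
if Path("/sys/class/net").is_dir():
    current = interface_mtus()
    print(current)
    print("above 1450:", oversized(current))
```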
Now you know: sometimes the MTU should be different from the default!