In certain configurations, packets sent on overlay networks can be silently dropped, in particular when `vmw_conn_notifyd` is running. There is an open issue with VMware discussing this behavior. It is worth following, and worth reading for potential workarounds until this is patched.
Issue opened 12:32 PM UTC, 09 Dec 2020:
We have an issue with semi-large packets being silently dropped on hosts running… on VMware NSX, where the hosts are running `vmw_conn_notifyd`. If the size of a network packet rises above a certain threshold (which seems to be somewhere below 900 bytes, far lower than the MTU of any interface involved), the packet is simply lost. We have narrowed this down to `vmw_conn_notify`: if it is running, the failures below occur. If you cannot reproduce the error with a packet size of 859, try increasing this number while keeping it below the MTU of the interface.
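For reference, `ping -s` sets the ICMP payload size, not the packet size: each probe also carries an 8-byte ICMP header and a 20-byte IPv4 header. Docker overlay networks use VXLAN encapsulation, which, assuming an IPv4 underlay, adds roughly another 50 bytes on the wire. The helper names `ip_packet_size` and `vxlan_outer_size` below are hypothetical, and the 50-byte overhead is an assumption, not something measured on these hosts.

```shell
# Header sizes in bytes (IPv4 without options).
ICMP_HDR=8
IPV4_HDR=20
# Assumed VXLAN overhead: inner Ethernet (14) + VXLAN (8) + UDP (8) + outer IPv4 (20).
VXLAN_OVERHEAD=50

# Size of the inner IPv4 packet produced by `ping -s PAYLOAD`.
ip_packet_size() {
  echo $(( $1 + ICMP_HDR + IPV4_HDR ))
}

# Size of the encapsulated outer IPv4 packet on the underlay (assumed overhead).
vxlan_outer_size() {
  echo $(( $(ip_packet_size "$1") + VXLAN_OVERHEAD ))
}

# Compare the last working and first failing payload sizes from the issue.
for payload in 858 859; do
  echo "payload=$payload inner=$(ip_packet_size "$payload") outer=$(vxlan_outer_size "$payload")"
done
```

Under these assumptions, the 858/859 boundary corresponds to an inner IPv4 packet of 886/887 bytes, which fits the observation that the threshold sits somewhere below 900 bytes.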
`host 1` is the swarm manager.
**host 1:**
`docker network create --attachable --driver overlay --scope swarm foo_net`
`docker run -it --name test_server --hostname test_server --rm --network foo_net ubuntu:latest bash`
**host 2:**
`docker run -it --name test_client --hostname test_client --rm --network foo_net ubuntu:latest bash`
Then, inside the `test_client` container:
```
apt update && apt install -y iputils-ping
ping -c 1 -s 858 -M do test_server # this works
ping -c 1 -s 859 -M do test_server # this doesn't
```
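Rather than raising the size by hand, the threshold can be located with a binary search over `ping -s` sizes. This is a minimal POSIX `sh` sketch meant to run inside `test_client`; `bisect_max_ok` and `ping_probe` are hypothetical helpers, and the search assumes that if a given payload size gets through, every smaller size does too.

```shell
# bisect_max_ok LOW HIGH PROBE
# Prints the largest size in [LOW, HIGH] for which the command PROBE SIZE
# succeeds. Assumes PROBE LOW succeeds and that success is monotonic in SIZE.
bisect_max_ok() {
  _lo=$1 _hi=$2 _probe=$3
  while [ "$_lo" -lt "$_hi" ]; do
    # Round the midpoint up so the loop always makes progress.
    _mid=$(( (_lo + _hi + 1) / 2 ))
    if "$_probe" "$_mid"; then
      _lo=$_mid
    else
      _hi=$(( _mid - 1 ))
    fi
  done
  echo "$_lo"
}

# One ping with the given payload size, "don't fragment" set, 2 s timeout.
# TARGET defaults to the test_server container from the steps above.
ping_probe() {
  ping -c 1 -W 2 -M do -s "$1" "${TARGET:-test_server}" >/dev/null 2>&1
}
```

Assuming `eth0` is the overlay interface inside the container, `bisect_max_ok 0 $(( $(cat /sys/class/net/eth0/mtu) - 28 )) ping_probe` prints the largest payload that still gets through; in the setup above that would be expected to be 858 while `vmw_conn_notifyd` is running.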
**On both hosts (as root, outside containers):**
`/etc/rc.d/init.d/vmw_conn_notifyd stop`
**host 2:**
`ping -c 1 -s 859 -M do test_server # now works`
**Versions:**
AppDefense is v2.3.2.0
ESXi is v6.7.0, build 16773714 (patch release ESXi670-202010001, 14 October 2020)
NSX is 6.4.6.14819921
```
# docker --version
Docker version 19.03.13, build 4484c46d9d
# uname -srvmpio
Linux 3.10.0-1160.2.2.el7.x86_64 #1 SMP Tue Oct 20 16:53:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
# vmw_conn_notify -v
vmw_conn_notify version : 1.1.0.0
```