One of our edge proxies was seeing intermittent spikes in, and exhaustion of, its internal shared memory. This was not an exhaustion of the memory available on the host, simply of the amount the proxy was configured to use. When this happened, connections would fail and, in most cases, be retried. This also affected websockets (and thus Communicator) where calls were balanced to the affected Kamailio instance whilst it was having issues.
Following yesterday’s loss of the Leeds site, DNS was repointed to London to cater for client devices which do not support SRV properly, and was later restored. We believe a large number of those devices also cached stale DNS records, causing an imbalance of connection attempts and the spikes in memory usage.
With the configured shared memory limit increased, we still see an imbalance of client requests, but they are now more than adequately handled.
We apologise to those intermittently affected by this elusive issue.
Oct 15, 15:15 BST
Service has now been restored to a stable level. We will continue to monitor closely. Thank you for your feedback and input on this incident, and apologies for the variable service and the time taken to restore it.
Oct 15, 14:46 BST
Call completion appears stable - please continue to report if you find otherwise. There is a secondary problem with Communicator working only intermittently, which continues to be investigated.
Oct 15, 12:28 BST
We have received some reports of call failures affecting Nimvelo and Simwood Partner accounts. Call traffic and volumes otherwise appear normal. These reports are being investigated.
Oct 15, 11:52 BST