Tuning TCP ports for your Elixir app

By Jake Morrison in DevOps on Fri 24 May 2019

Elixir is great at handling lots of concurrent connections. When you actually try to do this, however, you will bump up against the default OS configuration which limits the number of open filehandles/sockets. You may also run out of TCP ephemeral ports.

The result is poor application performance, e.g. timeouts. If you are running behind Nginx, you may see it as 503 errors, with requests appearing to take five seconds to get a response. When you look at the application logs, however, the response time is fine.

What is happening is that the client talks to Nginx, then Nginx talks to your app, but there are not enough filehandles available, so Nginx queues the request. The default limit is often just 1024 open files, which is pitifully small. You will need to raise it at each step in the chain, e.g. the OS account, the systemd unit file, Nginx, and the Erlang VM.

First, make sure that open file limits are increased at each step in the chain.

OS account limits

The OS account running the app has default limits:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3841
max locked memory       (kbytes, -l) 16384
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 3841
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Here we can see the open file limit of 1024 by default.

You can see the same for a running process by looking up limits for the process id, cat /proc/<pid>/limits:

# cat /proc/800/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             3841                 3841                 processes
Max open files            65535                65535                files
Max locked memory         16777216             16777216             bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       3841                 3841                 signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
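To pull out just the open file limit, and compare it with how many descriptors are actually in use, something like the following may help. It is a minimal sketch: nofile_limits is a hypothetical helper name, and it assumes a Linux /proc filesystem.

```shell
# nofile_limits FILE: print the soft and hard "Max open files" values
# from a file in /proc/<pid>/limits format.
nofile_limits() {
    awk '/^Max open files/ { print $4, $5 }' "$1"
}

# Limits of the current shell, if /proc is available (Linux).
if [ -r "/proc/$$/limits" ]; then
    nofile_limits "/proc/$$/limits"
    # Descriptors in use right now, to compare against the soft limit.
    ls "/proc/$$/fd" | wc -l
fi
```

For a running app, pass /proc/<pid>/limits for its process id instead of the shell's.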

Create a /etc/security/limits.d/foo-limits.conf file for the account running the app (pam_limits only reads files in that directory whose names end in .conf):

foo soft    nofile          1000000
foo hard    nofile          1000000

Systemd

Systemd doesn't use the limits file, though, so you need to set the limit with directives like LimitNOFILE in the [Service] section of the unit file for your app:

LimitNOFILE=65536
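One way to apply this without editing the main unit file is a drop-in override. The sketch below assumes a hypothetical unit named foo.service and a hypothetical helper render_nofile_dropin; it only prints the drop-in contents, and the install steps are left as comments because they need root.

```shell
# Hypothetical unit name; replace with the unit that runs your app.
UNIT="foo.service"

# Print the contents of a drop-in that raises the open file limit.
render_nofile_dropin() {
    printf '[Service]\nLimitNOFILE=%s\n' "$1"
}

render_nofile_dropin 65536

# To install it (as root):
#   mkdir -p "/etc/systemd/system/$UNIT.d"
#   render_nofile_dropin 65536 > "/etc/systemd/system/$UNIT.d/limits.conf"
#   systemctl daemon-reload && systemctl restart "$UNIT"
# To confirm what systemd applied:
#   systemctl show "$UNIT" -p LimitNOFILE
```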

Erlang VM options

In rel/vm.args for your release, increase the maximum number of ports (in the Erlang VM, each TCP socket is a port). In newer Erlang versions, the setting is +Q 65536. In older ones, use -env ERL_MAX_PORTS 65536.
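To check which limit the VM actually picks up, you can ask it for erlang:system_info(port_limit). This is a rough sketch, assuming erl is on the PATH; erl_port_limit is a hypothetical helper name, and the VM may round the value you pass to +Q up.

```shell
# Print the port limit the Erlang VM starts with.
# Extra arguments (e.g. +Q 65536) are passed straight to erl.
erl_port_limit() {
    erl "$@" -noshell -eval 'io:format("~p~n", [erlang:system_info(port_limit)]), halt().'
}

if command -v erl >/dev/null 2>&1; then
    erl_port_limit            # limit with default settings
    erl_port_limit +Q 65536   # limit with +Q applied
fi
```

In a running release you can evaluate :erlang.system_info(:port_limit) from a remote console instead.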

Nginx open files

If you are running Nginx in front of your app, make sure it is allowed enough open files as well, e.g. via the worker_rlimit_nofile and worker_connections directives. See "Serving your Phoenix app with Nginx".

Ephemeral TCP ports

After that, you may run out of ephemeral TCP ports. This hits you very hard when running behind Nginx as a proxy, but it can also hit you on the outbound side when you are talking to a small number of back end servers.

In TCP/IP, a connection is identified by the combination of source IP + source port + destination IP + destination port. Behind a local proxy, everything but the source port is fixed: 127.0.0.1 + random + 127.0.0.1 + 4000. There are only 64K ports, and only part of that range is handed out as ephemeral ports. After a connection closes, the TCP/IP stack won't reuse its port for the same destination for 2 x maximum segment lifetime (the TIME_WAIT state), taken here as 2 minutes.
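To see how much headroom a given machine has, you can look at its ephemeral port range and at how many sockets are sitting in TIME_WAIT. A minimal sketch for Linux: port_range_size is a hypothetical helper, and the TIME_WAIT count assumes a reasonably recent ss from iproute2 (for the -H no-header flag).

```shell
# port_range_size LOW HIGH: number of ports in an inclusive range,
# in the format of /proc/sys/net/ipv4/ip_local_port_range.
port_range_size() {
    echo $(( $2 - $1 + 1 ))
}

if [ -r /proc/sys/net/ipv4/ip_local_port_range ]; then
    # The unquoted $(...) is intentional: it splits into LOW and HIGH.
    port_range_size $(cat /proc/sys/net/ipv4/ip_local_port_range)
fi

# Count sockets currently in TIME_WAIT.
if command -v ss >/dev/null 2>&1; then
    ss -Htan state time-wait | wc -l
fi
```

If the TIME_WAIT count gets close to the size of the range while the app is under load, port exhaustion is a likely suspect.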

Doing the math:

1024 ports / 120 sec = 8.53 requests per second with default file handle limit
60,000 / 120 = 500 requests per sec
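The same arithmetic as a small helper, so you can plug in your own numbers. max_conn_rate is a hypothetical name. The third call is an assumption beyond the figures above: it uses the stock Linux ephemeral range of 32768 to 60999 (28232 ports) and Linux's fixed 60 second TIME_WAIT.

```shell
# max_conn_rate PORTS SECONDS: sustained new connections per second
# to one destination if each port is unusable for SECONDS after close.
max_conn_rate() {
    awk -v ports="$1" -v wait="$2" 'BEGIN { printf "%.2f\n", ports / wait }'
}

max_conn_rate 1024 120    # 8.53
max_conn_rate 60000 120   # 500.00
max_conn_rate 28232 60    # 470.53
```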

If you are getting limited talking to back end servers, then it's useful to give your server multiple IP addresses. Tell your HTTP client library to use an IP from a pool as its source when making the request. Then the equation turns into "source IP from pool" + random port + target IP + 80.
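The pool idea can be sketched with curl, whose --interface option accepts a local IP address to use as the source. The addresses, the back end URL, and the pick_source_ip helper are all hypothetical; in an Elixir app you would pass the equivalent bind option to your HTTP client instead.

```shell
# Hypothetical pool of addresses assigned to this server.
POOL="10.0.0.11 10.0.0.12 10.0.0.13"

# pick_source_ip INDEX ADDR...: choose an address round-robin
# by request index.
pick_source_ip() {
    index="$1"
    shift
    [ "$#" -gt 0 ] || return 1     # empty pool
    shift $(( index % $# ))        # skip to the selected entry
    echo "$1"
}

# Show which source address each of the first six requests would use.
i=0
while [ "$i" -lt 6 ]; do
    echo "request $i -> $(pick_source_ip "$i" $POOL)"
    i=$(( i + 1 ))
done

# The actual request would look like (hypothetical back end URL):
#   curl --interface "$(pick_source_ip "$i" $POOL)" http://backend.example/
```

Each address in the pool gets its own set of source ports, so three addresses roughly triple the connection rate you can sustain to one back end.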

You may be able to reuse outbound connections, with HTTP keep-alive or pipelining, if the back ends support it. At a certain point, the back end servers may be the limit. They may benefit from having more IPs as well.

DNS

DNS lookups can become an issue on outbound connections. We have had hosting providers block us because they thought we were running a DoS attack on their DNS. Google DNS rate limits to 100 requests per second by default. Run a local caching DNS server for your app.

Kernel tuning

You can tune the OS kernel settings so that closing connections tie up ports for less time.

In /etc/sysctl.conf (or a file in /etc/sysctl.d/):

# Shorten how long orphaned connections stay in FIN_WAIT_2 (default is 60 seconds)
net.ipv4.tcp_fin_timeout = 15
# Allow new outbound connections to reuse sockets in TIME_WAIT
net.ipv4.tcp_tw_reuse = 1

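Load the new settings (sysctl -p only reads /etc/sysctl.conf; use sudo sysctl --system if you put them in /etc/sysctl.d/):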

sudo sysctl -p
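It is worth confirming that the values took effect. A minimal sketch that reads them back from /proc/sys (sysctl_path is a hypothetical helper; running sysctl -n with the key gives the same answer):

```shell
# sysctl_path KEY: map a sysctl key to its file under /proc/sys.
sysctl_path() {
    echo "/proc/sys/$(echo "$1" | tr . /)"
}

for key in net.ipv4.tcp_fin_timeout net.ipv4.tcp_tw_reuse; do
    path=$(sysctl_path "$key")
    if [ -r "$path" ]; then
        echo "$key = $(cat "$path")"
    else
        echo "$key is not readable here"
    fi
done
```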

There are other kernel TCP settings you can tune as well, e.g.

sysctl -w fs.file-max=12000500
sysctl -w fs.nr_open=20000500
ulimit -n 20000000
sysctl -w net.ipv4.tcp_mem='10000000 10000000 10000000'
sysctl -w net.ipv4.tcp_rmem='1024 4096 16384'
sysctl -w net.ipv4.tcp_wmem='1024 4096 16384'
sysctl -w net.core.rmem_max=16384
sysctl -w net.core.wmem_max=16384

See https://phoenixframework.org/blog/the-road-to-2-million-websocket-connections and https://www.rabbitmq.com/networking.html#dealing-with-high-connection-churn

See this presentation on tuning Elixir performance.