Best practices for deploying Elixir apps

Figuring out how to deploy your Elixir app can be confusing, as it's a bit different from other languages. This post describes how we deploy apps with the reasons behind our decisions.

We have created a working example template which puts all the pieces together to get you started quickly. This post gives more background and links to advanced topics.

Summary

Big picture, we deploy Erlang "releases" using systemd for process supervision. We run in cloud or dedicated server instances running CentOS or Ubuntu Linux. We build on a continuous integration server and deploy using Ansible or AWS CodeDeploy.

The recommendations here come from the kinds of apps we build. We make health care and financial apps, so we are paranoid about security. We run apps that get large amounts of traffic, so we are careful about performance. And we deploy to the cloud, so the apps need to be stateless and able to scale dynamically under the control of a system like AWS CodeDeploy.

Locking dependency versions

The process starts in your dev environment. When you run mix deps.get, mix fetches the dependencies listed in mix.exs. They are normally only loosely specified, e.g. {:cowboy, "~> 1.0"} may actually install version 1.1.2.

Mix records the specific versions that it fetched in the mix.lock file. Later, on the build machine, mix uses the specific package version or git reference in the lock file to build the release.

This makes a release completely predictable and reproducible. It does not depend on the version of libraries installed on the server, and one app doesn't affect another. It's like Ruby's Gemfile.lock or Node's lock files, but built in and more reliable.

This locking happens automatically as part of the standard mix process, but make sure you check the mix.lock file into source control.
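
To make this concrete, a loose requirement in mix.exs gets pinned to an exact version in mix.lock. Here is a rough sketch; the mix.lock entry is abridged, with the checksum and nested deps elided:

# In mix.exs, the requirement is loose:
{:cowboy, "~> 1.0"}

# In mix.lock, the exact fetched version is recorded (entry abridged):
"cowboy": {:hex, :cowboy, "1.1.2", "...", [:rebar3], [...], "hexpm"},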

Managing build tool versions

When you are building a release, Elixir is just another library dependency. You can also package the Erlang virtual machine with your release. That lets you upgrade production systems with no drama. We have apps which have been running continuously for years on clusters of servers, upgrading through multiple Elixir and Erlang versions with no downtime.

In order to reliably upgrade and roll back systems, we need to be able to precisely specify Erlang and Elixir versions. We can't just use the version that comes from the OS. Erlang and Elixir also release frequently, so the stable OS packages are generally out of date. We also need to be able to run multiple versions at once, moving a version through dev, test, and production.

We use ASDF to manage the versions of Erlang, Elixir and Node.js. It is a language-independent equivalent to tools like Ruby's rbenv.

ASDF looks at the .tool-versions file and automatically sets the path to point to the specified version used by each project.

Create a .tool-versions file in the project root and set the versions:

erlang 20.3
elixir 1.6.5
nodejs 8.2.1

Install ASDF as described here: https://github.com/cogini/elixir-deploy-template#set-up-asdf

Once it's set up, go to your project directory and install the Erlang, Elixir and Node.js versions matching the .tool-versions file there:

asdf install
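
To double-check that the right versions are active in the project directory, run:

asdf current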

Building and testing

We normally develop on macOS and deploy to Linux. Erlang does not rely much on the operating system, and mix manages library dependencies tightly, so we don't find it necessary to use Vagrant or Docker to isolate projects.

It is necessary, however, to build your release with an Erlang binary that matches your target system. We can't build the release on macOS and deploy to Linux.

We handle this by using the continuous integration server to build releases after it runs the tests. For simple projects, you can build on the same server that runs the app. Check out the code from git, build a release, then run it locally.

Like your dev machine, the build server runs ASDF. When you make a build, it uses the versions of Erlang and Elixir specified in the .tool-versions file.
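
As a rough sketch, the build steps on the build server look something like this (assuming the asdf plugins are already installed; adapt to your CI system):

asdf install                  # Erlang/Elixir/Node.js versions from .tool-versions
mix local.hex --force         # install Hex for this user
mix local.rebar --force       # rebar is needed to compile Erlang dependencies
mix deps.get
mix test
MIX_ENV=prod mix release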

Erlang releases

The most important part of the deployment process is using Erlang "releases". A release combines the Erlang VM, your application, and the libraries it depends on into a tarball, which you deploy as a unit.

The release has a script to start the app, launched and supervised by the OS init system (e.g. systemd). If it dies, the supervisor restarts it.

You may be tempted to use a process where you check out from git and somehow run mix phx.server. That's reinventing the wheel, badly. Just use releases; they handle a lot of the details you need to run things reliably in production.

You can get a remote console on your production app, just as if it were running under iex. Log into the host server using ssh and run, e.g.:

MIX_ENV=prod /opt/myorg/foo/current/bin/foo remote_console

Releases also support one of Erlang's cool features, hot code updates. If you are not keeping state in GenServers, hot updates are easy. If you are, you need to write an upgrade function, like a database migration but for your state.
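
For example, a GenServer that keeps state can implement the code_change/3 callback to convert the old state shape to the new one during a hot upgrade. This is only a minimal sketch; the module and state shapes are hypothetical:

defmodule Foo.Counter do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  def init(:ok), do: {:ok, %{count: 0}}

  # Called during a hot code upgrade: convert the old state (a bare integer
  # in earlier versions of this hypothetical module) into the new map shape.
  def code_change(_old_vsn, count, _extra) when is_integer(count) do
    {:ok, %{count: count}}
  end

  def code_change(_old_vsn, state, _extra), do: {:ok, state}
end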

Distillery

The Distillery library makes it easy to build releases.

Add the library to your deps:

defp deps do
  [{:distillery, "~> 1.5", runtime: false}]
end

Initialize Distillery:

mix release.init

That creates a rel/config.exs file for your project. Customize it to look something like this:

use Mix.Releases.Config,
  default_release: :foo,
  default_environment: Mix.env

environment :dev do
  set dev_mode: true
  set include_erts: false
  set cookie: :dev
end

environment :prod do
  set include_erts: true
  set include_src: false
  set cookie: File.read!("config/cookie.txt") |> String.trim() |> String.to_atom()
  set vm_args: "rel/vm.args.eex"
end

release :foo do
  set version: current_version(:foo)
  set applications: [
    :runtime_tools
  ]
  plugin Conform.ReleasePlugin
end

The "cookie" is a password that allows Erlang nodes to talk to each other using the Erlang distribution protocol. A straightforward web app doesn't strictly need it, but it should still be strong.

You can generate it with a command like:

iex> :crypto.strong_rand_bytes(32) |> Base.encode16

The config above puts the cookie in a file on the build server, config/cookie.txt. There are other ways to set the cookie, e.g. we generally write it to $HOME/.erlang.cookie on the prod servers as part of our deploy process. Which approach you choose depends on whether you trust your build server with the cookie, e.g. if you are using a hosted CI system. If your machines don't talk to each other using the Erlang distribution protocol, then firewall off the distribution ports and the cookie is not a security concern.
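
If you go the $HOME/.erlang.cookie route, the deploy step looks roughly like this; the user, paths, and value are examples, and the file must be readable only by the app user:

printf '%s' 'LONG_RANDOM_COOKIE_VALUE' > /home/foo/.erlang.cookie
chown foo:foo /home/foo/.erlang.cookie
chmod 400 /home/foo/.erlang.cookie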

See this article for more tips.

The rel/vm.args.eex file sets the Erlang VM startup arguments. It's an EEx template so we can insert variables from the release config:

## Name of the node
-name <%= release_name %>@127.0.0.1

## Cookie for distributed erlang
-setcookie <%= release.profile.cookie %>

## Enable kernel poll and a few async threads
+K true
+A 128

## Increase number of concurrent ports/sockets
-env ERL_MAX_PORTS 65536

## Tweak GC to run more often
##-env ERL_FULLSWEEP_AFTER 10

# Enable SMP automatically based on availability
-smp auto

You can tune VM settings as discussed in this Elixir performance tuning presentation. Most important is making sure you have enough ports to handle your traffic, as described below.

After creating these files, build the release:

MIX_ENV=prod mix release

This creates a tarball with everything you need to deploy, e.g.:

_build/prod/rel/foo/releases/0.1.0/foo.tar.gz

Supervising your app

In the Erlang OTP framework, we use supervisors to start and stop processes, restarting them in case of problems. It's turtles all the way down: you need a supervisor to make sure your Erlang VM is running, restarting it if there is a problem.

Ignore the haters: systemd is the best supervisor we have right now, and all the Linux distros are standardizing on it. On CentOS 6 or older Ubuntu releases, you can use upstart, but it's not as nice.

Systemd handles all the things that "well behaved" daemons need to do. Instead of scripts, it has declarative config that handles standard situations. It sets up the environment, handles logging and controls permissions.

Here is an example systemd unit file:

[Unit]
Description=Foo server
After=local-fs.target network.target

[Service]
Type=simple
User=foo
Group=foo
WorkingDirectory=/opt/myorg/foo/current
ExecStart=/opt/myorg/foo/current/bin/foo foreground
ExecStop=/opt/myorg/foo/current/bin/foo stop
Environment=LANG=en_US.UTF-8
Environment=MIX_ENV=prod
Environment=RELEASE_MUTABLE_DIR=/var/tmp/foo
Environment=PORT=4001
Environment=CONFORM_CONF_PATH=/etc/foo/foo.conf
LimitNOFILE=65536
UMask=0027
SyslogIdentifier=foo
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

The Environment directives set OS environment variables such as MIX_ENV and the locale. You can also put them in an environment file:

EnvironmentFile=/etc/sysconfig/foo
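
The environment file uses plain KEY=value lines, e.g. this sketch mirroring the unit file above:

# /etc/sysconfig/foo
MIX_ENV=prod
PORT=4001
CONFORM_CONF_PATH=/etc/foo/foo.conf
RELEASE_MUTABLE_DIR=/var/tmp/foo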

For security, following the principle of least privilege, we limit the app to only what it really needs to do its job. That way if the app is compromised, the attacker is limited in what they can do.

Following that philosophy, we upload the release files as a separate OS user (e.g. deploy) from the user that the app runs under (e.g. foo).

The script that starts the release needs to write some files, e.g. to generate the runtime config using Conform and to write the startup log.

The CONFORM_CONF_PATH environment var specifies the location of the Conform config file:

Environment=CONFORM_CONF_PATH=/etc/foo/foo.conf

This config file is the "persistent" config outside of your release, and overrides the defaults. It is a good place to put secrets like database passwords and API keys.
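
The file uses Conform's simple key = value syntax. The exact keys come from the Conform schema you generate for your app, so the ones below are hypothetical:

# /etc/foo/foo.conf -- keys depend on your schema
foo.db.password = supersecret
foo.endpoint.secret_key_base = some-long-random-secret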

The RELEASE_MUTABLE_DIR environment var specifies a temp directory writable by the app user:

Environment=RELEASE_MUTABLE_DIR=/var/tmp/foo

If you are running systemd, then it will capture early startup error messages. With an older supervisor, e.g. upstart, the initial startup log goes to $RELEASE_MUTABLE_DIR/log. You can move it by setting the RUNNER_LOG_DIR environment variable, e.g.:

Environment=RUNNER_LOG_DIR=/var/log/foo

Systemd captures any messages sent to the console and puts them in the system journal. journald handles log rotation, and the app doesn't need permissions to write log files, which is a security win.
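
To read or follow the app's output, query the journal by unit name (foo here is an example):

journalctl -u foo -f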

If the app generates a lot of logs, then you can configure a file backend for Logger.

Elixir can scale to handle lots of traffic, but it needs to be given enough filehandles. Use LimitNOFILE=65536 (or higher).

The proper location for your systemd unit file is under /lib/systemd/system. This allows admins to override settings on a specific machine by placing files under /etc/systemd/system.
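
Installing and enabling the unit is just a few commands; a sketch, with example names and paths:

cp foo.service /lib/systemd/system/foo.service
systemctl daemon-reload
systemctl enable foo
systemctl start foo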

See the systemd docs and the Distillery docs for more options.

Deploying the code

We use a directory structure like Capistrano's to manage the release files. We first create a base directory named for the organization and app, e.g. /opt/myorg/foo. Under that we create a releases directory to hold the release files.

The actual deployment process works like this (sketched in shell after the list):

  1. Create a new directory for the release with a timestamp

    /opt/myorg/foo/releases/20171114T072116

  2. Upload the new release tarball to the server and unpack it

  3. Make a symlink from /opt/myorg/foo/current to the new release dir

  4. Restart the app using the systemd unit
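
In shell, those steps look roughly like this; the paths, names, and ssh host are examples:

TIMESTAMP=$(date -u +%Y%m%dT%H%M%S)
ssh deploy@prod "mkdir -p /opt/myorg/foo/releases/$TIMESTAMP"
scp _build/prod/rel/foo/releases/0.1.0/foo.tar.gz \
    deploy@prod:/opt/myorg/foo/releases/$TIMESTAMP/
ssh deploy@prod "cd /opt/myorg/foo/releases/$TIMESTAMP && tar xzf foo.tar.gz"
ssh deploy@prod "ln -sfn /opt/myorg/foo/releases/$TIMESTAMP /opt/myorg/foo/current"
ssh deploy@prod "sudo systemctl restart foo"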

Deploying locally

For simple applications, we build on the same server.

The deploy mix tasks from our example template copy the release files to a standard location on the local machine:

MIX_ENV=prod mix deploy.local

They handle the process of creating the timestamped directory, finding the release tarball, extracting it to the target dir, and updating the symlink.

Rolling back

If there is a problem with the release, then it's easy to roll back. Just update the current symlink to point to the previous working release, then restart.

This task automates the process of rolling back to the last release:

MIX_ENV=prod mix deploy.local.rollback

Deploying with Ansible

We use Ansible to set up the system and to deploy the code. It's a lightweight general-purpose tool which is easy for both devs and ops to understand.

We split the deployment into two phases, setup and deploy. In the setup phase, we do the tasks that require elevated permissions, e.g. creating user accounts, creating app dirs, installing OS packages, and setting up the firewall.

In the deploy phase, we push the latest code to the server and restart the app. The deploy doesn't require admin permissions, so it can run as a regular user, e.g. from the build server.
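
Running the two phases looks something like this; the inventory and playbook names are examples from our layout:

ansible-playbook -i inventory/prod setup.yml    # privileged: users, dirs, packages, firewall
ansible-playbook -i inventory/prod deploy.yml   # unprivileged: push the release and restart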

Connecting to the outside world

At this point, we have a working release, but we still need to be able to talk to it. There are two options for receiving traffic: via a proxy or directly.

Listening directly on port 80

If you are only running a single app on your box, you can listen directly to HTTP traffic. That will end up giving you lower latency and overall lower complexity.

Erlang can handle lots of load without problems, so don't worry. For example, Heroku's routing layer is based on Erlang. We handle billions of requests a day, including DDoS attacks. You can handle 3000 requests per second on a $5/month Digital Ocean droplet.

You may need to set some HTTP options that Nginx would otherwise handle for you, e.g.:

config :foo, FooWeb.Endpoint,
  http: [
    compress: true,
    protocol_options: [max_keepalive: 5_000_000]
  ]

Normally, in order to listen on a port less than 1024, an app needs to be running as root or have elevated capabilities. That's a security problem waiting to happen, though, so we run the app on a normal port, e.g. 4000, and redirect traffic from port 80 to 4000 in the firewall using an iptables rule.
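
The redirect is a single nat rule; a sketch, assuming the app listens on port 4000:

iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 4000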

Additional topics