chr4

Devops. I've never asked for this.

Use and automate letsencrypt certificates (ACME) in a high availability environment

The Internet Security Research Group (ISRG), with Mozilla among its founding sponsors, launched a “free, automated and open” certificate authority called Let’s encrypt. As the name suggests, it provides free certificates trusted by all major browsers and operating systems. I’m using it heavily (on this blog, for example).

This blog post shows how Syncthing can be used to deploy letsencrypt certificates in an environment with multiple servers (e.g. in a round-robin scenario) without adding a single-point-of-failure.

ACME

Let’s encrypt automated the process of requesting and validating a certificate using a protocol called ACME. With the HTTP challenge used here, the client requesting a new certificate places a challenge file under the /.well-known/acme-challenge path on its webserver, and Let’s encrypt retrieves this file to verify control of the domain.

The actual process is a little more complicated, though. If you want to know how it works in detail, I recommend Let’s encrypt’s excellent ACME documentation.
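For orientation, the file-placement part of the HTTP challenge could be sketched in plain Ruby as follows. This is a simplified illustration, not what the ACME client does internally: the helper names `place_challenge` and `challenge_url` and the token and thumbprint values are made up for this example.

```ruby
require 'fileutils'
require 'tmpdir'

# Path (relative to the webroot) under which HTTP challenges are served.
CHALLENGE_DIR = '.well-known/acme-challenge'

# Write the challenge response for `token` below `webroot` and return the file path.
# `key_authorization` is the string the CA expects to read back
# (the token, a dot, and the thumbprint of the account key).
def place_challenge(webroot, token, key_authorization)
  dir = File.join(webroot, CHALLENGE_DIR)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, token)
  File.write(path, key_authorization)
  path
end

# URL the CA fetches during validation.
def challenge_url(domain, token)
  "http://#{domain}/#{CHALLENGE_DIR}/#{token}"
end

# Demo with hypothetical values in a temporary webroot.
Dir.mktmpdir do |webroot|
  puts place_challenge(webroot, 'example-token', 'example-token.example-thumbprint')
  puts challenge_url('www.example.com', 'example-token')
end
```

Whichever server receives the request for that URL has to find the file in its webroot, which is exactly what becomes difficult with more than one server.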

The problem in high availability setups

When using multiple servers for SSL termination (e.g. in the load-balancing scenario described in the picture below, where SSL termination is handled by the nginx instances), each instance requires a certificate for the domain(s) it serves.

ACME high availability setup with HAproxy and nginx

In a setup that uses round-robin load balancing, for example, we can’t guarantee that the incoming request for the ACME challenge ends up on the server that actually requested the certificate. Furthermore, without shared storage each server would need to request (and renew) its own certificates.

The cleanest solution I found for this problem is to share the .well-known challenge directory (and maybe even the certificate) between multiple servers.
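To make the problem and the fix concrete, here is a toy simulation in plain Ruby. It is only a model under simple assumptions: each backend is a Hash of the challenge files it can serve, and the load balancer hands request number i to backend i modulo the number of backends. The helper name `validation_hits` is made up for this sketch.

```ruby
# Count how many of `requests` round-robin dispatched validation requests
# reach a backend that can actually serve the challenge file for `token`.
def validation_hits(backends, token, requests)
  requests.times.count { |i| backends[i % backends.size].key?(token) }
end

# Case 1: only the server that requested the certificate has the challenge file.
isolated = [{ 'tok' => 'tok.thumb' }, {}, {}]

# Case 2: the challenge directory is shared, so every backend sees the same files.
shared_dir = { 'tok' => 'tok.thumb' }
shared = [shared_dir, shared_dir, shared_dir]

puts validation_hits(isolated, 'tok', 6) # prints 2: only every third request succeeds
puts validation_hits(shared, 'tok', 6)   # prints 6: every request succeeds
```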

Syncthing to the rescue!

The best tool I found to synchronize those directories is Syncthing. It is one of the most exciting tools for file sharing: it is fully peer-to-peer and works without any central server (though it can be configured to use one, if required), and it is open source, written in Go and cross-platform.

Syncthing fulfills all items on my wishlist:

  • Traffic between the instances is encrypted
  • The setup is automatically deployable
  • Instances can be easily added or removed
  • No single-point-of-failure (all nodes connect to each other, synchronizing the same directory between all machines)
  • No additional services required

I chose it to synchronize the /etc/nginx/certs directory. It shares the dhparams, SSL certificates and the ACME challenges between all nginx instances. Here’s what the shared directory looks like:

$ tree -a
.
├── .stfolder
├── acme
│   └── .well-known
│       └── acme-challenge
│           ├── 8xdoeH5OLPUij4xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
│           ├── cWaLNpzt_8v--xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
│           └── _wsvWIOyvP-45vt-xxxxxxxxxxxxxxxxxxxxxxxxxxx
├── dhparam.pem
├── www.example.com.crt
└── www.example.com.key

3 directories, 7 files

Implementation

We’re using Chef to automate our infrastructure at flinc, but the process should be easily adaptable to an automation tool of your choice.

Syncthing is easily deployed, as there’s an official repository available:

Install Syncthing

# Use the official Syncthing apt repositories
apt_repository 'syncthing-release' do
  uri 'http://apt.syncthing.net'
  distribution 'syncthing'
  components %w(release)
  key 'https://syncthing.net/release-key.txt'
end

package 'syncthing'

Set up systemd service

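A systemd service unit is quickly crafted from the provided example. Because it uses the %i instance specifier, it is installed as a template unit (syncthing@.service) and started once per user: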

[Unit]
Description=Syncthing - Open Source Continuous File Synchronization for %I
Documentation=man:syncthing(1)
After=network.target

[Service]
User=%i
ExecStart=/usr/bin/syncthing -no-browser -no-restart -logflags=0
Restart=on-failure
SuccessExitStatus=3 4
RestartForceExitStatus=3 4

[Install]
WantedBy=multi-user.target

Generate Syncthing configuration and keys

We want to centrally manage our instances, so Syncthing certificates are stored centrally in Chef’s encrypted data bags, alongside their device IDs and API keys. Here’s how to generate and extract everything that’s required:

First, generate a new key-pair and save the device ID and API key for each node:

NODE=1.nginx.example.com
syncthing --generate=$NODE |grep ID |awk '{ print $5 }' > $NODE/device_id
grep apikey $NODE/config.xml |cut -d\> -f2 |cut -d\< -f1 > $NODE/apikey
rm $NODE/config.xml

The resulting key.pem and cert.pem will then be deployed into the .config/syncthing directory on the target machine.
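The pieces collected for each node can then be combined into one data bag item. The following plain Ruby sketch shows one possible way to do that; the helper names `extract_apikey` and `data_bag_item`, the node name and all sample values are placeholders for illustration. The keys of the resulting hash (`device_id`, `apikey`, `key`, `cert`) are the ones the recipe further down reads.

```ruby
require 'json'

# Pull the API key out of a Syncthing config.xml
# (same result as the grep/cut pipeline, but done in Ruby).
def extract_apikey(config_xml)
  match = config_xml.match(%r{<apikey>([^<]+)</apikey>})
  raise 'no <apikey> element found in config.xml' unless match

  match[1]
end

# Assemble the plain-text data bag item for one node.
def data_bag_item(node_name, device_id:, config_xml:, key_pem:, cert_pem:)
  {
    'id' => node_name,
    'device_id' => device_id,
    'apikey' => extract_apikey(config_xml),
    'key' => key_pem,
    'cert' => cert_pem
  }
end

# Demo with placeholder values instead of real files.
sample_xml = '<configuration><gui><apikey>abc123</apikey></gui></configuration>'
item = data_bag_item('nginx1',
                     device_id: 'DEVICE-ID-PLACEHOLDER',
                     config_xml: sample_xml,
                     key_pem: 'KEY PEM PLACEHOLDER',
                     cert_pem: 'CERT PEM PLACEHOLDER')
puts JSON.pretty_generate(item)
```

The generated JSON still has to be encrypted when it is uploaded to the Chef server, for example with knife's --secret-file option.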

After using Syncthing’s web-interface to configure the share, the resulting config.xml was then used to craft the following ERB template:

<configuration version="16">
    <!-- This is our shared folder. Scan it every 5s, so updates are synchronized quickly -->
    <folder id="vm623-hxlsp" label="letsencrypt" path="/etc/nginx/certs/" type="readwrite" rescanIntervalS="5" ignorePerms="false" autoNormalize="true">

        <!-- Share the folder between all nodes -->
        <% @nodes.each do |name, config| %>
          <device id="<%= config['id'] %>"></device>
        <% end %>

        <!-- Share settings. Default settings, with simple versioning -->
        <minDiskFreePct>1</minDiskFreePct>
        <versioning type="simple">
            <param key="keep" val="5"></param>
        </versioning>
        <copiers>0</copiers>
        <pullers>0</pullers>
        <hashers>0</hashers>
        <order>random</order>
        <ignoreDelete>false</ignoreDelete>
        <scanProgressIntervalS>0</scanProgressIntervalS>
        <pullerSleepS>0</pullerSleepS>
        <pullerPauseS>0</pullerPauseS>
        <maxConflicts>10</maxConflicts>
        <disableSparseFiles>false</disableSparseFiles>
        <disableTempIndexes>false</disableTempIndexes>
    </folder>

    <!-- Make sure all nodes are connected to one another -->
    <% @nodes.each do |name, config| %>
      <device id="<%= config['id'] %>" name="<%= name %>" compression="metadata" introducer="false">
          <address><%= config['address'] %></address>
      </device>
    <% end %>

    <gui enabled="true" tls="false" debugging="false">
        <address>127.0.0.1:8384</address>
        <apikey><%= @apikey %></apikey>
        <theme>default</theme>
    </gui>
    <options>
        <listenAddress>default</listenAddress>

        <!-- Disable announcement, as we're automatically adding all servers above -->
        <globalAnnounceServer>default</globalAnnounceServer>
        <globalAnnounceEnabled>false</globalAnnounceEnabled>
        <localAnnounceEnabled>false</localAnnounceEnabled>
        <localAnnouncePort>21027</localAnnouncePort>
        <localAnnounceMCAddr>[ff12::8384]:21027</localAnnounceMCAddr>

        <maxSendKbps>0</maxSendKbps>
        <maxRecvKbps>0</maxRecvKbps>
        <reconnectionIntervalS>60</reconnectionIntervalS>
        <relaysEnabled>false</relaysEnabled>
        <relayReconnectIntervalM>10</relayReconnectIntervalM>
        <startBrowser>false</startBrowser>
        <natEnabled>false</natEnabled>
        <natLeaseMinutes>60</natLeaseMinutes>
        <natRenewalMinutes>30</natRenewalMinutes>
        <natTimeoutSeconds>10</natTimeoutSeconds>
        <urAccepted>1</urAccepted>
        <urUniqueID></urUniqueID>
        <urURL>https://data.syncthing.net/newdata</urURL>
        <urPostInsecurely>false</urPostInsecurely>
        <urInitialDelayS>1800</urInitialDelayS>
        <restartOnWakeup>true</restartOnWakeup>
        <autoUpgradeIntervalH>12</autoUpgradeIntervalH>
        <keepTemporariesH>24</keepTemporariesH>
        <cacheIgnoredFiles>false</cacheIgnoredFiles>
        <progressUpdateIntervalS>5</progressUpdateIntervalS>
        <symlinksEnabled>true</symlinksEnabled>
        <limitBandwidthInLan>false</limitBandwidthInLan>
        <minHomeDiskFreePct>1</minHomeDiskFreePct>
        <releasesURL>https://upgrades.syncthing.net/meta.json</releasesURL>
        <overwriteRemoteDeviceNamesOnConnect>false</overwriteRemoteDeviceNamesOnConnect>
        <tempIndexMinBlocks>10</tempIndexMinBlocks>
    </options>
</configuration>
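To preview what the loops in this template produce without running Chef, a reduced version can be rendered with Ruby's standard ERB library. This is only a sketch: it uses a local variable `nodes` instead of the `@nodes` instance variable Chef provides, the `-%>` trim markers are added to suppress blank lines, and the device IDs and node names are placeholders.

```ruby
require 'erb'

# Reduced stand-in for the template: only the per-device section.
TEMPLATE = <<~XML
  <% nodes.each do |name, config| -%>
  <device id="<%= config['id'] %>" name="<%= name %>" compression="metadata" introducer="false">
      <address><%= config['address'] %></address>
  </device>
  <% end -%>
XML

# Render the device entries for a hash of node name => { 'id', 'address' }.
def render_devices(nodes)
  ERB.new(TEMPLATE, trim_mode: '-').result_with_hash(nodes: nodes)
end

nodes = {
  'nginx1' => { 'id' => 'DEVICE-ID-OF-NGINX1', 'address' => 'dynamic' },
  'nginx2' => { 'id' => 'DEVICE-ID-OF-NGINX2', 'address' => 'tcp://nginx2.example.com:22000' }
}
puts render_devices(nodes)
```

Each node ends up with one `<device>` element per cluster member, including itself, which is what lets every node connect to every other node.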

Deploy Syncthing configuration

Here’s how we deploy Syncthing keys and configuration from encrypted data bags to the nginx nodes (Note: It probably makes sense to run Syncthing as the same user as nginx, as Syncthing needs to deploy a key that should only be readable by nginx and no one else):

# Set this to the user running Syncthing (probably the same user running nginx).
# Its home directory is assumed to be /home/<user>.
user = 'nginx'

# Populate node information from data bag
node_config = {}
node_list.each do |node_name|
  config = Chef::EncryptedDataBagItem.load('syncthing', node_name, data_bag_secret)
  node_config[node_name] = {}
  node_config[node_name]['id'] = config['device_id']

  # Set address to "dynamic" if it's ourselves
  node_config[node_name]['address'] = if node.name == node_name
    'dynamic'
  else
    "tcp://#{node_name}.#{node['domain']}:22000"
  end
end

# Deploy Syncthing certificate (from data bag)
local_config = Chef::EncryptedDataBagItem.load('syncthing', node.name, data_bag_secret)
%w(key cert).each do |k|
  # Abort the Chef run if the key or certificate couldn't be retrieved
  raise "#{k}.pem is empty!" unless local_config[k]

  file "/home/#{user}/.config/syncthing/#{k}.pem" do
    mode 0o600
    owner  user
    group  user
    content local_config[k]
  end
end

# Deploy Syncthing configuration
template "/home/#{user}/.config/syncthing/config.xml" do
  mode   0o600
  owner  user
  group  user
  source 'syncthing.config.xml.erb'
  variables nodes: node_config, apikey: local_config['apikey']
end

# Restart Syncthing upon configuration/ key changes
service "syncthing@#{user}" do
  subscribes :restart, "template[/home/#{user}/.config/syncthing/config.xml]"
  subscribes :restart, "file[/home/#{user}/.config/syncthing/key.pem]"
  subscribes :restart, "file[/home/#{user}/.config/syncthing/cert.pem]"
  action [:enable, :start]
end
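The address logic from the recipe can be tried outside of Chef as well. The following plain Ruby sketch mirrors it under the assumption that the device IDs are already available as a hash; the helper name `build_node_config` and the sample node names, IDs and domain are made up for illustration.

```ruby
# Build the hash passed to the template: device ID plus address for every node.
# `device_ids` maps node name => Syncthing device ID
# (in the recipe these come from the encrypted data bag).
def build_node_config(device_ids, own_name:, domain:, port: 22000)
  device_ids.each_with_object({}) do |(name, id), result|
    # The local node listens dynamically, all others are addressed explicitly.
    address = name == own_name ? 'dynamic' : "tcp://#{name}.#{domain}:#{port}"
    result[name] = { 'id' => id, 'address' => address }
  end
end

ids = { 'nginx1' => 'DEVICE-ID-OF-NGINX1', 'nginx2' => 'DEVICE-ID-OF-NGINX2' }
config = build_node_config(ids, own_name: 'nginx1', domain: 'example.com')
puts config['nginx1']['address'] # prints "dynamic"
puts config['nginx2']['address'] # prints "tcp://nginx2.example.com:22000"
```

Because every node renders the same list and only its own entry differs, adding a node to the data bag is enough for all other nodes to pick it up on their next Chef run.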

Restrict Syncthing to private backnet

We have a dedicated backnet for all environments. Syncthing should only be allowed on this specific backnet (in our case eth1). I’m using the iptables-ng cookbook to manage iptables.

# Allow Syncthing in backnet only
iptables_ng_rule '50-syncthing' do
  rule ['-i eth1 --protocol tcp --dport 22000 --match state --state NEW --jump ACCEPT',
        '-i eth1 --protocol udp --dport 21027 --match state --state NEW --jump ACCEPT']
end

Get the certificates and automate renewal

To actually request the certificate, the acme cookbook has you covered; it uses the Ruby ACME library acme-client under the hood.

# Get some bonus points for generating your own Diffie-Hellman parameters:
execute 'openssl dhparam -out /etc/nginx/certs/dhparam.pem 2048' do
  creates '/etc/nginx/certs/dhparam.pem'
  notifies :restart, 'service[nginx]'
end

# Make sure acme-client gem is installed
include_recipe 'letsencrypt::default'

# Create a webroot for acme challenges
directory '/etc/nginx/certs/acme' do
  owner user
  group user
end

# Deploy nginx site to answer ACME challenges
template '/etc/nginx/conf.d/letsencrypt.example.com.conf' do
  mode      0o644
  source   'letsencrypt.nginx.erb'
  notifies :restart, 'service[nginx]', :immediately
  not_if   'test -f /etc/nginx/certs/www.example.com.crt'
end

letsencrypt_certificate 'www.example.com' do
  alt_names %w(example.com)
  owner     user
  group     user
  fullchain '/etc/nginx/certs/www.example.com.crt'
  key       '/etc/nginx/certs/www.example.com.key'
  method    'http'
  wwwroot   '/etc/nginx/certs/acme'
  notifies  :restart, 'service[nginx]'
end

# Remove temporary letsencrypt site
file '/etc/nginx/conf.d/letsencrypt.example.com.conf' do
  notifies :restart, 'service[nginx]', :immediately
  action :delete
end

The temporary letsencrypt.nginx.erb

server {
    # This is for HAproxy with proxy_protocol, adapt if necessary
    listen [::]:80 ipv6only=off proxy_protocol;

    # Serve well-known path for letsencrypt
    location /.well-known/acme-challenge {
        root /etc/nginx/certs/acme;
        default_type text/plain;
    }
}

Also make sure to include something like this in your actual nginx site configuration, so that the challenges of automatic renewals can be answered:

server {
    # This is for HAproxy with proxy_protocol, adapt if necessary
    listen [::]:80 ipv6only=off proxy_protocol;

    # Use remote_addr from proxy_protocol
    real_ip_header proxy_protocol;
    set_real_ip_from 10.13.37.0/24;

    # Serve well-known path for letsencrypt
    location /.well-known/acme-challenge {
        root /etc/nginx/certs/acme;
        default_type text/plain;
    }

    location / {
        return 301 https://<%= @domain %>$request_uri;
    }
}

server {
    # This is for HAproxy with proxy_protocol, adapt if necessary
    listen [::]:443 ssl http2 ipv6only=off proxy_protocol;

    [...]
}
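To check on any node whether the synchronized certificate is getting close to expiry, a few lines of Ruby with the standard OpenSSL bindings are enough. This is a monitoring sketch, not part of the cookbook: the helper names `days_remaining` and `renewal_due?` and the 30-day threshold are assumptions made for this example.

```ruby
require 'openssl'

SECONDS_PER_DAY = 24 * 60 * 60

# Whole days until the certificate in `cert_pem` expires (negative if already expired).
def days_remaining(cert_pem, now: Time.now)
  cert = OpenSSL::X509::Certificate.new(cert_pem)
  ((cert.not_after - now) / SECONDS_PER_DAY).floor
end

# True if the certificate expires in fewer than `threshold_days` days.
def renewal_due?(cert_pem, threshold_days: 30, now: Time.now)
  days_remaining(cert_pem, now: now) < threshold_days
end

# Example usage on a node (path taken from the recipe above):
#   pem = File.read('/etc/nginx/certs/www.example.com.crt')
#   puts "renewal due" if renewal_due?(pem)
```

Since the certificate lives in the shared directory, the check gives the same answer on every node once Syncthing has caught up.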

Wrap up

That’s it! We can now automatically request and renew free Let’s encrypt SSL certificates in our high availability setup! Syncthing will happily keep the certificates and challenges in sync, even if some nodes go down. More nodes can be added by simply adding the credentials to the syncthing data bag, and the configuration of all nodes will adapt automatically.

If you have some feedback, feel free to contact me. I’m also available for hire as a freelancer.