chr4

Devops. I've never asked for this.

Make keepalived play nicely with netplan/systemd-networkd

TL;DR: Use the use_vmac directive in keepalived if you run it in multicast mode and your provider allows multiple MAC addresses per interface. If that’s not an option, migrate away from netplan/systemd-networkd, e.g. to ifupdown.

Saturday night, 06:00:
I was torn from my sleep by the lovely voice of my colleague, uttering the words dreaded among the ops community: “We’re down, Chris.”

After opening the lid of my notebook, I started digging into the issue.

We’re running several high availability services with the help of keepalived. When a service goes down, keepalived switches it over to a standby instance. This is usually done by assigning a dedicated virtual IP address (VIP) to a service, which can then be migrated on the fly between instances. keepalived achieves this using the Virtual Router Redundancy Protocol (VRRP) as described in RFC 5798, and it has proved to be very reliable over the last few years.

Not so today. The primary server somehow lost its VIP, without any notice in the log files. Just like that. I reassigned the VIP manually, verified that we were back up and went back to sleep.

We then did a post-mortem on Monday. Obviously, we wanted to know what the heck had happened, and how we could prevent it from happening again in the future.

DHCP

Our OpenStack hoster relies on DHCP to configure the attached network interfaces. I usually prefer statically configured interfaces on servers, and had eyed the configuration with some suspicion for a while. We’ve had some minor issues with static routes and DHCP in the past, but the environment was pretty stable, so we went with the hoster’s original base images and their configuration.

Ubuntu upgrade

Keeping this in mind, we’ve been implementing one bigger change in our infrastructure: We’ve upgraded some instances (including the database node that had the outage) from Ubuntu 16.04 LTS (xenial) to Ubuntu 18.04 LTS (bionic). One of the major changes between the two releases is the change of the default networking stack from ifupdown to netplan. Could this have caused the issue?

netplan

After digging around a bit more, I found out that calling netplan apply (to apply a network configuration) resulted in dropped VIPs.
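The effect can be observed by checking for the VIP before and after netplan apply. The following is a minimal sketch, not something keepalived provides: the helper name vip_present is hypothetical, and eth0 and 10.1.0.1 are just the example interface and VIP used further down in this post. It only inspects the one-line-per-address output of ip -o address show.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given address appears in
# `ip -o address show` output supplied on stdin.
vip_present() {
    # -F matches the address literally; the leading space and trailing
    # slash keep 10.1.0.1 from also matching 10.1.0.10 or 110.1.0.1.
    grep -qF " $1/"
}

# Usage (as root, on the node that currently holds the VIP):
#   ip -o address show dev eth0 | vip_present 10.1.0.1 && echo "before: present"
#   netplan apply
#   ip -o address show dev eth0 | vip_present 10.1.0.1 || echo "after: missing"
```

The same check could be run from a cron job or systemd timer to at least get notified when a VIP disappears silently.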

Apparently, keepalived doesn’t monitor the VIP assigned to it, and neither initiates a failover, nor tries to reinstate the dropped VIP after it was removed from the interface. This is a known issue, and was fixed by this commit which was released with keepalived-2.0.0.

Unfortunately, Ubuntu ships with keepalived-1.3.9, and the keepalived developers do not provide an official repository for more recent versions. After considering building packages myself, I concluded that this wouldn’t address the actual problem: keepalived would merely notice the removed VIP and fail over to another machine. I wanted to fix the underlying problem itself, instead of just coping with the symptoms.

systemd-networkd

The default renderer for netplan on Ubuntu is systemd-networkd, a systemd component. On every netplan apply, systemd-networkd is restarted to apply the new configuration. This apparently also happens when a DHCP lease is renewed, and it results in the VIPs being removed: systemd-networkd is unaware of them, as they are assigned by keepalived and are not configured in /etc/netplan.

This seems to be a feature, and I kind of see why: If you have a messy interface configuration and you connect to a network, it makes sense to make sure the interface is in a defined state.

But having a downtime every time a DHCP lease is renewed obviously also is not an option.

So, how do I fix the problem permanently in a clean way? I was thinking about the following approaches:

1. Migrating to a static IP configuration

As I was critical of using DHCP anyway, I considered migrating to static IPs. While this would fix the issues with DHCP lease renewals, we’d still have the same problem on other occasions of systemctl restart systemd-networkd (e.g. automatic security updates). I was reluctant to change that much of the networking stack, as I believe this should be done by the underlying hypervisor.

2. Dummy network interface (plz don’t!)

NOTE: An earlier version of this blog post suggested using a dummy network interface. The Linux networking stack defaults to the “Weak ES (End System) Model” as specified in RFC 1122, which lets it handle incoming packets for IP addresses configured on an interface other than the physical interface the packet came in on. Unfortunately, the gratuitous ARP packets (IPv4) and unsolicited neighbour advertisements (IPv6) can’t be sent out properly with the dummy network interface, resulting in shaky failovers. Furthermore, this solution didn’t work at all with IPv6.

3. MACVLAN

After a very helpful discussion with a keepalived maintainer, the recommended way of solving this is to use MACVLANs. This is a built-in feature of keepalived and can be simply enabled with the use_vmac directive in the configuration file. Recent versions of keepalived take care of all required sysctl settings for this, so it should just work.

The MACVLAN interface (e.g. vrrp.51@eth0) will be created by keepalived, and it will also take care of the VIP on that interface. Then, systemd-networkd can configure the original interface at will, without hurting the dedicated interface for the virtual IP when it’s reloaded (or a new DHCP lease comes in).

Bonus: The interface (incl. the master VIP) is now also displayed in Ubuntu’s default message-of-the-day (motd) upon login.

Here’s the relevant section from the keepalived documentation:

# Use VRRP Virtual MAC.
# NOTE: If sysctl net.ipv4.conf.all.rp_filter is set,
# and this vrrp_instance is an IPv4 instance, using
# this option will cause the individual interfaces to be
# updated to the greater of their current setting, and
# all.rp_filter, as will default.rp_filter, and all.rp_filter
# will be set to 0.
# The original settings are restored on termination.
use_vmac [<VMAC_INTERFACE>]

This is the recommended way of implementing VIPs with systemd, and should work with ipv4 and ipv6 setups!
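To make the directive concrete, here is a minimal sketch of a vrrp_instance that enables it, wrapped in a small shell function so it can be rendered per node. The helper name render_vmac_instance is hypothetical, and the instance name VI_1, the state, the priority, the advert interval, router ID 51, eth0 and 10.1.0.1/32 are placeholder assumptions rather than values from our setup. With router ID 51 and no explicit VMAC name, the resulting interface should be the vrrp.51 mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: prints a minimal vrrp_instance that enables use_vmac.
# $1 = base interface, $2 = virtual router ID, $3 = VIP in CIDR notation
render_vmac_instance() {
    cat <<EOF
vrrp_instance VI_1 {
    state BACKUP
    interface $1
    virtual_router_id $2
    priority 100
    advert_int 1

    # Let keepalived create and manage a MACVLAN interface for the VIP
    use_vmac

    virtual_ipaddress {
        $3
    }
}
EOF
}

# Usage: render_vmac_instance eth0 51 10.1.0.1/32
```

The output is meant to be merged into an existing keepalived.conf; authentication, notify scripts and track scripts are left out on purpose.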

Unfortunately, using Virtual MAC is only possible when keepalived is running in multicast mode, as pointed out by the maintainer.

While I was able to get multicast running on OpenStack (a security group allowing protocol 112 is necessary), most providers only allow a single MAC address per interface, which leads to unstable failovers. Apparently, MACVLAN also is not an option on AWS, as it doesn’t allow multicast at all.

When multicast is not allowed, or multiple MAC addresses are not an option, there’s another method available: IPVLAN

4. IPVLAN

An IPVLAN interface is similar to MACVLAN, but receives packets from the underlying interface based on IP address filtering instead of MAC address filtering.

You can create it using the following commands (NOTE: This won’t persist across reboots):

ip link add link eth0 keepalived0 type ipvlan mode l2
ip link set keepalived0 up

# If you intend to use the interface with IPv4, a dummy IP address might be required
ip address add 1.2.3.4/32 dev keepalived0
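Because ip link add fails when the device already exists, the creation step can be wrapped so that it is safe to run repeatedly, for example from a boot script or a hook. This is a sketch under assumptions: ensure_ipvlan and the IP_CMD override are hypothetical names introduced here, and eth0/keepalived0 are the example names from above.

```shell
#!/bin/sh
# Command used to talk to the kernel. Overridable so the logic can be
# exercised with a stub instead of the real ip binary.
IP_CMD="${IP_CMD:-ip}"

# Hypothetical helper: create the IPVLAN device only if it is missing,
# then make sure it is up.
# $1 = parent interface, $2 = IPVLAN device name
ensure_ipvlan() {
    if ! $IP_CMD link show "$2" >/dev/null 2>&1; then
        $IP_CMD link add link "$1" "$2" type ipvlan mode l2 || return 1
    fi
    $IP_CMD link set "$2" up
}

# Usage (as root): ensure_ipvlan eth0 keepalived0
```

The function returns non-zero if either the creation or the link-up step fails, so a calling script can react to it.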

Modify the following lines in your keepalived.conf to use the interface:

# Set the base interface here
interface eth0

# Attach the VIP to the IPVLAN device
virtual_ipaddress {
    10.1.0.1/32 dev keepalived0
}

This should work and also send unsolicited neighbour adverts and gratuitous ARP packets correctly.

Disclaimer: While IPVLAN worked fine with IPv6, I wasn’t able to get it to work for IPv4 in an OpenStack environment. Incoming IPv4 ICMP/TCP packets are received by the parent interface, but never replied to. When I attach the interface to a namespace and use keepalived’s net_namespace option, pinging works, but I cannot access the high availability service, which resides in the default namespace. Also, both nodes go into MASTER mode and VRRP apparently doesn’t work, as the interface needs to be set to the IPVLAN interface inside the namespace. The respective kernel documentation is pretty sparse and only provides examples that use namespaces.

Unfortunately, I have found no way so far to persist the IPVLAN interface with systemd-networkd. I’ve tried creating the respective systemd .netdev and .network files, with no luck (the interface is just not created). Also, netplan doesn’t support MACVLAN/IPVLAN, but at least there is a ticket with the feature request. I assume, though, that creating the interface with netplan/systemd would also mean that the interface would be purged again. So this is not a solution either.

There’s a commit that adds a use_ipvlan option to keepalived to solve this issue and to make the use of IPVLAN as seamless as MACVLAN in keepalived.

Until this is ready, there are two options:

  1. Use networkd-dispatcher to create the IPVLAN device after systemd-networkd brings the interface up, by placing the IPVLAN creation commands from the above snippet in /etc/networkd-dispatcher/routable.d/50-keepalived. See the netplan FAQ for more information. Because the hook scripts for networkd-dispatcher seem to be global, and I could not restrict them to a single interface, I personally went for the second approach for now:

  2. Migrate back to ifupdown for all instances using keepalived. I was reluctant to do so, as I didn’t want to swap out the network stack on every new instance, but the migration went smoother than expected and I was able to automate it quite well using Saltstack. While ifupdown doesn’t support IPVLAN natively, it’s easy to implement using the up directive: just add the necessary commands to /etc/network/interfaces.d/<yourinterface>.cfg:

iface eth0 [...]
  address [...]
  gateway [...]

  # Create keepalived IPVLAN interface
  up ip link add link eth0 keepalived0 type ipvlan mode l2
  up ip link set keepalived0 up
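For comparison, the first option (the networkd-dispatcher hook) could be sketched as follows. If networkd-dispatcher exports an IFACE variable to its hook scripts, as its README describes, the hook might be restricted to a single interface after all. That is an assumption, and the script is untested in the setup described in this post. The names TARGET_IFACE, IPVLAN_NAME and should_handle are hypothetical.

```shell
#!/bin/sh
# Sketch of /etc/networkd-dispatcher/routable.d/50-keepalived
# Assumption: networkd-dispatcher exports IFACE, the interface whose
# state just changed to "routable".
TARGET_IFACE="eth0"        # parent interface, as in the examples above
IPVLAN_NAME="keepalived0"  # IPVLAN device used by keepalived

# Hypothetical helper: true only when the event concerns the parent interface.
should_handle() {
    [ "${IFACE:-}" = "$TARGET_IFACE" ]
}

# Create the IPVLAN device only for the right interface and only if missing.
if should_handle && ! ip link show "$IPVLAN_NAME" >/dev/null 2>&1; then
    ip link add link "$TARGET_IFACE" "$IPVLAN_NAME" type ipvlan mode l2
    ip link set "$IPVLAN_NAME" up
fi
```

The hook file has to be executable, and making it owned by root is the safe choice. Events for any other interface fall through without touching the network configuration.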

While this is not the perfect solution, it makes me feel comfortable enough to run this in production.

I hope this ensures I can sleep in on Saturdays.

A new hope

There’s hope. Besides the mentioned keepalived commit that adds a use_ipvlan option, I’ve filed a systemd issue explaining the problem to the systemd maintainers. Even though they currently have more than 1,000 open issues, there’s already a pull request adding an option to systemd to prevent automatic purging of addresses.