I’ve undertaken a project where I am helping an educational entity build and deploy a high availability router/firewall with redundant ISPs. Because I’ve had to rely on scattered documentation, I’ve decided to describe everything here in a GitHub page so that others can do the same kind of thing.

This page will be updated as the project progresses. The page was first published November 6, 2023.

Update: February Update on High-Availability Project

First, I’ll describe the scenario we are working with. This is a low-budget (almost “no budget”) project due to the extremely limited resources of the organization I’m working with. The organization currently has two ISP connections available. One is funded by federal money, and the other is a “free” community connection courtesy of Google Fiber. Only the paid connection is in active use. The organization is relying on a single Debian-based firewall that is several years old. The organization’s LAN spans several buildings and has been divided into VLANs for various applications (security cameras, HVAC, WiFi, etc.). Two new Dell servers have already been purchased for this project, along with Intel I350 NICs (which have four 1GbE copper ports). The LAN currently rides over somewhat dated Juniper EX switches, but it has a 10Gb fiber backbone between buildings, and all edge ports are 1GbE copper.

The goal is to build a system that meets the following criteria:

  • There must be a paid support option available for the deployed solution with no less than 5 years of support from the date of installation (though the paid support does not have to be active at the time of deployment).
  • The solution must have a fully documented implementation process to make rebuilding the solution from scratch as seamless as possible.
  • There must be redundancy in the firewall/router setup. The preferred setup is to have automatic failover, but load-balancing is not necessary.
  • There must be redundancy in the upstream ISP connectivity. Connections are allowed to drop when an ISP fails, but users need to be able to establish new connections within 60 seconds of failure.
  • The solution must support IPv4 and IPv6.
  • The solution must be able to accommodate a small number of essential services that depend on local servers, including VOIP/SIP.
  • The solution must support network-enforced content filtering to ensure compliance with local legal obligations and policies.
  • The solution must be reasonably secure.
  • It is preferred if the solution inherently supports VPN access.

The “ideal” solution would involve a few different things that we won’t have access to during this project. Most notably, the ideal solution would involve two BGP-capable ISP connections, along with ASN, IPv4, and IPv6 assignments directly to the organization. IPv4 addresses are in extremely limited supply these days, and ISPs don’t sell BGP-capable connections for cheap. It would simplify a lot of things if we had these, but we don’t. Instead, we have one ISP able to provide a very large pool of IPv4 addresses without charging extra, and another that will provide only 5 addresses. Because of the failover requirements, the solution has to be scaled to the lowest common denominator, and in this instance, that means we have 5 usable public IPv4 addresses for each ISP. Fortunately, both ISPs provide sizable IPv6 delegated prefixes (/48 and /56), so we will have plenty of IPv6 space to work with.

To meet these requirements, I have advised the organization to deploy the following to satisfy each criterion:

  • Ubuntu Server 22.04 LTS (the latest LTS as of this writing). Ubuntu has paid support options that can be activated out of the box or added later. Ubuntu Server is one of the most commonly deployed server operating systems today and is heavily used in cloud computing. In addition to support, training and certifications are available at reasonable cost.
  • The implementation will be carried out by cloud-init with the NoCloud datasource (the name seems self-contradictory, but this approach is actually fairly common). Example files will be added to this page as the project progresses. Testing will first take place on VirtualBox VMs (because this is a no-cost way of vetting configs), and the result will then be deployable to bare metal hardware in a fully automated fashion. The three files involved (user-data, meta-data, and a custom install script) will be heavily commented for documentation, and all software used will have developer-provided documentation (man pages, wikis, etc.).
  • We have two connections already at our disposal. We just have to use them. As mentioned above, we will be using ISP-provided IP space, so connections will drop if the primary ISP fails. IT IS POSSIBLE to avoid this (via tunnels to cloud providers), but the ongoing performance impact during routine use far outweighs the impact of dropping a few connections in the rare event of an ISP failure. Preliminary testing has shown LAN-side failover to be less than 1 second and WAN side failover to be less than 5 seconds, without being super aggressive in the configs.
  • Both ISPs support IPv4 and IPv6, and Ubuntu has no trouble handling both as well. It is important to note that the ISPs must support delegation of at least a small prefix for this to work in IPv6. Most providers do offer this, and in our case we should have a /48 and a /56. For the purposes of the network we’re dealing with here, even a much smaller /60 would be sufficient.
  • Public services will be exposed through NAT for IPv4, and are relatively easy to make work via IPv6 (where no NAT will be used).
  • Network-enforced content filtering will be achieved via DNS filtering. For the purposes of this page, we will assume OpenDNS will be used, though other options like Akamai and GoGuardian are being considered. We will use some NAT trickery to ensure that no one can circumvent this without going through a VPN tunnel or avoiding DNS altogether (e.g., typing IP addresses into browsers). The goal here is to take reasonable efforts to enforce content filtering and prevent accidents (it’s no accident if you fire up a VPN). In this modern age, it is completely impossible to have a network that appears fully functional for user-provided devices while also strictly enforcing content filtering. If we aren’t going to heavily restrict outgoing ports and force everyone through an HTTP/HTTPS proxy (both of which break things like cell phones and the constantly-evolving landscape of computer applications), then there are about a million ways for an advanced user to circumvent content filters. And to be clear, 5th graders can easily be “advanced users.”
  • Ubuntu Server is relatively secure, and defaults are fairly sane for a secure deployment. Automating the install will help ensure that no shortcuts are taken that would compromise security. Security updates will be set to install automatically when they become available. The system will be built to require SSH keys for remote login, root login will be disabled, per-user passwords will be required for privilege escalation, and there will be no “shared accounts” as part of the initial deployment. The firewall/router will only provide SSH access to the outside world (this will later be locked down to VPN-only for SSH). Other services (NTP, DNS, etc.) will be available inside the network, and this page will describe how to deploy them on the firewall/router itself for simplicity. However, in an ideal environment, the most secure design (without completely destroying administerability) would offload all services to another machine or virtual instance, and SSH would only be reachable through a VPN (itself hosted on another machine or virtual instance). Each application/port that is exposed to other devices adds attack surface. For this deployment, we won’t have a VM stack ready, so we have to make do with deploying a complete solution to the firewall/router itself. The organization will hopefully offload these services to separate instances on a VMware stack inside the network at a later date. In addition, the system will be configured to log and report events to a separate server, with alerting capabilities, further improving the security posture of the deployment.
  • For VPN access, OpenVPN Access Server can be installed on Ubuntu Server 22.04 LTS, and two concurrent connections can be used for free. This is probably sufficient for the networking staff of this organization, but licenses can be purchased for more users if desired. This could run directly on the firewall/router. However, this presents some problems. First, VPNs inherently have to mess with routing and firewall rules to work properly, and we don’t want that to run the risk of interfering with the main firewall/router for an entire organization. Second, we will probably want people other than network administrators to be able to use the VPN. That means setting up authentication mechanisms that support those users and incorporate them into the server somehow (there are several ways to do this). Due to some recent large-scale hacks that have occurred in this region because VPN authentication services were misconfigured (or simply not maintained over time), it doesn’t seem wise to tie the server with the highest security responsibility to a system that will have hundreds (maybe thousands) of users and numerous non-network-administrators able to adjust account settings. Errors will be made. Old accounts will be left active. Convenience will win out over secure policies. It’s just a bad recipe. So, the firewalls in this deployment will remain autonomous for authentication purposes. And that means the VPN needs to live on another machine. But it will run the same server platform and will be part of this solution design.
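
To make the cloud-init/NoCloud approach above concrete, here is a minimal sketch of the seed files. The hostname, username, and SSH key are placeholders; the real user-data for this project will be much longer and heavily commented:

```shell
# Minimal NoCloud seed sketch: cloud-init looks for user-data and meta-data on
# a volume labeled "cidata". All names and keys below are placeholders.
mkdir -p nocloud

cat > nocloud/meta-data <<'EOF'
instance-id: fw-test-01
local-hostname: fw-test-01
EOF

cat > nocloud/user-data <<'EOF'
#cloud-config
users:
  - name: netadmin                      # placeholder admin account
    groups: [sudo]
    shell: /bin/bash
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... netadmin    # substitute a real public key
ssh_pwauth: false                       # keys only, per the security goals
disable_root: true
package_update: true
package_upgrade: true
EOF

# Build a seed volume (either tool works; both label the volume "cidata"):
#   cloud-localds seed.img nocloud/user-data nocloud/meta-data
#   genisoimage -output seed.iso -volid cidata -joliet -rock \
#       nocloud/user-data nocloud/meta-data
```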
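
The WAN-side failover described above can be sketched as a simple probe script. The interface name, gateways, and probe target below are placeholders (documentation-range addresses); a production version would also adjust NAT and DNS state, and would run from a systemd timer or a keepalived check script rather than by hand:

```shell
# Rough WAN failover sketch: probe the internet out the primary interface and
# swap the default route if the path stops responding. Requires root to run.
cat > wan-failover.sh <<'EOF'
#!/bin/bash
PRIMARY_IF=ens18         # placeholder: NIC facing the primary ISP
PRIMARY_GW=203.0.113.1   # placeholder gateway addresses
BACKUP_GW=198.51.100.1
PROBE=9.9.9.9            # probe something beyond the gateway to test the path

if ping -c 3 -W 2 -I "$PRIMARY_IF" "$PROBE" >/dev/null 2>&1; then
    ip route replace default via "$PRIMARY_GW"
else
    logger "wan-failover: primary ISP down, switching to backup"
    ip route replace default via "$BACKUP_GW"
fi
EOF
chmod +x wan-failover.sh
```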
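
The “NAT trickery” for DNS filtering amounts to rewriting the destination of any outbound plain-DNS traffic so clients can’t simply point at a different resolver. A minimal nftables sketch, assuming a placeholder LAN interface name and one of OpenDNS’s published resolvers:

```shell
# Transparently redirect all LAN DNS queries to the filtering resolver.
# "ens19" is a placeholder LAN interface; load the ruleset with
# `nft -f dns-redirect.nft` (root required).
cat > dns-redirect.nft <<'EOF'
table ip dnsfilter {
    chain prerouting {
        type nat hook prerouting priority dstnat;
        # rewrite any LAN client's DNS query to the filtering resolver
        iifname "ens19" udp dport 53 dnat to 208.67.222.222
        iifname "ens19" tcp dport 53 dnat to 208.67.222.222
    }
}
EOF
```

Note that this only covers classic port-53 DNS; DNS-over-HTTPS is one of the “million ways” around filtering mentioned above.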

I mentioned above that testing will take place in VirtualBox VMs. This is because I’m building most of these configs and testing them at home, and I don’t have spare servers or a VMware stack lying around. I’ll be doing all of the work on an Asus ZenBook with an Intel Core i7-1165G7 CPU, 16GB of RAM, and a 1TB SSD. Not a wimpy computer, but certainly not a powerhouse either. Because of the resource constraints on this laptop, I’ll limit myself to 4 concurrently active VMs, and may have to adjust the roles of different VMs for testing different aspects of the system. I only have one ISP at home, and that is AT&T Gigabit Fiber (and, despite a prior dislike for AT&T due to past issues, I’ve rather liked the quality of this service). I’ll be using copious amounts of NAT for IPv4. For IPv6, I’m having AT&T turn on prefix delegation (which is included with the service, but not active by default). AT&T delegates up to a /60, but only half of that is usable (some people say you really get a /61, but if that were true then you’d have one fewer subnet than you actually get).

I won’t be attempting to use the USB drives for the final bare metal servers while I’m experimenting in VMs. There are some Windows security issues involved with that (like having to always run VirtualBox with elevated privileges to do it). Instead, I’ll mimic the USB drives with VDI images in VirtualBox. The VDI file will be fairly small, and it can be converted into an image that can be written verbatim to a USB drive when the testing phase ends. Rather than providing the actual VDI or disk image here (which would be much larger than necessary), I’ll provide step-by-step instructions for creating them.
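
The VDI-to-image conversion can be done with VirtualBox’s own VBoxManage tool. A sketch (the filenames are placeholders, and this must run on the machine hosting VirtualBox):

```shell
# Helper to convert a VirtualBox VDI into a raw disk image that balenaEtcher
# (or dd) can write verbatim to a USB drive. Requires the VBoxManage CLI.
vdi_to_raw() {
    local vdi_in="$1" img_out="$2"
    # --format RAW emits a byte-for-byte image of the virtual disk
    VBoxManage clonemedium disk "$vdi_in" "$img_out" --format RAW
}

# Example (run once the testing phase ends):
# vdi_to_raw config-usb.vdi config-usb.img
```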

ISOs and USB disk images will be written with balenaEtcher.

I edit my files in either Notepad++ or vim (depending on the circumstances). But it is VERY IMPORTANT to note that the files we are working with in this project are VERY SENSITIVE to line endings: they must use *nix end-of-line characters. Notepad++ can be configured to use the right character codes and can correct your files. The vim editor uses *nix EOL characters by default, and it can correct bad codes with some advanced features, but the sed command is the usual fix if you end up needing to repair a file from a shell prompt. If you are unfamiliar with whitespace characters, be aware that you will NOT be able to see the difference between the files when viewing them in any normal editor. Even Notepad++ doesn’t show them by default. Neither does vim. If you follow this guide and get weird errors, especially those with “^M” in the error, this is probably what is causing your problems.
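
As a quick illustration of the problem and the sed fix (the filename here is just an example):

```shell
# Create a file with Windows (CRLF) line endings, make the hidden carriage
# returns visible, then strip them the way you'd fix a broken config file.
printf 'hello\r\nworld\r\n' > demo.txt

# `cat -A` makes the invisible characters visible: the CR shows as "^M"
# before the "$" end-of-line marker, e.g. "hello^M$"
cat -A demo.txt

# Strip the carriage return at the end of every line, in place:
sed -i 's/\r$//' demo.txt

# Now each line ends with a bare "$" (plain *nix line endings)
cat -A demo.txt
```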

More to come as the project progresses.