JIT WireGuard · The Fly Blog

2024-03-13 01:13:09

A cartoon of devices floating around and networked together with lines and nodes. Image by Annie Ruygt.

We’re Fly.io. We transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy. We do a lot of stuff with WireGuard, which has become a part of our customer API. This is a quick story about some tricks we played to make WireGuard faster and more scalable for the hundreds of thousands of people who now use it here.

One of the odd choices we’ve made at Fly.io is how we use WireGuard. It’s not just that we use it in a lot of places where other shops would use HTTPS and REST APIs. We’ve gone a step beyond that: every time you run flyctl, our lovable, sprawling CLI, it conjures a TCP/IP stack out of thin air, with its own IPv6 address, and talks directly to Fly Machines running on our networks.

There are pluses and minuses to this approach, which we talked about in a blog post a couple years back. Some things, like remote-operated Docker builders, get easier to express (a Fly Machine, as far as flyctl is concerned, might as well be on the same LAN). But everything generally gets trickier to keep working reliably.

It was a decision. We own it.

Anyhow, we’ve made some improvements recently, and I’d like to talk about them.

Where we left off

Until a few weeks ago, our gateways ran on a pretty simple system.

  1. We operate dozens of “gateway” servers around the world, whose sole purpose is to accept incoming WireGuard connections and connect them to the appropriate private networks.
  2. Any time you run flyctl and it needs to talk to a Fly Machine (to build a container, pop an SSH console, copy files, or proxy to a service you’re running), it spawns or connects to a background agent process.
  3. The first time it runs, the agent generates a new WireGuard peer configuration from our GraphQL API. WireGuard peer configurations are very simple: just a public key and an address to connect to.
  4. Our API in turn takes that peer configuration and sends it to the appropriate gateway (say, ord, if you’re near Chicago) via an RPC we send over the NATS messaging system.
  5. On the gateway, a service called wggwd accepts that configuration, saves it to a SQLite database, and adds it to the kernel using WireGuard’s Golang libraries. wggwd acknowledges the installation of the peer to the API.
  6. The API replies to your GraphQL request, with the configuration.
  7. Your flyctl connects to the WireGuard peer, which works, because you receiving the configuration means it’s installed on the gateway.

I copy-pasted those last two bullet points from that two-year-old post, because when it works, it really does just work pretty well. (We did eventually end up defaulting everybody to WireGuard-over-WebSockets, though.)
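For concreteness, here is roughly what that peer-install step (bullet 5) could look like. This is a minimal sketch, not wggwd’s actual code: it uses the golang.zx2c4.com/wireguard/wgctrl package, and the device name, key, and address in main are placeholders.

```go
// Minimal sketch of installing one WireGuard peer into the kernel via wgctrl.
// Not wggwd itself; device name, key, and address below are placeholders.
package main

import (
	"log"
	"net"

	"golang.zx2c4.com/wireguard/wgctrl"
	"golang.zx2c4.com/wireguard/wgctrl/wgtypes"
)

func installPeer(device, publicKeyB64, peerIP string) error {
	client, err := wgctrl.New()
	if err != nil {
		return err
	}
	defer client.Close()

	pubKey, err := wgtypes.ParseKey(publicKeyB64)
	if err != nil {
		return err
	}

	// Each peer gets a single address in the private network.
	_, allowed, err := net.ParseCIDR(peerIP + "/128")
	if err != nil {
		return err
	}

	// ReplacePeers is left false so this adds a peer without touching the rest.
	return client.ConfigureDevice(device, wgtypes.Config{
		Peers: []wgtypes.PeerConfig{{
			PublicKey:  pubKey,
			AllowedIPs: []net.IPNet{*allowed},
		}},
	})
}

func main() {
	if err := installPeer("wg0", "BASE64_PUBLIC_KEY", "fdaa:0:1:2::3"); err != nil {
		log.Fatal(err)
	}
}
```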

But if it always worked, we wouldn’t be here, would we?

We ran into two annoying problems:

One: NATS is fast, but doesn’t guarantee delivery. Back in 2022, Fly.io was pretty big on NATS internally. We’ve moved away from it. For instance, our internal flyd API used to be driven by NATS; today, it’s HTTP. Our NATS cluster was dropping too many messages to host a reliable API on it. Scaling back our use of NATS made WireGuard gateways better, but still not great.

Two: When flyctl exits, the WireGuard peer it created sticks around on the gateway. Nothing cleans up old peers. After all, you’re likely to come back tomorrow and deploy a new version of your app, or fly ssh console into it to debug something. Why remove a peer just to re-add it the next day?

Unfortunately, the overwhelming majority of peers are created by flyctl in CI jobs, which don’t have persistent storage and can’t reconnect to the same peer on the next run; they generate new peers every time, no matter what.

So, we ended up with a not-reliable-enough provisioning system, and gateways with hundreds of thousands of peers that would never be used again. The high stale peer count made kernel WireGuard operations very slow (especially loading all the peers back into the kernel after a gateway server reboot), and even caused some kernel panics.

There had to be

A better way.

Storing bajillions of WireGuard peers is no big challenge for any serious n-tier RDBMS. This isn’t “big data”. The problem we have at Fly.io is that our gateways don’t have serious n-tier RDBMSs. They’re small. Scrappy. They live off the land.

Seriously, though: you could store every WireGuard peer everybody has ever used at Fly.io in a single SQLite database, easily. What you can’t do is store them all in the Linux kernel.

So, at some point, as you push more and more peer configurations to a gateway, you have to start making decisions about which peers you’ll enable in the kernel, and which you won’t.

Wouldn’t it be nice if we just didn’t have this problem? What if, instead of pushing configs to gateways, we had the gateways pull them from our API on demand?

If you did that, peers would only have to be added to the kernel when the client wanted to connect. You could yeet them out of the kernel any time you wanted; the next time the client connected, they’d just get pulled again, and everything would work fine.

The problem you quickly run into building this design is that Linux kernel WireGuard doesn’t have a feature for installing peers on demand. However:

It is possible to JIT WireGuard peers

The Linux kernel’s interface for configuring WireGuard is Netlink (which is basically a way to create a userland socket to talk to a kernel service). Here’s a summary of it as a C API. Note that there’s no API call to subscribe to “incoming connection attempt” events.

That’s OK! We can just make our own events. WireGuard connection requests are packets, and they’re easily identifiable, so we can efficiently snatch them with a BPF filter and a packet socket.

Most of the time, it’s even easier for us to get the raw WireGuard packets, because our users now default to WebSockets WireGuard (which is just an unauthenticated WebSocket connection that shuttles framed UDP packets to and from an interface on the gateway), so that people who have trouble talking end-to-end in UDP can bring connections up.

We own the daemon code for that, and can just hook the packet-receive function to snarf WireGuard packets.

It’s not obvious, but WireGuard doesn’t have notions of “client” or “server”. It’s a pure point-to-point protocol; peers connect to each other when they have traffic to send. The first peer to connect is called the initiator, and the peer it connects to is the responder.

For Fly.io, flyctl is usually our initiator, sending a single UDP packet to the gateway, which is the responder. Per the WireGuard paper, this first packet is a handshake initiation. It gets better: the packet type is recorded in a single plaintext byte. So this simple BPF filter catches all the incoming connections: udp and dst port 51820 and udp[8] = 1.
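Here is a minimal sketch of that event feed, assuming libpcap via the gopacket package rather than the raw packet socket our daemon uses; the interface name is a placeholder, and the filter string is the one above.

```go
// Minimal sketch: turn WireGuard handshake initiations into events by sniffing
// with the BPF filter from the post. Uses gopacket/libpcap; the gateway daemon
// itself uses a packet socket, so treat this as an illustration only.
package main

import (
	"log"

	"github.com/google/gopacket"
	"github.com/google/gopacket/layers"
	"github.com/google/gopacket/pcap"
)

func main() {
	handle, err := pcap.OpenLive("eth0", 1600, false, pcap.BlockForever)
	if err != nil {
		log.Fatal(err)
	}
	defer handle.Close()

	// udp[8] is the first byte of the UDP payload (offset 8 skips the UDP
	// header), which is WireGuard's plaintext message-type byte; 1 means
	// handshake initiation.
	if err := handle.SetBPFFilter("udp and dst port 51820 and udp[8] = 1"); err != nil {
		log.Fatal(err)
	}

	src := gopacket.NewPacketSource(handle, handle.LinkType())
	for pkt := range src.Packets() {
		udp, _ := pkt.Layer(layers.LayerTypeUDP).(*layers.UDP)
		if udp == nil {
			continue
		}
		// udp.Payload is the raw handshake initiation message; this is what
		// gets handed to the identity-unwrapping step.
		log.Printf("handshake initiation, %d bytes from %v",
			len(udp.Payload), pkt.NetworkLayer().NetworkFlow().Src())
	}
}
```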

In most other protocols, we’d be done at this point; we’d just scrape the username or whatnot out of the packet, go fetch the matching configuration, and install it in the kernel. With WireGuard, not so fast. WireGuard is based on Trevor Perrin’s Noise Protocol Framework, and Noise goes way out of its way to hide identities during handshakes. To identify incoming requests, we’ll have to run enough Noise cryptography to decrypt the identity.

The code to do this is fussy, but it’s relatively short (about 200 lines). Helpfully, the kernel Netlink interface will give a privileged process the private key for an interface, so the secrets we need to unwrap WireGuard are easy to get. Then it’s just a matter of running the first little bit of the Noise handshake. If you’re that kind of nerdy, here’s the code.
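For a flavor of what those 200 lines have to work with, here is a sketch of the two easy ingredients: the wire layout of a handshake initiation from the WireGuard paper, and reading the interface’s private key back out of the kernel with wgctrl. The Noise steps that actually decrypt the initiator’s static key are omitted here (the linked code is the real thing), and the interface name below is a placeholder.

```go
// Sketch only: the handshake-initiation layout (per the WireGuard paper) and
// fetching the responder's private key over Netlink via wgctrl. The actual
// Noise decryption of EncryptedStatic into the peer's public key is omitted.
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"log"

	"golang.zx2c4.com/wireguard/wgctrl"
)

// handshakeInitiation mirrors the 148-byte message from the paper.
type handshakeInitiation struct {
	Type                 byte // 1 = handshake initiation; followed by 3 reserved bytes
	SenderIndex          uint32
	UnencryptedEphemeral [32]byte // initiator's ephemeral Curve25519 public key
	EncryptedStatic      [48]byte // initiator's static public key, AEAD-sealed
	EncryptedTimestamp   [28]byte
	MAC1                 [16]byte
	MAC2                 [16]byte
}

func parseInitiation(b []byte) (*handshakeInitiation, error) {
	if len(b) != 148 || b[0] != 1 {
		return nil, errors.New("not a handshake initiation")
	}
	m := &handshakeInitiation{Type: b[0]}
	m.SenderIndex = binary.LittleEndian.Uint32(b[4:8])
	copy(m.UnencryptedEphemeral[:], b[8:40])
	copy(m.EncryptedStatic[:], b[40:88])
	copy(m.EncryptedTimestamp[:], b[88:116])
	copy(m.MAC1[:], b[116:132])
	copy(m.MAC2[:], b[132:148])
	return m, nil
}

func main() {
	client, err := wgctrl.New()
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// A privileged process can read the interface's private key back out of the
	// kernel; that's the responder-side secret the Noise handshake math needs.
	dev, err := client.Device("wg0") // placeholder interface name
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("loaded private key for %s (%d peers currently installed)\n",
		dev.Name, len(dev.Peers))
	_ = parseInitiation // wire this up to the packet sniffer's event feed
}
```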

At this point, we have the event feed we wanted: the public keys of every client attempting to make a WireGuard connection to our gateways. We keep a rate-limited cache in SQLite, and when we see new peers, we make an internal HTTP API request to fetch the matching peer information and install it. This fits neatly into the little daemon that already runs on our gateways to manage WireGuard, and lets us ruthlessly and recklessly remove stale peers with a cron job.
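Put together, the pull-based flow looks something like the sketch below. The internal API URL and the in-memory rate-limit map are stand-ins (the real cache lives in SQLite, and the real endpoint isn’t public); installPeer is the wgctrl helper sketched earlier.

```go
// Sketch of the JIT flow: rate-limit by public key, fetch the peer's config
// from an internal API (the URL here is made up), then install it on demand.
package gatewayd

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"sync"
	"time"
)

type peerInfo struct {
	PublicKey string `json:"public_key"`
	PeerIP    string `json:"peer_ip"`
}

var (
	mu       sync.Mutex
	lastSeen = map[string]time.Time{} // stand-in; the real rate-limit cache is SQLite
)

// onHandshakeInitiation gets the public key recovered by the Noise unwrapping
// step and installs the peer if we haven't handled it recently.
func onHandshakeInitiation(pubKey string) error {
	mu.Lock()
	if t, ok := lastSeen[pubKey]; ok && time.Since(t) < time.Minute {
		mu.Unlock()
		return nil // recently handled; WireGuard's own retries cover the gap
	}
	lastSeen[pubKey] = time.Now()
	mu.Unlock()

	// Hypothetical internal lookup API.
	resp, err := http.Get("http://internal-api.example/peers/" + url.PathEscape(pubKey))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("peer lookup failed: %s", resp.Status)
	}

	var info peerInfo
	if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
		return err
	}
	return installPeer("wg0", info.PublicKey, info.PeerIP)
}

// installPeer adds the peer to the kernel; see the wgctrl sketch earlier.
func installPeer(device, publicKeyB64, peerIP string) error {
	// ... wgctrl.New() + ConfigureDevice, exactly as above ...
	return nil
}
```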


But wait! There’s more! We bounced this plan off Jason Donenfeld, and he tipped us off to a sneaky feature of the Linux WireGuard Netlink interface.

Jason is the hardest working person in show business.

Our API fetch for brand-new peers is generally not going to be fast enough to respond to the first handshake initiation message a new client sends us. That’s OK; WireGuard is pretty quick about retrying. But we can do better.

When we get an incoming initiation message, we have the 4-tuple address of the desired connection, including the ephemeral source port flyctl is using. We can install the peer as if we’re the initiator, and flyctl is the responder. The Linux kernel will initiate a WireGuard connection back to flyctl. This works; the protocol doesn’t care a whole lot about who’s the server and who’s the client. We get new connections established about as fast as they can possibly be installed.
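Here’s a sketch of that trick, under assumptions: we set the peer’s Endpoint to the source address and port sniffed from the initiation packet, and use a persistent keepalive as one way to prompt the kernel to start the handshake toward it (the production daemon may prod the kernel differently).

```go
// Sketch only: install the peer with the endpoint we observed, so the kernel
// can initiate back toward flyctl. The keepalive here is an assumption about
// how to prompt that first handshake; names and addresses are placeholders.
package main

import (
	"log"
	"net"
	"time"

	"golang.zx2c4.com/wireguard/wgctrl"
	"golang.zx2c4.com/wireguard/wgctrl/wgtypes"
)

func installPeerWithEndpoint(device, publicKeyB64, peerIP string, observed *net.UDPAddr) error {
	client, err := wgctrl.New()
	if err != nil {
		return err
	}
	defer client.Close()

	pubKey, err := wgtypes.ParseKey(publicKeyB64)
	if err != nil {
		return err
	}
	_, allowed, err := net.ParseCIDR(peerIP + "/128")
	if err != nil {
		return err
	}

	keepalive := 15 * time.Second // assumed nudge for an immediate handshake
	return client.ConfigureDevice(device, wgtypes.Config{
		Peers: []wgtypes.PeerConfig{{
			PublicKey:                   pubKey,
			Endpoint:                    observed, // flyctl's addr:port from the sniffed packet
			AllowedIPs:                  []net.IPNet{*allowed},
			PersistentKeepaliveInterval: &keepalive,
		}},
	})
}

func main() {
	observed := &net.UDPAddr{IP: net.ParseIP("203.0.113.7"), Port: 54321} // placeholder
	if err := installPeerWithEndpoint("wg0", "BASE64_PUBLIC_KEY", "fdaa:0:1:2::3", observed); err != nil {
		log.Fatal(err)
	}
}
```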

Speedrun an app onto Fly.io and get your own JIT WireGuard peer ✨

Look at this graph

We’ve been running this in production for a few weeks and we’re feeling pretty happy about it. We went from thousands, or hundreds of thousands, of stale WireGuard peers on a gateway to what rounds to none. Gateways now hold a lot less state, are faster at setting up peers, and can be rebooted without having to wait for many unused peers to be loaded back into the kernel.

I’ll leave you with this happy Grafana chart from the day of the switchover.

a Grafana chart of 'kernel_stale_wg_peer_count' vs. time. For the first few hours, all traces are flat. Most are at values between 0 and 50,000 and the top-most is just under 550,000. Towards the end of the graph, each line in turn jumps sharply down to the bottom, and at the end of the chart all datapoints are indistinguishable from 0.

Editor’s note: Despite our tearful protests, Lillian has decided to move on from Fly.io to explore new pursuits. We wish her much success and happiness! ✨


