Automated Certificate Management is a great idea.
One of my early decisions was to Build ACME Support into IO to automatically generate SSL certificates for HTTPS hosts. This supercharged IO as the gateway to my homelab and to droplets that host multiple HTTPS services.
ACME is revolutionary but by now, many people have forgotten why. Before ACME, anyone running an HTTPS server had to pay a certificate authority for their certificates, and could easily find themselves spending hundreds of dollars a year for just a few sites. Adding injury to insult, providers had their own specific processes, making certificate generation largely manual and tedious. ACME makes that all unnecessary.
Initially I used a third-party Go library that provided ACME support. The library was a large project with nearly 10k GitHub stars and over 1000 forks. But from my perspective, it did too much, making it difficult to see the low-level details of the ACME protocol and to have the confidence to add features like automatic certificate renewal. Also, IO’s architecture preferred running the ACME challenge server on a Linux abstract socket, which required forking at least a few files in the project.
Bringing it internal.
In keeping with Minimize Dependencies, I recently rewrote IO’s ACME support to eliminate this dependency.
Rebuilding certificate generation directly from RFC 8555 took between three and four working days, and generally followed a process that I’ve used many times:
- first build test commands that exercise each step in the process,
- then factor helpers out of those test commands,
- and then organize calls to these functions into an automated sequence.
Intermediate state is stored in IO’s database, and the result is both observable and easily modifiable. One expected future addition will be to use DNS-01 challenges to get wildcard certificates.
The rewrite eliminated a transitive dependency on go-jose, which helped maintain my decision to use one JWT library, the excellent github.com/lestrrat-go/jwx/v3.
RFC 8555 seemed a bit weird to me.
I spent an evening ranting about my experience to an IETF veteran, and while I’m not sure enough to say that any of my impressions are correct, here are some things about RFC 8555 that I found weird.
Request authorization seems unconventional.
Requests are authorized with a JWS in the POST body. As a result, you don’t use the HTTP GET verb; you use POST-as-GET instead, which constantly had me wondering “why not GET-as-POST?” Stepping back from that, why not put authorization tokens in headers? ACME was first drafted in 2015, and API practice has grown a lot since then.
I wish they hadn’t used HATEOAS.
ACME uses HATEOAS. The first request any client makes to an ACME server is a “directory” request that returns a JSON structure containing URLs for the methods of the API.
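As a rough illustration of that directory structure, the sketch below decodes one into a Go struct. The field names come from RFC 8555, Section 7.1.1; the `directory` type, the `parseDirectory` helper, and the sample URLs are assumptions made for this example, and a real client would fetch the JSON from its CA’s directory URL rather than from a constant.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// directory holds the resource URLs that an ACME directory object advertises.
// The optional "meta" object is ignored in this sketch.
type directory struct {
	NewNonce   string `json:"newNonce"`
	NewAccount string `json:"newAccount"`
	NewOrder   string `json:"newOrder"`
	NewAuthz   string `json:"newAuthz"`
	RevokeCert string `json:"revokeCert"`
	KeyChange  string `json:"keyChange"`
}

// sampleDirectory is made-up data in the shape of a directory response.
const sampleDirectory = `{
  "newNonce": "https://acme.example/acme/new-nonce",
  "newAccount": "https://acme.example/acme/new-account",
  "newOrder": "https://acme.example/acme/new-order",
  "revokeCert": "https://acme.example/acme/revoke-cert",
  "keyChange": "https://acme.example/acme/key-change"
}`

// parseDirectory (hypothetical helper) decodes a directory and checks that
// the URLs this sketch would need to request a certificate are present.
func parseDirectory(data []byte) (directory, error) {
	var d directory
	if err := json.Unmarshal(data, &d); err != nil {
		return directory{}, err
	}
	if d.NewNonce == "" || d.NewAccount == "" || d.NewOrder == "" {
		return directory{}, errors.New("directory is missing newNonce, newAccount, or newOrder")
	}
	return d, nil
}

func main() {
	d, err := parseDirectory([]byte(sampleDirectory))
	if err != nil {
		panic(err)
	}
	// Every later request uses URLs discovered here, not hard-coded paths.
	fmt.Println("new orders go to:", d.NewOrder)
}
```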
This might seem like a good idea early on because it underspecifies an API, leaving API providers free to make changes.
But underspecifying an API makes it harder to observe and manage the API. For example, it makes it a hassle to introduce a calling proxy (you would have to rewrite URLs to replace the host with your proxy). I also found myself chasing keywords around the RFC where a list of API methods might have been more helpful for understanding everything.
My client library did too much.
ACME requires HTTPS, which is great, but the client library that I originally used enforced this on the client side. That’s unnecessary and brittle, which I discovered when I naively tried pointing it to IO instead so that I could observe ACME traffic (before I found that ACME used HATEOAS).
LetsEncrypt doesn’t conform to RFC8555.
I was particularly tripped up by LetsEncrypt’s lack of support for the account object’s “orders” field. Apparently the only way to see the orders associated with an account is to keep track of them on the client side. Unfortunately, LetsEncrypt doesn’t think fixing this is a priority.
http-01 challenge validation seems weak.
The http-01 challenge uses a random token (RFC 8555, Section 11.3) so that it’s “more difficult for ACME clients to implement a ‘naive’ validation server that automatically replies to challenges without being configured per challenge”. But challenges are handled by simply replying to HTTP requests that have the token in their path with the token plus a thumbprint of the client’s public key, and as the RFC notes,
“because the token appears both in the request sent by the ACME server and in the key authorization in the response, it is possible to build clients that copy the token from request to response.”
In other words, I can just write a request handler that matches the challenge requests with /.well-known/acme-challenge/{TOKEN} and responds with $TOKEN.$THUMBPRINT. My handler doesn’t even need to know what the challenge tokens are; it can just return the ones it is given. That’s exactly what Section 11.3 seemed to want to prevent.
Hindsight is 20/20 and the Second System Effect is real.
At this point, I’m just making observations. My new ACME support seems to be working but I’m still a novice with it. Maybe I’ll show up later with more concrete suggestions. Until then, I won’t break something that’s working well enough.
Do you have experience with ACME and opinions about it? I’d value your thoughts.
