Down the Kamal rabbit holes

Multi-arch builds, GitHub Container Registry, custom TLS certificate, conflicts with foreman, ...

Sep 19, 2024

This is a post from my Tech Journal section, in which I explore technical topics and focus on how I overcome specific challenges.

OK, the title might be a bit “click-baity” 😅

The truth is, Kamal is super cool to set up!
I have never deployed an app from scratch so quickly.

I did end up spending time in four rabbit holes though, so I wanted to keep notes about what happened and how I overcame the issues I faced.

You know, just in case.

Multi-arch build issues

The first issue I encountered was somewhat of a tricky one, although the fix was simple in the end.

It's always like this, isn't it? 😅

After I set up Kamal for my project, I ran kamal server bootstrap followed by kamal deploy, as instructed.

Colorful logs started flowing on my screen and I sat back, amazed by the simplicity of it all.

The Docker build process started. Along came the bundling step, and gem names came trickling down the terminal.

And then, it hung. For a few minutes.
I thought it must be doing something big, and it was late, so I went to bed.

The next morning, it was still stuck on the same thing:

 DEBUG [0729c69a] 	#29 668.9 Bundled gems are installed into `/usr/local/bundle`
 DEBUG [0729c69a] 	#29 668.9 Post-install message from rubyzip:
 DEBUG [0729c69a] 	#29 668.9 RubyZip 3.0 is coming!
 DEBUG [0729c69a] 	#29 668.9 **********************
 DEBUG [0729c69a] 	#29 668.9 
 DEBUG [0729c69a] 	#29 668.9 The public API of some Rubyzip classes has been modernized to use named
 DEBUG [0729c69a] 	#29 668.9 parameters for optional arguments. Please check your usage of the
 DEBUG [0729c69a] 	#29 668.9 following classes:
 DEBUG [0729c69a] 	#29 668.9   * `Zip::File`
 DEBUG [0729c69a] 	#29 668.9   * `Zip::Entry`
 DEBUG [0729c69a] 	#29 668.9   * `Zip::InputStream`
 DEBUG [0729c69a] 	#29 668.9   * `Zip::OutputStream`
 DEBUG [0729c69a] 	#29 668.9 
 DEBUG [0729c69a] 	#29 668.9 Please ensure that your Gemfiles and .gemspecs are suitably restrictive
 DEBUG [0729c69a] 	#29 668.9 to avoid an unexpected breakage when 3.0 is released (e.g. ~> 2.3.0).
 DEBUG [0729c69a] 	#29 668.9 See https://github.com/rubyzip/rubyzip for details. The Changelog also
 DEBUG [0729c69a] 	#29 668.9 lists other enhancements and bugfixes that have been implemented since
 DEBUG [0729c69a] 	#29 668.9 version 2.3.0.

A second run led to the same issue.

So I started searching, and found a few threads about the problem. The one that led me to a solution is Discussion #747 on Kamal’s repository: Deploy stucks on message about RubyZip 3.0 coming.
Someone in there hints at the issue coming from Kamal building a cross-platform Docker image by default, targeted for both AMD64 (Intel/x86-64 compatible processors) and ARM64 (ARM architectures such as Ampere processors).
Apparently building for ARM64 from an Intel CPU tends to fail and hang, at least on a Mac.

Since I am building on an Intel Mac but also for an x86-64 server, I don't need to build for ARM. So I disabled the ARM build, and everything went smoothly after that!
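For reference, here is a minimal sketch of what disabling the multi-arch build can look like. It assumes Kamal 1.x, where the builder section of config/deploy.yml accepts a multiarch flag; other Kamal versions may configure the target architecture differently, so check the documentation for yours. The fragment is wrapped in a few lines of Ruby only so that it can be parsed and sanity-checked; in a real project it lives directly in deploy.yml.

```ruby
require "yaml"

# Assumed excerpt of config/deploy.yml for Kamal 1.x (not copied from my
# project): build a single-architecture image matching the build machine
# instead of the default multi-arch image.
MULTIARCH_DEPLOY_YML = <<~YAML
  builder:
    multiarch: false
YAML

BUILDER_CONFIG = YAML.safe_load(MULTIARCH_DEPLOY_YML).fetch("builder")

# false means "do not also build for the other architecture"
puts "multiarch build enabled? #{BUILDER_CONFIG["multiarch"]}"
```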

Everything?
Well, not exactly.

Using GHCR as Docker registry

As hinted above, Kamal uses Docker to build an app “artifact” and deploy it onto your servers. In the process, it needs to push the locally built image to a container registry, from which it can pull it again for deployment from your servers.

Since I'm using GitHub to host my code, I wanted to keep things simple and use their Container Registry feature as well, instead of Docker Hub.

It's pretty straightforward to set up Kamal for this, in the registry section of the deploy.yml file.

The KAMAL_REGISTRY_PASSWORD comes from your .env file at the root of your project.
For things to work with GitHub Container Registry, you have to use a new Classic Personal Access Token (PAT) as your password.

I did this, but unfortunately it did not work immediately: as soon as Kamal tried to push the built image, it failed with an error:

Error: unexpected status from POST request to https://ghcr.io/v2/techtrails/real-emails/blobs/uploads/: 403 Forbidden

Kamal does manage to log into the registry as its first step, so the error was a bit unsettling: why could it log in, but not push an image to my repository registry?

Since I'm not pushing to my personal GitHub account (https://github.com/olance) but to my Techtrails organization (https://github.com/techtrails-io), I thought it might be an issue with my PAT and the rights it had to push to my org.

I spent a lot of time trying different tokens and tweaking settings on GitHub.
To no avail.

The 403 error really misled me into looking for token issues, but it was actually much simpler: as you might have noticed, I was pushing to techtrails/real-emails, whereas my GitHub org name is techtrails-io.

Yeah, techtrails was already taken :(

Kamal generates the push URL from the image setting in the deploy.yml file, which I had set to techtrails/real-emails.

So I changed techtrails into techtrails-io, and it worked!
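To make the mismatch concrete, here is a small sketch of how the push target follows from the configuration. The YAML is an assumed Kamal 1.x excerpt (the username is a placeholder), and push_repository and image_owner are hypothetical helpers written for this illustration, not Kamal's own code.

```ruby
require "yaml"

# Assumed deploy.yml excerpt; "some-github-user" is a placeholder.
GHCR_DEPLOY_YML = <<~YAML
  image: techtrails/real-emails
  registry:
    server: ghcr.io
    username: some-github-user
    password:
      - KAMAL_REGISTRY_PASSWORD
YAML

# Hypothetical helper: the repository receiving the push is the registry
# server followed by the image name.
def push_repository(config)
  [config.dig("registry", "server"), config.fetch("image")].compact.join("/")
end

# Hypothetical helper: the first path segment of the image is the GitHub
# account or organization the package is pushed to.
def image_owner(config)
  config.fetch("image").split("/").first
end

config = YAML.safe_load(GHCR_DEPLOY_YML)
puts push_repository(config)                 # ghcr.io/techtrails/real-emails
puts image_owner(config) == "techtrails-io"  # false: not my organization

fixed = config.merge("image" => "techtrails-io/real-emails")
puts push_repository(fixed)                  # ghcr.io/techtrails-io/real-emails
```

Logging in only checks the token, while the push is authorized against the owner named in the image path, which is consistent with a successful login followed by a 403.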

TLS and CloudFlare

At this point, I had my app successfully deployed to real-emails.com.
It's there, you can check!

There were a few things, SSL/TLS-wise, that were not 100% satisfying.

TLS termination point & redirection loop

I'm using CloudFlare as my DNS server, taking advantage of their Proxy feature that serves my site through their own infrastructure, allowing them to cache content at the edge, detect DDoS attacks, etc.

They also provide automated TLS certificates out-of-the-box, which is great.
To make this simple for everyone, the way it works is that their servers act as a TLS termination point.

A request coming from a client is encrypted only between the browser and CloudFlare’s servers. CloudFlare then relays the request to my servers in plain HTTP, without TLS encryption.

With a Rails application, this causes a redirection loop when you load your website, because Rails is configured to force SSL connections by default through its config.force_ssl setting.

So when CloudFlare relays the browser’s request to Rails as an HTTP request, Rails responds with an HTTP redirect from http://real-emails.com to https://real-emails.com.
The browser then loads the HTTPS URL, which CloudFlare again relays as plain HTTP, so Rails answers with yet another redirect… and thus the loop begins.
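The loop can be reproduced with a toy model. This is a simplified sketch, not real Rack or Rails code: app_response, through_proxy and browse are hypothetical helpers, and the model ignores headers such as X-Forwarded-Proto that real stacks can use to signal the original scheme.

```ruby
# Toy app: only looks at the scheme of the request it receives, the way a
# forced-SSL redirect would.
def app_response(scheme, host)
  return { status: 200 } if scheme == "https"

  { status: 301, location: "https://#{host}/" }
end

# The proxy terminates TLS and always talks plain HTTP to the origin,
# whatever scheme the browser used.
def through_proxy(_browser_scheme, host)
  app_response("http", host)
end

# Follow redirects like a browser would, giving up after max_hops.
def browse(url_scheme, host, max_hops: 5)
  hops = 0
  response = through_proxy(url_scheme, host)
  while response[:status] == 301 && hops < max_hops
    hops += 1
    url_scheme = response[:location][/\A\w+/] # scheme of the redirect target
    response = through_proxy(url_scheme, host)
  end
  { final_status: response[:status], redirects_followed: hops }
end

# Every hop ends in another 301, so the browser never gets a page.
p browse("https", "real-emails.com")
```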

My first fix to this issue was to turn config.force_ssl to false. Easy!

This, however, feels hacky and suboptimal. Plus it caused another issue that I forgot to note down.
And, I do want end-to-end encryption between the browser and my server.

So what I really want is …

CloudFlare “Full (strict)” encryption mode

In this mode, you get end-to-end encryption and a certificate provided by CloudFlare that allows them to certify they're talking to your origin server.

I think I'll write a specific tutorial to configure Kamal for this setup, but here’s the gist of it:

Store CloudFlare certificate on your server
Declare a Docker volume for the Traefik service to get access to this certificate and an accompanying TLS config file
Configure a TLS entryPoint on Traefik and route it to your app’s service

Traefik is the reverse proxy Kamal is built upon to route requests to your application. In CloudFlare’s Full (strict or not) encryption mode, it's Traefik that becomes the TLS termination point, right before Rails serves your app.

So what you have to do, as outlined above, is just to provide it with the right TLS certificate and be done.
It’s even easier if you don't aim for strict mode (but I thought, let's go all the way!) since Traefik can provision certificates for you with Let's Encrypt.

Anyway. The simple yet terrible mistake I made here comes from “Traefik” simply being the phonetic transcription of “traffic”: at some point, muscle memory made me write “traefic” in the Docker volume name configuration 🤦‍♂️

Only after I left the broken site for a night and came back to it with fresh eyes and a thorough review of Traefik logs (which only indicated it hadn't found any certificate) did I understand my mistake.

Bonus: conflict with foreman

As an added bonus, I finally faced a seemingly unrelated issue when I got back to developing locally.

Since I'm using TailwindCSS, I have the cssbundling gem installed to rebuild my styles as I add new Tailwind classes to my templates.
This is handled by a separate process, so I'm using the provided bin/dev command instead of running rails server.

bin/dev is provided by the gem, and simply uses foreman to run both rails server and rails tailwindcss:watch in parallel.

What happened after I had completed my Kamal setup is that:

running rails server would work just fine
running bin/dev would see the rails server process fail with an error: ActiveSupport::MessageEncryptor::InvalidMessage

This error generally occurs when Rails, upon booting, is not able to decrypt the credentials.yml.enc file that contains the secret_key_base token and other credentials you might have stored there.

It made no sense though, that it would work when run alone and fail when run through foreman.

To figure out what was going on, I edited Rails core files to place a debugger instruction right before the code that would throw the ActiveSupport::MessageEncryptor::InvalidMessage exception.

This allowed me to run rdbg (a command-line Ruby debugger) and attach to the process, which in turn let me inspect the content of live variables and see what the process was dealing with when it crashed.

The code I dove into is the decrypt_and_verify method of ActiveSupport::MessageEncryptor.

From decrypt_and_verify, in which the first catch_and_raise method was the one ultimately causing the crash, I could verify that the message argument contained the encrypted content of my development credentials.yml.enc file.
As I would expect in this context.

Later on, the read_message call leads to the decrypt method.
In there, I inspected the @secret instance variable, and found that it contained my production encryption key.

No surprise it couldn't correctly decrypt the file!
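The failure mode is easy to reproduce with Ruby's standard OpenSSL bindings. This is only a sketch: it assumes AES-128-GCM (which, to my knowledge, is the cipher Rails uses for encrypted credentials), the keys and the message are made up, and encrypt and decrypt are hypothetical helpers rather than the Rails implementation.

```ruby
require "openssl"

CREDENTIALS_CIPHER = "aes-128-gcm"

# Encrypt with an authenticated cipher; returns the pieces needed to decrypt.
def encrypt(plaintext, key)
  cipher = OpenSSL::Cipher.new(CREDENTIALS_CIPHER)
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  cipher.auth_data = ""
  data = cipher.update(plaintext) + cipher.final
  { data: data, iv: iv, tag: cipher.auth_tag }
end

# Decrypt and verify the authentication tag; raises if the key is wrong.
def decrypt(message, key)
  cipher = OpenSSL::Cipher.new(CREDENTIALS_CIPHER)
  cipher.decrypt
  cipher.key = key
  cipher.iv = message[:iv]
  cipher.auth_tag = message[:tag]
  cipher.auth_data = ""
  cipher.update(message[:data]) + cipher.final
end

DEV_KEY  = "d" * 16 # made-up 128-bit keys
PROD_KEY = "p" * 16

message = encrypt("secret_key_base: abc123", DEV_KEY)
puts decrypt(message, DEV_KEY)

begin
  decrypt(message, PROD_KEY)
rescue OpenSSL::Cipher::CipherError
  puts "wrong key: authentication failed"
end
```

Rails surfaces this kind of low-level failure as ActiveSupport::MessageEncryptor::InvalidMessage, which says nothing about which key was used.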

Now the question was: how was Rails, started in development mode, picking up the production key instead of the development one?

After some thinking and digging, I inspected the content of ENV["RAILS_MASTER_KEY"], which is one way for Rails to get the decryption key.
It was indeed set to the production key.

That's when I realized what was going on:

On one hand:
When you initialize Kamal, it creates a .env file into which you can provide environment variables for your deployment.
If you recall from the beginning of this article, this is where the KAMAL_REGISTRY_PASSWORD value comes from.

The other value that you need to put there is the RAILS_MASTER_KEY encryption key, so that Kamal can set it on the deployed container for Rails to boot correctly.

On the other hand:
Foreman, used in the bin/dev script, will by default load environment variables for the spawned processes from any .env file it can find in its working directory.
So it picks up the one from Kamal, and “corrupts” your local process with environment variables meant for production.
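Here is a rough sketch of that interaction. It assumes, as I understand Rails' behavior, that the RAILS_MASTER_KEY environment variable takes precedence over the key file on disk. parse_dotenv and resolve_master_key are hypothetical helpers, and all the values are made up.

```ruby
# Parse KEY=value lines the way a very simple .env loader would
# (simplified: no quoting or variable expansion).
def parse_dotenv(content)
  content.each_line.with_object({}) do |line, vars|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    name, value = line.split("=", 2)
    vars[name] = value
  end
end

# Assumed lookup order: the environment variable wins over the key file.
def resolve_master_key(env, key_file_content)
  env["RAILS_MASTER_KEY"] || key_file_content&.strip
end

# Made-up content of the .env file created for Kamal.
KAMAL_DOTENV = <<~DOTENV
  KAMAL_REGISTRY_PASSWORD=hypothetical-token
  RAILS_MASTER_KEY=production-key
DOTENV

DEV_KEY_FILE = "development-key\n"

plain_env   = {}                                      # rails server run directly
foreman_env = plain_env.merge(parse_dotenv(KAMAL_DOTENV)) # same, via foreman

puts resolve_master_key(plain_env, DEV_KEY_FILE)   # development-key
puts resolve_master_key(foreman_env, DEV_KEY_FILE) # production-key
```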

It's hard to tell whether Kamal or Foreman is at fault here, but once I understood the issue I quickly found a PR on GitHub that addressed the problem: Disable loading .env file when running foreman for development.

Simply tell foreman to load environment variables from /dev/null (or another arbitrary file if needed) in the bin/dev script, by passing it the --env /dev/null option.
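The generated bin/dev is usually a small shell script, so take the following as a Ruby rendering of the same idea rather than the actual fix from the PR. It assumes the process file is named Procfile.dev, and foreman_command is a hypothetical helper.

```ruby
# Build the foreman command line. "--env /dev/null" points foreman at an
# empty environment file instead of the project's .env.
def foreman_command(extra_args = [])
  ["foreman", "start", "-f", "Procfile.dev", "--env", "/dev/null", *extra_args]
end

# Print the command for inspection. In an actual bin/dev script, the last
# line would instead be:
#   exec(*foreman_command(ARGV))
puts foreman_command.join(" ")
```

If some variables are still needed in development, point --env to a dedicated file instead of /dev/null.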

The end.

Thanks for reading it all! If you liked this post, please share it.

Techtrails