
Why you need a Raspberry Pi 5 in your homelab if you're building ARM software


I've been working on ARM images lately. These aren't application containers but distro-style images built from ubuntu:22.04: they install large package sets, run post-install scripts, enable services, and pull in NVIDIA repositories. It's much closer to OS assembly than to a typical app container.

I wasn't surprised that building these images natively on ARM would be faster than doing it from an x86 machine using emulation. That part was expected.

What surprised me was how much faster it was.

Seeing a Raspberry Pi 5 consistently outperform a 3-year-old high-end Intel laptop by almost 2x forced me to re-evaluate some assumptions I had about build performance, hardware specs, and what "powerful enough" actually means when you're building ARM software.

The setup

I ran the same Docker-based build targeting linux/arm64 on two machines.
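
Concretely, the invocation on both machines was roughly the following; the image tag here is a placeholder, not the real project name:

```bash
# Same command on both hosts; only the underlying architecture differs.
docker buildx build --platform linux/arm64 -t myimage:arm64 --load .
```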

| Spec | Raspberry Pi 5 (native ARM) | Intel laptop (ARM via emulation) |
| --- | --- | --- |
| Architecture | aarch64 | x86_64 |
| CPU | Cortex-A76 (4 cores @ 2.4 GHz) | Intel i7-1280P (14 cores / 20 threads, up to 4.8 GHz) |
| RAM | ~8 GB (no swap) | 62 GB (+ swap) |
| Disk | ext4 on /dev/sda2 (USB SSD, not microSD) | NVMe, btrfs |
| Kernel | 6.17.x (Ubuntu raspi kernel) | 6.17.x |
| Docker | Engine 28.x (linux/arm64) | building linux/arm64 via QEMU (binfmt_misc) |

On paper, the Intel laptop should dominate: it has about 3.5x more CPU cores (14 vs 4) with much higher boost clocks, roughly 8x the RAM (62 GB vs ~8 GB), and an NVMe drive that is typically much faster than a USB SSD in both throughput and latency.

On the Intel system, ARM execution is handled through Docker BuildKit with binfmt_misc enabled using tonistiigi/binfmt, which registers qemu-aarch64 at the kernel level. No static QEMU binary is copied into the image; emulation happens transparently during the build.
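
For completeness, the registration itself is a one-liner, as documented in the tonistiigi/binfmt README:

```bash
# Register QEMU handlers for arm64 in the kernel's binfmt_misc table
docker run --privileged --rm tonistiigi/binfmt --install arm64

# Verify: the entry should report "enabled" and point at qemu-aarch64
cat /proc/sys/fs/binfmt_misc/qemu-aarch64
```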

The Dockerfile

The Dockerfile is not compiling application code. It performs a lot of OS-level work:

  • apt-get update
  • installing many packages
  • running post-install scripts
  • enabling system services
  • generating initramfs/dracut bits
  • pulling NVIDIA repositories

In other words: this is much closer to assembling a small Linux distribution than building a typical container image.
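
To make that concrete, here is a minimal sketch of the shape of such a Dockerfile. The package names, service names, and repository URL below are illustrative assumptions, not the actual build:

```dockerfile
# Illustrative sketch -- package names and the repo URL are placeholders.
FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

# Large package sets: every dpkg postinst script runs inside the build,
# and under QEMU each of those short-lived processes is emulated.
RUN apt-get update && apt-get install -y \
        curl systemd dracut-core grub-efi-arm64 linux-image-generic \
    && rm -rf /var/lib/apt/lists/*

# Enable system services in the resulting OS image
RUN systemctl enable systemd-networkd systemd-resolved

# Pull an external repository key (NVIDIA in the real build) and
# regenerate the initramfs for the installed kernel
RUN curl -fsSL https://example.com/nvidia/gpgkey -o /etc/apt/trusted.gpg.d/nvidia.asc \
    && dracut --force --regenerate-all
```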

If you want to see the exact build steps, the Dockerfile is here.

Results

Using the exact same Dockerfile and target architecture:

  • Raspberry Pi 5: real 7m35s
  • Intel laptop: real 14m31s

Despite having dramatically more CPU power, more memory, and faster storage, the Intel machine still took almost twice as long. For transparency, I ran the build once on the Raspberry Pi 5 and a couple of times on the Intel machine; the Intel timings were consistent enough that cold vs warm effects did not change the overall outcome.

Why this happens

The decisive difference isn't raw performance; it's architecture alignment.

On the Raspberry Pi, everything runs natively. ARM binaries execute directly on an ARM CPU.

On the Intel machine, all ARM binaries run through QEMU user-mode emulation. That means every instruction has to be translated before it can execute.

This build workload is particularly unfriendly to emulation:

  • heavy apt / dpkg usage
  • many short-lived processes
  • lots of filesystem operations
  • syscall-heavy post-install scripts
  • very little meaningful parallelism

This isn't a compute-bound workload where faster clocks and more cores help. It's dominated by overhead, and emulation multiplies that cost.
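
You can feel this yourself on any x86 host with binfmt set up: spawning many short-lived processes under emulation is drastically slower than the same loop running natively. A rough, unscientific probe:

```bash
# Native on an x86 host: hundreds of short-lived processes
time docker run --rm --platform linux/amd64 ubuntu:22.04 \
    bash -c 'for i in $(seq 1 500); do /bin/true; done'

# Emulated: the same loop, but every process launch goes through qemu-aarch64
time docker run --rm --platform linux/arm64 ubuntu:22.04 \
    bash -c 'for i in $(seq 1 500); do /bin/true; done'
```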

As a result, even a much more powerful x86 system can lose badly to a modest ARM system when the workload is OS-heavy.

The part that really matters: cost vs time

A Raspberry Pi 5 costs roughly EUR 80-100, depending on the RAM configuration and availability.

That's not "cheap" in an absolute sense, but in practice it's a bargain.

If that machine saves me even a few minutes per build -- multiplied across many iterations while working on images -- it pays for itself very quickly. Not in hardware terms, but in focus, iteration speed, and reduced friction.

Instead of waiting 15 minutes for a build to finish on my laptop under emulation, I can let a small ARM box do the work in half the time, quietly, in the background.

For the type of work I'm doing, that trade-off is an easy decision.

This applies even more to CI pipelines

The same logic applies to pipelines, arguably even more so.

For Kairos, we run our ARM builds on native ARM runners, which GitHub provides for free. Moving away from emulated ARM builds has significantly reduced how long our pipelines run.

The impact is very noticeable:

  • faster feedback loops
  • less wasted CI time
  • fewer flaky or timing-sensitive failures
  • lower overall pipeline cost

When your builds are dominated by package installation, system initialization, and OS-level steps, native execution isn't an optimization; it's the correct architectural choice.

Side note: about Docker "cache" confusion

While running these experiments, I repeatedly saw output like:

```
CACHED FROM ubuntu:22.04@sha256:...
```

even when using --no-cache and --pull.

What's happening here is subtle:

  • --no-cache disables Docker build step caching
  • it does not clear BuildKit's internal content store
  • base image layers and remote ADD blobs can still be reused by digest

If you really want to start from a clean slate, you need to clear BuildKit's cache explicitly:

```bash
docker builder prune -a --force
```

This removes cached content stored by BuildKit itself, not just images visible via docker images.

It's a blunt tool, but useful when you're trying to reason about cold-build performance.
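
Putting it together, a reasonable cold-build sequence for this kind of timing experiment looks something like this (the tag is again a placeholder):

```bash
# Drop BuildKit's content store, then force a fresh pull and an uncached build
docker builder prune -a --force
docker buildx build --no-cache --pull --platform linux/arm64 -t myimage:arm64 .
```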

What I took away from this

I didn't learn that native ARM is faster -- I already knew that.

What I learned is that architecture alignment matters far more than raw hardware specs for certain workloads.

When your build process is dominated by package managers, system initialization, and distribution-level tooling, native execution can easily outperform much stronger hardware running under emulation.

The Raspberry Pi 5 didn't win because it's fast.

It won because it speaks the right language.