TLDR: --mount=type=cache makes RUN layers non-deterministic. On ephemeral runners the mount is always empty, so BuildKit can’t match layers from registry cache. Removing cache mounts dropped builds from ~27 min to ~2 min.


I’d been ignoring slow Docker builds on a project for a while — around 27 minutes per build on ephemeral GCP runners, most of that spent in uv sync downloading Python dependencies from scratch. Every single build. Even though BuildKit caching was configured.

The runners are ephemeral VMs — created for each job, then destroyed. No persistent BuildKit daemon between builds. The Dockerfiles used cache mounts for package managers:

RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen

And the workflow used GitHub Actions cache as the BuildKit backend:

cache-from: type=gha,scope=buildkit-${{ github.ref_name }}
cache-to: type=gha,scope=buildkit-${{ github.ref_name }},mode=max

This is the pattern you’ll find in most “optimise your Docker builds” guides. On ephemeral CI runners, it’s worse than useless.

Why cache mounts break registry caching

--mount=type=cache attaches a persistent cache directory to a RUN step so package managers can reuse downloaded files. But the layer’s cache key includes the state of the mount, making it non-deterministic — the layer identity differs between machines.

On ephemeral runners the mount is always empty, so it provides zero benefit. The real damage is on the cache-matching side: when BuildKit pulls cache from a registry, it matches layers by content hash. A layer with --mount=type=cache can’t be matched this way because the mount state is part of its identity. So even with a warm registry cache, every RUN --mount=type=cache layer misses and runs from scratch.

The lockfile hadn’t changed, the dependencies hadn’t changed, but BuildKit couldn’t match the layer because the cache mount made it non-deterministic.
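
You can see the miss locally. With a warm registry cache imported, a deterministic layer prints CACHED in the build output, while a cache-mount layer reruns (cache ref illustrative — substitute your own tag):

docker buildx build \
  --cache-from type=registry,ref=europe-docker.pkg.dev/my-project/apps/myapp:buildcache-app-staging \
  .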

The fix: remove cache mounts

# Before — non-deterministic
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen

# After — deterministic, cacheable by content hash
RUN uv sync --frozen

Without the cache mount, the layer is fully determined by its inputs. If uv.lock hasn’t changed, BuildKit matches from registry cache and skips it entirely. Same applies to npm, pnpm, etc.
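
For this to pay off, the lockfile should be the layer’s only input. A minimal sketch of the ordering that relies on (file names assume a standard uv project; --no-install-project splits dependency install from project install):

# Dependency layer: its only inputs are the two manifest files
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-install-project

# Source layer: changes here no longer invalidate the dependency layer
COPY . .
RUN uv sync --frozen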

I kept apt cache mounts though — system packages change rarely and those layers typically get cached at a higher level anyway.
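
For reference, a typical apt cache-mount layer looks like this (package name illustrative; the rm disables Debian’s docker-clean hook so downloaded packages actually persist in the mount):

RUN rm -f /etc/apt/apt.conf.d/docker-clean
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends curl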

Switch to registry cache

I also switched from GHA cache to a container registry on the same cloud (in my case Google Artifact Registry):

  1. Network locality. Pulling cache from the same cloud is significantly faster than GHA cache, which stores data in Azure blob storage — every read/write crosses cloud boundaries.
  2. Size limits. GHA cache is limited to 10 GB per repo. With multiple branches and images, older entries get evicted and you’re back to cold builds.
The workflow config:

cache-from: |
  type=registry,ref=europe-docker.pkg.dev/my-project/apps/myapp:buildcache-app-${{ steps.cache.outputs.branch }}
  type=registry,ref=europe-docker.pkg.dev/my-project/apps/myapp:buildcache-app-staging
cache-to: ${{ steps.cache.outputs.app }}

The fallback to the staging cache means new feature branches get a warm start rather than building cold. If your runners are on AWS, same idea with ECR.
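
A sketch of the ECR variant (account ID and region illustrative; ECR needs the image-manifest and oci-mediatypes attributes to accept cache manifests):

cache-from: |
  type=registry,ref=123456789012.dkr.ecr.eu-west-1.amazonaws.com/myapp:buildcache-app-staging
cache-to: type=registry,ref=123456789012.dkr.ecr.eu-west-1.amazonaws.com/myapp:buildcache-app-staging,mode=max,image-manifest=true,oci-mediatypes=true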

Gate cache exports to deploy branches

Writing cache from every feature branch pollutes the registry. I’d recommend only exporting on deploy branches:

- name: Resolve cache config
  id: cache
  run: |
    # A step can't read its own outputs via steps.*, so reuse the shell variable
    branch="${GITHUB_REF_NAME//\//-}"
    echo "branch=$branch" >> "$GITHUB_OUTPUT"
    if [[ " master staging release " == *" $GITHUB_REF_NAME "* ]]; then
      echo "app=type=registry,ref=europe-docker.pkg.dev/my-project/apps/myapp:buildcache-app-$branch,mode=max" >> "$GITHUB_OUTPUT"
    fi

When steps.cache.outputs.app is empty, the cache-to input is empty and the build step skips the export entirely. Feature branches still read from cache — they just don’t write back.
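
For completeness, a sketch of how these outputs feed the build step via docker/build-push-action (image tags illustrative):

- name: Build and push
  uses: docker/build-push-action@v6
  with:
    push: true
    tags: europe-docker.pkg.dev/my-project/apps/myapp:${{ github.sha }}
    cache-from: |
      type=registry,ref=europe-docker.pkg.dev/my-project/apps/myapp:buildcache-app-${{ steps.cache.outputs.branch }}
      type=registry,ref=europe-docker.pkg.dev/my-project/apps/myapp:buildcache-app-staging
    cache-to: ${{ steps.cache.outputs.app }}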

Mirror base images

Ephemeral runners hit Docker Hub’s anonymous pull rate limit fast. Each runner is a fresh VM with no auth tokens. I was seeing 429 Too Many Requests regularly.

The fix: mirror base images to your own registry and pass them as build args.

ARG base_image='python:3.12-slim'
FROM ${base_image} AS base

And in the workflow:

build-args: |
  base_image=europe-docker.pkg.dev/my-project/public/python:3.12-slim

The ARG default means local builds still pull from Docker Hub, which is fine for development.
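
Mirroring itself can be a one-off or scheduled job. A sketch using crane, which copies the full multi-arch manifest (destination path illustrative, registry auth assumed):

crane copy python:3.12-slim europe-docker.pkg.dev/my-project/public/python:3.12-slim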

Results

                       Before          After
Build time             ~27 min         ~2 min
Rate limit failures    Frequent        None
Cache exports          Every branch    Deploy branches only
