I’d been using Horizontal Pod Autoscalers (HPAs) for a while without really understanding them. They worked, so I never looked closely. Then I noticed a service was running way more replicas than the load justified — and I realised I didn’t actually know how to read what HPA was doing or why. Here’s what I learned debugging it.

What I saw

kubectl get hpa
NAME     REFERENCE           TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
my-app   Deployment/my-app   124%/70%   1         10        6          45m

Six replicas for a service that barely had any traffic. The utilisation was showing 124%, which seemed wrong — the pods weren’t under heavy load at all.

Reading describe output

The real debugging tool is kubectl describe hpa. Most of the useful information is in the Conditions and Events sections at the bottom:

kubectl describe hpa my-app
Metrics:
  Resource  cpu on pods  (as a percentage of request):  124% (62m) / 70%

Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully
                                            calculate a replica count from cpu
                                            resource utilization (percentage of request)
  ScalingLimited  True    TooManyReplicas   the desired replica count is more than
                                            the maximum replica count

Everything looked healthy — ScalingActive: True, no errors. But that Metrics line told the story: 62m actual CPU, reported as 124% of request. The pods were using 62 millicores of CPU, which is nothing. So why was that 124%?

Because the CPU request was set to 50m. HPA calculates utilisation as actual usage divided by requested: 62m / 50m = 124%. The service looked overloaded on paper because the requests were too low, not because it was actually busy.
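For context, the HPA itself looked roughly like the sketch below (an assumption on my part: the autoscaling/v2 API, with the name, min/max, and 70% target taken from the kubectl output above). HPA's documented formula is desiredReplicas = ceil(currentReplicas * currentUtilisation / targetUtilisation), so at 124% against a 70% target it keeps recommending more replicas until the ratio drops below the target or it hits maxReplicas.

# sketch of the HPA, assuming the autoscaling/v2 API
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # percentage of the pod's CPU request, not of a core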

The fix

The CPU request needed to reflect what the service actually uses at baseline:

# before - request too low, inflates utilisation
resources:
  requests:
    cpu: 50m
    memory: 256Mi
  limits:
    memory: 256Mi

# after - request matches realistic baseline usage
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    memory: 256Mi

With a 200m request and 62m actual usage, utilisation drops to 31% — well under the 70% target. HPA scales back down to the minimum:

NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-app   Deployment/my-app   31%/70%   1         10        1          52m

The key insight: HPA doesn’t know how busy your service is. It only knows what percentage of its request is being used. If your requests are wrong, HPA’s scaling decisions will be wrong too. Getting resource requests right is a prerequisite for sensible autoscaling.

Other things that tripped me up

These are gotchas I ran into over time as I set up HPAs on more services:

<unknown> metrics. If kubectl get hpa shows <unknown>/70% instead of a number, HPA can’t read metrics at all. The most common cause is pods with no CPU requests defined — HPA has nothing to divide by. Less commonly, metrics-server isn’t running. Check with kubectl top pods; if that fails, metrics-server is the problem.

Target not found. HPA targets a specific deployment by name. If the name in your HPA spec doesn’t match the actual deployment name (easy to get wrong with Helm-templated names), you’ll see AbleToScale: False with reason FailedGetScale. Double-check with kubectl get deployments and compare.
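The place this goes wrong is the scaleTargetRef block. A minimal sketch, with a hypothetical Helm-rendered name in the comment:

# scaleTargetRef must name the Deployment exactly as it exists in the cluster
scaleTargetRef:
  apiVersion: apps/v1
  kind: Deployment
  name: my-app  # compare against `kubectl get deployments`; a Helm chart may have
                # rendered it with a release prefix, e.g. my-release-my-app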

Stabilisation window confusion. HPA won’t scale down immediately after scaling up. The default scale-down stabilisation window is 5 minutes, meaning it picks the highest recommendation from the last 5 minutes before scaling down. This is intentional — it prevents flapping — but it confused me when I expected instant response. You can tune this with behavior.scaleDown.stabilizationWindowSeconds if needed.
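If you genuinely need faster scale-down, the behavior block is where that lives. A minimal sketch with an illustrative 60-second window:

# sketch: shorten the scale-down stabilisation window (autoscaling/v2)
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60  # default is 300 (5 minutes)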

Min equals max. If you set minReplicas: 2 and maxReplicas: 2, HPA has nowhere to go. I’ve done this accidentally when copying values between environments. It doesn’t error — it just silently does nothing.

Further reading