I’d been using HPAs for a while without really understanding them. They worked, so I never looked closely. Then I noticed a service was running way more replicas than the load justified — and I realised I didn’t actually know how to read what HPA was doing or why. Here’s what I learned debugging it.
What I saw
kubectl get hpa
NAME     REFERENCE           TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
my-app   Deployment/my-app   124%/70%   1         10        6          45m
Six replicas for a service that barely had any traffic. The utilisation was showing 124%, which seemed wrong — the pods weren’t under heavy load at all.
Reading describe output
The real debugging tool is kubectl describe hpa. Most of the useful information is in the Metrics, Conditions and Events sections at the bottom:
kubectl describe hpa my-app
Metrics:
  Resource cpu on pods (as a percentage of request):  124% (62m) / 70%
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooManyReplicas   the desired replica count is more than the maximum replica count
Everything looked healthy on the surface: ScalingActive was True and there were no errors. If anything, the ScalingLimited condition showed the HPA wanted even more than the ten-replica maximum. But the Metrics line told the story: 62 millicores of actual CPU, reported as 124% of request. That's next to no load, so why was it reading 124%?
Because the CPU request was set to 50m. HPA calculates utilisation as actual usage divided by requested: 62m / 50m = 124%. The service looked overloaded on paper because the requests were too low, not because it was actually busy.
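That same maths explains the ScalingLimited condition above. The HPA algorithm (per the Kubernetes docs) is desiredReplicas = ceil(currentReplicas × currentUtilisation / targetUtilisation), so with six replicas already running:
desired = ceil(6 * 124 / 70)
        = ceil(10.63)
        = 11        # more than maxReplicas (10), hence "TooManyReplicas"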
The fix
The CPU request needed to reflect what the service actually uses at baseline:
# before - request too low, inflates utilisation
resources:
  requests:
    cpu: 50m
    memory: 256Mi
  limits:
    memory: 256Mi

# after - request matches realistic baseline usage
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    memory: 256Mi
With a 200m request and 62m actual usage, utilisation drops to 31% — well under the 70% target. HPA scales back down to the minimum:
NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-app   Deployment/my-app   31%/70%   1         10        1         52m
The key insight: HPA doesn’t know how busy your service is. It only knows what percentage of its request is being used. If your requests are wrong, HPA’s scaling decisions will be wrong too. Getting resource requests right is a prerequisite for sensible autoscaling.
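For completeness, here's roughly what an HPA producing the output above looks like. This is a minimal autoscaling/v2 sketch matching the numbers shown (70% CPU target, min 1, max 10, targeting Deployment/my-app), not the exact manifest from my cluster:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percentage of the pod's CPU request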
Other things that tripped me up
These are gotchas I ran into over time as I set up HPAs on more services:
<unknown> metrics. If kubectl get hpa shows <unknown>/70% instead of a number, HPA can’t read metrics at all. The most common cause is pods with no CPU requests defined — HPA has nothing to divide by. Less commonly, metrics-server isn’t running. Check with kubectl top pods; if that fails, metrics-server is the problem.
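A quick way to narrow it down (assuming metrics-server is installed the standard way in kube-system):
# does the metrics pipeline work at all?
kubectl top pods
# is the metrics API registered and available?
kubectl get apiservice v1beta1.metrics.k8s.io
# is metrics-server itself running?
kubectl -n kube-system get pods -l k8s-app=metrics-server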
Target not found. HPA targets a specific deployment by name. If the name in your HPA spec doesn’t match the actual deployment name (easy to get wrong with Helm template names), you’ll see ScalingActive: False with reason FailedGetScale. Double-check with kubectl get deployments and compare.
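To compare the two quickly (assuming the HPA is also named my-app):
# the name the HPA is trying to scale
kubectl get hpa my-app -o jsonpath='{.spec.scaleTargetRef.name}'
# the deployments that actually exist
kubectl get deployments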
Stabilisation window confusion. HPA won’t scale down immediately after scaling up. The default scale-down stabilisation window is 5 minutes, meaning it picks the highest recommendation from the last 5 minutes before scaling down. This is intentional — it prevents flapping — but it confused me when I expected instant response. You can tune this with behavior.scaleDown.stabilizationWindowSeconds if needed.
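For example, shortening the window to 60 seconds looks like this under the HPA spec (a sketch; the default is usually fine):
behavior:
  scaleDown:
    stabilizationWindowSeconds: 60   # default is 300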
Min equals max. If you set minReplicas: 2 and maxReplicas: 2, HPA has nowhere to go. I’ve done this accidentally when copying values between environments. It doesn’t error — it just silently does nothing.
Further reading
- HPA algorithm details — how scaling decisions are calculated
- Running HPA and VPA together — scaling horizontally and vertically without conflicts
- GKE cluster autoscaler — node-level autoscaling that complements HPA
- Monitoring with watch and top — monitoring commands for live debugging