OpenClaw: when a heartbeat session keeps your old model alive

TL;DR

My OpenClaw gateway kept calling an LLM provider I had retired. The new primary in openclaw.json was ignored because the long-running “heartbeat” session pinned the model selection at session creation time, both in the session index and in a model_change event at the top of the session transcript. Resetting the session let the global config take effect.

Motivation

I noticed my self-hosted OpenClaw gateway was still hitting an LLM provider I had switched away from days earlier. openclaw.json had a different primary model and no fallbacks, but the metered usage disagreed. Each “heartbeat” tick cost close to ten cents and shipped 370k cached input tokens to the wrong provider.

What I expected

When I edit agents.defaults.model.primary in openclaw.json and reload, the next agent turn should use the new primary. Sessions created earlier should re-resolve the primary on each tick rather than hold onto whatever was current at creation time.

What was actually happening

OpenClaw stores per-session-key state in ~/.openclaw/agents/main/sessions/sessions.json, where each entry caches the resolved modelProvider and model. The first event written to the session transcript (<id>.jsonl) is also a model_change event with the same provider and model. Both were stamped weeks ago, when the old primary was still in place.

On every heartbeat tick the runtime:

1. Looks up the session for agent:main:main in sessions.json.
2. Replays the existing transcript as context, including the leading model_change event.
3. Uses the cached model from that event for the LLM call.

Nothing in this path re-reads openclaw.json. Edits only apply to new sessions.
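A toy model of the difference between the two behaviours, not OpenClaw's actual code. The function names are hypothetical, and the "provider/model" string format for the primary is an assumption made only for this illustration; the modelProvider and model fields are the ones cached in sessions.json.

```python
# Toy illustration only: these functions are hypothetical, not OpenClaw internals.

def pinned_model(session_entry: dict) -> str:
    """Observed behaviour: trust whatever the session cached at creation time."""
    return f'{session_entry["modelProvider"]}/{session_entry["model"]}'


def reresolved_model(config: dict) -> str:
    """Expected behaviour: read the primary from the config on every tick."""
    return config["agents"]["defaults"]["model"]["primary"]


# Assumed shapes: a config with a new primary and a session entry stamped earlier.
config = {"agents": {"defaults": {"model": {"primary": "new-provider/new-model"}}}}
entry = {"modelProvider": "old-provider", "model": "old-model"}

print(pinned_model(entry))       # old-provider/old-model
print(reresolved_model(config))  # new-provider/new-model
```

As long as the session entry exists, the first function never sees the config, which is why editing openclaw.json alone changed nothing.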

The bonus problem hiding underneath: the heartbeat session had compactionCount: 0 and a 2.27 MB transcript covering nine days of HEARTBEAT_OK ticks. Each tick replayed the whole thing on top of a 47k-character system prompt. Cache hits softened the bill, but I was paying for context I did not need on a session that should have stayed light.
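A small sketch for spotting oversized transcripts like this one. It only assumes that transcripts are *.jsonl files in the sessions directory, as described above; the function name is made up for the example.

```python
from pathlib import Path


def largest_transcripts(sessions_dir, top=5):
    """Return (size_in_bytes, path) pairs for the biggest *.jsonl files, largest first."""
    directory = Path(sessions_dir).expanduser()
    sized = [(p.stat().st_size, p) for p in directory.glob("*.jsonl")]
    return sorted(sized, key=lambda pair: pair[0], reverse=True)[:top]


# Example usage:
# for size, path in largest_transcripts("~/.openclaw/agents/main/sessions"):
#     print(f"{size / 1_000_000:.2f} MB  {path.name}")
```

A heartbeat transcript measured in megabytes is a hint that the session is being replayed in full on every tick.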

The fix

Two-step reset on the host:

# 1. Move the pinned transcript aside so the next tick has to start fresh
ts=$(date -u +%Y-%m-%dT%H-%M-%SZ)
sid='<session-id>'  # replace with the real session id; a bare <...> would be parsed as a redirect
mv ~/.openclaw/agents/main/sessions/"$sid".jsonl{,.reset.$ts}

# 2. Drop the cached model from the session index
python3 - <<'EOF'
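import json, os, shutil, time
# Resolve the path relative to the current user's home directory
p = os.path.expanduser("~/.openclaw/agents/main/sessions/sessions.json")
shutil.copy(p, f"{p}.bak.{int(time.time())}")  # timestamped backup first
with open(p) as f:
    d = json.load(f)
# Removing the whole entry also drops its cached modelProvider/model
d.pop("agent:main:main", None)
with open(p, "w") as f:
    json.dump(d, f, indent=2)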
EOF

The next heartbeat created a fresh session against the current config. Bootstrap dropped from 370k tokens per tick to about 12k. The provider switched.

How to spot this

If you suspect a long-running session is pinned to a model you thought you removed, two signals together are diagnostic:

- The session’s index entry shows a modelProvider/model that does not match your current global default.
- The session transcript opens with a model_change event for the same stale provider.
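The two signals can be checked together with a short script. This is a sketch under assumptions: modelProvider and model are the index fields described earlier, while the event field names "type" and "provider" are guesses about the transcript schema, and both function names are made up for the example.

```python
import json
from pathlib import Path


def read_first_event(transcript_path):
    """Parse the first non-empty line of a JSONL transcript; {} if the file is empty."""
    with Path(transcript_path).expanduser().open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                return json.loads(line)
    return {}


def is_pinned_to_stale_model(entry, first_event, current_provider, current_model):
    """True when both signals hold for one sessions.json entry."""
    # Signal 1: the index entry disagrees with the current global default.
    index_stale = (entry.get("modelProvider"), entry.get("model")) != (
        current_provider,
        current_model,
    )
    # Signal 2: the transcript opens with a model_change for the same provider.
    # "type" and "provider" are assumed field names.
    transcript_agrees = (
        first_event.get("type") == "model_change"
        and first_event.get("provider") == entry.get("modelProvider")
    )
    return index_stale and transcript_agrees
```

Passing the current provider and model in as arguments avoids guessing how the primary is encoded in openclaw.json.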

To find runaway sessions, count provider mentions in each transcript:

for f in ~/.openclaw/agents/main/sessions/*.jsonl; do
  c=$(grep -c '"provider":"old-provider"' "$f")
  [ "$c" -gt 0 ] && echo "$c $f"
done | sort -rn | head

Cross-reference the session id against sessions.json to find which session key it is bound to, then reset.
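A sketch of that cross-reference. The name of the field that holds the session id in sessions.json is not something I can confirm, so this hypothetical helper searches every string value in each entry instead of assuming one.

```python
import json
from pathlib import Path


def keys_for_session_id(index: dict, session_id: str) -> list:
    """Return the session keys whose entry mentions session_id in any string value."""

    def mentions(value) -> bool:
        if isinstance(value, str):
            return session_id in value
        if isinstance(value, dict):
            return any(mentions(v) for v in value.values())
        if isinstance(value, list):
            return any(mentions(v) for v in value)
        return False

    return [key for key, entry in index.items() if mentions(entry)]


# Example usage:
# path = Path("~/.openclaw/agents/main/sessions/sessions.json").expanduser()
# index = json.loads(path.read_text(encoding="utf-8"))
# print(keys_for_session_id(index, "the-session-id-from-the-filename"))
```

Searching all values is deliberately loose: it also matches entries that store the id inside a file path rather than in a dedicated field.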

The takeaway

Editing a config file does not mean every running session honours the change. Anything keyed by session, conversation, or workspace can cache config values at creation time and outlive later edits. When I change a default in a system with long-running sessions now, I look for the existing sessions and either reset them or accept that the change is forward-only.

Postscript: the upgrade rabbit hole

After the fix I tried to upgrade OpenClaw to the latest stable release. That cost me another forty minutes:

- The upgrade triggered exhaustive plugin runtime-deps staging on the first message. It wrote around 480 MB of cached chunks and held the CPU at 100% for more than ten minutes with no log output.
- Telegram outbound failed because the new HTTP client bypassed /etc/hosts and tried IPv6 first on a VM with no IPv6 routing.

The IPv6 issue had a clean fix:

echo -e 'net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1' | sudo tee /etc/sysctl.d/99-disable-ipv6.conf
sudo sysctl -p /etc/sysctl.d/99-disable-ipv6.conf

The runtime-deps staging did not. OpenClaw’s GitHub had a fresh “blocker” issue covering the same regression, with several users pinning to the previous release I had skipped. I rolled back to that release and stopped trying to be the integration test for a stable tag the maintainers had not stabilised yet.

openclaw/openclaw#74284: I filed this as a follow-up covering the heartbeat path specifically.
openclaw/openclaw#51677: sessions.json caches stale model after config change. Closed as implemented in v2026.4.22, but the fix targets the reply path, not the heartbeat path.
openclaw/openclaw#67078: /new initialised fresh Telegram DM session on the wrong model. Closed as implemented in v2026.4.20, which fixed the /new and /reset path.