TL;DR Link to heading
My OpenClaw gateway kept calling an LLM provider I had retired. The new primary in openclaw.json was ignored because the long-running “heartbeat” session pinned the model selection at session creation time, both in the session index and in a model_change event at the top of the session transcript. Resetting the session let the global config take effect.
Motivation Link to heading
I noticed my self-hosted OpenClaw gateway was still hitting an LLM provider I had switched away from days earlier. openclaw.json had a different primary model and no fallbacks, but the metered usage disagreed. Each “heartbeat” tick cost close to ten cents and shipped 370k cached input tokens to the wrong provider.
What I expected Link to heading
When I edit agents.defaults.model.primary in openclaw.json and reload, the next agent turn should use the new primary. Sessions created earlier should re-resolve the primary on each tick rather than hold onto whatever was current at creation time.
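For a concrete check, this is roughly what I run to see which primary the config currently declares. Treat the file location and the use of jq as assumptions about my install, not OpenClaw tooling:
# Print the configured primary model; assumes openclaw.json lives under ~/.openclaw/
# and that the key nesting matches the dotted path agents.defaults.model.primary.
jq -r '.agents.defaults.model.primary' ~/.openclaw/openclaw.json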
What was actually happening Link to heading
OpenClaw stores per-session-key state in ~/.openclaw/agents/main/sessions/sessions.json, where each entry caches the resolved modelProvider and model. The first event written to the session transcript (<id>.jsonl) is also a model_change event with the same provider and model. Both were stamped weeks ago, when the old primary was still in place.
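A quick way to look at both pins, sketched with jq and assuming the layout described above (where <session-id> is the transcript file the index entry points at):
# The cached pin in the session index, keyed by session key
jq '."agent:main:main"' ~/.openclaw/agents/main/sessions/sessions.json
# The first transcript event, which should be the model_change that stamped the session
head -n 1 ~/.openclaw/agents/main/sessions/<session-id>.jsonl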
On every heartbeat tick the runtime:
- Looks up the session for agent:main:main in sessions.json.
- Replays the existing transcript as context, including the leading model_change event.
- Uses the cached model from that event for the LLM call.
Nothing in this path re-reads openclaw.json. Edits only apply to new sessions.
The bonus problem hiding underneath: the heartbeat session had compactionCount: 0 and a 2.27 MB transcript covering nine days of HEARTBEAT_OK ticks. Each tick replayed the whole thing on top of a 47k-character system prompt. Cache hits softened the bill, but I was paying for context I did not need on a session that should have stayed light.
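To put numbers on that, a rough per-session sizing pass looks like this (a sketch; the HEARTBEAT_OK marker and the paths are the ones from my setup):
# Transcript size and number of heartbeat acks each session would replay per tick
for f in ~/.openclaw/agents/main/sessions/*.jsonl; do
  size=$(du -h "$f" | cut -f1)
  ticks=$(grep -c 'HEARTBEAT_OK' "$f")
  echo "$size  $ticks ticks  $f"
done | sort -rh | head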
The fix Link to heading
Two-step reset on the host:
# 1. Move the pinned transcript aside so the next tick has to start fresh
ts=$(date -u +%Y-%m-%dT%H-%M-%SZ)
mv ~/.openclaw/agents/main/sessions/<session-id>.jsonl{,.reset.$ts}
# 2. Drop the cached model from the session index
python3 - <<'EOF'
import json, shutil, time
p = "/home/ubuntu/.openclaw/agents/main/sessions/sessions.json"
shutil.copy(p, f"{p}.bak.{int(time.time())}")
d = json.load(open(p))
del d["agent:main:main"]
json.dump(d, open(p, "w"), indent=2)
EOF
The next heartbeat created a fresh session against the current config. Bootstrap dropped from 370k tokens per tick to about 12k. The provider switched.
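I verified it by re-reading the two places that were pinned before; same caveat as above that jq and the exact field names are assumptions from my install:
# The index entry should now be freshly re-created with the current primary
jq '."agent:main:main"' ~/.openclaw/agents/main/sessions/sessions.json
# And the newest transcript should open with a model_change naming the new provider
ls -t ~/.openclaw/agents/main/sessions/*.jsonl | head -n 1 | xargs head -n 1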
How to spot this Link to heading
If you suspect a long-running session is pinned to a model you thought you removed, two signals together are diagnostic:
- The session’s index entry shows a modelProvider/model that does not match your current global default.
- The session transcript opens with a model_change event for the same stale provider.
To find runaway sessions, count provider mentions in each transcript:
for f in ~/.openclaw/agents/main/sessions/*.jsonl; do
c=$(grep -c '"provider":"old-provider"' "$f")
[ "$c" -gt 0 ] && echo "$c $f"
done | sort -rn | head
Cross-reference the session id against sessions.json to find which session key it is bound to, then reset.
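The cross-reference can be a plain grep, assuming the index references each transcript by its id (which matched what I saw, though I have not checked it against a schema):
# Map a runaway transcript back to the session key that owns it
f=~/.openclaw/agents/main/sessions/<session-id>.jsonl   # one of the hits from the loop above
id=$(basename "$f" .jsonl)
grep -n "$id" ~/.openclaw/agents/main/sessions/sessions.json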
The takeaway Link to heading
Editing a config file does not mean every running session honours the change. Anything keyed by session, conversation, or workspace can cache config values at creation time and outlive later edits. When I change a default in a system with long-running sessions now, I look for the existing sessions and either reset them or accept that the change is forward-only.
Postscript: the upgrade rabbit hole Link to heading
After the fix I tried to upgrade OpenClaw to the latest stable release. That cost me another forty minutes:
- The upgrade triggered exhaustive plugin runtime-deps staging on the first message. Around 480 MB of cached chunks, ten-plus minutes of 100% CPU with no log output.
- Telegram outbound failed because the new HTTP client bypassed /etc/hosts and tried IPv6 first on a VM with no IPv6 routing (see the sketch after this list for how the two lookup paths differ).
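A quick way to watch the two resolution paths diverge on the host, using whichever hostname you have pinned in /etc/hosts (placeholder below):
# getent resolves through NSS, so it honours /etc/hosts -- the path Node's dns.lookup takes
getent ahosts <hostname>
# dig asks the DNS resolver directly and never reads /etc/hosts -- the path dns.resolve takes
dig +short <hostname> AAAA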
The IPv6 issue had a clean fix:
echo -e 'net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1' | sudo tee /etc/sysctl.d/99-disable-ipv6.conf
sudo sysctl -p /etc/sysctl.d/99-disable-ipv6.conf
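Two standard checks to confirm it took effect:
# Should print 1 once the sysctl is applied
sysctl -n net.ipv6.conf.all.disable_ipv6
# Should no longer list any globally routable IPv6 addresses
ip -6 addr show scope global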
The runtime-deps staging did not. OpenClaw’s GitHub had a fresh “blocker” issue covering the same regression, with several users pinning to the previous release I had skipped. I rolled back to that release and stopped trying to be the integration test for a stable tag the maintainers had not stabilised yet.
Related issues Link to heading
- openclaw/openclaw#74284: I filed this as a follow-up covering the heartbeat path specifically.
- openclaw/openclaw#51677: sessions.json caches stale model after config change. Closed as implemented in v2026.4.22, but the fix targets the reply path.
- openclaw/openclaw#67078: /new initialised fresh Telegram DM session on the wrong model. Closed as implemented in v2026.4.20, fixing the reset path behind /new and /reset.
Further reading Link to heading
- Linux: how to disable IPv6. Red Hat’s note on the exact sysctl knobs.
- Node dns.lookup vs dns.resolve. Explains why /etc/hosts only helps the former.