
Why your OpenClaw agent goes silent after a week

By Linas Valiukas · May 7, 2026

"First week with OpenClaw was honestly unreal. Now I'm about 2 weeks in and it keeps breaking on me. Agents become completely unresponsive. I'll send them tasks over Telegram and just get nothing back. Total radio silence."

That's a real Reddit post from this week. The comments are full of "same here." This is the most common shape of OpenClaw failure - silence with everything still reporting healthy. The dashboard says online. The gateway says online. Telegram says the bot is online. Nothing replies.

The pattern is real, but it's not random. There are eight specific things that fail in week two. Each has a mechanism, a signal in the logs, and a fix. Here they are.

1. The Telegram poll loop hit a 409 and gave up

OpenClaw's @openclaw/telegram channel uses Telegram's getUpdates long-poll under grammy. The protocol is strict: only one process at a time can poll a given bot token. If two processes call getUpdates at the same time, one of them gets a 409 Conflict: terminated by other getUpdates request.

This happens more often than you'd think. You restart the gateway and the old process doesn't stop polling cleanly. You SSH into the box and run a quick test against the same token. Your Docker Compose setup brings up a sibling container before the previous one has exited. Or a cron job on a second VPS that you forgot about is still polling.

grammy's default behavior on 409 is to back off and retry, but if the conflict is persistent (the other process is alive and polling), the loop quietly stops. Your gateway stays up. Your dashboard stays green. New Telegram messages just sit there.

How to confirm:

openclaw channel status

# If telegram shows: state=stopped reason=409_conflict
openclaw channel restart telegram

If the conflict comes back the moment you restart, find the second process. Running curl https://api.telegram.org/bot<TOKEN>/getMe from a candidate machine only proves it can reach the API with that token - any host that has the token gets the bot's identity back - so the real question is which machines have the token deployed with a poller still running.
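A minimal way to sweep for that, assuming a Linux host where the poller is a Node process with the token in its command line or environment (both assumptions; adjust to how you actually deploy):

TOKEN="123456:your-bot-token"   # placeholder, use your real token
# Any process with the token in its command line or environment is a candidate
# poller. Run as root (or the process owner) so /proc/<pid>/environ is readable.
pgrep -af node | while read -r pid cmd; do
  if printf '%s' "$cmd" | grep -q "$TOKEN" \
     || tr '\0' '\n' < "/proc/$pid/environ" 2>/dev/null | grep -q "$TOKEN"; then
    echo "PID $pid looks like the competing poller: $cmd"
  fi
done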

2. The Discord gateway didn't reconnect after a network blip

Discord's gateway requires a heartbeat every 41 seconds. Miss two and you get a 4000/4006 close. The plugin auto-reconnects in most cases. The exception is the bug tracked in issue #51370: if the disconnect happens during the initial connection (network not ready when the gateway boots), the reconnect logic never engages. The plugin enters a permanent disconnected state.

This is a common failure mode on home setups behind residential connections, on a Raspberry Pi that boots faster than its WiFi associates, or on a VPS that comes up before its IPv6 route is ready. We've also seen it triggered by Cloudflare Tunnel restarts during the first 30 seconds of boot.

The 4008 family of close codes is documented in our 4008 fix guide, but the silent-Discord case looks different - no error appears in the user-facing UI. You only see it in openclaw gateway logs --tail 100:

[discord] gateway closed code=4006 reason=session_timeout
[discord] reconnect attempt 1/3 failed: ECONNREFUSED
[discord] reconnect attempt 2/3 failed: ECONNREFUSED
[discord] reconnect attempt 3/3 failed: ECONNREFUSED
[discord] giving up

Three attempts, two minutes apart, then the plugin sits there. A restart fixes it: openclaw channel restart discord. The permanent fix is to wait for the patch for #51370 to ship, or to wrap the gateway in a systemd unit that retries on its own.

A separate but related Discord failure: gateway crashes from issue #37375 (unhandled rejection in GatewayPlugin.registerClient) take the whole gateway with them. If your agent went silent on every channel at once and the gateway log ends with an unhandled rejection, that's this one.
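If you go the systemd route, here's a minimal drop-in sketch. It assumes a bare-metal install with a unit named openclaw-gateway (the name is illustrative; match whatever your install created): wait for the network before the first connection attempt, and come back up after a crash.

sudo mkdir -p /etc/systemd/system/openclaw-gateway.service.d
sudo tee /etc/systemd/system/openclaw-gateway.service.d/override.conf >/dev/null <<'EOF'
[Unit]
# Don't start until the network is actually up, so the first Discord connect
# doesn't hit the dead-end from #51370.
After=network-online.target
Wants=network-online.target

[Service]
# If the gateway dies outright (e.g. the #37375 unhandled rejection), restart it.
Restart=on-failure
RestartSec=30
EOF
sudo systemctl daemon-reload
sudo systemctl restart openclaw-gateway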

3. The 2026.4.29 upgrade pruned your channel deps

If your agent went silent within an hour of an upgrade, the cause is almost certainly the packaging regression that hit 2026.4.24 and again on 2026.4.29. The npm postinstall script ran postinstall-bundled-plugins.mjs and pruned files it had no business pruning. 4,116 JS files in the tarball, 2,499 left after install, 1,617 silently gone.

The user-facing symptom: the gateway boots normally. The dashboard shows everything green. But every channel that depends on a bundled plugin runtime fails to register. Your logs are full of:

Error: Cannot find module 'grammy'
Error: Cannot find module '@slack/web-api'
Error: Cannot find module 'discord.js'

Tracked in issue #75685. Fix: roll back to 2026.4.23 or forward to 2026.5.4 (the first stable since the regression). The full sequence of broken releases is in our May release notes writeup.

npm uninstall -g openclaw
npm cache clean --force
npm install -g [email protected]
openclaw doctor --fix
openclaw restart
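To confirm the reinstall actually restored the bundled channel deps, a quick check like this works. The global-install path is an assumption; adjust it if OpenClaw lives somewhere else on your box:

# Resolve each bundled dep from inside the global openclaw package.
# "MISSING" for any of them means the prune bug is still biting.
cd "$(npm root -g)/openclaw" || exit 1   # path assumes a global npm install
for dep in grammy discord.js @slack/web-api; do
  if node -e "require.resolve('$dep')" >/dev/null 2>&1; then
    echo "OK       $dep"
  else
    echo "MISSING  $dep"
  fi
done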

4. Your reverse proxy is killing the WebSocket

OpenClaw's gateway uses long-lived WebSockets for both client devices and channel relay. Reverse proxies don't love long-lived idle connections, and most distributions ship with default timeouts that work fine for HTTP requests and break WebSockets.

Proxy             | Default idle timeout | What you need to set
nginx             | 60s                  | proxy_read_timeout 86400s
Cloudflare Tunnel | 100s                 | --proxy-keepalive-timeout 0
Traefik           | 90s                  | forwardingTimeouts.idleTimeout: 0
AWS ALB           | 60s                  | idle_timeout.timeout_seconds = 4000
Caddy             | ~5min                | flush_interval -1

The agent stays "up" because the gateway thinks the WebSocket is alive. The proxy already half-closed it. Messages from your phone don't make it in. The proxy doesn't return an error to OpenClaw - it just drops the bytes. You see no logs.

The simplest test: from your phone or laptop, open the OpenClaw web UI and watch the connection state for two minutes. If it flips between "connected" and "reconnecting" on a regular interval roughly matching one of the values above, that's your proxy. Fix it before you blame OpenClaw.
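For nginx specifically, a minimal sketch of the relevant location block. The upstream address and path are placeholders for wherever your gateway actually listens; only the timeout and upgrade headers are the point:

# Inside your existing server block. 127.0.0.1:8080 is a placeholder upstream.
location / {
    proxy_pass http://127.0.0.1:8080;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;    # let the WebSocket upgrade through
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 86400s;                 # the 60s default kills idle WebSockets
    proxy_send_timeout 86400s;
}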

5. Your OAuth token expired and nothing told the user

The OpenAI Codex OAuth refresh token expires every 30 days. Anthropic's Claude OAuth window was on a similar cadence before the April 4 cutoff. When the token expires mid-session, OpenClaw retries the API call 3 times, surfaces a 401 in openclaw gateway logs, and then quietly stops calling the model. The Telegram or Discord channel gets nothing back. No error message reaches the user.

Worse: the 2026.5.5 doctor --fix repair rewrote valid openai-codex/* ChatGPT/Codex OAuth routes to openai/*, which routed users onto the OpenAI API-key path and broke OAuth-only GPT-5.5 setups. The agent silently failed every model call until 2026.5.6 reverted it.

To check:

openclaw auth status

# If you see: openai-codex expires_in=-2d
openclaw auth refresh openai-codex

# If 5.5 rewrote your route
openclaw models set openai-codex/gpt-5.5
openclaw config validate

The deeper problem is that there's no built-in alert. The troubleshooting guide covers the manual checks. A managed service watches the refresh window and re-authenticates before it expires.
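Until that exists, a blunt self-check works. This sketch assumes the expires_in format matches the openclaw auth status output shown above, and that a plain openclaw auth refresh succeeds non-interactively - both worth verifying on your install before trusting it:

# Daily cron: refresh the Codex token when it's expired or within ~2 days of expiring.
# Adjust the grep if your `openclaw auth status` output is formatted differently.
if openclaw auth status | grep -Eq 'openai-codex.*expires_in=(-|[0-2]d)'; then
  openclaw auth refresh openai-codex
fi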

6. The model provider rate-limited you and your agent gave up

Two flavors of this. First, the soft cap: you blew through your subscription quota. The provider returns a 429, OpenClaw retries with backoff, hits the cap on every retry, and silently stops the session. Your dashboard shows green. No reply on Telegram.

Second, "engine overloaded": the provider is having capacity problems. The Moonshot AI forum has a thread describing this exact failure on Kimi Claw Allegretto: the API returns "engine overloaded," OpenClaw's compaction layer fails on the next turn, and the session enters a stuck state. The agent doesn't crash. It also doesn't recover.

The May 7 api.cloud.ollama.com DNS outage made this concrete for a lot of people - any agent with Ollama cloud as primary or fallback model started timing out at 04:00 UTC. The Reddit thread has the temporary workaround: repin to openrouter/google/gemini-2.5-flash-lite as a fallback chain in openclaw.json until the upstream comes back.
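If you had to do that repin by hand, the flow is the same set of commands used elsewhere in this article; the fallback-chain syntax inside openclaw.json isn't reproduced here, so treat this as the single-model version:

# Temporary repin during the outage, then restore your usual model once it's back.
openclaw models set openrouter/google/gemini-2.5-flash-lite
openclaw config validate
openclaw restart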

Multi-provider fallback is the only real fix. One provider going down should not silently kill your agent.

7. A cron session hit the model context cap and stalled

Since 2026.2.17, cron jobs reuse existing sessions instead of starting fresh ones. Each run appends its output to the same transcript. Eventually the session blows past the model's hard context cap (200K for Claude, 128K for most others, 1M for Gemini Pro). At that point the cron just errors. The user-facing chat still works fine - because that's a different session - so it's hard to notice.

What "silent" looks like: scheduled tasks stop running, but the agent replies if you message it directly. The pattern we covered in the phantom token piece overlaps. Same root cause, different symptom: the cron worker has been quietly burning tokens for days, then hits a wall.

Check with:

openclaw cron status

# If you see: last_run=stalled context_used=199500/200000
openclaw cron reset <job-id>

Set requests_limit on every cron job and force a session reset on a fixed schedule (daily 4 AM is common). That keeps the context bounded.
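The reset can live in ordinary cron alongside the job itself. The job id is a placeholder, exactly as in the check above:

# crontab entry: reset the scheduled job's session every day at 04:00,
# before the transcript can creep up on the model's context cap.
0 4 * * * openclaw cron reset <job-id>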

8. The heartbeat process OOMed but the gateway didn't

OpenClaw's heartbeat is a separate process from the gateway. It runs the periodic health checks, the channel poll triggers, the cron scheduler, and the safeguard compaction passes. If it dies - usually OOM-killed on a 2GB VPS, sometimes wedged on a stuck file descriptor - the gateway keeps the WebSocket alive and the dashboard reports everything online. Nothing actually advances.

The signal in the logs is unsubtle once you know what to look for:

journalctl -u openclaw-heartbeat -n 50

# Last entry timestamp from 4 days ago = it's dead
systemctl restart openclaw-heartbeat

On Docker, the heartbeat runs in the same container as the gateway, so OOM takes both down. On bare-metal systemd installs, they're separate units and the gateway often outlives the heartbeat by hours. Add a memory cap on the systemd unit, point your monitoring at openclaw heartbeat status, or do what most people do and restart the whole thing daily via cron.
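On systemd installs, the memory cap and the auto-restart fit in one drop-in; the unit name matches the journalctl and systemctl commands above:

sudo mkdir -p /etc/systemd/system/openclaw-heartbeat.service.d
sudo tee /etc/systemd/system/openclaw-heartbeat.service.d/override.conf >/dev/null <<'EOF'
[Service]
# Cap the heartbeat below the box's ceiling so it gets killed cleanly instead of
# dragging the whole VPS down, and bring it straight back when that happens.
MemoryMax=512M
Restart=always
RestartSec=10
EOF
sudo systemctl daemon-reload
sudo systemctl restart openclaw-heartbeat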

Quick diagnostic order

When the agent goes silent, run these in order. Most cases resolve in the first three steps.

Step | Command                          | What you're checking
1    | openclaw channel status          | 409 conflict, dead poll loop, channel stopped
2    | openclaw auth status             | Expired Codex/Anthropic OAuth token
3    | openclaw heartbeat status        | Heartbeat process dead while gateway lives
4    | openclaw gateway logs --tail 200 | 429s, 401s, "Cannot find module," reconnect attempts
5    | openclaw cron status             | Stalled job, context cap hit, requests_limit blown
6    | openclaw doctor --fix            | Token mismatch, plugin drift, config corruption

If the first six don't surface anything, check your reverse proxy timeouts (cause #4), your network at the time of the silence (cause #2), and whether you upgraded recently (cause #3).
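The read-only checks chain easily if you want a single thing to paste at 11 PM; doctor --fix is left out of the loop because it changes state:

# Run the first five (non-mutating) checks back to back and eyeball the output.
for check in "channel status" "auth status" "heartbeat status" "gateway logs --tail 200" "cron status"; do
  echo "== openclaw $check =="
  openclaw $check
done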

The bigger pattern

Every cause on this page has the same shape. A subsystem failed quietly. A sibling subsystem kept reporting "online." Nobody told you. By the time you noticed, your agent had been silent for hours or days.

OpenClaw doesn't have a unified health model. The gateway, the heartbeat, the channel plugins, the auth layer, the cron worker, and the model router are all separate concerns with their own failure surfaces. You can keep all six green individually and have a dead agent. The dashboard wasn't built to show this.

There's also the update treadmill. OpenClaw shipped 13 releases in March, 7 more in April, 6 more in May - each one capable of introducing a new silent-failure mode. The Reddit thread for 2026.5.6 is full of "I'm waiting" precisely because people learned to stop trusting fresh releases.

The Reddit user running OpenClaw for his company said it best: "When you have dozens of these sort of automations in place, you can end up spending meaningful amount of time just debugging the issues you thought you were solving."

Or skip the diagnosis entirely

If you self-host, you'll hit each of these eight failure modes eventually. Usually in week two. The fixes are simple individually. Knowing which one to apply at 11 PM when your agent stopped replying is the hard part - and there's no way to know without being there for the silence.

TryOpenClaw.ai watches the channel poll loop, refreshes OAuth tokens before they expire, pins to known-good releases, monitors heartbeat liveness, sets fallback chains for provider outages, and alerts on rate limits. We catch silence before you do. That's what $39/month buys you. Hosting that watches itself, so you don't have to read this article at midnight when the heartbeat process dies.


Linas Valiukas

Founder of TryOpenClaw.ai. Software engineer writing about OpenClaw, self-hosting trade-offs, and what non-technical users actually need from an AI assistant. About the author →

Try it right now

This is just one example - OpenClaw adapts to whatever you need. Describe any workflow in plain language and it figures out the rest. Pay $1 for a full 24-hour trial, pick your messaging app, and start chatting with your own instance in under 60 seconds. Love it? $39/mo. Not for you? Walk away - we delete everything.

Try OpenClaw for $1

24h full access. No commitment. Cancel anytime.