OpenClaw without Anthropic: free and cheap model setups that actually work
By Linas Valiukas · April 15, 2026
Since Anthropic cut Claude subscriptions off from OpenClaw on April 4, r/openclaw has had one question on repeat: what are people actually using now?
We pulled the top threads, cross-checked against OpenRouter's usage data, and tested the main setups ourselves. Here's what works in April 2026, what it costs, and where the landmines are.
The short answer
Three paths, from most free to most paid:
- Free tier: OpenRouter's free models (GPT-OSS 20B, DeepSeek R1 Distill) or Gemini 2.5 Flash direct. Good for light use, kills your agent mid-task if you hit rate limits.
- Cheap paid: MiniMax M2.5, GLM-4.7, Kimi K2.5, or DeepSeek V3.2 via OpenRouter. $3-$15/month for most personal setups.
- Local: Gemma 4 or Qwen 3.5 on Ollama. Zero API bill, but you need the hardware and your laptop has to stay on.
Most people are landing on a mix: a cheap default model plus one free fallback, routed through OpenRouter.
Truly free options
OpenRouter's free tier
OpenRouter hosts 29 free models at the time of writing, with a 20 requests-per-minute cap. No credit card is needed to sign up, and a single API key works for all of them.
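Before wiring a key into OpenClaw, it's worth a one-off request against OpenRouter's chat completions endpoint to confirm the key and the free slug both respond (the ":free" slug below is a guess - check openrouter.ai/models for the current name):

# one-off smoke test against OpenRouter; swap in any slug from the free models list
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-oss-20b:free",
        "messages": [{"role": "user", "content": "Reply with OK"}]
      }'

If that comes back with a completion rather than a 429, the key works and the same slug can go straight into the config further down.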
The two that actually handle OpenClaw's tool calls well:
- GPT-OSS 20B - OpenAI's Apache 2.0 release. Matches o3-mini on coding. The strongest free option for OpenClaw skill invocations right now.
- DeepSeek R1 Distill - Solid reasoning, handles long plans reasonably well, gets bogged down on very deep tool-calling chains.
Things to watch: one user on the best-free-model thread got burned by NVIDIA's free Nemotron on OpenRouter, which started hallucinating mid-sentence. Free often means "abandoned or rate-limited hard." Pick one of the two above and have a fallback ready.
Gemini 2.5 Flash direct
Google's free tier is the most generous in the business: 1,500 requests per day. That's enough for almost any personal OpenClaw setup, including regular heartbeats and moderate tool use.
The catch: tool calling on Flash is good, not great. It sometimes skips a step in a multi-call chain and summarizes instead of executing. If your agent is mostly doing inbox triage, calendar nudges, or Slack replies, you won't notice. If it's orchestrating a 20-step workflow, you will.
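If you go direct to Google instead of routing Flash through OpenRouter, test the AI Studio key with a raw request first - the free-tier quota applies to this same endpoint (model ID as documented at the time of writing; adjust if Google renames it):

# verify the AI Studio key before pointing OpenClaw at it
curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents": [{"parts": [{"text": "Reply with OK"}]}]}'

How you then register a direct Google provider inside OpenClaw depends on your version's provider config; the setup further down routes Flash through OpenRouter instead, which avoids a second provider entirely.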
Cheap paid options (what most people actually use)
Real cost data from pricepertoken.com's OpenClaw leaderboard and the "what are you using now" Reddit thread:
MiniMax M2.5
About $0.26 input / $1.06 output per million tokens. That's roughly 38-47% of GLM-5's price at comparable tool-calling quality. The community's current pick for best value.
One Redditor reported leaving MiniMax for GPT and then coming back to MiniMax 2.7, which is a good sign for stability under real use. Downside: uptime is spotty, and it occasionally drops Chinese characters into non-Chinese text - one user complained about 中文 showing up in their Dutch replies.
Kimi K2.5
$0.50 / $2.40 per million tokens. Currently ranked #1 on the OpenClaw leaderboard by community vote, ahead of GLM-4.7 and Claude Opus 4.6. Kimi's tool-calling chain reliably goes past 200 steps without losing the thread, which matters if you run long async tasks.
GLM-4.7
Very consistent with structured JSON output, which OpenClaw skills rely on heavily. More expensive than MiniMax but fewer "why did it skip the tool call?" moments. A common recommendation on the OpenClaw free alternatives thread.
DeepSeek V3.2
The "thinking" mode bakes reasoning directly into tool calls, so skills execute more predictably. Cheap per token. Sign-up at platform.deepseek.com needs credits up front though - there's no free tier despite what older guides still say.
Local: zero API bill, some tradeoffs
If you have a decent machine, local is the only path to a genuine $0/month OpenClaw. Popular stacks on r/openclaw right now:
- Ollama + Gemma 4 E2B - Works on a 16GB MacBook. Gemma 4 tool calling is good enough for most personal workflows.
- Ollama + Qwen 3.5 - Stronger at coding tasks. Needs 24GB+ RAM for the useful sizes.
- LM Studio - OpenClaw v2026.4.12 ships a bundled LM Studio provider with onboarding, model discovery, and embeddings for memory search. Easier setup than Ollama if you're new to local models.
- GLM 5.1:cloud via Ollama - Ollama's cloud mode gives you GLM 5.1 without downloading weights. One Redditor posted their whole setup as:
ollama launch claude --model glm-5.1:cloud
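For the locally hosted options, the config is the same file, just pointed at Ollama's default local endpoint instead of OpenRouter. A minimal sketch, assuming your OpenClaw version exposes a local/Ollama provider and accepts a base URL - the field names here are illustrative, and the model tag is whatever ollama pull gave you:

{
  "provider": "ollama",
  "baseUrl": "http://localhost:11434",
  "model": "gemma4:e2b"
}

Whether you can keep a cloud fallbackModel alongside a local default depends on how your OpenClaw version handles mixed providers; if it does, that's the cleanest answer to the closed-lid problem short of a real server.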
What people underestimate: your laptop has to stay on. OpenClaw is always-on by design. If you close the lid, your agent goes offline. This is why local works for coding assistants but not for business agents.
The OpenRouter config (5 minutes)
OpenClaw has built-in OpenRouter support. You don't touch models.providers. You just set the key and reference models with openrouter/<author>/<slug>. See the official integration guide for the current API.
Edit ~/.openclaw/openclaw.json:
{
  "provider": "openrouter",
  "apiKey": "sk-or-v1-...",
  "model": "openrouter/moonshot/kimi-k2.5",
  "fallbackModel": "openrouter/openai/gpt-oss-20b",
  "heartbeatModel": "openrouter/google/gemini-2.5-flash",
  "budget": { "monthlyLimit": 15, "alertAt": 10 }
}

The fallback matters. When Kimi or MiniMax times out (and they will), OpenClaw automatically retries on GPT-OSS 20B instead of dying. Your agent stays up.
Set the monthly limit on OpenRouter's dashboard too. The difference between a $5 surprise and a $300 one is usually just a missing cap.
Multi-model routing (the real cost win)
The biggest savings don't come from finding a cheap default. They come from matching the model to the task. We covered most of this in the cost optimization post, but post-Anthropic the model mix has changed.
A routing setup that's been passed around a lot on r/openclaw:
- Heartbeat - Gemini 2.5 Flash free tier. 48 calls a day is well under the 1,500 limit.
- Default / routine tasks - MiniMax M2.5 or Kimi K2.5.
- Coding / complex reasoning - DeepSeek V3.2 or GPT-5.4.
- Vision / OCR - Gemini 2.5 Flash (it's multimodal on the free tier).
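The default and heartbeat parts of that split map directly onto the same openclaw.json fields used earlier. A sketch - the MiniMax slug is a guess, so confirm the exact name on OpenRouter before pasting:

{
  "provider": "openrouter",
  "apiKey": "sk-or-v1-...",
  "model": "openrouter/minimax/minimax-m2.5",
  "heartbeatModel": "openrouter/google/gemini-2.5-flash",
  "fallbackModel": "openrouter/openai/gpt-oss-20b",
  "budget": { "monthlyLimit": 10, "alertAt": 7 }
}

Per-task overrides for coding or vision go beyond these three fields, which is where a dedicated router in front of OpenClaw comes in.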
For purpose-built routing, ClawRouter is an open-source agent-native router for OpenClaw with 41+ models and sub-millisecond routing. Overkill for one user. Worth a look if you're running OpenClaw for a team.
Things that will bite you
Free tier rate limits kill mid-task
OpenRouter's 20 requests-per-minute sounds fine until OpenClaw spawns a subagent that fires 15 tool calls in 4 seconds. You hit the wall, the task errors out, and OpenClaw's retry logic replays the whole plan. Always configure a paid fallback.
Tool calling quality varies wildly
A model that scores well on MMLU can still be useless inside OpenClaw if it doesn't emit clean tool-call JSON. OpenRouter's tool-calling collection filters for models that actually do this reliably. Start there, don't just pick by price.
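For context, "clean tool-call JSON" means something like the OpenAI-style assistant message below, which is the format OpenRouter normalizes providers to: a structured tool_calls array whose arguments string parses as JSON (the skill name here is made up for illustration):

{
  "role": "assistant",
  "content": null,
  "tool_calls": [{
    "id": "call_abc123",
    "type": "function",
    "function": {
      "name": "create_calendar_event",
      "arguments": "{\"title\": \"Standup\", \"start\": \"2026-04-16T09:00\"}"
    }
  }]
}

Models that wrap this in prose, truncate the arguments string, or invent tool names are the ones that look fine on benchmarks and then quietly break skills.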
Some providers silently drop your context
A few OpenRouter-hosted free variants trim context aggressively to stay free. OpenClaw doesn't always notice. The agent starts "forgetting" things mid-session. If that happens, swap provider, not model.
The always-on problem hasn't changed
Cheaper models don't fix the always-on requirement. Your server still has to stay up. Your gateway still needs a public address. Your WebSocket still needs to not drop every 30 minutes. If you run OpenClaw on your laptop, the cheapest API in the world won't help when you close the lid.
What actual setups look like
Pulled from the last two weeks of r/openclaw threads:
- Honest-Cheesecake275: "Claude Code $20 to do the heavy lifting and OpenAI Oauth $20 to coordinate." Total: $40/month.
- GlitteringCoconut203: "Minimax 2.7. After the Anthropic ban, I began using Minimax, then I moved to GPT, and now running it again in Minimax 2.7."
- Prestigious_Yard_320: ollama launch claude --model glm-5.1:cloud. Zero local RAM, no OpenRouter dependency.
- Rent_South: Haiku 4.5 or Flash Lite 3.1 as default, with "task specific model rosters" and fallbacks.
- Krillian58: Wrapped Claude Code in a daemon that connects to Discord like OpenClaw. Claims better token efficiency. Still buggy.
The pattern: nobody is using one model for everything anymore. The Claude-only default is dead. Multi-provider with fallbacks is the new normal.
Or skip all of this
All of the above assumes you enjoy wiring up OpenRouter accounts, tuning fallbacks, checking quota resets, and watching Reddit for the next provider that's up or down.
TryOpenClaw.ai bundles model access, gateway uptime, and updates into one flat fee. You don't pick a model. You don't set a fallback. You don't watch rate limits. If GLM goes down tonight, you won't notice.
If you're spending more than two hours a month on model juggling, the math starts to favor managed hosting quickly. Full breakdown here.
If you'd rather keep self-hosting: pick MiniMax M2.5 as your default, GPT-OSS 20B as fallback, Gemini 2.5 Flash for heartbeats, and set a monthly cap. That'll get most people through the month for under $10.
Founder of TryOpenClaw.ai. Software engineer writing about OpenClaw, self-hosting trade-offs, and what non-technical users actually need from an AI assistant. About the author →
Try it right now
This is just one example - OpenClaw adapts to whatever you need. Describe any workflow in plain language and it figures out the rest. Pay $1 for a full 24-hour trial, pick your messaging app, and start chatting with your own instance in under 60 seconds. Love it? $39/mo. Not for you? Walk away - we delete everything.
Try OpenClaw for $1 - 24h of full access. No commitment. Cancel anytime.