OpenClaw can now answer your phone (Voice Call setup with Twilio, the safe way)
By Linas Valiukas · April 27, 2026
Your OpenClaw agent can answer your phone. Not "OpenClaw triggers a Twilio webhook that triggers an n8n flow that triggers a TTS." Actually answer the phone. As of v2026.4.24 there's a bundled Voice Call plugin, a Gemini Live realtime backend, and a voicecall setup command. You bind a Twilio number, dial it from your phone, and the agent picks up.
This is the kind of feature that sounds like a demo and turns out to be one config edit away from running in your kitchen. It also sounds like a feature you absolutely should not point at a public phone number on the defaults. Both are true. Here's how to set it up and how to keep it from doing something embarrassing.
What ships in 2026.4.24
The release bundles four pieces that together make voice calls work end to end:
- A Voice Call plugin. Bundled. Ships with voicecall setup and a voicecall smoke command that's a dry run by default - you have to pass --live to actually place a test call.
- Twilio realtime transport. The plugin treats Twilio as a first-class voice transport, alongside Chrome WebRTC for browser-side calls. Paired-node Chrome support is in too, for the people running BlackHole/SoX audio bridges.
- A Gemini Live backend voice provider. Bidirectional audio plus function-call support. It's the audio brain of the call: turning speech into intent, taking text back, speaking it. It's not your full OpenClaw agent. It's a fast, narrow voice model with a hand on a phone.
- An openclaw_agent_consult tool. Exposed inside the live voice session. When the caller asks something the voice model can't answer on its own, it calls this tool and the full OpenClaw agent steps in - reads memory, runs skills, hits the CRM - and the voice model relays the answer.
The same plumbing powers the new Google Meet plugin. Different transport (Meet instead of phone), same realtime voice loop, same agent-consult handoff. If you set up Voice Call you've basically set up Meet too.
The setup, in order
- Get a Twilio number with voice enabled. A US local number runs $1.15/month. International is more. Buy it in the Twilio console, note the SID.
- Get a Google AI Studio API key with Gemini Live access. Free tier covers test calls. You'll need it on the production billing tier for anything past the rate limits.
- Make sure your OpenClaw gateway is reachable from Twilio. Twilio webhooks need to reach your gateway over HTTPS. If you're self-hosting at home, that means a proper reverse proxy or a Cloudflare tunnel. Localhost won't work.
- Run voicecall setup. The command walks you through pasting in the Twilio account SID, auth token, phone number, and the Gemini Live key. It writes them into the secrets store, configures the webhook URL on the Twilio number, and registers the voice plugin with the gateway.
  openclaw voicecall setup
- Dry-run with voicecall smoke. This is the bit you actually want to run before you let anyone dial in.
  openclaw voicecall smoke
  Walks the whole stack: Twilio creds valid, webhook reachable, audio codecs negotiated, Gemini Live token mint succeeds, openclaw_agent_consult bridge resolves. Default mode reports problems and exits without dialing. Pass --live only when you've fixed everything the dry run flags.
- Dial your number. If smoke is green, calling the Twilio number should now connect to a Gemini Live voice loop. The voice model says hello, you talk, it answers. First few exchanges will feel canned - that's the realtime model working without your full agent's context. Ask it something specific to your data and watch latency jump as openclaw_agent_consult kicks in.
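Before running the setup command, it can save a round trip to sanity-check the webhook URL you plan to hand to Twilio. The sketch below is not part of OpenClaw; the function name and the checks are assumptions for illustration. It only catches the obvious failures from the reachability step (no HTTPS, localhost, private addresses) and does not prove the gateway is actually reachable.

```python
import ipaddress
from urllib.parse import urlparse


def webhook_url_problems(url: str) -> list[str]:
    """Return reasons this URL is unlikely to be reachable from Twilio.

    Hypothetical helper: an empty list means "nothing obviously wrong",
    not "verified reachable".
    """
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("scheme must be https")
    host = parsed.hostname or ""
    if not host:
        problems.append("missing hostname")
        return problems
    if host == "localhost" or host.endswith(".local"):
        problems.append("host is local-only")
        return problems
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: only a live request can confirm reachability.
        return problems
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        problems.append("host is a private or loopback address")
    return problems
```

An empty result is the cue to move on to the real dry run, which is still the authoritative check.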
What the agent_consult handoff actually does
The thing that makes this useful instead of a toy is the bridge between the voice model and your full agent. Worth understanding because the latency and cost profile of voice calls flows from it.
Gemini Live is fast and cheap per minute, but it doesn't have your IMAP password, your CRM token, your memory files, or your skills. So it runs the conversational layer - turn-taking, intent detection, tone - and when the caller asks something specific, it calls openclaw_agent_consult like a tool. That tool call goes to your normal OpenClaw agent through the gateway, which does the actual work: search the calendar, look up the customer in HubSpot, check whether the package shipped. The result comes back as text, the voice model reads it out.
Two consequences:
- Latency varies a lot. Voice-only turns are ~300ms. Turns that bounce through agent_consult into a slow tool (CRM API, IMAP search, skill execution) can hit 4-8 seconds. The voice model handles this with filler ("Let me check that for you") but the seam is real.
- Cost scales with consult depth, not call length. A 10-minute call where the caller chats about the weather costs Gemini Live audio minutes plus nothing else. A 2-minute call that triggers seven CRM lookups costs the same minutes plus seven full agent runs.
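To make the handoff concrete, here is a minimal sketch of the routing described above. Only the tool name openclaw_agent_consult comes from the release; VoiceSession, ask_agent, and on_tool_call are hypothetical names, and the real plugin's internals may look nothing like this. The counter is there because each consult is one full agent run, which is what drives cost.

```python
from dataclasses import dataclass
from typing import Callable

CONSULT_TOOL = "openclaw_agent_consult"  # tool name from the release notes


@dataclass
class VoiceSession:
    # Hypothetical bridge to the full agent: question in, answer text out.
    ask_agent: Callable[[str], str]
    # Each consult is one full agent run, so this number drives cost.
    consults: int = 0

    def on_tool_call(self, name: str, question: str) -> str:
        """Handle a tool call from the voice model; the result is read aloud."""
        if name != CONSULT_TOOL:
            return "Sorry, I can't do that on this call."
        self.consults += 1
        return self.ask_agent(question)
```

A call that never triggers on_tool_call leaves consults at zero, which is the "weather chat" case: audio minutes and nothing else.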
Three settings to change before you let anyone call in
The defaults are a starting point. They are not safe defaults. Before you bind this to a public number:
Disable bootstrap context injection on the voice agent. The voice agent shouldn't see your full CLAUDE.md / MEMORY.md on every turn. 2026.4.24 added agents.defaults.contextInjection precisely for this:
[agents.voicecall]
contextInjection = "never"
allowed_tools = ["calendar.read", "crm.lookup", "openclaw_agent_consult"]
Lock the tool list to read-only. The voice agent gets to read your calendar, look up customers, search docs. It does not get email.send, calendar.write, payment skills, or anything that mutates state. If a real action needs to happen, the voice agent's job is to take a message; the follow-up happens after the call, with you in the loop.
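If you want to see what a read-only lock means in practice, the sketch below enforces the same three-tool list at dispatch time. It is an illustration, not OpenClaw's implementation: call_tool and the registry mapping are assumed names, and the gateway presumably enforces allowed_tools on its own.

```python
from typing import Any, Callable, Mapping

# Same list as the config above; everything else is denied by default.
ALLOWED_TOOLS = frozenset({"calendar.read", "crm.lookup", "openclaw_agent_consult"})


def call_tool(
    name: str,
    args: Mapping[str, Any],
    registry: Mapping[str, Callable[..., Any]],
    allowed: frozenset = ALLOWED_TOOLS,
) -> Any:
    """Run a tool only if it is on the allowlist (hypothetical dispatcher)."""
    if name not in allowed:
        # Deny by default: a tool that exists but is not listed never runs.
        raise PermissionError(f"tool {name!r} is not allowed on voice calls")
    return registry[name](**args)
```

The point of deny-by-default is that adding a new mutating skill to the agent later does not silently expose it to callers.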
Gate agent_consult by caller ID until you trust it. The Twilio webhook payload includes the caller's number. The voice plugin lets you set an allowlist:
[plugins.voicecall.policy]
agent_consult.allowlist = ["+15555550123", "+15555550199"]
agent_consult.outside_allowlist = "decline"
Calls from outside the allowlist still get answered. They just can't reach the full agent. Useful while you're tuning prompts and figuring out which questions you actually want the agent fielding.
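The gating logic itself is small. Here is a sketch of the decision, assuming the caller's number arrives in E.164 form (Twilio passes it in the webhook's From parameter). The function name and return values are illustrative, not the plugin's API.

```python
import re

# E.164: a plus sign, then up to 15 digits, no leading zero.
E164 = re.compile(r"\+[1-9]\d{1,14}")

ALLOWLIST = frozenset({"+15555550123", "+15555550199"})


def consult_decision(caller: str, allowlist: frozenset = ALLOWLIST) -> str:
    """Return "allow" or "decline" for agent_consult on this call.

    Hypothetical helper: the call is answered either way; this only
    decides whether the voice model may reach the full agent.
    """
    if not caller or not E164.fullmatch(caller):
        # Withheld or malformed caller ID: treat as outside the allowlist.
        return "decline"
    return "allow" if caller in allowlist else "decline"
```

Matching on the exact E.164 string means "555-555-0123" and "+15555550123" are different callers as far as this check is concerned, so store allowlist entries in the same form Twilio sends.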
The actual use case: missed-call replies
The shape of small-business adoption that's going to win this year isn't "an AI receptionist that handles everything." It's "an AI that picks up when you can't, takes a structured message, and texts you the highlights before the caller hangs up." The pattern most r/openclaw users are landing on:
- Caller dials the business number. Unanswered calls forward to your Twilio number after 4 rings.
- Voice agent picks up. Greets, asks who's calling and what they need. Reads back what it heard.
- If the caller is asking something the agent can answer (hours, location, simple FAQ), openclaw_agent_consult resolves it from the knowledge base.
- If the caller needs you specifically, the agent says you'll be in touch, captures the callback number, and ends the call.
- Post-call, the OpenClaw agent runs a write-up skill: structured summary, sentiment, urgency, the caller's number, a draft text reply. Pings you on WhatsApp or Telegram.
This works because every action that touches the outside world (sending texts, scheduling things, charging cards) happens after the call, with a human approving. The voice agent's only job during the call is being a polite, competent listener. The hard work runs through the same small-business automation patterns people are already using through messaging.
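One way to picture the post-call step is as a small record plus an approval gate. The sketch below is an assumption about shape, not the write-up skill's actual output: CallWriteUp, format_notification, and send_reply are hypothetical names, and the fields simply mirror the list above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CallWriteUp:
    caller_number: str
    summary: str
    sentiment: str
    urgency: str
    draft_reply: str
    approved: bool = False  # flipped by a human, never by the voice agent


def format_notification(w: CallWriteUp) -> str:
    """Build the message pinged to the owner's messaging app."""
    return "\n".join([
        f"Missed call from {w.caller_number} ({w.urgency} urgency, {w.sentiment})",
        w.summary,
        f"Draft reply: {w.draft_reply}",
    ])


def send_reply(w: CallWriteUp, send: Callable[[str, str], None]) -> None:
    """Send the drafted text only after a human has approved it."""
    if not w.approved:
        raise PermissionError("draft reply has not been approved")
    send(w.caller_number, w.draft_reply)
```

Keeping approved on the record itself makes the rule hard to bypass: nothing leaves until someone flips it.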
What it costs
| Line item | Unit cost | 30-call month (90s avg) |
|---|---|---|
| Twilio voice (US inbound) | $0.0085 | $0.38 |
| Twilio number rental | flat | $1.15 |
| Gemini Live realtime audio | ~$0.50 | $22.50 |
| Agent consults (~3 per call, Sonnet) | ~$0.015 each | $1.35 |
| Post-call write-up skill | ~$0.02 each | $0.60 |
| Total | | ~$26/month |
Gemini Live is the line item that scales fastest. If you route calls to a cheaper realtime provider when one becomes available, that table changes a lot. If you're on a Pro Anthropic subscription with the CLI workaround in place, the agent_consult line item drops toward zero until you hit the session ceiling.
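If your call volume looks different, the table is easy to recompute. The sketch below reproduces the arithmetic with the table's figures as defaults; the function and parameter names are made up for illustration, and the rates are this post's estimates, not quoted prices.

```python
def monthly_cost(
    calls: int = 30,
    avg_seconds: float = 90,
    consults_per_call: float = 3,
    twilio_per_min: float = 0.0085,
    number_rental: float = 1.15,
    gemini_per_min: float = 0.50,
    consult_cost: float = 0.015,
    writeup_cost: float = 0.02,
) -> dict[str, float]:
    """Return each line item and the total, in dollars per month."""
    minutes = calls * avg_seconds / 60  # 30 calls x 90 s = 45 minutes
    items = {
        "twilio_voice": minutes * twilio_per_min,
        "number_rental": number_rental,
        "gemini_live": minutes * gemini_per_min,
        "agent_consults": calls * consults_per_call * consult_cost,
        "writeups": calls * writeup_cost,
    }
    items["total"] = sum(items.values())
    return items


print(round(monthly_cost()["total"], 2))  # 25.98
```

Doubling avg_seconds roughly doubles the bill, because the Gemini Live line is about 87% of the total at these rates.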
Things that will go wrong
- Twilio webhook timeouts. If your gateway takes more than 15 seconds to respond, Twilio drops the call. Slow agent_consults will take you over the line. Set browser.actionTimeoutMs appropriately and pre-warm the consult model on plugin start.
- Audio codec mismatches. Twilio negotiates μ-law by default; Gemini Live wants Opus. The smoke test catches this. Don't skip it.
- The voice model improvising your business. Without the contextInjection lockdown, Gemini Live will happily invent your business hours, claim you do services you don't, quote prices you didn't authorize. Lock the tool list and write a tight system prompt that explicitly says "if you don't know, say you'll get the human in touch."
- International calls. Twilio's E.164 handling needs a country prefix. The default smoke test only validates US numbers.
- Hold music silence. If agent_consult takes more than 2-3 seconds, the call goes silent. The voice plugin has a filler_phrases setting. Use it.
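Two of the failure modes above (silence during a slow consult, and the 15-second ceiling) come down to the same timing logic. Here is a rough asyncio sketch of that pattern. It is not how the plugin implements filler phrases; consult_with_filler, say, and both thresholds are assumptions chosen to sit inside the numbers mentioned above.

```python
import asyncio
from typing import Awaitable, Callable, Coroutine

FILLER_AFTER_S = 2.0     # assumption: speak a filler after this much silence
CONSULT_BUDGET_S = 12.0  # assumption: give up well before the 15 s limit
FALLBACK = "I'll have someone get back to you on that."


async def consult_with_filler(
    consult: Coroutine,
    say: Callable[[str], Awaitable[None]],
    filler_after: float = FILLER_AFTER_S,
    budget: float = CONSULT_BUDGET_S,
) -> str:
    """Await a consult, covering silence with a filler and capping total time."""
    task = asyncio.create_task(consult)
    done, _ = await asyncio.wait({task}, timeout=filler_after)
    if done:
        return task.result()  # fast path: answered before the silence shows
    await say("Let me check that for you.")
    try:
        # wait_for cancels the consult if the remaining budget runs out.
        return await asyncio.wait_for(task, timeout=budget - filler_after)
    except asyncio.TimeoutError:
        return FALLBACK
```

The fallback matters as much as the filler: a graceful "someone will get back to you" keeps the call alive, whereas blowing through the timeout drops it.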
The Google Meet variant
The same plumbing answers a different transport: Google Meet. Bundled in 2026.4.24 too. googlemeet doctor --oauth walks the personal Google auth, recover_current_tab picks up an already-open Meet without opening a duplicate, and the artifact/attendance exports pull conference records, recordings, transcripts, and smart notes out as markdown.
What this means: the agent that picks up your phone calls can also drop into your Meet, take notes, and post a summary to your CRM. The same agent_consult handoff applies. The same contextInjection lockdown applies, and matters even more in a meeting where it'll happily start summarizing things you didn't say.
Or skip the Twilio account, the Gemini key, and the smoke test
On TryOpenClaw.ai, voice answering is a toggle. We provision the phone number, run the voice plugin against pooled Gemini Live capacity, ship safe defaults (read-only tools, contextInjection off, caller-ID gating on), and route post-call follow-ups through your messaging app of choice. You don't see the Twilio config, you don't see the smoke test, and you don't have to debug a 15-second webhook timeout at 9pm on a Friday.
Founder of TryOpenClaw.ai. Software engineer writing about OpenClaw, self-hosting trade-offs, and what non-technical users actually need from an AI assistant.