
OpenClaw can now answer your phone (Voice Call setup with Twilio, the safe way)

By Linas Valiukas · April 27, 2026

Your OpenClaw agent can answer your phone. Not "OpenClaw triggers a Twilio webhook that triggers an n8n flow that triggers a TTS." Actually answer the phone. As of v2026.4.24 there's a bundled Voice Call plugin, a Gemini Live realtime backend, and a voicecall setup command. You bind a Twilio number, dial it from your phone, and the agent picks up.

This is the kind of feature that sounds like a demo and turns out to be one config edit away from running in your kitchen. It also sounds like a feature you absolutely should not point at a public phone number on the defaults. Both are true. Here's how to set it up and how to keep it from doing something embarrassing.

What ships in 2026.4.24

The release bundles four pieces that together make voice calls work end to end:

The same plumbing powers the new Google Meet plugin. Different transport (Meet instead of phone), same realtime voice loop, same agent-consult handoff. If you set up Voice Call you've basically set up Meet too.

The setup, in order

  1. Get a Twilio number with voice enabled. A US local number runs $1.15/month. International is more. Buy it in the Twilio console, note the SID.
  2. Get a Google AI Studio API key with Gemini Live access. The free tier covers test calls. For anything past the free-tier rate limits, the key needs to be on a paid billing tier.
  3. Make sure your OpenClaw gateway is reachable from Twilio. Twilio webhooks need to reach your gateway over HTTPS. If you're self-hosting at home, that means a proper reverse proxy or a Cloudflare tunnel. Localhost won't work.
  4. Run voicecall setup. The command walks you through pasting in the Twilio account SID, auth token, phone number, and the Gemini Live key. It writes them into the secrets store, configures the webhook URL on the Twilio number, and registers the voice plugin with the gateway.
    openclaw voicecall setup
  5. Dry-run with voicecall smoke. This is the bit you actually want to run before you let anyone dial in.
    openclaw voicecall smoke
    Walks the whole stack: Twilio creds valid, webhook reachable, audio codecs negotiated, Gemini Live token mint succeeds, openclaw_agent_consult bridge resolves. Default mode reports problems and exits without dialing. Pass --live only when you've fixed everything the dry run flags.
  6. Dial your number. If smoke is green, calling the Twilio number should now connect to a Gemini Live voice loop. The voice model says hello, you talk, it answers. First few exchanges will feel canned - that's the realtime model working without your full agent's context. Ask it something specific to your data and watch latency jump as openclaw_agent_consult kicks in.
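Step 3 is where most home setups trip, so it can be worth sanity-checking the webhook URL before running setup at all. The sketch below is not part of OpenClaw; webhook_url_problems is a hypothetical helper that only inspects the URL string (HTTPS scheme, no localhost, no private IP). It does not resolve DNS or contact Twilio, so a clean result means "plausible", not "reachable".

```python
import ipaddress
from urllib.parse import urlparse


def webhook_url_problems(url: str) -> list[str]:
    """Return reasons this URL cannot be a public webhook target (empty = plausible)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("scheme must be https")
    host = parsed.hostname or ""
    if not host:
        problems.append("no hostname")
        return problems
    if host == "localhost" or host.endswith(".local"):
        problems.append("hostname only resolves on your own machine or LAN")
        return problems
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: nothing more to say without actually resolving it.
        return problems
    if addr.is_loopback or addr.is_private or addr.is_link_local:
        problems.append("IP address is not publicly routable")
    return problems


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    for candidate in ("https://gateway.example.com/voice", "http://localhost:8080/voice"):
        print(candidate, webhook_url_problems(candidate))
```

An empty list still leaves the real test to voicecall smoke, which checks that the webhook is reachable from the outside.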

What the agent_consult handoff actually does

The thing that makes this useful instead of a toy is the bridge between the voice model and your full agent. Worth understanding because the latency and cost profile of voice calls flows from it.

Gemini Live is fast and cheap per minute, but it doesn't have your IMAP password, your CRM token, your memory files, or your skills. So it runs the conversational layer - turn-taking, intent detection, tone - and when the caller asks something specific, it calls openclaw_agent_consult like a tool. That tool call goes to your normal OpenClaw agent through the gateway, which does the actual work: search the calendar, look up the customer in HubSpot, check whether the package shipped. The result comes back as text, the voice model reads it out.
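To make the handoff concrete, here is a minimal sketch of the routing decision, assuming a simplified event shape. ModelEvent, resolve_event, and fake_agent are hypothetical names for illustration; the real plugin's internals are not documented here and almost certainly look different. The only name taken from the release is openclaw_agent_consult.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModelEvent:
    """One output from the realtime voice model (this shape is an assumption)."""
    speech: Optional[str] = None      # text the voice model says on its own
    tool_name: Optional[str] = None   # set when the model requests a tool call
    tool_query: Optional[str] = None  # the question to forward to the full agent


def resolve_event(event: ModelEvent, consult: Callable[[str], str]) -> str:
    """Return the text the voice model should speak for this event."""
    if event.tool_name is None:
        # Fast path: small talk and turn-taking never leave the voice model.
        return event.speech or ""
    if event.tool_name != "openclaw_agent_consult":
        # Any other tool request is refused rather than guessed at.
        return "Sorry, I can't help with that on this call."
    # Slow path: a round trip to the full agent, which is where latency jumps.
    return consult(event.tool_query or "")


def fake_agent(query: str) -> str:
    """Stand-in for the full agent behind the gateway."""
    return f"[agent answer for: {query}]"


if __name__ == "__main__":
    print(resolve_event(ModelEvent(speech="Hi, how can I help?"), fake_agent))
    print(resolve_event(
        ModelEvent(tool_name="openclaw_agent_consult", tool_query="did the package ship?"),
        fake_agent,
    ))
```

The split is the point: only the slow path touches your credentials and your per-consult costs, so that is the path worth gating.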

Two consequences:

Three settings to change before you let anyone call in

The defaults are a starting point. They are not safe defaults. Before you bind this to a public number:

Disable bootstrap context injection on the voice agent. The voice agent shouldn't see your full CLAUDE.md / MEMORY.md on every turn. 2026.4.24 added agents.defaults.contextInjection precisely for this:

[agents.voicecall]
contextInjection = "never"
allowed_tools = ["calendar.read", "crm.lookup", "openclaw_agent_consult"]

Lock the tool list to read-only. The voice agent gets to read your calendar, look up customers, search docs. It does not get email.send, calendar.write, payment skills, or anything that mutates state. If a real action needs to happen, the voice agent's job is to take a message; the followup happens after the call, with you in the loop.
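One way to keep that list honest over time is a small check you run whenever the config changes. This is a sketch, not an OpenClaw feature: APPROVED_VOICE_TOOLS and unapproved_tools are hypothetical names, and the approved set simply mirrors the three tools in the config above. It flags anything not explicitly approved instead of trying to guess which tool names mutate state.

```python
# Tools the voice agent may use; mirrors the allowed_tools line in the config above.
APPROVED_VOICE_TOOLS = frozenset({"calendar.read", "crm.lookup", "openclaw_agent_consult"})


def unapproved_tools(allowed_tools: list[str]) -> list[str]:
    """Return every configured tool that is not on the approved set, in order."""
    return [tool for tool in allowed_tools if tool not in APPROVED_VOICE_TOOLS]


if __name__ == "__main__":
    configured = ["calendar.read", "email.send", "crm.lookup", "calendar.write"]
    extras = unapproved_tools(configured)
    if extras:
        print("Remove before going public:", ", ".join(extras))
```

An explicit approved set fails closed: a new skill with an innocent-sounding name gets flagged until someone deliberately adds it.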

Gate agent_consult by caller ID until you trust it. The Twilio webhook payload includes the caller's number. The voice plugin lets you set an allowlist:

[plugins.voicecall.policy]
agent_consult.allowlist = ["+15555550123", "+15555550199"]
agent_consult.outside_allowlist = "decline"

Calls from outside the allowlist still get answered. They just can't reach the full agent. Useful while you're tuning prompts and figuring out which questions you actually want the agent fielding.
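The gating logic itself is small. Here is a sketch of the decision, assuming the caller's number arrives the way Twilio sends it in the From parameter of the voice webhook (E.164, like +15555550123). normalize_number and consult_decision are hypothetical helpers, and the "allow" / "decline" strings just echo the config above; this is not the plugin's actual code.

```python
import re


def normalize_number(number: str) -> str:
    """Strip spaces, dashes, dots, and parentheses so formatted numbers compare equal."""
    return re.sub(r"[\s\-().]", "", number)


def consult_decision(caller: str, allowlist: list[str]) -> str:
    """Return "allow" if the caller may reach agent_consult, otherwise "decline"."""
    allowed = {normalize_number(n) for n in allowlist}
    return "allow" if normalize_number(caller) in allowed else "decline"


if __name__ == "__main__":
    allowlist = ["+15555550123", "+15555550199"]
    print(consult_decision("+1 (555) 555-0123", allowlist))
    print(consult_decision("+15555550000", allowlist))
```

Caller ID can be spoofed, so treat the allowlist as a filter while you tune prompts, not as authentication.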

The actual use case: missed-call replies

The shape of small-business adoption that's going to win this year isn't "an AI receptionist that handles everything." It's "an AI that picks up when you can't, takes a structured message, and texts you the highlights before the caller hangs up." The pattern most r/openclaw users are landing on:

  1. Caller dials the business number. If nobody answers within 4 rings, the call forwards to your Twilio number.
  2. Voice agent picks up. Greets, asks who's calling and what they need. Reads back what it heard.
  3. If the caller is asking something the agent can answer (hours, location, simple FAQ), openclaw_agent_consult resolves it from the knowledge base.
  4. If the caller needs you specifically, the agent says you'll be in touch, captures the callback number, and ends the call.
  5. Post-call, the OpenClaw agent runs a write-up skill: structured summary, sentiment, urgency, the caller's number, a draft text reply. Pings you on WhatsApp or Telegram.
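The write-up in step 5 is easier to reason about as a fixed structure than as free text. A minimal sketch, assuming the five fields listed above; CallSummary and format_ping are hypothetical names, and the message layout is just one readable option, not what the write-up skill actually produces.

```python
from dataclasses import dataclass


@dataclass
class CallSummary:
    """Structured result of one answered call (field names are assumptions)."""
    caller_number: str
    summary: str
    sentiment: str
    urgency: str
    draft_reply: str


def format_ping(s: CallSummary) -> str:
    """Render the summary as a short message for WhatsApp or Telegram."""
    return "\n".join([
        f"Missed call from {s.caller_number} ({s.urgency} urgency, {s.sentiment})",
        s.summary,
        f"Draft reply (not sent): {s.draft_reply}",
    ])


if __name__ == "__main__":
    print(format_ping(CallSummary(
        caller_number="+15555550123",
        summary="Asked whether Saturday delivery is possible for an existing order.",
        sentiment="neutral",
        urgency="medium",
        draft_reply="Hi, thanks for calling. Saturday delivery works. Morning or afternoon?",
    )))
```

The draft reply stays a draft: nothing goes to the caller until you approve it.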

This works because every action that touches the outside world (sending texts, scheduling things, charging cards) happens after the call, with a human approving. The voice agent's only job during the call is being a polite, competent listener. The hard work runs through the same small-business automation patterns people are already using through messaging.

What it costs

Line item                               Rate             30-call month (90s avg, 45 min total)
Twilio voice (US inbound)               $0.0085/min      $0.38
Twilio number rental                    flat             $1.15
Gemini Live realtime audio              ~$0.50/min       $22.50
Agent consults (~3 per call, Sonnet)    ~$0.015 each     $1.35
Post-call write-up skill                ~$0.02 each      $0.60
Total                                                    ~$26/month

Gemini Live is the line item that scales fastest. If you route calls to a cheaper realtime provider when one becomes available, that table changes a lot. If you're on a Pro Anthropic subscription with the CLI workaround in place, the agent_consult line item drops toward zero until you hit the session ceiling.
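If your call volume looks different, the table's arithmetic is easy to rerun. The sketch below uses the rates from the table as defaults; they are this post's estimates, not quoted prices, and monthly_cost is a hypothetical helper name.

```python
def monthly_cost(calls: int = 30, avg_seconds: int = 90, consults_per_call: int = 3) -> dict[str, float]:
    """Estimate monthly cost per line item, using the rates from the table above."""
    minutes = calls * avg_seconds / 60  # 30 calls at 90 s each is 45 minutes
    items = {
        "twilio_voice": 0.0085 * minutes,                # per-minute inbound rate
        "twilio_number": 1.15,                           # flat monthly rental
        "gemini_live": 0.50 * minutes,                   # approximate per-minute rate
        "agent_consults": 0.015 * consults_per_call * calls,
        "write_up": 0.02 * calls,                        # one write-up per call
    }
    items["total"] = sum(items.values())
    return items


if __name__ == "__main__":
    for name, cost in monthly_cost().items():
        print(f"{name:15s} ${cost:.2f}")
```

With the defaults this lands on the table's total of roughly $26. Doubling either the call count or the average call length roughly doubles the Gemini Live line, which is why it dominates.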

Things that will go wrong

The Google Meet variant

The same plumbing answers a different transport: Google Meet. Bundled in 2026.4.24 too. googlemeet doctor --oauth walks you through the personal Google auth, recover_current_tab picks up an already-open Meet without opening a duplicate, and the artifact/attendance exports pull conference records, recordings, transcripts, and smart notes out as markdown.

What this means: the agent that picks up your phone calls can also drop into your Meet, take notes, and post a summary to your CRM. The same agent_consult handoff applies. The same contextInjection lockdown applies, and matters even more in a meeting where it'll happily start summarizing things you didn't say.

Or skip the Twilio account, the Gemini key, and the smoke test

On TryOpenClaw.ai, voice answering is a toggle. We provision the phone number, run the voice plugin against pooled Gemini Live capacity, ship safe defaults (read-only tools, contextInjection off, caller-ID gating on), and route post-call follow-ups through your messaging app of choice. You don't see the Twilio config, you don't see the smoke test, and you don't have to debug a 15-second webhook timeout at 9pm on a Friday.

Flat $39/month. Your phone gets answered.


Linas Valiukas

Founder of TryOpenClaw.ai. Software engineer writing about OpenClaw, self-hosting trade-offs, and what non-technical users actually need from an AI assistant.
