Voice Agent
Installation / Download
TotalClaw CLI (recommended):
totalclaw install skilldb:georges91560~voice-agent-v1
cURL (direct download, no login required):
curl -fsSL https://skills.taituai.com/api/skills/skilldb%3Ageorges91560~voice-agent-v1/file -o voice-agent-v1.md
Git repository (source code):
git clone https://github.com/openclaw/skills/commit/88926549fd5ce76a1e826f4479f8bd823dff08e3
---
name: voice-agent
description: >
  Gives the agent a complete voice layer using ElevenLabs. Clones the
  principal's voice, generates MP3 audio from any text (VSL, podcasts,
  video scripts, nurturing sequences), and deploys a conversational AI
  agent for automated inbound and outbound calls via Twilio. Self-configures
  by navigating elevenlabs.io autonomously via virtual-desktop, with no manual
  API key setup required. Use when the agent needs to speak, call leads,
  answer prospects, or produce audio content at scale.
version: 1.0.0
author: Wesley Armando (Georges Andronescu)
license: MIT
metadata:
  openclaw:
    emoji: "🎙️"
    security_level: L2
    always: false
    required_paths:
      read:
        - /workspace/voice/config.json
        - /workspace/voice/scripts/
        - /workspace/voice/samples/
        - /workspace/.learnings/LEARNINGS.md
      write:
        - /workspace/voice/config.json
        - /workspace/voice/scripts/
        - /workspace/voice/output/
        - /workspace/voice/calls/
        - /workspace/.learnings/LEARNINGS.md
        - /workspace/.learnings/ERRORS.md
        - /workspace/AUDIT.md
    network_behavior:
      makes_requests: true
      request_targets:
        - https://elevenlabs.io (dashboard navigation via virtual-desktop)
        - https://api.elevenlabs.io (ElevenLabs REST API, requires ELEVENLABS_API_KEY)
        - https://api.twilio.com (Twilio API, optional, requires TWILIO credentials)
      uses_agent_telegram: true
      telegram_note: >
        Notifies principal when voice clone is ready, audio file generated,
        or call completed. Reports call transcripts and outcomes.
      always: false
    requires:
      skills:
        - virtual-desktop
      optional_skills:
        - acquisition-master
        - funnel-builder
      bins:
        - python3
        - ffmpeg
---
# Voice Agent — Autonomous Voice Layer for Wesley
> "The most trusted voice in any room is the one that sounds like you."

The agent doesn't just write content. It speaks it.
This skill gives Wesley a voice, his own voice, deployed at scale.
---
## What This Skill Does
```
LAYER 1: VOICE SETUP (self-configuring)
  Navigates elevenlabs.io autonomously via virtual-desktop
  Logs in via Google OAuth or email/password
  Creates API key, clones voice, configures agent
  Writes credentials to config.json and .env automatically
LAYER 2: TEXT TO SPEECH
  Converts any text to MP3 using Wesley's cloned voice
  VSL scripts, podcast intros, video narrations
  Email audio versions, social audio clips
LAYER 3: CONVERSATIONAL AGENT (with Twilio)
  Outbound calls to leads: automated follow-up
  Inbound calls: answers 24/7, qualifies, reports
  Natural turn-taking, handles objections, books calls
```
---
## PHASE 1 — SELF-CONFIGURATION
The agent runs this phase automatically on first use.
It uses virtual-desktop to navigate ElevenLabs and retrieve its own credentials.
### Step 1 — Login Detection
```
The agent checks /workspace/voice/config.json for existing credentials.
IF ELEVENLABS_API_KEY already in config.json:
  → Skip to Phase 2 (already configured)
IF NOT configured:
  → Run the self-configuration sequence below
```
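The check above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the function name `is_configured` is hypothetical, and treating a missing or malformed file as "not configured" is an assumption.

```python
import json
from pathlib import Path

# Path taken from the skill's required_paths list.
CONFIG_PATH = Path("/workspace/voice/config.json")


def is_configured(config_path: Path = CONFIG_PATH) -> bool:
    """Return True if config.json exists and holds a non-empty ELEVENLABS_API_KEY."""
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        # Assumption: a missing or unreadable file means setup has not run yet.
        return False
    return isinstance(config, dict) and bool(config.get("ELEVENLABS_API_KEY"))
```

If `is_configured()` returns `True`, the agent would skip to Phase 2; otherwise it would start the self-configuration sequence.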
### Step 2 — Connect to ElevenLabs
```
OPTION A: Google OAuth (preferred, zero credentials required)
  Condition: virtual-desktop has an active Google session
  Process:
    1. virtual-desktop opens https://elevenlabs.io/app/sign-in
    2. Clicks "Continue with Google"
    3. Google session is already active in the browser
    4. ElevenLabs dashboard loads automatically
    5. Proceed to API key creation
OPTION B: Email / Password
  Condition: ELEVENLABS_EMAIL and ELEVENLABS_PASSWORD in .env
  Process:
    1. virtual-desktop opens https://elevenlabs.io/app/sign-in
    2. Fills email field with ELEVENLABS_EMAIL
    3. Fills password field with ELEVENLABS_PASSWORD
    4. Clicks "Sign in"
    5. ElevenLabs dashboard loads
    6. Proceed to API key creation
FALLBACK: Manual
  If neither option works:
    Log to AUDIT.md: "ElevenLabs login failed - manual setup required"
    Notify principal via Telegram with exact steps to follow
```
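The order of preference above could be expressed as a small selection function. This sketch only covers the preconditions; it does not drive virtual-desktop, and a login that fails at runtime would still need to fall through to the manual path. The function name and the returned labels are hypothetical.

```python
from typing import Mapping


def choose_login_method(env: Mapping[str, str], has_google_session: bool) -> str:
    """Pick a login path in the documented order: Google OAuth, email/password, manual."""
    if has_google_session:
        # Option A: reuse the Google session already active in the browser.
        return "google_oauth"
    if env.get("ELEVENLABS_EMAIL") and env.get("ELEVENLABS_PASSWORD"):
        # Option B: both credentials are present in the environment (.env).
        return "email_password"
    # Fallback: log to AUDIT.md and notify the principal.
    return "manual"
```

In practice `env` would likely be `os.environ` after the `.env` file has been loaded.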
### Step 3 — Create API Key
```
Navigation path (2026 ElevenLabs UI):
  Dashboard → bottom-left corner → "Developers"
    → Tab "API Keys"
    → Button "Create API Key"
    → Name: "wesley-agent"
    → Click "Create"
    → Copy the generated key (shown only once)
    → Write to /workspace/voice/config.json:
        { "ELEVENLABS_API_KEY": "sk_..." }
    → Also write to .env:
        ELEVENLABS_API_KEY=sk_...
```
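Because the key is shown only once, the two write steps are worth doing carefully. The following is a sketch of how they might be performed, assuming `config.json` is a flat JSON object and `.env` uses plain `KEY=value` lines; `save_api_key` is a hypothetical helper, not part of the skill.

```python
import json
from pathlib import Path


def save_api_key(api_key: str, config_path: Path, env_path: Path) -> None:
    """Store the key under ELEVENLABS_API_KEY in both config.json and .env."""
    # Merge into config.json so existing entries survive.
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config["ELEVENLABS_API_KEY"] = api_key
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")

    # Drop any existing ELEVENLABS_API_KEY line from .env, then append the new one.
    lines = env_path.read_text(encoding="utf-8").splitlines() if env_path.exists() else []
    lines = [ln for ln in lines if not ln.startswith("ELEVENLABS_API_KEY=")]
    lines.append(f"ELEVENLABS_API_KEY={api_key}")
    env_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Merging rather than overwriting keeps values written by later steps (voice ID, agent ID) intact if this step is ever re-run.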
### Step 4 — Clone the Voice
```
Requires: 3 MP3 files in /workspace/voice/samples/
  Minimum: 30 seconds each, clear audio, no background noise
  Optimal: 3-5 minutes total, varied sentences
Navigation path:
  Dashboard → "Voices" → "Add Voice"
    → "Voice Clone" → "Instant Voice Clone"
    → Upload files from /workspace/voice/samples/
    → Name: "Wesley"
    → Click "Create Voice Clone"
    → Wait for processing (usually < 30 seconds)
    → Copy the Voice ID from the voice card
    → Write to config.json: { "ELEVENLABS_VOICE_ID": "abc123..." }
IF no MP3 files in /workspace/voice/samples/:
  → Log to AUDIT.md: "Voice samples missing"
  → Notify principal via Telegram:
      "To clone your voice, record 3 audio clips of 30-60 seconds each
      (read any text naturally), save as MP3, and upload to
      /workspace/voice/samples/
      Then run voice-agent again."
  → Pause and wait for samples
```
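Before opening the dashboard, the agent could confirm that the samples are in place. A minimal sketch, assuming the check is purely a file count; it does not verify duration or audio quality, and the helper names are hypothetical.

```python
from pathlib import Path

SAMPLES_DIR = Path("/workspace/voice/samples")
REQUIRED_SAMPLES = 3  # Step 4 asks for 3 MP3 files


def find_samples(samples_dir: Path = SAMPLES_DIR) -> list[Path]:
    """Return the MP3 files in samples_dir sorted by name (empty if the directory is missing)."""
    if not samples_dir.is_dir():
        return []
    return sorted(
        p for p in samples_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".mp3"
    )


def samples_ready(samples_dir: Path = SAMPLES_DIR) -> bool:
    """True when at least REQUIRED_SAMPLES MP3 files are present."""
    return len(find_samples(samples_dir)) >= REQUIRED_SAMPLES
```

When `samples_ready()` is false, the flow above would log "Voice samples missing", send the Telegram instructions, and pause.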
### Step 5 — Create Conversational Agent (optional — for calls)
```
Only runs if TWILIO_ACCOUNT_SID is in .env
Navigation path:
  Dashboard → "Agents" → "Create Agent"
    → Name: "Wesley Sales Agent"
    → Voice: select "Wesley" (the cloned voice)
    → System prompt: read from /workspace/voice/templates/agent_prompt.md
    → Save → Copy Agent ID
    → Write to config.json: { "ELEVENLABS_AGENT_ID": "agent_..." }
Then connect Twilio:
  Dashboard → "Agents" → select "Wesley Sales Agent"
    → "Phone Numbers" tab → "Add Phone Number"
    → Enter TWILIO_ACCOUNT_SID + TWILIO_AUTH_TOKEN
    → Select TWILIO_PHONE_NUMBER
    → ElevenLabs configures Twilio automatically
    → Write to config.json: { "TWILIO_CONFIGURED": true }
```
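Step 5 is gated on `TWILIO_ACCOUNT_SID`, but the Twilio connection also uses `TWILIO_AUTH_TOKEN` and `TWILIO_PHONE_NUMBER`. A pre-flight check along these lines could catch a partial configuration before the browser session starts. Both helper names are hypothetical, and checking all three variables up front is an assumption rather than documented behaviour.

```python
from typing import Mapping

# Variables referenced in the Twilio connection steps above.
TWILIO_VARS = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "TWILIO_PHONE_NUMBER")


def should_create_call_agent(env: Mapping[str, str]) -> bool:
    """Step 5 only runs if TWILIO_ACCOUNT_SID is set."""
    return bool(env.get("TWILIO_ACCOUNT_SID"))


def missing_twilio_vars(env: Mapping[str, str]) -> list[str]:
    """List the Twilio variables that are absent or empty."""
    return [name for name in TWILIO_VARS if not env.get(name)]
```

A non-empty result from `missing_twilio_vars` would be a reasonable trigger for an AUDIT.md entry instead of a half-finished phone number setup.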
### Configuration Complete
```
When all steps are done, config.json contains:
  {
    "ELEVENLABS_API_KEY": "sk_...",
    "ELEVENLABS_VOICE_ID": "...",
    "ELEVENLABS_AGENT_ID": "...",   ← if Twilio configured
    "TWILIO_CONFIGURED": true,      ← if Twilio configured
    "setup_date": "YYYY-MM-DD",
    "voice_name": "Wesley"
  }
Telegram notification:
  "🎙️ Voice Agent configured and ready.
  Voice: Wesley (cloned)
  TTS: active
  Calls: [active / not configured]"
```
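The final check and the notification text can be derived from `config.json` itself. The sketch below assumes the two non-Twilio keys are the only mandatory ones and that the message follows the template above; both function names are hypothetical, and sending the message via Telegram is left out.

```python
# Keys assumed mandatory for TTS; the agent ID and Twilio flag are optional.
REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID")


def missing_required_keys(config: dict) -> list[str]:
    """Return the mandatory keys that are absent or empty in config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def build_ready_message(config: dict) -> str:
    """Format the 'configured and ready' notification from config values."""
    calls = "active" if config.get("TWILIO_CONFIGURED") else "not configured"
    return (
        "🎙️ Voice Agent configured and ready.\n"
        f"Voice: {config.get('voice_name', 'unknown')} (cloned)\n"
        "TTS: active\n"
        f"Calls: {calls}"
    )
```

The message would only be built once `missing_required_keys` returns an empty list.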
---
## VOICE CLONING — Complete Reference
This section gives the agent every command and navigation step
needed to clone the principal's voice. Two paths are available;
use whichever fits the context.
---
### What You Need Before Starting
```
AUDIO SAMPLES (required for voice cloning)
  Minimum                      : 1 file × 30 seconds
  Recommended                  : 3 files × 1-2 minutes each
  Optimal (Professional Clone) : 30+ minutes total
  Quality requirements:
    → Clear voice, no background noise or music
    → Natural speech rhythm (not reading robotically)
    → Consistent microphone distance
    → Format: MP3, WAV, M4A, FLAC all accepted
    → No multiple speakers in the same file
  Where to put them:
    /workspace/voice/samples/sample_01.mp3
    /workspace/voice/samples/sample_02.mp3
    /workspace/voice/samples/sample_03.mp3
MINIMUM PLAN REQUIRED
  Instant Voice Clone (IVC)      : Starter plan ($5/month) or above
  Professional Voice Clone (PVC) : Creator plan ($22/month) or above
```
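The format and minimum-length requirements can be checked locally before anything is uploaded. This sketch assumes `ffprobe` is on the PATH (it normally ships with `ffmpeg`, which the skill already requires); the helper names are hypothetical. Noise level and speaker count cannot be verified this way.

```python
import subprocess
from pathlib import Path

ACCEPTED_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac"}  # formats listed above
MIN_SECONDS = 30.0  # minimum sample length listed above


def has_accepted_format(path: Path) -> bool:
    """Check the file extension against the accepted formats (case-insensitive)."""
    return path.suffix.lower() in ACCEPTED_SUFFIXES


def probe_duration(path: Path) -> float:
    """Return the duration in seconds as reported by ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def is_long_enough(seconds: float, minimum: float = MIN_SECONDS) -> bool:
    """True when a sample meets the minimum length."""
    return seconds >= minimum
```

A typical use would be `is_long_enough(probe_duration(sample))` for each file that passes `has_accepted_format`.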
---
### PATH A — Terminal / API (fastest — no browser needed)
Use this path when ELEVENLABS_API_KEY is already in config.json.
The agent calls the API directly without virtual-desktop.
#### Step 1 — Install the SDK
```bash
# --break-system-packages allows installing into a system-managed (PEP 668) Python
pip install elevenlabs requests --break-system-packages
```
#### Step 2 — Verify API key works
```bash
curl -s https://api.elevenlabs.io/v1/user -H "xi-api-key: $EL