voice-agent-pro-v3

GitHub · Author: Wesley Armando (Georges Andronescu) · v3.1.0

Gives any OpenClaw agent a complete voice layer via ElevenLabs. Clones the principal's voice from audio samples, converts any text to MP3 audio (VSL, podcasts, video narrations, nurturing sequences), and deploys a conversational AI agent for automated inbound and outbound calls via Twilio. Use when the agent needs to generate audio content, call leads, or answer prospects 24/7 in the principal's cloned voice. Requires ELEVENLABS_API_KEY and ELEVENLABS_VOICE_ID — see Setup section in SKILL.md.

Install / Download

TotalClaw CLI (recommended)

totalclaw install github:LeoYeAI~openclaw-master-skills~voice-agent-pro-v1

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/github%3ALeoYeAI~openclaw-master-skills~voice-agent-pro-v1/file -o voice-agent-pro-v1.md

# Voice Agent Pro V3 — Autonomous Voice Layer

> "The most trusted voice in any room is the one that sounds like you."

This skill gives the principal a voice — their own voice — deployed at scale.

```
LAYER 1 — VOICE SETUP
  Clones the principal's voice from MP3 samples via ElevenLabs API
  Requires ELEVENLABS_API_KEY and ELEVENLABS_VOICE_ID in .env
  Full setup guide with all commands: references/setup_guide.md

LAYER 2 — TEXT TO SPEECH
  Converts any text to MP3 using the principal's cloned voice
  VSL scripts, podcast intros, video narrations, email audio versions

LAYER 3 — CONVERSATIONAL AGENT (with Twilio)
  Outbound calls to leads — automated follow-up
  Inbound calls — answers 24/7, qualifies, reports
```

---

## SETUP — Required Before First Use

### Step 1 — Install dependencies

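The agent runs these commands **inside the OpenClaw container**.
To run them manually from your VPS host instead, prefix each command with `docker exec <container>` (replace the container name below with your own, as shown by `docker ps`):
`docker exec openclaw-yyvg-openclaw-1 pip install elevenlabs --break-system-packages`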

```bash
# Inside the container (agent runs this directly)
pip install elevenlabs --break-system-packages
pip install twilio --break-system-packages
apt-get update && apt-get install -y ffmpeg

# Verify
ffmpeg -version | head -1
python3 -c "from elevenlabs.client import ElevenLabs; print('✅ SDK ready')"
```

> Note: `--break-system-packages` is required on Ubuntu 24.04 / Debian 12+
> containers. If you get an "externally-managed-environment" error, this
> flag resolves it. On older systems, plain `pip install elevenlabs` works.
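
The same verification can be done from Python. This is a minimal sketch using only the standard library; `missing_dependencies` is a hypothetical helper name, not part of the skill.

```python
import importlib.util
import shutil


def missing_dependencies(modules=("elevenlabs", "twilio"), binaries=("ffmpeg",)):
    """Return the names of Python modules and executables that cannot be found."""
    # find_spec returns None when a top-level module is not installed
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when the executable is not on PATH
    missing += [b for b in binaries if shutil.which(b) is None]
    return missing


print(missing_dependencies() or "all dependencies found")
```

An empty list means the two packages and `ffmpeg` from the install commands above are all available.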

### Step 2 — Access ElevenLabs

```
OPTION A — Via virtual-desktop (if installed)
  If the virtual-desktop skill is installed and a Google session
  is active in the browser, the agent can navigate elevenlabs.io
  and create the API key automatically:
  → Go to: https://elevenlabs.io/app/sign-in
  → Click "Continue with Google" (uses active session)
  → Navigate: Developers → API Keys → Create API Key
  → Copy the key and run apply-config (Step 6)

OPTION B — Manual (recommended for first setup)
  → Go to: https://elevenlabs.io/app/settings/api-keys
  → Click "Create API Key" → name it → copy it
  → Add to your agent .env file: ELEVENLABS_API_KEY=sk_...
```

```bash
# Verify it works
curl -s https://api.elevenlabs.io/v1/user \
  -H "xi-api-key: $ELEVENLABS_API_KEY" | python3 -m json.tool
# Expected: JSON with subscription info — if 401, key is wrong
```
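
If the agent needs to run the same check programmatically, a sketch along these lines could work. It uses only the standard library and the endpoint and `xi-api-key` header from the curl command above; `build_user_request` and `check_api_key` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request

USER_URL = "https://api.elevenlabs.io/v1/user"


def build_user_request(api_key):
    """Build the GET request for the user endpoint, authenticated by header."""
    return urllib.request.Request(USER_URL, headers={"xi-api-key": api_key})


def check_api_key(api_key, timeout=10):
    """Return (True, parsed JSON) on success, or (False, reason) on failure."""
    try:
        with urllib.request.urlopen(build_user_request(api_key), timeout=timeout) as resp:
            return True, json.load(resp)
    except urllib.error.HTTPError as err:
        # A 401 here means the key was rejected
        return False, f"HTTP {err.code}"
    except urllib.error.URLError as err:
        return False, f"network error: {err.reason}"
```

Call it as `ok, detail = check_api_key(os.environ["ELEVENLABS_API_KEY"])` after importing `os`.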

### Step 3 — Clone your voice

Before running this step, place three MP3 recordings of your voice (30-60 seconds each, clear audio) in `/workspace/voice/samples/`, named `sample_01.mp3`, `sample_02.mp3`, and `sample_03.mp3`.
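
A pre-flight check can catch a missing or empty sample before any upload is attempted. The sketch below is an assumption about how such a check might look, with `find_samples` as a hypothetical helper; it does not verify duration or audio quality.

```python
from pathlib import Path


def find_samples(sample_dir, minimum=3):
    """Return the sorted non-empty MP3 files in sample_dir.

    Raises FileNotFoundError when fewer than `minimum` files are present.
    """
    samples = sorted(
        p for p in Path(sample_dir).glob("*.mp3") if p.stat().st_size > 0
    )
    if len(samples) < minimum:
        raise FileNotFoundError(
            f"expected at least {minimum} non-empty MP3 files in {sample_dir}, "
            f"found {len(samples)}"
        )
    return samples
```

For example, `find_samples("/workspace/voice/samples")` returns the paths to pass to the cloning step, or fails early with a clear message.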

#### Via Python SDK

```python
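import json
import os
from contextlib import ExitStack

from elevenlabs.client import ElevenLabs

CONFIG_PATH = "/workspace/voice/config.json"
SAMPLES = [
    "/workspace/voice/samples/sample_01.mp3",
    "/workspace/voice/samples/sample_02.mp3",
    "/workspace/voice/samples/sample_03.mp3",
]

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# The SDK uploads file contents, so pass open binary handles, not path strings
with ExitStack() as stack:
    files = [stack.enter_context(open(path, "rb")) for path in SAMPLES]
    voice = client.voices.ivc.create(
        name="[AGENT_VOICE_NAME]",
        description="[PRINCIPAL_NAME] cloned voice",
        files=files,
    )
print(f"Voice ID: {voice.voice_id}")

# Save to config.json, starting from an empty config if the file does not exist yet
config = {}
if os.path.exists(CONFIG_PATH):
    with open(CONFIG_PATH) as f:
        config = json.load(f)
config["ELEVENLABS_VOICE_ID"] = voice.voice_id
with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
print("✅ Voice ID saved to config.json")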
```

#### Via curl

```bash
VOICE_ID=$(curl -s -X POST https://api.elevenlabs.io/v1/voices/add \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  -F "name=[AGENT_VOICE_NAME]" \
  -F "description=[PRINCIPAL_NAME] cloned voice" \
  -F "files=@/workspace/voice/samples/sample_01.mp3" \
  -F "files=@/workspace/voice/samples/sample_02.mp3" \
  -F "files=@/workspace/voice/samples/sample_03.mp3" \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['voice_id'])")

echo "Voice ID: $VOICE_ID"
# Add to your agent .env: ELEVENLABS_VOICE_ID=$VOICE_ID
```

### Step 4 — List voices (verify clone)

```bash
curl -s https://api.elevenlabs.io/v1/voices \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  | python3 -c "
import sys, json
# Single quotes only: double quotes would end the surrounding shell string
for v in json.load(sys.stdin)['voices']:
    print(v['voice_id'], v['name'], v.get('category'), sep=' | ')
"
```
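
To confirm the clone specifically, the listing can be filtered by category. This sketch assumes the response has the shape used above (a `voices` list whose items carry `voice_id`, `name`, and `category`) and that instant clones are reported with the category `cloned`; treat that value as an assumption and check your own listing. `cloned_voices` is a hypothetical helper name.

```python
def cloned_voices(payload, category="cloned"):
    """Return (voice_id, name) pairs for voices in the given category."""
    return [
        (v["voice_id"], v["name"])
        for v in payload.get("voices", [])
        if v.get("category") == category
    ]


# Example with a hand-written payload, not real API output
sample = {
    "voices": [
        {"voice_id": "abc123", "name": "My Clone", "category": "cloned"},
        {"voice_id": "def456", "name": "Stock Voice", "category": "premade"},
    ]
}
print(cloned_voices(sample))
```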

### Step 5 — Test the clone

```bash
python3 -c "
from elevenlabs.client import ElevenLabs
import json, os

with open('/workspace/voice/config.json') as f:
    cfg = json.load(f)

# Fall back to the environment if config.json has no API key yet
api_key = cfg.get('ELEVENLABS_API_KEY') or os.environ['ELEVENLABS_API_KEY']
client = ElevenLabs(api_key=api_key)
audio = client.text_to_speech.convert(
    text='Voice clone test successful.',
    voice_id=cfg['ELEVENLABS_VOICE_ID'],
    model_id='eleven_multilingual_v2',
    output_format='mp3_44100_128',
)
os.makedirs('/workspace/voice/output', exist_ok=True)
with open('/workspace/voice/output/test_clone.mp3', 'wb') as f:
    for chunk in audio:
        f.write(chunk)
print('✅ Test audio: /workspace/voice/output/test_clone.mp3')
"
```

### Step 6 — Apply config and verify

```bash
# Apply credentials to config.json (no container restart needed)
python3 /workspace/voice/scripts/voice_generator.py apply-config \
  --api-key "$ELEVENLABS_API_KEY" \
  --voice-id "$ELEVENLABS_VOICE_ID"

# Verify everything is ready
python3 /workspace/voice/scripts/voice_generator.py status
# Expected:
#   API Key:    ✅ configured
#   Voice ID:   ✅ abc123...
```

> The skill reads credentials from config.json at runtime —
> no container restart needed after updating credentials.
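
Reading credentials on every call is what makes a restart unnecessary. The sketch below illustrates the idea only; it is not the code in `voice_generator.py`. The `load_credentials` helper and the fallback to environment variables are assumptions.

```python
import json
import os
from pathlib import Path

REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID")


def load_credentials(config_path="/workspace/voice/config.json", env=None):
    """Load credentials fresh from config.json, falling back to the environment."""
    env = os.environ if env is None else env
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    # config.json wins; the environment is only a fallback (assumed behavior)
    creds = {key: config.get(key) or env.get(key) for key in REQUIRED_KEYS}
    missing = [key for key, value in creds.items() if not value]
    if missing:
        raise KeyError(f"missing credentials: {', '.join(missing)}")
    return creds
```

Because nothing is cached at import time, an updated `config.json` takes effect on the next call.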

> For a step-by-step dashboard navigation guide with screenshots, see
> `references/setup_guide.md`

---

## PHASE 1 — TEXT TO SPEECH

Converts any text to audio using the principal's cloned voice.

### Use Cases

```
VSL (Video Sales Letter)
  Input:  /workspace/voice/scripts/vsl_[offer].md
  Output: /workspace/voice/output/vsl_[offer].mp3
  Use:    record your VSL once — never again

PODCAST INTRO / OUTRO
  Input:  /workspace/voice/scripts/podcast_[episode].md
  Output: /workspace/voice/output/podcast_[episode].mp3

VIDEO NARRATION
  Input:  text from content-creator-pro queue
  Output: MP3 ready for CapCut / video editor

EMAIL AUDIO VERSION
  Input:  email text from acquisition-master sequences
  Output: MP3 attached or linked in email

SOCIAL AUDIO CLIPS
  Input:  hook text from content-creator-pro
  Output: 15-30 second MP3 for Instagram, Twitter Spaces
```

### TTS Models

```
eleven_flash_v2_5        → 75ms latency — use for real-time / calls
eleven_multilingual_v2   → best quality — use for VSL / podcasts
eleven_v3                → most expressive — use for storytelling content
```
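
One way to encode this choice in a script is a simple lookup. The model IDs are the ones listed above; the use-case labels and the `pick_model` helper are hypothetical.

```python
# Use-case labels are illustrative; model IDs come from the list above
MODEL_BY_USE_CASE = {
    "call": "eleven_flash_v2_5",
    "vsl": "eleven_multilingual_v2",
    "podcast": "eleven_multilingual_v2",
    "storytelling": "eleven_v3",
}


def pick_model(use_case, default="eleven_multilingual_v2"):
    """Return the model ID for a use case, or the default when it is unknown."""
    return MODEL_BY_USE_CASE.get(use_case.lower(), default)
```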

### TTS Process

```
1. Read script from /workspace/voice/scripts/[name].md
2. Split into chunks of max 900 characters (sentence boundaries)
3. Call ElevenLabs TTS API for each chunk
4. Concatenate chunks with ffmpeg → single MP3
5. Save to /workspace/voice/output/[name].mp3
6. Log to AUDIT.md: "TTS generated: [name].mp3 — [duration]s"
7. Notify principal via Telegram with file path
```
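
Steps 2 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the skill's own implementation: sentences are detected with a simple punctuation regex, an over-long sentence is hard-split as a last resort, and the chunks are joined with ffmpeg's concat demuxer. All function names are hypothetical.

```python
import re


def split_into_chunks(text, max_chars=900):
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into fixed-size pieces
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def concat_list(chunk_paths):
    """Build the contents of an ffmpeg concat-demuxer list file."""
    return "".join(f"file '{path}'\n" for path in chunk_paths)


def concat_command(list_path, output_path):
    """Return an ffmpeg command that joins the listed MP3s without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output_path]
```

Write the result of `concat_list` to a text file, then pass the command to `subprocess.run(..., check=True)`. Stream copy (`-c copy`) only works cleanly when every chunk uses the same output format, which holds if all chunks are generated with the same `output_format`.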

### CLI Usage

```bash
# Generate from text
python3 /workspace/voice/scripts/voice_generator.py tts \
  --text "Hello, this is [PRINCIPAL_NAME]." \
  --output /workspace/voice/output/hello.mp3

# Generate from script file
python3 /workspace/voice/scripts/voice_generator.py tts \
  --script /workspace/voice/scripts/vsl_offer.md \
  --model eleven_multilingual_v2

# Check status
python3 /workspace/voice/scripts/voice_generator.py status
```

---

## PHASE 2 — CONVERSATIONAL CALLS (requires Twilio)

### Set up Twilio

```bash
# Add to your agent .env file:
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+1234567890

# Twilio account: console.twilio.com
# Phone number: ~$1/month
```
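
Before placing a call, it can help to sanity-check these three values. The sketch below assumes the usual formats: an Account SID is `AC` followed by 32 hexadecimal characters, and the phone number is in E.164 form. `twilio_config_errors` is a hypothetical helper, and it checks only that the auth token is non-empty.

```python
import re

SID_PATTERN = re.compile(r"^AC[0-9a-fA-F]{32}$")
E164_PATTERN = re.compile(r"^\+[1-9]\d{1,14}$")


def twilio_config_errors(env):
    """Return a list of human-readable problems with the Twilio settings in env."""
    errors = []
    if not SID_PATTERN.match(env.get("TWILIO_ACCOUNT_SID", "")):
        errors.append("TWILIO_ACCOUNT_SID must be 'AC' plus 32 hex characters")
    if not env.get("TWILIO_AUTH_TOKEN"):
        errors.append("TWILIO_AUTH_TOKEN is not set")
    if not E164_PATTERN.match(env.get("TWILIO_PHONE_NUMBER", "")):
        errors.append("TWILIO_PHONE_NUMBER must be in E.164 format, e.g. +1234567890")
    return errors
```

Calling `twilio_config_errors(os.environ)` (after `import os`) returns an empty list when all three values look plausible. The placeholder SID shown above is rejected on purpose, since it is not a real value.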

### Connect Twilio to ElevenLabs Agent

```
1. Go to: https://elevenlabs.io/app/conversational-ai
2. Click "Create Agent"
3. Name: "[PRINCIPAL_NAME] Sales Agent"
4. Voice: select "[AGENT_VOICE_NAME]"
5. Agent instructions: paste content from templates/agent_prompt.md
6. Save → copy Agent ID → save to config.json
7. Tab "Phone Numbers" → "Add Phone Number"
8. Enter TWILIO_ACCOUNT_SID +