elevenlabs-speech

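TotalClaw · Author: totalclaw

Use ElevenLabs AI for text-to-speech and speech-to-text. Use this skill when the user wants to convert text to speech, transcribe voice messages, or work with voices in multiple languages. It supports high-quality AI voices and accurate transcription.

Installation / Download

TotalClaw CLI (recommended)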

totalclaw install totalclaw:totalclaw~jeffpignataro-miranda-elevenlabs-speech

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~jeffpignataro-miranda-elevenlabs-speech/file -o jeffpignataro-miranda-elevenlabs-speech.md

# ElevenLabs Speech

A complete voice solution that handles both TTS and STT through one API:
- **TTS**: Text-to-Speech (high-quality voices)
- **STT**: Speech-to-Text via Scribe (accurate transcription)

## Quick Start

### Environment Setup

Set your API key:
```bash
export ELEVENLABS_API_KEY="sk_..."
```

Alternatively, create a `.env` file in the workspace root.
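The lookup order described above (exported variable first, `.env` file second) can be sketched as follows. This is a minimal illustration, not the scripts' actual loading code, which is not shown here. The helper name `load_api_key` is hypothetical, and the sketch assumes the `.env` file contains a plain `ELEVENLABS_API_KEY=...` line.

```python
import os
from pathlib import Path


def load_api_key(env_path=".env", var_name="ELEVENLABS_API_KEY"):
    """Return the API key from the environment, falling back to a .env file."""
    # An exported environment variable takes priority.
    key = os.environ.get(var_name)
    if key:
        return key

    # Otherwise scan the .env file for a simple NAME=VALUE line (assumed format).
    path = Path(env_path)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == var_name:
                # Drop optional surrounding quotes.
                return value.strip().strip('"').strip("'")

    raise RuntimeError(f"{var_name} is not set and was not found in {env_path}")
```

Failing loudly when no key is found tends to be easier to debug than a later authentication error from the API.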

### Text-to-Speech (TTS)

Convert text to natural-sounding speech:

```bash
python scripts/elevenlabs_speech.py tts -t "Hello world" -o greeting.mp3
```

With a custom voice:
```bash
python scripts/elevenlabs_speech.py tts -t "Hello" -v "voice_id_here" -o output.mp3
```

### List Available Voices

```bash
python scripts/elevenlabs_speech.py voices
```

## Using in Code

```python
from scripts.elevenlabs_speech import ElevenLabsClient

client = ElevenLabsClient(api_key="sk_...")

# Basic TTS
result = client.text_to_speech(
    text="Hello from zerox",
    output_path="greeting.mp3"
)

# With custom settings
result = client.text_to_speech(
    text="Your text here",
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
    stability=0.5,
    similarity_boost=0.75,
    output_path="output.mp3"
)

# Get available voices
voices = client.get_voices()
for voice in voices['voices']:
    print(f"{voice['name']}: {voice['voice_id']}")
```
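Since the voices response is iterated as `voices['voices']` with `name` and `voice_id` keys, looking up an ID by name is a small step further. The helper below is a hypothetical sketch (`find_voice_id` is not part of the script) and runs against sample data in the same shape rather than a live API call.

```python
def find_voice_id(voices_response, name):
    """Return the voice_id whose name matches `name` (case-insensitive), or None."""
    for voice in voices_response.get("voices", []):
        if voice.get("name", "").lower() == name.lower():
            return voice.get("voice_id")
    return None


# Sample data shaped like the response the loop above iterates over.
sample = {
    "voices": [
        {"name": "Rachel", "voice_id": "21m00Tcm4TlvDq8ikWAM"},
        {"name": "Josh", "voice_id": "TxGEqnHWrfWFTfGW9XjX"},
    ]
}

print(find_voice_id(sample, "rachel"))  # 21m00Tcm4TlvDq8ikWAM
```

With a real client, `find_voice_id(client.get_voices(), "Rachel")` would presumably work the same way, assuming the response keeps this shape.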

## Popular Voices

| Voice ID | Name | Description |
|----------|------|-------------|
| `21m00Tcm4TlvDq8ikWAM` | Rachel | Natural, versatile (default) |
| `AZnzlk1XvdvUeBnXmlld` | Domi | Strong, energetic |
| `EXAVITQu4vr4xnSDxMaL` | Bella | Soft, soothing |
| `ErXwobaYiN019PkySvjV` | Antoni | Well-rounded |
| `MF3mGyEYCl7XYWbV9V6O` | Elli | Warm, friendly |
| `TxGEqnHWrfWFTfGW9XjX` | Josh | Deep, calm |
| `VR6AewLTigWG4xSOukaG` | Arnold | Authoritative |

## Voice Settings

- **stability** (0-1): lower values give a more emotional, variable delivery; higher values give a more stable, consistent one
- **similarity_boost** (0-1): higher values keep the output closer to the original voice

Default: stability=0.5, similarity_boost=0.75
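To catch out-of-range values before a request is made, the two settings can be validated up front. The helper `build_voice_settings` below is hypothetical; it only checks the documented 0-1 range and applies the defaults listed above.

```python
def build_voice_settings(stability=0.5, similarity_boost=0.75):
    """Validate both settings against the 0-1 range and return them as a dict."""
    settings = {"stability": stability, "similarity_boost": similarity_boost}
    for name, value in settings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return settings


# A more expressive preset and a steadier one (illustrative values only).
expressive = build_voice_settings(stability=0.3)
steady = build_voice_settings(stability=0.8, similarity_boost=0.9)
```

Because `text_to_speech` takes `stability` and `similarity_boost` as keyword arguments, a preset can be unpacked into the call, for example `client.text_to_speech(text="Hi", output_path="hi.mp3", **expressive)`.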

## Models

- `eleven_turbo_v2_5` - Fast, high quality (default)
- `eleven_multilingual_v2` - Best for non-English
- `eleven_monolingual_v1` - English only
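One way to apply the list above is a simple rule of thumb: keep the default model for English and switch to the multilingual model otherwise. The function `choose_model` is a hypothetical sketch of that rule; whether and how the script accepts a model argument is not shown here, so the result is just a model ID string.

```python
# Language codes treated as English in this sketch (an assumption).
ENGLISH_CODES = {"en", "eng"}


def choose_model(language_code=None):
    """Pick a model ID from the list above based on the text's language."""
    if language_code is None or language_code.lower() in ENGLISH_CODES:
        # Default model: fast, high quality.
        return "eleven_turbo_v2_5"
    # Described above as the best choice for non-English text.
    return "eleven_multilingual_v2"
```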

## Integration with Telegram

When a user sends text and wants a voice reply:

```python
# Generate speech
result = client.text_to_speech(text=user_text, output_path="reply.mp3")

# Send via the Telegram message tool, passing the path of the file generated above
message(action="send", media="reply.mp3", as_voice=True)
```

## Speech-to-Text (STT) with ElevenLabs Scribe

Transcribe voice messages using ElevenLabs Scribe:

### Transcribe Audio

```bash
python scripts/elevenlabs_scribe.py voice_message.ogg
```

With a specific language:
```bash
python scripts/elevenlabs_scribe.py voice_message.ogg --language ara
```

With speaker diarization (multiple speakers):
```bash
python scripts/elevenlabs_scribe.py voice_message.ogg --speakers 2
```

### Using in Code

```python
from scripts.elevenlabs_scribe import ElevenLabsScribe

client = ElevenLabsScribe(api_key="sk_...")

# Basic transcription
result = client.transcribe("voice_message.ogg")
print(result['text'])

# With language hint (improves accuracy)
result = client.transcribe("voice_message.ogg", language_code="ara")

# With speaker detection
result = client.transcribe("voice_message.ogg", num_speakers=2)
```

### Supported Formats

- mp3, mp4, mpeg, mpga, m4a, wav, webm
- Max file size: 100 MB
- Telegram voice messages (`.ogg`) are also accepted and work well
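The format and size limits above can be checked locally before uploading, which avoids a wasted request. This is a minimal sketch: `check_audio_file` is a hypothetical helper, the extension check is only a proxy for the real container format, and 100 MB is interpreted here as 100 MiB.

```python
from pathlib import Path

# Extensions from the list above, plus .ogg for Telegram voice messages.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm", ".ogg"}
MAX_FILE_BYTES = 100 * 1024 * 1024  # 100 MB, treated as 100 MiB (assumption)


def check_audio_file(path):
    """Return the file as a Path, or raise ValueError if it looks unsuitable."""
    audio = Path(path)
    if not audio.is_file():
        raise ValueError(f"{audio} does not exist")
    if audio.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {audio.suffix or '(none)'}")
    size = audio.stat().st_size
    if size > MAX_FILE_BYTES:
        raise ValueError(f"{audio} is {size} bytes, over the 100 MB limit")
    return audio
```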

### Language Support

Scribe supports 99 languages including:
- Arabic (`ara`)
- English (`eng`)
- Spanish (`spa`)
- French (`fra`)
- And many more...

Without a language hint, Scribe auto-detects the language.
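The hints above are three-letter codes, while many systems report two-letter codes or regional tags such as `en-US`. The sketch below converts those to the three-letter form for the four languages listed and returns `None` otherwise, so the caller can fall back to auto-detection. Both `to_language_hint` and the small mapping table are hypothetical and would need extending for other languages.

```python
# Two-letter to three-letter codes for the languages listed above.
ISO_639_1_TO_3 = {"ar": "ara", "en": "eng", "es": "spa", "fr": "fra"}


def to_language_hint(code):
    """Return a three-letter language hint, or None to rely on auto-detection."""
    if not code:
        return None
    # Normalise tags such as "en-US" or "fr_CA" down to their language part.
    base = code.strip().lower().replace("_", "-").split("-")[0]
    if len(base) == 3:
        return base
    return ISO_639_1_TO_3.get(base)


print(to_language_hint("en-US"))  # eng
print(to_language_hint("de"))     # None
```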

## Complete Workflow Example

**User sends voice message → You reply with voice:**

```python
from scripts.elevenlabs_scribe import ElevenLabsScribe
from scripts.elevenlabs_speech import ElevenLabsClient

# 1. Transcribe user's voice message
stt = ElevenLabsScribe()
transcription = stt.transcribe("user_voice.ogg")
user_text = transcription['text']

# 2. Process/understand the text
# ... your logic here ...

# 3. Generate response text
response_text = "Your response here"

# 4. Convert to speech
tts = ElevenLabsClient()
tts.text_to_speech(response_text, output_path="reply.mp3")

# 5. Send voice reply
message(action="send", media="reply.mp3", as_voice=True)
```
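The same five steps can be wrapped in a function that takes the transcriber, the response logic, and the synthesizer as arguments, which makes the flow testable without network access. This is a sketch under assumptions: `voice_reply` and its parameter names are hypothetical, and it assumes the transcription result is a dict with a `text` key, as in the example above.

```python
def voice_reply(audio_path, transcribe, respond, synthesize, reply_path="reply.mp3"):
    """Run transcribe -> respond -> synthesize; return the reply path or None."""
    transcription = transcribe(audio_path)
    user_text = (transcription.get("text") or "").strip()
    if not user_text:
        # Nothing intelligible was transcribed, so there is nothing to answer.
        return None
    response_text = respond(user_text)
    synthesize(response_text, output_path=reply_path)
    return reply_path
```

With the clients from the example, a call might look like `voice_reply("user_voice.ogg", stt.transcribe, my_logic, tts.text_to_speech)`, where `my_logic` stands for your own function that maps the user's text to a response. Sending the resulting file is left to the caller.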

## Pricing

Check https://elevenlabs.io/pricing for current rates:

**TTS (Text-to-Speech):**
- Free tier: 10,000 characters/month
- Paid plans available

**STT (Speech-to-Text) - Scribe:**
- Free tier available
- Check website for current pricing
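To get a rough sense of how far the free TTS tier goes, character usage can be estimated before sending text. The sketch below uses the 10,000 characters/month figure quoted above and counts characters with Python's `len`, which is an assumption about how usage is metered; `remaining_characters` is a hypothetical helper.

```python
FREE_TIER_CHARS = 10_000  # monthly free-tier figure quoted above


def remaining_characters(texts, already_used=0, quota=FREE_TIER_CHARS):
    """Estimate the characters left in the quota after synthesizing `texts`."""
    used = already_used + sum(len(text) for text in texts)
    return quota - used


print(remaining_characters(["Hello world", "Your text here"]))  # 9975
```

A negative result means the planned text would exceed the quota.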

## Complete Workflow Example

**User sends voice message → You reply with voice:**

```python
from scripts.elevenlabs_scribe import ElevenLabsScribe
from scripts.elevenlabs_speech import ElevenLabsClient

# 1. Transcribe user's voice message
stt = ElevenLabsScribe()
transcription = stt.transcribe("user_voice.ogg")
user_text = transcription['text']

# 2. Process/understand the text
# ... your logic here ...

# 3. Generate response text
response_text = "Your response here"

# 4. Conver