lh-edge-tts

TotalClaw, by totalclaw

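Text-to-speech conversion with Python edge-tts: generates audio from text. Supports multiple voices and languages, speed adjustment, pitch control, and subtitle generation. Use it when: (1) the user requests audio or voice output with the "tts" trigger or keyword; (2) content needs to be spoken rather than read (multitasking, accessibility, driving, cooking); (3) the user wants a specific voice, speed, pitch, or format for TTS output.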

Installation / Download

TotalClaw CLI (recommended)

totalclaw install totalclaw:totalclaw~liuhedev-lh-edge-tts

cURL direct download, no login required

curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~liuhedev-lh-edge-tts/file -o liuhedev-lh-edge-tts.md

# Edge-TTS Skill

## Overview

Generate high-quality text-to-speech audio using Microsoft Edge's neural TTS service via Python edge-tts. Supports multiple languages, voices, adjustable speed/pitch, and subtitle generation (SRT/VTT).

## Quick Start

When you detect TTS intent from a trigger or a user request:

1. **Call the tts tool** (Clawdbot built-in) to convert text to speech
2. The tool returns a MEDIA: path
3. Clawdbot routes the audio to the current channel

## Trigger Detection

Treat the "tts" keyword as a TTS request. Before conversion, the skill automatically strips TTS-related keywords from the text.
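
The detection and filtering described above could look roughly like the following. This is a minimal sketch, not the skill's actual implementation: the pattern `TTS_TRIGGER` and both helper names are hypothetical, and the real filter may recognize more keywords than the bare word "tts".

```python
import re

# Hypothetical trigger pattern: the standalone word "tts" in any case,
# plus an optional colon or comma and the whitespace that follows it.
TTS_TRIGGER = re.compile(r"\btts\b[:,]?\s*", re.IGNORECASE)


def has_tts_trigger(message: str) -> bool:
    """Return True when the message contains the standalone keyword."""
    return TTS_TRIGGER.search(message) is not None


def strip_tts_trigger(message: str) -> str:
    """Remove the keyword so that it is not spoken aloud."""
    return TTS_TRIGGER.sub("", message).strip()
```

The word boundaries keep words that merely contain the letters, such as "watts", from counting as a trigger.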

## Advanced Customization

### Using the Python Scripts

For more control, use the bundled scripts directly:

#### TTS Converter
```bash
cd scripts
python3 tts_converter.py "Your text" --voice en-US-AriaNeural --rate +10% -o output.mp3
python3 tts_converter.py -f input.txt --voice zh-CN-XiaoxiaoNeural -o output.mp3
python3 tts_converter.py -f input.txt -v zh-CN-YunxiNeural -r "+10%" -o output.mp3 -s output.vtt
```

**Options:**
- `--voice, -v`: Voice name (default: en-US-MichelleNeural)
- `--lang, -l`: Language code (e.g., en-US, zh-CN)
- `--rate, -r`: Rate adjustment (e.g., +10%, -20%)
- `--volume`: Volume adjustment (e.g., +0%, -50%)
- `--pitch`: Pitch adjustment (e.g., +0Hz, -10Hz)
- `--output, -o`: Output file path (default: temp file)
- `--subtitles, -s`: Save subtitles to file (.vtt or .srt)
- `--file, -f`: Read text from file
- `--proxy, -p`: Proxy URL
- `--timeout`: Receive timeout in seconds (default: 60)
- `--list-voices, -L`: List available voices
- `--lang-filter`: Filter voices by language (used with --list-voices)
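
To make the option list concrete, here is a sketch of an `argparse` parser that accepts the same flags. It is an illustration only, since the source of `tts_converter.py` is not shown here. The positional `text` argument, the `int` type for `--timeout`, and the `+0%` / `+0Hz` defaults for `--volume` and `--pitch` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented TTS options")
    parser.add_argument("text", nargs="?", help="Text to convert")
    parser.add_argument("--voice", "-v", default="en-US-MichelleNeural", help="Voice name")
    parser.add_argument("--lang", "-l", help="Language code")
    parser.add_argument("--rate", "-r", default="+0%", help="Rate adjustment")
    parser.add_argument("--volume", default="+0%", help="Volume adjustment (assumed default)")
    parser.add_argument("--pitch", default="+0Hz", help="Pitch adjustment (assumed default)")
    parser.add_argument("--output", "-o", help="Output file path")
    parser.add_argument("--subtitles", "-s", help="Subtitle file path")
    parser.add_argument("--file", "-f", help="Read text from file")
    parser.add_argument("--proxy", "-p", help="Proxy URL")
    parser.add_argument("--timeout", type=int, default=60, help="Receive timeout in seconds")
    parser.add_argument("--list-voices", "-L", action="store_true", help="List available voices")
    parser.add_argument("--lang-filter", help="Filter voices by language")
    return parser


# Negative adjustments are passed in the --rate=-20% form here.
args = build_parser().parse_args(["Hello", "-v", "en-US-AriaNeural", "--rate=-20%"])
```

If the real script is also built on `argparse`, a value that starts with `-` (such as `-20%`) is likely to be mistaken for another option when passed as a separate token, so the `--rate=-20%` spelling is the safer one.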

#### Configuration Manager
```bash
cd scripts
python3 config_manager.py --set voice zh-CN-XiaoxiaoNeural
python3 config_manager.py --set rate "+10%"
python3 config_manager.py --get
python3 config_manager.py --reset
```
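
The `--set`, `--get`, and `--reset` commands suggest a small JSON-backed store. The sketch below shows one way such a store could work, assuming the `~/.tts-config.json` location named under Resources. The function names and the `DEFAULTS` dictionary are hypothetical and are not taken from `config_manager.py`.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".tts-config.json"

# Assumed defaults, based on the default voice and rate given in this document.
DEFAULTS = {"voice": "en-US-MichelleNeural", "rate": "+0%"}


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return the defaults overlaid with whatever is stored on disk."""
    config = dict(DEFAULTS)
    if path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config


def set_option(key: str, value: str, path: Path = CONFIG_PATH) -> dict:
    """Persist a single setting and return the merged configuration."""
    config = load_config(path)
    config[key] = value
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return config


def reset_config(path: Path = CONFIG_PATH) -> None:
    """Delete the stored file so that the defaults apply again."""
    path.unlink(missing_ok=True)
```

Taking the path as a parameter keeps the helpers easy to exercise against a scratch file instead of the real home directory.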

### Voice Selection

Common voices (use `--list-voices` for full list):

**English:**
- `en-US-MichelleNeural` (female, natural, **default**)
- `en-US-AriaNeural` (female, natural)
- `en-US-GuyNeural` (male, natural)
- `en-GB-SoniaNeural` (female, British)
- `en-GB-RyanNeural` (male, British)

**Chinese:**
- `zh-CN-XiaoxiaoNeural` (female)
- `zh-CN-YunyangNeural` (male, news style)
- `zh-CN-YunxiNeural` (male, natural)

**Other Languages:**
- `es-ES-ElviraNeural` (Spanish)
- `fr-FR-DeniseNeural` (French)
- `de-DE-KatjaNeural` (German)
- `ja-JP-NanamiNeural` (Japanese)
- `ar-SA-ZariyahNeural` (Arabic)
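
For picking a voice programmatically, the `edge-tts` package exposes an async `list_voices()` function that returns one dictionary per voice, with keys such as `ShortName`, `Locale`, and `Gender`. The sketch below filters that list by locale prefix. `filter_voices` and `fetch_voices` are hypothetical helpers, and this may differ from how `--lang-filter` is implemented in the bundled script.

```python
import asyncio


def filter_voices(voices, lang_prefix):
    """Keep voices whose Locale starts with lang_prefix, e.g. "zh" or "en-GB"."""
    prefix = lang_prefix.lower()
    return [v for v in voices if v["Locale"].lower().startswith(prefix)]


async def fetch_voices():
    """Download the current voice list (requires network access)."""
    import edge_tts  # imported lazily so filter_voices works without the package

    return await edge_tts.list_voices()
```

With the package installed and a network connection available, `filter_voices(asyncio.run(fetch_voices()), "zh")` would return the Chinese voices.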

### Rate Guidelines

Rate values use percentage format:
- `"+0%"`: Normal speed (default)
- `"-20%"` to `"-10%"`: Slow, clear (tutorials, stories, accessibility)
- `"+10%"` to `"+20%"`: Slightly fast (summaries)
- `"+30%"` to `"+50%"`: Fast (news, efficiency)
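
Because rate values always carry an explicit sign and a percent suffix, it can help to build and check them in code rather than by hand. The following is a small sketch: the helper names and the `RATE_PRESETS` mapping are hypothetical, with preset values chosen from the ranges listed above.

```python
import re

# A sign, one or more digits, then a percent sign, as in "+10%" or "-20%".
RATE_PATTERN = re.compile(r"^[+-]\d+%$")

# Hypothetical presets picked from the ranges in the guidelines.
RATE_PRESETS = {"normal": "+0%", "tutorial": "-10%", "summary": "+10%", "news": "+30%"}


def format_rate(percent: int) -> str:
    """Turn an integer such as 10 or -20 into a signed rate string."""
    return f"{percent:+d}%"


def is_valid_rate(value: str) -> bool:
    """Check that a string follows the signed percentage format."""
    return RATE_PATTERN.match(value) is not None
```

For example, `format_rate(-20)` produces `"-20%"`, and `is_valid_rate("10%")` is false because the sign is missing.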

## Resources

### scripts/tts_converter.py
Main TTS conversion script, built on edge-tts. Generates audio files with a customizable voice, rate, volume, and pitch. Also supports subtitle generation (VTT/SRT) and voice listing.

### scripts/config_manager.py
Manages persistent user preferences for TTS settings. Stores config in `~/.tts-config.json`.

### Voice Testing
Test different voices and preview audio quality at: https://tts.travisvn.com/

## Installation

```bash
pip install edge-tts
```
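
Once the package is installed, it can also be called directly from Python, without going through the bundled scripts. The sketch below uses the library's `Communicate` class and its async `save()` method. The wrapper name `synthesize` is hypothetical, and the `+0%` / `+0Hz` volume and pitch defaults are assumptions. Running it requires network access.

```python
import asyncio


async def synthesize(text, output_path, voice="en-US-MichelleNeural",
                     rate="+0%", volume="+0%", pitch="+0Hz"):
    """Convert text to an audio file and return the output path."""
    import edge_tts  # imported lazily; needs `pip install edge-tts`

    communicate = edge_tts.Communicate(text, voice, rate=rate, volume=volume, pitch=pitch)
    await communicate.save(output_path)
    return output_path
```

A call such as `asyncio.run(synthesize("Hello, this is a test.", "test-output.mp3"))` would write the audio next to the script.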

## Workflow

1. **Detect intent**: Check for the "tts" trigger or keyword in the user's message
2. **Choose method**: Use built-in `tts` tool for simple requests, or `scripts/tts_converter.py` for customization
3. **Generate audio**: Convert the target text
4. **Return to user**: The tts tool returns a MEDIA: path; Clawdbot handles delivery
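
The "choose method" step can be sketched as a check for custom options, followed by assembling a command line for the script. Both helpers below are hypothetical and use only flags listed under Options; the built-in `tts` tool is not modeled here.

```python
import sys


def needs_customization(**options):
    """Return True when any custom option (voice, rate, ...) was requested."""
    return any(value is not None for value in options.values())


def build_converter_command(text, voice=None, rate=None, output=None, subtitles=None):
    """Assemble an argument list for tts_converter.py, to be run from scripts/."""
    cmd = [sys.executable, "tts_converter.py", text]
    if voice:
        cmd += ["--voice", voice]
    if rate:
        # The = form keeps values such as -20% from being read as a flag.
        cmd.append(f"--rate={rate}")
    if output:
        cmd += ["--output", output]
    if subtitles:
        cmd += ["--subtitles", subtitles]
    return cmd
```

The resulting list could be passed to `subprocess.run(cmd, cwd="scripts", check=True)`, which avoids shell quoting issues with the text argument.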

## Testing

### Basic Test
```bash
cd scripts
python3 tts_converter.py "Hello, this is a test." -o test-output.mp3
```

### Chinese Test
```bash
python3 tts_converter.py "这是一个测试" -v zh-CN-XiaoxiaoNeural -o test-zh.mp3
```

### List Voices
```bash
python3 tts_converter.py --list-voices --lang-filter zh
```

### Configuration Test
```bash
python3 config_manager.py --get
python3 config_manager.py --set voice en-US-GuyNeural
python3 config_manager.py --get voice
```

## Notes

- edge-tts uses Microsoft Edge's online TTS service
- No API key needed (free service)
- Output is MP3 format by default
- Requires internet connection
- Supports subtitle generation (standard VTT/SRT format)
- **Temporary File Handling**: By default, audio files are saved to the system's temporary directory with unique filenames. Specify a custom output path with `--output` for permanent storage.
- **TTS keyword filtering**: Automatically filters out TTS-related keywords from text before conversion
- Neural voices (ending in `Neural`) provide higher quality
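
The temporary file behavior in the notes above can be approximated with the standard `tempfile` module. This is a sketch under stated assumptions: the helper name `resolve_output_path` and the `tts-` filename prefix are hypothetical, and the bundled script may name its files differently.

```python
import os
import tempfile


def resolve_output_path(output=None, suffix=".mp3"):
    """Return the requested path, or a unique file in the system temp directory."""
    if output:
        return os.path.abspath(output)
    # mkstemp creates the file atomically, so two calls never share a name.
    fd, path = tempfile.mkstemp(prefix="tts-", suffix=suffix)
    os.close(fd)
    return path
```

Files created this way are not removed automatically, so callers that do not pass an explicit output path need to clean up after themselves.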
