U2-tts

SkillDB · Author: aaiccee · v1.0.5

Text-to-speech conversion using UniSound's TTS WebSocket API for generating high-quality Chinese Mandarin audio from text. Supports multiple voices, adjustable parameters, and real-time streaming synthesis.


Install / Download

TotalClaw CLI (recommended)
totalclaw install skilldb:aaiccee~u2-tts
cURL (direct download, no login required)
curl -fsSL https://skills.taituai.com/api/skills/skilldb%3Aaaiccee~u2-tts/file -o u2-tts.md
Git repository (source code, at commit 5d4dffd92ff4929601cfec07ff739e3c9a789b04)
git clone https://github.com/openclaw/skills

# UniSound TTS - Text-to-Speech
## UniSound (云知声) Speech Synthesis

Converts text to speech through UniSound's TTS WebSocket API, producing high-quality Mandarin Chinese audio.

## When to Use This Skill

**Use UniSound TTS for**:
- Converting Chinese text to natural-sounding speech
- Generating audio for audiobooks, podcasts, or content creation
- Creating accessibility solutions for visually impaired users
- Building voice assistants or chatbot voice responses
- Batch processing text to audio files
- Custom speech synthesis with adjustable parameters (speed, volume, pitch, brightness)

**Do NOT use for**:
- Real-time speech recognition or transcription (use ASR skills instead)
- English language synthesis (optimized for Chinese Mandarin)
- Voice cloning or custom voice model training

**Use when**: The user needs text-to-speech conversion, asks for "语音合成" (speech synthesis), or mentions UniSound/云知声 TTS.

## Installation

Install Python dependencies before using this skill. From the skill directory (`skills/tts-tools`):

```bash
pip install websocket-client
```

Requires Python 3.6+.

## How to Use This Skill

**⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔**

1. **ONLY use UniSound TTS API** - Execute the script `python scripts/tts.py`
2. **NEVER synthesize speech directly** - Do NOT attempt local TTS synthesis
3. **NEVER offer alternatives** - Do NOT suggest "I can try another method" or similar
4. **IF API fails** - Display the error message and STOP immediately
5. **NO fallback methods** - Do NOT attempt text-to-speech any other way

If the script execution fails (API not configured, network error, etc.):
- Show the error message to the user
- Do NOT offer to help using your TTS capabilities
- Do NOT ask "Would you like me to try synthesizing it?"
- Simply stop and wait for user to fix the configuration
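
The stop-on-failure rule above can be sketched as a thin wrapper around the script call. This is a minimal illustration, not part of the skill: `run_tts` is a hypothetical helper, and it assumes the script signals failure through a non-zero exit code and writes its error message to stderr or stdout.

```python
import subprocess
import sys


def run_tts(cmd):
    """Run the TTS command exactly once and report the outcome.

    Returns (ok, message). On failure the message is the script's own
    error text; there is deliberately no retry and no fallback path.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    if result.returncode != 0:
        # Surface the error verbatim and stop; the user fixes the configuration.
        return False, result.stderr.strip() or result.stdout.strip()
    return True, result.stdout.strip()


if __name__ == "__main__":
    ok, message = run_tts([sys.executable, "scripts/tts.py", "--text", "你好"])
    print(message)
    sys.exit(0 if ok else 1)
```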

### Basic Workflow

1. **Configure credentials** (first time only):
   ```bash
   export UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
   export UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'
   ```

2. **Execute text-to-speech conversion**:
   ```bash
   python scripts/tts.py --text '今天天气怎么样'
   ```

   **Command options**:
   - `--text TEXT` - Text to convert to speech (default: '今天天气怎么样?')
   - `--voice VOICE` - Voice name (default: xiaofeng-base)
   - `--format FORMAT` - Output format: mp3, wav, pcm (default: mp3)
   - `--sample RATE` - Sample rate: 8k, 16k, 24k (default: 24k)
   - `--speed SPEED` - Speech speed 0-100 (default: 50)
   - `--volume VOLUME` - Volume level 0-100 (default: 50)
   - `--pitch PITCH` - Pitch level 0-100 (default: 50)
   - `--bright BRIGHT` - Brightness/tone 0-100 (default: 50)
   - `--appkey APPKEY` - Override appkey (default: UNISOUND_APPKEY env var)
   - `--secret SECRET` - Override secret (default: UNISOUND_SECRET env var)

3. **Output**:
   - Audio files are saved to `results/` directory
   - Filename format: `<timestamp>.<format>`
   - Example: `1234567890.mp3`
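
The naming scheme above can be reproduced in a few lines, for example to predict where a file will land. This is a sketch under assumptions: the 10-digit example suggests a Unix timestamp in seconds, but the script's actual resolution is not stated, and `output_path` is a hypothetical helper.

```python
import time
from pathlib import Path


def output_path(fmt="mp3", results_dir="results", now=None):
    """Build `<results_dir>/<timestamp>.<format>`.

    Assumption: the timestamp is whole seconds since the Unix epoch.
    """
    timestamp = int(time.time() if now is None else now)
    return Path(results_dir) / "{}.{}".format(timestamp, fmt)


if __name__ == "__main__":
    print(output_path("wav"))
```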

### Understanding the Output

**Audio Format Options**:
- **MP3**: Compressed, smaller file size, good quality - best for web and streaming
- **WAV**: Uncompressed, excellent quality - best for production and archival
- **PCM**: Raw audio data - best for further audio processing

**Sample Rates**:
- **24k**: High quality, default - recommended for most use cases
- **16k**: Standard quality - good balance of quality and size
- **8k**: Lower quality, smaller file size - suitable for telephony

### Usage Examples

**Example 1: Quick Start with Test Credentials**
```bash
# Set test credentials
export UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
export UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'

# Convert text to speech
python scripts/tts.py --text '你好世界'
```
Output: `results/1234567890.mp3`

**Example 2: Custom Voice and Format**
```bash
python scripts/tts.py --text '今天天气怎么样' --voice xiaofeng-base --format wav
```
Output: High-quality WAV file with male voice

**Example 3: Adjusted Speech Parameters**
```bash
python scripts/tts.py --text '快速朗读' --speed 70 --volume 60 --pitch 50
```
Output: Faster speech with increased volume

**Example 4: High-Quality Audio Production**
```bash
python scripts/tts.py --text '高质量音频' --format wav --sample 24k --volume 60
```
Output: Production-quality WAV file at 24kHz

**Example 5: Command-line Credential Override**
```bash
python scripts/tts.py \
  --text '测试' \
  --appkey 'ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3' \
  --secret '5c12231cd279b35873a3ccecf9439118'
```

### How It Works

The script uses the UniSound TTS WebSocket API with the following workflow:

1. **Authenticate** using a SHA256 signature (appkey + timestamp + secret)
2. **Establish a WebSocket connection** to `wss://ws-stts.hivoice.cn/v1/tts`
3. **Send the TTS request** with the text and voice parameters
4. **Receive streaming audio data** in binary chunks
5. **Save the audio file** to the results directory
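
The authentication step can be sketched as follows. Only the SHA256 algorithm, the `appkey + timestamp + secret` input, and the endpoint come from the workflow above. The uppercase hex digest, the millisecond timestamp, and the query parameter names `appkey`, `time`, and `sign` are assumptions; consult `scripts/tts.py` for the form the service actually expects.

```python
import hashlib
import time
from urllib.parse import urlencode

TTS_ENDPOINT = "wss://ws-stts.hivoice.cn/v1/tts"


def make_signature(appkey, timestamp, secret):
    """SHA256 over appkey + timestamp + secret, concatenated in that order.

    Assumption: UTF-8 input and an uppercase hex digest.
    """
    raw = (appkey + timestamp + secret).encode("utf-8")
    return hashlib.sha256(raw).hexdigest().upper()


def build_auth_url(appkey, secret, timestamp=None):
    """Return the WebSocket URL with signed query parameters.

    Assumption: the parameter names below are hypothetical placeholders.
    """
    ts = timestamp or str(int(time.time() * 1000))  # assumption: milliseconds
    query = urlencode({
        "appkey": appkey,
        "time": ts,
        "sign": make_signature(appkey, ts, secret),
    })
    return "{}?{}".format(TTS_ENDPOINT, query)
```

The resulting URL would then be passed to a WebSocket client such as `websocket-client`, which the installation step already requires.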

### Available Voices

| Voice | Type | Description |
|-------|------|-------------|
| xiaofeng-base | Male | Standard male voice, clear and natural |
| xiaoyan | Female | Female voice options |
| xiaomei | Female | Alternative female voice |
| Custom voices | Various | Contact UniSound for more options |

### Adjustable Parameters

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| speed | 0-100 | 50 | Speech speed (50 = normal, higher = faster) |
| volume | 0-100 | 50 | Volume level (50 = normal, higher = louder) |
| pitch | 0-100 | 50 | Pitch level (50 = normal, higher = higher) |
| bright | 0-100 | 50 | Brightness/tone (50 = normal) |

**Recommended settings**:
- Audiobooks: speed 45, pitch 50
- News/announcements: speed 55, volume 60, bright 60
- Accessibility: speed 35-40, volume 70
- Normal conversation: leave all parameters at the default of 50
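
The recommended settings can be captured as presets and turned into a command line. This is a minimal sketch: `PRESETS` and `build_command` are hypothetical names, the accessibility speed of 40 is one value picked from the suggested 35-40 range, and the range check simply mirrors the 0-100 limits in the table above.

```python
import sys

PRESETS = {
    "audiobook": {"speed": 45, "pitch": 50},
    "news": {"speed": 55, "volume": 60, "bright": 60},
    "accessibility": {"speed": 40, "volume": 70},  # 40 chosen from the 35-40 range
    "conversation": {"speed": 50, "volume": 50, "pitch": 50, "bright": 50},
}


def build_command(text, preset="conversation", **overrides):
    """Return the argv list for scripts/tts.py using a preset plus overrides."""
    params = dict(PRESETS[preset])
    params.update(overrides)
    for name, value in params.items():
        if not 0 <= value <= 100:
            raise ValueError("{} must be in 0-100, got {}".format(name, value))
    cmd = [sys.executable, "scripts/tts.py", "--text", text]
    for name, value in params.items():
        cmd += ["--{}".format(name), str(value)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command("今天天气怎么样", "news")))
```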

## First-Time Configuration

**When credentials are not configured**:

The script will show:
```
Error: AppKey and Secret are required!
Set them via --appkey/--secret arguments or UNISOUND_APPKEY/UNISOUND_SECRET environment variables.
```
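
The precedence implied by that message (command-line arguments first, environment variables second) could look like the sketch below. `resolve_credentials` is a hypothetical helper written for illustration; the script's own implementation is not shown in this document.

```python
import os


def resolve_credentials(appkey=None, secret=None, env=None):
    """Pick credentials from explicit arguments, falling back to the environment."""
    env = os.environ if env is None else env
    appkey = appkey or env.get("UNISOUND_APPKEY")
    secret = secret or env.get("UNISOUND_SECRET")
    if not appkey or not secret:
        # Same wording as the error the script reports.
        raise ValueError("Error: AppKey and Secret are required!")
    return appkey, secret
```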

### Test Credentials

For testing and evaluation, use these credentials:

```bash
export UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
export UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'
```

> **⚠️ Important Security Notice**
>
> - **Test credentials only**: these are for testing and evaluation purposes.
> - **No sensitive data**: never use them with production or sensitive content.
> - **Get your own credentials**: for production use, contact UniSound.
> - **Data privacy**: text is sent to UniSound servers for processing.

### Obtaining Production Credentials

For production use, obtain API credentials from UniSound (云知声):

1. **Contact UniSound** to obtain your API credentials.
   Visit: https://www.unisound.com/

2. **You will receive**:
   - **AppKey**: Application key
   - **Secret**: Secret key for authentication

### Configuration Methods

**Method 1: Environment Variables (Recommended)**

*Linux/macOS:*
```bash
export UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
export UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'
python scripts/tts.py --text '你好'
```

*Windows (PowerShell):*
```powershell
$env:UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
$env:UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'
python scripts/tts.py --text '你好'
```

*Windows (CMD):*
```cmd
set UNISOUND_APPKEY=ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3
set UNISOUND_SECRET=5c12231cd279b35873a3ccecf9439118
python scripts/tts.py --text '你好'
```

**Method 2: .env File (Recommended for Development)**

Create a `.env` file in the project root:
``