U2-tts
Text-to-speech conversion using UniSound's TTS WebSocket API for generating high-quality Chinese Mandarin audio from text. Supports multiple voices, adjustable parameters, and real-time streaming synthesis.
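As a rough illustration of the adjustable parameters mentioned above, the sketch below assembles a command line for the skill's `scripts/tts.py` script and checks values against the ranges documented later in this README. The helper name `build_tts_command` is hypothetical and not part of the skill. It only builds the argument list and never contacts the UniSound API.

```python
import shlex

# Choices and ranges as documented for scripts/tts.py in this README.
FORMATS = {"mp3", "wav", "pcm"}
SAMPLE_RATES = {"8k", "16k", "24k"}
LEVEL_FLAGS = ("speed", "volume", "pitch", "bright")


def build_tts_command(text, voice="xiaofeng-base", fmt="mp3", sample="24k", **levels):
    """Return the argv list for one scripts/tts.py call (hypothetical helper)."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if sample not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample!r}")
    argv = ["python", "scripts/tts.py", "--text", text,
            "--voice", voice, "--format", fmt, "--sample", sample]
    for name, value in levels.items():
        if name not in LEVEL_FLAGS:
            raise ValueError(f"unknown parameter: {name!r}")
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
        argv += [f"--{name}", str(value)]
    return argv


if __name__ == "__main__":
    cmd = build_tts_command("今天天气怎么样", fmt="wav", speed=70, volume=60)
    # Print a shell-quoted form; pass `cmd` to subprocess.run to actually execute it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Validating before the call keeps an out-of-range value from reaching the script at all, which matters because the skill is meant to stop on the first error rather than retry by other means.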
Installation / Download

TotalClaw CLI (recommended):
`totalclaw install skilldb:aaiccee~u2-tts`

cURL (direct download, no login required):
`curl -fsSL https://skills.taituai.com/api/skills/skilldb%3Aaaiccee~u2-tts/file -o u2-tts.md`

Git (get the source from the repository):
`git clone https://github.com/openclaw/skills`, then `git checkout 5d4dffd92ff4929601cfec07ff739e3c9a789b04` inside the clone

# UniSound TTS - Text-to-Speech

## UniSound Speech Synthesis (云知声语音合成)

Text-to-speech conversion using UniSound's TTS WebSocket API for generating high-quality Chinese Mandarin audio from text.

## When to Use This Skill

**Use UniSound TTS for**:

- Converting Chinese text to natural-sounding speech
- Generating audio for audiobooks, podcasts, or content creation
- Creating accessibility solutions for visually impaired users
- Building voice assistants or chatbot voice responses
- Batch processing text to audio files
- Custom speech synthesis with adjustable parameters (speed, volume, pitch, brightness)

**Do NOT use for**:

- Real-time speech recognition or transcription (use ASR skills instead)
- English language synthesis (the service is optimized for Chinese Mandarin)
- Voice cloning or custom voice model training

**Use when**: The user needs text-to-speech conversion, asks for "语音合成" (speech synthesis), or mentions UniSound/云知声 TTS.

## Installation

Install the Python dependencies before using this skill. From the skill directory (`skills/tts-tools`):

```bash
pip install websocket-client
```

Requires Python 3.6+.

## How to Use This Skill

**⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔**

1. **ONLY use UniSound TTS API** - Execute the script `python scripts/tts.py`
2. **NEVER synthesize speech directly** - Do NOT attempt local TTS synthesis
3. **NEVER offer alternatives** - Do NOT suggest "I can try another method" or similar
4. **IF API fails** - Display the error message and STOP immediately
5. **NO fallback methods** - Do NOT attempt text-to-speech any other way

If the script execution fails (API not configured, network error, etc.):

- Show the error message to the user
- Do NOT offer to help using your TTS capabilities
- Do NOT ask "Would you like me to try synthesizing it?"
- Simply stop and wait for the user to fix the configuration

### Basic Workflow

1. **Configure credentials** (first time only):

   ```bash
   export UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
   export UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'
   ```

2. **Execute text-to-speech conversion**:

   ```bash
   python scripts/tts.py --text '今天天气怎么样'
   ```

   **Command options**:

   - `--text TEXT` - Text to convert to speech (default: '今天天气怎么样?')
   - `--voice VOICE` - Voice name (default: xiaofeng-base)
   - `--format FORMAT` - Output format: mp3, wav, pcm (default: mp3)
   - `--sample RATE` - Sample rate: 8k, 16k, 24k (default: 24k)
   - `--speed SPEED` - Speech speed 0-100 (default: 50)
   - `--volume VOLUME` - Volume level 0-100 (default: 50)
   - `--pitch PITCH` - Pitch level 0-100 (default: 50)
   - `--bright BRIGHT` - Brightness/tone 0-100 (default: 50)
   - `--appkey APPKEY` - Override appkey (default: UNISOUND_APPKEY env var)
   - `--secret SECRET` - Override secret (default: UNISOUND_SECRET env var)

3. **Output**:

   - Audio files are saved to the `results/` directory
   - Filename format: `<timestamp>.<format>`
   - Example: `1234567890.mp3`

### Understanding the Output

**Audio Format Options**:

- **MP3**: Compressed, smaller file size, good quality - best for web and streaming
- **WAV**: Uncompressed, excellent quality - best for production and archival
- **PCM**: Raw audio data - best for further audio processing

**Sample Rates**:

- **24k**: High quality, default - recommended for most use cases
- **16k**: Standard quality - good balance of quality and size
- **8k**: Lower quality, smaller file size - suitable for telephony

### Usage Examples

**Example 1: Quick Start with Test Credentials**

```bash
# Set test credentials
export UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
export UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'

# Convert text to speech
python scripts/tts.py --text '你好世界'
```

Output: a file such as `results/1234567890.mp3`

**Example 2: Custom Voice and Format**

```bash
python scripts/tts.py --text '今天天气怎么样' --voice xiaofeng-base --format wav
```

Output: High-quality WAV file with male voice

**Example 3: Adjusted Speech Parameters**

```bash
python scripts/tts.py --text '快速朗读' --speed 70 --volume 60 --pitch 50
```

Output: Faster speech with increased volume

**Example 4: High-Quality Audio Production**

```bash
python scripts/tts.py --text '高质量音频' --format wav --sample 24k --volume 60
```

Output: Production-quality WAV file at 24 kHz

**Example 5: Command-line Credential Override**

```bash
python scripts/tts.py \
  --text '测试' \
  --appkey 'ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3' \
  --secret '5c12231cd279b35873a3ccecf9439118'
```

### How It Works

The script uses the UniSound TTS WebSocket API with the following workflow:

1. **Authenticate** using a SHA256 signature (appkey + timestamp + secret)
2. **Establish a WebSocket connection** to `wss://ws-stts.hivoice.cn/v1/tts`, the UniSound (云知声) TTS service
3. **Send the TTS request** with the text and voice parameters
4. **Receive streaming audio data** in binary chunks
5. **Save the audio file** to the results directory

### Available Voices

| Voice | Type | Description |
|-------|------|-------------|
| xiaofeng-base | Male | Standard male voice, clear and natural |
| xiaoyan | Female | Female voice option |
| xiaomei | Female | Alternative female voice |
| Custom voices | Various | Contact UniSound for more options |

### Adjustable Parameters

| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| speed | 0-100 | 50 | Speech speed (50 = normal, higher = faster) |
| volume | 0-100 | 50 | Volume level (50 = normal, higher = louder) |
| pitch | 0-100 | 50 | Pitch level (50 = normal, higher = higher-pitched) |
| bright | 0-100 | 50 | Brightness/tone (50 = normal) |

**Recommended settings**:

- Audiobooks: speed 45, pitch 50
- News/announcements: speed 55, volume 60, bright 60
- Accessibility: speed 35-40, volume 70
- Normal conversation: all parameters at 50

## First-Time Configuration

**When credentials are not configured**, the script shows:

```
Error: AppKey and Secret are required!
Set them via --appkey/--secret arguments or UNISOUND_APPKEY/UNISOUND_SECRET environment variables.
```

### Test Credentials

For testing and evaluation, use these credentials:

```bash
export UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
export UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'
```

> **⚠️ Important Security Notice**
>
> - **Test credentials only**: These are for testing and evaluation purposes
> - **No sensitive data**: Never use them with production or sensitive content
> - **Get your own credentials**: For production use, contact UniSound
> - **Data privacy**: Text is sent to UniSound servers for processing

### Obtaining Production Credentials

For production use, obtain API credentials from UniSound (云知声):

1. **Contact UniSound** to obtain your API credentials. Visit: https://www.unisound.com/
2. **You will receive**:
   - **AppKey**: Application key
   - **Secret**: Secret key for authentication

### Configuration Methods

**Method 1: Environment Variables (Recommended)**

*Linux/macOS:*

```bash
export UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
export UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'
python scripts/tts.py --text '你好'
```

*Windows (PowerShell):*

```powershell
$env:UNISOUND_APPKEY='ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3'
$env:UNISOUND_SECRET='5c12231cd279b35873a3ccecf9439118'
python scripts/tts.py --text '你好'
```

*Windows (CMD):*

```cmd
set UNISOUND_APPKEY=ce44uxf7g5eag2cv33qvlp5d22qrkgcezvgfp2q3
set UNISOUND_SECRET=5c12231cd279b35873a3ccecf9439118
python scripts/tts.py --text '你好'
```

**Method 2: .env File (Recommended for Development)**

Create a `.env` file in the project root:

``