# VibeVoice TTS
Local text-to-speech using Microsoft's VibeVoice model. Generates natural Spanish voice audio from text, optimized for WhatsApp voice messages.
## Installation
- TotalClaw CLI (recommended): `totalclaw install totalclaw:totalclaw~javier887-vibevoice`
- Direct download with cURL (no login required): `curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~javier887-vibevoice/file -o javier887-vibevoice.md`
## Quick Start
```bash
# Basic usage
{baseDir}/scripts/vv.sh "Hola, esto es una prueba" -o /tmp/audio.ogg
# From file
{baseDir}/scripts/vv.sh -f texto.txt -o /tmp/audio.ogg
# Different voice
{baseDir}/scripts/vv.sh "Texto" -v en-Wayne -o /tmp/audio.ogg
# Adjust speed (0.5-2.0)
{baseDir}/scripts/vv.sh "Texto" -s 1.2 -o /tmp/audio.ogg
```
## Configuration
| Setting | Default | Description |
|---------|---------|-------------|
| Voice | `sp-Spk1_man` | Spanish male voice (slight Mexican accent) |
| Speed | `1.15` | 15% faster than normal |
| Format | `.ogg` | Opus codec for WhatsApp |
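
The defaults above, together with the 0.5-2.0 speed range from the Quick Start, can be checked before calling the script. The following is a minimal sketch, not part of the skill itself: the variable names `VV_VOICE` and `VV_SPEED` and the helper `speed_in_range` are hypothetical, and how `vv.sh` validates its own arguments is not documented here.

```bash
# Hypothetical variables holding the documented defaults.
VV_VOICE="sp-Spk1_man"
VV_SPEED="1.15"

# speed_in_range SPEED: succeeds when SPEED is a plain decimal number
# within the documented 0.5-2.0 range.
speed_in_range() {
  awk -v s="$1" 'BEGIN {
    ok = (s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.5 && s + 0 <= 2.0)
    exit !ok
  }'
}

# Usage sketch (flags -v and -s as shown in the Quick Start):
# speed_in_range "$VV_SPEED" &&
#   {baseDir}/scripts/vv.sh "Texto" -v "$VV_VOICE" -s "$VV_SPEED" -o /tmp/audio.ogg
```

Rejecting an out-of-range value up front avoids loading the model only to fail on a bad argument.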
## Available Voices
Spanish:
- `sp-Spk1_man` - Male, slight Mexican accent (default)
English:
- `en-Wayne` - Male
- `en-Denise` - Female
Other:
- Additional voices are available in `~/VibeVoice/demo/voices/streaming_model/`
## Output Formats
- `.ogg` - Opus codec (WhatsApp compatible, recommended)
- `.mp3` - MP3 format
- `.wav` - Uncompressed WAV
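
Each of these extensions corresponds to a different audio encoder. How `vv.sh` performs the conversion internally is not shown in this document, so the sketch below is an assumption: it maps an output path to a standard ffmpeg encoder name (`libopus`, `libmp3lame`, `pcm_s16le`) using a hypothetical helper called `codec_for_output`.

```bash
# codec_for_output PATH: print an ffmpeg audio encoder name chosen from
# PATH's extension; fail for extensions outside the documented formats.
codec_for_output() {
  case "$1" in
    *.ogg) echo "libopus" ;;      # Opus in an Ogg container
    *.mp3) echo "libmp3lame" ;;   # MP3
    *.wav) echo "pcm_s16le" ;;    # uncompressed 16-bit PCM
    *) echo "unsupported output format: $1" >&2; return 1 ;;
  esac
}

# Usage sketch, converting a hypothetical intermediate file /tmp/raw.wav:
# ffmpeg -y -i /tmp/raw.wav -c:a "$(codec_for_output /tmp/audio.ogg)" /tmp/audio.ogg
```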
## For WhatsApp
Always use `.ogg` format with `asVoice=true` in the message tool:
```bash
# Generate
{baseDir}/scripts/vv.sh "Tu mensaje aquí" -o /tmp/mensaje.ogg
# Send via message tool
message action=send channel=whatsapp to="+34XXXXXXXXX" filePath=/tmp/mensaje.ogg asVoice=true
```
## Requirements
- **GPU**: NVIDIA with ~2GB VRAM
- **VibeVoice**: Installed at `~/VibeVoice`
- **ffmpeg**: For audio conversion
- **Python 3.10+**: With torch, torchaudio
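
A quick preflight check can confirm most of these requirements before the first run. This is a sketch under stated assumptions: the helper names `have_cmd` and `check_requirements` are hypothetical, the interpreter is assumed to be available as `python3`, and the presence of `nvidia-smi` is used as a stand-in for an NVIDIA GPU (free VRAM is not checked).

```bash
# have_cmd NAME: succeeds when NAME is found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# check_requirements: report each missing requirement on stderr and
# return non-zero if anything is missing.
check_requirements() {
  missing=0
  for cmd in nvidia-smi ffmpeg python3; do
    have_cmd "$cmd" || { echo "missing command: $cmd" >&2; missing=1; }
  done
  [ -d "$HOME/VibeVoice" ] ||
    { echo "missing directory: $HOME/VibeVoice" >&2; missing=1; }
  if have_cmd python3; then
    # Exits non-zero if torch/torchaudio fail to import or Python is older than 3.10.
    python3 -c 'import sys, torch, torchaudio; sys.exit(sys.version_info < (3, 10))' 2>/dev/null ||
      { echo "need Python 3.10+ with torch and torchaudio" >&2; missing=1; }
  fi
  return "$missing"
}

# Usage sketch:
# check_requirements && {baseDir}/scripts/vv.sh "Hola" -o /tmp/audio.ogg
```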
## Performance
- RTF (real-time factor): ~0.24x, so audio is generated faster than real time
- 1 minute of audio takes about 15 seconds to generate
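
The two figures are related: the real-time factor is generation time divided by audio duration, so generation time is roughly duration multiplied by RTF (60 s × 0.24 ≈ 14.4 s, in line with the "about 15 seconds" above). A small sketch of that arithmetic, using a hypothetical helper named `estimate_gen_seconds`:

```bash
# estimate_gen_seconds AUDIO_SECONDS [RTF]: print the estimated generation
# time in seconds; RTF defaults to the ~0.24 figure quoted above.
estimate_gen_seconds() {
  awk -v d="$1" -v rtf="${2:-0.24}" 'BEGIN { printf "%.1f\n", d * rtf }'
}

# estimate_gen_seconds 60    # prints 14.4
```

The estimate excludes the one-time model load, and the actual RTF will vary with the GPU.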
## Notes
- The first run loads the model (~10 s); subsequent runs are faster
- Audio rule: send a voice message only if the user requests one or is communicating by audio
- Keep text under 1500 characters for best quality
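
For longer inputs, one way to respect the 1500-character guideline is to split the text into chunks and generate one audio file per chunk. This is a rough sketch, not something the skill provides: `split_text` is a hypothetical helper built on the standard `tr` and `fold` utilities. Depending on the `fold` implementation, the width may be counted in bytes instead of characters, in which case chunks of accented text simply come out somewhat shorter than the limit.

```bash
# split_text [MAX]: read text on stdin and print it as lines of at most
# MAX characters (default 1500), breaking at spaces where possible.
split_text() {
  { tr '\n' ' '; echo; } | fold -s -w "${1:-1500}"
}

# Usage sketch: one audio file per chunk (output names are hypothetical).
# i=0
# split_text 1500 < texto.txt | while IFS= read -r chunk; do
#   i=$((i + 1))
#   {baseDir}/scripts/vv.sh "$chunk" -o "/tmp/audio_$i.ogg"
# done
```

Breaking at spaces keeps words intact, though it does not guarantee that chunks end on sentence boundaries.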