vibevoice

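TotalClaw, by estudiosdurero

Local Spanish TTS using Microsoft VibeVoice. Generates natural-sounding speech audio from text, optimized for WhatsApp voice messages.

Install / Download

TotalClaw CLI (recommended)

totalclaw install totalclaw:totalclaw~javier887-vibevoice

cURL (direct download, no login required)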

curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~javier887-vibevoice/file -o javier887-vibevoice.md

# VibeVoice TTS

Local text-to-speech using Microsoft's VibeVoice model. Generates natural Spanish voice audio, perfect for WhatsApp voice messages.

## Quick Start

```bash
# Basic usage
{baseDir}/scripts/vv.sh "Hola, esto es una prueba" -o /tmp/audio.ogg

# From file
{baseDir}/scripts/vv.sh -f texto.txt -o /tmp/audio.ogg

# Different voice
{baseDir}/scripts/vv.sh "Texto" -v en-Wayne -o /tmp/audio.ogg

# Adjust speed (0.5-2.0)
{baseDir}/scripts/vv.sh "Texto" -s 1.2 -o /tmp/audio.ogg
```

## Configuration

| Setting | Default | Description |
|---------|---------|-------------|
| Voice | `sp-Spk1_man` | Spanish male voice (slight Mexican accent) |
| Speed | `1.15` | 15% faster than normal |
| Format | `.ogg` | Opus codec for WhatsApp |
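
The speed setting accepts values from 0.5 to 2.0. Whether `vv.sh` validates the value itself is not stated here, so a caller may want to check it first. A minimal sketch, where `valid_speed` and the `VV_SPEED` environment variable are hypothetical names introduced for illustration:

```bash
# Hypothetical helper: succeed only if the argument is a number in the 0.5-2.0 range.
valid_speed() {
    # awk does the floating-point comparison; its exit status is the result.
    awk -v s="$1" 'BEGIN { exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s >= 0.5 && s <= 2.0) }'
}

# Fall back to the documented default when the requested value is out of range.
speed="${VV_SPEED:-1.15}"   # VV_SPEED is an assumed variable, not part of vv.sh
valid_speed "$speed" || speed=1.15
echo "Using speed: $speed"
```

The resulting value can then be passed with `-s "$speed"`.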

## Available Voices

Spanish:
- `sp-Spk1_man` - Male, slight Mexican accent (default)

English:
- `en-Wayne` - Male
- `en-Denise` - Female
- Other voices in `~/VibeVoice/demo/voices/streaming_model/`
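
To see which other voices are installed, the directory above can be listed. The sketch below assumes each voice is a single file and that the name passed to `-v` is the file name without its extension; neither is confirmed by this document. `list_voices` is a hypothetical helper.

```bash
# List candidate voice names found in a directory, one per line.
# Assumption: voice name = file name without its final extension.
list_voices() {
    dir="${1:-$HOME/VibeVoice/demo/voices/streaming_model}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
        name="${f##*/}"           # strip the directory part
        echo "${name%.*}"         # strip the final extension, if any
    done
}

# Usage: list_voices            (default directory)
#        list_voices /some/dir  (explicit directory)
```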

## Output Formats

- `.ogg` - Opus codec (WhatsApp compatible, recommended)
- `.mp3` - MP3 format
- `.wav` - Uncompressed WAV
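
How `vv.sh` invokes ffmpeg is not shown in this document. As an illustration only, the mapping from output extension to encoder could look like the sketch below, which assumes the model first produces a WAV file that ffmpeg then converts. `codec_args_for` is a hypothetical helper; `libopus`, `libmp3lame`, and `pcm_s16le` are standard ffmpeg encoder names.

```bash
# Map an output file name to ffmpeg audio-encoder arguments.
codec_args_for() {
    case "$1" in
        *.ogg) echo "-c:a libopus" ;;      # Opus in an Ogg container
        *.mp3) echo "-c:a libmp3lame" ;;   # MP3
        *.wav) echo "-c:a pcm_s16le" ;;    # uncompressed 16-bit PCM
        *)     echo "unsupported output format: $1" >&2; return 1 ;;
    esac
}

# Usage (not run here): convert an intermediate WAV file to the requested format.
#   ffmpeg -y -i /tmp/raw.wav $(codec_args_for /tmp/audio.ogg) /tmp/audio.ogg
```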

## For WhatsApp

Always use `.ogg` format with `asVoice=true` in the message tool:

```bash
# Generate
{baseDir}/scripts/vv.sh "Tu mensaje aquí" -o /tmp/mensaje.ogg

# Send via message tool
message action=send channel=whatsapp to="+34XXXXXXXXX" filePath=/tmp/mensaje.ogg asVoice=true
```

## Requirements

- **GPU**: NVIDIA with ~2GB VRAM
- **VibeVoice**: Installed at `~/VibeVoice`
- **ffmpeg**: For audio conversion
- **Python 3.10+**: With torch, torchaudio

## Performance

- RTF (real-time factor): ~0.24 (generation is roughly 4x faster than real time)
- 1 minute of audio ≈ 15 seconds to generate
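
The estimate follows directly from the real-time factor: generation time ≈ audio duration × RTF. A small sketch of that arithmetic, where `estimate_gen_seconds` is a hypothetical helper and 0.24 is the approximate figure quoted above (actual speed will vary by GPU):

```bash
# Estimate generation time in seconds for a given audio duration in seconds.
# The second argument overrides the assumed RTF of 0.24.
estimate_gen_seconds() {
    LC_ALL=C awk -v d="$1" -v rtf="${2:-0.24}" 'BEGIN { printf "%.1f\n", d * rtf }'
}

estimate_gen_seconds 60    # prints 14.4, i.e. about 15 seconds for 1 minute of audio
```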

## Notes

- The first run loads the model (~10s); subsequent runs are faster
- Audio rule: only send a voice message if the user asks for one or is communicating by voice
- Keep text under 1500 chars for best quality
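
For longer texts, one option is to split the input into chunks below the 1500-character limit and generate one audio file per chunk. This is a sketch of that idea rather than a feature of the skill: `split_text` is a hypothetical helper built on the standard `fold` utility, and the output file naming is an assumption.

```bash
# Split text from stdin into lines of at most 1500 columns, breaking at spaces.
# Some fold implementations count bytes rather than characters; for accented
# Spanish text that only makes the chunks shorter, never longer than the limit.
split_text() {
    fold -s -w "${1:-1500}"
}

# Usage (not run here): one audio file per chunk.
#   i=0
#   split_text < texto.txt | while IFS= read -r chunk; do
#       [ -n "$chunk" ] || continue   # skip empty lines
#       i=$((i + 1))
#       {baseDir}/scripts/vv.sh "$chunk" -o "/tmp/audio_$i.ogg"
#   done
```

Breaking at spaces keeps words intact, but a chunk may still end mid-sentence, which can affect intonation at the boundary.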
