clonev
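Clone any voice and generate speech with Coqui XTTS v2. Very simple: provide a voice sample (6-30 second WAV) and text, and get back audio in the cloned voice. Supports 14+ languages. Use when the user wants to (1) clone their own or someone else's voice, (2) generate speech that sounds like a specific person, (3) create personalized voice messages, or (4) do multilingual voice cloning (speak any supported language in the cloned voice).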
Installation / Download
TotalClaw CLI (recommended):
`totalclaw install totalclaw:totalclaw~instant-picture-clonev`
cURL direct download, no login required:
`curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~instant-picture-clonev/file -o instant-picture-clonev.md`
# CloneV Skill - Voice Cloning Made Simple
## ⚠️ CRITICAL INSTRUCTIONS FOR AI MODELS
**DO NOT** try to use Docker containers directly.
**DO NOT** try to interact with the `coqui-xtts` container - it is broken and keeps restarting.
**DO NOT** try to use APIs or servers.
**ONLY USE THE SCRIPT:** `scripts/clonev.sh`
The script handles everything automatically. Just call it with text, voice sample, and language.
---
## What This Skill Does
Clones any voice from a short audio sample and generates new speech in that voice.
**Input**:
- Text to speak
- Voice sample (WAV file, 6-30 seconds)
- Language code
**Output**: OGG voice file (cloned voice speaking the text)
**Works with**: Any voice! Yours, a celebrity, a character, etc.
---
## The ONE Command You Need
```bash
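# Capture the script's output (the path to the generated file) in a variable.
# A bare $(...) on its own line would try to execute that path as a command.
VOICE_FILE=$(scripts/clonev.sh "Your text here" /path/to/voice_sample.wav language)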
```
That's it! Nothing else needed.
---
## Step-by-Step Usage (FOR AI MODELS)
### Step 1: Get the required inputs
- Text to speak (from user)
- Path to voice sample WAV file (from user)
- Language code (from user or default to `en`)
### Step 2: Run the script
```bash
VOICE_FILE=$(scripts/clonev.sh "TEXT_HERE" "/path/to/sample.wav" LANGUAGE)
```
### Step 3: Use the output
The variable `$VOICE_FILE` now contains the path to the generated OGG file.
---
## Complete Working Examples
### Example 1: Clone voice and send to Telegram
```bash
# Generate cloned voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Hello, this is my cloned voice!" "/mnt/c/TEMP/Recording 25.wav" en)
# Send to Telegram (as voice message)
message action=send channel=telegram asVoice=true filePath="$VOICE"
```
### Example 2: Clone voice in Czech
```bash
# Generate Czech voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj, tohle je můj hlas" "/mnt/c/TEMP/Recording 25.wav" cs)
# Send
message action=send channel=telegram asVoice=true filePath="$VOICE"
```
### Example 3: Full workflow with check
```bash
#!/bin/bash
# Generate voice
VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Task completed!" "/path/to/sample.wav" en)
# Verify file was created
if [ -f "$VOICE" ]; then
    echo "Success! Voice file: $VOICE"
    ls -lh "$VOICE"
else
    echo "Error: Voice file not created"
fi
```
---
## Common Language Codes
| Code | Language | Example Usage |
|------|----------|---------------|
| `en` | English | `scripts/clonev.sh "Hello" sample.wav en` |
| `cs` | Czech | `scripts/clonev.sh "Ahoj" sample.wav cs` |
| `de` | German | `scripts/clonev.sh "Hallo" sample.wav de` |
| `fr` | French | `scripts/clonev.sh "Bonjour" sample.wav fr` |
| `es` | Spanish | `scripts/clonev.sh "Hola" sample.wav es` |
Full list: en, cs, de, fr, es, it, pl, pt, tr, ru, nl, ar, zh, ja, hu, ko
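A caller may want to reject an unknown language code before spending time on generation. The following is a minimal sketch, not part of `clonev.sh`: the helper names `is_supported_lang` and `resolve_lang` are hypothetical, and the code list is copied from the "Full list" line above. Whether the script itself validates codes is not stated here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not part of clonev.sh.
# Codes copied from the "Full list" line in this document.
SUPPORTED_LANGS="en cs de fr es it pl pt tr ru nl ar zh ja hu ko"

# Succeeds (exit 0) when the given code is in the documented list.
is_supported_lang() {
    local code="$1" lang
    for lang in $SUPPORTED_LANGS; do
        if [ "$lang" = "$code" ]; then
            return 0
        fi
    done
    return 1
}

# Prints the code to use: the documented default `en` when none is given,
# the code itself when it is supported, and an error otherwise.
resolve_lang() {
    local code="${1:-en}"
    if is_supported_lang "$code"; then
        printf '%s\n' "$code"
    else
        echo "Unsupported language code: $code" >&2
        return 1
    fi
}
```

For example, `LANG_CODE=$(resolve_lang "$USER_LANG") || exit 1` could run before the call to `scripts/clonev.sh`, where `USER_LANG` is whatever the user supplied.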
---
## Voice Sample Requirements
- **Format**: WAV file
- **Length**: 6-30 seconds (optimal: 10-15 seconds)
- **Quality**: Clear audio, no background noise
- **Content**: Any speech (the actual words don't matter)
**Good samples**:
- ✅ Recording of someone speaking clearly
- ✅ No music or noise in background
- ✅ Consistent volume
**Bad samples**:
- ❌ Music or songs
- ❌ Heavy background noise
- ❌ Very short (< 6 seconds)
- ❌ Very long (> 30 seconds)
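The length requirement can be checked before calling the script. The sketch below assumes `ffprobe` (from FFmpeg) is installed for reading the duration. Both function names are hypothetical, and `clonev.sh` may or may not perform a similar check itself.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of clonev.sh.

# Succeeds when a duration in seconds lies within the documented 6-30 s window.
duration_ok() {
    awk -v d="$1" 'BEGIN { exit !((d + 0) >= 6 && (d + 0) <= 30) }'
}

# Prints the duration of an audio file in seconds.
# Assumption: ffprobe (FFmpeg) is available on PATH.
sample_duration() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}
```

A possible use is `duration_ok "$(sample_duration /path/to/sample.wav)" || echo "Sample should be 6-30 seconds long" >&2`.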
---
## ⚠️ Important Notes
### Model Download
- First use downloads ~1.87GB model (one-time)
- Model is stored at: `/mnt/c/TEMP/Docker-containers/coqui-tts/models-xtts/`
- Status: ✅ Already downloaded
### Processing Time
- Takes 20-40 seconds depending on text length
- This is normal - voice cloning is computationally intensive
---
## Troubleshooting
### "Command not found"
Make sure you're in the skill directory, or use the full path to the script:
```bash
/home/bernie/clawd/skills/clonev/scripts/clonev.sh "text" sample.wav en
```
### "Voice sample not found"
- Check the path to the WAV file
- Use absolute paths (starting with `/`)
- Ensure file exists: `ls -la /path/to/sample.wav`
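The three checks above can be bundled into one function that runs before the script is called. This is only a sketch: `check_sample` is a hypothetical name, and the `.wav` extension test is an added assumption based on the format requirement, not something the script is known to enforce.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of clonev.sh.
# Verifies that a voice sample path is absolute, exists, and ends in .wav.
check_sample() {
    local path="$1"
    case "$path" in
        /*) ;;  # absolute path, continue
        *) echo "Not an absolute path: $path" >&2; return 1 ;;
    esac
    if [ ! -f "$path" ]; then
        echo "Voice sample not found: $path" >&2
        return 1
    fi
    case "$path" in
        *.wav|*.WAV) ;;  # extension looks right
        *) echo "Not a .wav file: $path" >&2; return 1 ;;
    esac
    return 0
}
```

Quoting the argument matters for paths that contain spaces, such as `check_sample "/mnt/c/TEMP/Recording 25.wav"`.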
### "Model not found"
The model should auto-download. If not:
```bash
cd /mnt/c/TEMP/Docker-containers/coqui-tts
# Quote the volume argument so a working directory containing spaces still mounts correctly
docker run --rm --entrypoint "" \
  -v "$(pwd)/models-xtts:/root/.local/share/tts" \
  ghcr.io/coqui-ai/tts:latest \
  python3 -c "from TTS.api import TTS; TTS('tts_models/multilingual/multi-dataset/xtts_v2')"
```
### Poor voice quality
- Use a clearer voice sample
- Make sure the sample has no background noise
- Try a different sample (some voices clone better than others)
---
## Quick Reference Card (FOR AI MODELS)
```
USER: "Clone my voice and say 'hello'"
→ Get: sample path, text="hello", language="en"
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "hello" "/path/to/sample.wav" en)
→ Result: $VOICE contains path to OGG file
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"
```
```
USER: "Make me speak Czech"
→ Get: sample path, text="Ahoj", language="cs"
→ Run: VOICE=$(/home/bernie/clawd/skills/clonev/scripts/clonev.sh "Ahoj" "/path/to/sample.wav" cs)
→ Send: message action=send channel=telegram asVoice=true filePath="$VOICE"
```
---
## Output Location
Generated files are saved to:
```
/mnt/c/TEMP/Docker-containers/coqui-tts/output/clonev_output.ogg
```
The script returns this path, so you can use it directly.
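Because the output path above is a single fixed filename, a later run presumably overwrites the earlier result. If several clips need to be kept, one option is to copy each result aside right after generation. The helper below is a hypothetical sketch, not part of `clonev.sh`, and the destination directory is whatever the caller chooses.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of clonev.sh.
# Copies a generated file into DEST_DIR under a unique name and prints the new path.
keep_copy() {
    local src="$1" dest_dir="$2" stamp dest n=0
    if [ ! -f "$src" ]; then
        echo "No such file: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    stamp="$(date +%Y%m%d_%H%M%S)"
    dest="$dest_dir/clonev_${stamp}.ogg"
    # Add a counter if a copy with the same timestamp already exists.
    while [ -e "$dest" ]; do
        n=$((n + 1))
        dest="$dest_dir/clonev_${stamp}_${n}.ogg"
    done
    cp -- "$src" "$dest" || return 1
    printf '%s\n' "$dest"
}
```

After a run, `SAVED=$(keep_copy "$VOICE" "$HOME/clonev-clips")` would leave `$SAVED` pointing at a copy that the next run does not touch; the directory name here is only an example.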
---
## Summary
1. **ONLY use the script**: `scripts/clonev.sh`
2. **NEVER** try to use Docker containers directly
3. **NEVER** try to interact with the `coqui-xtts` container
4. Script handles everything automatically
5. Returns path to OGG file ready to send
**Simple. Just use the script.**
---
*Clone any voice. Speak any language. Just use the script.*