comfyui-tts

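TotalClaw author: totalclaw

Generates speech audio with the ComfyUI Qwen-TTS service. Invoke it when the user needs text-to-speech conversion or speech generation through ComfyUI.

Installation / Download

TotalClaw CLI (recommended)

totalclaw install totalclaw:totalclaw~yhsi5358-comfyui-tts

cURL (direct download, no login required)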

curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~yhsi5358-comfyui-tts/file -o yhsi5358-comfyui-tts.md

# ComfyUI TTS Skill

Generate speech audio with ComfyUI's Qwen-TTS service. The skill converts text to speech by submitting jobs to ComfyUI's HTTP API.

## Configuration

### Environment Variables

Set these environment variables to configure the ComfyUI connection:

```bash
export COMFYUI_HOST="localhost"      # ComfyUI server host
export COMFYUI_PORT="8188"           # ComfyUI server port
export COMFYUI_OUTPUT_DIR=""         # Optional: Custom output directory
```
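As a rough illustration of how these variables combine into a server address, the sketch below falls back to `localhost` and `8188` when a variable is unset or empty. `comfy_base_url` is a hypothetical helper, not something `scripts/tts.sh` is known to define, and treating the values above as fallbacks is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: comfy_base_url is a hypothetical helper, not part of scripts/tts.sh.

# Build the server URL, using the values shown above as fallbacks
# when a variable is unset or empty (an assumption about the defaults).
comfy_base_url() {
  local host="${COMFYUI_HOST:-localhost}"
  local port="${COMFYUI_PORT:-8188}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# Print the URL that the current environment resolves to.
comfy_base_url
```

Running this before the skill is a quick way to confirm which server the current shell would target.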

## Usage

### Basic Text-to-Speech

Generate audio from text using default settings:

```bash
scripts/tts.sh "你好，世界"
```

### Advanced Options

Customize voice characteristics:

```bash
# Specify character and style
scripts/tts.sh "你好" --character "Girl" --style "Emotional"

# Change model size
scripts/tts.sh "你好" --model "3B"

# Specify output file
scripts/tts.sh "你好" --output "/path/to/output.wav"

# Combine options
scripts/tts.sh "你好，这是测试" \
  --character "Girl" \
  --style "Emotional" \
  --model "1.7B" \
  --output "$HOME/audio/test.wav"   # a quoted ~ is not expanded by the shell
```

### Available Options

| Option | Description | Default |
|--------|-------------|---------|
| `--character` | Voice character (Girl/Boy/etc.) | "Girl" |
| `--style` | Speaking style (Emotional/Neutral/etc.) | "Emotional" |
| `--model` | Model size (0.5B/1.7B/3B) | "1.7B" |
| `--output` | Output file path | Auto-generated |
| `--temperature` | Generation temperature (0-1) | 0.9 |
| `--top-p` | Top-p sampling | 0.9 |
| `--top-k` | Top-k sampling | 50 |
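The table gives `--temperature` a range of 0-1. A caller might want to check a value before passing it on; the sketch below does that with `awk`. `valid_temperature` is a hypothetical helper, and treating both endpoints as allowed is an assumption, since the source does not say whether the range is inclusive.

```bash
#!/usr/bin/env bash
# Sketch only: valid_temperature is a hypothetical pre-flight check,
# not part of scripts/tts.sh. It assumes the 0-1 range is inclusive.

# Return 0 when $1 is a plain decimal number in [0, 1], non-zero otherwise.
valid_temperature() {
  awk -v t="$1" 'BEGIN { exit !(t ~ /^[0-9]*\.?[0-9]+$/ && t + 0 >= 0 && t + 0 <= 1) }'
}

temp="0.7"
if valid_temperature "$temp"; then
  echo "temperature $temp is within range"
else
  echo "temperature $temp is out of range" >&2
fi
```

A validated value can then be passed along, for example `scripts/tts.sh "你好" --temperature "$temp" --top-p 0.9 --top-k 50`.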

## Workflow

The skill performs these steps:

1. **Construct Workflow**: Builds a ComfyUI workflow JSON with your text and settings
2. **Submit Job**: Sends the workflow to ComfyUI's `/prompt` endpoint
3. **Poll Status**: Monitors job completion via the `/history` endpoint
4. **Retrieve Audio**: Returns the path to the generated audio file
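The submission step can be sketched with `curl` alone. ComfyUI's `POST /prompt` accepts a JSON body whose `prompt` key holds the workflow graph, and its response includes a `prompt_id`. The function names below are hypothetical, the workflow file is assumed to be one exported from ComfyUI in API format, and the actual `scripts/tts.sh` may build and send its request differently.

```bash
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and are not taken from scripts/tts.sh.

# Wrap a workflow JSON file in the {"prompt": ...} envelope used by POST /prompt.
wrap_workflow() {
  printf '{"prompt": %s}\n' "$(cat "$1")"
}

# Send the wrapped workflow to the server and print the raw JSON response.
submit_workflow() {
  local base="http://${COMFYUI_HOST:-localhost}:${COMFYUI_PORT:-8188}"
  wrap_workflow "$1" |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$base/prompt"
}

# Read a /prompt response on stdin and print the value of its "prompt_id" field.
# A sed pattern keeps the sketch free of extra dependencies such as jq.
extract_prompt_id() {
  sed -n 's/.*"prompt_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (needs a running server; workflow_api.json is a placeholder file name):
#   prompt_id="$(submit_workflow workflow_api.json | extract_prompt_id)"
```

Keeping the envelope, the HTTP call, and the response parsing in separate functions makes each piece easy to check without a server.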

## Troubleshooting

### Connection Refused

- Verify ComfyUI is running: `curl "http://$COMFYUI_HOST:$COMFYUI_PORT/system_stats"`
- Check that `COMFYUI_HOST` and `COMFYUI_PORT` point at the running server
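The reachability check above can be wrapped in a small function that reports the URL it tried and curl's exit code (7 means the connection failed, 28 means the request timed out). `check_comfyui` is a hypothetical name, and the five-second limit is an arbitrary choice for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: check_comfyui is a hypothetical helper, not part of scripts/tts.sh.

# Probe /system_stats and report whether the server answered.
check_comfyui() {
  local url="http://${COMFYUI_HOST:-localhost}:${COMFYUI_PORT:-8188}/system_stats"
  if curl -fsS --max-time 5 -o /dev/null "$url"; then
    echo "ComfyUI reachable at $url"
  else
    # In the else branch, $? still holds curl's exit status.
    echo "ComfyUI not reachable at $url (curl exit code $?)" >&2
    return 1
  fi
}

# Usage:
#   check_comfyui || exit 1
```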

### Job Timeout

- Large models (3B) take longer to generate
- Try smaller models (0.5B, 1.7B) for faster results
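When a larger model needs more time, polling with an explicit deadline makes the timeout visible and adjustable. The sketch below queries ComfyUI's `GET /history/<prompt_id>` route and assumes the response contains the prompt ID only once the job has finished. `wait_for_job`, the 300-second default, and the 2-second interval are all hypothetical; the skill's own timeout handling is not documented here.

```bash
#!/usr/bin/env bash
# Sketch only: wait_for_job is a hypothetical helper, not part of scripts/tts.sh.

# Poll the history route until an entry for the prompt appears or the deadline passes.
# Arguments: prompt_id [timeout_seconds] [poll_interval_seconds]
wait_for_job() {
  local prompt_id="$1" timeout="${2:-300}" interval="${3:-2}"
  local base="http://${COMFYUI_HOST:-localhost}:${COMFYUI_PORT:-8188}"
  local deadline=$(( SECONDS + timeout ))
  local body
  while (( SECONDS < deadline )); do
    # Treat a failed request like an empty response and keep polling.
    body="$(curl -fsS "$base/history/$prompt_id" 2>/dev/null)" || body=""
    if [[ "$body" == *"\"$prompt_id\""* ]]; then
      printf '%s\n' "$body"
      return 0
    fi
    sleep "$interval"
  done
  echo "timed out after ${timeout}s waiting for $prompt_id" >&2
  return 1
}

# Usage: allow ten minutes for a 3B run.
#   wait_for_job "$prompt_id" 600
```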

### Output Not Found

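- Check ComfyUI's output directory configuration, and `COMFYUI_OUTPUT_DIR` if you set it
- Verify that the user running the skill has permission to read the output directory and its files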
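Both checks can be combined into one diagnostic that confirms the directory exists, is readable, and shows its most recently modified files. `check_output_dir` is a hypothetical helper; it takes the directory as an argument and otherwise falls back to `COMFYUI_OUTPUT_DIR`, which is assumed to hold the path ComfyUI writes to.

```bash
#!/usr/bin/env bash
# Sketch only: check_output_dir is a hypothetical helper, not part of scripts/tts.sh.

# Verify that the output directory is usable, then list its five newest entries.
check_output_dir() {
  local dir="${1:-$COMFYUI_OUTPUT_DIR}"
  [ -n "$dir" ] || { echo "no output directory given" >&2; return 2; }
  [ -d "$dir" ] || { echo "directory does not exist: $dir" >&2; return 1; }
  [ -r "$dir" ] && [ -x "$dir" ] || { echo "directory not readable: $dir" >&2; return 1; }
  # Newest first, so a freshly generated audio file appears at the top.
  ls -t "$dir" | head -n 5
}

# Usage:
#   check_output_dir /path/to/ComfyUI/output
```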

## API Reference

The skill uses ComfyUI's native API endpoints:

- `POST /prompt` - Submit workflow
- `GET /history` - Check job status
- Output files are saved to ComfyUI's configured output directory
