IMA Studio TTS
Use when generating speech from text (text-to-speech) via IMA Open API. Use for: voice synthesis, TTS, read-aloud (朗读), speech synthesis (语音合成), voice-over (配音), audio content (有声内容). Output: audio URL (mp3/wav). Flow: query products → create task → poll until done. Requires an IMA API key. This skill targets seed-tts-2.0 only (seed-tts-1.1 is not supported). Default model is seed-tts-2.0.
Installation / Download
TotalClaw CLI (recommended):
`totalclaw install github:LeoYeAI~openclaw-master-skills~ima-tts-ai`
cURL (direct download, no login required):
`curl -fsSL https://skills.taituai.com/api/skills/github%3ALeoYeAI~openclaw-master-skills~ima-tts-ai/file -o ima-tts-ai.md`
# IMA TTS (Text-to-Speech)
## Overview
Call IMA Open API to create **text-to-speech** audio. Same flow as other IMA creation skills: **query products → create task → poll until done**. Task type is `text_to_speech`. **This skill targets seed-tts-2.0 only** — seed-tts-1.1 is not supported; the script defaults to `seed-tts-2.0` when no model is specified.
## ⚙️ How This Skill Works
This skill uses a bundled Python script `scripts/ima_tts_create.py` to call the IMA Open API:
- Sends **text (prompt)** to `https://api.imastudio.com`
- Uses `--user-id` only locally for preference storage
- Returns an **audio URL** when synthesis is complete
- **Reflection mechanism**: on create failure, retries up to 3 times with parameter adjustments
**What gets sent to IMA:** the prompt (text to speak), the model selection, and parameters (e.g. voice_id, speed). **Not sent:** the API key does not go into the prompt body (it is sent as the `Authorization` header), and user_id stays local.
### Agent Execution
Use the bundled script:
```bash
# List available TTS models (optional; default is seed-tts-2.0)
python3 {baseDir}/scripts/ima_tts_create.py --api-key $IMA_API_KEY --list-models
# Generate speech (default model: seed-tts-2.0; omit --model-id to use default)
python3 {baseDir}/scripts/ima_tts_create.py \
--api-key $IMA_API_KEY \
--model-id seed-tts-2.0 \
--prompt "Text to be spoken here." \
--user-id {user_id} \
--output-json
```
Script outputs JSON; parse it for `url` and pass to the user via the UX protocol below.
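This document does not spell out the exact layout of the script's JSON output. The following Python sketch is therefore an assumption, not the script's own parser: it searches the parsed output recursively for the first non-empty `url` value. `run_tts_script` is a hypothetical wrapper, and it assumes the script prints only JSON to stdout when `--output-json` is set.

```python
import json
import subprocess
from typing import Any, Optional


def find_url(payload: Any) -> Optional[str]:
    """Depth-first search for the first non-empty string stored under a "url" key."""
    if isinstance(payload, dict):
        value = payload.get("url")
        if isinstance(value, str) and value:
            return value
        children = payload.values()
    elif isinstance(payload, list):
        children = payload
    else:
        return None
    for child in children:
        found = find_url(child)
        if found:
            return found
    return None


def audio_url_from_output(stdout: str) -> str:
    """Parse the script's --output-json text and return the audio URL."""
    url = find_url(json.loads(stdout))
    if url is None:
        raise ValueError("no 'url' field found in script output")
    return url


def run_tts_script(script_path: str, api_key: str, prompt: str, user_id: str) -> str:
    """Hypothetical wrapper: invoke the bundled script and return the audio URL."""
    argv = [
        "python3", script_path,
        "--api-key", api_key,
        "--model-id", "seed-tts-2.0",
        "--prompt", prompt,
        "--user-id", user_id,
        "--output-json",
    ]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return audio_url_from_output(result.stdout)
```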
---
## Environment
Base URL: `https://api.imastudio.com`
| Header | Required | Value |
|--------|----------|-------|
| `Authorization` | ✅ | `Bearer ima_your_api_key_here` |
| `x-app-source` | ✅ | `ima_skills` |
| `x_app_language` | recommended | `en` / `zh` |
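Here is a minimal standard-library sketch of a request builder that uses the base URL and headers above. Two things are assumptions for illustration, not part of the documented API: the `Content-Type: application/json` header, and the helper names `ima_headers`, `build_request`, and `send`.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

BASE_URL = "https://api.imastudio.com"


def ima_headers(api_key: str, language: str = "en") -> dict[str, str]:
    """Headers from the table above; Content-Type is an added assumption for JSON bodies."""
    return {
        "Authorization": f"Bearer {api_key}",
        "x-app-source": "ima_skills",
        "x_app_language": language,
        "Content-Type": "application/json",
    }


def build_request(path: str, api_key: str, *, query: Optional[dict[str, str]] = None,
                  body: Optional[dict[str, Any]] = None,
                  language: str = "en") -> urllib.request.Request:
    """GET when there is no body, POST with a JSON body otherwise."""
    url = BASE_URL + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url,
        data=data,
        headers=ima_headers(api_key, language),
        method="POST" if body is not None else "GET",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict[str, Any]:
    """Perform the request and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_request("/open/v1/product/list", key, query={"app": "ima", "platform": "web", "category": "text_to_speech"}))` fetches the product list. Building the `Request` separately from sending it lets you inspect the URL and headers without network access.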
---
## ⚠️ MANDATORY: Always Query Product List First
You **MUST** call `/open/v1/product/list` with `category=text_to_speech` before creating any task. `attribute_id` is required; if it is 0 or missing, the create call fails with `"Invalid product attribute"`.
```http
GET /open/v1/product/list?app=ima&platform=web&category=text_to_speech
```
Then traverse the V2 tree: `type=2` nodes are model groups and `type=3` nodes are versions (leaves). Only `type=3` nodes carry `credit_rules` and `form_config`. For the create request, use the leaf's `model_id`, its `id` (= `model_version`), `credit_rules[0].attribute_id`, and `credit_rules[0].points`.
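The traversal above can be sketched in Python as follows. Only `type`, `model_id`, `id`, `credit_rules`, and `form_config` come from this document. The following are assumptions:
- the `children` key holds nested nodes;
- `model_name` comes from a leaf's `name`;
- the caller passes in the list of root nodes already extracted from the response.

```python
from typing import Any, Iterator


def iter_leaves(nodes: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield every type=3 (version) node; "children" is an assumed key for nesting."""
    for node in nodes:
        if node.get("type") == 3:
            yield node
        yield from iter_leaves(node.get("children") or [])


def pick_version(nodes: list[dict[str, Any]], model_id: str = "seed-tts-2.0") -> dict[str, Any]:
    """Collect the fields the create call needs from the first matching leaf."""
    for leaf in iter_leaves(nodes):
        if leaf.get("model_id") != model_id:
            continue
        rules = leaf.get("credit_rules") or []
        if not rules or not rules[0].get("attribute_id"):
            # A 0 or missing attribute_id leads to "Invalid product attribute" on create.
            raise ValueError(f"leaf {leaf.get('id')!r} has no usable attribute_id")
        return {
            "model_id": leaf["model_id"],
            "model_name": leaf.get("name", ""),  # assumed source of model_name
            "model_version": leaf["id"],
            "attribute_id": rules[0]["attribute_id"],
            "credit": rules[0].get("points"),
            "form_config": leaf.get("form_config"),
        }
    raise LookupError(f"no type=3 leaf found for model_id={model_id!r}")
```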
---
## Core Flow
```
1. GET /open/v1/product/list?app=ima&platform=web&category=text_to_speech
→ Get attribute_id, credit, model_version, form_config
2. POST /open/v1/tasks/create
→ task_type: "text_to_speech", parameters[].parameters.prompt = text to speak
3. POST /open/v1/tasks/detail { "task_id": "..." }
→ Poll every 2–5s until medias[].resource_status == 1 and status != "failed"
→ Read medias[].url (and optional duration_str, format)
```
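Here is a hedged sketch of step 3's polling loop. The caller supplies two functions:
- `fetch_detail`, for example a POST to `/open/v1/tasks/detail`;
- `check`, a function that applies the Rules in the Task Detail section and returns `"success"`, `"failed"`, or `"processing"`.

The 3 s default interval and 300 s timeout are illustrative choices, not documented limits.

```python
import time
from typing import Any, Callable


def poll_task(
    fetch_detail: Callable[[str], dict[str, Any]],
    check: Callable[[dict[str, Any]], str],
    task_id: str,
    interval: float = 3.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch_detail(task_id) until check() reports "success" or "failed"."""
    if not 2.0 <= interval <= 5.0:
        raise ValueError("interval should stay within the 2-5 s polling window")
    deadline = time.monotonic() + timeout
    while True:
        detail = fetch_detail(task_id)
        state = check(detail)
        if state == "success":
            return detail
        if state == "failed":
            raise RuntimeError(f"task {task_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still processing after {timeout} s")
        sleep(interval)
```

Injecting `sleep` lets tests run without real delays.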
---
## Task Detail API — Actual Response Shape
Poll `POST /open/v1/tasks/detail` until completion. Response uses the same structure as other IMA audio tasks:
| Field | Type | Meaning |
|-------|------|--------|
| `resource_status` | int or null | 0 = processing, 1 = available, 2 = failed, 3 = deleted; treat null as 0 |
| `status` | string | "pending" / "processing" / "success" / "failed" |
| `url` | string | Audio URL when resource_status=1 (mp3/wav) |
| `duration_str` | string | Optional, e.g. "30s" |
| `format` | string | Optional, e.g. "mp3", "wav" |
**Completed success example:**
```json
{
"id": "task_xxx",
"medias": [{
"resource_status": 1,
"status": "success",
"url": "https://cdn.../output.mp3",
"duration_str": "12s",
"format": "mp3"
}]
}
```
**Rules:**
- Treat `resource_status: null` as 0 (processing).
- Success only when **all** medias have `resource_status == 1` and `status != "failed"`.
- On `resource_status == 2` or `status == "failed"`, stop polling and handle the error (e.g. surface `error_msg` / `remark`).
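These rules translate directly into a status check. This sketch makes three assumptions the document does not state:
- `resource_status == 3` (deleted) is a terminal failure;
- an empty `medias` list means the task is still processing;
- `error_msg` / `remark` may appear on either a media entry or the task object, so both are checked.

```python
from typing import Any, Optional

TERMINAL_FAILURE = {2, 3}  # 2 = failed; treating 3 (deleted) as terminal is an assumption


def media_state(task: dict[str, Any]) -> str:
    """Return "success", "failed", or "processing" for a task detail object."""
    medias = task.get("medias") or []
    if not medias:
        return "processing"  # assumption: no media entries yet means still running
    codes = []
    for media in medias:
        code = media.get("resource_status")
        code = 0 if code is None else code  # null counts as 0 (processing)
        if code in TERMINAL_FAILURE or media.get("status") == "failed":
            return "failed"
        codes.append(code)
    return "success" if all(code == 1 for code in codes) else "processing"


def error_text(task: dict[str, Any]) -> Optional[str]:
    """First error_msg/remark found on a media entry or on the task itself."""
    for source in [*(task.get("medias") or []), task]:
        for key in ("error_msg", "remark"):
            if source.get(key):
                return str(source[key])
    return None
```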
---
## API 2: Create Task
```
POST /open/v1/tasks/create
```
**text_to_speech** — no image input. `src_img_url: []`, `input_images: []`.
```json
{
"task_type": "text_to_speech",
"enable_multi_model": false,
"src_img_url": [],
"parameters": [{
"attribute_id": "<from credit_rules>",
"model_id": "<model_id>",
"model_name": "<model_name>",
"model_version": "<version_id>",
"app": "ima",
"platform": "web",
"category": "text_to_speech",
"credit": "<points>",
"parameters": {
"prompt": "Text to be spoken.",
"n": 1,
"input_images": [],
"cast": {"points": "<points>", "attribute_id": "<attribute_id>"}
}
}]
}
```
`prompt` must be inside `parameters[].parameters`, not at top level. Extra fields (e.g. voice_id, speed) come from product `form_config`; include only those present in the product’s credit_rules/form_config.
Response: `data.id` is the task_id to use when polling.
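Here is a minimal Python sketch that assembles the create body shown above. `version` is assumed to be a dict holding the product-list fields (`model_id`, `model_name`, `model_version`, `attribute_id`, `credit`). The helper names are illustrative, as is the rule that optional fields cannot overwrite `prompt`, `n`, `input_images`, or `cast`.

```python
from typing import Any, Optional

RESERVED = {"prompt", "n", "input_images", "cast"}


def build_create_body(version: dict[str, Any], prompt: str,
                      extra: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Assemble the text_to_speech create payload; `version` holds product-list fields."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    inner: dict[str, Any] = {
        "prompt": prompt,  # must live in parameters[].parameters, not at top level
        "n": 1,
        "input_images": [],
        "cast": {"points": version["credit"], "attribute_id": version["attribute_id"]},
    }
    # Optional form_config fields (e.g. speaker) never overwrite the required keys above.
    inner.update({k: v for k, v in (extra or {}).items() if k not in RESERVED})
    return {
        "task_type": "text_to_speech",
        "enable_multi_model": False,
        "src_img_url": [],
        "parameters": [{
            "attribute_id": version["attribute_id"],
            "model_id": version["model_id"],
            "model_name": version["model_name"],
            "model_version": version["model_version"],
            "app": "ima",
            "platform": "web",
            "category": "text_to_speech",
            "credit": version["credit"],
            "parameters": inner,
        }],
    }


def task_id_from(create_response: dict[str, Any]) -> str:
    """The create response carries the task id at data.id."""
    return create_response["data"]["id"]
```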
---
## Supported Task Type & Models
| category | Capability | Input |
|----------|------------|-------|
| `text_to_speech` | Text → Speech | prompt (text to speak) |
**Models:** This skill supports **seed-tts-2.0** only (seed-tts-1.1 is not supported). The script defaults to `--model-id seed-tts-2.0` when none is provided, and reads the current `attribute_id` and `credit` from the product list at runtime.
### seed-tts-2.0 — Verified request parameters
The following `parameters[].parameters` shape has been verified to work for **seed-tts-2.0** (attribute_id/credit come from product list and may differ by app/platform):
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `prompt` | string | ✅ | Text to speak (the text to synthesize). |
| `n` | int | ✅ | Usually 1. |
| `model` | string | ✅ | Sub-model: `seed-tts-2.0-expressive` (default) or `seed-tts-2.0-standard`. |
| `speaker` | string | optional | Speaker ID (voice), e.g. `zh_male_sophie_uranus_bigtts`, a native `voice_type` from the [voice list, doc 1257544](https://www.volcengine.com/docs/6561/1257544). **Note:** use the native format (e.g. `zh_male_*_uranus_bigtts`); the `BV*_streaming` format is not supported. |
| `audio_params` | object | optional | `emotion`, `speech_rate` (speaking rate, range [-50, 100]), `loudness_rate` (volume, range [-50, 100]), etc.; see the [request body in doc 1598757](https://www.volcengine.com/docs/6561/1598757?lang=zh). |
| `additions` | object | optional | e.g. `{"explicit_language": "crosslingual", "context_texts": []}`. |
| `cast` | object | ✅ | `{"points": <credit>, "attribute_id": <attribute_id>}` from product list. |
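Here is a sketch that builds `parameters[].parameters` for seed-tts-2.0 from the table above. It checks locally that the sub-model is one of the two listed names and that `speech_rate` and `loudness_rate` fall in [-50, 100]. The function name and the choice to validate client-side are assumptions; the server remains the authority on what it accepts.

```python
from typing import Any, Optional

SUBMODELS = ("seed-tts-2.0-expressive", "seed-tts-2.0-standard")
RATE_MIN, RATE_MAX = -50, 100  # documented range for speech_rate and loudness_rate


def seed_tts_parameters(prompt: str, attribute_id: Any, points: Any, *,
                        submodel: str = SUBMODELS[0],
                        speaker: Optional[str] = None,
                        audio_params: Optional[dict[str, Any]] = None,
                        additions: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build parameters[].parameters for seed-tts-2.0 following the table above."""
    if not prompt:
        raise ValueError("prompt is required")
    if submodel not in SUBMODELS:
        raise ValueError(f"unknown sub-model {submodel!r}")
    params: dict[str, Any] = {
        "prompt": prompt,
        "n": 1,
        "model": submodel,
        "cast": {"points": points, "attribute_id": attribute_id},
    }
    if speaker:
        params["speaker"] = speaker
    if audio_params:
        for key in ("speech_rate", "loudness_rate"):
            value = audio_params.get(key)
            if value is not None and not RATE_MIN <= value <= RATE_MAX:
                raise ValueError(f"{key}={value} is outside [{RATE_MIN}, {RATE_MAX}]")
        params["audio_params"] = dict(audio_params)
    if additions:
        params["additions"] = dict(additions)
    return params
```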
**Script example with extra params:**
```bash
python3 ima_tts_create.py --api-key $IMA_API_KEY --model-id seed-tts-2.0 \
--prompt "阳光青年音色测试,你好世界。" \
--extra-params '{"model":"seed-tts-2.0-expressive","speaker":"zh_male_sophie_uranus_bigtts","audio_params":{"emotion":"neutral"},"additions":{"explicit_language":"crosslingual","context_texts":[]}}' \
--output-json
```
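Hand-quoting the `--extra-params` JSON in a shell is error-prone, especially with Chinese text. The sketch below uses hypothetical helper names. It serializes the same extra parameters with `json.dumps` and builds an argument list for `subprocess.run`, which passes each argument verbatim without shell parsing.

```python
import json
import shlex


def extra_params_arg(extra: dict) -> str:
    """Serialize --extra-params compactly, keeping Chinese text unescaped."""
    return json.dumps(extra, ensure_ascii=False, separators=(",", ":"))


def tts_argv(script: str, api_key: str, prompt: str, extra: dict) -> list[str]:
    """argv for subprocess.run; passing a list avoids shell quoting entirely."""
    return [
        "python3", script,
        "--api-key", api_key,
        "--model-id", "seed-tts-2.0",
        "--prompt", prompt,
        "--extra-params", extra_params_arg(extra),
        "--output-json",
    ]


def as_shell_command(argv: list[str]) -> str:
    """Render the same command in copy-pasteable shell form."""
    return shlex.join(argv)
```

Passing the list straight to `subprocess.run` skips the shell altogether. `as_shell_command` is only for displaying the equivalent command line.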
**Note:** The script gets `attribute_id` and `credit` from the product list (e.g. `app=ima&platform=web` → often 2 pts / attribute_id 4419 for seed-tts-2.0). If you have a different app/platform (e.g. webAgent), the product list may return different credit_rules (e.g. 5 pts / attribute_id 8987); the script uses whatever the product list returns for the chosen model.
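**Speaker / voice list (seed-tts-2.0 is compatible with Volcengine voices):** The full set of voice IDs, grouped by scenario, is in the project file `volcengine_tts_timbre_list.json`. The file is taken from the [Volcengine Doubao TTS voice list](https://www.volcengine.com/docs/6561/1257544) and uses the native `voice_type` format (e.g. `zh_male_sophie_uranus_bigtts` (魅力苏菲, "Charming Sophie"), `zh_female_vv_uranus_bigtts` (Vivi)). **⚠️ Note:** the IMA API supports only the native format (the `*_uranus_bigtts` series), not the `BV*_streaming` Doubao voice IDs.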
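Here is a sketch of a client-side guard for the native-format rule. It assumes every accepted ID ends in `_uranus_bigtts`, as the note above implies. The internal structure of `volcengine_tts_timbre_list.json` is not described here, so the hypothetical `load_speaker_ids` simply walks the whole file and collects matching strings.

```python
import json
import re
from typing import Any, Iterator

_STREAMING_ID = re.compile(r"BV\w*_streaming")


def check_speaker(speaker: str) -> str:
    """Accept native *_uranus_bigtts IDs; reject BV*_streaming Doubao IDs."""
    if _STREAMING_ID.fullmatch(speaker):
        raise ValueError(f"{speaker!r} is a BV*_streaming ID, which the IMA API does not accept")
    if not speaker.endswith("_uranus_bigtts"):
        raise ValueError(f"{speaker!r} is not a native *_uranus_bigtts voice_type")
    return speaker


def native_ids_in(obj: Any) -> Iterator[str]:
    """Walk arbitrary JSON and yield every string ending in _uranus_bigtts."""
    if isinstance(obj, str):
        if obj.endswith("_uranus_bigtts"):
            yield obj
    elif isinstance(obj, dict):
        for value in obj.values():
            yield from native_ids_in(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from native_ids_in(item)


def load_speaker_ids(path: str = "volcengine_tts_timbre_list.json") -> set[str]:
    """Collect native voice IDs from the bundled list, whatever its nesting."""
    with open(path, encoding="utf-8") as fh:
        return set(native_ids_in(json.load(fh)))
```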
**Mapping to the Volcengine 2.0 docs:** The parameters above match the [HTTP Chunked/SSE one-way streaming V3 request body](https://www.volcengine.com/docs/6561/1598757?lang=zh):
- `req_params.text` → prompt
- `req_params.speaker` → speaker (required)
- `req_params.model` → model (expressive/standard)
- `req_params.audio_params` (emotion, speech_rate, loudness_rate, etc.)
- `req_params.additions` (e.g. explicit_language)

For what 2.0 adds (voice instructions, referencing preceding context, voice tags, etc.), see the [Doubao TTS 2.0 capability overview](https://www.volcengine.com/docs/6561/1871062?lang=zh).
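The mapping above can be expressed as a small translation table. This sketch assumes the Volcengine `req_params` dict uses exactly the key names listed. It returns unmapped keys instead of forwarding them, and the caller still adds `n` and `cast` from the product list.

```python
from typing import Any

# req_params field -> IMA parameters[].parameters field, per the mapping above
VOLC_TO_IMA = {
    "text": "prompt",
    "speaker": "speaker",
    "model": "model",
    "audio_params": "audio_params",
    "additions": "additions",
}


def from_volcengine(req_params: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Translate a Volcengine V3 req_params dict; also return keys that were not mapped."""
    mapped = {VOLC_TO_IMA[k]: v for k, v in req_params.items() if k in VOLC_TO_IMA}
    unmapped = sorted(k for k in req_params if k not in VOLC_TO_IMA)
    return mapped, unmapped
```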
---
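## 🎤 What to ask when the user says "make a narration / voice-over for me"
When the user says something like "make me a narration", "do a voice-over", or "read this text aloud", you **must collect the key information before calling the script**. This avoids missing parameters and blind defaults.
### Must ask
| Question | Parameter | Notes |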
|--------|-------