vidu-skills

GitHub author: LeoYeAI/openclaw-master-skills v1.0.4

Generate videos and images by calling the official Vidu API with curl. Use when the user wants text-to-image (文生图), text-to-video (文生视频), image-to-video (图生视频), head-tail-image-to-video (首尾帧生视频), reference-to-image (参考生图), reference-to-video (参考生视频), Create References (创建参考资料), or wants to submit or check Vidu tasks. Requires VIDU_TOKEN; VIDU_BASE_URL is optional.

Installation / download

TotalClaw CLI (recommended):
totalclaw install github:LeoYeAI~openclaw-master-skills~vidu-skill

Direct download with cURL (no login required):
curl -fsSL https://skills.taituai.com/api/skills/github%3ALeoYeAI~openclaw-master-skills~vidu-skill/file -o vidu-skill.md

# Vidu Video and Image Generation Skill (Vidu 音视频/图像生成技能)

Generate AI videos and images with Vidu (生数) via direct API calls — text-to-image, text-to-video, image-to-video, start-end frame, reference-based generation, and material elements, up to 1080p/2K/4K. Use curl with VIDU_TOKEN.

## Execution model: use curl (direct API)

**All execution is done by calling the official Vidu API** with curl (or any HTTP client). Base URL: **$VIDU_BASE_URL** (default `https://service.vidu.cn` for mainland China; `https://service.vidu.com` for overseas).

**Required headers for all requests:**

| Header        | Value                               |
| ------------- | ----------------------------------- |
| Authorization | `Token $VIDU_TOKEN`                 |
| Content-Type  | `application/json`                  |
| User-Agent    | `viduclawbot/1.0 (+$VIDU_BASE_URL)` |
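The header table above could be wrapped in a small helper so that every call carries the same three headers. This is a minimal sketch, not part of the skill itself: the function name `vidu_curl` is hypothetical, and it assumes that `$VIDU_BASE_URL` in the User-Agent value is meant to be expanded to the actual base URL.

```sh
# Hypothetical helper: run curl with the three headers required by the skill.
# Assumption: "$VIDU_BASE_URL" inside the User-Agent is substituted with the
# real base URL, falling back to the mainland China default when unset.
vidu_curl() {
  base="${VIDU_BASE_URL:-https://service.vidu.cn}"
  curl -fsS \
    -H "Authorization: Token $VIDU_TOKEN" \
    -H "Content-Type: application/json" \
    -H "User-Agent: viduclawbot/1.0 (+$base)" \
    "$@"
}

# Usage (performs a real request, so it is left commented out):
# vidu_curl "$VIDU_BASE_URL/vidu/v1/tasks/$TASK_ID"
```

All remaining arguments are passed through to curl unchanged, so the same wrapper can serve GET and POST calls alike.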

**Main endpoints:**

- **Create upload**: POST `$VIDU_BASE_URL/tools/v1/files/uploads` → get `put_url`, `id`
- **PUT image**: PUT raw image bytes to `put_url` → get ETag
- **Finish upload**: PUT `$VIDU_BASE_URL/tools/v1/files/uploads/{id}/finish` → get `ssupload:?id={id}`
- **Submit task**: POST `$VIDU_BASE_URL/vidu/v1/tasks` → get `task_id` (response `id`)
- **Get task result**: GET `$VIDU_BASE_URL/vidu/v1/tasks/{task_id}` → get `state`, `creations[].nomark_uri`
- **Task state (SSE)**: GET `$VIDU_BASE_URL/vidu/v1/tasks/state?id={task_id}` with `Accept: text/event-stream`. Relay the SSE stream to the user as it arrives; do not block until the task reaches a terminal state. Events include `state`, `estimated_time_left`, `err_code`, and `queue_wait_time` (排队预测时间, the predicted queue wait, in minutes).
- **Pre-process reference**: POST `$VIDU_BASE_URL/vidu/v1/material/elements/pre-process`
- **Create reference**: POST `$VIDU_BASE_URL/vidu/v1/material/elements`
- **List elements**: GET `$VIDU_BASE_URL/vidu/v1/material/elements/personal`
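Two small pieces of the upload flow listed above lend themselves to helpers: reading the ETag from the PUT response headers, and forming the `ssupload:?id={id}` reference once the upload is finished. The sketch below covers only those two steps. Both function names are hypothetical, and the request bodies for the create and finish calls are deliberately left out because this section does not specify them.

```sh
# Hypothetical helper: read HTTP response headers on stdin (for example from
# `curl -D - ...`) and print the ETag value, with the trailing CR removed.
# The value is printed as received, including any surrounding quotes.
vidu_etag_from_headers() {
  awk 'tolower($0) ~ /^etag:/ {
         sub(/^[^:]*:[ \t]*/, "")   # drop the header name and leading blanks
         sub(/\r$/, "")             # drop the CR that ends HTTP header lines
         print
         exit
       }'
}

# Hypothetical helper: build the image reference used in task prompts
# from the upload id returned by the create-upload call.
vidu_ssupload_uri() {
  printf 'ssupload:?id=%s\n' "$1"
}
```

For example, `vidu_ssupload_uri 123` prints `ssupload:?id=123`.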

---

## Key Capabilities

- **text-to-image (文生图)** — POST `/vidu/v1/tasks` with `type: "text2image"`, `input.prompts` (text only), `settings`. Resolution 1080p, 2K, 4K.
- **text-to-video (文生视频)** — POST `/vidu/v1/tasks` with `type: "text2video"`, `input.prompts` (text only), `settings`.
- **image-to-video (图生视频)** — Upload one image (Create upload → PUT → Finish) to get `ssupload:?id=...`; then POST `/vidu/v1/tasks` with `type: "img2video"`, prompts (text + image).
- **head-tail-image-to-video (首尾帧生视频)** — Upload two images; POST `/vidu/v1/tasks` with `type: "headtailimg2video"`, prompts (text + image1 + image2).
- **reference-to-image (参考生图)** — Image(s) + reference(s) + text (text required; image + reference combined at most 7). POST `/vidu/v1/tasks` with `type: "reference2image"`; Q2 only, do not send `transition`, `duration` is 0.
- **reference-to-video (参考生视频)** — Image(s) + reference(s) + text (text required; image + reference combined at most 7). POST `/vidu/v1/tasks` with `type: "character2video"`; Q3 or Q2, do not send `transition`.
- **Create References (创建主体)** — POST pre-process → POST material/elements (images must be uploaded first). Query list: GET `/vidu/v1/material/elements/personal`.
- **Query task (查询任务)** — GET `/vidu/v1/tasks/{task_id}` for result; or GET `/vidu/v1/tasks/state?id={task_id}` for SSE stream.
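As an illustration of the `type` / `input.prompts` / `settings` layout named above, a text-to-video body could be assembled as follows. Treat this as a sketch under assumptions: the function names are hypothetical, the `content` key for a text prompt is assumed rather than confirmed by this section, and the settings shown are the Q3 values (`model_version` 3.2, `transition` pro).

```sh
# Escape backslashes and double quotes so the prompt is safe inside a JSON
# string. Newlines and control characters are NOT handled by this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: print a text2video request body.
# $1 = prompt text, $2 = duration in seconds, $3 = aspect ratio (e.g. 16:9)
# Assumption: text prompts use {"type":"text","content":"..."}.
vidu_text2video_body() {
  prompt=$(json_escape "$1")
  printf '{"type":"text2video","input":{"prompts":[{"type":"text","content":"%s"}]},"settings":{"model_version":"3.2","duration":%s,"aspect_ratio":"%s","transition":"pro"}}\n' \
    "$prompt" "$2" "$3"
}

# Usage (performs a real request, so it is left commented out):
# vidu_text2video_body "a cat surfing at sunset" 5 16:9 |
#   curl -fsS -X POST "$VIDU_BASE_URL/vidu/v1/tasks" \
#     -H "Authorization: Token $VIDU_TOKEN" \
#     -H "Content-Type: application/json" \
#     --data-binary @-
```

Building the body in a function keeps quoting problems out of the curl command line; a JSON tool such as `jq` would be a more robust choice where it is available.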

---

## Setup

1. Obtain a VIDU token (e.g. from the official Vidu console).
2. Set environment variables:
   - `export VIDU_TOKEN="your-token"` (required / 必填)
   - `export VIDU_BASE_URL=https://service.vidu.cn` (mainland China / 国内, default / 默认) or `https://service.vidu.com` (overseas / 海外)
3. **Dependency**: curl or any HTTP client that can send JSON and binary PUT. No Python or scripts required for execution.
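A pre-flight check along these lines can catch a missing token before any request is sent. It is a minimal sketch: the function name `vidu_check_env` is hypothetical, and the warning for other base URLs is an illustrative choice rather than a documented rule.

```sh
# Hypothetical pre-flight check for the environment described above.
# Fails when VIDU_TOKEN is missing; applies the documented default base URL.
vidu_check_env() {
  if [ -z "${VIDU_TOKEN:-}" ]; then
    echo "VIDU_TOKEN is not set" >&2
    return 1
  fi
  # Default to the mainland China endpoint when VIDU_BASE_URL is unset or empty.
  : "${VIDU_BASE_URL:=https://service.vidu.cn}"
  export VIDU_BASE_URL
  case "$VIDU_BASE_URL" in
    https://service.vidu.cn|https://service.vidu.com) ;;
    *) echo "warning: unexpected VIDU_BASE_URL: $VIDU_BASE_URL" >&2 ;;
  esac
  return 0
}
```

Calling `vidu_check_env || exit 1` at the top of a session keeps later curl failures from being mistaken for API errors.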

---

## Data usage and privacy note

**IMPORTANT**: This skill sends user-provided data to Vidu’s servers:

- Text prompts → Vidu API
- Image bytes (uploaded files) → Vidu API servers (service.vidu.cn or service.vidu.com)
- Task parameters (settings, model version, etc.)

Before using this skill, confirm that sending your content to Vidu is acceptable for your privacy and intellectual property requirements. Data handling follows Vidu’s official policy.

**Security recommendations**:

- Create a token with limited scope if possible
- Avoid using production/privileged tokens for initial testing
- Review Vidu’s terms of service and privacy policy

**Vidu Terms & Privacy**:

- Overseas: https://www.vidu.com/terms
- Mainland China: https://www.vidu.cn/terms

---

## Overview

Vidu media generation is **asynchronous**: submit a task → get **task_id** → use task_id to **query** status/result (GET `/vidu/v1/tasks/{task_id}` or SSE `/vidu/v1/tasks/state?id=`) when needed.

- **text-to-image (文生图)**: Text only. `duration` 0, `model_version` 3.1. `aspect_ratio` is optional. Resolution: 1080p, 2K, or 4K (default 2K).
- **text-to-video (文生视频)**: Text only. Q3 duration 1–16, aspect ratios 16:9/9:16/1:1/4:3/3:4, transition pro/speed; Q2 duration 2–8, do not send transition.
- **image-to-video (图生视频)**: **One image + one text**. Aspect ratio from input image (do not send aspect_ratio). Q3 duration 1–16, Q2 duration 2–8, transition pro/speed.
- **head-tail-image-to-video (首尾帧生视频)**: **Two images (start frame, end frame) + one text**. Q3 1–16s, Q2 2–8s, transition pro/speed.
- **reference-to-image (参考生图)**: Any combination of images and references, plus text; **text is required**. Images and references together: **at least 1, at most 7**. Q2 only, duration 0. Pass references in prompts via `type: "material"`.
- **reference-to-video (参考生视频)**: Any combination of images and references, plus text; **text is required**. Images and references together: **at least 1, at most 7**. Q3 duration 1–16, Q2 duration 2–8. Do **not** send transition. Pass references in prompts via `type: "material"`, `material.id`, `material.version`.
- **Create References (创建主体)**: Upload 1–3 images, name and optional description; **must** call POST `/vidu/v1/material/elements/pre-process` first, then POST `/vidu/v1/material/elements`. Use pre-process `recaption` when description is omitted. Response includes element `id` and `version` for reference-to-video.
- **Search References (查询主体)**: GET `/vidu/v1/material/elements/personal` with `pager.page`, `pager.pagesz`, `keyword`, `modalities`; returns `elements[].id`, `version`.
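The duration limits for video tasks listed above (Q3: 1–16 s, Q2: 2–8 s) can be checked locally before a task is submitted. The following is a sketch with a hypothetical function name; it covers only the video ranges and does not apply to image tasks, where the duration is 0.

```sh
# Hypothetical validator for video durations.
# $1 = model (Q3 or Q2), $2 = duration in whole seconds.
# Returns 0 when the duration is inside the documented range, 1 otherwise.
vidu_check_duration() {
  model=$1
  duration=$2
  # Reject empty or non-numeric input before the integer comparisons.
  case "$duration" in
    ''|*[!0-9]*) return 1 ;;
  esac
  case "$model" in
    Q3) min=1; max=16 ;;
    Q2) min=2; max=8 ;;
    *)  return 1 ;;
  esac
  [ "$duration" -ge "$min" ] && [ "$duration" -le "$max" ]
}
```

For instance, `vidu_check_duration Q2 10` fails because Q2 tops out at 8 seconds, while `vidu_check_duration Q3 10` succeeds.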

See **Supported Task List** below and **references/parameters.md**.

---

## Supported Task List

When building the POST `/vidu/v1/tasks` body, ensure the user’s request matches one of the supported task types and constraints below. All parameters are passed in the request **body** (type, input.prompts, settings). **references/parameters.md** has the same list for quick lookup.

**Model version**

- **Q3** → `model_version: "3.2"`
- **Q2** → `model_version: "3.1"`
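The mapping above is small enough to encode directly, which avoids mistyping a version number in a request body. A minimal sketch, with a hypothetical function name:

```sh
# Hypothetical lookup: translate the model name used in this document
# into the model_version string expected in the task settings.
vidu_model_version() {
  case "$1" in
    Q3|q3) echo "3.2" ;;
    Q2|q2) echo "3.1" ;;
    *)
      echo "unknown model: $1" >&2
      return 1
      ;;
  esac
}
```

`vidu_model_version Q3` prints `3.2`, and any other name returns a non-zero status so a caller can stop before submitting.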

| Task Type                               | type (API)        | Input                               | Model | Duration | Aspect Ratio              | Transition | Resolution  |
| --------------------------------------- | ----------------- | ----------------------------------- | ----- | -------- | ------------------------- | ---------- | ----------- |
| text-to-image (文生图)                  | text2image        | text only                           | Q2    | 0        | 4:3, 3:4, 1:1, 9:16, 16:9 | —          | 1080p/2K/4K |
| text-to-video (文生视频)                | text2video        | text only                           | Q3    | 1–16s    | 16:9, 9:16, 1:1, 4:3, 3:4 | pro, speed | 1080p       |
| text-to-video (文生视频)                | text2video        | text only                           | Q2    | 2–8s     | 16:9, 9:16, 1:1, 4:3, 3:4 | —          | 1080p       |
| image-to-video (图生视频)               | img2video         | 1 image + text                      | Q3    | 1–16s    | from image                | pro, speed | 1080p       |
| image-to-video (图生视频)               | img2video         | 1 image + text                      | Q2    | 2–8s     | from image                | pro, speed | 1080p       |
| head-tail-image-to-video (首尾帧生视频) | headtailimg2video | 2 images + text                     | Q3    | 1–16s    | —                         | pro