image-gen

ClawSkills (author: clawskills)

Generate images using multiple AI models — Midjourney (via Legnext.ai), Flux, Nano Banana Pro (Gemini), Ideogram, Recraft, and more via fal.ai. Intelligently routes to the best model based on use case.

Install / Download

TotalClaw CLI (recommended)

totalclaw install clawskills:clawskills~wells1137-image-gen

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/clawskills%3Aclawskills~wells1137-image-gen/file -o wells1137-image-gen.md

# Image Generation Skill

This skill generates images using the best AI model for each use case. **Model selection is the most important decision** — read the dispatch logic carefully before generating.

---

## 🧠 Intelligent Dispatch Logic

**Always select the model based on the user's actual need, not just the request surface.**

### Decision Tree

```
Does the request involve MULTIPLE images that share characters, scenes, or story continuity?
  ├─ YES → Use NANO BANANA (Gemini)
  │         Reason: Gemini understands context holistically; supports reference_images
  │         for character/scene consistency across a series (storyboard, comic, sequence)
  │
  └─ NO → Is it a SINGLE standalone image?
            ├─ Artistic / cinematic / painterly / highly detailed?
            │   → Use MIDJOURNEY
            │
            ├─ Photorealistic / portrait / product photo?
            │   → Use FLUX PRO
            │
            ├─ Contains TEXT (logo, poster, sign, infographic)?
            │   → Use IDEOGRAM
            │
            ├─ Vector / icon / flat design / brand asset?
            │   → Use RECRAFT
            │
            ├─ Quick draft / fast iteration (speed priority)?
            │   → Use FLUX SCHNELL (<2s)
            │
            └─ General purpose / balanced?
                → Use FLUX DEV
```
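The decision tree above can be sketched as a small routing function. This is only an illustration: `selectModel` and the trait names (`multiImageContinuity`, `style`, `speedPriority`) are hypothetical and not part of `generate.js`. In practice the agent infers these traits from the user's request.

```javascript
// Hypothetical helper: maps request traits to a model ID from the decision tree.
// The trait names are illustrative assumptions, not flags of generate.js.
function selectModel(traits) {
  const {
    multiImageContinuity = false, // series sharing characters, scenes, or story
    style = "general",            // "artistic" | "photorealistic" | "text" | "vector" | "general"
    speedPriority = false,        // quick draft / fast iteration
  } = traits;

  // Multi-image continuity is checked first, as in the tree.
  if (multiImageContinuity) return "nano-banana";

  // Single standalone image: branches in the same order as the tree.
  switch (style) {
    case "artistic":
      return "midjourney";
    case "photorealistic":
      return "flux-pro";
    case "text":
      return "ideogram";
    case "vector":
      return "recraft";
  }
  if (speedPriority) return "flux-schnell";
  return "flux-dev"; // general purpose / balanced
}

console.log(selectModel({ multiImageContinuity: true })); // prints "nano-banana"
console.log(selectModel({ style: "text" }));              // prints "ideogram"
```

Checking continuity before style mirrors the tree: a storyboard in a painterly style still routes to Nano Banana rather than Midjourney.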

### Model Capability Matrix

| Model | ID | Artistic | Photorealism | Text | Context Continuity | Speed | Cost |
|---|---|---|---|---|---|---|---|
| **Midjourney** | `midjourney` | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ❌ (no context) | ~30s | ~$0.05 |
| **Nano Banana Pro** | `nano-banana` | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ~20s | $0.15 |
| **Flux Pro** | `flux-pro` | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ❌ | ~5s | ~$0.05 |
| **Flux Dev** | `flux-dev` | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ❌ | ~8s | ~$0.03 |
| **Flux Schnell** | `flux-schnell` | ⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ❌ | <2s | ~$0.003 |
| **Ideogram v3** | `ideogram` | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ❌ | ~10s | ~$0.08 |
| **Recraft v3** | `recraft` | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ | ❌ | ~8s | ~$0.04 |
| **SDXL Lightning** | `sdxl` | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ❌ | ~3s | ~$0.01 |

### When to Use Nano Banana (Critical)

Use **Nano Banana** whenever the user's request involves:
- **Storyboard / 分镜图**: Multiple frames that tell a story with the same characters
- **Comic strip / 漫画**: Sequential panels with consistent characters
- **Character series**: Multiple images of the same person/character in different poses or scenes
- **Scene continuation**: "Now show the same girl in the forest" (referencing a previous image)
- **Style consistency**: A set of images that must share the same visual style/world

Nano Banana Pro is built on Google's Gemini 3 Pro multimodal model, which interprets the prompt and any reference images together rather than matching keywords. It accepts up to 14 reference images for keeping characters and scenes consistent.

---

## How to Use This Skill

1. **Analyze the request**: Is it a single image or a series? Does it need context continuity?
2. **Select model**: Use the decision tree above.
3. **Enhance the prompt**: Add style, lighting, and quality descriptors appropriate for the model.
4. **Inform the user**: Tell them which model you're using and why, and that generation has started.
5. **Run the script**: Use the `exec` tool with a timeout long enough for the chosen model (at least 120 seconds for Midjourney and Nano Banana).
6. **Deliver the result**: Send image URL(s) to the user.

---

## Calling the Generation Script

```bash
node {baseDir}/generate.js \
  --model <model_id> \
  --prompt "<enhanced prompt>" \
  [--aspect-ratio <ratio>] \
  [--num-images <1-4>] \
  [--negative-prompt "<negative prompt>"] \
  [--reference-images "<url1,url2,...>"] \
  [--mode <turbo|fast|relax>]
```

**Parameters:**
- `--model`: One of `midjourney`, `flux-pro`, `flux-dev`, `flux-schnell`, `sdxl`, `nano-banana`, `ideogram`, `recraft`
- `--prompt`: The image generation prompt (required)
- `--aspect-ratio`: e.g. `16:9`, `1:1`, `9:16`, `4:3`, `3:4` (default: `1:1`)
- `--num-images`: 1-4 (default: `1`; Midjourney always returns 4 regardless)
- `--negative-prompt`: Things to avoid (not supported by Midjourney)
- `--reference-images`: Comma-separated image URLs for context/character consistency (**Nano Banana only**)
- `--mode`: Midjourney only. Speed mode: `turbo` (default, ~20-40s), `fast` (~30-60s), or `relax` (free but slow)

**exec timeout**: Set at least **120 seconds** for Midjourney and Nano Banana; 30 seconds is sufficient for Flux Schnell.
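As a rough sketch of how the parameter rules and timeout guidance above fit together, the helpers below build an argument array for `generate.js` and pick a timeout. `buildGenerateArgs` and `execTimeoutSeconds` are hypothetical names. Using 120 seconds for models the text does not mention is a conservative assumption, not a documented value.

```javascript
// Model IDs listed under "Parameters" above.
const MODEL_IDS = [
  "midjourney", "flux-pro", "flux-dev", "flux-schnell",
  "sdxl", "nano-banana", "ideogram", "recraft",
];

// Hypothetical helper: turns an options object into an argv array for generate.js.
function buildGenerateArgs({ model, prompt, aspectRatio, numImages, negativePrompt, referenceImages, mode }) {
  if (!MODEL_IDS.includes(model)) throw new Error(`Unknown model: ${model}`);
  if (!prompt) throw new Error("--prompt is required");

  const args = ["--model", model, "--prompt", prompt];
  if (aspectRatio) args.push("--aspect-ratio", aspectRatio);
  if (numImages !== undefined) {
    if (!Number.isInteger(numImages) || numImages < 1 || numImages > 4) {
      throw new RangeError("--num-images must be an integer from 1 to 4");
    }
    args.push("--num-images", String(numImages));
  }
  // Midjourney does not support negative prompts, so the flag is skipped for it.
  if (negativePrompt && model !== "midjourney") {
    args.push("--negative-prompt", negativePrompt);
  }
  // Reference images are documented as Nano Banana only.
  if (referenceImages && referenceImages.length > 0) {
    if (model !== "nano-banana") {
      throw new Error("--reference-images is only supported by nano-banana");
    }
    args.push("--reference-images", referenceImages.join(","));
  }
  // --mode selects the Midjourney speed tier; ignored for other models.
  if (mode && model === "midjourney") args.push("--mode", mode);
  return args;
}

// 30 s is stated as sufficient for Flux Schnell; 120 s is the stated minimum for
// Midjourney and Nano Banana and is ASSUMED here as a safe default for the rest.
function execTimeoutSeconds(model) {
  return model === "flux-schnell" ? 30 : 120;
}

console.log(buildGenerateArgs({ model: "flux-schnell", prompt: "a red fox", aspectRatio: "16:9" }));
console.log(execTimeoutSeconds("midjourney")); // prints 120
```

Passing the arguments as an array (for example to Node's `child_process.execFile`) avoids shell-quoting problems when a prompt contains quotes or other special characters.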

---

## ⚡ Midjourney Workflow (Sync Mode — No --async)

Always use sync mode (no `--async`). The script waits internally until complete.

```bash
node {baseDir}/generate.js \
  --model midjourney \
  --prompt "<enhanced prompt>" \
  --aspect-ratio 16:9
```

### Understanding Midjourney Output

```json
{
  "success": true,
  "model": "midjourney",
  "jobId": "xxxxxxxx-...",
  "imageUrl": "https://cdn.legnext.ai/temp/....png",
  "imageUrls": [
    "https://cdn.legnext.ai/mj/xxxx_0.png",
    "https://cdn.legnext.ai/mj/xxxx_1.png",
    "https://cdn.legnext.ai/mj/xxxx_2.png",
    "https://cdn.legnext.ai/mj/xxxx_3.png"
  ]
}
```

**CRITICAL — image field meanings:**

| Field | What it is | When to use |
|---|---|---|
| `imageUrl` | A **2×2 grid composite** of all 4 images | Send as **preview** so user can see all options |
| `imageUrls[0]` | Image 1 (top-left) | Send when user wants image 1 |
| `imageUrls[1]` | Image 2 (top-right) | Send when user wants image 2 |
| `imageUrls[2]` | Image 3 (bottom-left) | Send when user wants image 3 |
| `imageUrls[3]` | Image 4 (bottom-right) | Send when user wants image 4 |

**"放大第N张" ("upscale image N") / "要第N张" ("I want image N") / "give me image N" = send `imageUrls[N-1]` directly. Do NOT call generate.js again.**
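The 1-based choice maps onto the 0-based `imageUrls` array, which is easy to get off by one. A minimal sketch, assuming the parsed JSON output shown above (`pickMidjourneyImage` is a hypothetical helper, and the URLs are the placeholders from the sample output):

```javascript
// Hypothetical helper: returns the single-image URL for the user's 1-based choice.
// No new generation is triggered; the URL comes from the output already received.
function pickMidjourneyImage(output, n) {
  const urls = output.imageUrls || [];
  if (!Number.isInteger(n) || n < 1 || n > urls.length) {
    throw new RangeError(`Image number must be between 1 and ${urls.length}`);
  }
  return urls[n - 1];
}

// Placeholder output mirroring the sample JSON above.
const sampleOutput = {
  success: true,
  model: "midjourney",
  imageUrls: [
    "https://cdn.legnext.ai/mj/xxxx_0.png",
    "https://cdn.legnext.ai/mj/xxxx_1.png",
    "https://cdn.legnext.ai/mj/xxxx_2.png",
    "https://cdn.legnext.ai/mj/xxxx_3.png",
  ],
};

console.log(pickMidjourneyImage(sampleOutput, 3)); // prints "https://cdn.legnext.ai/mj/xxxx_2.png"
```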

### Midjourney Interaction Flow

**After generation:**
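> 🎨 Generation complete! Here is a preview of the 4 images:
> [Preview](imageUrl)
> Which one do you like? Reply 1, 2, 3, or 4 and I'll send you that image on its own in full resolution.

**When user picks image N:**
> Here is image N on its own in full resolution:
> [Image N](imageUrls[N-1])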

---

## 🤖 Nano Banana (Gemini) Workflow

Use for storyboards, character series, and any context-dependent multi-image generation.

### Single image (no reference)
```bash
node {baseDir}/generate.js \
  --model nano-banana \
  --prompt "<detailed scene description>" \
  --aspect-ratio 16:9
```

### With reference images (character/scene consistency)
```bash
node {baseDir}/generate.js \
  --model nano-banana \
  --prompt "<scene description, referencing the character/style from the reference images>" \
  --aspect-ratio 16:9 \
  --reference-images "https://url-of-previous-image-1.png,https://url-of-previous-image-2.png"
```

**How to build a storyboard series:**

1. Generate the **first frame** without reference images (establishes the character/scene)
2. Use the first frame's URL as `--reference-images` for the **second frame**
3. For subsequent frames, use the most recent 1-3 images as references to maintain consistency
4. Keep the character description consistent across all prompts

**Example storyboard workflow:**
```bash
# Frame 1: establishes the character and scene (no reference images)
node {baseDir}/generate.js --model nano-banana --prompt "A young girl with red hair, wearing a blue dress, sitting under a magical treehouse in an enchanted forest, warm golden light, storybook illustration style" --aspect-ratio 16:9

# Frame 2: uses frame 1 as the reference
node {baseDir}/generate.js --model nano-banana --prompt "The same red-haired girl in blue dress climbing the rope ladder up to the treehouse, excited expression, enchanted forest background, same storybook illustration style" --aspect-ratio 16:9 --reference-images "<frame1_url>"

# Frame 3: uses frames 1 and 2 as references
node {baseDir}/generate.js --model nano-banana --prompt "Inside the magical treehouse, the red-haired girl discovers a glowing book on a wooden shelf, wonder on her face, warm candlelight, same storybook illustration style" --aspect-ratio 16:9 --reference-images "<frame1_url>,<frame2_url>"
```
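The "most recent 1-3 images" rule from the storyboard steps can be sketched as a helper that picks the reference list for the next frame. `referencesForNextFrame` is a hypothetical name, and the cap of 3 comes from step 3 above, not from a limit enforced by `generate.js`.

```javascript
// Hypothetical helper: picks which earlier frames to pass as --reference-images.
// previousFrameUrls is ordered oldest to newest; at most maxRefs are returned.
function referencesForNextFrame(previousFrameUrls, maxRefs = 3) {
  if (maxRefs < 1) return [];
  return previousFrameUrls.slice(-maxRefs);
}

// Formats the list as the comma-separated value expected by --reference-images.
function toReferenceFlagValue(urls) {
  return urls.join(",");
}

// Placeholder URLs standing in for the imageUrl of each generated frame.
const frames = ["<frame1_url>", "<frame2_url>", "<frame3_url>", "<frame4_url>"];

console.log(referencesForNextFrame([]));                       // prints [] (first frame: no references)
console.log(toReferenceFlagValue(referencesForNextFrame(frames)));
// prints "<frame2_url>,<frame3_url>,<frame4_url>"
```

A sliding window keeps the reference list short as the series grows. Including the first frame in every call is another reasonable choice when the opening frame defines the character design.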

### Nano Banana Output
```json
{
  "success": true,
  "model": "nano-banana",
  "images": ["https://v3b.fal.media/files/...png"],
  "imageUrl": "https://v3b.fal.media/files/...png"
}
```
Send `imageUrl` directly to the user (no grid, single image).
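Since the Midjourney and Nano Banana outputs use different fields, a small normalizer can make delivery logic uniform. The sketch below assumes the two JSON shapes shown in this document. `summarizeOutput` is a hypothetical helper, and treating a falsy `success` as a failure is an assumption, because the error shape is not documented here.

```javascript
// Hypothetical helper: normalizes a parsed generate.js result for delivery.
function summarizeOutput(output) {
  // ASSUMPTION: failures are signalled by a falsy `success` field.
  if (!output || !output.success) {
    throw new Error("Generation failed or returned no result");
  }
  if (output.model === "midjourney") {
    // imageUrl is the 2x2 grid preview; imageUrls holds the four single images.
    return { kind: "grid", preview: output.imageUrl, images: output.imageUrls };
  }
  // Nano Banana shape: one image, exposed as both `images` and `imageUrl`.
  return {
    kind: "single",
    preview: output.imageUrl,
    images: output.images ?? [output.imageUrl],
  };
}

const nanoResult = summarizeOutput({
  success: true,
  model: "nano-banana",
  images: ["https://v3b.fal.media/files/...png"],
  imageUrl: "https://v3b.fal.media/files/...png",
});
console.log(nanoResult.kind); // prints "single"
```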

---

## Other Models

### Flux Pro / Dev / Schnell
Best for photorealistic standalone images. Output format same as Nano Banana (s