ai-podcast-pipeline

ClawSkills 作者 clawskills v0.1.5

Create Korean AI podcast packages from QuickView trend notes. Use for dual-host script writing (Callie × Nick), Gemini multi-speaker TTS audio generation, subtitle timing/render fixes, thumbnail+MP4 packaging, and YouTube title/description output. Supports both full (15~20 min) and compressed (5~7 min) editions.

安装 / 下载方式

TotalClaw CLI推荐
totalclaw install clawskills:clawskills~jeong-wooseok-ai-podcast-pipeline
cURL直接下载,无需登录
curl -fsSL https://skills.taituai.com/api/skills/clawskills%3Aclawskills~jeong-wooseok-ai-podcast-pipeline/file -o jeong-wooseok-ai-podcast-pipeline.md
# AI Podcast Pipeline

## ⚠️ Security Notice

This skill may trigger antivirus false positives due to legitimate use of:
- **base64 decoding**: Used ONLY to decode audio data from Gemini TTS API responses (standard practice for binary data in JSON)
- **subprocess calls**: Used ONLY to invoke ffmpeg for audio/video processing
- **Environment variables**: Reads API keys from user-configured environment (`GEMINI_API_KEY`)
- **Network requests**: Calls Google Gemini API for text-to-speech generation

All code is open source and auditable in this repository. No malicious behavior.

Build end-to-end podcast assets from `Trend/QuickView-*` content.

## Core Workflow

1. Select source QuickView file.
2. Generate script (full or compressed mode).
3. Build dual-voice MP3 (Gemini multi-speaker, chunked for reliability).
4. Generate full-text Korean subtitles (no ellipsis truncation).
5. Render subtitle MP4 with tuned font/size/timing shift.
6. Build thumbnail + YouTube metadata.
7. Deliver final package.

## Step 1) Select Source

Prefer weekly QuickView file from your configured Quartz root.

If user gives `wk.aiee.app` URL, map to local Quartz markdown first.

## Step 2) Generate Script

Read and apply:
- `references/podcast_prompt_template_ko.md`

Modes:
- **Full mode**: 15~20 minutes
- **Compressed mode**: 5~7 minutes (core tips only)

Rules:
- no system/meta text in spoken lines
- host intro once at opening only
- conversational Korean, short sentences, actionable
- save script in `archive/`

## Step 3) Build Audio (Gemini Multi-Speaker, Reliable)

### Preferred: chunked builder (timeout-safe)
```bash
# Set API key via environment (required)
export GEMINI_API_KEY="<YOUR_KEY>"

# Run from skills/ai-podcast-pipeline/
python3 scripts/build_dualvoice_audio.py \
  --input <script.txt> \
  --outdir <outdir> \
  --basename podcast_full_dualvoice \
  --chunk-lines 6
```

### Single-pass (short scripts)
```bash
python3 scripts/gemini_multispeaker_tts.py \
  --input-file <dialogue.txt> \
  --outdir <outdir> \
  --basename podcast_dualvoice \
  --retries 3 \
  --timeout-seconds 120
```

Default voice mapping (2026-02-10 fixed):
- Callie (female) → `Kore`
- Nick (male) → `Puck`

Output: MP3 (default delivery format)

## Step 4) Build Korean Subtitles (Full Text)

Use full-text subtitle builder (no `...` truncation):
```bash
python3 scripts/build_korean_srt.py \
  --script <script.txt> \
  --audio <final.mp3> \
  --output <outdir>/podcast.srt \
  --max-chars 22
```

## Step 5) Render Subtitled MP4 (Font + Timing)

Use renderer with adjustable font and timing shift:
```bash
python3 scripts/render_subtitled_video.py \
  --image <thumbnail.png> \
  --audio <final.mp3> \
  --srt <podcast.srt> \
  --output <outdir>/final.mp4 \
  --font-name "Do Hyeon" \
  --font-size 27 \
  --shift-ms -250
```

Notes:
- `shift-ms` negative = subtitle earlier (for lag fixes)
- If text clipping occurs, lower `font-size` (e.g., 25~27)
- keep text inside safe area; avoid overlap with character/object

## Step 6) Build Thumbnail + YouTube Metadata

```bash
# Set API key via environment (required)
export GEMINI_API_KEY="<YOUR_KEY>"

python3 scripts/build_podcast_assets.py \
  --source "<QuickView path or URL>"
```

Reference (layout/copy guardrails):
- `references/thumbnail_guidelines_ko.md`

## Step 7) Final Delivery Checklist

Always include:
1. source used
2. final MP3 path
3. subtitle MP4 path + size
4. thumbnail path
5. YouTube title options (3)
6. YouTube description

## Reliability Rules

- Gemini timeout on long input: use chunked builder (`build_dualvoice_audio.py`)
- Subtitle clipping: reduce font size and increase bottom margin
- Subtitle lag: adjust `--shift-ms` (usually `-150` to `-300`)
- Keep generated assets under Telegram practical limits

## Security Notes

- API keys must be passed via environment variables (`GEMINI_API_KEY`), not hardcoded.
- Never paste raw keys into prompts, logs, screenshots, or public posts.
- Recent hardening: thumbnail generation now passes keys via env (not CLI args).

## References

- `references/podcast_prompt_template_ko.md`
- `references/workflow_runbook.md`
- `references/thumbnail_guidelines_ko.md`