Listenhub

ClawSkills 作者 0xfango v0.1.1

Explain anything — turn ideas into podcasts, explainer videos, or voice narration. Use when the user wants to "make a podcast", "create an explainer video", "read this aloud", "generate an image", or share knowledge in audio/visual form. Supports: topic descriptions, YouTube links, article URLs, plain text, and image prompts.

源码 ↗

安装 / 下载方式

TotalClaw CLI推荐

totalclaw install clawskills:0xfango~listenhub-official-skills

cURL直接下载，无需登录

curl -fsSL https://skills.taituai.com/api/skills/clawskills%3A0xfango~listenhub-official-skills/file -o listenhub-official-skills.md

Git 仓库获取源码

git clone https://github.com/openclaw/skills/commit/c4bb1652796392bda3f1b523c0c8a86591ae03fa

<purpose>
**The Hook**: Paste content, get audio/video/image. That simple.

Four modes, one entry point:
- **Podcast** — Two-person dialogue, ideal for deep discussions
- **Explain** — Single narrator + AI visuals, ideal for product intros
- **TTS/Flow Speech** — Pure voice reading, ideal for articles
- **Image Generation** — AI image creation, ideal for creative visualization

Users don't need to remember APIs, modes, or parameters. Just say what you want.
</purpose>

<instructions>

## ⛔ Hard Constraints (Inviolable)

**The scripts are the ONLY interface. Period.**

```
┌─────────────────────────────────────────────────────────┐
│  AI Agent  ──▶  ./scripts/*.sh  ──▶  ListenHub API     │
│                      ▲                                  │
│                      │                                  │
│            This is the ONLY path.                       │
│            Direct API calls are FORBIDDEN.              │
└─────────────────────────────────────────────────────────┘
```

**MUST**:
- Execute functionality ONLY through provided scripts in `**/skills/listenhub/scripts/`
- Pass user intent as script arguments exactly as documented
- Trust script outputs; do not second-guess internal logic

**MUST NOT**:
- Write curl commands to ListenHub/Marswave API directly
- Construct JSON bodies for API calls manually
- Guess or fabricate speakerIds, endpoints, or API parameters
- Assume API structure based on patterns or web searches
- Hallucinate features not exposed by existing scripts

**Why**: The API is proprietary. Endpoints, parameters, and speakerIds are NOT publicly documented. Web searches will NOT find this information. Any attempt to bypass scripts will produce incorrect, non-functional code.

## Script Location

Scripts are located at `**/skills/listenhub/scripts/` relative to your working context.

Different AI clients use different dot-directories:
- Claude Code: `.claude/skills/listenhub/scripts/`
- Other clients: may vary (`.cursor/`, `.windsurf/`, etc.)

**Resolution**: Use glob pattern `**/skills/listenhub/scripts/*.sh` to locate scripts reliably, or resolve from the SKILL.md file's own path.

## Private Data (Cannot Be Searched)

The following are **internal implementation details** that AI cannot reliably know:

| Category | Examples | How to Obtain |
|----------|----------|---------------|
| API Base URL | `api.marswave.ai/...` | ✗ Cannot — internal to scripts |
| Endpoints | `podcast/episodes`, etc. | ✗ Cannot — internal to scripts |
| Speaker IDs | `cozy-man-english`, etc. | ✓ Call `get-speakers.sh` |
| Request schemas | JSON body structure | ✗ Cannot — internal to scripts |
| Response formats | Episode ID, status codes | ✓ Documented per script |

**Rule**: If information is not in this SKILL.md or retrievable via a script (like `get-speakers.sh`), assume you don't know it.

## Design Philosophy

**Hide complexity, reveal magic.**

Users don't need to know: Episode IDs, API structure, polling mechanisms, credits, endpoint differences.
Users only need: Say idea → wait a moment → get the link.

## Security

- User-provided content (text, URLs) is transmitted to the ListenHub API (`api.marswave.ai`) for processing. Do not pass sensitive or confidential information as input.
- The `--source-url` parameter accepts external URLs whose content is fetched and processed by the backend. Only use trusted URLs.
- API keys are stored locally in environment variables and transmitted via HTTPS. Never log or display full API keys.
- Version checks connect to `raw.githubusercontent.com` (read-only, no code execution). Set `LISTENHUB_SKIP_VERSION_CHECK=1` to disable.

## Environment

### ListenHub API Key

API key stored in `$LISTENHUB_API_KEY`. Check on first use:

```bash
source ~/.zshrc 2>/dev/null; [ -n "$LISTENHUB_API_KEY" ] && echo "ready" || echo "need_setup"
```

If setup needed, guide user:
1. Visit https://listenhub.ai/settings/api-keys
2. Paste key (only the `lh_sk_...` part)
3. Auto-save to ~/.zshrc

### Image Generation API Key

Image generation uses the same ListenHub API key stored in `$LISTENHUB_API_KEY`.
Image generation output path defaults to the user downloads directory, stored in `$LISTENHUB_OUTPUT_DIR`.

On first image generation, the script auto-guides configuration:
1. Visit https://listenhub.ai/settings/api-keys (requires subscription)
2. Paste API key
3. Configure output path (default: ~/Downloads)
4. Auto-save to shell rc file

**Security**: Never expose full API keys in output.

## Mode Detection

Auto-detect mode from user input:

**→ Podcast (1-2 speakers)**
Supports single-speaker or dual-speaker podcasts. Debate mode requires 2 speakers.
Default mode: `quick` unless explicitly requested.
If speakers are not specified, call `get-speakers.sh` and select the first `speakerId`
matching the chosen `language`.
If reference materials are provided, pass them as `--source-url` or `--source-text`.
When the user only provides a topic (e.g., "I want a podcast about X"), proceed with:
1) detect `language` from user input,
2) set `mode=quick`,
3) choose one speaker via `get-speakers.sh` matching the language,
4) create a single-speaker podcast without further clarification.
1. Keywords: "podcast", "chat about", "discuss", "debate", "dialogue"
2. Use case: Topic exploration, opinion exchange, deep analysis
- Feature: Two voices, interactive feel

**→ Explain (Explainer video)**
- Keywords: "explain", "introduce", "video", "explainer", "tutorial"
- Use case: Product intro, concept explanation, tutorials
- Feature: Single narrator + AI-generated visuals, can export video

**→ TTS (Text-to-speech)**
TTS defaults to FlowSpeech `direct` for single-pass text or URL narration.
Script arrays and multi-speaker dialogue belong to Speech as an advanced path, not the default TTS entry.
Text-to-speech input is limited to 10,000 characters; split or use a URL when longer.
1. Keywords: "read aloud", "convert to speech", "tts", "voice"
2. Use case: Article to audio, note review, document narration
3. Feature: Fastest (1-2 min), pure audio

### Ambiguous "Convert to speech" Guidance

When the request is ambiguous (e.g., "convert to speech", "read aloud"), apply:

1. Default to FlowSpeech and prioritize `direct` to avoid altering content.
2. Input type: URL uses `type=url`, plain text uses `type=text`.
3. Speaker: if not specified, call `get-speakers` and pick the first `speakerId` matching `language`.
4. Switch to Speech only when multi-line scripts or multi-speaker dialogue is explicitly requested, and require `scripts`.

Example guidance:

“This request can use FlowSpeech with the default direct mode; switch to smart for grammar and punctuation fixes. For per-line speaker assignment, provide scripts and switch to Speech.”

**→ Image Generation**
- Keywords: "generate image", "draw", "create picture", "visualize"
- Use case: Creative visualization, concept art, illustrations
- Feature: AI image generation via Labnana API, multiple resolutions and aspect ratios

**Reference Images via Image Hosts**
When reference images are local files, upload to a known image host and use the direct image URL in `--reference-images`.
Recommended hosts: `imgbb.com`, `sm.ms`, `postimages.org`, `imgur.com`.
Direct image URLs should end with `.jpg`, `.png`, `.webp`, or `.gif`.

**Default**: If unclear, ask user which format they prefer.

**Explicit override**: User can say "make it a podcast" / "I want explainer video" / "just voice" / "generate image" to override auto-detection.

## Interaction Flow

### Step 1: Receive input + detect mode

```
→ Got it! Preparing...
  Mode: Two-person podcast
  Topic: Latest developments in Manus AI
```

For URLs, identify type:
- `youtu.be/XXX` → convert to `https://www.youtube.com/watch?v=XXX`
- Other URLs → use directly

### Step 2: Submit generation

```
→ Generation submitted

  Estimated time:
  • Podcast: 2-3 minutes
  • Explain: 3-5 minutes
  • TTS: 1-2 minutes

  You can:
  • Wait and ask "done yet?"
  • Use check-s