Listenhub
Explain anything — turn ideas into podcasts, explainer videos, or voice narration. Use when the user wants to "make a podcast", "create an explainer video", "read this aloud", "generate an image", or share knowledge in audio/visual form. Supports: topic descriptions, YouTube links, article URLs, plain text, and image prompts.
Installation / Download
TotalClaw CLI (recommended):
`totalclaw install clawskills:boxingyi~listenhub`
cURL direct download, no login required:
`curl -fsSL https://skills.taituai.com/api/skills/clawskills%3Aboxingyi~listenhub/file -o listenhub.md`
Git repository (source code):
`git clone https://github.com/openclaw/skills/commit/1ffaa0a28d1635e8816abb517544aa1fda43ecc3`
<purpose>
**The Hook**: Paste content, get audio/video/image. That simple.
Four modes, one entry point:
- **Podcast** — Two-person dialogue, ideal for deep discussions
- **Explain** — Single narrator + AI visuals, ideal for product intros
- **TTS/Flow Speech** — Pure voice reading, ideal for articles
- **Image Generation** — AI image creation, ideal for creative visualization
Users don't need to remember APIs, modes, or parameters. Just say what you want.
</purpose>
<instructions>
## ⛔ Hard Constraints (Inviolable)
**The scripts are the ONLY interface. Period.**
```
┌─────────────────────────────────────────────────────────┐
│ AI Agent ──▶ ./scripts/*.sh ──▶ ListenHub API │
│ ▲ │
│ │ │
│ This is the ONLY path. │
│ Direct API calls are FORBIDDEN. │
└─────────────────────────────────────────────────────────┘
```
**MUST**:
- Execute functionality ONLY through provided scripts in `**/skills/listenhub/scripts/`
- Pass user intent as script arguments exactly as documented
- Trust script outputs; do not second-guess internal logic
**MUST NOT**:
- Write curl commands to ListenHub/Marswave API directly
- Construct JSON bodies for API calls manually
- Guess or fabricate speakerIds, endpoints, or API parameters
- Assume API structure based on patterns or web searches
- Hallucinate features not exposed by existing scripts
**Why**: The API is proprietary. Endpoints, parameters, and speakerIds are NOT publicly documented. Web searches will NOT find this information. Any attempt to bypass scripts will produce incorrect, non-functional code.
## Script Location
Scripts are located at `**/skills/listenhub/scripts/` relative to your working context.
Different AI clients use different dot-directories:
- Claude Code: `.claude/skills/listenhub/scripts/`
- Other clients: may vary (`.cursor/`, `.windsurf/`, etc.)
**Resolution**: Use glob pattern `**/skills/listenhub/scripts/*.sh` to locate scripts reliably, or resolve from the SKILL.md file's own path.
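The resolution step above can be sketched as a small shell helper. `find_listenhub_scripts` is a hypothetical name used only for illustration and is not one of the skill's scripts; the sketch assumes the skill is installed somewhere below the directory passed in.

```bash
# Sketch: locate the ListenHub scripts directory below a project root.
# find_listenhub_scripts is a hypothetical helper, not part of the skill.
find_listenhub_scripts() {
  local root="${1:-.}"
  # Matches any client dot-directory (.claude, .cursor, ...); keeps the first hit.
  find "$root" -type d -path '*/skills/listenhub/scripts' 2>/dev/null | head -n 1
}
```

A call such as `scripts_dir=$(find_listenhub_scripts .)` returns an empty string when the skill is not installed under the given root, which is a reasonable signal to fall back to resolving from the SKILL.md path.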
## Private Data (Cannot Be Searched)
The following are **internal implementation details** that AI cannot reliably know:
| Category | Examples | How to Obtain |
|----------|----------|---------------|
| API Base URL | `api.marswave.ai/...` | ✗ Cannot — internal to scripts |
| Endpoints | `podcast/episodes`, etc. | ✗ Cannot — internal to scripts |
| Speaker IDs | `cozy-man-english`, etc. | ✓ Call `get-speakers.sh` |
| Request schemas | JSON body structure | ✗ Cannot — internal to scripts |
| Response formats | Episode ID, status codes | ✓ Documented per script |
**Rule**: If information is not in this SKILL.md or retrievable via a script (like `get-speakers.sh`), assume you don't know it.
## Design Philosophy
**Hide complexity, reveal magic.**
Users don't need to know: Episode IDs, API structure, polling mechanisms, credits, endpoint differences.
Users only need: Say idea → wait a moment → get the link.
## Environment
### ListenHub API Key
API key stored in `$LISTENHUB_API_KEY`. Check on first use:
```bash
source ~/.zshrc 2>/dev/null; [ -n "$LISTENHUB_API_KEY" ] && echo "ready" || echo "need_setup"
```
If setup is needed, guide the user:
1. Visit https://listenhub.ai/settings/api-keys
2. Paste the key (only the `lh_sk_...` part)
3. Auto-save it to `~/.zshrc`
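One possible shape for the paste-and-save steps is sketched below. `save_listenhub_key` is a hypothetical helper, and the check only assumes that a valid key starts with the `lh_sk_` prefix mentioned above; nothing else about the key format is known here.

```bash
# Sketch: validate a pasted key and append it to a shell rc file.
# save_listenhub_key is a hypothetical helper; the rc file defaults to ~/.zshrc.
save_listenhub_key() {
  local key="$1"
  local rcfile="${2:-$HOME/.zshrc}"
  case "$key" in
    lh_sk_?*) ;;  # only the lh_sk_... part is accepted
    *) echo "invalid_key" >&2; return 1 ;;
  esac
  # A later export overrides an earlier one when the rc file is sourced.
  printf 'export LISTENHUB_API_KEY="%s"\n' "$key" >> "$rcfile"
  echo "saved"
}
```

Rejecting anything without the prefix catches a common paste mistake, such as copying a surrounding label along with the key.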
### Image Generation API Key
Image generation uses the same ListenHub API key stored in `$LISTENHUB_API_KEY`.
Image generation output path defaults to the user downloads directory, stored in `$LISTENHUB_OUTPUT_DIR`.
On first image generation, the script auto-guides configuration:
1. Visit https://listenhub.ai/settings/api-keys (requires subscription)
2. Paste API key
3. Configure output path (default: ~/Downloads)
4. Auto-save to shell rc file
**Security**: Never expose full API keys in output.
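A minimal sketch of how a key could be shown without exposing it in full. `mask_key` is a hypothetical helper, and the choice to show six leading and four trailing characters is an assumption, not a documented convention.

```bash
# Sketch: print a masked form of an API key for confirmation messages.
# mask_key is a hypothetical helper; the 6 + 4 visible characters are an assumption.
mask_key() {
  local key="$1"
  if [ "${#key}" -le 10 ]; then
    # Too short to reveal any part safely.
    printf '%s\n' '****'
  else
    printf '%.6s...%s\n' "$key" "$(printf '%s' "$key" | tail -c 4)"
  fi
}
```

For example, `mask_key "$LISTENHUB_API_KEY"` lets the user confirm which key is configured while the middle of the key stays hidden.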
## Mode Detection
Auto-detect mode from user input:
**→ Podcast (1-2 speakers)**
Supports single-speaker or dual-speaker podcasts. Debate mode requires 2 speakers.
Default mode: `quick`, unless another mode is explicitly requested.
If speakers are not specified, call `get-speakers.sh` and select the first `speakerId`
matching the chosen `language`.
If reference materials are provided, pass them as `--source-url` or `--source-text`.
When the user only provides a topic (e.g., "I want a podcast about X"), proceed with:
1) detect `language` from user input,
2) set `mode=quick`,
3) choose one speaker via `get-speakers.sh` matching the language,
4) create a single-speaker podcast without further clarification.
- Keywords: "podcast", "chat about", "discuss", "debate", "dialogue"
- Use case: Topic exploration, opinion exchange, deep analysis
- Feature: One or two voices; dual-speaker episodes have an interactive feel
**→ Explain (Explainer video)**
- Keywords: "explain", "introduce", "video", "explainer", "tutorial"
- Use case: Product intro, concept explanation, tutorials
- Feature: Single narrator + AI-generated visuals, can export video
**→ TTS (Text-to-speech)**
TTS defaults to FlowSpeech `direct` for single-pass text or URL narration.
Script arrays and multi-speaker dialogue belong to Speech as an advanced path, not the default TTS entry.
Text-to-speech input is limited to 10,000 characters; split or use a URL when longer.
- Keywords: "read aloud", "convert to speech", "tts", "voice"
- Use case: Article to audio, note review, document narration
- Feature: Fastest (1-2 min), pure audio
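The 10,000-character limit can be checked before anything is submitted. The sketch below uses a hypothetical `tts_length_ok` helper; it counts characters with `wc -m`, so the result depends on the current locale for multi-byte text.

```bash
# Sketch: pre-check text length against the 10,000-character TTS limit.
# tts_length_ok is a hypothetical helper, not one of the skill's scripts.
TTS_MAX_CHARS=10000

tts_length_ok() {
  local count
  # wc -m counts characters; strip the padding some platforms add.
  count=$(printf '%s' "$1" | wc -m | tr -d '[:space:]')
  [ "$count" -le "$TTS_MAX_CHARS" ]
}
```

When the check fails, the text should be split into parts or supplied as a URL instead, as described above.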
### Ambiguous "Convert to speech" Guidance
When the request is ambiguous (e.g., "convert to speech", "read aloud"), apply:
1. Default to FlowSpeech and prioritize `direct` to avoid altering content.
2. Input type: URL uses `type=url`, plain text uses `type=text`.
3. Speaker: if not specified, call `get-speakers.sh` and pick the first `speakerId` matching `language`.
4. Switch to Speech only when multi-line scripts or multi-speaker dialogue is explicitly requested, and require `scripts`.
Example guidance:
“This request can use FlowSpeech with the default direct mode; switch to smart for grammar and punctuation fixes. For per-line speaker assignment, provide scripts and switch to Speech.”
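The input-type rule above can be sketched as follows. `flowspeech_input_type` is a hypothetical helper that only decides between the `url` and `text` values named above; how that value is passed to the scripts is not shown here and should be taken from the script documentation.

```bash
# Sketch: choose the FlowSpeech input type from the raw user input.
# flowspeech_input_type is a hypothetical helper.
flowspeech_input_type() {
  case "$1" in
    http://*|https://*) echo "url" ;;   # looks like a link
    *)                  echo "text" ;;  # anything else is treated as plain text
  esac
}
```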
**→ Image Generation**
- Keywords: "generate image", "draw", "create picture", "visualize"
- Use case: Creative visualization, concept art, illustrations
- Feature: AI image generation via Labnana API, multiple resolutions and aspect ratios
**Reference Images via Image Hosts**
When reference images are local files, upload to a known image host and use the direct image URL in `--reference-images`.
Recommended hosts: `imgbb.com`, `sm.ms`, `postimages.org`, `imgur.com`.
Direct image URLs should end with `.jpg`, `.png`, `.webp`, or `.gif`.
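A quick way to check a candidate before passing it to `--reference-images` is sketched below. `is_direct_image_url` is a hypothetical helper; it applies the extension rule above strictly, so a URL with a trailing query string is rejected.

```bash
# Sketch: check that a reference image URL is a direct image link.
# is_direct_image_url is a hypothetical helper, not one of the skill's scripts.
is_direct_image_url() {
  local url
  # Compare case-insensitively so .PNG and .png are treated the same.
  url=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$url" in
    http://*|https://*) ;;
    *) return 1 ;;
  esac
  case "$url" in
    *.jpg|*.png|*.webp|*.gif) return 0 ;;
    *) return 1 ;;
  esac
}
```

A gallery or share page on an image host usually fails this check, which is the cue to look for the host's "direct link" option.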
**Default**: If unclear, ask user which format they prefer.
**Explicit override**: User can say "make it a podcast" / "I want explainer video" / "just voice" / "generate image" to override auto-detection.
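The keyword lists above can be sketched as a simple matcher. `detect_mode` and its return values (`image`, `podcast`, `explain`, `tts`, `ask`) are hypothetical names for illustration, and the match order is an assumption. Substring matching is deliberately naive, so the `ask` fallback matters.

```bash
# Sketch: keyword-based mode detection using the lists in this section.
# detect_mode and its output labels are hypothetical.
detect_mode() {
  local text
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$text" in
    *"generate image"*|*draw*|*"create picture"*|*visualize*) echo "image" ;;
    *podcast*|*"chat about"*|*discuss*|*debate*|*dialogue*)   echo "podcast" ;;
    *explain*|*introduce*|*video*|*tutorial*)                 echo "explain" ;;
    *"read aloud"*|*"convert to speech"*|*tts*|*voice*)       echo "tts" ;;
    *) echo "ask" ;;  # unclear: ask the user which format they prefer
  esac
}
```

Because the first matching line wins, a request that mentions several modes resolves to the earliest one listed; an explicit override from the user should always take precedence over this guess.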
## Interaction Flow
### Step 1: Receive input + detect mode
```
→ Got it! Preparing...
Mode: Two-person podcast
Topic: Latest developments in Manus AI
```
For URLs, identify type:
- `youtu.be/XXX` → convert to `https://www.youtube.com/watch?v=XXX`
- Other URLs → use directly
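The URL rule above can be sketched as a small normalizer. `normalize_url` is a hypothetical helper; dropping anything after the video ID (for example a share or timestamp parameter) is an assumption of this sketch.

```bash
# Sketch: expand youtu.be short links, pass every other URL through unchanged.
# normalize_url is a hypothetical helper, not one of the skill's scripts.
normalize_url() {
  local url="$1" id
  case "$url" in
    http://youtu.be/*|https://youtu.be/*|youtu.be/*)
      id=${url#*youtu.be/}   # keep what follows the host
      id=${id%%[?&#/]*}      # assumption: discard parameters after the ID
      printf 'https://www.youtube.com/watch?v=%s\n' "$id"
      ;;
    *)
      printf '%s\n' "$url"
      ;;
  esac
}
```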
### Step 2: Submit generation
```
→ Generation submitted
Estimated time:
• Podcast: 2-3 minutes
• Explain: 3-5 minutes
• TTS: 1-2 minutes
You can:
• Wait and ask "done yet?"
• Use check-status via scripts
• View outputs in product pages:
- Podcast: https://listenhub.ai/app/podcast
- Explain: https://listenhub.ai/app/explainer
- Text-to-Speech: https://listenhub.ai/app/text-to-speech
• Do other things, ask later
```
Internally, remember the Episode ID for status queries; do not show it to the user.
### Step 3: Query status
When user says "done yet?" / "ready?" / "check status":
- **Success**: Show result + next options
- **Processing**: "Still generating, wait another minute?"
- **Failed**: "Generation failed, content might be unparseable. Try another?"
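The three outcomes above can be sketched as a mapping from a status value to a user-facing reply. `status_reply` is a hypothetical helper, and the status strings `success`, `processing`, and `failed` are assumptions for illustration; the real values come from the status script's documented output.

```bash
# Sketch: map a status value to the reply shown to the user.
# status_reply and the status strings are hypothetical.
status_reply() {
  local status="$1" link="$2"
  case "$status" in
    success)    printf 'Ready: %s\n' "$link" ;;
    processing) echo "Still generating, wait another minute?" ;;
    failed)     echo "Generation failed, content might be unparseable. Try another?" ;;
    *)          echo "Unknown status, please check again shortly." ;;
  esac
}
```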
### S