ListenHub
Explain anything — turn ideas into podcasts, explainer videos, or voice narration. Use when the user wants to "make a podcast", "create an explainer video", "read this aloud", "generate an image", or share knowledge in audio/visual form. Supports: topic descriptions, YouTube links, article URLs, plain text, and image prompts.
Installation / Download

TotalClaw CLI (recommended)
totalclaw install clawskills:0xfango~listenhub-official-skills

cURL (direct download, no login required)
curl -fsSL https://skills.taituai.com/api/skills/clawskills%3A0xfango~listenhub-official-skills/file -o listenhub-official-skills.md

Git repository (get the source code)
git clone https://github.com/openclaw/skills/commit/c4bb1652796392bda3f1b523c0c8a86591ae03fa

<purpose>

**The Hook**: Paste content, get audio/video/image. That simple.

Four modes, one entry point:

- **Podcast**: Two-person dialogue, ideal for deep discussions
- **Explain**: Single narrator + AI visuals, ideal for product intros
- **TTS/Flow Speech**: Pure voice reading, ideal for articles
- **Image Generation**: AI image creation, ideal for creative visualization

Users don't need to remember APIs, modes, or parameters. Just say what you want.

</purpose>

<instructions>

## ⛔ Hard Constraints (Inviolable)

**The scripts are the ONLY interface. Period.**

```
┌─────────────────────────────────────────────────────────┐
│   AI Agent  ──▶  ./scripts/*.sh  ──▶  ListenHub API     │
│                        ▲                                │
│                        │                                │
│            This is the ONLY path.                       │
│            Direct API calls are FORBIDDEN.              │
└─────────────────────────────────────────────────────────┘
```

**MUST**:

- Execute functionality ONLY through provided scripts in `**/skills/listenhub/scripts/`
- Pass user intent as script arguments exactly as documented
- Trust script outputs; do not second-guess internal logic

**MUST NOT**:

- Write curl commands to ListenHub/Marswave API directly
- Construct JSON bodies for API calls manually
- Guess or fabricate speakerIds, endpoints, or API parameters
- Assume API structure based on patterns or web searches
- Hallucinate features not exposed by existing scripts

**Why**: The API is proprietary. Endpoints, parameters, and speakerIds are NOT publicly documented. Web searches will NOT find this information. Any attempt to bypass scripts will produce incorrect, non-functional code.

## Script Location

Scripts are located at `**/skills/listenhub/scripts/` relative to your working context.

Different AI clients use different dot-directories:

- Claude Code: `.claude/skills/listenhub/scripts/`
- Other clients: may vary (`.cursor/`, `.windsurf/`, etc.)
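
Because the dot-directory differs per client, a lookup that avoids hardcoding it could look like the sketch below. It is only an illustration under stated assumptions: the function name `find_listenhub_scripts` is hypothetical (not part of the skill), the search starts from the current working directory unless a root is passed, and a POSIX `find` is assumed to be available.

```shell
#!/usr/bin/env bash
# Sketch: locate the ListenHub scripts directory without hardcoding the
# client-specific dot-directory (.claude/, .cursor/, .windsurf/, ...).
# ASSUMPTION: find_listenhub_scripts is an illustrative helper name,
# not something the skill itself provides.
find_listenhub_scripts() {
  local root="${1:-.}"
  # -path matches against the whole path, so any parent directory
  # (including hidden dot-directories) is accepted.
  find "$root" -type d -path '*/skills/listenhub/scripts' -print 2>/dev/null | head -n 1
}

scripts_dir="$(find_listenhub_scripts .)"
if [ -n "$scripts_dir" ]; then
  echo "scripts found in: $scripts_dir"
else
  echo "listenhub scripts not found" >&2
fi
```

The helper prints the first matching directory, or nothing if no match exists, so callers can test the result with `[ -n "$scripts_dir" ]` before invoking any script.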
**Resolution**: Use glob pattern `**/skills/listenhub/scripts/*.sh` to locate scripts reliably, or resolve from the SKILL.md file's own path.

## Private Data (Cannot Be Searched)

The following are **internal implementation details** that AI cannot reliably know:

| Category | Examples | How to Obtain |
|----------|----------|---------------|
| API Base URL | `api.marswave.ai/...` | ✗ Cannot (internal to scripts) |
| Endpoints | `podcast/episodes`, etc. | ✗ Cannot (internal to scripts) |
| Speaker IDs | `cozy-man-english`, etc. | ✓ Call `get-speakers.sh` |
| Request schemas | JSON body structure | ✗ Cannot (internal to scripts) |
| Response formats | Episode ID, status codes | ✓ Documented per script |

**Rule**: If information is not in this SKILL.md or retrievable via a script (like `get-speakers.sh`), assume you don't know it.

## Design Philosophy

**Hide complexity, reveal magic.**

Users don't need to know: Episode IDs, API structure, polling mechanisms, credits, endpoint differences.

Users only need: Say idea → wait a moment → get the link.

## Security

- User-provided content (text, URLs) is transmitted to the ListenHub API (`api.marswave.ai`) for processing. Do not pass sensitive or confidential information as input.
- The `--source-url` parameter accepts external URLs whose content is fetched and processed by the backend. Only use trusted URLs.
- API keys are stored locally in environment variables and transmitted via HTTPS. Never log or display full API keys.
- Version checks connect to `raw.githubusercontent.com` (read-only, no code execution). Set `LISTENHUB_SKIP_VERSION_CHECK=1` to disable.

## Environment

### ListenHub API Key

The API key is stored in `$LISTENHUB_API_KEY`. Check on first use:

```bash
source ~/.zshrc 2>/dev/null; [ -n "$LISTENHUB_API_KEY" ] && echo "ready" || echo "need_setup"
```

If setup is needed, guide the user:

1. Visit https://listenhub.ai/settings/api-keys
2. Paste key (only the `lh_sk_...` part)
3. Auto-save to ~/.zshrc

### Image Generation API Key

Image generation uses the same ListenHub API key stored in `$LISTENHUB_API_KEY`. The image generation output path defaults to the user downloads directory and is stored in `$LISTENHUB_OUTPUT_DIR`.

On first image generation, the script auto-guides configuration:

1. Visit https://listenhub.ai/settings/api-keys (requires subscription)
2. Paste API key
3. Configure output path (default: ~/Downloads)
4. Auto-save to shell rc file

**Security**: Never expose full API keys in output.

## Mode Detection

Auto-detect mode from user input:

**→ Podcast (1-2 speakers)**

Supports single-speaker or dual-speaker podcasts. Debate mode requires 2 speakers. Default mode: `quick` unless explicitly requested. If speakers are not specified, call `get-speakers.sh` and select the first `speakerId` matching the chosen `language`. If reference materials are provided, pass them as `--source-url` or `--source-text`.

When the user only provides a topic (e.g., "I want a podcast about X"), proceed with: 1) detect `language` from user input, 2) set `mode=quick`, 3) choose one speaker via `get-speakers.sh` matching the language, 4) create a single-speaker podcast without further clarification.

- Keywords: "podcast", "chat about", "discuss", "debate", "dialogue"
- Use case: Topic exploration, opinion exchange, deep analysis
- Feature: Two voices, interactive feel

**→ Explain (Explainer video)**

- Keywords: "explain", "introduce", "video", "explainer", "tutorial"
- Use case: Product intro, concept explanation, tutorials
- Feature: Single narrator + AI-generated visuals, can export video

**→ TTS (Text-to-speech)**

TTS defaults to FlowSpeech `direct` for single-pass text or URL narration. Script arrays and multi-speaker dialogue belong to Speech as an advanced path, not the default TTS entry. Text-to-speech input is limited to 10,000 characters; split or use a URL when longer.

- Keywords: "read aloud", "convert to speech", "tts", "voice"
- Use case: Article to audio, note review, document narration
- Feature: Fastest (1-2 min), pure audio

### Ambiguous "Convert to speech" Guidance

When the request is ambiguous (e.g., "convert to speech", "read aloud"), apply:

1. Default to FlowSpeech and prioritize `direct` to avoid altering content.
2. Input type: URL uses `type=url`, plain text uses `type=text`.
3. Speaker: if not specified, call `get-speakers.sh` and pick the first `speakerId` matching `language`.
4. Switch to Speech only when multi-line scripts or multi-speaker dialogue is explicitly requested, and require `scripts`.

Example guidance:

“This request can use FlowSpeech with the default direct mode; switch to smart for grammar and punctuation fixes. For per-line speaker assignment, provide scripts and switch to Speech.”

**→ Image Generation**

- Keywords: "generate image", "draw", "create picture", "visualize"
- Use case: Creative visualization, concept art, illustrations
- Feature: AI image generation via Labnana API, multiple resolutions and aspect ratios

**Reference Images via Image Hosts**

When reference images are local files, upload to a known image host and use the direct image URL in `--reference-images`. Recommended hosts: `imgbb.com`, `sm.ms`, `postimages.org`, `imgur.com`. Direct image URLs should end with `.jpg`, `.png`, `.webp`, or `.gif`.

**Default**: If unclear, ask user which format they prefer.

**Explicit override**: User can say "make it a podcast" / "I want explainer video" / "just voice" / "generate image" to override auto-detection.

## Interaction Flow

### Step 1: Receive input + detect mode

```
→ Got it! Preparing...

Mode: Two-person podcast
Topic: Latest developments in Manus AI
```

For URLs, identify type:

- `youtu.be/XXX` → convert to `https://www.youtube.com/watch?v=XXX`
- Other URLs → use directly

### Step 2: Submit generation

```
→ Generation submitted

Estimated time:
• Podcast: 2-3 minutes
• Explain: 3-5 minutes
• TTS: 1-2 minutes

You can:
• Wait and ask "done yet?"
• Use check-s