audio-gen
Generate audiobooks, podcasts, or educational audio content on demand. User provides an idea or topic, Claude AI writes a script, and ElevenLabs converts it to high-quality audio. Supports multiple formats (audiobook, podcast, educational), custom lengths, and voice effects. Use when asked to create audio content, make a podcast, generate an audiobook, or produce educational audio. Returns MP3 audio file via MEDIA token.
Installation / Download
TotalClaw CLI (recommended)
totalclaw install clawskills:clawskills~udiedrichsen-audio-gen
cURL direct download (no login required)
curl -fsSL https://skills.taituai.com/api/skills/clawskills%3Aclawskills~udiedrichsen-audio-gen/file -o udiedrichsen-audio-gen.md

# 🎙️ Audio Content Generator

Generate high-quality audiobooks, podcasts, or educational audio content on demand using AI-written scripts and ElevenLabs text-to-speech.

## Quick Start

**Create an audiobook chapter:**

```
User: "Create a 5-minute audiobook chapter about a dragon discovering friendship"
```

**Generate a podcast:**

```
User: "Make a 10-minute podcast about the history of coffee"
```

**Produce educational content:**

```
User: "Generate a 15-minute educational audio explaining how neural networks work"
```

## Content Formats

### Audiobook

**Style:** Narrative storytelling with emotional depth

- Clear beginning, middle, and end
- Descriptive language and vivid imagery
- Dramatic pacing with thoughtful pauses
- Emotional tone that matches the story
- Use voice effects like `[whispers]`, `[excited]`, `[serious]` for impact

**Example Structure:**

```
[Opening hook - set the scene]
[long pause]

[Story development with character emotions]
[short pause] between sentences
[long pause] between paragraphs

[Climax with dramatic tension]
[long pause]

[Resolution and emotional closure]
```

### Podcast

**Style:** Conversational and engaging

- Warm, welcoming intro (15-30 seconds)
- Main content with natural flow
- Transitions between topics
- Memorable outro with key takeaways
- Conversational tone throughout

**Example Structure:**

```
**Intro:** "Welcome to [topic]. I'm excited to share..."
[short pause]

**Main Content:** "Let's start with... [topic 1]"
[long pause] between segments

**Outro:** "Thanks for listening! Remember..."
```

### Educational Content

**Style:** Clear explanations for learning

- Simple introductions to complex topics
- Step-by-step breakdowns
- Real-world examples and analogies
- Recap of key concepts at the end
- Enthusiastic delivery with `[excited]` for important points

**Example Structure:**

```
**Introduction:** What is [topic] and why it matters?

**Main Content:**
- Concept 1: Explanation + Example
- Concept 2: Explanation + Example
- Concept 3: Explanation + Example

**Summary:** Key takeaways and next steps
```

## Length Guidelines

**Word Count to Duration Conversion:**

- 5 minutes = ~375 words
- 10 minutes = ~750 words
- 15 minutes = ~1,125 words
- 20 minutes = ~1,500 words
- 30 minutes = ~2,250 words

**Pacing:** These estimates assume ~75 words per minute

**Practical Limits:**

- Minimum: 2 minutes (~150 words)
- Maximum: 30 minutes (~2,250 words)
- Sweet spot: 5-15 minutes for best engagement

## Workflow Instructions

### Step 1: Understand the Request

Parse the user's request for:

1. **Content type** (audiobook, podcast, educational, or inferred from topic)
2. **Topic/theme** (what the content should be about)
3. **Target length** (how many minutes)
4. **Tone/style** (dramatic, casual, educational, etc.)
5. **Special requests** (specific voice, emphasis on certain points)

### Step 2: Calculate Word Count

```
target_words = target_minutes × 75
```

Example: 10 minutes = 10 × 75 = 750 words

### Step 3: Generate the Script

Write the complete script following these rules:

**Content Guidelines:**

- Start strong with an engaging hook
- Maintain natural, conversational flow
- Use active voice and simple sentence structure
- Include relevant examples and stories
- End with a satisfying conclusion

**Formatting Rules:**

- Add `[short pause]` after sentences (use sparingly, not every sentence)
- Add `[long pause]` between paragraphs or major sections
- Use voice effects strategically: `[whispers]`, `[shouts]`, `[excited]`, `[serious]`, `[sarcastic]`, `[sings]`, `[laughs]`
- Write numbers as words: "twenty-three" not "23"
- Spell out acronyms the first time: "AI, or artificial intelligence"
- Avoid complex punctuation (em-dashes work, but semicolons don't read well)
- Remove markdown formatting before TTS conversion

### Step 4: Present the Script

Show the script to the user and ask:

```
Here's the [format] script I've created (approximately [length] minutes):

[Display the script]

Would you like me to:
1. Generate the audio now
2. Make changes to the script
3. Adjust the length or tone
```

### Step 5: Handle User Feedback

If the user requests changes:

- Regenerate the script with adjustments
- Maintain the target word count
- Present the revised version

If the user approves:

- Proceed to audio generation

### Step 6: Generate Audio

**Format the script for TTS:**

1. Remove any remaining markdown (headers, bold, italics)
2. Ensure voice effects are in proper `[effect]` format
3. Check that pauses are appropriately placed
4. Verify numbers and acronyms are spelled out

**Invoke the TTS script:**

**IMPORTANT:** The `ELEVENLABS_API_KEY` environment variable is already configured in the system. Simply invoke the TTS script directly.
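The formatting checklist in this step (strip markdown, keep voice effects in `[effect]` form) can be approximated in code before the text is handed to the TTS script. The following is a minimal sketch, not part of the skill itself: the function names `format_for_tts` and `unknown_effects` are hypothetical, the regular expressions cover only headers, bold, italics, and inline code, and the allowed tag set is copied from the effects this document lists.

```python
import re

# Effect tags named in this document; anything else is flagged for review.
SUPPORTED_EFFECTS = {
    "whispers", "shouts", "excited", "serious", "sarcastic",
    "sings", "laughs", "short pause", "long pause",
}


def format_for_tts(script: str) -> str:
    """Strip common markdown while leaving [effect] tags untouched.

    Hypothetical helper: handles headers, bold, italics, and inline code only.
    """
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", script, flags=re.MULTILINE)  # headers
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)  # **bold**
    text = re.sub(r"__(.+?)__", r"\1", text)  # __bold__
    text = re.sub(r"\*(.+?)\*", r"\1", text)  # *italics*
    text = re.sub(r"(?<!\w)_(.+?)_(?!\w)", r"\1", text)  # _italics_
    text = re.sub(r"`([^`]*)`", r"\1", text)  # `inline code`
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines
    return text.strip()


def unknown_effects(text: str) -> list[str]:
    """Return bracketed tags that are not in SUPPORTED_EFFECTS."""
    tags = re.findall(r"\[([^\[\]]+)\]", text)
    return [tag for tag in tags if tag.lower() not in SUPPORTED_EFFECTS]


if __name__ == "__main__":
    raw = "## Chapter One\n\n**Welcome** to the *story*. [whispers] It begins... [long pause]"
    print(format_for_tts(raw))
    print(unknown_effects("[whispers] hi [growls]"))
```

Spelling out numbers and acronyms is left to the script-writing step; automating it would need a number-to-words routine that this sketch does not attempt.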
```bash
uv run /home/clawdbot/clawdbot/skills/sag/scripts/tts.py \
  -o /tmp/audio-gen-[timestamp]-[topic-slug].mp3 \
  -m eleven_multilingual_v2 \
  "[formatted_script]"
```

**For long scripts, use a heredoc:**

```bash
uv run /home/clawdbot/clawdbot/skills/sag/scripts/tts.py \
  -o /tmp/audio-gen-[timestamp]-[topic-slug].mp3 \
  -m eleven_multilingual_v2 \
  "$(cat <<'EOF'
[formatted_script]
EOF
)"
```

**Return the result:**

```
MEDIA:/tmp/audio-gen-[timestamp]-[topic-slug].mp3

Your [format] is ready! [Brief description of content]. Duration: approximately [X] minutes.
```

## Voice Effects (Bracketed Tags)

Available voice modulation effects (use sparingly for impact):

- `[whispers]` - Soft, intimate delivery
- `[shouts]` - Loud, emphatic delivery
- `[excited]` - Enthusiastic, energetic tone
- `[serious]` - Grave, solemn tone
- `[sarcastic]` - Ironic, mocking tone
- `[sings]` - Musical, melodic delivery
- `[laughs]` - Amused, jovial tone
- `[short pause]` - Brief silence (~0.5s)
- `[long pause]` - Extended silence (~1-2s)

**Best Practices:**

- Use effects for emotional moments, not every sentence
- Pauses are your most powerful tool for pacing
- Voice effects work best in audiobooks and dramatic content
- Keep podcasts and educational content mostly natural

## Error Handling

### Script Too Long

If the generated script exceeds the target by >20%:

```
The script I generated is [X] words ([Y] minutes), which is longer than your target of [Z] minutes.

Would you like me to:
1. Condense it to fit the target length
2. Split it into multiple parts
3. Keep it as is
```

### Script Too Short

If the generated script is under the target by >20%:

```
The script is [X] words ([Y] minutes), shorter than your target.

Would you like me to:
1. Expand it with more detail
2. Add additional examples or stories
3. Generate as is
```

### TTS Generation Fails

If the TTS script fails:

```
I've created the script, but I'm unable to generate the audio right now. Here's your script:

[Display script]

Error: [specific error message]

You can:
1. Check that ELEVENLABS_API_KEY is configured
2. Use the script with your own text-to-speech tool
3. Try again in a moment
4. Ask me to troubleshoot the audio generation
```

**Common TTS Issues:**

- API key not set: Verify ELEVENLABS_API_KEY in config
- Rate limit: Wait a moment and try again
- Text too long: Break into smaller chunks (max ~5000 characters)

### Invalid Request

For unrealistic requests (e.g., "100-hour audiobook"):

```
That length would require [X] words and take significant time to generate.

I recommend:
- Breaking it into multiple episodes/chapters
- Targeting 5-30 minutes per audio file
- Creating a series instead of one long file
```

## Tips for Best Results

### For Engaging Audiobooks

- Focus on character emotions and sensory details
- Use pauses to build dramatic tension
- Vary sentence length for rhythm
- Include internal monologue and reflection

### For Compelling Podcasts

- Start with a question or surprising fact
- Use conversational phrases: "You know what's interesting..."
- Include relatable examples from everyday life
- End with actionable takeaways

### For Effective Educational Content

- Use the "explain like I'm five" approach
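
The length rules stated earlier (75 words per minute, the 20% tolerance under Error Handling, and the ~5000-character chunk limit under Common TTS Issues) can be collected into a few helpers. This is a rough sketch under stated assumptions: all function names are hypothetical, bracketed effect tags are excluded from the word count (the document does not say whether they should be), and chunks are split greedily on sentence-ending punctuation.

```python
import re

WORDS_PER_MINUTE = 75   # rate used throughout this document
TOLERANCE = 0.20        # the >20% threshold from Error Handling
MAX_CHUNK_CHARS = 5000  # approximate limit from Common TTS Issues


def target_words(target_minutes: float) -> int:
    """Step 2: target_words = target_minutes x 75."""
    return round(target_minutes * WORDS_PER_MINUTE)


def count_words(script: str) -> int:
    """Count spoken words, skipping [effect] tags (an assumption of this sketch)."""
    spoken = re.sub(r"\[[^\[\]]+\]", " ", script)
    return len(spoken.split())


def check_length(script: str, target_minutes: float) -> str:
    """Return 'too_long', 'too_short', or 'ok' relative to the target."""
    words = count_words(script)
    target = target_words(target_minutes)
    if words > target * (1 + TOLERANCE):
        return "too_long"
    if words < target * (1 - TOLERANCE):
        return "too_short"
    return "ok"


def split_into_chunks(text: str, max_chars: int = MAX_CHUNK_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept as its own oversized chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(target_words(10))
    print(check_length("word " * 1000, 10))
    print(split_into_chunks("One. Two. Three.", max_chars=9))
```

Splitting on sentence boundaries keeps each chunk readable on its own, at the cost of chunks that are somewhat shorter than the limit.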