audio-cog
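AI audio generation and text-to-speech powered by CellCog. Three voice providers (OpenAI, ElevenLabs, MiniMax), voice cloning, avatar voices, sound effect generation, and music creation up to 10 minutes. Use AI for professional voiceovers, narration, and audio production.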
Installation / Download
TotalClaw CLI (recommended)
totalclaw install totalclaw:totalclaw~nitishgargiitd-audio-cog
cURL (direct download, no login required)
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~nitishgargiitd-audio-cog/file -o nitishgargiitd-audio-cog.md

# Audio Cog - AI Audio Generation Powered by CellCog

Create professional audio with AI: voiceovers, music, sound effects, and personalized avatar voices.

CellCog provides **three voice providers**, each with different strengths. Choose based on your needs:

| Scenario | Provider | Why |
|----------|----------|-----|
| Standard narration/voiceover | OpenAI | Best voice style control, consistent quality |
| Emotional/dramatic delivery | ElevenLabs | Richest emotional range, supports emotion tags |
| Cloned voice (avatar) | MiniMax | Only provider with voice cloning support |
| Character voice with specific accent | ElevenLabs | 100+ diverse pre-made voices |
| Fine pitch/speed/volume control | MiniMax | Granular voice settings |

---

## Prerequisites

This skill requires the `cellcog` skill for SDK setup and API calls.

```bash
clawhub install cellcog
```

**Read the cellcog skill first** for SDK setup. This skill shows you what's possible.

---

## Voice Providers

### OpenAI (Default)

Best for standard narration, voiceovers, and single-speaker content with precise delivery control.

**Key strength**: Natural-language style instructions. Describe the accent, tone, pacing, and emotion you want.
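Because a style instruction is plain text, a request can be assembled from a script, a voice name, and a style description. The sketch below is only an illustration: `build_narration_prompt` is a hypothetical helper, not part of the CellCog SDK, and the prompt wording it produces is an assumption modeled on the example prompts in this document. Sending the prompt is covered by the `cellcog` skill and is not shown.

```python
def build_narration_prompt(script: str, voice: str = "cedar", style: str = "") -> str:
    """Compose a plain-text request for OpenAI-provider narration.

    Hypothetical helper: the phrasing is an assumption, not a CellCog format.
    """
    script = script.strip()
    if not script:
        # The script must be written out in full, not summarized.
        raise ValueError("provide the complete script to be spoken")
    parts = [f"Generate speech using OpenAI with the {voice} voice."]
    if style.strip():
        parts.append(f"Style: {style.strip()}")
    parts.append(f"Script: '{script}'")
    return " ".join(parts)


prompt = build_narration_prompt(
    "Welcome to our quarterly update.",
    voice="marin",
    style="Warm conversational tone, medium pace. American accent.",
)
print(prompt)
```

Keeping the script and the style as separate arguments makes it harder to send a vague request such as "something about our product" in place of the actual words to be spoken.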
**8 built-in voices:**

| Voice | Gender | Characteristics |
|-------|--------|----------------|
| **cedar** | Male | Warm, resonant, authoritative, trustworthy |
| **marin** | Female | Bright, articulate, emotionally agile, professional |
| **ballad** | Male | Smooth, melodic, musical quality |
| **coral** | Female | Vibrant, lively, dynamic, spirited |
| **echo** | Male | Calm, measured, thoughtful, deliberate |
| **sage** | Female | Wise, contemplative, reflective |
| **shimmer** | Female | Soft, gentle, soothing, approachable |
| **verse** | Male | Poetic, rhythmic, artistic, expressive |

Best quality: **cedar** (male), **marin** (female).

**Style customization examples:**

- "Warm conversational tone, medium pace, slight enthusiasm when mentioning features. American accent."
- "Deep, hushed, enigmatic, with a slow deliberate cadence, in a true crime narrator style."
- "Heavy French accent, sophisticated yet friendly, moderate pacing with deliberate pauses."

---

### ElevenLabs

Best for emotional delivery, dramatic content, character voices, and audiobook narration.

**Key strength**: Emotion tags embedded directly in text (`[laughs]`, `[sighs]`, `[whispers]`, `[excited]`, `[sarcastic]`), plus 100+ diverse pre-made voices.

**Emotion tags (use sparingly, 1-2 per paragraph):**

| Tag | Effect |
|-----|--------|
| `[laughs]` | Natural laughter |
| `[chuckles]` | Soft/brief laughter |
| `[sighs]` | Sighing |
| `[gasps]` | Surprise/shock |
| `[whispers]` | Whispering delivery |
| `[pause]` | Natural pause/beat |
| `[sad]`, `[happy]`, `[excited]`, `[angry]`, `[sarcastic]` | Emotional delivery |

**Example prompt:**

> "Generate speech using ElevenLabs with a warm British male voice:
> 'And then, just when everyone thought it was over... [pause] [whispers] it wasn't.'"

---

### MiniMax

Best for cloned voices (avatars) and fine-grained voice control.

**Key strength**: MiniMax Speech 2.8 HD, with studio-grade audio quality.
Supports avatar cloned voice IDs for personalized content, plus 17+ standard pre-made voices with granular speed, pitch, and volume control.

**Standard voices include:** `Deep_Voice_Man`, `Calm_Woman`, `Casual_Guy`, `Lively_Girl`, `Wise_Woman`, `Friendly_Person`, `Young_Knight`, `Elegant_Man`, and more.

**Voice settings:** emotion (happy/sad/angry/neutral/etc.), speed (0.5–2.0), volume (0–10), pitch (-12 to 12).

---

## Avatar / Cloned Voice

Users can create avatars on CellCog with their own cloned voice. When an avatar has a cloned voice, CellCog uses the MiniMax provider to generate speech that sounds like that person.

**How it works:**

- The user creates an avatar on cellcog.ai and uploads voice samples
- CellCog clones their voice using MiniMax Speech 2.8 HD
- Any audio request referencing that avatar uses their cloned voice

**Example prompt:**

> "Generate a voiceover using my avatar Luna's voice: 'Welcome to our quarterly update. I'm excited to share some incredible results with you today.'"

This is powerful for creating consistent, personalized content (marketing videos, podcast intros, course narration), all in the user's own voice.

---

## Sound Effects (SFX)

CellCog generates standalone sound effects from text descriptions. Royalty-free, 0.1 to 30 seconds.

**Example prompts:**

- "Generate a sound effect of heavy rain hitting a metal roof with occasional thunder, 10 seconds"
- "Create a crispy footsteps-on-fresh-snow sound effect, 5 seconds"
- "Generate an echoing door slam in a large empty warehouse"

**Tips for better SFX:**

- Be specific about textures and environment
- Specify duration when exact length matters
- For ambient audio longer than 30 seconds, generate a short loopable segment and extend with ffmpeg

---

## Music Generation

Create original music from text descriptions. 3 seconds to 10 minutes. Royalty-free.
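The duration limits stated for music (3 seconds to 10 minutes) and for sound effects (0.1 to 30 seconds) can be checked before a prompt is written. The sketch below is a minimal illustration under stated assumptions: `DURATION_LIMITS`, `check_duration`, and `ffmpeg_loop_args` are hypothetical names, the file names are placeholders, and the ffmpeg command assumes an audio format for which stream copy (`-c copy`) is acceptable. It also shows one way to build the ffmpeg looping command for ambient audio that exceeds the SFX limit.

```python
import math

# Limits as stated in this document, in seconds.
DURATION_LIMITS = {
    "sfx": (0.1, 30.0),
    "music": (3.0, 600.0),
}


def check_duration(kind: str, seconds: float) -> bool:
    """Return True when `seconds` is within the stated range for `kind`."""
    low, high = DURATION_LIMITS[kind]
    return low <= seconds <= high


def ffmpeg_loop_args(src: str, dst: str, segment_s: float, target_s: float) -> list[str]:
    """Build an ffmpeg command that repeats `src` until it covers `target_s`.

    -stream_loop N replays the input N extra times, so N is one less than
    the number of copies needed; -t trims the output to the target length.
    """
    if segment_s <= 0:
        raise ValueError("segment length must be positive")
    copies = math.ceil(target_s / segment_s)
    return [
        "ffmpeg", "-stream_loop", str(copies - 1), "-i", src,
        "-t", str(target_s), "-c", "copy", dst,
    ]


print(check_duration("music", 120))  # True
print(check_duration("sfx", 45))     # False: over the 30-second SFX limit

# Placeholder file names: extend a 10-second loopable segment to 60 seconds.
print(ffmpeg_loop_args("rain.wav", "rain_60s.wav", 10, 60))
# ['ffmpeg', '-stream_loop', '5', '-i', 'rain.wav', '-t', '60', '-c', 'copy', 'rain_60s.wav']
```

The argument list can be passed to `subprocess.run` as-is; it is returned as a list, not a shell string, so file names with spaces need no quoting.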
**Capabilities:**

- Any genre or genre fusion
- Instrumental and vocal tracks (specify if you want vocals)
- Complex arrangements, mood transitions, and energy dynamics
- Describe what you want; the model handles music theory

**Example prompts:**

- "Create 2 minutes of calm lo-fi hip-hop background music with soft piano and mellow beats, 75 BPM"
- "Generate a 15-second upbeat tech podcast intro jingle"
- "Create 90 seconds of cinematic orchestral music: start soft and inspiring, build to a confident crescendo"
- "Generate a 3-minute pop song about summer adventures with female vocals"

For precise section-by-section control (exact timing per section), describe your composition plan in detail; CellCog handles the structure.

**All generated music is royalty-free**: use it commercially without attribution or licensing fees.

---

## Multi-Language Support

All three voice providers support 40+ languages. Provide speech text in the target language:

English, Spanish, French, German, Italian, Portuguese, Chinese (Mandarin/Cantonese), Japanese, Korean, Hindi, Arabic, Russian, Polish, Dutch, Turkish, and many more.

---

## Chat Mode

**Use `chat_mode="agent"`** for all audio tasks. Audio generation executes efficiently in agent mode, so no agent team is needed.

---

## Tips for Better Audio

1. **Choose the right provider**: OpenAI for standard narration, ElevenLabs for emotional/dramatic, MiniMax for cloned voices
2. **Provide the complete script**: Write out exactly what should be spoken; don't say "something about our product"
3. **Include style instructions**: "Confident but warm", "slow and deliberate", "with slight excitement"
4. **For music**: Specify duration, mood, genre, and tempo (BPM if you know it)
5. **Pronunciation guidance**: For names or technical terms, add hints: "CellCog (pronounced SELL-kog)"
6. **For ElevenLabs emotion tags**: Use sparingly (1-2 per paragraph). Tags affect all subsequent text until a new tag.
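The "1-2 per paragraph" guideline for ElevenLabs emotion tags can be checked mechanically before a script is submitted. The sketch below is a minimal illustration, not a CellCog feature: `overtagged_paragraphs` is a hypothetical helper, the tag set is copied from the ElevenLabs section of this document, and treating blank lines as paragraph breaks is an assumption.

```python
import re

# Emotion tags listed in the ElevenLabs section of this document.
EMOTION_TAGS = {
    "laughs", "chuckles", "sighs", "gasps", "whispers", "pause",
    "sad", "happy", "excited", "angry", "sarcastic",
}
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")
MAX_TAGS_PER_PARAGRAPH = 2


def overtagged_paragraphs(script: str) -> list[tuple[int, int]]:
    """Return (paragraph_number, tag_count) for paragraphs over the limit.

    Paragraphs are separated by blank lines and numbered from 1.
    """
    flagged = []
    paragraphs = [p for p in re.split(r"\n\s*\n", script) if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        # Count only bracketed words that are known emotion tags.
        tags = [t for t in TAG_PATTERN.findall(paragraph) if t in EMOTION_TAGS]
        if len(tags) > MAX_TAGS_PER_PARAGRAPH:
            flagged.append((number, len(tags)))
    return flagged


script = (
    "And then, just when everyone thought it was over... [pause] [whispers] it wasn't.\n"
    "\n"
    "[excited] We won! [laughs] I can't believe it. [gasps] [happy] Truly."
)
print(overtagged_paragraphs(script))  # [(2, 4)]
```

The first paragraph uses two tags and passes; the second uses four and is reported, which is a cue to trim tags or split the paragraph.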