voice-to-text
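Converts voice messages and audio files to text using Vosk offline speech recognition. Use it when the user sends a voice message or audio file, or asks for speech to be transcribed to text.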
Installation / Download
TotalClaw CLI (recommended)
totalclaw install totalclaw:totalclaw~vae999-voice-to-text
cURL (direct download, no login required)
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~vae999-voice-to-text/file -o vae999-voice-to-text.md

# Voice to Text

Convert voice messages and audio files to text using Vosk, an offline speech recognition toolkit.

## Setup

1. Install dependencies:

   ```bash
   # macOS
   brew install ffmpeg
   pip install vosk

   # Linux
   apt-get install ffmpeg
   pip install vosk
   ```

2. Download a Vosk model:

   ```bash
   mkdir -p ~/.vosk/models && cd ~/.vosk/models

   # Chinese (small, fast)
   curl -LO https://alphacephei.com/vosk/models/vosk-model-small-cn-0.22.zip
   unzip vosk-model-small-cn-0.22.zip

   # English (small)
   curl -LO https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
   unzip vosk-model-small-en-us-0.15.zip
   ```

## Usage

When the user provides a voice message or audio file path, run the transcription:

```bash
python3 ~/skills/voice-to-text/transcribe.py "<audio_file_path>"
```

To select a specific model, set the `VOSK_MODEL_PATH` environment variable:

```bash
VOSK_MODEL_PATH=~/.vosk/models/vosk-model-cn-0.22 python3 ~/skills/voice-to-text/transcribe.py "<audio_file_path>"
```

## Supported Audio Formats

- MP3, WAV, M4A, OGG, FLAC, AAC, WEBM
- Voice messages from WeChat, Telegram, WhatsApp, etc.

## Available Models

| Model | Language | Size | Notes |
|-------|----------|------|-------|
| vosk-model-small-cn-0.22 | Chinese | 42M | Fast, good accuracy |
| vosk-model-cn-0.22 | Chinese | 1.3G | High accuracy |
| vosk-model-small-en-us-0.15 | English | 40M | Fast, good accuracy |
| vosk-model-en-us-0.22 | English | 1.8G | High accuracy |

Download models from: https://alphacephei.com/vosk/models

## Example Workflow

1. User sends a voice message via WeChat/Telegram
2. OpenClaw receives the audio file
3. Run: `python3 transcribe.py /path/to/voice.ogg`
4. Return transcribed text to user

## Troubleshooting

- **No model found**: Download a model to `~/.vosk/models/`
- **ffmpeg not found**: Install via `brew install ffmpeg` or `apt install ffmpeg`
- **Poor accuracy**: Try a larger model for better results

## Notes

- Works completely offline after model download
- Supports multiple languages (download the appropriate model)
- Audio is converted to 16kHz mono WAV for processing
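
## Sketch: conversion and recognition

The source of `transcribe.py` is not included in this document, so the following is only a minimal sketch of how the steps described above (convert to 16kHz mono WAV with ffmpeg, then decode with Vosk) could be wired together. The function names `build_ffmpeg_command`, `resolve_model_path`, and `transcribe` are hypothetical, and the fallback of picking the first directory under `~/.vosk/models/` is an assumption, not documented behavior of the skill.

```python
import json
import os
import subprocess
import tempfile
import wave
from pathlib import Path

SAMPLE_RATE = 16000  # the document states audio is converted to 16kHz mono WAV


def build_ffmpeg_command(src, dst, sample_rate=SAMPLE_RATE):
    """Return the ffmpeg argument list that converts src to mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y", "-loglevel", "error",
        "-i", str(src),
        "-ac", "1",               # one channel (mono)
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        str(dst),
    ]


def resolve_model_path(env=None, models_dir=None):
    """Prefer VOSK_MODEL_PATH; otherwise fall back to the first model directory.

    The fallback rule is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    explicit = env.get("VOSK_MODEL_PATH")
    if explicit:
        return Path(explicit).expanduser()
    models_dir = Path(models_dir) if models_dir else Path.home() / ".vosk" / "models"
    if models_dir.is_dir():
        candidates = sorted(p for p in models_dir.iterdir() if p.is_dir())
        if candidates:
            return candidates[0]
    raise FileNotFoundError(f"No model found under {models_dir}")


def transcribe(audio_path, model_path=None):
    """Convert audio_path with ffmpeg, then decode the WAV with Vosk."""
    # Imported lazily so the helpers above work without vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(str(model_path or resolve_model_path()))
    pieces = []
    with tempfile.TemporaryDirectory() as tmp:
        wav_path = Path(tmp) / "input.wav"
        subprocess.run(build_ffmpeg_command(audio_path, wav_path), check=True)
        with wave.open(str(wav_path), "rb") as wav:
            recognizer = KaldiRecognizer(model, wav.getframerate())
            while True:
                chunk = wav.readframes(4000)
                if not chunk:
                    break
                # AcceptWaveform returns True when an utterance is complete.
                if recognizer.AcceptWaveform(chunk):
                    pieces.append(json.loads(recognizer.Result()).get("text", ""))
            pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)
```

Calling `transcribe("/path/to/voice.ogg")` would return the recognized text as a single string, provided ffmpeg is on the `PATH` and a model has been downloaded. Keeping command construction and model lookup in separate functions makes the "ffmpeg not found" and "No model found" cases from the troubleshooting list easy to check in isolation.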