local-vosk
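Local speech-to-text with Vosk. Lightweight, fast, and fully offline. Well suited to transcribing Telegram voice messages, audio files, or any speech-to-text task that should not depend on a cloud API.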
Install / Download
TotalClaw CLI (recommended)
totalclaw install totalclaw:totalclaw~sfkiwi-local-vosk
cURL (direct download, no login required)
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~sfkiwi-local-vosk/file -o sfkiwi-local-vosk.md

# Local Vosk STT

Lightweight local speech-to-text using Vosk. **Fully offline** after model download.

## Use Cases

- **Telegram voice messages**: transcribe .ogg voice notes automatically
- **Audio files**: any format ffmpeg supports
- **Offline transcription**: no API keys, no cloud, no costs

## Quick Start

```bash
# Transcribe a Telegram voice message
./skills/local-vosk/scripts/transcribe voice_message.ogg

# Transcribe any audio file
./skills/local-vosk/scripts/transcribe audio.mp3

# Specify the language (default: en-us)
./skills/local-vosk/scripts/transcribe audio.wav --lang en-us
```

## Supported Formats

Any format ffmpeg can decode: **ogg** (Telegram), mp3, wav, m4a, webm, flac, etc.

## Models

Default model: `vosk-model-small-en-us-0.15` (~40 MB)

Other models are available at https://alphacephei.com/vosk/models

## Setup (if not installed)

```bash
# Install the Vosk Python package for the current user.
# --break-system-packages overrides pip's externally-managed-environment check (PEP 668).
pip3 install vosk --user --break-system-packages

# Download and unpack the default model
mkdir -p ~/vosk-models && cd ~/vosk-models
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
```

## Notes

- Quality is good for conversational speech
- For higher accuracy, use larger models or faster-whisper
- Processes audio at ~10x realtime on typical hardware
- Telegram voice messages are in .ogg format and work out of the box
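
## Batch Example (sketch)

To transcribe a whole folder of Telegram voice notes, the `transcribe` command from the Quick Start can be wrapped in a small loop. The following is a minimal sketch and not part of the skill itself: the function name `transcribe_dir` and the `TRANSCRIBE` variable are hypothetical, and it assumes the script prints the transcript to standard output and exits non-zero on failure, neither of which the skill description confirms.

```bash
# Path to the skill's transcribe script; override via the environment if it lives elsewhere.
TRANSCRIBE="${TRANSCRIBE:-./skills/local-vosk/scripts/transcribe}"

# transcribe_dir DIR [LANG]
# Writes NAME.txt next to every NAME.ogg in DIR.
# Assumption: the script writes the transcript to stdout.
transcribe_dir() {
    local dir="$1" lang="${2:-en-us}" f out
    for f in "$dir"/*.ogg; do
        [ -e "$f" ] || continue      # no .ogg files: the glob stays literal, so skip it
        out="${f%.ogg}.txt"
        if "$TRANSCRIBE" "$f" --lang "$lang" > "$out"; then
            echo "wrote $out" >&2
        else
            echo "failed: $f" >&2
            rm -f "$out"             # do not leave a partial transcript behind
        fi
    done
}
```

After sourcing the snippet, `transcribe_dir ~/voice-notes` would process every `.ogg` file in that folder with the default `en-us` language, and `transcribe_dir ~/voice-notes LANG` would pass `LANG` through to `--lang`. Progress messages go to standard error so they never mix with transcript text.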