video-analyzer

TotalClaw 作者 totalclaw

下载、转录和分析来自 YouTube、X/Twitter 和 TikTok 的视频与本地 Whisper 处理。非常适合提取 TL、DR、时间戳和可行的见解。当要求转录视频时使用，总结 YouTube 视频、从播客或谈话中提取关键点、分析某人在其中所说的内容视频，从长视频中获取时间戳，或者当用户分享任何 YouTube 时， X/Twitter 或 TikTok 视频 URL，并想知道其中的内容。

安装 / 下载方式

TotalClaw CLI推荐

totalclaw install totalclaw:totalclaw~minilozio-video-analyzer-skill

cURL直接下载，无需登录

curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~minilozio-video-analyzer-skill/file -o minilozio-video-analyzer-skill.md

## 概述（中文）

下载、转录和分析来自 YouTube、X/Twitter 和 TikTok 的视频
与本地 Whisper 处理。非常适合提取 TL、DR、时间戳和
可行的见解。当要求转录视频时使用，总结 YouTube
视频、从播客或谈话中提取关键点、分析某人在其中所说的内容
视频，从长视频中获取时间戳，或者当用户分享任何 YouTube 时，
X/Twitter 或 TikTok 视频 URL，并想知道其中的内容。

## 原文

# Video Analyzer 🎥

A tool to download, transcribe, and analyze videos from any platform using a smart two-tier system (yt-dlp for fast subtitles, local whisper-cpp for robust fallback).

## How to Use

When the user asks you to summarize, transcribe, or download a video/audio from a URL, use the bundled python script:

```bash
uv run {baseDir}/scripts/analyze_video.py --action <ACTION> --url "<URL>" [--quality <normal|max>] [--lang <en|it|etc>]
```

### Supported Actions:
- `transcript`: Extracts the text with timestamps. **Use this when the user asks for a summary or transcript.**
- `download-video`: Downloads the video as MP4 to the Desktop.
- `download-audio`: Downloads the audio as M4A/MP3 to the Desktop.

### Analyzing a Video (IMPORTANT)

If the user asks for a **summary, analysis, or key moments**:
1. Run the script with `--action transcript`.
2. The script will output the path to a `.txt` file containing the timestamped transcript.
3. Read that file.
4. Output your response **EXACTLY** in this Markdown format:

```markdown
## 📝 TL;DR
[A punchy 3-sentence summary of the video's core message]

## ⏱️ Key Moments
- [MM:SS] [Brief description of what is discussed]
- [MM:SS] [Brief description of what is discussed]
- [MM:SS] [Brief description of what is discussed]
*(Extract 3 to 7 key moments depending on video length)*

## 💡 Actionable Insights
1. [Practical takeaway 1]
2. [Practical takeaway 2]
3. [Practical takeaway 3]

---
```

### Local Whisper Quality
If the script needs to fall back to Whisper (e.g., for X/Twitter videos), it uses `normal` by default:
- `normal`: Fast (~1 min for 30 min video) — **Default**
- `max`: Best quality (~5 min for 30 min video) — use `--quality max` when accuracy is critical

### Multilingual Support
All Whisper models are **multilingual by default**. The skill can transcribe videos in any language (Italian, Spanish, Japanese, etc.).

**IMPORTANT:** Always respond to the user in THEIR language, not the video's language. If the user speaks Italian but sends an English video, give them the summary in Italian. 

### Finding specific moments
The transcript includes **precise timestamps** like `[05:53] text...`. If the user asks "When do they talk about X?", grep the transcript and return the exact timestamp from the file.