agent-audit
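Audit your AI agent setup for performance, cost, and ROI. Scans OpenClaw config, cron jobs, session history, and model usage to find waste and suggest optimizations. Works with any model provider (Anthropic, OpenAI, Google, xAI, etc.). Use when: (1) the user says "audit my agents", "optimize my costs", "am I overspending on AI", "check my model usage", "agent audit", or "cost optimization"; (2) the user wants to know which cron jobs are expensive or cheap; (3) the user wants model-to-task matching recommendations; (4) the user wants an ROI analysis of their agent setup; (5) the user says "where am I wasting tokens".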
## Installation / Download
- TotalClaw CLI (recommended): `totalclaw install totalclaw:totalclaw~sharbelayy-agent-audit`
- cURL (direct download, no login required): `curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~sharbelayy-agent-audit/file -o sharbelayy-agent-audit.md`
## Original Text
# Agent Audit
Scan your entire OpenClaw setup and get actionable cost/performance recommendations.
## What This Skill Does
1. **Scans config** — reads OpenClaw config to map models to agents/tasks
2. **Analyzes cron history** — checks every cron job's model, token usage, runtime, success rate
3. **Classifies tasks** — determines complexity level of each task
4. **Calculates costs** — per agent, per cron, per task type using provider pricing
5. **Recommends changes** — with confidence levels and risk warnings
6. **Generates report** — markdown report with specific savings estimates
## Running the Audit
```bash
python3 {baseDir}/scripts/audit.py
```
Options:
```bash
python3 {baseDir}/scripts/audit.py --format markdown # Full report (default)
python3 {baseDir}/scripts/audit.py --format summary # Quick summary only
python3 {baseDir}/scripts/audit.py --dry-run # Show what would be analyzed
python3 {baseDir}/scripts/audit.py --output /path/to/report.md # Save to file
```
## How It Works
### Phase 1: Discovery
- Read OpenClaw config (`~/.openclaw/openclaw.json` or similar)
- List all cron jobs and their configurations
- List all agents and their default models
- Detect provider (Anthropic, OpenAI, Google, xAI) from model names
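Provider detection from model names can be approximated with a simple substring lookup. The sketch below is an illustration only: the `PROVIDER_HINTS` table and the `detect_provider` function are hypothetical names, not part of `audit.py`, and the hint list is an assumption you would extend for your own models.

```python
# Hypothetical substring -> provider hints; not taken from audit.py.
PROVIDER_HINTS = {
    "claude": "anthropic",
    "gpt": "openai",
    "gemini": "google",
    "grok": "xai",
}


def detect_provider(model: str) -> str:
    """Guess the provider from a model string such as 'anthropic/claude-opus-4'."""
    name = model.lower()
    # An explicit 'provider/model' prefix wins when it names a known provider.
    if "/" in name:
        prefix = name.split("/", 1)[0]
        if prefix in PROVIDER_HINTS.values():
            return prefix
    # Otherwise fall back to matching a family name inside the model string.
    for hint, provider in PROVIDER_HINTS.items():
        if hint in name:
            return provider
    return "unknown"


print(detect_provider("anthropic/claude-opus-4"))
print(detect_provider("gpt-4o-mini"))
```

Returning `"unknown"` rather than guessing keeps unrecognized models visible in the report instead of silently pricing them under the wrong provider.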
### Phase 2: History Analysis
- Pull cron job run history (last 7 days by default)
- Calculate per-job: avg tokens, avg runtime, success rate, model used
- Pull session history where available
- Calculate total token spend by model tier
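The per-job statistics listed above amount to a group-by over run records. The following is a minimal sketch under stated assumptions: the record fields (`job`, `model`, `tokens`, `seconds`, `ok`) and the sample data are invented for illustration, the real cron history format may differ, and the totals here are grouped by model string rather than by tier.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical run records; real cron history will use its own field names.
runs = [
    {"job": "health-check", "model": "anthropic/claude-opus-4", "tokens": 900, "seconds": 4.0, "ok": True},
    {"job": "health-check", "model": "anthropic/claude-opus-4", "tokens": 1100, "seconds": 6.0, "ok": False},
    {"job": "weekly-digest", "model": "anthropic/claude-haiku", "tokens": 8000, "seconds": 30.0, "ok": True},
]


def summarize(runs):
    """Return avg tokens, avg runtime, success rate, and model for each job."""
    by_job = defaultdict(list)
    for run in runs:
        by_job[run["job"]].append(run)
    summary = {}
    for job, items in by_job.items():
        summary[job] = {
            "model": items[-1]["model"],  # model used by the most recent run
            "avg_tokens": mean(r["tokens"] for r in items),
            "avg_seconds": mean(r["seconds"] for r in items),
            "success_rate": sum(r["ok"] for r in items) / len(items),
            "runs": len(items),
        }
    return summary


def tokens_by_model(runs):
    """Total tokens per model string across all runs."""
    totals = defaultdict(int)
    for run in runs:
        totals[run["model"]] += run["tokens"]
    return dict(totals)


print(summarize(runs)["health-check"])
print(tokens_by_model(runs))
```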
### Phase 3: Task Classification
Classify each task into complexity tiers:
| Tier | Examples | Recommended Models |
|------|----------|-------------------|
| **Simple** | Health checks, status reports, reminders, notifications | Cheapest tier (Haiku, GPT-4o-mini, Flash, Grok-mini) |
| **Medium** | Content drafts, research, summarization, data analysis | Mid tier (Sonnet, GPT-4o, Pro, Grok) |
| **Complex** | Coding, architecture, security review, nuanced writing | Top tier (Opus, GPT-4.5, Ultra, Grok-2) |
Classification signals:
- **Simple**: Short output (<500 tokens), low thinking requirement, repetitive pattern, status/health tasks
- **Medium**: Medium output, some reasoning needed, creative but templated, research tasks
- **Complex**: Long output, multi-step reasoning, code generation, security-critical, tasks that previously failed on weaker models
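The signals above could be combined into a rule-based classifier along the following lines. This is a sketch, not the skill's actual heuristic (see `references/task-classification.md` for that): the function name, its parameters, and the `LONG_OUTPUT_TOKENS` cutoff are assumptions, and only the 500-token threshold for simple tasks comes from the list above.

```python
SHORT_OUTPUT_TOKENS = 500    # threshold stated in the classification signals
LONG_OUTPUT_TOKENS = 2000    # assumed cutoff for "long output"; tune for your workload


def classify(avg_output_tokens, is_code=False, security_critical=False,
             failed_on_weaker_model=False, needs_reasoning=False):
    """Map coarse task signals to 'simple', 'medium', or 'complex'."""
    # Hard signals force the top tier regardless of output length.
    if is_code or security_critical or failed_on_weaker_model:
        return "complex"
    if avg_output_tokens >= LONG_OUTPUT_TOKENS:
        return "complex"
    # Short output with little reasoning is the simple tier.
    if avg_output_tokens < SHORT_OUTPUT_TOKENS and not needs_reasoning:
        return "simple"
    return "medium"


print(classify(300))                 # short status-style task
print(classify(300, is_code=True))   # code generation stays complex
```

Checking the hard signals first means a short coding task is never treated as simple just because its output is small.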
### Phase 4: Recommendations
For each task where the model tier doesn't match complexity:
```
⚠️ RECOMMENDATION: Downgrade "Knox Bot Health Check" from opus to haiku
Current: anthropic/claude-opus-4 ($15/M input, $75/M output)
Suggested: anthropic/claude-haiku ($0.25/M input, $1.25/M output)
Reason: Simple status check averaging 300 output tokens
Estimated savings: $X.XX/month
Risk: LOW — task is simple pattern matching
Confidence: HIGH
```
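To show how the "Estimated savings" figure might be derived, here is a worked sketch using the per-million-token prices from the example above. The 300 output tokens come from that example; the 1,000 input tokens per run and the hourly schedule (720 runs per month) are assumptions chosen for illustration, and the function names are hypothetical.

```python
# USD per million tokens, copied from the example recommendation.
PRICES = {
    "anthropic/claude-opus-4": {"input": 15.00, "output": 75.00},
    "anthropic/claude-haiku": {"input": 0.25, "output": 1.25},
}


def monthly_cost(model, input_tokens, output_tokens, runs_per_month):
    """Estimated monthly cost in USD for one job on one model."""
    price = PRICES[model]
    per_run = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    return per_run * runs_per_month


def monthly_savings(current, suggested, input_tokens, output_tokens, runs_per_month):
    """Difference in estimated monthly cost between two models for the same job."""
    return (monthly_cost(current, input_tokens, output_tokens, runs_per_month)
            - monthly_cost(suggested, input_tokens, output_tokens, runs_per_month))


savings = monthly_savings(
    "anthropic/claude-opus-4", "anthropic/claude-haiku",
    input_tokens=1_000,    # assumed average input per run
    output_tokens=300,     # from the example: ~300 output tokens
    runs_per_month=720,    # assumed hourly schedule
)
print(f"Estimated savings: ${savings:.2f}/month")
```

Under these assumptions the job costs about $27.00/month on Opus and about $0.45/month on Haiku, a difference of roughly $26.55/month.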
### Safety Rules — NEVER Recommend Downgrading:
- Coding/development tasks
- Security reviews or audits
- Tasks that have previously failed on weaker models
- Tasks where the user explicitly chose a higher model
- Complex multi-step reasoning tasks
- Anything the user flagged as critical
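One way to enforce these rules is a guard that runs before any downgrade is suggested and returns the reasons a task is protected. The sketch below is illustrative: the `Task` fields and category labels are hypothetical, not the data model used by `audit.py`.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task record; field names are assumptions for illustration.
    name: str
    category: str = "general"            # e.g. "coding", "security", "general"
    failed_on_weaker_model: bool = False
    user_pinned_model: bool = False      # user explicitly chose a higher model
    multi_step_reasoning: bool = False
    flagged_critical: bool = False


PROTECTED_CATEGORIES = {"coding", "security"}


def downgrade_blockers(task: Task) -> list[str]:
    """Return every safety rule that forbids downgrading this task."""
    reasons = []
    if task.category in PROTECTED_CATEGORIES:
        reasons.append(f"protected category: {task.category}")
    if task.failed_on_weaker_model:
        reasons.append("previously failed on a weaker model")
    if task.user_pinned_model:
        reasons.append("user explicitly chose a higher model")
    if task.multi_step_reasoning:
        reasons.append("complex multi-step reasoning")
    if task.flagged_critical:
        reasons.append("flagged as critical by the user")
    return reasons


print(downgrade_blockers(Task("Health Check")))
print(downgrade_blockers(Task("PR Review", category="coding", user_pinned_model=True)))
```

An empty list means a downgrade may be considered; a non-empty list can be copied into the report so the user sees why a task was left alone.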
### Phase 5: Report Generation
Output a clean markdown report with:
1. **Overview** — total agents, crons, monthly spend estimate
2. **Per-agent breakdown** — model, usage, cost
3. **Per-cron breakdown** — model, frequency, avg tokens, cost
4. **Recommendations** — sorted by savings potential
5. **Total potential savings** — monthly estimate
6. **One-liner config changes** — exact model strings to swap
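The recommendations and total-savings sections of the report could be rendered roughly as follows. This is a sketch only: the dictionary keys and the exact markdown layout are assumptions, not the format `audit.py` emits.

```python
def render_recommendations(recs):
    """Render recommendations as markdown, largest monthly savings first."""
    ordered = sorted(recs, key=lambda r: r["monthly_savings"], reverse=True)
    lines = ["## Recommendations", ""]
    for i, rec in enumerate(ordered, 1):
        lines.append(
            f"{i}. {rec['task']}: `{rec['current']}` -> `{rec['suggested']}` "
            f"(${rec['monthly_savings']:.2f}/month)"
        )
    total = sum(rec["monthly_savings"] for rec in ordered)
    lines += ["", f"**Total potential savings:** ${total:.2f}/month"]
    return "\n".join(lines)


# Hypothetical input records for illustration.
recs = [
    {"task": "Daily Reminder", "current": "anthropic/claude-opus-4",
     "suggested": "anthropic/claude-haiku", "monthly_savings": 1.50},
    {"task": "Health Check", "current": "anthropic/claude-opus-4",
     "suggested": "anthropic/claude-haiku", "monthly_savings": 26.55},
]
print(render_recommendations(recs))
```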
## Model Pricing Reference
See [references/model-pricing.md](references/model-pricing.md) for current pricing across all providers.
Update this file when prices change.
## Task Classification Details
See [references/task-classification.md](references/task-classification.md) for detailed heuristics
on how tasks are classified into complexity tiers.
## Important Notes
- This skill is **read-only** — it never changes your config automatically
- All recommendations include a risk level and a confidence level
- When unsure about a task's complexity, it defaults to keeping the current model
- The audit should be re-run periodically (monthly) as usage patterns change
- Token counts are estimates based on cron history — actual costs depend on your provider's billing