agent-audit

TotalClaw · Author: totalclaw

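Audit your AI agent setup for performance, cost, and ROI. Scans OpenClaw config, cron jobs, session history, and model usage to find waste and recommend optimizations. Works with any model provider (Anthropic, OpenAI, Google, xAI, etc.). Use when: (1) the user says "audit my agents", "optimize my costs", "am I overspending on AI", "check my model usage", "agent audit", or "cost optimization"; (2) the user wants to know which cron jobs are expensive and which are cheap; (3) the user wants model-to-task matching recommendations; (4) the user wants an ROI analysis of their agent setup; (5) the user says "where am I wasting tokens".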

Installation / Download

TotalClaw CLI (recommended):
totalclaw install totalclaw:totalclaw~sharbelayy-agent-audit

cURL (direct download, no login required):
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~sharbelayy-agent-audit/file -o sharbelayy-agent-audit.md

## Original text

# Agent Audit

Scan your entire OpenClaw setup and get actionable cost/performance recommendations.

## What This Skill Does

1. **Scans config** — reads OpenClaw config to map models to agents/tasks
2. **Analyzes cron history** — checks every cron job's model, token usage, runtime, success rate
3. **Classifies tasks** — determines complexity level of each task
4. **Calculates costs** — per agent, per cron, per task type using provider pricing
5. **Recommends changes** — with confidence levels and risk warnings
6. **Generates report** — markdown report with specific savings estimates

## Running the Audit

```bash
python3 {baseDir}/scripts/audit.py
```

Options:
```bash
python3 {baseDir}/scripts/audit.py --format markdown    # Full report (default)
python3 {baseDir}/scripts/audit.py --format summary     # Quick summary only
python3 {baseDir}/scripts/audit.py --dry-run             # Show what would be analyzed
python3 {baseDir}/scripts/audit.py --output /path/to/report.md  # Save to file
```
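
The flags above could be parsed along the following lines. This is a minimal sketch, not the actual contents of `scripts/audit.py`: the `build_parser` helper, the help strings, and the defaults other than `--format markdown` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed above."""
    parser = argparse.ArgumentParser(prog="audit.py")
    parser.add_argument("--format", choices=["markdown", "summary"],
                        default="markdown",
                        help="full markdown report or quick summary")
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would be analyzed, then exit")
    parser.add_argument("--output", metavar="PATH", default=None,
                        help="save the report to PATH")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--format", "summary", "--dry-run"])
print(args.format, args.dry_run, args.output)  # summary True None
```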

## How It Works

### Phase 1: Discovery
- Read OpenClaw config (`~/.openclaw/openclaw.json` or similar)
- List all cron jobs and their configurations
- List all agents and their default models
- Detect provider (Anthropic, OpenAI, Google, xAI) from model names
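
Provider detection from a model name can be as simple as a substring lookup. The sketch below assumes model strings look like `anthropic/claude-opus-4` or `gpt-4o-mini`; the `PROVIDER_HINTS` table and the `detect_provider` function are hypothetical, and the real script may use different rules.

```python
# Assumed substring -> provider mapping, for illustration only.
PROVIDER_HINTS = {
    "claude": "Anthropic",
    "gpt": "OpenAI",
    "gemini": "Google",
    "grok": "xAI",
}


def detect_provider(model: str) -> str:
    """Guess the provider from a model string."""
    name = model.lower()
    # An explicit "provider/model" prefix wins when it names a known provider.
    if "/" in name:
        prefix = name.split("/", 1)[0]
        for provider in PROVIDER_HINTS.values():
            if provider.lower() == prefix:
                return provider
    # Otherwise fall back to a substring match on the model family.
    for hint, provider in PROVIDER_HINTS.items():
        if hint in name:
            return provider
    return "unknown"
```

Returning `"unknown"` instead of guessing keeps unrecognized models out of the cost estimates until someone adds them to the table.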

### Phase 2: History Analysis
- Pull cron job run history (last 7 days by default)
- Calculate per-job: avg tokens, avg runtime, success rate, model used
- Pull session history where available
- Calculate total token spend by model tier
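
The per-job metrics reduce to a group-and-average pass over the run history. The sketch below assumes each run has already been loaded into a `Run` record; that record, its field names, and the `summarize` function are illustrative and do not reflect OpenClaw's actual history format.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One cron run; field names are assumptions for this sketch."""
    job: str
    model: str
    tokens: int
    runtime_s: float
    ok: bool


def summarize(runs: list[Run]) -> dict[str, dict]:
    """Group runs by job and compute avg tokens, avg runtime, success rate."""
    by_job: dict[str, list[Run]] = {}
    for run in runs:
        by_job.setdefault(run.job, []).append(run)
    return {
        job: {
            # Assumes runs are ordered oldest first, so the last one is current.
            "model": rs[-1].model,
            "avg_tokens": mean(r.tokens for r in rs),
            "avg_runtime_s": mean(r.runtime_s for r in rs),
            "success_rate": sum(r.ok for r in rs) / len(rs),
        }
        for job, rs in by_job.items()
    }
```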

### Phase 3: Task Classification
Classify each task into complexity tiers:

| Tier | Examples | Recommended Models |
|------|----------|-------------------|
| **Simple** | Health checks, status reports, reminders, notifications | Cheapest tier (Haiku, GPT-4o-mini, Flash, Grok-mini) |
| **Medium** | Content drafts, research, summarization, data analysis | Mid tier (Sonnet, GPT-4o, Pro, Grok) |
| **Complex** | Coding, architecture, security review, nuanced writing | Top tier (Opus, GPT-4.5, Ultra, Grok-2) |

Classification signals:
- **Simple**: Short output (<500 tokens), low thinking requirement, repetitive pattern, status/health tasks
- **Medium**: Medium output, some reasoning needed, creative but templated, research tasks
- **Complex**: Long output, multi-step reasoning, code generation, security-critical, tasks that previously failed on weaker models
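
These signals can be combined into a small rule-based classifier. The sketch below is one possible reading of the list: only the 500-token bound comes from the text above, while the `TaskSignals` fields, the 2,000-token "long output" threshold, and the rule order are assumptions. See the task classification reference for the heuristics the skill actually uses.

```python
from dataclasses import dataclass


@dataclass
class TaskSignals:
    """Signals for one task; fields and thresholds are illustrative."""
    avg_output_tokens: int
    generates_code: bool = False
    security_critical: bool = False
    failed_on_weaker_model: bool = False
    multi_step_reasoning: bool = False
    repetitive: bool = False


def classify(s: TaskSignals, long_output: int = 2000) -> str:
    """Return 'simple', 'medium', or 'complex'."""
    # Any complex signal wins, so risky tasks are never under-classified.
    if (s.generates_code or s.security_critical or s.failed_on_weaker_model
            or s.multi_step_reasoning or s.avg_output_tokens >= long_output):
        return "complex"
    # Short output alone is not enough here; also require a repetitive pattern.
    if s.avg_output_tokens < 500 and s.repetitive:
        return "simple"
    return "medium"
```

Checking the complex signals first means a short but security-critical task still lands in the top tier.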

### Phase 4: Recommendations
For each task where the model tier doesn't match complexity:

```
⚠️ RECOMMENDATION: Downgrade "Knox Bot Health Check" from opus to haiku
   Current: anthropic/claude-opus-4 ($15/M input, $75/M output)
   Suggested: anthropic/claude-haiku ($0.25/M input, $1.25/M output)
   Reason: Simple status check averaging 300 output tokens
   Estimated savings: $X.XX/month
   Risk: LOW — task is simple pattern matching
   Confidence: HIGH
```
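
The savings figure is the difference between two monthly cost estimates. The sketch below uses the per-million-token prices from the example recommendation and its 300-token average output. The run frequency (hourly, 720 runs in a 30-day month) and the 2,000 input tokens per run are assumed values chosen only to make the arithmetic concrete.

```python
def monthly_cost(runs_per_month: int, input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Monthly cost in dollars; prices are dollars per million tokens."""
    per_run = (input_tokens * input_price
               + output_tokens * output_price) / 1_000_000
    return runs_per_month * per_run


# Assumed workload: hourly job, 2,000 input and 300 output tokens per run.
RUNS, IN_TOK, OUT_TOK = 720, 2_000, 300

current = monthly_cost(RUNS, IN_TOK, OUT_TOK, 15.00, 75.00)    # opus prices
suggested = monthly_cost(RUNS, IN_TOK, OUT_TOK, 0.25, 1.25)    # haiku prices
print(f"Estimated savings: ${current - suggested:.2f}/month")
# Estimated savings: $37.17/month
```

Under these assumptions the job costs about $37.80 per month on the current model and about $0.63 on the suggested one.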

### Safety Rules — NEVER Recommend Downgrading:
- Coding/development tasks
- Security reviews or audits
- Tasks that have previously failed on weaker models
- Tasks where the user explicitly chose a higher model
- Complex multi-step reasoning tasks
- Anything the user flagged as critical
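
One way to enforce these rules is a guard that runs before any downgrade recommendation is emitted. The sketch below is hypothetical: the `Task` fields, the category names, and both function names are assumptions, not the skill's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Minimal task record; all fields are illustrative."""
    name: str
    category: str = "other"  # e.g. "coding", "security", "status"
    failed_on_weaker_model: bool = False
    user_pinned_model: bool = False
    multi_step_reasoning: bool = False
    critical: bool = False


PROTECTED_CATEGORIES = {"coding", "security"}


def downgrade_blockers(task: Task) -> list[str]:
    """Return every safety rule that forbids downgrading this task."""
    reasons = []
    if task.category in PROTECTED_CATEGORIES:
        reasons.append(f"{task.category} task")
    if task.failed_on_weaker_model:
        reasons.append("previously failed on a weaker model")
    if task.user_pinned_model:
        reasons.append("user explicitly chose a higher model")
    if task.multi_step_reasoning:
        reasons.append("complex multi-step reasoning")
    if task.critical:
        reasons.append("flagged as critical")
    return reasons


def may_downgrade(task: Task) -> bool:
    """A task is eligible only when no safety rule applies."""
    return not downgrade_blockers(task)
```

Returning the list of reasons, not just a boolean, lets the report explain why a costly task was left alone.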

### Phase 5: Report Generation
Output a clean markdown report with:
1. **Overview** — total agents, crons, monthly spend estimate
2. **Per-agent breakdown** — model, usage, cost
3. **Per-cron breakdown** — model, frequency, avg tokens, cost
4. **Recommendations** — sorted by savings potential
5. **Total potential savings** — monthly estimate
6. **One-liner config changes** — exact model strings to swap
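
The recommendations and total-savings sections can be rendered from a list of recommendation records. The sketch below assumes a hypothetical `Recommendation` record and table layout; the format the skill actually emits may differ.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    """One suggested model change; field names are illustrative."""
    task: str
    current_model: str
    suggested_model: str
    monthly_savings: float
    risk: str
    confidence: str


def render_recommendations(recs: list[Recommendation]) -> str:
    """Render a markdown table sorted by savings, plus a total line."""
    ordered = sorted(recs, key=lambda r: r.monthly_savings, reverse=True)
    lines = [
        "## Recommendations",
        "",
        "| Task | Current | Suggested | Savings/month | Risk | Confidence |",
        "|------|---------|-----------|---------------|------|------------|",
    ]
    for r in ordered:
        lines.append(
            f"| {r.task} | {r.current_model} | {r.suggested_model} "
            f"| ${r.monthly_savings:.2f} | {r.risk} | {r.confidence} |"
        )
    total = sum(r.monthly_savings for r in ordered)
    lines += ["", f"**Total potential savings:** ${total:.2f}/month"]
    return "\n".join(lines)
```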

## Model Pricing Reference

See [references/model-pricing.md](references/model-pricing.md) for current pricing across all providers.
Update this file when prices change.

## Task Classification Details

See [references/task-classification.md](references/task-classification.md) for detailed heuristics
on how tasks are classified into complexity tiers.

## Important Notes

- This skill is **read-only** — it never changes your config automatically
- All recommendations include a risk level and a confidence level
- When unsure about a task's complexity, it defaults to keeping the current model
- Re-run the audit periodically (for example, monthly), since usage patterns change
- Token counts are estimates based on cron history — actual costs depend on your provider's billing