skill-health-monitor

TotalClaw (author: totalclaw)

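Monitors deployed skills for performance drift, errors, and unexpected behavior changes. Provides continuous post-deployment health checks with alerting and trend tracking.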

Installation / Download

- TotalClaw CLI (recommended): `totalclaw install totalclaw:totalclaw~trypto1019-arc-skill-health-monitor`
- cURL (direct download, no login required): `curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~trypto1019-arc-skill-health-monitor/file -o trypto1019-arc-skill-health-monitor.md`

# Skill Health Monitor

Catch skill degradation before it becomes a crisis. Monitors response times, error rates, output drift, and resource usage for deployed skills.

## Why This Exists

Skills work fine during testing, then silently degrade in production. Free models change behavior, APIs add latency, memory leaks accumulate. By the time you notice, your agent has been running on broken skills for hours.

## Commands

### Monitor a skill execution
```bash
python3 {baseDir}/scripts/health_monitor.py check --skill <name> --cmd "python3 path/to/script.py"
```
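
To illustrate what a single `check` run might record, here is a minimal sketch that times one command and captures its exit status. The internals of `health_monitor.py` are not shown in this document, so the function name `run_check`, the record fields, and the 60-second timeout are all assumptions made for illustration.

```python
import shlex
import subprocess
import sys
import time


def run_check(skill, cmd, timeout_s=60.0):
    """Run `cmd` once and return a health record (hypothetical schema)."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            shlex.split(cmd), capture_output=True, text=True, timeout=timeout_s
        )
        ok = proc.returncode == 0
        error = None if ok else f"exit code {proc.returncode}"
        stdout = proc.stdout
    except subprocess.TimeoutExpired:
        ok, error, stdout = False, "timeout", ""
    except OSError as exc:  # e.g. the executable does not exist
        ok, error, stdout = False, type(exc).__name__, ""
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {
        "skill": skill,
        "ok": ok,
        "error": error,
        "latency_ms": latency_ms,
        "stdout": stdout,
    }


if __name__ == "__main__":
    # Demo: run the current Python interpreter as the monitored command.
    demo_cmd = f"{shlex.quote(sys.executable)} -c \"print('ok')\""
    record = run_check("demo", demo_cmd)
    print(record["ok"], f"{record['latency_ms']:.0f} ms")
```

Measuring with `time.perf_counter()` around the whole subprocess call includes process start-up time, which is usually what matters for an agent waiting on a skill.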

### View health dashboard
```bash
python3 {baseDir}/scripts/health_monitor.py dashboard
```

### Set alert thresholds
```bash
python3 {baseDir}/scripts/health_monitor.py threshold --skill <name> --max-latency 5000 --max-errors 3
```
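
The sketch below shows one way such thresholds could be evaluated against recorded runs. It assumes `--max-latency` is in milliseconds and `--max-errors` is a count of failed runs within whatever window is being evaluated; the document does not state either. The `Thresholds` class and the record fields `ok` and `latency_ms` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    max_latency_ms: float = 5000.0  # assumed unit: milliseconds
    max_errors: int = 3             # assumed meaning: failed runs allowed


def evaluate(records, thresholds):
    """Return a list of human-readable alerts for the given run records."""
    alerts = []
    slow = [r for r in records if r["latency_ms"] > thresholds.max_latency_ms]
    errors = sum(1 for r in records if not r["ok"])
    if slow:
        worst = max(r["latency_ms"] for r in slow)
        alerts.append(
            f"latency: {len(slow)} run(s) over "
            f"{thresholds.max_latency_ms:.0f} ms (worst {worst:.0f} ms)"
        )
    if errors > thresholds.max_errors:
        alerts.append(
            f"errors: {errors} failures exceed limit of {thresholds.max_errors}"
        )
    return alerts
```

Keeping the thresholds in a small per-skill object makes it straightforward to store different limits for each skill.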

### Export health report
```bash
python3 {baseDir}/scripts/health_monitor.py report --json
```

### View trends for a skill
```bash
python3 {baseDir}/scripts/health_monitor.py trend --skill <name> --period 24h
```
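
As a rough illustration of how a period such as `24h` could be turned into a time window, the sketch below parses the period string and computes the share of successful runs inside it. Only the `h` and `d` suffixes appear in this document, so only those are handled; the function names and the `at` and `ok` record fields are assumptions.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"h": "hours", "d": "days"}


def parse_period(text):
    """Parse strings such as '1h', '24h' or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([hd])", text.strip())
    if not match:
        raise ValueError(f"unsupported period: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def uptime(records, period, now=None):
    """Fraction of successful runs within the period, or None if no runs."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - parse_period(period)
    recent = [r for r in records if r["at"] >= cutoff]
    if not recent:
        return None
    return sum(1 for r in recent if r["ok"]) / len(recent)
```

Returning `None` for an empty window keeps "no data" distinct from "0% uptime", which matters when a skill simply has not been invoked.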

## What It Tracks

- **Latency**: Execution time per invocation, p50/p95/p99 percentiles
- **Error rate**: Failed executions, error types, frequency
- **Output drift**: Detects when output format or content changes unexpectedly
- **Resource usage**: Memory and CPU at execution time
- **Uptime**: Availability over time windows (1h, 24h, 7d)
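
Two of these metrics can be sketched compactly: latency percentiles and a format check for output drift. The example uses the nearest-rank percentile method and, for drift, compares a structural "fingerprint" of JSON output (keys and value types, ignoring the values themselves). Both choices are assumptions for illustration; the monitor may compute percentiles and detect drift differently.

```python
import json
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def shape(value):
    """Reduce a decoded JSON value to its structure: keys and type names."""
    if isinstance(value, dict):
        return {key: shape(item) for key, item in sorted(value.items())}
    if isinstance(value, list):
        # Only the first element is inspected; lists are assumed homogeneous.
        return [shape(value[0])] if value else []
    return type(value).__name__


def fingerprint(output):
    """Structural fingerprint of a skill's output; non-JSON output maps to 'text'."""
    try:
        return json.dumps(shape(json.loads(output)), sort_keys=True)
    except json.JSONDecodeError:
        return "text"


def has_drifted(baseline_output, new_output):
    """True when the output structure differs from the baseline."""
    return fingerprint(baseline_output) != fingerprint(new_output)
```

A structural fingerprint flags format changes, such as a field switching from number to string, without raising alerts every time the content legitimately varies.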

## Alerting

- Console alerts when thresholds are exceeded
- JSON webhook support for external integrations
- Configurable per-skill thresholds
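
For the webhook integration, a receiving service would typically get an HTTP POST with a JSON body. The sketch below builds such a request with the standard library. The payload fields and the function names `build_alert`, `build_request`, and `send_alert` are hypothetical; the actual payload schema is not documented here.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_alert(skill, metric, value, threshold):
    """Assemble an alert payload (hypothetical schema)."""
    return {
        "skill": skill,
        "metric": metric,
        "value": value,
        "threshold": threshold,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def build_request(url, alert):
    """Create a JSON POST request without sending it."""
    body = json.dumps(alert).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(url, alert, timeout_s=5.0):
    """Send the alert and return the HTTP status code."""
    with urllib.request.urlopen(build_request(url, alert), timeout=timeout_s) as resp:
        return resp.status
```

Separating request construction from sending makes the payload easy to inspect or log before anything leaves the machine.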

## Data Storage

Health data is stored in `~/.openclaw/health/` as JSON files. One file per skill, rotated daily.
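
A minimal sketch of what per-skill, daily-rotated storage could look like follows. The file naming pattern `<skill>-<YYYY-MM-DD>.json` and the list-of-records layout are assumptions; the document only states the directory, the JSON format, and the daily rotation.

```python
import json
from datetime import date
from pathlib import Path


def daily_path(skill, day, base=None):
    """Path of the file holding `skill` records for `day` (assumed naming scheme)."""
    base = base or Path("~/.openclaw/health").expanduser()
    return Path(base) / f"{skill}-{day.isoformat()}.json"


def append_record(skill, record, day=None, base=None):
    """Append one record to the skill's file for the given day, creating it if needed."""
    path = daily_path(skill, day or date.today(), base)
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.exists():
        records = json.loads(path.read_text(encoding="utf-8"))
    else:
        records = []
    records.append(record)
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return path
```

Because the date is part of the file name, rotation happens implicitly: the first record written on a new day creates a new file.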