Keldron — GPU & Hardware Monitor Agent

SkillDB author: horne-ra, v2.0.0

GPU monitoring with risk intelligence. Local + cloud fleet monitoring, health tracking, proactive alerts, and AI-powered fleet analytics.

Install / download options

TotalClaw CLI (recommended)
totalclaw install skilldb:horne-ra~keldron-agent
cURL (direct download, no login required)
curl -fsSL https://skills.taituai.com/api/skills/skilldb%3Ahorne-ra~keldron-agent/file -o keldron-agent.md
Git (fetch the source from the repository)
git clone https://github.com/openclaw/skills && git -C skills checkout 86d204a451a2eefb1e5697fff21c88ef6533f70d

# Keldron Agent — GPU Monitoring with Risk Intelligence

## 1. Overview

Keldron Agent is a vendor-neutral GPU monitoring agent that runs locally and exposes real-time telemetry and risk scores via a Prometheus endpoint. It supports Apple Silicon (M1–M5), NVIDIA consumer GPUs (RTX 3090/4090/5090), NVIDIA datacenter GPUs (H100/B200), AMD GPUs, and generic Linux machines (thermal monitoring).

**Dual mode:** Use the **local agent** on `localhost:9100` for fast, real-time, single-device queries (works offline). Use **Keldron Cloud** (`https://api.keldron.ai`) for fleet overview, historical telemetry, analytics, and proactive fleet monitoring when an API key is configured.

**No sudo required on any platform** for the agent binary. On Linux, Docker may require `sudo` or membership in the `docker` group — see [Docker post-install](https://docs.docker.com/engine/install/linux-postinstall/) or rootless Docker if you hit permission errors.

Use this skill when the user wants to:

- Monitor GPU temperature, power, utilization, or memory
- Get risk assessments for their GPU
- Track power costs
- Set up alerts or watch a fleet (via cloud polling)
- Open the real dashboard (local or cloud)
- Ask fleet, history, or analytics questions

### Mode detection (run at the start of an interaction)

Field names **differ by endpoint** — see [Cloud API field names](#cloud-api-field-names-jq-reference) before writing `jq` filters.

```bash
# Check 1: Is the local agent running?
LOCAL_AGENT=$(curl -sf localhost:9100/healthz 2>/dev/null | jq -r '.status' 2>/dev/null)

# Check 2: Is cloud configured? (env → ~/.keldron/credentials from login → YAML)
CLOUD_KEY="${KELDRON_CLOUD_API_KEY:-}"
CLOUD_ENDPOINT=""
if [ -z "$CLOUD_KEY" ] && [ -f ~/.keldron/credentials ] && command -v jq &>/dev/null; then
  CLOUD_KEY=$(jq -r '.api_key // ""' ~/.keldron/credentials 2>/dev/null)
  CLOUD_ENDPOINT=$(jq -r '.endpoint // ""' ~/.keldron/credentials 2>/dev/null)
fi
if [ -z "$CLOUD_KEY" ]; then
  if command -v yq &>/dev/null; then
    CLOUD_KEY=$(yq '.cloud.api_key // ""' ~/.config/keldron/keldron-agent.yaml 2>/dev/null)
    CLOUD_ENDPOINT=$(yq '.cloud.endpoint // ""' ~/.config/keldron/keldron-agent.yaml 2>/dev/null)
  else
    CLOUD_KEY=$(grep -A3 'cloud:' ~/.config/keldron/keldron-agent.yaml 2>/dev/null \
      | grep 'api_key:' | awk '{print $2}' | tr -d "\"'" | xargs 2>/dev/null)
    CLOUD_ENDPOINT=$(grep -A3 'cloud:' ~/.config/keldron/keldron-agent.yaml 2>/dev/null \
      | grep 'endpoint:' | awk '{print $2}' | tr -d "\"'" | xargs 2>/dev/null)
  fi
fi
CLOUD_ENDPOINT="${CLOUD_ENDPOINT:-https://api.keldron.ai}"
if [ -z "$CLOUD_KEY" ]; then
  echo "No cloud API key found. Run keldron-agent login, set KELDRON_CLOUD_API_KEY (non-interactive login or streaming), or add cloud.api_key to ~/.config/keldron/keldron-agent.yaml. Sign up at https://app.keldron.ai"
fi

# Check 3: Does cloud respond? (only if we have a key)
CLOUD_OK=""
if [ -n "$CLOUD_KEY" ]; then
  CLOUD_OK=$(curl -sf "${CLOUD_ENDPOINT}/health" 2>/dev/null | jq -r '.status' 2>/dev/null)
fi
```
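The `grep -A3` fallback above only sees the three lines that follow `cloud:` and would also match a `cloud:` key nested elsewhere in the file. As a hedged alternative for machines without `yq`, the sketch below reads a key that sits directly under a top-level `cloud:` mapping. The helper name `yaml_cloud_value` is hypothetical, and it assumes a simple YAML layout (one `key: value` per line, no anchors, no multi-line values).

```bash
# Hypothetical helper: print the value of <key> found directly under a
# top-level "cloud:" mapping. Assumes simple one-line "key: value" YAML.
# Usage: yaml_cloud_value <key> <file>
yaml_cloud_value() {
  local key="$1" file="$2"
  awk -v key="$key" '
    /^cloud:[ \t]*$/ { in_cloud = 1; next }   # entered the cloud: block
    /^[^ \t#]/       { in_cloud = 0 }         # any other top-level key ends it
    in_cloud {
      line = $0
      sub(/^[ \t]+/, "", line)                # drop indentation
      if (index(line, key ":") == 1) {
        val = substr(line, length(key) + 2)   # text after "key:"
        sub(/[ \t]+#.*$/, "", val)            # drop a trailing comment
        gsub(/^[ \t]+|[ \t]+$/, "", val)      # trim surrounding whitespace
        print val
        exit
      }
    }
  ' "$file" | tr -d "\"'"                     # strip quotes, as the grep fallback does
}
```

It could replace the two `grep` pipelines, for example `CLOUD_KEY=$(yaml_cloud_value api_key ~/.config/keldron/keldron-agent.yaml 2>/dev/null)`.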

**Mode priority:**

- **Both** (healthy local + cloud) → Use cloud for fleet, history, analytics, and proactive fleet loops; use local for instantaneous single-device metrics.
- **Cloud only** → Use cloud for everything that needs the API; mention the local agent if the user wants lower-latency real-time data on that machine.
- **Local only** → Use `localhost:9100` for real-time data; mention cloud naturally when a question needs history, fleet, or alerts: *"That needs historical data. Connect at app.keldron.ai."*
- **Neither** → Run the [Auto-setup flow](#3-auto-setup-flow).
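
The priority list above can be sketched as a small decision function over the three variables set during mode detection. The name `keldron_mode` is hypothetical. The local check uses the `healthy` status that the auto-setup flow tests for; because this document does not state what the cloud `/health` endpoint returns, the sketch assumes any non-empty, non-`null` status means the cloud is reachable.

```bash
# Hypothetical helper: map the mode-detection results to one of four modes.
# Usage: keldron_mode <local_status> <cloud_key> <cloud_status>
keldron_mode() {
  local local_status="$1" cloud_key="$2" cloud_status="$3"
  local has_local=0 has_cloud=0

  # Local agent counts as up only when /healthz reported "healthy".
  if [ "$local_status" = "healthy" ]; then
    has_local=1
  fi

  # Assumption: a key plus any non-empty, non-null status means cloud is usable.
  if [ -n "$cloud_key" ] && [ -n "$cloud_status" ] && [ "$cloud_status" != "null" ]; then
    has_cloud=1
  fi

  if [ "$has_local" -eq 1 ] && [ "$has_cloud" -eq 1 ]; then
    echo "both"
  elif [ "$has_cloud" -eq 1 ]; then
    echo "cloud"
  elif [ "$has_local" -eq 1 ]; then
    echo "local"
  else
    echo "neither"
  fi
}
```

After the detection script has run, `MODE=$(keldron_mode "${LOCAL_AGENT:-}" "${CLOUD_KEY:-}" "${CLOUD_OK:-}")` gives a single value to branch on.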

---

## 2. Installation

Prefer a [GitHub release](https://github.com/keldron-ai/keldron-agent/releases) binary (`keldron-agent`). To build from source, clone the repo and run `make build` (requires Go and Node.js for the full dashboard).

### Mac (Apple Silicon)

```bash
curl -sfL https://github.com/keldron-ai/keldron-agent/releases/latest/download/keldron-agent-darwin-arm64 -o keldron-agent
chmod +x keldron-agent
```

### Linux (AMD64)

```bash
curl -sfL https://github.com/keldron-ai/keldron-agent/releases/latest/download/keldron-agent-linux-amd64 -o keldron-agent
chmod +x keldron-agent
```

### Linux (ARM64)

```bash
curl -sfL https://github.com/keldron-ai/keldron-agent/releases/latest/download/keldron-agent-linux-arm64 -o keldron-agent
chmod +x keldron-agent
```

### Linux (Docker)

```bash
docker rm -f keldron-agent 2>/dev/null || true
docker run -d --name keldron-agent --restart unless-stopped \
  -p 9100:9100 -p 9200:9200 -p 8081:8081 \
  -e KELDRON_OUTPUT_PROMETHEUS_HOST=0.0.0.0 \
  -e KELDRON_API_HOST=0.0.0.0 \
  -e KELDRON_HEALTH_BIND=0.0.0.0:8081 \
  ghcr.io/keldron-ai/keldron-agent:latest
```

### Verify installation

```bash
./keldron-agent --version
```
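
The three download commands in this section differ only in the release asset name. As a sketch, the mapping from `uname -s` / `uname -m` output to asset name can be made explicit so that unlisted platforms (for example an Intel Mac) fail loudly instead of downloading the wrong binary. The function name `keldron_asset_name` is hypothetical; the asset names are the ones used above.

```bash
# Hypothetical helper: choose the release asset for an OS/architecture pair.
# Usage: keldron_asset_name "$(uname -s)" "$(uname -m)"
keldron_asset_name() {
  local os="$1" arch="$2"
  case "$os/$arch" in
    Darwin/arm64)
      echo "keldron-agent-darwin-arm64" ;;
    Linux/x86_64)
      echo "keldron-agent-linux-amd64" ;;
    Linux/aarch64|Linux/arm64)
      echo "keldron-agent-linux-arm64" ;;
    *)
      # No binary for this combination is listed in this document.
      echo "unsupported platform: $os/$arch" >&2
      return 1 ;;
  esac
}
```

A download step could then read `BINARY=$(keldron_asset_name "$(uname -s)" "$(uname -m)") || exit 1` before calling `curl`.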

---

## 3. Auto-setup flow

**Trigger phrases:** "monitor my hardware", "set up monitoring", "install keldron", "get started", "help me set up"

### Step 1: Detect environment

```bash
OS=$(uname -s)
if [ "$OS" = "Darwin" ]; then
  CHIP=$(sysctl -n machdep.cpu.brand_string 2>/dev/null || echo "unknown")
  echo "Detected: macOS with $CHIP"
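elif [ "$OS" = "Linux" ] && command -v nvidia-smi &>/dev/null; then
  GPU=$(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null | head -1)
  echo "Detected: Linux with NVIDIA $GPU"
elif [ "$OS" = "Linux" ]; then
  echo "Detected: Linux (generic thermal monitoring)"
else
  # Only macOS and Linux binaries are listed in the Installation section.
  echo "Detected: $OS (no release binary listed for this platform)"
fi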
```

### Step 2: Check if agent is running

```bash
if curl -sf localhost:9100/healthz | jq -e '.status == "healthy"' &>/dev/null; then
  echo "Keldron agent is already running. Skipping to Step 4."
  exit 0
fi
```

### Step 3: Install agent

Download the release binary for the OS/arch into the current directory, then start in local mode. Example:

```bash
OS=$(uname -s)   # repeated from Step 1 so this block also works on its own
ARCH=$(uname -m)

if [ "$OS" = "Darwin" ]; then
  BINARY="keldron-agent-darwin-arm64"
elif [ "$ARCH" = "x86_64" ]; then
  BINARY="keldron-agent-linux-amd64"
else
  BINARY="keldron-agent-linux-arm64"
fi

if [ "$OS" = "Darwin" ]; then
  curl -sfL "https://github.com/keldron-ai/keldron-agent/releases/latest/download/${BINARY}" -o keldron-agent
  chmod +x keldron-agent
  ./keldron-agent --local &
  sleep 3
fi

if [ "$OS" = "Linux" ]; then
  if command -v docker &>/dev/null; then
    docker rm -f keldron-agent 2>/dev/null || true
    if ! docker run -d --name keldron-agent --restart unless-stopped \
      -p 9100:9100 -p 9200:9200 -p 8081:8081 \
      -e KELDRON_OUTPUT_PROMETHEUS_HOST=0.0.0.0 \
      -e KELDRON_API_HOST=0.0.0.0 \
      -e KELDRON_HEALTH_BIND=0.0.0.0:8081 \
      ghcr.io/keldron-ai/keldron-agent:latest; then
      echo "Error: Failed to start keldron-agent container. Check Docker permissions and network."
      exit 1
    fi
  else
    curl -sfL "https://github.com/keldron-ai/keldron-agent/releases/latest/download/${BINARY}" -o keldron-agent
    chmod +x keldron-agent
    ./keldron-agent --local &
  fi
  sleep 3
fi

curl -sf localhost:9100/healthz | jq -e '.status == "healthy"'
```

Report initial readings (temperature, utilization, risk score) from [Quick status queries](#8-quick-status-queries-local-real-time) once healthy.
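
The fixed `sleep 3` in the install script may be too short on a slow machine or while Docker is still pulling the image. A hedged alternative is to poll the health endpoint a bounded number of times. Both function names below are hypothetical; the probe reuses the `/healthz` check from Step 2.

```bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Probe: succeeds when the local agent reports status "healthy".
agent_healthy() {
  curl -sf localhost:9100/healthz | jq -e '.status == "healthy"'
}
```

For example, `wait_until 10 1 agent_healthy || echo "Agent did not become healthy in time"` waits up to roughly ten seconds before giving up.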

### Step 4: Offer cloud connection

Guide the user conversationally:

1. **Account:** Ask: *Do you have a Keldron Cloud account? You can sign up free at https://app.keldron.ai*
2. **If yes:** *Run `keldron-agent login` — you can use email/password or paste your API key (option 2 in the menu).*
3. **If no:** *Sign up at https://app.keldron.ai (GitHub login available), then run `keldron-agent login`.*
4. **Verify:** *Run `keldron-agent whoami` to confirm you're connected.*
5. **Restart** the agent so it picks up credentials and begins streaming.

If the user is not yet interested, summarize the value:

```bash
if [ -n "$CLOUD_KEY" ]; then
  echo "Cloud already connected."
else
  echo "Want to connect to Keldron Cloud?"
  echo "  • 180-day telemetry history"
  echo "  • Fleet analytics and device comparison"
  echo "  • Device health tracking"
  echo "  • Proactive fleet alerts"
  echo "  • Dashboard at app.keldron.ai"
fi
```

### Step 5 optional: env o