log-dive

TotalClaw 作者 Anvil AI v0.1.1
跨 Loki、Elasticsearch 和 CloudWatch 的统一日志搜索。自然语言查询转换为 LogQL、ES DSL 或 CloudWatch 过滤器模式。只读。切勿修改或删除日志。
安装 / 下载方式

TotalClaw CLI推荐
totalclaw install totalclaw:totalclaw~tkuehnl-log-dive
cURL直接下载，无需登录
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~tkuehnl-log-dive/file -o tkuehnl-log-dive.md
## 概述（中文）

跨 Loki、Elasticsearch 和 CloudWatch 的统一日志搜索。
自然语言查询转换为 LogQL、ES DSL 或 CloudWatch 过滤器模式。
只读。切勿修改或删除日志。

## 原文

# Log Dive — Unified Log Search 🤿

Search logs across **Loki**, **Elasticsearch/OpenSearch**, and **AWS CloudWatch** from a single interface. Ask in plain English; the skill translates to the right query language.

> **⚠️ Sensitive Data Warning:** Logs frequently contain PII, secrets, tokens, passwords, and other sensitive data. Never cache, store, or repeat raw log content beyond the current conversation. Treat all log output as confidential.

## Activation

This skill activates when the user mentions:
- "search logs", "find in logs", "log search", "check the logs"
- "Loki", "LogQL", "logcli"
- "Elasticsearch logs", "Kibana", "OpenSearch"
- "CloudWatch logs", "AWS logs", "log groups"
- "error logs", "find errors", "what happened in [service]"
- "tail logs", "follow logs", "live logs"
- "log backends", "which log sources", "log indices", "log labels"
- Incident triage involving log analysis
- "log-dive" explicitly

## Permissions

```yaml
permissions:
  exec: true          # Required to run backend scripts
  read: true          # Read script files
  write: false        # Never writes files — logs may contain secrets
  network: true       # Queries remote log backends
```

## Example Prompts

1. "Find error logs from the checkout service in the last 30 minutes"
2. "Search for timeout exceptions across all services"
3. "What log backends do I have configured?"
4. "List available log indices in Elasticsearch"
5. "Show me the labels available in Loki"
6. "Tail the payment-service logs"
7. "Find all 5xx errors in CloudWatch for api-gateway"
8. "Correlate errors between user-service and payment-service"
9. "What happened in production between 2pm and 3pm today?"

## Backend Configuration

Each backend uses environment variables. Users may have one, two, or all three configured.

### Loki
| Variable | Required | Description |
|----------|----------|-------------|
| `LOKI_ADDR` | Yes | Loki server URL (e.g., `http://loki.internal:3100`) |
| `LOKI_TOKEN` | No | Bearer token for authentication |
| `LOKI_TENANT_ID` | No | Multi-tenant header (`X-Scope-OrgID`) |

### Elasticsearch / OpenSearch
| Variable | Required | Description |
|----------|----------|-------------|
| `ELASTICSEARCH_URL` | Yes | Base URL (e.g., `https://es.internal:9200`) |
| `ELASTICSEARCH_TOKEN` | No | `Basic <base64>` or `Bearer <token>` for auth |

### AWS CloudWatch Logs
| Variable | Required | Description |
|----------|----------|-------------|
| `AWS_PROFILE` or `AWS_ACCESS_KEY_ID` | Yes | Standard AWS credentials |
| `AWS_REGION` | Yes | AWS region for CloudWatch |

## Agent Workflow

Follow this sequence:

### Step 1: Check Backends

Run the backends check to see what's configured:

```bash
bash <skill_dir>/scripts/log-dive.sh backends
```

Parse the JSON output. If no backends are configured, tell the user which environment variables to set.

### Step 2: Translate the User's Query

This is the critical step. Convert the user's natural language request into the appropriate backend-specific query. Use the query language reference below.

**For ALL backends, pass the query through the dispatcher:**

```bash
# Search across all configured backends
bash <skill_dir>/scripts/log-dive.sh search --query '<QUERY>' [OPTIONS]

# Search a specific backend
bash <skill_dir>/scripts/log-dive.sh search --backend loki --query '{app="checkout"} |= "error"' --since 30m --limit 200

bash <skill_dir>/scripts/log-dive.sh search --backend elasticsearch --query '{"query":{"bool":{"must":[{"match":{"message":"error"}},{"match":{"service":"checkout"}}]}}}' --index 'app-logs-*' --since 30m --limit 200

bash <skill_dir>/scripts/log-dive.sh search --backend cloudwatch --query '"ERROR" "checkout"' --log-group '/ecs/checkout-service' --since 30m --limit 200
```

### Step 3: List Available Targets

Before searching, you may need to discover what's available:

```bash
# Loki: list labels and label values
bash <skill_dir>/scripts/log-dive.sh labels --backend loki
bash <skill_dir>/scripts/log-dive.sh labels --backend loki --label app

# Elasticsearch: list indices
bash <skill_dir>/scripts/log-dive.sh indices --backend elasticsearch

# CloudWatch: list log groups
bash <skill_dir>/scripts/log-dive.sh indices --backend cloudwatch
```

### Step 4: Tail Logs (Live Follow)

```bash
bash <skill_dir>/scripts/log-dive.sh tail --backend loki --query '{app="checkout"}'
bash <skill_dir>/scripts/log-dive.sh tail --backend cloudwatch --log-group '/ecs/checkout-service'
```

Tail runs for a limited time (default 30s) and streams results.

### Step 5: Analyze Results

After receiving log output, you MUST:

1. **Identify unique error types** — group similar errors, count occurrences
2. **Find the root cause** — look for the earliest error, trace dependency chains
3. **Correlate across services** — if errors in service A mention service B, note the dependency
4. **Build a timeline** — order events chronologically
5. **Summarize actionably** — "The checkout service started returning 500s at 14:23 because the database connection pool was exhausted (max 10 connections, 10 in use). The pool exhaustion was triggered by a slow query in the inventory service."

**NEVER dump raw log output to the user.** Always summarize, extract patterns, and present structured findings.

### Discord v2 Delivery Mode (OpenClaw v2026.2.14+)

When the conversation is happening in a Discord channel:

- Send a compact incident summary first (backend, query intent, top error types, root-cause hypothesis), then ask if the user wants full detail.
- Keep the first response under ~1200 characters and avoid dumping raw log lines in the first message.
- If Discord components are available, include quick actions:
  - `Show Error Timeline`
  - `Show Top Error Patterns`
  - `Run Related Service Query`
- If components are not available, provide the same follow-ups as a numbered list.
- Prefer short follow-up chunks (<=15 lines per message) when sharing timelines or grouped findings.

## Query Language Reference

### LogQL (Loki)

LogQL has two parts: a stream selector and a filter pipeline.

**Stream selectors:**
```
{app="myapp"}                          # exact match
{namespace="prod", app=~"api-.*"}      # regex match
{app!="debug"}                         # negative match
```

**Filter pipeline (chained after selector):**
```
{app="myapp"} |= "error"              # line contains "error"
{app="myapp"} != "healthcheck"         # line does NOT contain
{app="myapp"} |~ "error|warn"          # regex match on line
{app="myapp"} !~ "DEBUG|TRACE"         # negative regex
```

**Structured metadata (parsed logs):**
```
{app="myapp"} | json                   # parse JSON logs
{app="myapp"} | json | status >= 500   # filter by parsed field
{app="myapp"} | logfmt                 # parse logfmt
{app="myapp"} | regexp `(?P<ip>\d+\.\d+\.\d+\.\d+)` # regex extract
```

**Common patterns:**
- Errors in service: `{app="checkout"} |= "error" | json | level="error"`
- HTTP 5xx: `{app="api"} | json | status >= 500`
- Slow requests: `{app="api"} | json | duration > 5s`
- Stack traces: `{app="myapp"} |= "Exception" |= "at "`

### Elasticsearch Query DSL

**Simple match:**
```json
{"query": {"match": {"message": "error"}}}
```

**Boolean query (AND/OR):**
```json
{
  "query": {
    "bool": {
      "must": [
        {"match": {"message": "error"}},
        {"match": {"service.name": "checkout"}}
      ],
      "must_not": [
        {"match": {"message": "healthcheck"}}
      ]
    }
  },
  "sort": [{"@timestamp": "desc"}],
  "size": 200
}
```

**Time range filter:**
```json
{
  "query": {
    "bool": {
      "must": [{"match": {"message": "timeout"}}],
      "filter": [
        {"range": {"@timestamp": {"gte": "now-30m", "lte": "now"}}}
      ]
    }
  }
}
```

**Wildcard / regex:**
```json
{"query": {"regexp": {"message": "error.*timeout"}}}
```

**Common patterns:**
- Error