Phy Otel Audit

ClawSkills, by PHY041, v1.0.0

OpenTelemetry instrumentation coverage auditor. Scans Node.js/Python/Go/Java source code to detect missing or misconfigured OTel instrumentation — HTTP handlers without spans, database calls outside trace context, missing resource attributes, span errors not recorded, baggage not propagated, SDK not initialized before first import, sampler misconfiguration, and more. Outputs a per-file coverage score and actionable fix snippets. Zero external dependencies.

Install / Download

- TotalClaw CLI (recommended): `totalclaw install clawskills:phy041~phy-otel-audit`
- cURL (direct download, no login required): `curl -fsSL https://skills.taituai.com/api/skills/clawskills%3Aphy041~phy-otel-audit/file -o phy-otel-audit.md`
- Git (get the source from the repository): `git clone https://github.com/openclaw/skills && git -C skills checkout 7417e3588d88e0517fd72968a7cee3ecbd8739ff`

# phy-otel-audit — OpenTelemetry Instrumentation Auditor

Scans your source code for **10 classes of missing or misconfigured OpenTelemetry instrumentation** that cause invisible blind spots in your traces: unspanned HTTP handlers, DB calls outside trace context, swallowed span errors, missing service.name attributes, and more.

## Quick Start

```bash
# Scan a directory
python otel_audit.py ./src

# Single file
python otel_audit.py src/handlers/users.js

# CI mode — exit 1 on HIGH findings
python otel_audit.py ./src --ci

# Verbose: show which line triggered each finding
python otel_audit.py ./src --verbose

# Only HIGH findings
python otel_audit.py ./src --only-severity HIGH
```

## The 10 Checks

| ID | Severity | Check |
|----|----------|-------|
| OT001 | HIGH | No OTel SDK imported anywhere — zero instrumentation |
| OT002 | HIGH | HTTP handler without span creation |
| OT003 | HIGH | Database/cache call outside active span |
| OT004 | HIGH | Exception caught but span.recordException() not called |
| OT005 | MEDIUM | service.name not set in Resource attributes |
| OT006 | MEDIUM | OTel SDK initialized after first import (instrumentation gap) |
| OT007 | MEDIUM | Span created but status not set on error path |
| OT008 | MEDIUM | Async context propagation missing (Promise/goroutine context not passed) |
| OT009 | LOW | Trace exporter using console/stdout in non-dev environment |
| OT010 | LOW | Manual span naming uses dynamic values (high-cardinality span names) |

### OT001 — No OTel SDK Imported
Scans all files for any OpenTelemetry import. If none is found, the project has no instrumentation at all.

**Detected imports:**
- JS/TS: `@opentelemetry/api`, `@opentelemetry/sdk-node`, `@opentelemetry/auto-instrumentations-node`
- Python: `opentelemetry`, `opentelemetry-sdk`, `from opentelemetry`
- Go: `go.opentelemetry.io/otel`
- Java: `io.opentelemetry`, `opentelemetry-java`
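
As a rough illustration of this kind of import scan, here is a minimal sketch. The regexes and the `has_otel_import` name are assumptions made for this example, not the script's actual patterns.

```python
import re

# Hypothetical patterns, one per language listed above; the real script's
# regexes may differ.
OTEL_IMPORT_PATTERNS = [
    # JS/TS: require('@opentelemetry/...'), import ... from '@opentelemetry/...'
    re.compile(r"""(?:require\(\s*|from\s+|import\s+)['"]@opentelemetry/"""),
    # Python: "import opentelemetry..." or "from opentelemetry..."
    re.compile(r"^\s*(?:from|import)\s+opentelemetry\b", re.MULTILINE),
    # Go: quoted import path
    re.compile(r'"go\.opentelemetry\.io/otel'),
    # Java: import io.opentelemetry....
    re.compile(r"^\s*import\s+(?:static\s+)?io\.opentelemetry\.", re.MULTILINE),
]


def has_otel_import(source: str) -> bool:
    """Return True if any of the patterns above matches the source text."""
    return any(pattern.search(source) for pattern in OTEL_IMPORT_PATTERNS)
```

A project-level check would call this on every scanned file and raise OT001 only when no file matches.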

### OT002 — HTTP Handler Without Span
Finds route handler definitions (Express, FastAPI, Flask, gin, Spring) without a `tracer.startSpan` or `tracer.startActiveSpan` nearby. Auto-instrumentation covers framework-level spans, but business logic within handlers needs custom spans for meaningful traces.

### OT003 — Database/Cache Call Outside Span
Detects DB/cache operations (`db.query`, `prisma.`, `mongoose.`, `cursor.execute`, `db.Execute`, `redis.get`, `cache.get`) that appear in functions where no span context is active (no `tracer.startActiveSpan`, no `ctx` parameter carrying trace context, no `with tracer.start_as_current_span`).

### OT004 — Exception Not Recorded on Span
Finds `catch` blocks or `except` clauses that handle errors without calling both `span.recordException(err)` and `span.setStatus({ code: SpanStatusCode.ERROR })` (the Python equivalents are `span.record_exception` and `span.set_status`). Unrecorded exceptions make a failed trace look successful, which is a common OTel mistake.
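
A minimal sketch of how a JavaScript `catch` block could be checked for a missing `recordException` call. The helper names are hypothetical, and the brace matching is naive (it ignores braces inside strings and comments), so treat it as an illustration rather than the script's implementation.

```python
import re

# Matches "catch (err) {" and leaves the match ending on the opening brace.
CATCH_RE = re.compile(r"catch\s*\(\s*\w+\s*\)\s*\{")


def catch_body(source: str, open_brace: int) -> str:
    """Return the text between the brace at open_brace and its matching close."""
    depth = 0
    for i in range(open_brace, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return source[open_brace + 1:i]
    return source[open_brace + 1:]  # unbalanced input: take the rest


def unrecorded_catches(source: str) -> list[int]:
    """1-based line numbers of catch blocks that never call recordException."""
    flagged = []
    for match in CATCH_RE.finditer(source):
        body = catch_body(source, match.end() - 1)
        if "recordException" not in body:
            flagged.append(source.count("\n", 0, match.start()) + 1)
    return flagged
```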

### OT005 — Missing service.name Resource
Scans OTel SDK initialization code for `Resource.create` or `resource:` config without `service.name`. Without `service.name`, all traces appear as `unknown_service` in backends (Jaeger/Tempo/Honeycomb), so services cannot be told apart when filtering.

### OT006 — SDK Initialized After First Import
In Node.js, the OTel SDK must be initialized before any module it is meant to instrument is loaded, so `require('@opentelemetry/sdk-node')` and the SDK setup should come before all other `require` statements. If the SDK init file is imported after other modules, auto-instrumentation patches miss the modules that are already loaded. This check detects `tracing.js` or `instrumentation.js` imported after other modules in entry files.
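
The ordering rule can be sketched as "the init module must be the first import in the entry file". The example below assumes the init file is named `tracing` or `instrumentation`, as in the text above. The regex and function name are hypothetical.

```python
import re

# Captures the module specifier of require('x'), import ... from 'x', import 'x'.
IMPORT_RE = re.compile(r"""(?:require\(\s*|from\s+|import\s+)['"]([^'"]+)['"]""")
INIT_NAMES = ("tracing", "instrumentation")  # file names taken from the text above


def init_loaded_late(entry_source: str) -> bool:
    """True if an OTel init module is imported, but not as the first import."""
    modules = [m.group(1) for m in IMPORT_RE.finditer(entry_source)]

    def is_init(module: str) -> bool:
        # "./lib/tracing.js" -> "tracing"
        stem = module.rsplit("/", 1)[-1].removesuffix(".js")
        return stem in INIT_NAMES

    positions = [i for i, module in enumerate(modules) if is_init(module)]
    return bool(positions) and positions[0] > 0
```

An entry file that never imports an init module returns `False` here; that case belongs to OT001 rather than OT006.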

### OT007 — Span Status Not Set on Error Path
Finds `span.end()` calls in error branches (catch blocks, error handlers) without a preceding `span.setStatus({ code: SpanStatusCode.ERROR })` or `span.setStatus({ code: 2 })`. A span that ends without a status keeps the default `UNSET` status, which backends do not show as an error.

### OT008 — Context Not Propagated Through Async
Finds `Promise.all(`, `asyncio.gather(`, or goroutine `go func()` patterns where the OTel context is not explicitly passed. In Go, `context.Context` must be threaded through goroutines manually. In Python asyncio, OpenTelemetry context is propagated automatically via contextvars — but only if tasks are created from within an active span.
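
The asyncio caveat can be seen with the standard library alone. In this sketch a plain `ContextVar` named `current_span` (a hypothetical stand-in for OpenTelemetry's current-span slot) shows that a task sees the context that existed when the task was created, not the context at the time it runs.

```python
import asyncio
import contextvars
from typing import Optional

# Stand-in for the slot where OpenTelemetry keeps the active span (assumption).
current_span: contextvars.ContextVar = contextvars.ContextVar(
    "current_span", default=None
)


async def child() -> Optional[str]:
    # Reads the context the task copied when it was created.
    return current_span.get()


async def main() -> tuple:
    # Created before the "span" is set: this task captured an empty context.
    early = asyncio.create_task(child())
    token = current_span.set("checkout.process")
    try:
        # gather() wraps the coroutines in tasks now, so they copy the
        # context that already holds the span.
        inside = await asyncio.gather(child(), child())
    finally:
        current_span.reset(token)
    return await early, inside
```

Running `asyncio.run(main())` returns `None` for the early task and the span name for both tasks created inside the span, which is why tasks should be created after the span is started.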

### OT009 — Console/Stdout Exporter in Production
Finds `ConsoleSpanExporter`, `SimpleSpanProcessor(new ConsoleSpanExporter())`, or `ConsoleMetricExporter` outside of dev/test configuration files. Console exporters flood logs and never deliver spans to a tracing backend, so they add nothing in production.

### OT010 — High-Cardinality Span Names
Finds `tracer.startSpan(` with dynamic values in the span name (string interpolation with variables, request paths with IDs). High-cardinality span names (`GET /users/12345`) break trace aggregation — span names should be templates (`GET /users/{id}`).
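
A minimal sketch of spotting interpolated span names, assuming JS template literals and Python f-strings are the two forms of interest. The patterns and the `has_dynamic_span_name` name are hypothetical and may not match what the script actually uses.

```python
import re

# JS/TS: span-start call whose first argument is a template literal with ${...}
JS_DYNAMIC = re.compile(r"(?:startSpan|startActiveSpan)\(\s*`[^`]*\$\{")
# Python: span-start call whose first argument is an f-string with a {field}
PY_DYNAMIC = re.compile(
    r"""(?:start_span|start_as_current_span)\(\s*f['"][^'"]*\{"""
)


def has_dynamic_span_name(line: str) -> bool:
    """True if the line starts a span whose name is built by interpolation."""
    return bool(JS_DYNAMIC.search(line) or PY_DYNAMIC.search(line))
```

A plain string such as `'GET /users/{id}'` is not flagged, because the braces there are literal template text rather than interpolation.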

## Sample Output

```
============================================================
  OTel Instrumentation Audit — src/
  Files scanned: 52  |  Files with issues: 9
============================================================

── HIGH (3) ────────────────────────────────────────────────
🟠 OT001 [HIGH] <project>
   No OpenTelemetry SDK imported anywhere. Zero instrumentation.
   Fix: npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node
        Create instrumentation.js and require it first in your entry point.

🟠 OT004 [HIGH] src/handlers/orders.js:78
   Exception caught in catch block but span.recordException() not called.
   Fix: catch (err) { span.recordException(err); span.setStatus({ code: SpanStatusCode.ERROR }); throw err; }

🟠 OT002 [HIGH] src/routes/payments.js:12
   HTTP handler (app.post /checkout) has no custom span. Business logic is a trace black box.
   Fix: tracer.startActiveSpan('checkout.process', async (span) => { ... span.end(); })

── MEDIUM (2) ──────────────────────────────────────────────
🟡 OT005 [MEDIUM] src/tracing.js:8
   OTel Resource initialized without service.name attribute.
   Fix: Resource.create({ [SemanticResourceAttributes.SERVICE_NAME]: 'payment-service' })

🟡 OT007 [MEDIUM] src/services/inventory.js:45
   span.end() called in error branch without span.setStatus(SpanStatusCode.ERROR).
   Fix: span.setStatus({ code: SpanStatusCode.ERROR, message: err.message }); span.end();

── LOW (1) ─────────────────────────────────────────────────
🔵 OT010 [LOW] src/handlers/users.js:23
   High-cardinality span name: tracer.startSpan(`GET /users/${userId}`)
   Fix: Use template: tracer.startSpan('GET /users/{id}') and set userId as span attribute.

────────────────────────────────────────────────────────────
  Total: 6 findings
  High: 3 | Medium: 2 | Low: 1

  ❌ CI GATE FAILED — resolve HIGH findings before merging.
```

## The Script

```python
#!/usr/bin/env python3
"""
phy-otel-audit — OpenTelemetry Instrumentation Coverage Auditor
Scans Node.js/Python/Go/Java for missing or misconfigured OTel instrumentation.
Zero external dependencies.
"""

import sys
import re
from dataclasses import dataclass, field
from pathlib import Path


# ─── Data Structures ─────────────────────────────────────────────────────────

@dataclass
class Finding:
    check_id: str
    severity: str      # HIGH / MEDIUM / LOW
    location: str
    message: str
    fix: str = ""

    def __str__(self) -> str:
        icon = {"HIGH": "🟠", "MEDIUM": "🟡", "LOW": "🔵"}.get(self.severity, "⚪")
        parts = [f"{icon} {self.check_id} [{self.severity}] {self.location}"]
        parts.append(f"   {self.message}")
        if self.fix:
            parts.append(f"   Fix: {self.fix}")
        return "\n".join(parts)


@dataclass
class AuditResult:
    scan_root: str
    files_scanned: int = 0
    files_flagged: int = 0
    findings: list = field(default_factory=list)

    @property
    def high_count(self) -> int:
        return sum(1 for f in self.findings if f.severity == "HIGH")

    @property
    def medium_count(self) -> int:
        return sum(1 for f in self.findings if f.severity == "MEDIUM")

    @property
    def low_count(self) -> int:
        return sum(1 for f in self.findings if f.severity == "LOW")


# ─── Constants