Phy Deserialization Audit

SkillDB · Author: PHY041 · v1.0.0

Unsafe deserialization vulnerability scanner (OWASP A08:2021). Detects Python pickle/yaml/eval, Java ObjectInputStream/XStream/XMLDecoder, PHP unserialize, Ruby Marshal.load, Node.js eval/new Function/vm, Go gob with interface{}. Traces HTTP input sources to dangerous sinks, classifies CRITICAL/HIGH/MEDIUM, outputs CWE/CVE mappings and per-language fix snippets. Zero competitors on ClawHub.


Installation / Download

TotalClaw CLI (recommended)
totalclaw install skilldb:phy041~phy-deserialization-audit
cURL (direct download, no login required)
curl -fsSL https://skills.taituai.com/api/skills/skilldb%3Aphy041~phy-deserialization-audit/file -o phy-deserialization-audit.md
Git (fetch the source from the repository)
git clone https://github.com/openclaw/skills
git -C skills checkout 010ef62435d955ef630ba1baffc77603445fcce2
# phy-deserialization-audit

Static scanner for insecure deserialization vulnerabilities, covered by **OWASP A08:2021 (Software and Data Integrity Failures)**, across Python, Java, PHP, Ruby, Node.js/TypeScript, and Go codebases. No API keys, no network calls, no dependencies beyond the Python 3 stdlib.

## What It Detects

### Python
| Pattern | Severity | CVE/CWE |
|---------|----------|---------|
| `pickle.loads(user_data)` | CRITICAL | CWE-502 |
| `pickle.load(untrusted_file)` | CRITICAL | CWE-502 |
| `yaml.load(data)` without SafeLoader | HIGH | CVE-2017-18342 |
| `yaml.full_load()` / `yaml.unsafe_load()` | CRITICAL | CWE-502, CVE-2020-14343 (`full_load`) |
| `jsonpickle.decode(input)` | CRITICAL | CWE-502 |
| `marshal.loads(data)` | HIGH | CWE-502 |
| `eval(user_input)` / `exec(user_input)` | CRITICAL | CWE-95 |
| `shelve.open(user_controlled_path)` | HIGH | CWE-502 |
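
To illustrate why `pickle.loads` on untrusted bytes is rated CRITICAL, here is a minimal sketch. It is not part of the scanner, and the `Payload` class is hypothetical. It relies on `__reduce__`, which lets a pickled object name any importable callable to run at load time. The harmless `eval("6 * 7")` stands in for what an attacker would replace with a shell command.

```python
import json
import pickle


class Payload:
    """Hypothetical attacker-controlled class; not part of the scanner."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and invokes it on load.
        # A harmless expression stands in for something like os.system(...).
        return (eval, ("6 * 7",))


malicious_bytes = pickle.dumps(Payload())

# Loading does not rebuild a Payload instance; it calls eval("6 * 7").
result = pickle.loads(malicious_bytes)

# A data-only format cannot trigger calls: json.loads only builds
# dicts, lists, strings, numbers, booleans, and None.
safe = json.loads('{"user": "alice", "role": "viewer"}')
```

The point of the contrast is that `pickle.loads` returned the result of a function call chosen by whoever produced the bytes, while `json.loads` returned plain data that still needs schema validation but cannot execute anything by itself.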

### Java
| Pattern | Severity | CVE/CWE |
|---------|----------|---------|
| `new ObjectInputStream(...).readObject()` | CRITICAL | CWE-502, gadget chains |
| `XStream.fromXML(userInput)` | CRITICAL | CVE-2021-29505 |
| `new XMLDecoder(inputStream)` | CRITICAL | CWE-502 |
| `ObjectMapper.readValue(input, Object.class)` | HIGH | CVE-2017-7525 (Jackson polymorphic) |
| `Serializable` class with `readObject()` override | HIGH | CWE-502 |
| `new ObjectMapper().enableDefaultTyping()` | HIGH | CVE-2017-7525 |

### PHP
| Pattern | Severity | CVE/CWE |
|---------|----------|---------|
| `unserialize($userInput)` | CRITICAL | CWE-502, POP chains |
| `unserialize($_GET[...])` / `unserialize($_POST[...])` | CRITICAL | CWE-502 |
| `unserialize($_COOKIE[...])` | CRITICAL | CWE-502 |
| `unserialize(base64_decode(...))` | HIGH | CWE-502 |

### Ruby
| Pattern | Severity | CVE/CWE |
|---------|----------|---------|
| `Marshal.load(user_input)` | CRITICAL | CWE-502 |
| `YAML.load(user_input)` (Psych < 4.0 default) | HIGH | CWE-502 (cf. CVE-2013-0156 in Rails) |
| `JSON.load(input)` (bypasses safe defaults) | MEDIUM | CWE-502 |

### Node.js / TypeScript
| Pattern | Severity | CVE/CWE |
|---------|----------|---------|
| `eval(req.body.*)` or `eval(req.params.*)` | CRITICAL | CWE-95 |
| `new Function(userInput)` | CRITICAL | CWE-95 |
| `vm.runInContext(userInput, ...)` | HIGH | CWE-94 |
| `new vm.Script(userInput).runIn*` | HIGH | CWE-94 |
| `require(userControlledPath)` | HIGH | CWE-706 |
| `child_process.exec(unsanitizedInput)` | CRITICAL | CWE-78 (adjacent) |

### Go
| Pattern | Severity | CVE/CWE |
|---------|----------|---------|
| `gob.NewDecoder(conn).Decode(&interface{})` | HIGH | CWE-502 |
| `encoding/xml.Unmarshal` with `interface{}` target | MEDIUM | CWE-502 |
| `json.Unmarshal` into `interface{}` then unsafe cast | MEDIUM | CWE-20 |

## Taint Flow Logic

The scanner uses a two-pass approach:

**Pass 1 — Dangerous sink detection:** Find all pattern matches per file.

**Pass 2 — HTTP input proximity check:** Within ±40 lines of each sink (an approximation of the enclosing function block), look for HTTP input markers:
- Python: `request.body`, `request.data`, `request.POST`, `request.GET`, `flask.request`, `request.json`
- Java: `HttpServletRequest`, `@RequestBody`, `@RequestParam`, `getParameter(`, `getInputStream(`
- PHP: `$_GET`, `$_POST`, `$_REQUEST`, `$_COOKIE`, `$_FILES`, `file_get_contents("php://input")`
- Ruby: `params[`, `request.body`, `JSON.parse(request.body)`
- Node.js: `req.body`, `req.params`, `req.query`, `req.headers`, `request.body`
- Go: `r.Body`, `r.URL.Query()`, `r.FormValue(`

- If an HTTP input marker is found near the sink, the finding keeps its base severity (**CRITICAL** or **HIGH**).
- If no HTTP input marker is visible, the finding is downgraded one level and reported as informational with the note *"Verify data source"*.
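
The two passes can be sketched roughly as follows. This is an illustration under stated assumptions, not the scanner's actual code: the names `SINKS`, `HTTP_MARKERS`, `DOWNGRADE`, `WINDOW`, and `scan` are hypothetical, the registries are trimmed to one sink and a few Python markers, and the exact downgrade mapping is a guess at what "one level" means.

```python
import re

# Hypothetical, trimmed-down registries; the real scanner has more entries.
SINKS = {"PICKLE_LOADS": (re.compile(r"\bpickle\.loads?\s*\("), "CRITICAL")}
HTTP_MARKERS = re.compile(r"request\.(?:body|data|POST|GET|json)|flask\.request")
# Assumed interpretation of "downgrade one level".
DOWNGRADE = {"CRITICAL": "HIGH", "HIGH": "MEDIUM", "MEDIUM": "INFO", "INFO": "INFO"}
WINDOW = 40  # lines inspected on each side of a sink


def scan(source: str) -> list:
    lines = source.splitlines()
    findings = []
    for idx, line in enumerate(lines):
        # Pass 1: dangerous sink detection.
        for name, (regex, base) in SINKS.items():
            if not regex.search(line):
                continue
            # Pass 2: look for an HTTP input marker near the sink.
            lo, hi = max(0, idx - WINDOW), min(len(lines), idx + WINDOW + 1)
            tainted = any(HTTP_MARKERS.search(nearby) for nearby in lines[lo:hi])
            findings.append({
                "line": idx + 1,
                "pattern": name,
                "severity": base if tainted else DOWNGRADE[base],
                "has_http_taint": tainted,
                "note": None if tainted else "Verify data source",
            })
    return findings


# Usage: the same sink with and without an HTTP marker nearby.
tainted_src = "def view(request):\n    return pickle.loads(request.body)\n"
offline_src = "def load_cache(blob):\n    return pickle.loads(blob)\n"
tainted_findings = scan(tainted_src)
offline_findings = scan(offline_src)
```

A fixed line window is cheap and language-agnostic, but it can pick up markers from a neighboring function or miss input that flows in through a helper, so a proximity hit is best read as a hint rather than proof of a taint path.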

**Safe patterns (excluded):**
- `yaml.safe_load(...)` — OK
- `yaml.load(data, Loader=yaml.SafeLoader)` — OK
- `pickle.loads(STATIC_BYTES)` where argument is a literal — OK
- `eval("1 + 2")` with string literal — OK
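
One way to recognize these safe forms is to parse the call and inspect its arguments. The sketch below uses Python's `ast` module instead of the regexes the scanner is built on, so treat it as an alternative technique. The helper name `is_safe_call` is hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


def is_safe_call(expr: str) -> bool:
    """Return True if expr is a call matching one of the safe patterns."""
    try:
        node = ast.parse(expr.strip(), mode="eval").body
    except SyntaxError:
        return False  # safety cannot be shown, so keep the finding
    if not isinstance(node, ast.Call):
        return False

    func = ast.unparse(node.func)  # e.g. "yaml.load"
    if func == "yaml.safe_load":
        return True
    if func == "yaml.load":
        # Accept Loader=... as a keyword or as the second positional argument.
        loaders = [kw.value for kw in node.keywords if kw.arg == "Loader"]
        loaders += node.args[1:2]
        return any(ast.unparse(value).endswith("SafeLoader") for value in loaders)
    if func in {"pickle.loads", "eval"}:
        # Only a plain literal counts; names, calls, and f-strings do not.
        return bool(node.args) and isinstance(node.args[0], ast.Constant)
    return False
```

For example, `is_safe_call('eval("1 + 2")')` and `is_safe_call("yaml.load(data, Loader=yaml.SafeLoader)")` both return `True`, while `is_safe_call("eval(user_input)")` and `is_safe_call("yaml.load(data)")` return `False`. Unlike a single-line regex, this handles nested parentheses in the arguments, at the cost of only working for Python sources that parse cleanly.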

## Implementation

```python
#!/usr/bin/env python3
"""
phy-deserialization-audit — OWASP A08:2021 scanner
Usage: python3 audit_deserial.py [path] [--json] [--ci]
"""
import argparse
import json
import os
import re
import sys
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

# ─── Severity ────────────────────────────────────────────────────────────────
CRITICAL, HIGH, MEDIUM, INFO = "CRITICAL", "HIGH", "MEDIUM", "INFO"

@dataclass
class Finding:
    file: str
    line: int
    pattern_name: str
    matched_text: str
    severity: str
    cwe: str
    cve: Optional[str]
    description: str
    fix: str
    has_http_taint: bool = False

# ─── Pattern registry ────────────────────────────────────────────────────────
# (pattern_name, regex, base_severity, cwe, cve, description, fix)
PATTERNS = {
    ".py": [
        ("PICKLE_LOADS",
         re.compile(r'\bpickle\.loads?\s*\('),
         CRITICAL, "CWE-502", None,
         "pickle.load/loads deserializes arbitrary Python objects — remote code execution if input is user-controlled.",
         "Never deserialize user input with pickle. Use json.loads() + schema validation (Pydantic/marshmallow)."),

        ("YAML_UNSAFE_LOAD",
         re.compile(r'\byaml\.(?:load|full_load|unsafe_load)\s*\((?![^)]*SafeLoader)'),
         HIGH, "CWE-502", "CVE-2017-18342",
         "yaml.load() without Loader=yaml.SafeLoader executes arbitrary Python code embedded in YAML.",
         "Replace with yaml.safe_load(data) or yaml.load(data, Loader=yaml.SafeLoader)."),

        ("JSONPICKLE_DECODE",
         re.compile(r'\bjsonpickle\.decode\s*\('),
         CRITICAL, "CWE-502", None,
         "jsonpickle.decode() restores full Python object graphs — arbitrary code execution.",
         "Do not use jsonpickle for untrusted input. Use json.loads() + strict schema."),

        ("MARSHAL_LOADS",
         re.compile(r'\bmarshal\.loads?\s*\('),
         HIGH, "CWE-502", None,
         "marshal is not intended for untrusted data and can execute code.",
         "Replace with json.loads() and validate schema."),

        ("EVAL_EXEC",
         re.compile(r'\b(?:eval|exec)\s*\('),
         CRITICAL, "CWE-95", None,
         "eval()/exec() with user-controlled input leads to arbitrary code execution.",
         "Remove eval/exec. Use ast.literal_eval() for safe literal evaluation, or a proper parser."),

        ("SHELVE_OPEN",
         re.compile(r'\bshelve\.open\s*\('),
         HIGH, "CWE-502", None,
         "shelve uses pickle internally — path traversal + deserialization risk.",
         "Ensure path is never user-controlled; prefer a proper database or JSON store."),
    ],
    ".java": [
        ("OBJECT_INPUT_STREAM",
         re.compile(r'\bnew\s+ObjectInputStream\s*\('),
         CRITICAL, "CWE-502", None,
         "Java deserialization via ObjectInputStream enables gadget-chain attacks (Apache Commons, Spring).",
         "Use a type-validating ObjectInputStream wrapper (e.g., Apache Commons IO ValidatingObjectInputStream) or replace with JSON/Protobuf."),

        ("XSTREAM_FROM_XML",
         re.compile(r'\.fromXML\s*\('),
         CRITICAL, "CWE-502", "CVE-2021-29505",
         "XStream.fromXML() can execute arbitrary code via crafted XML.",
         "Upgrade to XStream ≥ 1.4.18 and configure an allowlist: xstream.addPermission(NoTypePermission.NONE), then xstream.allowTypes(...) for the classes you expect."),

        ("XML_DECODER",
         re.compile(r'\bnew\s+XMLDecoder\s*\('),
         CRITICAL, "CWE-502", None,
         "XMLDecoder can instantiate arbitrary Java objects from XML — remote code execution.",
         "Never use XMLDecoder with untrusted input. Use a JSON parser or JAXB with an allowlist."),

        ("JACKSON_OBJECT_CLASS",
         re.compile(r'\.readValue\s*\([^,]+,\s*Object\.class\s*\)'),
         HIGH, "CWE-502", "CVE-2017-7525",
         "Jackson readValue to Object.class enables polymorphic deserialization attacks.",
         "Always specify a concrete class: mapper.readValue(json, MyDto.class)."),

        ("JACKSON_DEFAULT_TYPING",
         re.compile(r'\.enableDefaultTyping\s*\('),
         HIGH, "CWE-502", "CVE-2017-7525",
         "enableDefaultTyping() allows arbitrary class instantiation via @class field.",