Phy Db Index Advisor

TotalClaw 作者 PHY041 v1.0.0

数据库索引顾问,静态分析 ORM 查询模式,在丢失的索引成为生产瓶颈之前预测它们。扫描 SQLAlchemy、Django ORM、TypeORM、Prisma、GORM、ActiveRecord 和 Sequelize 代码以查找 WHERE/filter、ORDER BY 和 JOIN 条件中使用的列。交叉引用现有模型索引定义和迁移文件以抑制已索引的列。按查询频率对建议进行排名,并输出准备运行的 CREATE INDEX SQL + 每个 ORM 迁移片段。 ClawHub 上零竞争对手 — 13,700 多个文件中没有一个 db-index-advisor SKILL.md。

源码 ↗

安装 / 下载方式

TotalClaw CLI推荐
totalclaw install totalclaw:phy041~phy-db-index-advisor
cURL直接下载,无需登录
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Aphy041~phy-db-index-advisor/file -o phy-db-index-advisor.md
Git 仓库获取源码
git clone https://github.com/openclaw/skills/commit/94fd7e7799362b664f285aa4c06dace426832846
## 概述(中文)

数据库索引顾问,静态分析 ORM 查询模式,在丢失的索引成为生产瓶颈之前预测它们。扫描 SQLAlchemy、Django ORM、TypeORM、Prisma、GORM、ActiveRecord 和 Sequelize 代码以查找 WHERE/filter、ORDER BY 和 JOIN 条件中使用的列。交叉引用现有模型索引定义和迁移文件以抑制已索引的列。按查询频率对建议进行排名,并输出准备运行的 CREATE INDEX SQL + 每个 ORM 迁移片段。 ClawHub 上零竞争对手 — 13,700 多个文件中没有一个 db-index-advisor SKILL.md。

## 原文

# phy-db-index-advisor

Static analysis tool that reads your **ORM query patterns** and predicts which database columns are missing indexes — before a slow query alert fires in production. Works by counting how often each column appears in `.filter()`, `.where()`, `.order_by()`, and JOIN conditions across your entire codebase, then cross-referencing model definitions to suppress columns already indexed.

## Why This Exists

- 80% of production slow queries stem from missing indexes on columns used in WHERE clauses
- `User.objects.filter(email=email)` running 1,000× per minute causes full table scans
- Existing linters don't know your query patterns; `EXPLAIN ANALYZE` only catches issues after the fact
- This skill finds them **before deployment**

## What It Detects

### Query Patterns Scanned
| Access Pattern | Why It Matters |
|---------------|----------------|
| **WHERE / filter()** | Full table scan without index — O(n) per query |
| **ORDER BY / order_by()** | Sort without index reads all rows then sorts in memory |
| **JOIN ON column** | Nested-loop join without index is O(n²) |
| **UNIQUE constraint candidates** | Columns with `unique=True` queries need unique indexes |

### Supported ORMs
| ORM | Language | Patterns Detected |
|-----|----------|-------------------|
| **Django ORM** | Python | `.filter(col=)`, `.get(col=)`, `.exclude(col=)`, `.order_by('col')`, `Meta.ordering` |
| **SQLAlchemy** | Python | `.filter(Model.col ==)`, `.filter_by(col=)`, `.order_by(col)`, `join(Model, on=)` |
| **Peewee** | Python | `.where(Model.col ==)`, `.order_by(Model.col)` |
| **TypeORM** | TypeScript | `.where("t.col = :val")`, `findBy({col:})`, `.orderBy("t.col")`, `@JoinColumn({name: 'col'})` |
| **Prisma** | TypeScript | `where: { col: }`, `orderBy: { col: }`, `include: { relation: }` |
| **Sequelize** | TypeScript/JS | `where: { col: }`, `order: [['col', 'ASC']]` |
| **GORM** | Go | `.Where("col = ?")`, `.Order("col")`, `.Joins("JOIN ... ON col")` |
| **ActiveRecord** | Ruby | `.where(col:)`, `.find_by(col:)`, `.order(:col)`, `.joins()` |

### Existing Index Detection (Suppression)
The scanner reads existing index definitions so it doesn't recommend indexes that already exist:

| ORM | Where Indexes Are Found |
|-----|------------------------|
| Django | `db_index=True` on field, `Meta.indexes`, `Meta.unique_together` |
| SQLAlchemy | `Column(index=True)`, `Column(unique=True)`, `Index(...)` objects |
| TypeORM | `@Index()` decorator, `@Column({index: true})`, `@Unique()` |
| Prisma | `@@index([col])`, `@@unique([col])`, `@unique` on field |
| GORM | `gorm:"index"`, `gorm:"uniqueIndex"` struct tags |
| ActiveRecord | `add_index` in migrations, `index: true` in column definition |
| SQL migrations | `CREATE INDEX`, `CREATE UNIQUE INDEX` statements |

## Implementation

```python
#!/usr/bin/env python3
"""
phy-db-index-advisor — ORM query pattern analyzer for missing indexes
Usage: python3 advisor.py [path] [--json] [--min-count N]
"""
import argparse
import json
import os
import re
import sys
from collections import defaultdict
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

# ─── Data structures ─────────────────────────────────────────────────────────

@dataclass
class QueryHit:
    file: str
    line: int
    pattern: str
    orm: str
    access_type: str  # WHERE, ORDER_BY, JOIN

@dataclass
class ColumnReport:
    table_hint: str  # Guessed model/table name
    column: str
    where_count: int = 0
    order_count: int = 0
    join_count: int = 0
    files: set = field(default_factory=set)
    hits: list = field(default_factory=list)
    already_indexed: bool = False

    @property
    def total_count(self) -> int:
        return self.where_count + self.order_count + self.join_count

    @property
    def priority(self) -> str:
        if self.already_indexed:
            return "INDEXED"
        if self.where_count >= 10 or self.total_count >= 15:
            return "CRITICAL"
        if self.where_count >= 5 or self.total_count >= 8:
            return "HIGH"
        if self.total_count >= 3:
            return "MEDIUM"
        return "LOW"

# ─── Query pattern registry ───────────────────────────────────────────────────

# (orm_name, access_type, regex, model_group_idx, col_group_idx)
QUERY_PATTERNS = [
    # ── Django ORM ──
    ("Django", "WHERE",
     re.compile(r'\.(?:filter|get|exclude|count|exists)\s*\([^)]*?(\w+)__?\w*\s*='),
     None, 1),
    ("Django", "WHERE",
     re.compile(r'\.(?:filter|get|exclude)\s*\(\s*(\w+)\s*='),
     None, 1),
    ("Django", "ORDER_BY",
     re.compile(r'\.order_by\s*\(\s*[\'"-](\w+)[\'"]\s*\)'),
     None, 1),
    ("Django", "ORDER_BY",
     re.compile(r'ordering\s*=\s*\[[^\]]*?[\'"](\w+)[\'"]'),
     None, 1),

    # ── SQLAlchemy ──
    ("SQLAlchemy", "WHERE",
     re.compile(r'\.filter\s*\(\s*(\w+)\.(\w+)\s*=='),
     1, 2),
    ("SQLAlchemy", "WHERE",
     re.compile(r'\.filter_by\s*\([^)]*?(\w+)\s*='),
     None, 1),
    ("SQLAlchemy", "ORDER_BY",
     re.compile(r'\.order_by\s*\(\s*(\w+)\.(\w+)'),
     1, 2),
    ("SQLAlchemy", "ORDER_BY",
     re.compile(r'\.order_by\s*\(\s*(?:asc|desc)\s*\(\s*(\w+)\.(\w+)'),
     1, 2),

    # ── TypeORM ──
    ("TypeORM", "WHERE",
     re.compile(r'where\s*:\s*\{[^}]*?(\w+)\s*:'),
     None, 1),
    ("TypeORM", "WHERE",
     re.compile(r'\.where\s*\(\s*[\'"`](?:\w+\.)?(\w+)\s*(?:=|LIKE|IN|>|<)'),
     None, 1),
    ("TypeORM", "ORDER_BY",
     re.compile(r'\.orderBy\s*\(\s*[\'"`](?:\w+\.)?(\w+)[\'"`]'),
     None, 1),
    ("TypeORM", "ORDER_BY",
     re.compile(r'orderBy\s*:\s*\{[^}]*?(\w+)\s*:'),
     None, 1),

    # ── Prisma ──
    ("Prisma", "WHERE",
     re.compile(r'where\s*:\s*\{[^}]*?(\w+)\s*:'),
     None, 1),
    ("Prisma", "ORDER_BY",
     re.compile(r'orderBy\s*:\s*\{[^}]*?(\w+)\s*:'),
     None, 1),

    # ── Sequelize ──
    ("Sequelize", "WHERE",
     re.compile(r'where\s*:\s*\{[^}]*?(\w+)\s*:'),
     None, 1),
    ("Sequelize", "ORDER_BY",
     re.compile(r'order\s*:\s*\[\s*\[\s*[\'"`](\w+)[\'"`]'),
     None, 1),

    # ── GORM ──
    ("GORM", "WHERE",
     re.compile(r'\.(?:Where|Find|First|Last)\s*\([^,)]*?[\'"`](?:\w+\.)?(\w+)\s*(?:=|LIKE|IN|>|<|\?)'),
     None, 1),
    ("GORM", "ORDER_BY",
     re.compile(r'\.Order\s*\(\s*[\'"`](\w+)'),
     None, 1),
    ("GORM", "JOIN",
     re.compile(r'\.Joins\s*\([^)]*?ON\s+\w+\.(\w+)\s*=\s*\w+\.(\w+)'),
     None, 1),

    # ── ActiveRecord (Ruby) ──
    ("ActiveRecord", "WHERE",
     re.compile(r'\.where\s*\(\s*(\w+):\s*'),
     None, 1),
    ("ActiveRecord", "WHERE",
     re.compile(r'\.find_by\s*\(\s*(\w+):\s*'),
     None, 1),
    ("ActiveRecord", "ORDER_BY",
     re.compile(r'\.order\s*\(\s*:(\w+)\s*\)'),
     None, 1),
    ("ActiveRecord", "ORDER_BY",
     re.compile(r'\.order\s*\(\s*[\'"](\w+)'),
     None, 1),
]

# ─── Existing index detection ─────────────────────────────────────────────────

EXISTING_INDEX_PATTERNS = [
    # Django
    re.compile(r'(\w+)\s*=\s*\w+Field\s*\([^)]*\bdb_index\s*=\s*True'),
    re.compile(r'(\w+)\s*=\s*\w+Field\s*\([^)]*\bunique\s*=\s*True'),
    re.compile(r'models\.Index\s*\(\s*fields\s*=\s*\[([^\]]+)\]'),
    # SQLAlchemy
    re.compile(r'Column\s*\([^)]*\bindex\s*=\s*True[^)]*\).*?#.*?(\w+)'),
    re.compile(r'(\w+)\s*=\s*Column\s*\([^)]*\bindex\s*=\s*True'),
    re.compile(r'(\w+)\s*=\s*Column\s*\([^)]*\bunique\s*=\s*True'),
    re.compile(r'Index\s*\(\s*[\'"`]\w+[\'"`]\s*,\s*\w+\.(\w+)'),
    # TypeORM
    re.compile(r'@(?:Index|Unique|Column)\s*\([^)]*\bindex\s*:\s*true'),
    re.compile(r'@Column\s*\([^)]*\bunique\s*:\s*true[^)]*\)\s*\w+\s*:\s*\w+\s*(\w+)'),
    # Prisma
    re.compile(r'@@index\s*\(\s*\