cross-ref

SkillDB · Author: glucksberg · v1.1.0

Cross-reference GitHub PRs and issues to find duplicates and missing links. Spawns parallel Sonnet subagents to semantically analyze the last N PRs and issues, finding PRs that solve the same problem (duplicates) and issues resolved by open PRs but not yet linked. Groups findings into thematic clusters, scores them by actionability, and offers rate-limited commenting or bulk actions (close, label). Use this skill when the user wants to find duplicate PRs, link issues to PRs, clean up a repo's cross-references, or audit PR/issue relationships. Also useful when the user says things like "find related PRs", "which PRs fix this issue", "are there duplicate PRs", "link issues and PRs", or "audit cross-references".


Install / Download

TotalClaw CLI (recommended)

totalclaw install skilldb:glucksberg~cross-ref

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/skilldb%3Aglucksberg~cross-ref/file -o cross-ref.md

Git (get the source from the repository)

git clone https://github.com/openclaw/skills.git
git -C skills checkout 1fa9826ee9160bda8d0298963d5f3563913d76b2

# Cross-Ref: PR & Issue Linker

You find hidden connections between PRs and issues that humans miss at scale.
The core loop is: **fetch → analyze in parallel → cluster → verify → report → act**.

Before doing anything, read `references/principles.md`. Those rules override
everything in this file when there's a conflict.

## Overview

Repos accumulate duplicate PRs and orphaned issue→PR links over time. Manual
cross-referencing doesn't scale past a few dozen items. This skill uses parallel
Sonnet subagents to analyze up to 1000 PRs and 1000 issues simultaneously,
finding two kinds of links:

1. **Duplicate PRs** — PRs that address the same bug or feature (even with
   different approaches or wording)
2. **Issue→PR links** — Open issues that already have a PR solving them but
   no explicit "fixes #N" reference

Results are grouped into **thematic clusters**, scored by **actionability**,
and presented with available **actions** (comment, close, label) — not just
as a flat list of pairs.

## Configuration

The user provides these at invocation time (ask if not given):

| Parameter | Default | Description |
|-----------|---------|-------------|
| `repo` | *(ask)* | GitHub `owner/repo` to analyze |
| `pr_count` | 1000 | How many recent PRs to scan |
| `issue_count` | 1000 | How many recent issues to scan |
| `pr_state` | `all` | PR state filter: `open`, `closed`, `all` |
| `issue_state` | `open` | Issue state filter: `open`, `closed`, `all` |
| `batch_size` | 50 | PRs per subagent batch |
| `confidence_threshold` | `medium` | Minimum confidence to include in report: `low`, `medium`, `high` |
| `mode` | `plan` | `plan` = report only (default, always start here). `execute` = act on findings. |

**Default mode is `plan`** (dry-run). The skill always starts by generating
the report. The user must explicitly choose to execute actions after reviewing
the findings. This matters because actions can't be undone.
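
The parameter table can be captured as a small validated structure. The sketch below is illustrative only: `CrossRefConfig` and its validation rules are hypothetical names and assumptions, not part of the skill, and only the defaults come from the table above.

```python
from dataclasses import dataclass

STATES = {"open", "closed", "all"}
CONFIDENCE_LEVELS = ("low", "medium", "high")


@dataclass(frozen=True)
class CrossRefConfig:
    """Hypothetical container for the parameters in the table above."""

    repo: str                           # required, "owner/repo"
    pr_count: int = 1000
    issue_count: int = 1000
    pr_state: str = "all"
    issue_state: str = "open"
    batch_size: int = 50
    confidence_threshold: str = "medium"
    mode: str = "plan"                  # dry-run unless the user opts in

    def __post_init__(self):
        owner, sep, name = self.repo.partition("/")
        if not (owner and sep and name) or "/" in name:
            raise ValueError(f"repo must be 'owner/repo', got {self.repo!r}")
        if self.pr_state not in STATES or self.issue_state not in STATES:
            raise ValueError("state filters must be open, closed, or all")
        if self.confidence_threshold not in CONFIDENCE_LEVELS:
            raise ValueError("confidence_threshold must be low, medium, or high")
        if self.mode not in ("plan", "execute"):
            raise ValueError("mode must be plan or execute")
        if min(self.pr_count, self.issue_count, self.batch_size) < 1:
            raise ValueError("counts and batch_size must be positive")
```

Constructing `CrossRefConfig(repo="owner/repo")` yields the documented defaults, including `mode="plan"`.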

## Workflow

### Phase 1: Data Collection

Fetch PR and issue metadata from the GitHub API. This phase is deterministic
and uses the shell script — no AI needed.

```bash
scripts/fetch-data.sh <owner/repo> <workspace_dir> [pr_count] [issue_count] [pr_state] [issue_state]
```

This produces:
- `workspace/prs.json` — Full PR metadata
- `workspace/issues.json` — Full issue metadata (PRs filtered out)
- `workspace/existing-refs.json` — Pre-extracted explicit cross-references
- `workspace/pr-index.txt` — Compact one-line-per-PR index
- `workspace/issue-index.txt` — Compact one-line-per-issue index

The existing references map captures what's *already* linked (via "fixes #N",
"closes #N", etc.) so subagents can focus on what's *missing*.
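
To make the idea of "already linked" concrete, here is a minimal sketch of extracting explicit references from PR bodies. It assumes each PR is a dict with `number` and `body` keys, and the regular expression is an illustrative guess based on GitHub's closing keywords. The real extraction happens in `scripts/fetch-data.sh`, whose pattern and output shape are not shown here and may differ.

```python
import re

# Closing keywords (close/closes/closed, fix/fixes/fixed, resolve/resolves/
# resolved) followed by "#N". Illustrative assumption, not the script's regex.
CLOSING_REF = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s*:?\s+#(\d+)",
    re.IGNORECASE,
)


def explicit_refs(prs):
    """Map PR number -> sorted issue numbers its body already references."""
    refs = {}
    for pr in prs:
        body = pr.get("body") or ""   # the API returns null for empty bodies
        found = sorted({int(n) for n in CLOSING_REF.findall(body)})
        if found:
            refs[pr["number"]] = found
    return refs
```

Any pair that appears in this map is one the subagents can skip.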

### Phase 2: Parallel Analysis (Sonnet Subagents)

This is where the intelligence happens. Split PRs into batches and spawn
parallel Sonnet subagents. Each subagent receives:

- Its batch of PRs (full metadata from prs.json; `batch_size` PRs, 50 by default)
- The **complete** issue index (compact, ~60KB)
- The **complete** PR index (compact, ~60KB) — for duplicate detection
- The existing references map (so it skips already-linked items)

**Spawn subagents using the Task tool:**

```
For each batch B of {batch_size} PRs:
  Task(
    subagent_type="general-purpose",
    model="sonnet",
    prompt=<see below>
  )
```

**Subagent prompt template:**

**Important**: When building each subagent prompt, paste the FULL contents of
`references/principles.md` into the "Decision Principles" section below.
Do not summarize or condense — include the complete text. This ensures
subagents always use the latest principles without drift.

```
You are a cross-reference analyst for a GitHub repository. Your job is to find
connections between PRs and issues that aren't explicitly linked yet.

## Decision Principles (these override everything else)

{paste full contents of references/principles.md here}

## Your Batch
You are analyzing PRs {start_num} through {end_num} of {total_prs}.

## PR Details (your batch)
{full PR metadata for this batch from prs.json}

## Complete Issue Index
{issue-index.txt content}

## Complete PR Index
{pr-index.txt content}

## Already Known References
{existing-refs.json content}

## Your Task

Find TWO types of connections:

### 1. Issue→PR Links
For each PR in your batch, determine if it resolves any issue in the index.
Evidence must include at least one of:
- Same error message or failure path described in both
- PR modifies the component/module that the issue describes as broken
- PR body explicitly references the problem the issue describes (even without #N)

Title similarity alone is NOT sufficient. Skip any links that already exist
in the known references.

### 2. Duplicate PRs
For each PR in your batch, check if any OTHER PR in the full PR index
addresses the same problem. Evidence must include at least one of:
- Both modify the same files for the same reason
- Both fix the same error/behavior (even with different approaches)
- One is a resubmission or continuation of the other (same branch, similar body)

Same area of code is NOT enough — the PRs must address the same specific problem.

### 3. Flagging Uncertainty

If you encounter a pair where the evidence is ambiguous — you can see a
plausible connection but can't confirm it from the available data — mark it
with `"status": "manual_review_required"` instead of guessing a confidence
level. Include what's missing (e.g., "need to see full diff to confirm
file overlap").

### Output Format
Return ONLY a JSON array. No other text.

[
  {
    "type": "issue_link",
    "pr": 5678,
    "pr_author": "@username",
    "issue": 1234,
    "confidence": "high|medium|low",
    "status": "confirmed|manual_review_required",
    "root_cause": "One sentence: what shared problem connects these",
    "evidence": "Specific: same error message, same file, same component, etc.",
    "missing_evidence": null or "What would be needed to confirm this"
  },
  {
    "type": "duplicate_pr",
    "pr_a": 5678,
    "pr_b": 5679,
    "pr_a_author": "@username_a",
    "pr_b_author": "@username_b",
    "confidence": "high|medium|low",
    "status": "confirmed|manual_review_required",
    "root_cause": "One sentence: what shared problem connects these",
    "evidence": "Specific: same files modified, same branch, resubmission, etc.",
    "missing_evidence": null or "What would be needed to confirm this"
  }
]
```
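
Subagent replies may not always be well-formed, so it can help to validate them before merging. The following is a sketch under assumptions: `parse_findings` is a hypothetical helper, and the rule that `confidence` is required only for `confirmed` findings is inferred from the uncertainty section of the prompt rather than stated by the skill.

```python
import json

ID_FIELDS = {
    "issue_link": {"pr", "issue"},
    "duplicate_pr": {"pr_a", "pr_b"},
}
COMMON_FIELDS = {"status", "root_cause", "evidence"}
CONFIDENCE_LEVELS = ("low", "medium", "high")


def _is_valid(item):
    """Check one finding against the output format in the prompt."""
    if not isinstance(item, dict):
        return False
    kind = item.get("type")
    if not isinstance(kind, str) or kind not in ID_FIELDS:
        return False
    if not (ID_FIELDS[kind] | COMMON_FIELDS).issubset(item):
        return False
    if item["status"] == "confirmed":
        # Assumption: only confirmed findings must carry a confidence level.
        return item.get("confidence") in CONFIDENCE_LEVELS
    return item["status"] == "manual_review_required"


def parse_findings(raw):
    """Parse one subagent reply; return (valid findings, rejected entries)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return [], [raw]
    if not isinstance(data, list):
        return [], [data]
    valid, rejected = [], []
    for item in data:
        (valid if _is_valid(item) else rejected).append(item)
    return valid, rejected
```

Rejected entries are returned rather than dropped so they can be surfaced for manual inspection.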

**Parallelism**: Spawn ALL batch subagents simultaneously. With batch_size=50
and 1000 PRs, that's 20 parallel subagents. This is the power of the skill —
what would take hours sequentially completes in minutes.
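
The batch arithmetic can be sketched as follows. `make_batches` and `batch_labels` are hypothetical helper names; the labels correspond to the `{start_num}`, `{end_num}`, and `{total_prs}` placeholders in the prompt template, assuming they are 1-based positions in the fetched PR list.

```python
def make_batches(prs, batch_size=50):
    """Split the PR list into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [prs[i:i + batch_size] for i in range(0, len(prs), batch_size)]


def batch_labels(total, batch_size=50):
    """Return (start_num, end_num, total_prs) for each batch, 1-based."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [
        (start + 1, min(start + batch_size, total), total)
        for start in range(0, total, batch_size)
    ]
```

With 1000 PRs and the default `batch_size`, `make_batches` returns 20 batches; a final partial batch is kept rather than discarded.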

### Phase 3: Merge, Deduplicate & Cluster

After all subagents return:

1. **Collect** all JSON results into a single array
2. **Deduplicate** duplicate_pr entries (A→B and B→A are the same link)
3. **Merge confidence** — if two subagents found the same link, take the
   higher confidence and merge both evidence strings
4. **Filter** by `confidence_threshold`
5. **Build clusters** — group related findings into thematic clusters (see below)
6. **Score clusters** by actionability (see below)
7. **Sort** clusters by score (highest first)

Save to `workspace/results-unverified.json`.
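
Steps 1 to 4 can be sketched as below. This is one possible reading, not the skill's implementation. Two behaviors are assumptions the text does not specify: when reports disagree, `status` follows the higher-confidence report, and `manual_review_required` findings are kept regardless of the threshold.

```python
RANK = {"low": 0, "medium": 1, "high": 2}


def finding_key(f):
    """Order-independent identity, so A->B and B->A collapse to one link."""
    if f["type"] == "duplicate_pr":
        return ("duplicate_pr", *sorted((f["pr_a"], f["pr_b"])))
    return ("issue_link", f["pr"], f["issue"])


def merge_findings(findings, threshold="medium"):
    """Deduplicate, merge confidence and evidence, then filter."""
    merged = {}
    for f in findings:
        key = finding_key(f)
        kept = merged.get(key)
        if kept is None:
            merged[key] = dict(f)       # copy so inputs are not mutated
            continue
        if f["evidence"] not in kept["evidence"]:
            kept["evidence"] += " | " + f["evidence"]
        new_rank = RANK.get(f.get("confidence"), -1)
        if new_rank > RANK.get(kept.get("confidence"), -1):
            kept["confidence"] = f["confidence"]
            kept["status"] = f.get("status")   # assumption, see lead-in
    floor = RANK[threshold]
    return [
        f for f in merged.values()
        if f.get("status") == "manual_review_required"  # assumption: always kept
        or RANK.get(f.get("confidence"), -1) >= floor
    ]
```

Sorting the two PR numbers in `finding_key` is what makes the duplicate check symmetric.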

#### Clustering Algorithm

Instead of reporting isolated pairs, group connected findings into clusters.
Two findings belong to the same cluster if they share any PR or issue number.

Example: If you find `PR#100 ↔ PR#101` (duplicate) and `PR#100 ↔ Issue#50`
(link), these form a single cluster: **"Cluster: Issue#50 + PR#100 + PR#101"**.
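
The "share any PR or issue number" rule amounts to finding connected components. The sketch below uses a union-find structure for that, which is one reasonable choice and not necessarily how the skill does it. `build_clusters` is a hypothetical helper that fills only `cluster_id`, `items`, and `findings`; theme, score, and actions are left to later steps.

```python
from collections import defaultdict


def finding_items(f):
    """Return the two labelled endpoints of a finding."""
    if f["type"] == "duplicate_pr":
        return [f"PR#{f['pr_a']}", f"PR#{f['pr_b']}"]
    return [f"PR#{f['pr']}", f"Issue#{f['issue']}"]


def build_clusters(findings):
    """Group findings that share any PR or issue into clusters."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Union the two endpoints of every finding.
    for f in findings:
        a, b = finding_items(f)
        parent[find(a)] = find(b)

    # Bucket findings by the root of their first endpoint.
    groups = defaultdict(list)
    for f in findings:
        groups[find(finding_items(f)[0])].append(f)

    clusters = []
    for cluster_id, group in enumerate(groups.values(), start=1):
        items = sorted({item for f in group for item in finding_items(f)})
        clusters.append(
            {"cluster_id": cluster_id, "items": items, "findings": group}
        )
    return clusters
```

Applied to the example above, the duplicate `PR#100 ↔ PR#101` and the link `PR#100 ↔ Issue#50` end up in one cluster whose items are `Issue#50`, `PR#100`, and `PR#101`.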

Cluster structure:
```json
{
  "cluster_id": 1,
  "theme": "Onboard token mismatch — OPENCLAW_GATEWAY_TOKEN ignored",
  "items": ["PR#22662", "PR#22658", "Issue#22638"],
  "findings": [ ...individual findings in this cluster... ],
  "score": 8.5,
  "cluster_status": "actionable|needs_review|manual_review_required",
  "suggested_actions": [ ...see Phase 4b... ]
}
```

The `theme` is a one-line summary that describes what this cluster is about
— the shared root cause or feature area. Gene