llm-wiki
Karpathy's LLM Wiki: build/query interlinked markdown KB.
安装 / 下载方式
TotalClaw CLI推荐
totalclaw install hermes:hermes~llm-wikicURL直接下载,无需登录
curl -fsSL https://skills.taituai.com/api/skills/hermes%3Ahermes~llm-wiki/file -o llm-wiki.md# Karpathy's LLM Wiki
Build and maintain a persistent, compounding knowledge base as interlinked markdown files.
Based on [Andrej Karpathy's LLM Wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).
Unlike traditional RAG (which rediscovers knowledge from scratch per query), the wiki
compiles knowledge once and keeps it current. Cross-references are already there.
Contradictions have already been flagged. Synthesis reflects everything ingested.
**Division of labor:** The human curates sources and directs analysis. The agent
summarizes, cross-references, files, and maintains consistency.
## When This Skill Activates
Use this skill when the user:
- Asks to create, build, or start a wiki or knowledge base
- Asks to ingest, add, or process a source into their wiki
- Asks a question and an existing wiki is present at the configured path
- Asks to lint, audit, or health-check their wiki
- References their wiki, knowledge base, or "notes" in a research context
## Wiki Location
**Location:** Set via `WIKI_PATH` environment variable (e.g. in `~/.hermes/.env`).
If unset, defaults to `~/wiki`.
```bash
WIKI="${WIKI_PATH:-$HOME/wiki}"
```
The wiki is just a directory of markdown files — open it in Obsidian, VS Code, or
any editor. No database, no special tooling required.
## Architecture: Three Layers
```
wiki/
├── SCHEMA.md # Conventions, structure rules, domain config
├── index.md # Sectioned content catalog with one-line summaries
├── log.md # Chronological action log (append-only, rotated yearly)
├── raw/ # Layer 1: Immutable source material
│ ├── articles/ # Web articles, clippings
│ ├── papers/ # PDFs, arxiv papers
│ ├── transcripts/ # Meeting notes, interviews
│ └── assets/ # Images, diagrams referenced by sources
├── entities/ # Layer 2: Entity pages (people, orgs, products, models)
├── concepts/ # Layer 2: Concept/topic pages
├── comparisons/ # Layer 2: Side-by-side analyses
└── queries/ # Layer 2: Filed query results worth keeping
```
**Layer 1 — Raw Sources:** Immutable. The agent reads but never modifies these.
**Layer 2 — The Wiki:** Agent-owned markdown files. Created, updated, and
cross-referenced by the agent.
**Layer 3 — The Schema:** `SCHEMA.md` defines structure, conventions, and tag taxonomy.
## Resuming an Existing Wiki (CRITICAL — do this every session)
When the user has an existing wiki, **always orient yourself before doing anything**:
① **Read `SCHEMA.md`** — understand the domain, conventions, and tag taxonomy.
② **Read `index.md`** — learn what pages exist and their summaries.
③ **Scan recent `log.md`** — read the last 20-30 entries to understand recent activity.
```bash
WIKI="${WIKI_PATH:-$HOME/wiki}"
# Orientation reads at session start
read_file "$WIKI/SCHEMA.md"
read_file "$WIKI/index.md"
read_file "$WIKI/log.md" offset=<last 30 lines>
```
Only after orientation should you ingest, query, or lint. This prevents:
- Creating duplicate pages for entities that already exist
- Missing cross-references to existing content
- Contradicting the schema's conventions
- Repeating work already logged
For large wikis (100+ pages), also run a quick `search_files` for the topic
at hand before creating anything new.
## Initializing a New Wiki
When the user asks to create or start a wiki:
1. Determine the wiki path (from `$WIKI_PATH` env var, or ask the user; default `~/wiki`)
2. Create the directory structure above
3. Ask the user what domain the wiki covers — be specific
4. Write `SCHEMA.md` customized to the domain (see template below)
5. Write initial `index.md` with sectioned header
6. Write initial `log.md` with creation entry
7. Confirm the wiki is ready and suggest first sources to ingest
### SCHEMA.md Template
Adapt to the user's domain. The schema constrains agent behavior and ensures consistency:
```markdown
# Wiki Schema
## Domain
[What this wiki covers — e.g., "AI/ML research", "personal health", "startup intelligence"]
## Conventions
- File names: lowercase, hyphens, no spaces (e.g., `transformer-architecture.md`)
- Every wiki page starts with YAML frontmatter (see below)
- Use `[[wikilinks]]` to link between pages (minimum 2 outbound links per page)
- When updating a page, always bump the `updated` date
- Every new page must be added to `index.md` under the correct section
- Every action must be appended to `log.md`
- **Provenance markers:** On pages that synthesize 3+ sources, append `^[raw/articles/source-file.md]`
at the end of paragraphs whose claims come from a specific source. This lets a reader trace each
claim back without re-reading the whole raw file. Optional on single-source pages where the
`sources:` frontmatter is enough.
## Frontmatter
```yaml
---
title: Page Title
created: YYYY-MM-DD
updated: YYYY-MM-DD
type: entity | concept | comparison | query | summary
tags: [from taxonomy below]
sources: [raw/articles/source-name.md]
# Optional quality signals:
confidence: high | medium | low # how well-supported the claims are
contested: true # set when the page has unresolved contradictions
contradictions: [other-page-slug] # pages this one conflicts with
---
```
`confidence` and `contested` are optional but recommended for opinion-heavy or fast-moving
topics. Lint surfaces `contested: true` and `confidence: low` pages for review so weak claims
don't silently harden into accepted wiki fact.
### raw/ Frontmatter
Raw sources ALSO get a small frontmatter block so re-ingests can detect drift:
```yaml
---
source_url: https://example.com/article # original URL, if applicable
ingested: YYYY-MM-DD
sha256: <hex digest of the raw content below the frontmatter>
---
```
The `sha256:` lets a future re-ingest of the same URL skip processing when content is unchanged,
and flag drift when it has changed. Compute over the body only (everything after the closing
`---`), not the frontmatter itself.
## Tag Taxonomy
[Define 10-20 top-level tags for the domain. Add new tags here BEFORE using them.]
Example for AI/ML:
- Models: model, architecture, benchmark, training
- People/Orgs: person, company, lab, open-source
- Techniques: optimization, fine-tuning, inference, alignment, data
- Meta: comparison, timeline, controversy, prediction
Rule: every tag on a page must appear in this taxonomy. If a new tag is needed,
add it here first, then use it. This prevents tag sprawl.
## Page Thresholds
- **Create a page** when an entity/concept appears in 2+ sources OR is central to one source
- **Add to existing page** when a source mentions something already covered
- **DON'T create a page** for passing mentions, minor details, or things outside the domain
- **Split a page** when it exceeds ~200 lines — break into sub-topics with cross-links
- **Archive a page** when its content is fully superseded — move to `_archive/`, remove from index
## Entity Pages
One page per notable entity. Include:
- Overview / what it is
- Key facts and dates
- Relationships to other entities ([[wikilinks]])
- Source references
## Concept Pages
One page per concept or topic. Include:
- Definition / explanation
- Current state of knowledge
- Open questions or debates
- Related concepts ([[wikilinks]])
## Comparison Pages
Side-by-side analyses. Include:
- What is being compared and why
- Dimensions of comparison (table format preferred)
- Verdict or synthesis
- Sources
## Update Policy
When new information conflicts with existing content:
1. Check the dates — newer sources generally supersede older ones
2. If genuinely contradictory, note both positions with dates and sources
3. Mark the contradiction in frontmatter: `contradictions: [page-name]`
4. Flag for user review in the lint report
```
### index.md Template
The index is sectioned by type. Each entry is one line: wikilink + summary.
```markdown
# Wiki Index
> Content cata