Init KB

ClawSkills, by kevjade, v2.0.0

Initialize or update a knowledge base for a project, business, or client. Triggers on "init kb", "build kb", "create kb for X", "set up kb", "new kb" (init), and "update kb", "refresh kb", "re-scrape kb", "kb update" (update). Scrapes websites and social profiles via the Firecrawl API, runs deep analysis, asks targeted questions to fill gaps, and generates 9 structured KB files. Each KB loads on-demand (not at boot) to avoid context bloat. Files are designed to be comprehensive references for specialized agents.


Installation / Download

TotalClaw CLI (recommended)

totalclaw install clawskills:kevjade~init-kb

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/clawskills%3Akevjade~init-kb/file -o init-kb.md

Git (get the source from the repository)

git clone https://github.com/openclaw/skills && git -C skills checkout ded5a2e56b06d3a3b15916095181c85f2b0eab5f

# Initialize or Update Knowledge Base (OpenClaw)

You are building a structured knowledge base that gives AI agents everything they need to understand a person, their business, their voice, and their boundaries. This is the foundation for all future AI work. The better the KB, the better every output from day one.

This is the OpenClaw version of the init-kb skill. It uses the **Firecrawl REST API** (via curl) instead of the Firecrawl CLI.

## Critical Design Principle: On-Demand Loading Only

**IMPORTANT:** Knowledge bases must load ON-DEMAND, not at boot. No agent session should preload all KB files. Preloading causes massive context bloat and kills productivity.

**The pattern:**
- 9 KB files are comprehensive references for **specialized work only**
- When the user asks for Operator Vault work, THEN load the relevant KB files
- For all other tasks, the KB stays unloaded
- AGENTS.md notes where the KB is, but doesn't auto-load it

This is critical for multi-project workspaces. Enforcing this throughout Phase 7 (Integration) prevents context waste.

## What You Produce

9 files in `KNOWLEDGE BASE/<project-name>/`:

| File | What It Captures |
|------|-----------------|
| PERSONA.md | Agent identity, core behavioral rules, boundaries, vibe |
| CONTEXT.md | Business context, goals, market position, competitors, non-negotiables |
| USER.md | The person/founder: background, origin story, personality, differentiators |
| VOICE.md | Writing style: tone, vocabulary, banned words/phrases, quality test |
| GUARDRAILS.md | Brand rules, things to never say, sensitive topics, approval gates |
| SITEMAP.md | Complete site structure (only if 20+ pages; otherwise folded into CONTEXT.md) |
| BUSINESS-INTEL.md | Products, pricing, business model, audience, positioning, tech stack, team |
| OPPORTUNITIES.md | Gaps, thin content, broken journeys, growth signals |
| CORRECTIONS.md | Self-improving log: every correction the user makes updates the source KB file and gets logged here |

Plus:
- A `site-content/` directory with every scraped page as its own markdown file
- An AGENTS.md boot section (auto-appended) and a CLAUDE.md snippet

## Triggers

**Init triggers:** "init kb", "build kb", "create kb for X", "set up kb", "new kb"

**Update triggers:** "update kb", "refresh kb", "re-scrape kb", "kb update"

When an update trigger fires, skip to the **Update Flow** section.

## API Key Setup

The Firecrawl API key must be available. Check for it in this order:
1. Environment variable `FIRECRAWL_API_KEY`
2. A file at `.firecrawl/api-key.txt` in the workspace root
3. If neither is present, go through the Phase 0 onboarding (see below) to obtain one

If the user provides a key, save it to `.firecrawl/api-key.txt` (one line, just the key). Read from this file on future runs.
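The lookup order above could be scripted roughly as follows. This is a minimal sketch, not part of the skill itself: the function name `resolve_firecrawl_key` is hypothetical, and it assumes the key file holds the key on its first line.

```bash
# Resolve the Firecrawl key: environment variable first, then the key file.
# Prints the key on stdout. Returns 1 if neither source has one, in which
# case the caller would fall through to the Phase 0 onboarding.
# resolve_firecrawl_key is a hypothetical helper name.
resolve_firecrawl_key() {
  local key_file="${1:-.firecrawl/api-key.txt}"
  if [ -n "${FIRECRAWL_API_KEY:-}" ]; then
    printf '%s\n' "$FIRECRAWL_API_KEY"
  elif [ -s "$key_file" ]; then
    # Use the first line only and strip stray whitespace
    head -n 1 "$key_file" | tr -d '[:space:]'
    printf '\n'
  else
    return 1
  fi
}
```

A caller might run `FIRECRAWL_API_KEY="$(resolve_firecrawl_key)" && export FIRECRAWL_API_KEY` so that the later curl commands, which read `$FIRECRAWL_API_KEY`, also work when the key only exists in the file.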

## Rules

1. **One question per message.** Never stack multiple questions. People freeze when they see three at once.
2. **Show before ask.** If Firecrawl pulled data, show what you found and ask to confirm before asking more questions.
3. **Progress tracking.** After each phase, tell the user where they are: "That's the personal stuff done. 2 of 4 sections complete. Next up: your business."
4. **Skip-friendly.** If someone says "I don't know" or "skip", mark it as `<!-- TODO: fill in later -->` in the output and move on. Never pressure.
5. **Accept file imports.** If the user drops a brand guide, doc path, or existing content, read it and extract relevant info instead of asking questions the doc already answers.
6. **No em dashes.** Never use them in generated files or responses. Use commas, periods, or restructure.
7. **Concise output.** Each file should be scannable. No novels. Strong opinions over vague guidelines. Actionable rules, not suggestions.

---

## The Flow

### Phase 0: Onboarding Wizard

When the skill triggers (init), open with this welcome message before asking anything:

```
Welcome to init-kb. I'm going to build a complete knowledge base that gives your AI agents full context on your business: who you are, what you sell, how you write, and what the rules are.

Here's what we're building:
- 9 structured files covering your business, person, voice, and boundaries
- Your full website scraped and analyzed (if you have one)
- A living KB that gets smarter over time

Estimated time: 10-20 minutes (faster if you have a website to scrape)

Do you have a Firecrawl API key? That's what I use to scrape your website and social profiles. If not, grab one free at https://firecrawl.link/operator and come back when you have it. We'll continue from there.
```

If they say they have a key (or one is already saved), move to API key setup guidance below.
If they don't have one yet, wait for them to confirm before proceeding.

**API key setup — ask first:** "Are you running OpenClaw locally on your machine (Mac, PC) or on a server/VPS?"

**If local:**
```
To save your key permanently, run this in your terminal:

echo 'export FIRECRAWL_API_KEY=your-key-here' >> ~/.zshrc && source ~/.zshrc

Or if you use bash: replace .zshrc with .bashrc

Then paste your key here and I'll also save it to .firecrawl/api-key.txt as a backup.
```

**If VPS/server:**

Three ways to add it:

**Option 1 — Hostinger (GUI):**
```
Log into Hostinger, go to Catalogue, click Manage on your VPS, scroll down to Environment Variables, and add:
  Key: FIRECRAWL_API_KEY
  Value: your-key-here
```

**Option 2 — Any VPS via terminal:**
```
echo 'export FIRECRAWL_API_KEY=your-key-here' >> ~/.bashrc && source ~/.bashrc
```
(Replace .bashrc with .zshrc if you use zsh.)

**Option 3 — Just paste it here:**
Paste your key directly in this chat and I'll save it to .firecrawl/api-key.txt. Only do this if you're the only one with access to your server and Discord channel. Never paste API keys in shared or public channels.

After they paste the key, save it to `.firecrawl/api-key.txt` and confirm: "Got it. Key saved."
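Saving the pasted key might look like the sketch below. The helper name `save_firecrawl_key` is hypothetical, and the owner-only file permissions are an added precaution that the skill text does not require.

```bash
# Write the key to the key file as a single line, readable by the owner only.
# save_firecrawl_key is a hypothetical helper name.
# $1 = the key, $2 = key file path (defaults to the path used by the skill).
save_firecrawl_key() {
  local key="$1" key_file="${2:-.firecrawl/api-key.txt}"
  mkdir -p "$(dirname "$key_file")"
  # Restrict permissions before the secret is written
  ( umask 077 && printf '%s\n' "$key" > "$key_file" )
  chmod 600 "$key_file"
}
```

For example, `save_firecrawl_key "your-key-here"` run from the workspace root creates `.firecrawl/api-key.txt` with one line containing the key.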

Then proceed:

**Question 1:** "What's the project or business name? This becomes the folder name."

**Question 2:** "Got a website URL? Any social profiles (LinkedIn, X, YouTube, Instagram)? Any other important links (docs, portfolio, press pages, Skool community)? Drop them all here. If you don't have any yet, just say 'none' and we'll skip the scraping."

**After Phase 0:**

- Check if `KNOWLEDGE BASE/<project-name>/` already exists:
  - If some files exist: "I found PERSONA.md and CONTEXT.md already. Want me to build the missing ones, or update everything?"
  - If all files exist: "This KB is already complete. Want me to review and update it, or leave it as-is?"

- **Check cache.** If `.firecrawl/<project-slug>/crawl-raw.json` exists and is less than 7 days old: "I already scraped this site on [date]. Want to use the cached data or re-scrape?" If cache is good, skip to Stage 4.

- If a website URL was provided, run the **Firecrawl scraping pipeline:**
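Before the pipeline starts, the existing-KB check and the cache check described in this list could be scripted roughly as below. The helper names are hypothetical, and "less than 7 days old" is interpreted here as the file's modification time, which is an assumption.

```bash
# The nine KB files from the "What You Produce" table. SITEMAP.md is only
# produced for sites with 20+ pages, so it can be absent from a complete KB.
KB_FILES="PERSONA.md CONTEXT.md USER.md VOICE.md GUARDRAILS.md SITEMAP.md BUSINESS-INTEL.md OPPORTUNITIES.md CORRECTIONS.md"

# Print the KB files missing from a project directory, one per line.
# $1 = project KB directory, e.g. "KNOWLEDGE BASE/<project-name>"
kb_missing_files() {
  local kb_dir="$1" f
  for f in $KB_FILES; do
    [ -f "$kb_dir/$f" ] || printf '%s\n' "$f"
  done
}

# Succeed only if the cache file exists and was modified in the last 7 days.
# $1 = cache file, e.g. ".firecrawl/<project-slug>/crawl-raw.json"
cache_is_fresh() {
  local cache_file="$1"
  [ -f "$cache_file" ] && [ -n "$(find "$cache_file" -mtime -7 -print)" ]
}
```

An empty result from `kb_missing_files` corresponds to the "all files exist" case. A non-empty result that lists fewer than nine files corresponds to the "some files exist" case.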

#### Stage 1: Discover URLs (Map API)

```bash
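# --create-dirs makes curl create .firecrawl/<project-slug>/ if it does not exist yet
curl -s --create-dirs -X POST "https://api.firecrawl.dev/v1/map" \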
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "<website-url>", "limit": 500}' \
  -o .firecrawl/<project-slug>/map-result.json
```

Parse the response to get the URL list and count. Present to the user: "I found X pages on your site. Crawling all of them will use approximately X Firecrawl credits. Want me to proceed, or should I limit it?"

If the user wants to limit, ask for a number or suggest core pages only.
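Parsing the saved Map response could be done with `jq`, as in this sketch. It assumes `jq` is installed and that the response has the shape `{"success": true, "links": ["https://...", ...]}`. Check the field name against the actual response before relying on it. Both helper names are hypothetical.

```bash
# Count the URLs in a saved Map API response.
# Assumes the URLs are listed under a top-level "links" array.
# $1 = path to map-result.json
map_url_count() {
  jq -r '(.links // []) | length' "$1"
}

# Print the discovered URLs one per line, for example to pick core pages
# when the user wants to limit the crawl.
map_urls() {
  jq -r '(.links // [])[]' "$1"
}
```

The count returned by `map_url_count` is the X to show the user in the "I found X pages" message.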

#### Stage 2: Crawl Full Site (Crawl API)

```bash
curl -s -X POST "https://api.firecrawl.dev/v1/crawl" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "<website-url>", "limit": <N>, "scrapeOptions": {"formats": ["markdown"]}}' \
  -o .firecrawl/<project-slug>/crawl-job.json
```

This returns a job ID. Poll for completion:
```bash
curl -s -X GET "https://api.firecrawl.dev/v1/crawl/<job-id>" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -o .firecrawl/<project-slug>/crawl-raw.json
```
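A polling loop around the status request above might look like the following sketch. It assumes `jq` is installed, that the crawl-start response carries the job ID in an `id` field, and that the status response has a `status` field with values such as `completed` or `failed`. The helper names, the default interval, and the retry cap are all hypothetical, so adjust them to whatever interval the skill specifies.

```bash
# Read the job ID from the saved crawl-start response (assumed field: "id").
crawl_job_id() {
  jq -r '.id // empty' "$1"
}

# Read the job state from a saved status response (assumed field: "status").
crawl_status() {
  jq -r '.status // "unknown"' "$1"
}

# Poll until the job reports "completed", fails, or the retry cap is hit.
# $1 = project slug, $2 = seconds between polls, $3 = maximum number of polls
# (both defaults are hypothetical).
poll_crawl() {
  local dir=".firecrawl/$1" interval="${2:-5}" max_tries="${3:-120}" job_id tries=0
  job_id="$(crawl_job_id "$dir/crawl-job.json")"
  [ -n "$job_id" ] || { echo "no job id in $dir/crawl-job.json" >&2; return 1; }
  while [ "$tries" -lt "$max_tries" ]; do
    curl -s -X GET "https://api.firecrawl.dev/v1/crawl/$job_id" \
      -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
      -o "$dir/crawl-raw.json" || return 1
    case "$(crawl_status "$dir/crawl-raw.json")" in
      completed)        return 0 ;;
      failed|cancelled) echo "crawl did not complete" >&2; return 1 ;;
    esac
    tries=$((tries + 1))
    sleep "$interval"
  done
  echo "timed out waiting for crawl" >&2
  return 1
}
```

Calling `poll_crawl <project-slug>` leaves the final status response in `crawl-raw.json`, the same file the cache check looks for.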

Poll ever