gradient-knowledge-base

ClawSkills, by Rogue Iteration

Community skill (unofficial) for DigitalOcean Gradient Knowledge Bases. Build RAG pipelines: store documents in DO Spaces, configure data sources, manage indexing, and run semantic or hybrid search queries.

Installation / Download

TotalClaw CLI (recommended)

totalclaw install clawskills:clawskills~simondelorean-gradient-knowledge-base

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/clawskills%3Aclawskills~simondelorean-gradient-knowledge-base/file -o simondelorean-gradient-knowledge-base.md

# 🦞 Gradient AI — Knowledge Bases & RAG

> ⚠️ **This is an unofficial community skill**, not maintained by DigitalOcean. Use at your own risk.

> *"A lobster never forgets. Neither should your agent." — the KB lobster*

Build a [Retrieval-Augmented Generation](https://docs.digitalocean.com/products/gradient-ai-platform/details/features/#retrieval-augmented-generation-rag) pipeline using DigitalOcean's Gradient Knowledge Bases. Store your documents in DO Spaces, index them into a managed Knowledge Base (backed by OpenSearch), and query them with semantic or hybrid search.

## Architecture

```
Your Agent                   DigitalOcean
┌─────────────┐     upload    ┌──────────────┐
│  Documents  │ ──────────▶  │  DO Spaces   │
└─────────────┘              │  (S3-compat) │
                              └──────┬───────┘
                                     │ auto-index
                              ┌──────▼───────┐
                              │ Knowledge    │
                              │ Base (KBaaS) │
                              │ ┌──────────┐ │
                              │ │OpenSearch│ │
                              │ └──────────┘ │
                              └──────┬───────┘
                                     │ retrieve
┌─────────────┐     answer    ┌──────▼───────┐
│  Your Agent │ ◀──────────  │  RAG Results │
│  + LLM      │              │  + Citations │
└─────────────┘              └──────────────┘
```

📖 *[Knowledge Base docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/create-manage-knowledge-bases/)*

## API Endpoints

This skill connects to four official DigitalOcean service endpoints:

| Hostname | Purpose | Docs |
|----------|---------|------|
| `api.digitalocean.com` | KB management (create, list, delete, data sources) | [DO API Reference](https://docs.digitalocean.com/reference/api/) |
| `kbaas.do-ai.run` | KB retrieval — semantic/hybrid search queries | [KB Retrieval docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/create-manage-knowledge-bases/) |
| `inference.do-ai.run` | LLM chat completions for RAG synthesis | [Inference docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/) |
| `<region>.digitaloceanspaces.com` | S3-compatible object storage | [Spaces docs](https://docs.digitalocean.com/products/spaces/) |

All endpoints are owned and operated by DigitalOcean. The `*.do-ai.run` hostnames are the Gradient AI Platform's service domains.

## Authentication

This skill uses **three different credentials**. Think of it as a claw for each job:

| Credential | Used For | Env Var |
|------------|----------|---------|
| DO API Token | KB management, indexing, queries | `DO_API_TOKEN` |
| Gradient API Key | LLM inference for RAG synthesis | `GRADIENT_API_KEY` |
| Spaces Keys | S3-compatible uploads | `DO_SPACES_ACCESS_KEY` + `DO_SPACES_SECRET_KEY` |

> **Credential scoping:** Use minimally-scoped tokens. Create a dedicated [Model Access Key](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/manage-access-keys/) for `GRADIENT_API_KEY`. For `DO_API_TOKEN`, use a [scoped API token](https://docs.digitalocean.com/reference/api/create-personal-access-token/) with only Knowledge Base and Spaces permissions. Avoid using your account-root token.

Optional but recommended:
```bash
export GRADIENT_KB_UUID="your-kb-uuid"     # Default KB for queries
export DO_SPACES_BUCKET="your-bucket"      # Default bucket for uploads
export DO_SPACES_ENDPOINT="https://nyc3.digitaloceanspaces.com"
```
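
Before running any of the tools, it can help to confirm that the variables above are actually set. The following is a minimal sketch, not part of the skill's scripts: the helper name `missing_vars` is hypothetical, and treating all four credential variables as required is an assumption (a given tool may only need some of them).

```python
import os

# Variable names are taken from the Authentication table and the export block above.
CREDENTIAL_VARS = [
    "DO_API_TOKEN",
    "GRADIENT_API_KEY",
    "DO_SPACES_ACCESS_KEY",
    "DO_SPACES_SECRET_KEY",
]
OPTIONAL_VARS = ["GRADIENT_KB_UUID", "DO_SPACES_BUCKET", "DO_SPACES_ENDPOINT"]


def missing_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def describe_environment(env=None):
    """Return human-readable warnings; an empty list means everything is set."""
    warnings = []
    for name in missing_vars(CREDENTIAL_VARS, env):
        warnings.append(f"credential {name} is not set")
    for name in missing_vars(OPTIONAL_VARS, env):
        warnings.append(f"optional default {name} is not set")
    return warnings
```

Calling `describe_environment()` with no arguments inspects the real process environment; passing a dict makes the check easy to test.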

---

## Tools

### 📦 Store Documents in Spaces

Upload files to DO Spaces for Knowledge Base indexing. This is the storage layer — documents land here before being indexed.

```bash
# Upload a file
python3 gradient_spaces.py --upload /path/to/report.md --bucket my-kb-data

# Upload with a key prefix (folder structure)
python3 gradient_spaces.py --upload report.md --bucket my-kb-data --prefix "research/2026-02-15/"

# List files in a bucket
python3 gradient_spaces.py --list --bucket my-kb-data

# List files with a prefix filter
python3 gradient_spaces.py --list --bucket my-kb-data --prefix "research/"

# Delete a file
python3 gradient_spaces.py --delete "research/old_report.md" --bucket my-kb-data
```

📖 *[DO Spaces docs](https://docs.digitalocean.com/products/spaces/)*
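
Because Spaces is S3-compatible, an upload with a key prefix can also be sketched with a generic S3 client. The example below assumes the third-party `boto3` package and the environment variables from the Authentication section; the function names are hypothetical, and this is not necessarily how `gradient_spaces.py` is implemented.

```python
import os


def build_object_key(local_path, prefix=""):
    """Join an optional key prefix with the file's base name."""
    name = os.path.basename(local_path)
    prefix = prefix.strip("/")
    return f"{prefix}/{name}" if prefix else name


def upload_to_spaces(local_path, bucket, prefix=""):
    """Upload one file to a Spaces bucket through the S3-compatible API."""
    # Third-party dependency, imported lazily so build_object_key stays dependency-free.
    import boto3

    client = boto3.client(
        "s3",
        endpoint_url=os.environ["DO_SPACES_ENDPOINT"],
        aws_access_key_id=os.environ["DO_SPACES_ACCESS_KEY"],
        aws_secret_access_key=os.environ["DO_SPACES_SECRET_KEY"],
    )
    key = build_object_key(local_path, prefix)
    client.upload_file(local_path, bucket, key)
    return key
```

Keeping the key construction in its own function makes the "folder structure" behaviour easy to check without touching the network.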

---

### 🏗️ Create and Manage Knowledge Bases

Full CRUD for Knowledge Bases. Create them programmatically instead of clicking through the console like a land-dweller.

```bash
# List all Knowledge Bases
python3 gradient_kb_manage.py --list

# Create a new KB
python3 gradient_kb_manage.py --create --name "My Research KB" --region nyc3

# Show details for a specific KB
python3 gradient_kb_manage.py --show --kb-uuid "your-kb-uuid"

# Delete a KB (⚠️ permanent!)
python3 gradient_kb_manage.py --delete --kb-uuid "your-kb-uuid"
```

📖 *[Create KBs via API](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/create-manage-knowledge-bases/)*
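
For readers who prefer to call the management API directly, listing Knowledge Bases might look like the sketch below. The `/v2/gen-ai/knowledge_bases` path is an assumption based on the public DigitalOcean API layout, so confirm it against the DO API Reference; the function names are hypothetical and are not taken from `gradient_kb_manage.py`.

```python
import json
import urllib.request

# Assumed base path for the Gradient AI section of the DigitalOcean API.
API_BASE = "https://api.digitalocean.com/v2/gen-ai"


def build_list_kbs_request(token):
    """Build (but do not send) a GET request that lists Knowledge Bases."""
    return urllib.request.Request(
        f"{API_BASE}/knowledge_bases",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def list_knowledge_bases(token):
    """Send the request and return the decoded JSON response."""
    request = build_list_kbs_request(token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```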

---

### 📁 Manage Data Sources

Connect your Spaces bucket (or web URLs) to a Knowledge Base. This is what tells the KB "index these documents."

```bash
# Add a DO Spaces data source
python3 gradient_kb_manage.py --add-source \
  --kb-uuid "your-kb-uuid" \
  --bucket my-kb-data \
  --prefix "research/"

# List data sources for a KB
python3 gradient_kb_manage.py --list-sources --kb-uuid "your-kb-uuid"

# Trigger re-indexing (auto-detects the data source)
python3 gradient_kb_manage.py --reindex --kb-uuid "your-kb-uuid"

# Trigger re-indexing for a specific source
python3 gradient_kb_manage.py --reindex --kb-uuid "your-kb-uuid" --source-uuid "ds-uuid"
```

> **🦞 Pro tip: Auto-indexing.** If your KB has auto-indexing enabled, you can skip manual re-index triggers. The KB will detect changes in your Spaces bucket automatically. Configure it in the [DigitalOcean Console](https://cloud.digitalocean.com) → Knowledge Base → Settings.
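
When auto-indexing is off, a manual re-index boils down to starting an indexing job for the KB. The sketch below shows one way such a request could be assembled. The endpoint path and the `knowledge_base_uuid` / `data_source_uuids` field names are assumptions to verify against the DO API Reference, and `build_reindex_request` is a hypothetical helper, not the skill's own code.

```python
import json
import urllib.request

# Assumed endpoint; check the DO API Reference before relying on it.
INDEXING_JOBS_URL = "https://api.digitalocean.com/v2/gen-ai/indexing_jobs"


def build_reindex_request(token, kb_uuid, source_uuids=None):
    """Build (but do not send) a POST request that starts an indexing job."""
    payload = {"knowledge_base_uuid": kb_uuid}
    if source_uuids:
        # Restrict the job to specific data sources when they are given.
        payload["data_source_uuids"] = list(source_uuids)
    return urllib.request.Request(
        INDEXING_JOBS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` to actually start the job.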

---

### 🔍 Query the Knowledge Base

Search your indexed documents with semantic or hybrid queries. This is where the magic happens — your documents become answers.

```bash
# Basic query
python3 gradient_kb_query.py --query "What happened with the Q4 earnings?"

# Control number of results
python3 gradient_kb_query.py --query "Revenue trends" --num-results 20

# Tune hybrid search balance (see below); single quotes keep the shell from expanding $CAKE
python3 gradient_kb_query.py --query '$CAKE price movement' --alpha 0.5

# JSON output (for piping to other tools)
python3 gradient_kb_query.py --query "SEC filings summary" --json
```

**Direct API call:**
```bash
curl -s "https://kbaas.do-ai.run/v1/$GRADIENT_KB_UUID/retrieve" \
  -H "Authorization: Bearer $DO_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What happened with Q4 earnings?",
    "num_results": 10,
    "alpha": 0.5
  }'
```

📖 *[KB retrieval API](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/create-manage-knowledge-bases/#query-a-knowledge-base)*
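
The same retrieval call can be made from Python with only the standard library. This sketch mirrors the curl example above (URL, headers, and the `query`, `num_results`, and `alpha` fields); the function names are hypothetical, and the shape of the response is not described in this document, so it is returned as decoded JSON without further interpretation.

```python
import json
import urllib.request


def build_retrieve_request(kb_uuid, token, query, num_results=10, alpha=0.5):
    """Build (but do not send) the POST request for a KB retrieval query."""
    body = {"query": query, "num_results": num_results, "alpha": alpha}
    return urllib.request.Request(
        f"https://kbaas.do-ai.run/v1/{kb_uuid}/retrieve",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def retrieve(kb_uuid, token, query, **options):
    """Send the retrieval query and return the decoded JSON response."""
    request = build_retrieve_request(kb_uuid, token, query, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A typical call would read the UUID and token from `GRADIENT_KB_UUID` and `DO_API_TOKEN` via `os.environ`.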

---

### 🎛️ The `alpha` Parameter — Hybrid Search Tuning

This is the secret sauce. The `alpha` parameter controls the balance between **lexical** (keyword) and **semantic** (meaning) search:

| Alpha | Behavior | Best For |
|-------|----------|----------|
| `0.0` | Pure lexical (keyword matching) | Exact terms: ticker symbols, filing numbers, dates |
| `0.5` | Balanced hybrid | General research queries |
| `1.0` | Pure semantic (meaning-based) | Open-ended: "what happened with...", "summarize..." |

> **🦞 Rule of claw:** Start at `0.5`. Go lower when searching for specific things (`$CAKE`, `10-K`, `2026-02-15`). Go higher when exploring ideas ("What's the market sentiment?").
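
One way to build intuition for `alpha` is to treat it as a linear blend weight between a lexical score and a semantic score. The sketch below is purely illustrative: the document does not say how the service combines scores, so the formula, the function names, and the sample scores are all assumptions chosen to match the table (0.0 is pure lexical, 1.0 is pure semantic).

```python
def blend_scores(lexical, semantic, alpha):
    """Weighted mix of two scores: alpha=0.0 keeps only lexical, alpha=1.0 only semantic."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return (1.0 - alpha) * lexical + alpha * semantic


def rank(candidates, alpha):
    """Sort (doc_id, lexical_score, semantic_score) tuples by blended score, best first."""
    return sorted(
        candidates,
        key=lambda c: blend_scores(c[1], c[2], alpha),
        reverse=True,
    )


# Made-up scores: one document matches the exact keyword, the other matches the meaning.
candidates = [
    ("exact-ticker-match", 0.9, 0.2),
    ("thematic-overview", 0.1, 0.8),
]
lexical_first = rank(candidates, alpha=0.0)[0][0]   # "exact-ticker-match"
semantic_first = rank(candidates, alpha=1.0)[0][0]  # "thematic-overview"
```

With these sample numbers, lowering `alpha` surfaces the exact keyword match and raising it surfaces the thematically related document, which is the behaviour the table describes.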

---

### 🧠 RAG-Enhanced Queries

The full pipeline: query the KB → build a context prompt → call an LLM to synthesize. One command, complete answers with citations.

```bash
python3 gradient_kb_query.py \
  --query 'Summarize all research on $CAKE' \
  --rag \
  --model "openai-gpt-oss-120b"
```

This automatically:
1. 🔍 Queries the K