orchata-rag

Author (GitHub): Orchata AI
Knowledge management and RAG platform with tree-based document indexing. Use this skill to search, browse, and manage Orchata knowledge bases via MCP tools.
Installation / download

TotalClaw CLI (recommended)
totalclaw install github:LeoYeAI~openclaw-master-skills~orchata
cURL (direct download, no login required)
curl -fsSL https://skills.taituai.com/api/skills/github%3ALeoYeAI~openclaw-master-skills~orchata/file -o orchata.md
# Orchata Skills

This document describes how to effectively use Orchata, a RAG (Retrieval-Augmented Generation) platform with tree-based document indexing. Load this into your context to interact with Orchata knowledge bases.

## What is Orchata?

Orchata is a knowledge management platform that:

- **Organizes documents into Spaces** - Logical containers for related content
- **Uses tree-based indexing** - Documents are parsed into hierarchical structures with sections, summaries, and page ranges
- **Provides semantic search** - Find relevant content using natural language queries
- **Exposes MCP tools** - AI assistants can directly manage and query knowledge bases

## Core Concepts

### Spaces

A **Space** is a container for related documents. Think of it as a folder with semantic search capabilities.

- Each space has a `name`, `description`, and optional `icon`
- Descriptions are used by `smart_query` to recommend relevant spaces
- Spaces can be archived (soft-deleted)

### Documents

A **Document** is content within a space. Supported formats include:

- PDF (text-based and scanned with OCR)
- Word documents (.docx)
- Excel spreadsheets (.xlsx)
- PowerPoint presentations (.pptx)
- Markdown files (.md)
- Plain text files (.txt)
- Images (PNG, JPG, etc.)

**Document Status:**

| Status | Description |
| ------ | ----------- |
| `PENDING` | Uploaded, waiting for processing |
| `PROCESSING` | Being parsed and indexed |
| `COMPLETED` | Ready for queries |
| `FAILED` | Processing error occurred |

**Important:** Only documents with `status: "COMPLETED"` are searchable. Documents in any other status return no results.
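
As a minimal sketch of the status rule, a client could filter a document listing down to the queryable entries before searching. The record shape used here (a dict with `filename` and `status` keys) is an assumption for illustration, not Orchata's documented response format.

```python
def queryable_documents(documents):
    """Return only the documents whose status is COMPLETED (case-insensitive)."""
    return [
        doc for doc in documents
        if str(doc.get("status", "")).upper() == "COMPLETED"
    ]


# Hypothetical listing; the field names are assumed for illustration.
docs = [
    {"filename": "guide.md", "status": "COMPLETED"},
    {"filename": "scan.pdf", "status": "PROCESSING"},
    {"filename": "broken.docx", "status": "FAILED"},
]

ready = queryable_documents(docs)
print([doc["filename"] for doc in ready])
```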

### Document Trees

Documents are indexed into **hierarchical tree structures**:

- Each tree has nodes representing sections/chapters
- Nodes contain: `title`, `summary`, `startPage`, `endPage`, `textContent`
- Trees enable precise navigation of large documents
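
To make the tree idea concrete, the sketch below models a node with the fields listed above and walks the tree to find the most specific section covering a given page. The `children` field and the `find_deepest_section` helper are assumptions for illustration; the source does not say how nesting is represented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    # Field names follow the node attributes listed above.
    title: str
    summary: str
    startPage: int
    endPage: int
    textContent: str = ""
    # Assumption: the nesting field is not named in the source.
    children: List["TreeNode"] = field(default_factory=list)


def find_deepest_section(node: TreeNode, page: int) -> Optional[TreeNode]:
    """Return the most specific node whose page range contains `page`."""
    if not (node.startPage <= page <= node.endPage):
        return None
    for child in node.children:
        hit = find_deepest_section(child, page)
        if hit is not None:
            return hit
    return node


root = TreeNode("Manual", "Whole document", 1, 40, children=[
    TreeNode("Setup", "Installation steps", 1, 10),
    TreeNode("API", "Endpoint reference", 11, 40, children=[
        TreeNode("Auth", "API keys and tokens", 11, 15),
    ]),
])

section = find_deepest_section(root, 12)
print(section.title if section else "not found")
```

Descending into children first is what lets a lookup land on a narrow subsection instead of the whole chapter.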

### Queries

Two types of queries are available:

1. **`query_spaces`** - Search document content using tree-based reasoning
2. **`smart_query`** - Discover which spaces are relevant for a query

---

## MCP Tools Reference

### Space Management

#### list_spaces

List all knowledge spaces in the organization.

```text
list_spaces
list_spaces with status="active"
list_spaces with page=1 pageSize=20
```

**Parameters:**

- `page` (number, optional): Page number (default: 1)
- `pageSize` (number, optional): Items per page (default: 10)
- `status` (string, optional): Filter by `active`, `archived`, or `all`
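
Because results are paginated, collecting every space means requesting pages until one comes back short. The sketch below shows that loop against an injected `fetch_page` callable. Both the callable and the in-memory stand-in are hypothetical; a real client would call `list_spaces` there, and the response shape may differ.

```python
from typing import Callable, Dict, Iterator, List


def iter_all_spaces(
    fetch_page: Callable[[int, int], List[Dict]],
    page_size: int = 10,
) -> Iterator[Dict]:
    """Yield every space by requesting pages until a short page is returned."""
    page = 1
    while True:
        items = fetch_page(page, page_size)
        yield from items
        if len(items) < page_size:
            break
        page += 1


# Stand-in for a real list_spaces call, backed by an in-memory list.
_FAKE_SPACES = [{"id": f"space_{i}"} for i in range(23)]


def fake_fetch(page: int, page_size: int) -> List[Dict]:
    start = (page - 1) * page_size
    return _FAKE_SPACES[start:start + page_size]


all_spaces = list(iter_all_spaces(fake_fetch, page_size=10))
print(len(all_spaces))
```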

---

#### manage_space

Create, get, update, or delete a space.

```text
manage_space with action="create" name="Product Docs" description="Technical documentation"
manage_space with action="create" name="Legal" description="Case files" icon="briefcase"
manage_space with action="get" id="space_abc123"
manage_space with action="update" id="space_abc123" description="Updated description"
manage_space with action="delete" id="space_abc123"
```

**Parameters:**

- `action` (string, required): `create`, `get`, `update`, or `delete`
- `id` (string): Space ID (required for get/update/delete)
- `name` (string): Space name (required for create)
- `description` (string, optional): Space description
- `icon` (string, optional): Icon name. Defaults to "folder"
- `slug` (string, optional): URL-friendly identifier
- `isArchived` (boolean, optional): Archive status (for update)

**Valid Icons:**
`folder`, `book`, `file-text`, `database`, `package`, `archive`, `briefcase`, `inbox`, `layers`, `box`

If an invalid icon is provided, the tool returns an error with the list of valid options.
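
The parameter rules above can also be checked client-side before the tool is called. This is a minimal sketch, assuming the arguments are passed as a flat dict; `build_manage_space_args` is a hypothetical helper, not part of Orchata.

```python
VALID_ACTIONS = {"create", "get", "update", "delete"}
VALID_ICONS = {
    "folder", "book", "file-text", "database", "package",
    "archive", "briefcase", "inbox", "layers", "box",
}


def build_manage_space_args(action, **fields):
    """Validate and return the argument dict for a manage_space call."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if action == "create" and not fields.get("name"):
        raise ValueError("name is required for create")
    if action in {"get", "update", "delete"} and not fields.get("id"):
        raise ValueError(f"id is required for {action}")
    icon = fields.get("icon")
    if icon is not None and icon not in VALID_ICONS:
        raise ValueError(f"invalid icon {icon!r}; valid: {sorted(VALID_ICONS)}")
    return {"action": action, **fields}


args = build_manage_space_args(
    "create", name="Legal", description="Case files", icon="briefcase"
)
print(args)
```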

---

### Document Management

#### list_documents

List documents in a space.

```text
list_documents with spaceId="space_abc123"
list_documents with spaceId="space_abc123" status="completed"
list_documents with spaceId="space_abc123" status="all"
```

**Parameters:**

- `spaceId` (string, required): Space ID
- `page` (number, optional): Page number
- `pageSize` (number, optional): Items per page (max: 100)
- `status` (string, optional): Filter by status. Values: `pending`, `processing`, `completed`, `failed`, or `all`. Omitting it returns documents in every status.

**Note:** Status values are case-insensitive (`completed` and `COMPLETED` both work).

---

#### save_document

Upload or upsert documents (single or batch).

**Single document:**

```text
save_document with spaceId="space_abc123" filename="guide.md" content="# Guide\n\nContent here..."
```

**Batch upload:**

```text
save_document with spaceId="space_abc123" documents=[{"filename": "doc1.md", "content": "..."}, {"filename": "doc2.md", "content": "..."}]
```

**Parameters:**

- `spaceId` (string, required): Space ID
- `filename` (string): Filename (required for single)
- `content` (string): Content (required for single)
- `documents` (array, optional): Array of `{filename, content, metadata}` for batch
- `metadata` (object, optional): Custom key-value pairs
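
The single and batch forms take different arguments, so a small builder helps keep them apart. The sketch below is a hypothetical helper. Rejecting a call that mixes both forms is a choice made here, not documented Orchata behavior.

```python
def build_save_document_args(space_id, filename=None, content=None,
                             documents=None, metadata=None):
    """Return save_document arguments for either a single or a batch upload."""
    if documents is not None:
        if filename is not None or content is not None:
            # Assumption: mixing the two forms is treated as a caller error.
            raise ValueError("pass either filename/content or documents, not both")
        for doc in documents:
            if "filename" not in doc or "content" not in doc:
                raise ValueError("each batch entry needs filename and content")
        return {"spaceId": space_id, "documents": list(documents)}

    if filename is None or content is None:
        raise ValueError("filename and content are required for a single upload")
    args = {"spaceId": space_id, "filename": filename, "content": content}
    if metadata is not None:
        args["metadata"] = metadata
    return args


single = build_save_document_args(
    "space_abc123", filename="guide.md", content="# Guide\n\nContent here..."
)
batch = build_save_document_args("space_abc123", documents=[
    {"filename": "doc1.md", "content": "..."},
    {"filename": "doc2.md", "content": "...", "metadata": {"team": "docs"}},
])
print(single)
print(len(batch["documents"]))
```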

---

#### get_document

Get document content by ID or filename. Returns processed markdown text.

```text
get_document with spaceId="space_abc123" id="doc_xyz789"
get_document with spaceId="space_abc123" filename="guide.md"
get_document with spaceId="*" filename="guide.md"
```

**Parameters:**

- `spaceId` (string, required): Space ID, or `*` to search all spaces (requires filename)
- `id` (string, optional): Document ID
- `filename` (string, optional): Filename

**Notes:**

- Either `id` or `filename` is required
- Use `spaceId="*"` to search all spaces when you know the filename but not the space
- For completed documents, returns the extracted markdown text (not raw PDF binary)
- When using `*`, the response includes the `spaceId` where the document was found
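
The lookup rules in these notes can be encoded as a short pre-flight check. This is a sketch with a hypothetical helper name. It only builds the argument dict and does not contact Orchata.

```python
def build_get_document_args(space_id, doc_id=None, filename=None):
    """Validate the id/filename/wildcard rules and return get_document arguments."""
    if doc_id is None and filename is None:
        raise ValueError("either id or filename is required")
    if space_id == "*" and filename is None:
        raise ValueError('spaceId="*" requires a filename')
    args = {"spaceId": space_id}
    if doc_id is not None:
        args["id"] = doc_id
    if filename is not None:
        args["filename"] = filename
    return args


by_id = build_get_document_args("space_abc123", doc_id="doc_xyz789")
anywhere = build_get_document_args("*", filename="guide.md")
print(by_id)
print(anywhere)
```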

---

#### update_document

Update document content or metadata.

```text
update_document with spaceId="space_abc123" id="doc_xyz789" content="New content..."
update_document with spaceId="space_abc123" id="doc_xyz789" append=true content="Additional content"
```

**Parameters:**

- `spaceId` (string, required): Space ID
- `id` (string, required): Document ID
- `content` (string, optional): New content
- `metadata` (object, optional): New metadata
- `append` (boolean, optional): Append instead of replace
- `separator` (string, optional): Separator for append mode

---

#### delete_document

Permanently delete a document.

```text
delete_document with spaceId="space_abc123" id="doc_xyz789"
```

**Parameters:**

- `spaceId` (string, required): Space ID
- `id` (string, required): Document ID

---

### Query Tools

#### query_spaces

Search documents across one or more spaces using tree-based reasoning.

```text
query_spaces with query="How do I authenticate API requests?"
query_spaces with query="installation guide" spaceIds="space_abc123"
query_spaces with query="error handling" spaceIds=["space_abc", "space_def"] topK=10
```

**Parameters:**

- `query` (string, required): Natural language search query
- `spaceIds` (string or array, optional): Space ID(s) to search. Omit or use `*` for all spaces
- `topK` (number, optional): Maximum results (default: 10)
- `compact` (boolean, optional): Use compact format (default: false). See **When to Use Compact** below.

**When to Use Compact:**

| Mode | When to use | What you get |
| ---- | ----------- | ------------ |
| `compact=false` (default) | **Most queries.** Any time you need actual data, facts, numbers, dates, or details from documents. | Full results with document metadata, tree context, page ranges, and complete content. |
| `compact=true` | Broad discovery queries where you only need to know *which* documents are relevant, not their content. | Minimal results: just content snippet, source filename, and score. |

**Rule of thumb:** Default to `compact=false`. Only use `compact=true` when you're browsing/surveying and don't need the actual content yet.
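
One way to apply this rule is a two-pass pattern: a compact survey first, then a full query once the relevant spaces are known. The sketch below builds the corresponding requests in the JSON-RPC `tools/call` shape used by MCP. The helper name is hypothetical, and how the request is sent depends on your MCP client.

```python
import json


def build_query_spaces_call(query, space_ids=None, top_k=None,
                            compact=False, request_id=1):
    """Build an MCP tools/call request for query_spaces as a plain dict."""
    arguments = {"query": query, "compact": compact}
    if space_ids is not None:
        arguments["spaceIds"] = space_ids
    if top_k is not None:
        arguments["topK"] = top_k
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "query_spaces", "arguments": arguments},
    }


# Survey pass: find out which documents mention the topic at all.
survey = build_query_spaces_call("error handling", compact=True)

# Detail pass: full content from the spaces that looked relevant.
detail = build_query_spaces_call(
    "error handling",
    space_ids=["space_abc", "space_def"],
    top_k=10,
    request_id=2,
)
print(json.dumps(detail, indent=2))
```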

**Response (compact=true format):**

```json
{
  "results": [
    {
      "content": "Relevant text content...",
      "source": "filename.pdf",
      "score": 0.95
    }