local-first-llm

TotalClaw | Author: totalclaw

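Routes LLM requests to a local model (Ollama, LM Studio, llamafile) before falling back to a cloud API. Tracks token savings and avoided cost in a persistent dashboard. Use when: (1) the user asks to run a task on a local model first, (2) the user wants to reduce cloud API costs or keep requests private, (3) the user asks to see their token savings or the LLM routing dashboard, (4) any request where local-versus-cloud routing should be decided automatically. Supports Ollama, LM Studio, and llamafile as local providers.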

Installation / download

TotalClaw CLI (recommended):

`totalclaw install totalclaw:totalclaw~joelnishanth-local-first-llm`

cURL (direct download, no login required):

`curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~joelnishanth-local-first-llm/file -o joelnishanth-local-first-llm.md`

# Local-First LLM

Route requests to a local LLM first; fall back to cloud only when necessary. Track every decision to show real token and cost savings.

## Quick Start

### 1. Check if a local LLM is running

```bash
python3 skills/local-first-llm/scripts/check_local.py
```

Returns JSON: `{ "any_available": true, "best": { "provider": "ollama", "models": [...] } }`
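
A minimal sketch of consuming that JSON from Python. It relies only on the two keys shown above (`any_available` and `best.provider`). The helper name `pick_local_provider`, the sample model name, and the handling of a missing or null `best` are assumptions for illustration, not part of the skill.

```python
import json
from typing import Optional


def pick_local_provider(check_output: str) -> Optional[str]:
    """Return the best local provider name, or None if nothing is running.

    Hypothetical helper: it only reads the keys documented for check_local.py.
    """
    result = json.loads(check_output)
    if not result.get("any_available"):
        return None
    # Assumption: "best" may be absent or null when no provider is available.
    best = result.get("best") or {}
    return best.get("provider")


# Illustrative payload in the documented shape (the model list is made up).
sample = '{"any_available": true, "best": {"provider": "ollama", "models": ["llama3.2"]}}'
print(pick_local_provider(sample))  # ollama
```

A `None` result is the signal to skip the local flags in the routing step.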

### 2. Route a request

```bash
python3 skills/local-first-llm/scripts/route_request.py \
  --prompt "Summarize this meeting transcript" \
  --tokens 800 \
  --local-available \
  --local-provider ollama
```

Returns: `{ "decision": "local", "reason": "...", "complexity_score": -1 }`

### 3. Log the outcome

After executing the request, record it:

```bash
python3 skills/local-first-llm/scripts/track_savings.py log \
  --tokens 800 \
  --model gpt-4o \
  --routed-to local
```

### 4. Show the dashboard

```bash
python3 skills/local-first-llm/scripts/dashboard.py
```

---

## Full Routing Workflow

```
┌─────────────────────────────────────────────────────┐
│  1. check_local.py  →  is a local provider running? │
│                                                      │
│  2. route_request.py  →  local or cloud?             │
│     - sensitivity check  (private data → local)      │
│     - complexity score   (high score → cloud)        │
│     - availability gate  (no local → cloud)          │
│                                                      │
│  3. Execute with the chosen provider                 │
│                                                      │
│  4. track_savings.py log  →  record the outcome      │
│                                                      │
│  5. dashboard.py  →  show cumulative savings         │
└─────────────────────────────────────────────────────┘
```
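
The workflow above can be scripted by assembling the documented command lines. The sketch below only builds argument lists and does not run anything. The function names are hypothetical, and treating a missing `--local-available` flag as "no local provider" is an assumption about how `route_request.py` reads its arguments.

```python
import shlex
from typing import List, Optional

SCRIPTS = "skills/local-first-llm/scripts"


def route_command(prompt: str, tokens: int, local_provider: Optional[str]) -> List[str]:
    """Build the route_request.py invocation shown in the Quick Start."""
    argv = [
        "python3", f"{SCRIPTS}/route_request.py",
        "--prompt", prompt,
        "--tokens", str(tokens),
    ]
    # Assumption: the local flags are simply omitted when no provider is running.
    if local_provider is not None:
        argv += ["--local-available", "--local-provider", local_provider]
    return argv


def log_command(tokens: int, model: str, routed_to: str) -> List[str]:
    """Build the track_savings.py log invocation shown in the Quick Start."""
    return [
        "python3", f"{SCRIPTS}/track_savings.py", "log",
        "--tokens", str(tokens),
        "--model", model,
        "--routed-to", routed_to,
    ]


# Print a shell-safe version of the routing command for inspection.
print(shlex.join(route_command("Summarize this meeting transcript", 800, "ollama")))
```

Each list can be handed to `subprocess.run(argv, capture_output=True, text=True, check=True)`, after which the script's JSON output is available on `stdout`.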

---

## Routing Rules (Summary)

| Condition                                                                     | Route    |
| ----------------------------------------------------------------------------- | -------- |
| No local provider available                                                   | ☁️ Cloud |
| Prompt contains sensitive data (`password`, `secret`, `api key`, `ssn`, etc.) | 🏠 Local |
| Complexity score ≥ 3                                                          | ☁️ Cloud |
| Complexity score < 3                                                          | 🏠 Local |

For full scoring details, see [references/routing-logic.md](references/routing-logic.md).
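
As a rough illustration, the table can be read as a first-match-wins rule list. This sketch assumes the rows are evaluated top to bottom, takes the complexity score as an input instead of computing it, and uses only the four keywords named above. The real scoring and keyword list live in `references/routing-logic.md` and may differ.

```python
# Only the keywords named in the table; the source list continues with "etc."
SENSITIVE_KEYWORDS = ("password", "secret", "api key", "ssn")
CLOUD_THRESHOLD = 3  # a complexity score at or above this goes to cloud


def decide(prompt: str, complexity_score: int, local_available: bool) -> str:
    """Apply the summary rules in table order (an assumed precedence)."""
    if not local_available:
        return "cloud"
    lowered = prompt.lower()
    # Naive substring match, for illustration only.
    if any(keyword in lowered for keyword in SENSITIVE_KEYWORDS):
        return "local"
    return "cloud" if complexity_score >= CLOUD_THRESHOLD else "local"


print(decide("Summarize this meeting transcript", -1, True))  # local
```

Plain substring matching is deliberately simple here and will produce false positives (for example, `ssn` also matches inside `classname`), so a production check would likely need word boundaries or a stricter pattern.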

---

## Executing with a Local Provider

Once `route_request.py` returns `"decision": "local"`, send the prompt to the local provider's HTTP API:

### Ollama

```bash
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "YOUR_PROMPT", "stream": false}'
```

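### LM Studio / llamafile (OpenAI-compatible)

Port 1234 below is LM Studio's default; llamafile's built-in server listens on port 8080 by default, so adjust the URL accordingly.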

```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "YOUR_PROMPT"}]}'
```
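
The same two calls can be made from Python with the standard library. This is a sketch under assumptions: the function names are hypothetical, the URLs and payloads are copied from the curl examples above, and any provider other than `"ollama"` is treated as OpenAI-compatible. `send` needs a running local server, so only the pure helpers are exercised here.

```python
import json
import urllib.request
from typing import Any, Dict, Tuple


def build_request(provider: str, prompt: str, model: str) -> Tuple[str, Dict[str, Any]]:
    """Return (url, payload) mirroring the curl examples above."""
    if provider == "ollama":
        return ("http://localhost:11434/api/generate",
                {"model": model, "prompt": prompt, "stream": False})
    # Assumption: anything else speaks the OpenAI-compatible chat API.
    return ("http://localhost:1234/v1/chat/completions",
            {"model": model, "messages": [{"role": "user", "content": prompt}]})


def extract_text(provider: str, body: Dict[str, Any]) -> str:
    """Pull the generated text out of a decoded JSON response."""
    if provider == "ollama":
        return body["response"]  # non-streaming /api/generate reply
    return body["choices"][0]["message"]["content"]


def send(provider: str, prompt: str, model: str, timeout: float = 60.0) -> str:
    """POST the prompt to the local server and return the reply text."""
    url, payload = build_request(provider, prompt, model)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_text(provider, json.load(response))
```

Keeping payload construction separate from the network call makes the request shape easy to check without a model loaded.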

---

## Dashboard

The dashboard reads from `~/.openclaw/local-first-llm/savings.json` (auto-created).

```
┌─────────────────────────────────────────┐
│    🧠  Local-First LLM — Dashboard      │
├─────────────────────────────────────────┤
│  Local LLM:  ✅  ollama (llama3.2...)   │
├─────────────────────────────────────────┤
│  Total requests:         42             │
│  Routed locally:         31  (73.8%)    │
│  Routed to cloud:        11             │
├─────────────────────────────────────────┤
│  Tokens saved:       84,200             │
│  Cost saved:           $0.4210          │
└─────────────────────────────────────────┘
```
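
The sample figures are internally consistent, which the short sketch below reproduces. The helper names are hypothetical, and the rate of $5 per million tokens is inferred from the sample numbers (0.4210 / 84,200), not taken from the skill's cost table in `references/token-estimation.md`.

```python
def local_share(local_requests: int, total_requests: int) -> float:
    """Percentage of requests routed locally, rounded to one decimal place."""
    if total_requests == 0:
        return 0.0
    return round(100.0 * local_requests / total_requests, 1)


def cost_saved(tokens_saved: int, usd_per_million_tokens: float) -> float:
    """Cloud cost avoided for the given tokens at a flat per-token rate."""
    return round(tokens_saved * usd_per_million_tokens / 1_000_000, 4)


print(local_share(31, 42))      # 73.8
print(cost_saved(84_200, 5.0))  # 0.421
```

A flat rate is a simplification; cloud pricing usually differs between input and output tokens and between models.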

Reset savings data:

```bash
python3 skills/local-first-llm/scripts/track_savings.py reset
```

---

## Additional References

- **Routing scoring details**: [references/routing-logic.md](references/routing-logic.md)
- **Local provider setup** (Ollama, LM Studio, llamafile): [references/local-providers.md](references/local-providers.md)
- **Token estimation & cloud cost table**: [references/token-estimation.md](references/token-estimation.md)
