venice-router

TotalClaw 作者 totalclaw v1.5.0

Venice.ai 的最高型号路由器——隐私第一、未经审查的人工智能平台。自动对查询复杂性进行分类并路由到最便宜的适当模型。支持网络搜索、未经审查的模式、仅限私人模式（零数据保留）、对话感知路由、成本预算、函数调用、思考/推理模式和 35+ Venice.ai 文本模型。当用户想要通过 Venice.ai 聊天、通过 Venice 发送提示或需要智能模型选择以最大限度地降低 API 成本同时保持数据对大型科技公司保密时使用。

安装 / 下载方式

TotalClaw CLI推荐

totalclaw install totalclaw:totalclaw~plusone-venice-router

cURL直接下载，无需登录

curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~plusone-venice-router/file -o plusone-venice-router.md

## 概述（中文）

Venice.ai 的最高型号路由器——隐私第一、未经审查的人工智能平台。自动对查询复杂性进行分类并路由到最便宜的适当模型。支持网络搜索、未经审查的模式、仅限私人模式（零数据保留）、对话感知路由、成本预算、函数调用、思考/推理模式和 35+ Venice.ai 文本模型。当用户想要通过 Venice.ai 聊天、通过 Venice 发送提示或需要智能模型选择以最大限度地降低 API 成本同时保持数据对大型科技公司保密时使用。

## 原文

# Venice.ai Supreme Router

Smart, cost-optimized model routing for [Venice.ai](https://venice.ai) — the AI platform for people who don't want Big Tech watching over their shoulder.

Unlike OpenAI, Anthropic, and Google — where every prompt is logged, analyzed, and potentially used to train future models — Venice offers **true privacy** with zero data retention on private models. Your conversations stay yours. Venice is also **uncensored**: no content filters, no refusals, no "I can't help with that."

## Setup

1. Get a Venice.ai API key from [venice.ai/settings/api](https://venice.ai/settings/api)
2. Set the environment variable:

```bash
export VENICE_API_KEY="your-key-here"
```

Or configure in `~/.openclaw/openclaw.json`:

```json
{
  "skills": {
    "entries": {
      "venice-router": {
        "enabled": true,
        "apiKey": "YOUR_VENICE_API_KEY"
      }
    }
  }
}
```

## Usage

### Route a prompt (auto-selects model)

```bash
python3 {baseDir}/scripts/venice-router.py --prompt "What is 2+2?"
```

### Force a specific tier

```bash
python3 {baseDir}/scripts/venice-router.py --tier cheap --prompt "Tell me a joke"
python3 {baseDir}/scripts/venice-router.py --tier budget-medium --prompt "Write a Python function"
python3 {baseDir}/scripts/venice-router.py --tier mid --prompt "Explain quantum computing"
python3 {baseDir}/scripts/venice-router.py --tier premium --prompt "Write a distributed systems architecture"
```

### Stream output

```bash
python3 {baseDir}/scripts/venice-router.py --stream --prompt "Write a poem about lobsters"
```

### Web search (LLM searches the web and cites sources)

```bash
python3 {baseDir}/scripts/venice-router.py --web-search --prompt "Latest news on AI regulation"
```

### Uncensored mode (prefer models with no content filters)

```bash
python3 {baseDir}/scripts/venice-router.py --uncensored --prompt "Write edgy creative fiction"
```

### Private-only mode (zero data retention, no Big Tech proxying)

```bash
python3 {baseDir}/scripts/venice-router.py --private-only --prompt "Analyze this confidential contract"
```

### Conversation-aware routing (multi-turn context)

```bash
# Save conversation history as JSON, then route follow-ups with context
python3 {baseDir}/scripts/venice-router.py --conversation history.json --prompt "Can you add tests too?"
```

The router analyzes conversation history to keep context: trivial follow-ups ("thanks") go cheap, while follow-ups in complex code discussions stay at the right tier.

### Function calling (tool use)

```bash
# Define tools in a JSON file (OpenAI tools format)
python3 {baseDir}/scripts/venice-router.py --tools tools.json --prompt "What's the weather in NYC?"
python3 {baseDir}/scripts/venice-router.py --tools tools.json --tool-choice auto --prompt "Search for latest AI news"
```

Tool definitions use the standard OpenAI format. The router auto-bumps to `mid` tier minimum for function calling since it requires capable models.

### Cost budget tracking

```bash
# Show current spending
python3 {baseDir}/scripts/venice-router.py --budget-status

# Track per-session costs
python3 {baseDir}/scripts/venice-router.py --session-id my-project --prompt "help me code"
```

Set `VENICE_DAILY_BUDGET` and/or `VENICE_SESSION_BUDGET` to enforce spending limits. The router auto-downgrades tiers as you approach budget limits.

### Classify only (no API call)

```bash
python3 {baseDir}/scripts/venice-router.py --classify "Explain the Riemann hypothesis"
```

### List available models and tiers

```bash
python3 {baseDir}/scripts/venice-router.py --list-models
```

### Override model directly

```bash
python3 {baseDir}/scripts/venice-router.py --model deepseek-v3.2 --prompt "Hello"
```

## Tiers

| Tier | Models | Cost (input/output per 1M tokens) | Best For |
|------|--------|-----------------------------------|----------|
| **cheap** | Venice Small (qwen3-4b), GLM 4.7 Flash, GPT OSS 120B, Llama 3.2 3B | $0.05–$0.15 / $0.15–$0.60 | Simple Q&A, greetings, math, lookups |
| **budget** | Qwen 3 235B, Venice Uncensored, GLM 4.7 Flash Heretic | $0.14–$0.20 / $0.75–$0.90 | Moderate questions, summaries, translations |
| **budget-medium** | Grok Code Fast, DeepSeek V3.2, MiniMax M2.1 | $0.25–$0.40 / $1.00–$1.87 | Moderate-to-complex tasks, code snippets, structured output |
| **mid** | DeepSeek V3.2, MiniMax M2.1/M2.5, Qwen3 Thinking 235B, Venice Medium, Llama 3.3 70B | $0.25–$0.70 / $1.00–$3.50 | Code generation, analysis, longer writing, reasoning |
| **high** | GLM 5, Kimi K2 Thinking, Kimi K2.5, Grok 4.1 Fast, Hermes 3 405B, Gemini 3 Flash | $0.50–$1.10 / $1.25–$3.75 | Complex reasoning, multi-step tasks, code review |
| **premium** | GPT-5.2, GPT-5.2 Codex, Gemini 3 Pro, Gemini 3.1 Pro (1M ctx), Claude Opus/Sonnet 4.5/4.6 | $2.19–$6.00 / $15.00–$30.00 | Expert-level analysis, architecture, research papers |

## Routing Strategy

The router classifies each prompt using keyword + heuristic analysis:

1. **Length** — longer prompts suggest more complex tasks
2. **Keywords** — domain-specific terms (e.g., "architecture", "optimize", "prove") signal complexity
3. **Code markers** — presence of code blocks, function names, or technical syntax
4. **Instruction depth** — multi-step instructions, comparisons, or "explain in detail" bump the tier
5. **Conversational simplicity** — greetings, yes/no, small talk stay on the cheapest tier
6. **Conversation history** — when `--conversation` is provided, analyzes full chat context: code in history boosts tier, trivial follow-ups ("thanks") downgrade, tool calls in history signal complexity
7. **Function calling** — `--tools` auto-bumps to at least `mid` tier (capable models required)
8. **Thinking/reasoning mode** — `--thinking` prefers chain-of-thought reasoning models (Qwen3 Thinking, Kimi K2) and bumps to at least `mid` tier
9. **Budget constraints** — progressive tier downgrade as spending approaches daily/session limits (95% → cheap, 80% → budget, 60% → mid, 40% → high)

The classifier errs on the side of cheaper models — it only escalates when there's strong signal for complexity.

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `VENICE_API_KEY` | Venice.ai API key (required) | — |
| `VENICE_DEFAULT_TIER` | Minimum floor tier — auto-classification never goes below this. Valid: `cheap`, `budget`, `budget-medium`, `mid`, `high`, `premium` | `budget` |
| `VENICE_MAX_TIER` | Maximum tier to ever use (cost cap) | `premium` |
| `VENICE_TEMPERATURE` | Default temperature | `0.7` |
| `VENICE_MAX_TOKENS` | Default max tokens | `4096` |
| `VENICE_STREAM` | Enable streaming by default | `false` |
| `VENICE_UNCENSORED` | Always prefer uncensored models | `false` |
| `VENICE_PRIVATE_ONLY` | Only use private models (zero data retention) | `false` |
| `VENICE_WEB_SEARCH` | Enable web search by default ($10/1K calls) | `false` |
| `VENICE_THINKING` | Always prefer thinking/reasoning models | `false` |
| `VENICE_DAILY_BUDGET` | Max daily spend in USD (0 = unlimited) | `0` |
| `VENICE_SESSION_BUDGET` | Max per-session spend in USD (0 = unlimited) | `0` |

## Why Venice.ai?

- **🔒 Private inference** — Models marked "Private" have zero data retention. Your data never trains anyone's model.
- **🔓 Uncensored** — No guardrails blocking legitimate use cases. No refusals, no filters.
- **🔌 OpenAI-compatible** — Same API format, just change the base URL. Drop-in replacement.
- **📦 30+ models** — From tiny efficient models ($0.05/M) to Claude Opus 4.6 and GPT-5.2.
- **🌐 Built-in web search** — LLMs can search the web and cite sources in a single API call.

## Tips

- Use `--classify` to preview which tier a prompt would hit before spending tokens
- Set `VENICE_MAX_TIER=mid` to cap c