production-model-router

GitHub author: LeoYeAI/openclaw-master-skills v2026.3.27

Route each user request to the most cost-effective model or multi-model workflow based on task type, complexity, risk, latency, budget, tool needs, and verification requirements.

Installation / Download

TotalClaw CLI (recommended)

totalclaw install github:LeoYeAI~openclaw-master-skills~model-routing-orchestrator

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/github%3ALeoYeAI~openclaw-master-skills~model-routing-orchestrator/file -o model-routing-orchestrator.md

# Production Model Router

## Overview
Use this skill to decide which model tier, workflow shape, and verification strategy should handle a user's request.

The goal is to maximize cost-effectiveness without sacrificing task fit, correctness, or operational reliability.

This skill does not blindly choose the strongest model. It chooses the cheapest safe path that still meets the quality bar for the task.

It may recommend:
- a single low-cost model
- a single balanced model
- a single premium model
- a tool-assisted model workflow
- a staged multi-model pipeline
- a parallel comparison workflow
- a draft-and-review workflow
- a consensus or verifier workflow

## Primary objective
For every request, choose the minimum-cost execution path that can still satisfy:
- task quality
- correctness requirements
- latency expectations
- safety or risk constraints
- output format needs
- tool and modality requirements

## When to use
Use this skill when you need to decide:
- which model should answer a given user request
- whether a cheap model is enough
- when to escalate to a stronger reasoning model
- when to use one model versus multiple models
- when to use tools instead of relying on pure model reasoning
- how to handle complex calculations, code, multimodal input, long context, or high-risk tasks
- how to balance cost, speed, and answer quality in production

## Do not use
Do not use this skill to:
- answer the original business question directly
- fabricate model capabilities without evidence from the environment or configuration
- assume the most expensive model is always the best choice
- route high-risk exact tasks to a cheap model without verification
- rely on pure language generation for exact arithmetic when tools are available

## Inputs to collect
Collect or infer the following from the request and system context:

### Request characteristics
- task type
- domain
- expected output type
- presence of images, files, tables, code, or long documents
- need for exactness versus approximate usefulness
- whether the request is open-ended or precision-critical

### Execution constraints
- budget sensitivity
- latency sensitivity
- quality expectation
- token or context size pressure
- tool availability
- need for citations or traceability
- need for reproducibility

### Risk profile
- low-risk
- medium-risk
- high-risk

### Failure tolerance
- whether a rough answer is acceptable
- whether the answer must be verified
- whether disagreement between models would be valuable

## Task taxonomy
Classify the request into one or more of these categories:

1. Simple generation
   - rewrite
   - summarization
   - formatting
   - light translation
   - basic brainstorming

2. General reasoning
   - explanation
   - comparison
   - concept mapping
   - normal business analysis

3. Deep reasoning
   - multi-step planning
   - tradeoff analysis
   - architecture design
   - ambiguous decision support
   - chain-dependent reasoning

4. Exact calculation or formal logic
   - arithmetic
   - financial calculations
   - unit conversion
   - spreadsheet-like reasoning
   - symbolic or step-sensitive math
   - combinatorics or logic puzzles where exactness matters

5. Coding and technical execution
   - code generation
   - debugging
   - refactoring
   - test generation
   - query writing
   - infrastructure or API design

6. Long-context synthesis
   - large documents
   - multiple files
   - multi-source comparison
   - transcript or contract review

7. Multi-modal tasks
   - image understanding
   - diagram interpretation
   - PDF with layout-heavy content
   - video- or audio-related tasks, if supported

8. High-risk tasks
   - medical
   - legal
   - financial decisions
   - compliance
   - security-sensitive operations
   - anything where incorrect advice has material consequences
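
One way to represent this taxonomy in code is an enum plus a lookup from subtask labels to categories. This is a sketch under assumptions: `TaskCategory`, `SUBTASK_TO_CATEGORY`, and `classify` are hypothetical names, the mapping is partial, and how labels are extracted from a request is left out.

```python
from enum import Enum


class TaskCategory(Enum):
    SIMPLE_GENERATION = 1
    GENERAL_REASONING = 2
    DEEP_REASONING = 3
    EXACT_CALCULATION = 4
    CODING = 5
    LONG_CONTEXT = 6
    MULTIMODAL = 7
    HIGH_RISK = 8


# Partial, illustrative mapping from subtask labels to categories.
SUBTASK_TO_CATEGORY = {
    "rewrite": TaskCategory.SIMPLE_GENERATION,
    "summarization": TaskCategory.SIMPLE_GENERATION,
    "comparison": TaskCategory.GENERAL_REASONING,
    "architecture design": TaskCategory.DEEP_REASONING,
    "financial calculations": TaskCategory.EXACT_CALCULATION,
    "unit conversion": TaskCategory.EXACT_CALCULATION,
    "debugging": TaskCategory.CODING,
    "contract review": TaskCategory.LONG_CONTEXT,
    "image understanding": TaskCategory.MULTIMODAL,
    "financial decisions": TaskCategory.HIGH_RISK,
    "compliance": TaskCategory.HIGH_RISK,
}


def classify(labels):
    """Return every category matched by the given labels (one or more)."""
    return {SUBTASK_TO_CATEGORY[l] for l in labels if l in SUBTASK_TO_CATEGORY}
```

Returning a set reflects that a request can belong to several categories at once, for example a contract review that also feeds a financial decision.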

## Core routing principle
Always prefer the cheapest path that can safely succeed.

Apply this order of preference:
1. Cheap single-model path
2. Balanced single-model path
3. Premium single-model path
4. Tool-assisted path
5. Staged multi-model path
6. Parallel multi-model comparison
7. Premium plus verifier or consensus workflow

Do not escalate unless the task characteristics justify it.
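
The preference order above amounts to walking an ordered list and stopping at the first path that is safe for the task. A minimal sketch, assuming a caller-supplied `is_safe` predicate; the path labels and the fallback to the last entry are illustrative choices, not defined by the skill.

```python
# Ordered from most to least preferred, mirroring the list above.
PATHS = [
    "cheap_single",
    "balanced_single",
    "premium_single",
    "tool_assisted",
    "staged_multi",
    "parallel_comparison",
    "premium_with_verifier",
]


def choose_path(is_safe):
    """Return the first path in preference order that is_safe accepts."""
    for path in PATHS:
        if is_safe(path):
            return path
    # Assumption: if nothing passes, fall back to the most heavily checked path.
    return PATHS[-1]


# Example: a task for which only premium or tool-assisted paths are safe.
selected = choose_path(lambda p: p in {"premium_single", "tool_assisted"})
```

Keeping the safety check separate from the ordering means escalation only happens when the predicate rejects every cheaper option.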

## Model tiers
Use abstract capability tiers unless the deployment specifies exact providers.

### Economy tier
Use for:
- simple rewriting
- formatting
- low-risk classification
- short summaries
- lightweight extraction
- first-pass triage

Strengths:
- lowest cost
- fast response
- good for straightforward tasks

Weaknesses:
- weaker deep reasoning
- more brittle on ambiguity
- worse on exactness-critical tasks

### Balanced tier
Use for:
- everyday product and engineering work
- standard reasoning
- moderate code tasks
- moderate document analysis
- most business and writing tasks

Strengths:
- solid quality-cost tradeoff
- handles most normal production traffic
- reasonable speed and robustness

Weaknesses:
- may still fail on highly ambiguous or exacting tasks
- not always enough for hard reasoning or high-risk requests

### Premium tier
Use for:
- deep reasoning
- difficult code and architecture problems
- long-context synthesis with subtle dependencies
- high-value outputs
- high-risk tasks requiring stronger judgment

Strengths:
- strongest reasoning
- better ambiguity handling
- better synthesis quality

Weaknesses:
- highest cost
- often slower
- overkill for simple tasks

### Tool-assisted tier
Use when exactness matters more than fluent wording.

Use this path for:
- arithmetic
- deterministic calculations
- spreadsheet operations
- formula application
- structured data transformation
- exact code execution or testing if available
- retrieval-backed factual tasks

Rule:
When a task requires exact numeric correctness, prefer tools plus model orchestration over pure model reasoning.
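
To illustrate why a tool beats pure model reasoning for exact numbers, here is a small sketch of a deterministic arithmetic tool built on Python's standard `ast` and `fractions` modules. The `evaluate` function is a hypothetical helper, supports only `+`, `-`, `*`, `/`, and unary minus, and stands in for whatever calculator or code-execution tool the deployment actually provides.

```python
import ast
import operator
from fractions import Fraction

# Supported binary operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> Fraction:
    """Evaluate a plain arithmetic expression exactly, without eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            # Go through str() so that 0.1 becomes exactly 1/10.
            return Fraction(str(node.value))
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval"))
```

With this kind of tool, `evaluate("0.1 + 0.2")` yields exactly `Fraction(3, 10)`, whereas binary floating point gives `0.30000000000000004`. The model's job shrinks to turning the request into an expression and explaining the result.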

## Decision dimensions
Score the request across these dimensions:

### 1. Complexity
- low
- medium
- high
- very high

### 2. Exactness requirement
- low: an approximate answer is acceptable
- medium: a mostly correct answer is acceptable
- high: exact result expected
- critical: exact result plus verification required

### 3. Risk level
- low
- medium
- high

### 4. Latency priority
- urgent
- normal
- relaxed

### 5. Budget strategy
- minimize cost
- balanced
- quality-first

### 6. Context burden
- short
- moderate
- long
- extreme

### 7. Modality burden
- text only
- image or PDF
- mixed inputs
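
The seven dimensions can be held in a single scorecard so later rules can compare levels directly. This sketch uses `IntEnum` for the ordered scales; the class names and the `needs_verification` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Complexity(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    VERY_HIGH = 3


class Exactness(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


class Risk(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


@dataclass(frozen=True)
class Scorecard:
    complexity: Complexity
    exactness: Exactness
    risk: Risk
    latency: str = "normal"      # urgent | normal | relaxed
    budget: str = "balanced"     # minimize cost | balanced | quality-first
    context: str = "short"       # short | moderate | long | extreme
    modality: str = "text only"  # text only | image or PDF | mixed inputs

    def needs_verification(self) -> bool:
        # "critical" is defined above as exact result plus verification.
        return self.exactness is Exactness.CRITICAL


card = Scorecard(Complexity.MEDIUM, Exactness.CRITICAL, Risk.LOW)
```

Because the scales are ordered, checks such as `card.exactness >= Exactness.HIGH` read naturally in routing code.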

## Hard routing rules
Apply these rules before any soft optimization.

### Exact calculation rule
If the task involves exact arithmetic, formulas, tables, accounting-like operations, unit-sensitive conversions, or step-sensitive logic:
- do not rely on a pure language-only route when tools are available
- prefer tool-assisted execution
- use a balanced or premium model only to interpret the task and explain results
- add a verification step for high-impact numeric outputs
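
The exact calculation rule can be read as a small plan builder. The sketch below is one possible interpretation; the step labels are hypothetical, and the no-tools fallback is an assumption, since the rule only covers the case where tools are available.

```python
def plan_exact_task(tools_available: bool, high_impact: bool) -> list:
    """Build an ordered list of steps for an exactness-sensitive task."""
    if tools_available:
        # The model interprets and explains; the tool does the computing.
        steps = ["model:interpret_task", "tool:compute", "model:explain_result"]
    else:
        # Assumption: with no tool, the model works step by step on its own.
        steps = ["model:compute_step_by_step"]
    if high_impact:
        steps.append("verify:recompute_and_compare")
    return steps


plan = plan_exact_task(tools_available=True, high_impact=True)
```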

### High-risk rule
If the task is high-risk:
- do not use economy-only routing as the final path
- require either premium single-model reasoning with grounding or a model plus verifier workflow
- add citations, checks, or a review pass when possible
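
As a guard, the high-risk rule can override whatever path the soft optimization proposed. A minimal sketch, assuming string path labels; the labels and the choice of `model_plus_verifier` as the default escalation target are illustrative.

```python
# Final paths the rule accepts for high-risk tasks (hypothetical labels).
HIGH_RISK_FINAL_PATHS = {"premium_grounded", "model_plus_verifier"}


def enforce_high_risk(proposed: str, risk: str) -> str:
    """Return the proposed path, escalated if the task is high-risk."""
    if risk != "high":
        return proposed
    if proposed in HIGH_RISK_FINAL_PATHS:
        return proposed
    # Assumption: escalate to a verifier workflow when the proposal is too weak.
    return "model_plus_verifier"
```

An economy model can still be used for triage or a first draft in such a workflow; the guard only constrains the final path.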

### Ambiguity rule
If the task is materially ambiguous and the answer quality depends on interpretation:
- use a stronger reasoning tier or a two-stage workflow
- do not finalize on a cheap first-pass answer without clarification or review

### Long-context rule
If the input is large or multi-document:
- prefer staged processing
- use extraction or chunk summarization first
- then use a stronger model for synthesis if needed
- avoid sending everything to the strongest model by default if staged reduction is cheaper and safe
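
The staged approach above resembles a map-then-reduce pipeline: a cheaper model condenses each chunk, and a stronger model sees only the condensed notes. In this sketch `summarize` and `synthesize` are caller-supplied stand-ins for model calls, and character-based chunking is a simplifying assumption (real deployments usually chunk by tokens or document structure).

```python
def chunk(text: str, size: int) -> list:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def staged_synthesis(docs, summarize, synthesize, chunk_size=2000):
    """Condense every chunk with `summarize`, then combine with `synthesize`."""
    notes = [summarize(piece) for doc in docs for piece in chunk(doc, chunk_size)]
    return synthesize(notes)


# Toy stand-ins so the flow can be followed without any model calls.
result = staged_synthesis(
    ["abcdefgh"],
    summarize=lambda piece: piece[:3],
    synthesize=lambda notes: "|".join(notes),
    chunk_size=4,
)
```

The savings come from the stronger model reading the notes rather than the full input; whether that reduction is safe depends on how much detail the synthesis step needs.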

### Multimodal rule
If the task includes images, diagrams, PDFs with layout dependence, or visual interpretation:
- use a model path that actually supports the required modality
- do not route to a text-only path
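
The multimodal rule works as a capability filter applied before cost is considered. The sketch below assumes a hypothetical `ModelSpec` registry; the model names, costs, and modality labels are made up for illustration and should come from the deployment's real configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    cost: float            # relative cost, illustrative units
    modalities: frozenset  # e.g. frozenset({"text", "image"})


def eligible(models, required):
    """Keep models that support every required modality, cheapest first."""
    needed = set(required)
    return sorted((m for m in models if needed <= m.modalities),
                  key=lambda m: m.cost)


# Hypothetical registry.
MODELS = [
    ModelSpec("economy-text", 1.0, frozenset({"text"})),
    ModelSpec("balanced-vision", 3.0, frozenset({"text", "image"})),
    ModelSpec("premium-vision", 10.0, frozenset({"text", "image", "pdf"})),
]

candidates = eligible(MODELS, {"text", "image"})
```

Filtering first and sorting second keeps the cheapest-safe-path principle intact: the text-only model is excluded outright rather than merely penalized.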

### Coding rule
For code tasks:
- simple boilerplate or syntax transforms