gradient-inference

TotalClaw 作者 Rogue Iteration

DigitalOcean Gradient AI 无服务器推理的社区技能（非官方）。发现可用的模型和定价，运行聊天完成或响应 API 带有提示缓存，并生成图像。兼容 OpenAI。

安装 / 下载方式

TotalClaw CLI推荐

totalclaw install totalclaw:totalclaw~simondelorean-gradient-inference

cURL直接下载，无需登录

curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~simondelorean-gradient-inference/file -o simondelorean-gradient-inference.md

## 概述（中文）

DigitalOcean Gradient AI 无服务器推理的社区技能（非官方）。
发现可用的模型和定价，运行聊天完成或响应 API
带有提示缓存，并生成图像。兼容 OpenAI。

## 原文

# 🦞 Gradient AI — Serverless Inference

> ⚠️ **This is an unofficial community skill**, not maintained by DigitalOcean. Use at your own risk.

> *"Why manage GPUs when the ocean provides?" — ancient lobster proverb*

Use DigitalOcean's [Gradient Serverless Inference](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/) to call large language models without managing infrastructure. The API is **OpenAI-compatible**, so standard SDKs and patterns work — just point at `https://inference.do-ai.run/v1` and swim.

## Authentication

All requests need a **Model Access Key** in the `Authorization: Bearer` header.

```bash
export GRADIENT_API_KEY="your-model-access-key"
```

**Where to get one:** [DigitalOcean Console](https://cloud.digitalocean.com) → Gradient AI → Model Access Keys → Create Key.

📖 *[Full auth docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#create-a-model-access-key)*

---

## Tools

### 🔍 List Available Models

Window-shop for LLMs before you swipe the card.

```bash
python3 gradient_models.py                    # Pretty table
python3 gradient_models.py --json             # Machine-readable
python3 gradient_models.py --filter "llama"   # Search by name
```

Use this before hardcoding model IDs — models are added and deprecated over time.

**Direct API call:**
```bash
curl -s https://inference.do-ai.run/v1/models \
  -H "Authorization: Bearer $GRADIENT_API_KEY" | python3 -m json.tool
```

📖 *[Models reference](https://docs.digitalocean.com/products/gradient-ai-platform/details/models/)*

---

### 💬 Chat Completions

The classic. Send structured messages (system/user/assistant roles), get a response. OpenAI-compatible, so you probably already know how this works.

```bash
python3 gradient_chat.py \
  --model "openai-gpt-oss-120b" \
  --system "You are a helpful assistant." \
  --prompt "Explain serverless inference in one paragraph."

# Different model
python3 gradient_chat.py \
  --model "llama3.3-70b-instruct" \
  --prompt "Write a haiku about cloud computing."
```

**Direct API call:**
```bash
curl -s https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $GRADIENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-oss-120b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
  }'
```

📖 *[Chat Completions docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#chat-completions)*

---

### ⚡ Responses API (Recommended)

DigitalOcean's [recommended endpoint](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#responses-api) for new integrations. Simpler request format and supports **prompt caching** — a.k.a. "stop paying twice for the same context."

```bash
# Basic usage
python3 gradient_chat.py \
  --model "openai-gpt-oss-120b" \
  --prompt "Summarize this earnings report." \
  --responses-api

# With prompt caching (saves cost on follow-up queries)
python3 gradient_chat.py \
  --model "openai-gpt-oss-120b" \
  --prompt "Now compare it to last quarter." \
  --responses-api --cache
```

**Direct API call:**
```bash
curl -s https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $GRADIENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-oss-120b",
    "input": "Explain prompt caching.",
    "store": true
  }'
```

**When to use which:**
| | Chat Completions | Responses API |
|---|---|---|
| **Request format** | Array of messages with roles | Single `input` string |
| **Prompt caching** | ❌ | ✅ via `store: true` |
| **Multi-step tool use** | Manual | Built-in |
| **Best for** | Structured conversations | Simple queries, cost savings |

📖 *[Responses API docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#responses-api)*

---

### 🖼️ Generate Images

Turn text prompts into images. Because sometimes a chart isn't enough.

```bash
python3 gradient_image.py --prompt "A lobster trading stocks on Wall Street"
python3 gradient_image.py --prompt "Sunset over the NYSE" --output sunset.png
python3 gradient_image.py --prompt "Fintech logo" --json
```

**Direct API call:**
```bash
curl -s https://inference.do-ai.run/v1/images/generations \
  -H "Authorization: Bearer $GRADIENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A lobster analyzing candlestick charts",
    "n": 1
  }'
```

📖 *[Image generation docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#image-generation)*

---

## 🧠 Model Selection Guide

Not all models are created equal. Choose wisely, young crustacean:

| Model | Best For | Speed | Quality | Context |
|-------|----------|-------|---------|---------|
| `openai-gpt-oss-120b` | Complex reasoning, analysis, writing | Medium | ★★★★★ | 128K |
| `llama3.3-70b-instruct` | General tasks, instruction following | Fast | ★★★★ | 128K |
| `deepseek-r1-distill-llama-70b` | Math, code, step-by-step reasoning | Slow | ★★★★★ | 128K |
| `qwen3-32b` | Quick triage, short tasks | Fastest | ★★★ | 32K |

> **🦞 Pro tip: Cost-aware routing.** Use a fast model (e.g., `qwen3-32b`) to score or triage, then only escalate to a strong model (e.g., `openai-gpt-oss-120b`) when depth is needed. Enable prompt caching for repeated context.

Always run `python3 gradient_models.py` to check what's currently available — the menu changes.

📖 *[Available models](https://docs.digitalocean.com/products/gradient-ai-platform/details/models/)*

---

### 💰 Model Pricing Lookup

Check what models cost *before* you rack up a bill. Scrapes the official [DigitalOcean pricing page](https://docs.digitalocean.com/products/gradient-ai-platform/details/pricing/) — no API key needed.

```bash
python3 gradient_pricing.py                    # Pretty table
python3 gradient_pricing.py --json             # Machine-readable
python3 gradient_pricing.py --model "llama"    # Filter by model name
python3 gradient_pricing.py --no-cache         # Skip cache, fetch live
```

**How it works:**
- Fetches live pricing from DigitalOcean's docs (public page, no auth)
- Caches results for 24 hours in `/tmp/gradient_pricing_cache.json`
- Falls back to a bundled snapshot if the live fetch fails

> **🦞 Pro tip:** Run `python3 gradient_pricing.py --model "gpt-oss"` before choosing a model to see the cost difference between `gpt-oss-120b` ($0.10/$0.70) and `gpt-oss-20b` ($0.05/$0.45) per 1M tokens.

📖 *[Pricing docs](https://docs.digitalocean.com/products/gradient-ai-platform/details/pricing/)*

---

## CLI Reference

All scripts accept `--json` for machine-readable output.

```
gradient_models.py   [--json] [--filter QUERY]
gradient_chat.py     --prompt TEXT [--model ID] [--system TEXT]
                     [--responses-api] [--cache] [--temperature F]
                     [--max-tokens N] [--json]
gradient_image.py    --prompt TEXT [--model ID] [--output PATH]
                     [--size WxH] [--json]
gradient_pricing.py  [--json] [--model QUERY] [--no-cache]
```

## External Endpoints

| Endpoint | Purpose |
|----------|---------|
| `https://inference.do-ai.run/v1/models` | List available models |
| `https://inference.do-ai.run/v1/chat/completions` | Chat Completions API |
| `https://inference.do-ai.run/v1/responses` | Responses API (recommended) |
| `https://inference.do-ai.run/v1/images/generations` | Image generation |
| `https://docs.digitalocean.com/.../pricing/` | Pricing page (scraped, public) |

## Security & Privacy

- All requests go to `inference.do-ai.run` — DigitalOcean's own endpoint
- Your `GRADIENT_API_KEY` is sent