cloudflare-workers-architect

SkillDB author: kn7d5eszdfwftk153ymhdm4qhs83qsqy, v1.0.0

Design Cloudflare Workers solutions end-to-end — pick the right runtime tier (Workers vs Pages vs Durable Objects vs Workers AI), the right storage (KV vs D1 vs R2 vs Durable Object Storage vs Hyperdrive), the right state pattern (singleton DOs, sharded DOs, hibernating WebSockets, RPC-bound services), and the right limits (CPU time, wall time, subrequest count, request size). Covers R2 multipart uploads, Queues-backed pipelines, Cron Triggers, Tail Workers, Smart Placement, Workers AI model selection, Vectorize embeddings, Hyperdrive for legacy Postgres/MySQL, and migration playbooks from Lambda@Edge, Vercel Edge, Deno Deploy, and AWS API Gateway. Triggers on "cloudflare workers", "cloudflare pages", "durable objects", "workers kv", "d1 database", "r2 storage", "cloudflare queues", "vectorize", "workers ai", "hyperdrive", "smart placement", "tail worker", "cron triggers", "rpc bindings", "wrangler", "service bindings", "edge function", "lambda@edge migration", "vercel edge migration", "deno deploy migration".

Install / Download

TotalClaw CLI (recommended)
totalclaw install skilldb:kn7d5eszdfwftk153ymhdm4qhs83qsqy~cloudflare-workers-architect
cURL (direct download, no login required)
curl -fsSL https://skills.taituai.com/api/skills/skilldb%3Akn7d5eszdfwftk153ymhdm4qhs83qsqy~cloudflare-workers-architect/file -o cloudflare-workers-architect.md
# Cloudflare Workers Architect

Design and ship production systems on Cloudflare's developer platform. Picks the right primitive for each problem, names the limits that will bite, and emits `wrangler.toml`, bindings, deploy scripts, and a cost model. Acts as a senior platform engineer who has shipped multi-tenant SaaS, real-time collaboration, AI inference, and high-fan-out webhooks on Workers — and migrated stacks off Lambda@Edge, Vercel Edge, and Deno Deploy.

## Usage

Invoke when starting a new Workers project, deciding between primitives, sizing a real-time feature, picking storage, planning a migration, or hitting limits. Equally useful for "what should this be" architecture calls and "this Worker keeps timing out" debugging.

**Basic invocation:**
> Design a real-time collab editor on Cloudflare
> Should this be a Worker, a Pages function, or a Durable Object?
> Migrate our 80 Lambda@Edge handlers to Workers

**With context:**
> Here's the API surface — pick storage and write the wrangler.toml
> p99 hits the 30s wall-time limit; redesign with Queues
> We need 50k WebSocket connections with auth — plan the DO sharding

The agent emits a primitive choice, `wrangler.toml`, binding declarations, code skeletons, deploy commands, and a cost projection.

## Inputs Required

- **Workload shape** — HTTP API / static site / long-running stream / WebSocket / background job / scheduled / AI inference
- **State requirements** — stateless? per-user? per-room? global? eventual or strong?
- **Throughput** — req/s peak, concurrent connections, payload sizes
- **Latency target** — p50 / p95 / p99 budgets
- **Geographic distribution** — global, regional, single-country (data residency)
- **Existing constraints** — current platform if migrating, fixed external APIs, regulatory scope (GDPR, HIPAA)
- **Cost ceiling** — Workers free tier covers a lot; over $200/mo means real design choices

## Workflow

1. Classify the workload against the Decision Tree (below)
2. Pick storage from the Selection Matrix; declare bindings in `wrangler.toml`
3. Map every request path to a primitive (Worker / Pages Function / DO / Queue consumer / Cron Trigger)
4. Identify the limit that will bite first; design around it before code
5. Author `wrangler.toml` with all bindings, routes, and compatibility flags
6. Sketch the data flow: which subrequests fire, in what order, on which path
7. Wire observability: Workers Analytics Engine + Tail Worker for debug logs + Logpush to R2/external
8. Implement and test locally with `wrangler dev --remote` (real bindings)
9. Deploy via `wrangler deploy`; canary via gradual deploys
10. Document rollback (`wrangler rollback` to a known version ID)

## Decision Tree: Pages vs Workers vs Durable Objects vs Workers AI

```
START
 ├── Is the request path a static asset (HTML/JS/CSS/image)?
 │     └── YES → Pages (or Workers static assets if you need full control; Workers Sites is deprecated)
 │
 ├── Is it dynamic but stateless (lookup, transform, proxy, auth)?
 │     └── YES → Worker (HTTP fetch handler)
 │
 ├── Does it need per-entity state (per-user, per-room, per-document)
 │   that must be globally consistent and serialized?
 │     └── YES → Durable Object
 │             ├── If 1-to-1 with users → DO per user, ID = userId
 │             ├── If shared (collab doc, chat room) → DO per room
 │             └── If global counter / global queue → singleton DO
 │
 ├── Is it a long-running stream / WebSocket?
 │     └── YES → Durable Object with Hibernating WebSockets
 │             (free hibernation; pay only for actual messages)
 │
 ├── Is it AI inference (LLM, embedding, Whisper, image)?
 │     └── YES → Workers AI binding (calls into CF's inference fleet)
 │
 ├── Is it a scheduled job?
 │     └── YES → Worker with Cron Trigger
 │
 ├── Is it a queue-driven pipeline (webhooks, fan-out, retries)?
 │     └── YES → Worker producer + Queue + Worker consumer
 │
 └── Does it need to talk to a legacy Postgres/MySQL with low latency?
       └── YES → Hyperdrive binding (connection pool + region pinning)
```

**Pages vs Workers nuance:** Pages = static + opt-in `functions/`. Use Pages when the site is mostly static and you have a few API routes. Use Workers when API is the product, or you need advanced bindings (DOs, Queues, RPC).

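**Pages Functions are Workers** under the hood: same runtime, same limits, fewer config knobs. Pages Functions can already use service bindings and Smart Placement, so those alone are not a reason to move. Migrate Pages Functions → Worker when you need: cron triggers, queue consumers, custom routes, or Durable Object classes defined in the same project.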
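
The decision tree can be read as a first-match function. The sketch below is only an illustration of that reading: `choosePrimitive` and the `workload` field names are hypothetical, not part of any Cloudflare API, and a real workload often matches several branches at once.

```js
// Hypothetical helper: returns the first primitive whose condition matches.
// Field names (staticAsset, perEntityState, ...) are illustrative assumptions.
function choosePrimitive(workload) {
  if (workload.staticAsset) return "Pages";
  if (workload.perEntityState) return "Durable Object";
  if (workload.webSocket) return "Durable Object with Hibernating WebSockets";
  if (workload.aiInference) return "Workers AI binding";
  if (workload.scheduled) return "Worker with Cron Trigger";
  if (workload.queueDriven) return "Worker producer + Queue + Worker consumer";
  if (workload.legacySql) return "Hyperdrive binding";
  // Dynamic but stateless work (lookup, transform, proxy, auth) is the default.
  return "Worker (HTTP fetch handler)";
}

console.log(choosePrimitive({ webSocket: true }));
console.log(choosePrimitive({}));
```

Treating the stateless Worker as the fallback keeps the function total: anything that does not need state, scheduling, queues, or inference lands on a plain fetch handler.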

## Storage Selection Matrix

| Storage | Read latency | Write latency | Size cap | Consistency | Cost | When |
|---------|-------------|---------------|----------|-------------|------|------|
| **Workers KV** | <50ms (cached) | seconds (eventual) | 25 MiB/value | Eventual (60s) | $0.50/M reads, $5/M writes | Read-heavy global config, feature flags, cached HTML |
| **D1** | 5-50ms | 5-50ms | 10 GB/db | Strong within region | $0.001/M rows read, $1/M rows written | Relational app data, low-write |
| **R2** | 50-200ms | 50-500ms | 5 TiB/object | Strong (immediate) | $0.015/GB/mo, no egress | User uploads, backups, datasets |
| **Durable Object Storage** | <10ms (in-DO) | <50ms | 1 GB/DO | Strong, serialized | Bundled with DO compute | Per-entity state, real-time |
| **Durable Object SQLite** | <5ms | <20ms | 1 GB/DO | Strong, ACID | Bundled | Relational state per entity (newer alt to KV-style DO storage) |
| **Vectorize** | 10-50ms | seconds | 5M vectors/index | Eventual | $0.04/M queried | Embeddings, semantic search |
| **Hyperdrive (Postgres pool)** | 5-20ms (cached) | 10-30ms | external DB | external | $0 + your DB cost | Legacy Postgres/MySQL |
| **Cache API** | <5ms (in PoP) | <10ms | per PoP | per-PoP | free | Per-PoP HTTP response cache |

**Decision rules:**
- **Reads >> writes, global, eventual ok** → KV
- **Relational queries, joins, transactions, low-write** → D1
- **Files, blobs, datasets, images** → R2
- **Per-entity state with strong serialization** → DO Storage (use SQLite variant for relational shape)
- **Embeddings / semantic search** → Vectorize
- **Existing Postgres/MySQL you can't replace** → Hyperdrive
- **Per-PoP HTTP cache (idempotent GET)** → Cache API

**Anti-pattern alert:**
- Don't use KV as a write-heavy store — eventual consistency + write rate limits will burn you
- Don't use D1 for >100 writes/sec sustained — split into per-tenant DOs with SQLite
- Don't use R2 for tiny key-value records — KV is cheaper at small sizes
- Don't use a singleton DO for global state with >1k req/s — that DO's CPU is the bottleneck; shard
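
The unit prices in the Storage Selection Matrix can be turned into a rough monthly estimate. The sketch below covers only the KV and R2 storage rows. It assumes no free-tier or included allotments and ignores R2 operation charges, so treat the result as an upper-level ballpark rather than a bill. `estimateMonthlyCost` and its parameter names are hypothetical.

```js
// Unit prices copied from the Storage Selection Matrix (USD).
const KV_READ_PER_MILLION = 0.5;
const KV_WRITE_PER_MILLION = 5;
const R2_PER_GB_MONTH = 0.015;

// Assumption: no included usage is subtracted and R2 operations are not counted.
function estimateMonthlyCost({ kvReads, kvWrites, r2StoredGb }) {
  const kv =
    (kvReads / 1e6) * KV_READ_PER_MILLION +
    (kvWrites / 1e6) * KV_WRITE_PER_MILLION;
  const r2 = r2StoredGb * R2_PER_GB_MONTH;
  return { kv, r2, total: kv + r2 };
}

const estimate = estimateMonthlyCost({
  kvReads: 100e6,   // 100M reads per month
  kvWrites: 1e6,    // 1M writes per month
  r2StoredGb: 500,  // 500 GB stored
});
console.log(estimate);
```

With these inputs the estimate comes to about $55 for KV and $7.50 for R2. Note how 1M writes cost as much as 10M reads, which is the arithmetic behind the warning against using KV as a write-heavy store.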

## Edge State Patterns

**Pattern 1: Singleton DO** — one DO globally, ID = constant string.
- Use for: global counters, config registries, leader election, low-traffic shared state
- Limit: ~1k req/s per DO; bounded by single-threaded execution
- Failure mode: hot-shard kills throughput
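
A minimal sketch of a singleton counter, assuming the classic `state.storage` key-value API. The class name `GlobalCounter` and the `count` key are hypothetical. In a real Worker module the class would be exported and declared as a Durable Object binding in `wrangler.toml`.

```js
// Minimal counter Durable Object (sketch). Export this class from the
// Worker module and bind it in wrangler.toml to use it for real.
class GlobalCounter {
  constructor(state, env) {
    this.state = state;
  }

  async fetch(request) {
    // Events for one DO are processed one at a time while storage
    // operations are pending, so this read-modify-write does not race.
    const current = (await this.state.storage.get("count")) ?? 0;
    const next = current + 1;
    await this.state.storage.put("count", next);
    return new Response(String(next));
  }
}
```

A caller would reach the single instance with a constant name, for example `env.COUNTER.get(env.COUNTER.idFromName("global"))`, where `COUNTER` is an assumed binding name. Every request worldwide then funnels through this one object, which is why the ~1k req/s ceiling applies.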

**Pattern 2: DO per entity** — `idFromName(userId)`, `idFromName(roomId)`.
- Use for: per-user state, per-document collab, per-tenant data
- Naturally horizontal: throughput scales with entity count
- Placement hint: pass `{ locationHint: "weur" }` to `get()` to colocate the DO with the user (best effort, and only considered when the DO is first created)
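
A short sketch of the per-entity lookup, assuming a Durable Object namespace bound as `env.ROOMS` (a hypothetical binding name). `roomStub` is likewise an illustrative helper, not a platform function.

```js
// Returns the stub for the Durable Object that owns one room.
// `env.ROOMS` is an assumed Durable Object namespace binding.
function roomStub(env, roomId, locationHint) {
  // The same name always maps to the same DO ID, and so to the same object.
  const id = env.ROOMS.idFromName(roomId);
  // The hint is optional and best effort.
  return locationHint
    ? env.ROOMS.get(id, { locationHint })
    : env.ROOMS.get(id);
}
```

Inside a fetch handler this might be used as `return roomStub(env, roomId, "weur").fetch(request);`, forwarding the request to the room's object.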

**Pattern 3: Sharded DOs** (``idFromName(`shard-${hash(key) % N}`)``).
- Use for: high-throughput counters, rate limiters, high-fan-out queues
- N = (target throughput) / (1k req/s per DO) + headroom
- Aggregate via cron Worker that fans out to all shards
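
The shard sizing and naming can be sketched as follows. The pattern leaves `hash` unspecified, so FNV-1a is used here purely as a stand-in. The 50% headroom default and the helper names `shardCount` and `shardName` are assumptions for illustration.

```js
// FNV-1a 32-bit hash over UTF-16 code units; a stand-in for `hash(key)`.
function fnv1a(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the value an unsigned 32-bit int
  }
  return h;
}

// N = target throughput / per-DO throughput, plus headroom (assumed 50%).
function shardCount(targetRps, perDoRps = 1000, headroom = 0.5) {
  return Math.ceil((targetRps / perDoRps) * (1 + headroom));
}

// Deterministic shard name, suitable as the argument to idFromName().
function shardName(key, n) {
  return `shard-${fnv1a(key) % n}`;
}

const n = shardCount(10000); // 10k req/s target
console.log(n, shardName("user-123", n));
```

For a 10k req/s target this yields 15 shards. Changing `n` later remaps most keys to different shards, so pick it with growth in mind or plan a migration step for the per-shard state.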

**Pattern 4: Hibernating WebSocket DO**
- DO accepts WebSocket via `state.acceptWebSocket(ws)` (NOT `ws.accept()`)
- DO can be evicted from memory between messages — only billed when active
- State persists in DO Storage, not in JS variables
- Up to ~32k connections per DO before throughput pressure

```js
// hibernating WS pattern
export class ChatRoom {
  constructor(state, env) { this.state = state; }
  async fetch(req) {
    const pair = new WebSocketPair();
    this.state.acceptWebSocket(pair[1]);              // hibernation-aware
    return new Response(null, { status: 101, webSocket: pair[0] });
  }
  async webSoc