livekit

TotalClaw, by totalclaw

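Build voice AI agents with LiveKit. Use when developing real-time voice applications, voice agent pipelines (STT-LLM-TTS), or WebRTC communication, or when deploying conversational AI. Covers the LiveKit Agents SDK, provider selection (Deepgram, OpenAI, ElevenLabs, Cartesia), and cloud versus self-hosted deployment.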

Install / Download

TotalClaw CLI (recommended)

totalclaw install totalclaw:totalclaw~zoroposkai-livekit

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~zoroposkai-livekit/file -o zoroposkai-livekit.md

## Original Text

# LiveKit Voice AI Skill

Build production voice agents with LiveKit's open-source platform.

## Quick Start

```bash
# Install SDK
pip install livekit-agents livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia

# Or Node.js
npm install @livekit/agents @livekit/agents-plugin-openai
```

## Minimal Agent (Python)

```python
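from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import deepgram, openai, cartesia

async def entrypoint(ctx: JobContext):
    await ctx.connect()

    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4.1-mini"),
        tts=cartesia.TTS(),
    )

    # start() is a coroutine and takes the Agent that carries the instructions
    # (assumes livekit-agents 1.x; the instructions text is a placeholder).
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )
    await session.say("Hello! How can I help?")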

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

## Provider Selection

| Component | Budget | Quality | Low Latency |
|-----------|--------|---------|-------------|
| **STT** | Deepgram Nova-3 | AssemblyAI | Deepgram Keychain |
| **LLM** | GPT-4.1 mini | Claude Sonnet | GPT-4.1 mini |
| **TTS** | Cartesia Sonic-3 | ElevenLabs | Cartesia Sonic-3 |

## Voice Pipeline vs Realtime

**STT-LLM-TTS Pipeline:**
- More control, mix providers
- Generally cheaper
- Easier to debug

**OpenAI Realtime API:**
- Speech-to-speech, more expressive
- Higher cost (~$0.10/min)
- Less control

## Environment Variables

```bash
LIVEKIT_URL=wss://your-app.livekit.cloud
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret

# Provider keys (if not using LiveKit Inference)
OPENAI_API_KEY=
DEEPGRAM_API_KEY=
CARTESIA_API_KEY=
ELEVENLABS_API_KEY=
```
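
A small pre-flight check can catch a missing variable before the worker starts. The sketch below uses only the standard library; the helper name `missing_env` is hypothetical, and the list of required names is taken from the LiveKit variables above.

```python
import os

# The three LiveKit connection variables listed above.
REQUIRED = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")

def missing_env(names=REQUIRED, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("LiveKit environment looks complete.")
```

Provider keys can be checked the same way by passing a different `names` tuple, for example `missing_env(("OPENAI_API_KEY", "DEEPGRAM_API_KEY"))`.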

## Tool Use

```python
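from livekit.agents import Agent, AgentSession, function_tool
from livekit.plugins import deepgram, openai, cartesia

@function_tool()
async def get_weather(location: str) -> str:
    """Get current weather for a location."""
    # Placeholder result; replace with a real weather lookup.
    return f"Weather in {location}: 72°F, sunny"

session = AgentSession(
    stt=deepgram.STT(),
    llm=openai.LLM(),
    tts=cartesia.TTS(),
)

# In livekit-agents 1.x, tools are typically registered on the Agent that is
# passed to session.start(), not on the AgentSession itself.
agent = Agent(
    instructions="You can look up the weather.",
    tools=[get_weather],
)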
```
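
The docstring and type hints matter because the LLM only sees a description of the tool, not its body. The sketch below shows, in simplified form, how such a description could be derived from a function signature. `describe_tool` and the type mapping are hypothetical, and the schema LiveKit actually generates may differ.

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from Python annotations to JSON-schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def describe_tool(fn) -> dict:
    """Build a simplified tool description from a function's signature."""
    hints = get_type_hints(fn)
    properties = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value, so the LLM must supply it
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }

async def get_weather(location: str, units: str = "F") -> str:
    """Get current weather for a location."""
    return f"Weather in {location}: 72°{units}, sunny"

schema = describe_tool(get_weather)
print(schema)
```

A vague docstring or a missing annotation degrades this description, which in turn makes the model less likely to call the tool correctly.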

## Telephony (SIP)

```python
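import asyncio
from livekit import api

async def place_call():
    # LiveKitAPI() reads LIVEKIT_URL, LIVEKIT_API_KEY and LIVEKIT_API_SECRET
    lk_api = api.LiveKitAPI()
    try:
        # Outbound call: dials the number and joins the callee to the room
        await lk_api.sip.create_sip_participant(
            api.CreateSIPParticipantRequest(
                sip_trunk_id="trunk-id",
                sip_call_to="+15551234567",
                room_name="my-room",
            )
        )
    finally:
        await lk_api.aclose()

asyncio.run(place_call())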
```
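
The example number above is in E.164 form (a plus sign followed by country code and subscriber number). Whether a given SIP trunk requires that form depends on the provider, so the check below is a sketch under that assumption; `is_e164` is a hypothetical helper, not part of the LiveKit SDK.

```python
import re

# E.164: "+" followed by at most 15 digits, the first of which is non-zero.
_E164 = re.compile(r"\+[1-9][0-9]{1,14}")

def is_e164(number: str) -> bool:
    """Return True if the string is a plausibly valid E.164 number."""
    return _E164.fullmatch(number) is not None

for candidate in ["+15551234567", "5551234567", "+1 555 123 4567"]:
    print(candidate, is_e164(candidate))
```

Validating before calling `create_sip_participant` keeps malformed input from turning into a failed or misrouted call.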

## Deployment

**LiveKit Cloud:** `lk agent deploy` (LiveKit CLI, run from the agent's project directory after authenticating with `lk cloud auth`)

**Self-hosted:** 
```bash
docker run -d \
  -p 7880:7880 -p 7881:7881 -p 7882:7882/udp \
  -e LIVEKIT_KEYS="api-key: api-secret" \
  livekit/livekit-server
```

## Cost Estimates

| Scenario | Monthly Cost |
|----------|--------------|
| Dev/testing | Free tier |
| 100 hrs/mo voice | ~$150-250 |
| Production B2B | ~$300-500 |
| High volume | Self-host |
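
As a rough cross-check of the figures in this document (not current pricing), the ~$0.10/min Realtime estimate can be set against the 100 hrs/mo row. The helper names are hypothetical and the rates are only the estimates quoted above.

```python
def monthly_cost(hours_per_month: float, usd_per_minute: float) -> float:
    """Monthly spend for a given usage and per-minute rate."""
    return hours_per_month * 60 * usd_per_minute

def implied_rate(monthly_usd: float, hours_per_month: float) -> float:
    """Per-minute rate implied by a monthly total."""
    return monthly_usd / (hours_per_month * 60)

realtime_100h = monthly_cost(100, 0.10)   # Realtime API at ~$0.10/min
pipeline_low = implied_rate(150, 100)     # low end of the 100 hrs/mo row
pipeline_high = implied_rate(250, 100)    # high end of the 100 hrs/mo row

print(f"Realtime, 100 h: ${realtime_100h:.0f}")
print(f"Pipeline, implied: ${pipeline_low:.3f}-${pipeline_high:.3f} per minute")
```

Under these estimates, 100 hours on the Realtime API comes to about $600, while the table's $150-250 range implies roughly $0.025-0.042 per minute for the pipeline.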

## Common Patterns

### Turn Detection
```python
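from livekit.agents import AgentSession
from livekit.plugins import silero

# One option, assuming livekit-agents 1.x with livekit-plugins-silero
# installed: VAD-based turn detection with an explicit endpointing delay.
session = AgentSession(
    vad=silero.VAD.load(activation_threshold=0.5, min_silence_duration=0.5),
    turn_detection="vad",
    min_endpointing_delay=0.5,  # seconds of silence before the turn ends
    # stt=..., llm=..., tts=... as in the minimal agent
)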
```
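
Silence-based turn detection usually comes down to two knobs: a speech-probability threshold and a required silence duration. The toy endpointing function below illustrates how they interact. It is a standalone sketch over made-up per-frame probabilities, not how any particular provider implements it.

```python
from typing import Iterable, Optional

def end_of_turn_ms(
    speech_probs: Iterable[float],
    frame_ms: int = 20,
    threshold: float = 0.5,
    silence_duration_ms: int = 500,
) -> Optional[int]:
    """Return the time (ms) at which the turn is considered over, or None.

    A frame counts as speech when its probability is >= threshold. The turn
    ends once speech has been heard and is followed by at least
    silence_duration_ms of consecutive non-speech frames.
    """
    heard_speech = False
    silence_ms = 0
    for i, prob in enumerate(speech_probs):
        if prob >= threshold:
            heard_speech = True
            silence_ms = 0          # any speech resets the silence timer
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= silence_duration_ms:
                return (i + 1) * frame_ms
    return None

# 200 ms of speech, a 100 ms pause, 100 ms more speech, then 600 ms of silence.
frames = [0.9] * 10 + [0.1] * 5 + [0.8] * 5 + [0.05] * 30
print(end_of_turn_ms(frames))
```

The short mid-sentence pause does not end the turn because it is shorter than the silence duration. Lowering that value makes the agent respond sooner but cut in on pauses more often.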

### Interruption Handling
```python
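# AgentSession already interrupts agent speech by default; this manual hook
# assumes livekit-agents 1.x. Callbacks registered with .on() must be sync.
@session.on("user_state_changed")
def handle_interruption(ev):
    if ev.new_state == "speaking":
        session.interrupt()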
```
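
Conceptually, an interruption (barge-in) cancels in-flight playback as soon as the user starts speaking. The sketch below models that with plain `asyncio` tasks. `speak` and `demo` are hypothetical stand-ins, and no LiveKit API is involved.

```python
import asyncio

async def speak(text: str, spoken: list) -> None:
    """Stand-in for TTS playback: 'plays' one word every 10 ms."""
    for word in text.split():
        await asyncio.sleep(0.01)
        spoken.append(word)

async def demo() -> list:
    spoken: list = []
    playback = asyncio.create_task(
        speak("one two three four five six seven eight", spoken)
    )
    await asyncio.sleep(0.035)   # the user starts talking mid-sentence
    playback.cancel()            # barge-in: stop the agent's speech
    try:
        await playback
    except asyncio.CancelledError:
        pass                     # cancellation is the expected outcome
    return spoken

spoken_words = asyncio.run(demo())
print(spoken_words)
```

Only the words played before the cancellation are recorded, which mirrors how an agent should treat the unspoken remainder of a reply as never delivered.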

### Multi-Agent Handoff
```python
# Assumes livekit-agents 1.x: update_agent() switches the session's active agent
session.update_agent(specialist_agent)
```

## References

- Docs: https://docs.livekit.io/agents/
- Examples: https://github.com/livekit/agents/tree/main/examples
- Playground: https://agents-playground.livekit.io