agent-lightning

TotalClaw · Author: Microsoft Research · v1.0.0

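Microsoft Research's agent training framework. Optimizes AI agents through reinforcement learning, automatic prompt optimization, and supervised fine-tuning. Requires (almost) zero code changes. Works with LangChain, AutoGen, CrewAI, and OpenAI Agent SDK.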


Installation / Download

TotalClaw CLI (recommended)

totalclaw install totalclaw:totalclaw~olmmlo-cmd-agent-lightning

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~olmmlo-cmd-agent-lightning/file -o olmmlo-cmd-agent-lightning.md

Git repository (get the source code)

git clone https://github.com/microsoft/agent-lightning

# Agent Lightning ⚡

Microsoft Research's agent training framework. Turn your AI agents into optimizable beasts with (almost) zero code changes.

## Core Features

- **🔌 Universal Compatibility**: Works with LangChain, OpenAI Agent SDK, AutoGen, CrewAI, Microsoft Agent Framework, or plain Python OpenAI
- **🎯 Selective Optimization**: Optimize one or more agents in a multi-agent system
- **🧠 Multiple Algorithms**: Reinforcement Learning (RL), Automatic Prompt Optimization (APO), Supervised Fine-tuning (SFT)
- **⚡ (Almost) Zero Code Change**: Add `agl.emit_xxx()` helpers or wrap the agent with the tracer; your agent keeps running as usual

## Installation

```bash
pip install agentlightning
```

For the latest nightly build:
```bash
pip install --upgrade --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ --pre agentlightning
```

## Quick Start

### 1. Instrument Your Agent

**Option A: Add emit helpers (recommended)**
```python
import agentlightning as agl

# In your agent's tool calls
response = agl.emit_tool_call(
    model=model,
    messages=messages,
    tools=tools,
    context={"task": "search"}
)
```

**Option B: Use tracer (zero code change)**
```python
from agentlightning import tracer

# Wrap your agent with tracer
with tracer.trace("my-agent", user_query):  # pass the same input the agent receives
    result = your_agent.run(user_query)
```
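
Conceptually, a tracer needs no changes inside the agent because it wraps the call site rather than the agent itself. The sketch below is a toy stand-in for that idea, not the library's implementation: `trace`, `SPANS`, and `toy_agent` are hypothetical names introduced only for illustration.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory span log; a real tracer would export spans to a store.
SPANS = []


@contextmanager
def trace(name, input_data):
    """Record one span around whatever runs inside the `with` block."""
    span = {"name": name, "input": input_data, "status": "ok"}
    start = time.perf_counter()
    try:
        yield span
    except Exception as exc:
        # Keep the failure in the span, then let the caller see the error.
        span["status"] = f"error: {exc!r}"
        raise
    finally:
        span["duration_s"] = time.perf_counter() - start
        SPANS.append(span)


def toy_agent(query):
    # Stand-in for your_agent.run(); note that the agent is not modified.
    return query.upper()


with trace("my-agent", "hello") as span:
    span["output"] = toy_agent("hello")

print(SPANS[0]["name"], SPANS[0]["status"], SPANS[0]["output"])
```

The span is appended in the `finally` clause, so failed runs are recorded too; a trainer can only learn from failures if they show up in the trace.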

### 2. Create Training Config

```yaml
# config.yaml
agent:
  name: "my-agent"
  type: "openai"  # openai, langchain, autogen, crewai

training:
  algorithm: "grpo"  # grpo, apo, sft, rloo
  episodes: 100
  batch_size: 16
  
environment:
  eval_tasks:
    - "math"
    - "coding"
    - "reasoning"
```
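
Before launching a long run, it can help to sanity-check the config. The sketch below mirrors the YAML above as a plain dictionary and checks it against the values listed in its comments. `validate_config` and the two `ALLOWED_*` sets are hypothetical helpers written for this example; they are not part of `agentlightning`, and the rules are assumptions drawn only from the comments in `config.yaml`.

```python
# Allowed values copied from the comments in config.yaml above.
ALLOWED_ALGORITHMS = {"grpo", "apo", "sft", "rloo"}
ALLOWED_AGENT_TYPES = {"openai", "langchain", "autogen", "crewai"}


def validate_config(config):
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    agent = config.get("agent", {})
    training = config.get("training", {})

    if not agent.get("name"):
        problems.append("agent.name is missing")
    if agent.get("type") not in ALLOWED_AGENT_TYPES:
        problems.append(f"agent.type must be one of {sorted(ALLOWED_AGENT_TYPES)}")
    if training.get("algorithm") not in ALLOWED_ALGORITHMS:
        problems.append(f"training.algorithm must be one of {sorted(ALLOWED_ALGORITHMS)}")
    for key in ("episodes", "batch_size"):
        value = training.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            problems.append(f"training.{key} must be a positive integer")
    if not config.get("environment", {}).get("eval_tasks"):
        problems.append("environment.eval_tasks must list at least one task")
    return problems


config = {
    "agent": {"name": "my-agent", "type": "openai"},
    "training": {"algorithm": "grpo", "episodes": 100, "batch_size": 16},
    "environment": {"eval_tasks": ["math", "coding", "reasoning"]},
}

print(validate_config(config))  # prints []
```

Returning a list of problems instead of raising on the first one lets you fix every issue in a single pass.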

### 3. Run Training

```bash
agent-lightning train --config config.yaml
```

## Algorithms

| Algorithm | Use Case | Description |
|-----------|----------|-------------|
| **GRPO** | General RL | Group Relative Policy Optimization — stable, works well for most agents |
| **APO** | Prompt Tuning | Automatic Prompt Optimization — improves system prompts |
| **SFT** | Supervised Fine-tuning | Fine-tunes the model on labeled demonstrations, such as curated successful trajectories |
| **RLOO** | Long-horizon | REINFORCE Leave-One-Out; suited to tasks with sparse rewards |
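
To make the two RL rows concrete, here is a minimal sketch of how GRPO and RLOO each turn a group of rewards, sampled for the same task, into per-sample advantages. It is illustrative only: the function names are hypothetical and not part of the `agentlightning` API, and the population standard deviation and the `eps` term are assumptions (implementations differ on both).

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantage: standardize each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample earns the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards):
    """Leave-one-out advantage: subtract the mean reward of the other samples."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two samples per group")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


# Four rollouts of the same task: two succeeded, two failed.
rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))  # roughly [1, -1, -1, 1]
print(rloo_advantages(rewards))  # roughly [0.67, -0.67, -0.67, 0.67]
```

Both methods score a rollout against its siblings rather than against a learned value function. GRPO additionally rescales by the group's spread, while RLOO keeps the raw reward scale.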

## Usage Commands

### `agent-lightning train`
Train your agent with the configured algorithm.

### `agent-lightning eval`
Evaluate the agent on benchmark tasks.

### `agent-lightning export`
Export the trained model or prompts for deployment.

### `agent-lightning serve`
Launch a serving endpoint for the trained agent.

## Example: SQL Agent Training

See full example: [Train SQL Agent with RL](https://microsoft.github.io/agent-lightning/stable/how-to/train-sql-agent/)

```python
from agentlightning import Agent, RLConfig, GRPOTrainer

# 1. Define your agent
sql_agent = Agent(
    name="sql-agent",
    system_prompt="You are a SQL expert...",
    tools=[execute_sql, query_schema]  # your own tool functions, defined elsewhere
)

# 2. Configure RL training
config = RLConfig(
    algorithm="grpo",
    episodes=500,
    learning_rate=1e-4
)

# 3. Train
trainer = GRPOTrainer(config=config)
trainer.train(sql_agent, eval_tasks=["sql-generation"])
```
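
RL training for a SQL agent needs a reward signal. One common choice is execution accuracy: run the generated query and a reference query, then compare the rows they return. The sketch below shows that idea with the standard-library `sqlite3` module. `sql_reward`, the `users` table, and the sample queries are hypothetical and not taken from the linked example.

```python
import sqlite3


def sql_reward(conn, predicted_sql, reference_sql):
    """Return 1.0 if both queries yield the same rows (ignoring order), else 0.0."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return 0.0  # Invalid SQL earns no reward.
    expected = conn.execute(reference_sql).fetchall()
    # Sort by repr so rows with NULLs or mixed types can still be compared.
    same = sorted(predicted, key=repr) == sorted(expected, key=repr)
    return 1.0 if same else 0.0


# Throwaway in-memory database; never run model-generated SQL against real data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO users (name, age) VALUES ('Ada', 36), ('Linus', 28), ('Grace', 45);
    """
)

reference = "SELECT name FROM users WHERE age > 30"
print(sql_reward(conn, "SELECT name FROM users WHERE age >= 31", reference))  # 1.0
print(sql_reward(conn, "SELECT name FROM users", reference))  # 0.0
print(sql_reward(conn, "SELEC name FROM users", reference))  # 0.0
```

Comparing result sets rather than query text rewards any query that returns the right rows, regardless of how it is written.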

## Integration with Clawdbot

### Environment Variables

```bash
# Required for training
export OPENAI_API_KEY="sk-..."

# Optional: for remote storage
export AGL_STORAGE="s3://my-bucket/agent-lightning/"
```
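
A training script can fail fast when the required variable is missing instead of erroring partway through a run. This is a minimal sketch: `load_settings` is a hypothetical helper, and treating `AGL_STORAGE` as optional follows the comments above.

```python
import os


def load_settings(env=None):
    """Read training settings from the environment, failing early if incomplete."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before training")
    # AGL_STORAGE is optional; None means no remote storage was configured.
    return {"api_key": api_key, "storage": env.get("AGL_STORAGE")}


# Pass a dict to try it without touching the real environment.
settings = load_settings({"OPENAI_API_KEY": "sk-test"})
print(settings["storage"])  # None
```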

### Python API

```python
from agentlightning import LightningStore, GRPOTrainer

# LightningStore keeps tasks, resources, and traces in sync
store = LightningStore()

# Read traces, learn, and update prompts
trainer = GRPOTrainer(store=store)
trainer.train(agent=my_agent)  # my_agent: the agent you instrumented in Quick Start
```

## Monitoring Training

```bash
# Launch dashboard
agent-lightning dashboard --port 8080

# View logs
tail -f ~/.agent-lightning/logs/training.log
```

## Best Practices

1. **Start Small**: Begin with 10-50 episodes to verify the setup
2. **Define Clear Rewards**: Design reward functions that match your goal
3. **Use Evaluation Tasks**: Always evaluate on held-out tasks
4. **Checkpoint Frequently**: Save the model every N episodes
5. **Monitor Convergence**: Watch the loss curves in the dashboard
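
Practices 3 to 5 can be sketched in plain Python. All three helpers below are hypothetical and independent of `agentlightning`. The 20% hold-out fraction, the checkpoint interval, and the plateau threshold are assumptions to tune for your task, and the plateau check uses evaluation scores where the list above mentions loss curves.

```python
import random


def split_tasks(tasks, eval_fraction=0.2, seed=0):
    """Deterministically split tasks into (train, held-out eval) lists."""
    shuffled = list(tasks)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def should_checkpoint(episode, every=10):
    """True on every `every`-th episode, skipping episode 0."""
    return episode > 0 and episode % every == 0


def has_plateaued(eval_scores, window=3, min_gain=0.01):
    """True when the latest window's mean improves on the previous one by < min_gain."""
    if len(eval_scores) < 2 * window:
        return False  # Not enough history to judge yet.
    recent = sum(eval_scores[-window:]) / window
    previous = sum(eval_scores[-2 * window:-window]) / window
    return recent - previous < min_gain


train_tasks, eval_tasks = split_tasks([f"task-{i}" for i in range(20)])
print(len(train_tasks), len(eval_tasks))  # 16 4
print([ep for ep in range(31) if should_checkpoint(ep)])  # [10, 20, 30]
print(has_plateaued([0.1, 0.2, 0.3, 0.5, 0.6, 0.7]))  # False
```

Fixing the seed keeps the held-out set identical across runs, so evaluation scores from different experiments stay comparable.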

## Resources

- [Documentation](https://microsoft.github.io/agent-lightning/)
- [Examples](https://github.com/microsoft/agent-lightning/tree/main/examples)
- [API Reference](https://microsoft.github.io/agent-lightning/stable/reference/)
- [ArXiv Paper](https://arxiv.org/abs/2508.03680)
- [Discord Community](https://discord.gg/RYkC7dvDR7)

## Citation

If you use Agent Lightning in your research, please cite:

```bibtex
@misc{luo2025agentlightningtrainai,
  title={Agent Lightning: Train ANY AI Agents with Reinforcement Learning},
  author={Xufang Luo and Yuge Zhang and Zhiyuan He and Zilong Wang and Siyun Zhao and Dongsheng Li and Luna K. Qiu and Yuqing Yang},
  year={2025},
  eprint={2508.03680},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}
```
