arxiv-gamedevbench-evaluating-agentic-capabili

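TotalClaw, by totalclaw

Learned from the arXiv paper GameDevBench: Evaluating Agentic Capabilities Through Game Development. Use this skill to scaffold a Node.js experiment based on the paper's method.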

Install / Download

TotalClaw CLI (recommended)

totalclaw install totalclaw:totalclaw~wanng-ide-arxiv-gamedevbench-evaluating-agentic-capabili

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~wanng-ide-arxiv-gamedevbench-evaluating-agentic-capabili/file -o wanng-ide-arxiv-gamedevbench-evaluating-agentic-capabili.md

# arxiv-gamedevbench-evaluating-agentic-capabili

## Source
- Paper key: 44f3ad505bee7a5c25a60d2a3686cb7e
- Title: GameDevBench: Evaluating Agentic Capabilities Through Game Development
- Categories: cs.AI,cs.CL,cs.SE

## Learned insight
Despite rapid progress on coding agents, progress on their multimodal counterparts has lagged behind. A key challenge is the scarcity of evaluation testbeds that combine the complexity of software development with the need for deep multimodal understanding. Game development provides such a testbed as agents must navigate large, dense codebases while manipulating intrinsically multimodal assets such as shaders, sprites, and animations within a visual game scene. We present GameDevBench, the first

## Node.js implementation entry
`node {baseDir}/scripts/run.js`
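
The entry command above contains a `{baseDir}` placeholder that has to be replaced with the skill's install directory before it can run. A minimal sketch of that substitution follows. The helper name `resolveEntry` and the example directory `/opt/skills/gamedevbench` are assumptions made for illustration; they are not part of the skill, and the contents of `scripts/run.js` are not shown in this document.

```javascript
// Hypothetical helper (not part of the skill): expand the {baseDir}
// placeholder in the entry command and split it into command + args.
function resolveEntry(template, baseDir) {
  const [command, ...args] = template.trim().split(/\s+/);
  return {
    command,
    args: args.map((arg) => arg.replaceAll("{baseDir}", baseDir)),
  };
}

// Example directory is an assumption; use wherever the skill was installed.
const entry = resolveEntry(
  "node {baseDir}/scripts/run.js",
  "/opt/skills/gamedevbench"
);
console.log(entry.command, entry.args.join(" "));
```

The resulting `command` and `args` can be passed to `child_process.spawn` to launch the script. Note that splitting on whitespace means this sketch does not handle install directories whose path contains spaces.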
