crawl-for-ai
Web scraping with a local Crawl4AI instance. Use it to fetch full-page content with JavaScript rendering. Better than Tavily for complex pages. Unlimited usage.
Installation / Download
TotalClaw CLI (recommended)
totalclaw install totalclaw:totalclaw~angusthefuzz-crawl-for-ai
cURL direct download, no login required
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~angusthefuzz-crawl-for-ai/file -o angusthefuzz-crawl-for-ai.md
## Original text
# Crawl4AI Web Scraper
Local Crawl4AI instance for full web page extraction with JavaScript rendering.
## Endpoints
**Proxy (port 11234)** — Clean output, OpenWebUI-compatible
- Returns: `[{page_content, metadata}]`
- Use for: Simple content extraction
**Direct (port 11235)** — Full output with all data
- Returns: `{results: [{markdown, html, links, media, ...}]}`
- Use for: When you need links, media, or other metadata
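Because the two ports return different shapes, a caller that may talk to either one can normalize the response first. The following is a minimal sketch, not the skill's own code: `extractPages` is a hypothetical helper, and the `metadata.source`, `result.url`, and `markdown.raw_markdown` field names are assumptions about details the shapes above do not spell out.

```javascript
// Normalize either response shape into a list of { url, text } records.
// Hypothetical helper; only `page_content`, `metadata`, `results`, and
// `markdown` are taken from the shapes documented above.
function extractPages(response) {
  // Proxy shape (port 11234): a top-level array of { page_content, metadata }.
  if (Array.isArray(response)) {
    return response.map((doc) => ({
      url: doc.metadata?.source ?? null, // assumption: URL stored under metadata.source
      text: doc.page_content ?? "",
    }));
  }

  // Direct shape (port 11235): { results: [{ markdown, html, links, media, ... }] }.
  if (response && Array.isArray(response.results)) {
    return response.results.map((result) => ({
      url: result.url ?? null, // assumption: each result carries its own url
      // Assumption: `markdown` may be a plain string or an object with `raw_markdown`.
      text:
        typeof result.markdown === "string"
          ? result.markdown
          : result.markdown?.raw_markdown ?? "",
    }));
  }

  throw new TypeError("Unrecognized Crawl4AI response shape");
}
```

With this in place, downstream code can read `pages[0].text` without caring which endpoint produced the data.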
## Usage
```bash
# Via script
node {baseDir}/scripts/crawl4ai.js "url"
node {baseDir}/scripts/crawl4ai.js "url" --json
```
**Script options:**
- `--json` — Full JSON response
**Output:** Clean markdown from the page.
## Configuration
**Required environment variable:**
- `CRAWL4AI_URL` — Your Crawl4AI instance URL (e.g., `http://localhost:11235`)
**Optional:**
- `CRAWL4AI_KEY` — API key if your instance requires authentication
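One way to read these two variables is sketched below. `loadConfig` is a hypothetical helper, not necessarily how `crawl4ai.js` does it; it fails early when `CRAWL4AI_URL` is missing or malformed and treats `CRAWL4AI_KEY` as optional.

```javascript
// Read the Crawl4AI settings from an environment-like object.
// Hypothetical helper; only the two variable names come from this document.
function loadConfig(env) {
  const rawUrl = env.CRAWL4AI_URL;
  if (!rawUrl) {
    throw new Error("CRAWL4AI_URL is required, e.g. http://localhost:11235");
  }

  // `new URL` throws a TypeError on malformed input, which surfaces typos early.
  const parsed = new URL(rawUrl);

  return {
    // Drop trailing slashes so paths can be appended with a single "/".
    baseUrl: parsed.origin + parsed.pathname.replace(/\/+$/, ""),
    // An unset or empty key means "no authentication".
    apiKey: env.CRAWL4AI_KEY || null,
  };
}
```

In a script this would typically be called as `loadConfig(process.env)`.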
## Features
- **JavaScript rendering** — Handles dynamic content
- **Unlimited usage** — Local instance, no API limits
- **Full content** — HTML, markdown, links, media, tables
- **Better than Tavily** for complex pages with JS
## API
Uses your local Crawl4AI instance REST API. Auth header only sent if `CRAWL4AI_KEY` is set.
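The conditional auth header can be sketched as follows. This is an illustration under stated assumptions rather than the script's actual code: the `/crawl` path, the `{ urls: [...] }` request body, and the `Authorization: Bearer` scheme are assumed here and should be checked against your Crawl4AI instance. `buildCrawlRequest` and `crawl` are hypothetical names.

```javascript
// Build the request for one page. The auth header is added only when a key is given.
// Assumptions: POST {baseUrl}/crawl, JSON body { urls: [...] }, Bearer token auth.
function buildCrawlRequest(baseUrl, targetUrl, apiKey) {
  const headers = { "Content-Type": "application/json" };
  if (apiKey) {
    headers.Authorization = `Bearer ${apiKey}`;
  }
  return {
    url: `${baseUrl}/crawl`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({ urls: [targetUrl] }),
    },
  };
}

// Send the request with the global fetch (available in Node 18 and later).
async function crawl(baseUrl, targetUrl, apiKey) {
  const { url, init } = buildCrawlRequest(baseUrl, targetUrl, apiKey);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Crawl4AI request failed: HTTP ${response.status}`);
  }
  return response.json();
}
```

Keeping request construction separate from the network call makes the header logic easy to check without a running instance.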