crawl-for-ai

TotalClaw · Author: Ania · v1.0.1

Web scraping with a local Crawl4AI instance. Use it to fetch full page content with JavaScript rendering. Better than Tavily for complex pages. No usage limits.

Install / Download

TotalClaw CLI (recommended)
totalclaw install totalclaw:totalclaw~angusthefuzz-crawl-for-ai
cURL (direct download, no login required)
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~angusthefuzz-crawl-for-ai/file -o angusthefuzz-crawl-for-ai.md

# Crawl4AI Web Scraper

Local Crawl4AI instance for full web page extraction with JavaScript rendering.

## Endpoints

**Proxy (port 11234)** — Clean output, OpenWebUI-compatible
- Returns: `[{page_content, metadata}]`
- Use for: Simple content extraction

**Direct (port 11235)** — Full output with all data
- Returns: `{results: [{markdown, html, links, media, ...}]}`
- Use for: When you need links, media, or other metadata
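
Because the two endpoints return different shapes, a caller that may talk to either one needs a small normalization step. The sketch below is illustrative, not the skill's actual code: `extractMarkdown` is a hypothetical helper, and the fallback to `markdown.raw_markdown` is an assumption for instances that return the markdown field as an object rather than a string.

```javascript
// Hypothetical helper: turn either endpoint's response into an array of
// markdown strings. Response shapes follow the Endpoints section above.
function extractMarkdown(response) {
  // Proxy (port 11234): [{ page_content, metadata }]
  if (Array.isArray(response)) {
    return response.map((doc) => doc.page_content ?? "");
  }

  // Direct (port 11235): { results: [{ markdown, html, links, media, ... }] }
  if (response && Array.isArray(response.results)) {
    return response.results.map((result) => {
      if (typeof result.markdown === "string") return result.markdown;
      // Assumption: markdown may arrive as an object with a raw_markdown field.
      return result.markdown?.raw_markdown ?? "";
    });
  }

  throw new TypeError("Unrecognized Crawl4AI response shape");
}
```

Throwing on an unknown shape, rather than returning an empty array, makes a misconfigured port or URL visible immediately.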

## Usage

```bash
# Via script
node {baseDir}/scripts/crawl4ai.js "url"
node {baseDir}/scripts/crawl4ai.js "url" --json
```

**Script options:**
- `--json` — Full JSON response

**Output:** Clean markdown from the page.

## Configuration

**Required environment variable:**

- `CRAWL4AI_URL` — Your Crawl4AI instance URL (e.g., `http://localhost:11235`)

**Optional:**

- `CRAWL4AI_KEY` — API key if your instance requires authentication
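
As a rough illustration of how these variables might be read, here is a minimal sketch. `loadConfig` is a hypothetical name, and trimming trailing slashes from the URL is a convenience assumed here, not documented behavior of the script.

```javascript
// Hypothetical config loader: CRAWL4AI_URL is required, CRAWL4AI_KEY is optional.
function loadConfig(env = process.env) {
  // Strip trailing slashes so paths can be appended safely (assumed convenience).
  const baseUrl = (env.CRAWL4AI_URL || "").trim().replace(/\/+$/, "");
  if (!baseUrl) {
    throw new Error("CRAWL4AI_URL is required (e.g., http://localhost:11235)");
  }

  // An empty or missing key is treated as "no authentication".
  const apiKey = (env.CRAWL4AI_KEY || "").trim() || null;

  return { baseUrl, apiKey };
}
```

Passing `env` as a parameter keeps the function easy to test without touching the real process environment.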

## Features

- **JavaScript rendering** — Handles dynamic content
- **Unlimited usage** — Local instance, no API limits
- **Full content** — HTML, markdown, links, media, tables
- **Better than Tavily** for complex, JavaScript-heavy pages

## API

Uses your local Crawl4AI instance's REST API. The auth header is sent only if `CRAWL4AI_KEY` is set.
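
The conditional auth header could be built roughly as follows. This is a sketch under stated assumptions, not the script's actual implementation: `buildCrawlRequest` is a hypothetical helper, and the `/crawl` path, the `{ urls: [...] }` request body, and the `Bearer` scheme are all assumptions about the instance's API that should be checked against your Crawl4AI version.

```javascript
// Hypothetical request builder. Returns the URL and fetch options without
// sending anything, so the auth logic can be inspected in isolation.
function buildCrawlRequest(config, pageUrl) {
  const headers = { "Content-Type": "application/json" };

  // Only attach the auth header when a key is configured.
  // Assumption: the instance expects a Bearer token.
  if (config.apiKey) {
    headers["Authorization"] = `Bearer ${config.apiKey}`;
  }

  return {
    url: `${config.baseUrl}/crawl`, // assumed endpoint path
    options: {
      method: "POST",
      headers,
      body: JSON.stringify({ urls: [pageUrl] }), // assumed request body shape
    },
  };
}

// Usage sketch (Node 18+ provides a global fetch):
// const { url, options } = buildCrawlRequest(
//   { baseUrl: "http://localhost:11235", apiKey: null },
//   "https://example.com"
// );
// const data = await (await fetch(url, options)).json();
```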