desearch-crawl

TotalClaw, by totalclaw

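Crawl and extract content from any webpage URL. Returns the page content as plain text or raw HTML. Use this when you need to read the full content of a specific webpage.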

Installation / Download

TotalClaw CLI (recommended):
totalclaw install totalclaw:totalclaw~okradze-desearch-crawl
cURL (direct download, no login required):
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~okradze-desearch-crawl/file -o okradze-desearch-crawl.md

# Crawl a Webpage with Desearch

Extract content from any webpage URL. Returns clean text or raw HTML.

## Quick Start

1. Get an API key from https://console.desearch.ai
2. Set environment variable: `export DESEARCH_API_KEY='your-key-here'`
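Step 2 exports the key only for the current shell session. A wrapper script can check that the variable is set before it calls the CLI, so a missing key fails locally instead of coming back as the 401 error shown under Errors. The following is a minimal sketch. The function name `require_desearch_key` is hypothetical and is not part of the skill.

```bash
# Hypothetical helper (not part of the skill): fail early when the API key
# from the Quick Start is missing or empty.
require_desearch_key() {
  if [ -z "${DESEARCH_API_KEY:-}" ]; then
    echo "DESEARCH_API_KEY is not set; get a key from https://console.desearch.ai" >&2
    return 1
  fi
}

# Usage:
# require_desearch_key && scripts/desearch.py crawl "https://example.com"
```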

## Usage

```bash
# Crawl a webpage (returns clean text by default)
scripts/desearch.py crawl "https://en.wikipedia.org/wiki/Artificial_intelligence"

# Get raw HTML
scripts/desearch.py crawl "https://example.com" --crawl-format html
```
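To crawl several pages in one pass, you can loop over URLs and redirect the text output of each one to its own file. The following is a sketch only. `url_to_filename` and `crawl_to_files` are hypothetical helpers, and the `${var//pattern/repl}` expansion requires bash rather than plain `sh`.

```bash
# Hypothetical helper: turn a URL into a filesystem-safe file name.
url_to_filename() {
  local name="${1#*://}"               # drop the scheme (https://, http://)
  name="${name//[^A-Za-z0-9._-]/_}"    # replace characters that are awkward in file names
  printf '%s.%s\n' "$name" "${2:-txt}" # extension defaults to txt
}

# Hypothetical helper: crawl each URL with the documented CLI and save the
# clean-text output into the current directory.
crawl_to_files() {
  local url out
  for url in "$@"; do
    out="$(url_to_filename "$url")"
    if ! scripts/desearch.py crawl "$url" > "$out"; then
      echo "crawl failed: $url" >&2
      rm -f "$out"   # don't leave an empty or partial file behind
    fi
  done
}

# Usage:
# crawl_to_files "https://docs.python.org/3/tutorial/index.html" "https://example.com/page"
```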


## Options

| Option | Description |
|--------|-------------|
| `--crawl-format` | Output content format: `text` (default) or `html` |
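The table lists only two accepted values for `--crawl-format`. If you build your own wrapper, you can reject other values before the request is made. The following is a minimal sketch. The wrapper name `desearch_crawl` and its exit code 2 are illustrative choices and are not defined by the skill.

```bash
# Hypothetical wrapper around the documented CLI: accept only the formats
# listed in the Options table and fail fast on anything else.
desearch_crawl() {
  local url="$1" format="${2:-text}"   # text is the documented default
  case "$format" in
    text|html) ;;
    *)
      echo "unsupported --crawl-format '$format' (expected text or html)" >&2
      return 2
      ;;
  esac
  scripts/desearch.py crawl "$url" --crawl-format "$format"
}

# Usage:
# desearch_crawl "https://example.com" html
```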

## Examples

### Read a documentation page
```bash
scripts/desearch.py crawl "https://docs.python.org/3/tutorial/index.html"
```

### Get raw HTML for analysis
```bash
scripts/desearch.py crawl "https://example.com/page" --crawl-format html
```

## Response

### Example (`format=text`, truncated, default)
```
Artificial intelligence (AI) is the capability of computational systems to perform tasks that typically require human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making...
```

### Example (`format=html`, truncated)
```html
<!DOCTYPE html>
<html>
  <head><title>Artificial intelligence - Wikipedia</title></head>
  <body>
    <p>Artificial intelligence (AI) is the capability of computational systems...</p>
  </body>
</html>
```
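When you request `html` for analysis, a quick line-based match can pull simple fields such as the page title out of the markup. The following is a rough sketch using a hypothetical `page_title` helper. It relies on `sed`, not an HTML parser, so it misses titles that span several lines or that use attributes on the `<title>` tag.

```bash
# Hypothetical helper: print the <title> text from the first matching line
# of HTML read on stdin. Line-based sed match, not a real HTML parser.
page_title() {
  sed -n 's:.*<title>\([^<]*\)</title>.*:\1:p' | head -n 1
}

# Usage:
# scripts/desearch.py crawl "https://example.com" --crawl-format html | page_title
```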

### Notes
- The response body is plain text or raw HTML, not JSON.
- Default format is `text`. Use `--crawl-format html` only when you need to inspect page structure.
- Prefer `text` format to avoid bloating the agent context with markup.
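Even in `text` format, a long page can still use up much of an agent's context. One option is to cap the output at a byte budget before passing it on. The following is a sketch. `cap_bytes` is a hypothetical helper, and its default of 20000 bytes is an arbitrary example, not a Desearch limit.

```bash
# Hypothetical helper: pass through at most N bytes of stdin (default 20000).
# head -c counts bytes, so a multi-byte UTF-8 character at the cut point may be split.
cap_bytes() {
  head -c "${1:-20000}"
}

# Usage:
# scripts/desearch.py crawl "https://docs.python.org/3/tutorial/index.html" | cap_bytes 8000
```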

### Errors
Status 401, Unauthorized (e.g., missing/invalid API key)
```json
{
  "detail": "Invalid or missing API key"
}
```

Status 402, Payment Required (e.g., balance depleted)
```json
{
  "detail": "Insufficient balance, please add funds to your account to continue using the service."
}
```
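If you receive one of the error bodies above, for example when calling the API directly, you can extract the `detail` message and map the status code to a next step. The following is a sketch under two assumptions: the body has the single-line `"detail": "..."` shape shown above, and the status code is available to your script. `error_detail` and `error_hint` are hypothetical helpers.

```bash
# Hypothetical helper: extract the "detail" string from an error body on stdin.
# Uses sed, not a JSON parser, so it assumes "detail" is a plain string on one line.
error_detail() {
  sed -n 's/.*"detail"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: suggest a next step for the documented status codes.
error_hint() {
  case "$1" in
    401) echo "Check that DESEARCH_API_KEY is set and valid." ;;
    402) echo "Add funds to your Desearch account." ;;
    *)   echo "Unexpected status $1." ;;
  esac
}
```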

## Resources
- [API Reference](https://desearch.ai/docs/api-reference/get-web-crawl)
- [Desearch Console](https://console.desearch.ai)
