Scrapeless Scraping Browser Skill

TotalClaw, author: scrapelesshq, v1.0.0

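Cloud browser automation CLI for AI agents, powered by Scrapeless. Use it when the user needs to interact with websites through a cloud browser, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web applications, or automating any browser task with residential proxies and anti-detection features. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "use a proxy", "bypass detection", or any task that requires cloud browser automation.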

Installation / download

TotalClaw CLI (recommended)
totalclaw install totalclaw:scrapelesshq~scrapeless-scraping-browser-skill
cURL (direct download, no login required)
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Ascrapelesshq~scrapeless-scraping-browser-skill/file -o scrapeless-scraping-browser-skill.md
Git (fetch the source from the repository)
git clone https://github.com/openclaw/skills && git -C skills checkout f4147f4f21aa33946a2023067b2aec2eef9ca58c

## Original text

# Cloud Browser Automation with scrapeless-browser

## Important: Session Management with --session-id

**All browser operation commands support the `--session-id` parameter to specify which Scrapeless session to use.**

### Recommended Workflow

```bash
# Step 1: Create a session and save the session ID
SESSION_ID=$(scrapeless-scraping-browser new-session --name "workflow" --ttl 1800 --json | jq -r '.taskId')

# Step 2: Use the session ID for all operations
scrapeless-scraping-browser --session-id $SESSION_ID open https://example.com
scrapeless-scraping-browser --session-id $SESSION_ID snapshot -i
scrapeless-scraping-browser --session-id $SESSION_ID click @e1

# Step 3: Close when done
scrapeless-scraping-browser --session-id $SESSION_ID close
```
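
One failure mode worth guarding against in Step 1: `jq -r '.taskId'` prints the literal string `null` when the field is absent, so `SESSION_ID` can end up non-empty but unusable. The following is a minimal sketch of a check; `require_session_id` is a hypothetical helper name, not part of the CLI.

```bash
# Hypothetical helper (not part of the CLI): reject an empty or "null" session ID.
# jq -r prints the literal string "null" when the requested field is missing.
require_session_id() {
  case "${1:-}" in
    '' | null)
      echo "no usable session ID returned by new-session" >&2
      return 1
      ;;
  esac
  return 0
}
```

Calling `require_session_id "$SESSION_ID" || exit 1` right after Step 1 would stop a script before any browser command runs against a missing session.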

### Automatic Session Management

If you don't specify `--session-id`:
1. The CLI will query for running sessions
2. If a running session exists, it will use the latest one
3. If no running session exists, it will create a new one automatically

**For production workflows, always use `--session-id` to ensure consistency.**
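
One way to make the explicit session ID hard to forget is a small wrapper that refuses to run without it. This is only a sketch: `sb` is a hypothetical function name, and `SB_CLI` is an illustrative override variable for the binary name, neither of which comes from the CLI itself.

```bash
# SB_CLI is an illustrative override; it defaults to the real binary name.
SB_CLI="${SB_CLI:-scrapeless-scraping-browser}"

# Hypothetical wrapper: always passes --session-id, and fails fast if
# SESSION_ID is unset so automatic session selection is never triggered.
sb() {
  if [ -z "${SESSION_ID:-}" ]; then
    echo "sb: SESSION_ID is not set; create a session with new-session first" >&2
    return 1
  fi
  "$SB_CLI" --session-id "$SESSION_ID" "$@"
}
```

With this in place, `sb open https://example.com` and `sb snapshot -i` always target the same session.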

## Authentication Setup

Before using scrapeless-browser, you MUST set up authentication:

```bash
# Method 1: Config file (recommended, persistent)
scrapeless-scraping-browser config set apiKey your_api_token_here

# Method 2: Environment variable
export SCRAPELESS_API_KEY=your_api_token_here

# Verify it's set
scrapeless-scraping-browser config get apiKey
```

Get your API token from https://app.scrapeless.com
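
A script can check that a token is available before it starts any browser work. The sketch below tries both methods shown above. It assumes that `config get apiKey` prints nothing, or exits non-zero, when no key is stored; that behavior is not confirmed by this document. `have_api_key` and `SB_CLI` are hypothetical names used for illustration.

```bash
# SB_CLI is an illustrative override; it defaults to the real binary name.
SB_CLI="${SB_CLI:-scrapeless-scraping-browser}"

# Hypothetical preflight check: succeed if a token is available from the
# environment variable or from the config file.
# Assumption: `config get apiKey` prints nothing (or fails) when no key is stored.
have_api_key() {
  local stored
  if [ -n "${SCRAPELESS_API_KEY:-}" ]; then
    return 0
  fi
  stored=$("$SB_CLI" config get apiKey 2>/dev/null) || return 1
  [ -n "$stored" ]
}
```

A typical call would be `have_api_key || { echo "set an API token first" >&2; exit 1; }`.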

## Session Management Behavior

The CLI manages Scrapeless sessions with the following behavior:

- **Session Creation**: The first command creates a new Scrapeless session.
- **Session Persistence**: A session remains active only while its connection is maintained.
- **Session Termination**: A session terminates automatically when its connection closes.
- **Reconnection Limitation**: A terminated session cannot be reconnected to.

**Important**: For multi-step workflows, consider using the TypeScript API to maintain persistent connections.
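
Because a terminated session cannot be reused, a script may want to confirm that a saved ID is still running before issuing more commands. The sketch below relies on the `sessions` command and assumes its plain output contains each running session's ID; the real output format is not described here. `session_is_running` and `SB_CLI` are hypothetical names.

```bash
# SB_CLI is an illustrative override; it defaults to the real binary name.
SB_CLI="${SB_CLI:-scrapeless-scraping-browser}"

# Hypothetical check: is the given session ID in the running-session list?
# Assumption: the output of `sessions` includes the ID of every running session.
# -F matches the ID literally; -w avoids matching "task-1" inside "task-10".
session_is_running() {
  [ -n "${1:-}" ] || return 1
  "$SB_CLI" sessions 2>/dev/null | grep -qwF -- "$1"
}
```

If `session_is_running "$SESSION_ID"` fails, the safe response is to create a new session, not to retry the old ID.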

## Core Workflow

Every browser automation follows this pattern:

1. **Create Session**: Create a session and save the session ID
2. **Navigate**: Use `--session-id` to navigate to URL
3. **Snapshot**: Get element refs with `--session-id`
4. **Interact**: Use refs to click, fill, select with `--session-id`
5. **Re-snapshot**: After navigation or DOM changes, get fresh refs

```bash
# Set API token first
scrapeless-scraping-browser config set apiKey your_token

# Create session
SESSION_ID=$(scrapeless-scraping-browser new-session --name "form-fill" --ttl 600 --json | jq -r '.taskId')

# Start automation with session ID
scrapeless-scraping-browser --session-id $SESSION_ID open https://example.com/form
scrapeless-scraping-browser --session-id $SESSION_ID snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

scrapeless-scraping-browser --session-id $SESSION_ID fill @e1 "user@example.com"
scrapeless-scraping-browser --session-id $SESSION_ID fill @e2 "password123"
scrapeless-scraping-browser --session-id $SESSION_ID click @e3
scrapeless-scraping-browser --session-id $SESSION_ID wait --load networkidle
scrapeless-scraping-browser --session-id $SESSION_ID snapshot -i  # Check result
```
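
When a script needs to see which refs a snapshot produced, for example to compare them before and after a navigation, the refs can be pulled out of the snapshot text. This sketch assumes refs always have the form `@e` followed by digits, as in the sample output above; `list_refs` is a hypothetical helper name.

```bash
# Hypothetical helper: read snapshot text on stdin and print each @eN ref once,
# in order of first appearance.
# Assumption: refs always look like "@e" followed by one or more digits.
list_refs() {
  grep -o '@e[0-9][0-9]*' | awk '!seen[$0]++'
}
```

It would be used as `scrapeless-scraping-browser --session-id $SESSION_ID snapshot -i | list_refs`. Text such as `user@example.com` is not picked up, because no digit follows the `@e`.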

## Command Chaining

Commands can be chained with `&&` in a single shell invocation:

```bash
# Chain open + wait + snapshot
scrapeless-scraping-browser open https://example.com && scrapeless-scraping-browser wait --load networkidle && scrapeless-scraping-browser snapshot -i

# Chain multiple interactions
scrapeless-scraping-browser fill @e1 "user@example.com" && scrapeless-scraping-browser fill @e2 "password123" && scrapeless-scraping-browser click @e3
```

**When to chain:** Use `&&` when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
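
Chaining also combines with an explicit session ID. The sketch below wraps the open, wait, and snapshot chain in a function. It assumes the CLI exits with a non-zero status when a step fails, which is what makes `&&` stop the chain. `open_and_snapshot` and `SB_CLI` are hypothetical names.

```bash
# SB_CLI is an illustrative override; it defaults to the real binary name.
SB_CLI="${SB_CLI:-scrapeless-scraping-browser}"

# Hypothetical wrapper: open a URL, wait for the network to go idle, then
# snapshot. Each step runs only if the previous one exited with status 0.
open_and_snapshot() {
  "$SB_CLI" --session-id "$SESSION_ID" open "$1" &&
    "$SB_CLI" --session-id "$SESSION_ID" wait --load networkidle &&
    "$SB_CLI" --session-id "$SESSION_ID" snapshot -i
}
```

Only the final snapshot output matters to the caller here, which is the case where chaining fits.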

## Essential Commands

**Note**: All commands below support the optional `--session-id <id>` parameter.

```bash
# Navigation & Session
scrapeless-scraping-browser new-session [options]              # Create new browser session
scrapeless-scraping-browser [--session-id <id>] open <url>      # Navigate to URL
scrapeless-scraping-browser [--session-id <id>] close           # Close browser session
scrapeless-scraping-browser sessions                           # List running sessions
scrapeless-scraping-browser stop <taskId>                      # Stop specific session
scrapeless-scraping-browser stop-all                           # Stop all sessions
```
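
Since `close` ends the session, scripts often want it to run even when an earlier step fails. A minimal sketch of that pattern follows; `run_then_close` and `SB_CLI` are hypothetical names, and `SESSION_ID` is assumed to hold an ID obtained from `new-session`.

```bash
# SB_CLI is an illustrative override; it defaults to the real binary name.
SB_CLI="${SB_CLI:-scrapeless-scraping-browser}"

# Hypothetical helper: run a command, then close the session whether or not
# the command succeeded, and return the command's original exit status.
run_then_close() {
  local rc
  "$@"
  rc=$?
  "$SB_CLI" --session-id "$SESSION_ID" close
  return "$rc"
}
```

For example, `run_then_close my_workflow` runs a user-defined `my_workflow` function and closes the session afterwards in both the success and failure cases.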

### Session Creation with Advanced Options

The `new-session` command supports extensive customization options:

```bash
# Basic session creation
scrapeless-scraping-browser new-session --name "my-session" --ttl 1800

# Session with proxy settings
scrapeless-scraping-browser new-session \
  --name "proxy-session" \
  --proxy-country US \
  --proxy-state CA \
  --proxy-city "Los Angeles" \
  --ttl 3600

# Session with custom browser configuration
scrapeless-scraping-browser new-session \
  --name "mobile-session" \
  --platform iOS \
  --screen-width 375 \
  --screen-height 812 \
  --user-agent "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)" \
  --timezone "America/Los_Angeles" \
  --languages "en,es"

# Session with recording enabled
scrapeless-scraping-browser new-session \
  --name "recorded-session" \
  --recording true \
  --ttl 7200
```

**Available Options:**
- `--name <name>`: Session name for identification
- `--ttl <seconds>`: Session timeout in seconds (default: 180)
- `--recording <true|false>`: Enable session recording
- `--proxy-country <code>`: Proxy country code (e.g., AU, US, GB, CN, JP)
- `--proxy-state <state>`: Proxy state/region (e.g., NSW, CA, NY, TX)
- `--proxy-city <city>`: Proxy city (e.g., sydney, newyork, london, tokyo)
- `--user-agent <ua>`: Custom user agent string
- `--platform <platform>`: Platform (Windows, macOS, Linux, iOS, Android)
- `--screen-width <px>`: Screen width in pixels (default: 1920)
- `--screen-height <px>`: Screen height in pixels (default: 1080)
- `--timezone <tz>`: Timezone (default: America/New_York)
- `--languages <langs>`: Comma-separated language codes (default: en)
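
The proxy options can be assembled conditionally, so that state and city are passed only when provided. The sketch below uses only flags from the list above plus `--json`, as in the earlier `new-session` examples. `new_proxy_session` and `SB_CLI` are hypothetical names.

```bash
# SB_CLI is an illustrative override; it defaults to the real binary name.
SB_CLI="${SB_CLI:-scrapeless-scraping-browser}"

# Hypothetical helper: create a proxied session, adding state and city only
# when they are given.
# Usage: new_proxy_session <name> <country> [state] [city]
new_proxy_session() {
  local name country state city
  name=$1
  country=$2
  state=${3:-}
  city=${4:-}
  # Rebuild the positional parameters as the CLI argument list.
  set -- new-session --name "$name" --proxy-country "$country"
  if [ -n "$state" ]; then set -- "$@" --proxy-state "$state"; fi
  if [ -n "$city" ]; then set -- "$@" --proxy-city "$city"; fi
  "$SB_CLI" "$@" --json
}
```

Building the argument list with `set --` keeps a multi-word value such as "Los Angeles" as a single argument without extra quoting tricks.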

```bash

# Snapshot
scrapeless-scraping-browser [--session-id <id>] snapshot -i             # Interactive elements with refs (recommended)
scrapeless-scraping-browser [--session-id <id>] snapshot -i -C          # Include cursor-interactive elements
scrapeless-scraping-browser [--session-id <id>] snapshot -s "#selector" # Scope to CSS selector

# Interaction (use @refs from snapshot)
scrapeless-scraping-browser [--session-id <id>] click @e1               # Click element
scrapeless-scraping-browser [--session-id <id>] fill @e2 "text"         # Clear and type text
scrapeless-scraping-browser [--session-id <id>] type @e2 "text"         # Type without clearing
scrapeless-scraping-browser [--session-id <id>] press Enter             # Press key
scrapeless-scraping-browser [--session-id <id>] scroll down 500         # Scroll page
scrapeless-scraping-browser [--session-id <id>] scroll down 500 --selector "div.content"  # Scroll within element

# Get information
scrapeless-scraping-browser [--session-id <id>] get text @e1            # Get element text
scrapeless-scraping-browser [--session-id <id>] get url                 # Get current URL
scrapeless-scraping-browser [--session-id <id>] get title               # Get page title
scrapeless-scraping-browser [--session-id <id>] screenshot              # Take screenshot
scrapeless-scraping-browser [--session-id <id>] screenshot --full       # Full page screenshot

# Wait
scrapeless-scraping-browser [--session-id <id>] wait @e1                # Wait for element
scrapeless