Scrapeless Scraping Browser Skill
由 Scrapeless 提供支持的用于 AI 代理的云浏览器自动化 CLI。当用户需要使用云浏览器与网站交互时使用,包括导航页面、填写表单、单击按钮、截屏、提取数据、测试 Web 应用程序或使用住宅代理和反检测功能自动执行任何浏览器任务。触发器包括“打开网站”、“填写表单”、“单击按钮”、“截取屏幕截图”、“从页面抓取数据”、“测试此 Web 应用程序”、“使用代理”、“绕过检测”或任何需要云浏览器自动化的任务的请求。
安装 / 下载方式
TotalClaw CLI推荐
totalclaw install totalclaw:scrapelesshq~scrapeless-scraping-browser-skillcURL直接下载,无需登录
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Ascrapelesshq~scrapeless-scraping-browser-skill/file -o scrapeless-scraping-browser-skill.mdGit 仓库获取源码
git clone https://github.com/openclaw/skills/commit/f4147f4f21aa33946a2023067b2aec2eef9ca58c## 概述(中文) 由 Scrapeless 提供支持的用于 AI 代理的云浏览器自动化 CLI。当用户需要使用云浏览器与网站交互时使用,包括导航页面、填写表单、单击按钮、截屏、提取数据、测试 Web 应用程序或使用住宅代理和反检测功能自动执行任何浏览器任务。触发器包括“打开网站”、“填写表单”、“单击按钮”、“截取屏幕截图”、“从页面抓取数据”、“测试此 Web 应用程序”、“使用代理”、“绕过检测”或任何需要云浏览器自动化的任务的请求。 ## 原文 # Cloud Browser Automation with scrapeless-browser ## Important: Session Management with --session-id **All browser operation commands support the `--session-id` parameter to specify which Scrapeless session to use.** ### Recommended Workflow ```bash # Step 1: Create a session and save the session ID SESSION_ID=$(scrapeless-scraping-browser new-session --name "workflow" --ttl 1800 --json | jq -r '.taskId') # Step 2: Use the session ID for all operations scrapeless-scraping-browser --session-id $SESSION_ID open https://example.com scrapeless-scraping-browser --session-id $SESSION_ID snapshot -i scrapeless-scraping-browser --session-id $SESSION_ID click @e1 # Step 3: Close when done scrapeless-scraping-browser --session-id $SESSION_ID close ``` ### Automatic Session Management If you don't specify `--session-id`: 1. The CLI will query for running sessions 2. If a running session exists, it will use the latest one 3. If no running session exists, it will create a new one automatically **For production workflows, always use `--session-id` to ensure consistency.** ## Authentication Setup Before using scrapeless-browser, you MUST set up authentication: ```bash # Method 1: Config file (recommended, persistent) scrapeless-scraping-browser config set apiKey your_api_token_here # Method 2: Environment variable export SCRAPELESS_API_KEY=your_api_token_here # Verify it's set scrapeless-scraping-browser config get apiKey ``` Get your API token from https://app.scrapeless.com ## Session Management Behavior The CLI manages Scrapeless sessions with the following behavior: - **Session Creation**: First command creates a new Scrapeless session - **Session Persistence**: Sessions remain active only while connection is maintained - **Session Termination**: Sessions automatically terminate when connection closes - **Reconnection Limitation**: Cannot reconnect to terminated sessions **Important**: For multi-step workflows, consider using the TypeScript API to maintain persistent connections. ## Core Workflow Every browser automation follows this pattern: 1. **Create Session**: Create a session and save the session ID 2. **Navigate**: Use `--session-id` to navigate to URL 3. **Snapshot**: Get element refs with `--session-id` 4. **Interact**: Use refs to click, fill, select with `--session-id` 5. **Re-snapshot**: After navigation or DOM changes, get fresh refs ```bash # Set API token first scrapeless-scraping-browser config set apiKey your_token # Create session SESSION_ID=$(scrapeless-scraping-browser new-session --name "form-fill" --ttl 600 --json | jq -r '.taskId') # Start automation with session ID scrapeless-scraping-browser --session-id $SESSION_ID open https://example.com/form scrapeless-scraping-browser --session-id $SESSION_ID snapshot -i # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit" scrapeless-scraping-browser --session-id $SESSION_ID fill @e1 "user@example.com" scrapeless-scraping-browser --session-id $SESSION_ID fill @e2 "password123" scrapeless-scraping-browser --session-id $SESSION_ID click @e3 scrapeless-scraping-browser --session-id $SESSION_ID wait --load networkidle scrapeless-scraping-browser --session-id $SESSION_ID snapshot -i # Check result ``` ## Command Chaining Commands can be chained with `&&` in a single shell invocation: ```bash # Chain open + wait + snapshot scrapeless-scraping-browser open https://example.com && scrapeless-scraping-browser wait --load networkidle && scrapeless-scraping-browser snapshot -i # Chain multiple interactions scrapeless-scraping-browser fill @e1 "user@example.com" && scrapeless-scraping-browser fill @e2 "password123" && scrapeless-scraping-browser click @e3 ``` **When to chain:** Use `&&` when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact). ## Essential Commands **Note**: All commands below support the optional `--session-id <id>` parameter. ```bash # Navigation & Session scrapeless-scraping-browser new-session [options] # Create new browser session scrapeless-scraping-browser [--session-id <id>] open <url> # Navigate to URL scrapeless-scraping-browser [--session-id <id>] close # Close browser session scrapeless-scraping-browser sessions # List running sessions scrapeless-scraping-browser stop <taskId> # Stop specific session scrapeless-scraping-browser stop-all # Stop all sessions ``` ### Session Creation with Advanced Options The `new-session` command supports extensive customization options: ```bash # Basic session creation scrapeless-scraping-browser new-session --name "my-session" --ttl 1800 # Session with proxy settings scrapeless-scraping-browser new-session \ --name "proxy-session" \ --proxy-country US \ --proxy-state CA \ --proxy-city "Los Angeles" \ --ttl 3600 # Session with custom browser configuration scrapeless-scraping-browser new-session \ --name "mobile-session" \ --platform iOS \ --screen-width 375 \ --screen-height 812 \ --user-agent "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)" \ --timezone "America/Los_Angeles" \ --languages "en,es" # Session with recording enabled scrapeless-scraping-browser new-session \ --name "recorded-session" \ --recording true \ --ttl 7200 ``` **Available Options:** - `--name <name>`: Session name for identification - `--ttl <seconds>`: Session timeout in seconds (default: 180) - `--recording <true|false>`: Enable session recording - `--proxy-country <code>`: Proxy country code (e.g., AU, US, GB, CN, JP) - `--proxy-state <state>`: Proxy state/region (e.g., NSW, CA, NY, TX) - `--proxy-city <city>`: Proxy city (e.g., sydney, newyork, london, tokyo) - `--user-agent <ua>`: Custom user agent string - `--platform <platform>`: Platform (Windows, macOS, Linux, iOS, Android) - `--screen-width <px>`: Screen width in pixels (default: 1920) - `--screen-height <px>`: Screen height in pixels (default: 1080) - `--timezone <tz>`: Timezone (default: America/New_York) - `--languages <langs>`: Comma-separated language codes (default: en) ```bash # Snapshot scrapeless-scraping-browser [--session-id <id>] snapshot -i # Interactive elements with refs (recommended) scrapeless-scraping-browser [--session-id <id>] snapshot -i -C # Include cursor-interactive elements scrapeless-scraping-browser [--session-id <id>] snapshot -s "#selector" # Scope to CSS selector # Interaction (use @refs from snapshot) scrapeless-scraping-browser [--session-id <id>] click @e1 # Click element scrapeless-scraping-browser [--session-id <id>] fill @e2 "text" # Clear and type text scrapeless-scraping-browser [--session-id <id>] type @e2 "text" # Type without clearing scrapeless-scraping-browser [--session-id <id>] press Enter # Press key scrapeless-scraping-browser [--session-id <id>] scroll down 500 # Scroll page scrapeless-scraping-browser [--session-id <id>] scroll down 500 --selector "div.content" # Scroll within element # Get information scrapeless-scraping-browser [--session-id <id>] get text @e1 # Get element text scrapeless-scraping-browser [--session-id <id>] get url # Get current URL scrapeless-scraping-browser [--session-id <id>] get title # Get page title scrapeless-scraping-browser [--session-id <id>] screenshot # Take screenshot scrapeless-scraping-browser [--session-id <id>] screenshot --full # Full page screenshot # Wait scrapeless-scraping-browser [--session-id <id>] wait @e1 # Wait for element scrapeless