virtual-desktop-pro

GitHub · Author: LeoYeAI/openclaw-master-skills · v4.0.0

Persistent authenticated browser for OpenClaw via kasmweb/chrome Docker sidecar. Principal logs in once via noVNC — sessions saved permanently in Docker volume. Agent navigates any website, clicks, fills forms, extracts data, uploads files, takes screenshots, solves CAPTCHAs autonomously, and analyses pages with Claude Vision. Use when the task requires a real authenticated browser, not a static fetch.

Installation / Download

TotalClaw CLI (recommended):
totalclaw install github:LeoYeAI~openclaw-master-skills~virtual-desktop-pro
cURL (direct download, no login required):
curl -fsSL https://skills.taituai.com/api/skills/github%3ALeoYeAI~openclaw-master-skills~virtual-desktop-pro/file -o virtual-desktop-pro.md

# Virtual Desktop — Authenticated Browser Layer

## What this skill does

Gives the agent a persistent authenticated browser (kasmweb/chrome) running
as a Docker sidecar. The principal logs in once via noVNC, and the sessions persist in a Docker volume.

| Capability | What it means |
|---|---|
| **ANALYZE** | Read any page, extract structured data, monitor changes over time |
| **PLAN** | Map the UI, identify selectors, prepare multi-step action sequences |
| **EXECUTE** | Click, type, fill forms, submit, upload, download, navigate any flow |
| **SELF-CORRECT** | Screenshot error state, identify root cause, retry with alternate approach |
| **IMPROVE** | Write UI patterns and selector maps to `.learnings/` after every session |
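
The SELF-CORRECT row describes a retry loop that tries an alternate approach when the first one fails. Here is a minimal sketch of that idea. It assumes a hypothetical `attempt(selector)` callable that performs one browser action and raises on failure. It is an illustration only and does not reproduce the logic of `browser_control.py`:

```python
from typing import Callable, Dict, Sequence


class AllSelectorsFailed(Exception):
    """Raised when every candidate selector failed; keeps each error for ERRORS.md."""

    def __init__(self, errors: Dict[str, str]) -> None:
        super().__init__(f"{len(errors)} selector(s) failed")
        self.errors = errors


def act_with_fallback(
    attempt: Callable[[str], None],
    selectors: Sequence[str],
    on_error: Callable[[str, Exception], None] = lambda sel, exc: None,
) -> str:
    """Try each selector in order and return the first one that succeeds.

    `attempt` performs one action (click, fill, ...) and raises on failure.
    `on_error` is the hook where an agent would screenshot the error state.
    """
    errors: Dict[str, str] = {}
    for selector in selectors:
        try:
            attempt(selector)
            return selector  # worked: this is the selector worth recording
        except Exception as exc:  # record the cause, then try the next candidate
            errors[selector] = repr(exc)
            on_error(selector, exc)
    raise AllSelectorsFailed(errors)
```

A selector that only worked as a fallback is the kind of entry the IMPROVE row suggests writing to `.learnings/`. The collected `errors` map is the kind of entry that belongs in `ERRORS.md`.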

Use cases: Google Workspace · social platforms · admin dashboards · e-commerce ·
forms · market research · data extraction · any platform with or without an API

---

## Workspace Structure

```
/workspace/
├── screenshots/          ← visual proof of every action (auto-created)
├── logs/browser/         ← full tracebacks (auto-created)
├── tasks/lessons.md      ← immediate task capture during mission
├── AUDIT.md              ← append-only action log
├── memory/YYYY-MM-DD.md  ← daily session summary
└── .learnings/
    ├── ERRORS.md         ← errors, broken selectors, ref maps
    └── LEARNINGS.md      ← patterns, timing, navigation per platform
```
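
The layout above can be created idempotently, and `AUDIT.md` kept append-only, with a few standard-library calls. This is a sketch built on hypothetical helpers (`ensure_workspace`, `audit`, `daily_memory_path`). The audit line format is an assumption. The setup script itself uses `mkdir -p` and `touch` in step 7:

```python
from datetime import date, datetime, timezone
from pathlib import Path

# Layout from the tree above (relative to the workspace root).
DIRS = ["screenshots", "logs/browser", "tasks", "memory", ".learnings"]
FILES = ["AUDIT.md", "tasks/lessons.md", ".learnings/ERRORS.md", ".learnings/LEARNINGS.md"]


def ensure_workspace(root: Path) -> None:
    """Create the directories and empty log files; safe to run repeatedly."""
    for d in DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    for f in FILES:
        (root / f).touch(exist_ok=True)


def audit(root: Path, action: str, detail: str) -> str:
    """Append one timestamped line to AUDIT.md; earlier lines are never rewritten."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    line = f"- {stamp} | {action} | {detail}\n"
    with open(root / "AUDIT.md", "a", encoding="utf-8") as fh:  # "a" opens in append mode
        fh.write(line)
    return line


def daily_memory_path(root: Path) -> Path:
    """Return memory/YYYY-MM-DD.md for today's session summary."""
    return root / "memory" / f"{date.today().isoformat()}.md"
```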


## When to Use

Use this skill when the task requires a **real authenticated browser**:

- Pages requiring login (Google, social networks, dashboards, admin panels)
- JS-rendered pages where static fetch returns nothing useful
- Multi-step flows: forms, checkouts, confirmations, file uploads
- Platforms without an API
- Screenshots or visual evidence of a page state
- CAPTCHA-protected pages

**Prefer a lighter path first** — if a simple HTTP request or existing OpenClaw
tool can answer the question, use that instead. This skill uses more tokens and
resources than plain fetch.
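
One way to apply this rule is to try a plain fetch first and escalate only when the response looks like a login wall or an empty JavaScript shell. The sketch below is a rough heuristic built only on the standard library. The function name, the 200-character threshold, and the password-field check are assumptions and are not part of the skill:

```python
import re
from html.parser import HTMLParser
from typing import List


class _PageScan(HTMLParser):
    """Collects visible text and notes whether a password input is present."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self.has_password_field = False
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.has_password_field = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:  # ignore script/style bodies
            self.chunks.append(data)


def needs_real_browser(html: str, min_text_chars: int = 200) -> bool:
    """Escalate if the page is a login form or has too little static text (JS-rendered)."""
    scan = _PageScan()
    scan.feed(html)
    text = re.sub(r"\s+", " ", " ".join(scan.chunks)).strip()
    return scan.has_password_field or len(text) < min_text_chars
```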

---

## Architecture

This skill runs a persistent **kasmweb/chrome** Docker sidecar alongside OpenClaw.
Principal logs in once via noVNC (port 6901). Sessions saved permanently in a Docker volume.

Three execution paths — load only what the task needs:

| Path | When to use | File |
|---|---|---|
| **OpenClaw native browser** | Simple navigate/click/extract — fastest, fewest tokens | Built-in |
| **browser_control.py** | AUDIT logging, workflows, CAPTCHA, Vision | `browser_control.py` |
| **noVNC (manual)** | Initial login, 2FA, session renewal | Port 6901 |

**Load only the smallest path needed.** Simple navigation → OpenClaw native.
Complex multi-step with logging → browser_control.py.
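
The routing rule can be written as a small decision function. This sketch only encodes the table above. The `Task` fields and the `ExecPath` names are hypothetical and are not an OpenClaw API:

```python
from dataclasses import dataclass
from enum import Enum


class ExecPath(Enum):
    NATIVE = "OpenClaw native browser"
    CONTROL_SCRIPT = "browser_control.py"
    NOVNC = "noVNC (manual)"


@dataclass
class Task:
    needs_human_login: bool = False  # initial login, 2FA, session renewal
    needs_audit_log: bool = False
    multi_step: bool = False
    has_captcha: bool = False
    needs_vision: bool = False


def choose_path(task: Task) -> ExecPath:
    """Pick the smallest execution path that can handle the task."""
    if task.needs_human_login:  # only the principal can do this, via noVNC
        return ExecPath.NOVNC
    if task.needs_audit_log or task.multi_step or task.has_captcha or task.needs_vision:
        return ExecPath.CONTROL_SCRIPT
    return ExecPath.NATIVE  # simple navigate/click/extract
```

Checking the human-login case first reflects that no automated path can complete a 2FA prompt or a session renewal.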

---

## Setup — Run Once

```bash
OPENCLAW_DIR="${OPENCLAW_DIR:-$(pwd)}"
cd "$OPENCLAW_DIR"
CONTAINER="${OPENCLAW_CONTAINER:-$(docker ps --format '{{.Names}}' | grep openclaw | head -1)}"

# 1. Add kasmweb/chrome to docker-compose.yml
#    VNC_PW stays as ${VNC_PW} so Compose reads it from .env (step 2): one password, one place.
#    CDP (9222) is published on localhost only, because it gives full control of the logged-in browser.
python3 -c "
import yaml
with open('docker-compose.yml') as f:
    data = yaml.safe_load(f) or {}
data.setdefault('services', {})['browser'] = {
    'image': 'kasmweb/chrome:1.15.0',
    'container_name': 'browser',
    'restart': 'unless-stopped',
    'shm_size': '1gb',
    'ports': ['6901:6901', '127.0.0.1:9222:9222'],
    'environment': [
        'VNC_PW=\${VNC_PW}',
        'RESOLUTION=1920x1080',
        'CHROME_ARGS=--remote-debugging-port=9222 --remote-debugging-address=0.0.0.0 --no-sandbox --disable-blink-features=AutomationControlled --disable-infobars'
    ],
    'volumes': ['browser-profile:/home/kasm-user/chrome-profile'],
    'networks': list((data.get('networks') or {'default': None}).keys())
}
data['volumes'] = data.get('volumes') or {}
data['volumes']['browser-profile'] = None
with open('docker-compose.yml', 'w') as f:
    yaml.dump(data, f, default_flow_style=False, allow_unicode=True, sort_keys=False)
print('docker-compose.yml updated')
"

# 2. Update .env (Compose substitutes VNC_PW from this file)
# VNC_PW: reuse an exported value if present, otherwise generate a strong random password
if ! grep -q "^VNC_PW=" .env 2>/dev/null; then
  VNC_GENERATED="${VNC_PW:-}"
  if [ -z "$VNC_GENERATED" ]; then
    VNC_GENERATED=$(python3 -c "import secrets, string; print(''.join(secrets.choice(string.ascii_letters + string.digits) for _ in range(24)))")
  fi
  echo "VNC_PW=${VNC_GENERATED}" >> .env
  echo "✅ VNC_PW written to .env, save this: ${VNC_GENERATED}"
fi
grep -q "BROWSER_CDP_URL"     .env || echo "BROWSER_CDP_URL=http://browser:9222" >> .env
grep -q "CAPSOLVER_API_KEY"   .env || echo "CAPSOLVER_API_KEY="                  >> .env
grep -q "BROWSERBASE_API_KEY" .env || echo "BROWSERBASE_API_KEY="                >> .env

# 3. Update openclaw.json — hot reload, no restart needed
python3 -c "
import json, os
f = 'data/.openclaw/openclaw.json'
with open(f) as fp: cfg = json.load(fp)
cfg.setdefault('browser', {}).update({'enabled': True, 'headless': False,
    'noSandbox': True, 'defaultProfile': 'chrome-sidecar'})
profiles = cfg['browser'].setdefault('profiles', {})
profiles['chrome-sidecar'] = {'cdpUrl': 'http://browser:9222', 'color': '#4285F4'}
bb_key = os.environ.get('BROWSERBASE_API_KEY', '')
if bb_key:
    profiles['browserbase'] = {'cdpUrl': f'wss://connect.browserbase.com?apiKey={bb_key}', 'color': '#F97316'}
with open(f, 'w') as fp: json.dump(cfg, fp, indent=2)
print('openclaw.json updated — hot reload active')
"

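# 4. Start browser container only; OpenClaw keeps running
docker compose up -d --no-deps browser
# Poll the DevTools endpoint (up to ~60 s) instead of relying on a fixed sleep
for _ in $(seq 1 30); do
  curl -sf http://localhost:9222/json/version > /dev/null && break
  sleep 2
done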

# 5. Install Python dependencies
docker exec "$CONTAINER" pip install requests playwright --break-system-packages -q
docker exec "$CONTAINER" node /app/node_modules/playwright-core/cli.js install chromium
echo "✅ Python dependencies installed"

# 6. Download CapSolver extension (optional — only if key present)
CAPSOLVER_KEY=$(grep '^CAPSOLVER_API_KEY=' .env | cut -d= -f2-)
if [ -n "$CAPSOLVER_KEY" ]; then
  docker exec "$CONTAINER" bash -c "
  apt-get update -qq && apt-get install -y -qq unzip curl
  curl -sL https://github.com/capsolver/capsolver-browser-extension/releases/latest/download/chrome.zip \
    -o /tmp/capsolver.zip
  unzip -oq /tmp/capsolver.zip -d /data/.openclaw/capsolver-extension
  sed -i \"s/apiKey: \\\"\\\"/apiKey: \\\"$CAPSOLVER_KEY\\\"/\" \
    /data/.openclaw/capsolver-extension/assets/config.js 2>/dev/null
  "
  echo "✅ CapSolver extension configured"
fi

# 7. Create workspace directories and deploy browser_control.py
docker exec "$CONTAINER" bash -c "
mkdir -p /data/.openclaw/workspace/skills/virtual-desktop
mkdir -p /workspace/screenshots /workspace/logs/browser /workspace/tasks /workspace/.learnings /workspace/memory
touch /workspace/AUDIT.md /workspace/tasks/lessons.md /workspace/.learnings/ERRORS.md /workspace/.learnings/LEARNINGS.md
"
docker cp {baseDir}/browser_control.py \
  "$CONTAINER":/data/.openclaw/workspace/skills/virtual-desktop/browser_control.py
echo "✅ browser_control.py deployed"

# 8. Verify
docker ps | grep -E "openclaw|browser"
curl -s http://localhost:9222/json > /dev/null && echo "✅ Chrome CDP active" || echo "⏳ Chrome starting"
docker exec "$CONTAINER" \
  python3 /data/.openclaw/workspace/skills/virtual-desktop/browser_control.py status

# 9. Notify principal
VPS_IP=$(curl -s ifconfig.me 2>/dev/null || echo "YOUR_VPS_IP")
echo "Virtual Desktop ready — https://${VPS_IP}:6901"
echo "Log in to your platforms via noVNC then reply DONE."
```
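
Step 8 checks CDP with `curl`. The same readiness check can be done from Python with only the standard library. Chrome's DevTools HTTP endpoint `/json/version` returns a JSON object that includes `Browser` and `webSocketDebuggerUrl`. The helper names here are hypothetical. The default URL assumes the check runs on the host, as in step 8:

```python
import json
import urllib.request
from typing import Optional, Tuple


def parse_cdp_version(payload: str) -> Tuple[str, str]:
    """Extract (browser version, browser-level WebSocket URL) from /json/version."""
    info = json.loads(payload)
    return info.get("Browser", "unknown"), info["webSocketDebuggerUrl"]


def check_cdp(base_url: str = "http://localhost:9222", timeout: float = 3.0) -> Optional[Tuple[str, str]]:
    """Return the parsed version info, or None if Chrome is not reachable yet."""
    url = base_url.rstrip("/") + "/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_cdp_version(resp.read().decode("utf-8"))
    except (OSError, ValueError, KeyError):  # URLError is an OSError; bad JSON is a ValueError
        return None
```

Usage: `print(check_cdp() or "⏳ Chrome starting")`. Returning `None` instead of raising keeps the helper usable inside a polling loop.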

---

## Initial Login — Once Per Platform

```
https://YOUR_VPS_IP:6901   login: kasm_user   password: your VNC_PW
```
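
The password above, like the CapSolver key, lives in `.env`. Below is a sketch of reading those values back. It uses a hypothetical `read_env` helper that handles only simple `KEY=VALUE` lines. It splits on the first `=` so that values containing `=` survive intact. Unlike Compose's own parser, it does not handle `export` prefixes, escapes, or multi-line values:

```python
from pathlib import Path
from typing import Dict


def read_env(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

Usage: `read_env()["VNC_PW"]` returns the noVNC password to enter here.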

Open Chrome via noVNC and log in to every platform you want the agent to access.
Sessions are saved in the Docker volume `browser-profile` and survive container restarts. If a platform expires a session, log in again via noVNC.

**Step by step — do this once after setup:**

```
1. Open https://YOUR_VPS_IP:6901 in your browser
2. Log in as kasm_user with your VNC_PW value from .env as the password
3. Chrome Desktop opens inside the browser

4. Log in to Google (accounts.google.com)
   → Email + password + 2FA if required
   → "Trust this device" → YES
   → This unlocks: Gmail, Drive, Calendar, Docs,
     Sheets, Google AI Studio, YouTube, all Google services

5. Log in to every other platform you want the agent to access:
   → Twitter/X        → twitter.com
   →