email-finder

TotalClaw (author: totalclaw)

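Find email addresses for a domain by combining website scraping, search dorking, pattern guessing, DNS analysis, and SMTP verification. Use it when prospecting for leads, finding contact emails for outreach, or building a contact list from a company domain. Optionally provide a person's name to generate and verify common email patterns.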

Installation / Download

TotalClaw CLI (recommended)

totalclaw install totalclaw:totalclaw~psyduckler-email-finder

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~psyduckler-email-finder/file -o psyduckler-email-finder.md

## Original text

# Email Finder

Discover email addresses associated with a domain using multiple methods.

## How It Works

1. **Website Scraping** — Fetches homepage, /contact, /about, /team pages and extracts emails via regex
2. **Search Dorking** — Searches for published emails in directories and search engines
3. **Pattern Guessing** — If a name is provided, generates common patterns (first@, first.last@, flast@, etc.)
4. **DNS Hints** — Checks MX/SPF/DMARC records to identify the email provider
5. **SMTP Verification** — Verifies found and guessed emails using RCPT TO, up to the `--max-smtp-checks` limit (default: 15)
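
To make the website scraping step concrete, here is a minimal sketch of regex-based extraction. The pattern and the `extract_emails` helper are illustrative assumptions, not the code in `scripts/find_emails.py`; the real script may use a different expression or filtering rule.

```python
import re

# Deliberately simple pattern (an assumption): local part, "@", domain labels, alphabetic TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str, domain: str) -> list[str]:
    """Return sorted, de-duplicated addresses found in `html` that belong to `domain`."""
    suffix = "@" + domain.lower()
    found = {match.lower() for match in EMAIL_RE.findall(html)}
    return sorted(email for email in found if email.endswith(suffix))


page = '<a href="mailto:Info@example.com">Contact</a> or sales@example.com, not x@other.org'
print(extract_emails(page, "example.com"))
# → ['info@example.com', 'sales@example.com']
```

Because the match runs on raw page source, addresses that are obfuscated or injected by JavaScript would not be found this way.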

## Dependencies

```bash
pip3 install dnspython
```
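
`dnspython` presumably backs the DNS hints step. The sketch below shows one way an MX lookup could be mapped to a provider name. The `guess_provider` and `lookup_mx` helpers and the suffix table are hypothetical; only `dns.resolver.resolve` is a real dnspython call.

```python
def guess_provider(mx_hosts: list[str]) -> str:
    """Map MX host names to a provider label using a small, illustrative suffix table."""
    hints = {
        "google.com": "Google Workspace",  # e.g. aspmx.l.google.com
        "outlook.com": "Microsoft 365",    # e.g. <tenant>.mail.protection.outlook.com
    }
    for host in mx_hosts:
        name = host.rstrip(".").lower()
        for suffix, provider in hints.items():
            if name == suffix or name.endswith("." + suffix):
                return provider
    return "unknown"  # fallback label is an assumption


def lookup_mx(domain: str) -> list[str]:
    """Return MX host names for `domain`, lowest preference value first."""
    import dns.resolver  # provided by the dnspython package

    answers = dns.resolver.resolve(domain, "MX")
    ranked = sorted(answers, key=lambda record: record.preference)
    return [record.exchange.to_text().rstrip(".") for record in ranked]


print(guess_provider(["aspmx.l.google.com"]))
```

Keeping the lookup separate from the mapping means the mapping can be checked without network access, for example `guess_provider(lookup_mx("example.com"))` on a live system.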

## Usage

### Basic domain search
```bash
python3 scripts/find_emails.py example.com
```

### With name for pattern guessing
```bash
python3 scripts/find_emails.py example.com --name "John Smith"
```
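
As a rough illustration of what `--name` might expand to, the sketch below builds candidate addresses from a full name. Only `first@`, `first.last@`, and `flast@` are named in this document; the remaining patterns and the `guess_patterns` helper itself are assumptions.

```python
def guess_patterns(full_name: str, domain: str) -> list[str]:
    """Generate candidate addresses for a person at `domain`."""
    parts = full_name.lower().split()
    if len(parts) < 2:
        raise ValueError("expected at least a first and a last name")
    first, last = parts[0], parts[-1]
    local_parts = [
        first,                # john@        (listed in this README)
        f"{first}.{last}",    # john.smith@  (listed in this README)
        f"{first[0]}{last}",  # jsmith@      (listed in this README)
        f"{first}{last}",     # johnsmith@   (assumed extra pattern)
        f"{first}_{last}",    # john_smith@  (assumed extra pattern)
        last,                 # smith@       (assumed extra pattern)
    ]
    return [f"{local}@{domain}" for local in local_parts]


for candidate in guess_patterns("John Smith", "example.com"):
    print(candidate)
```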

### Skip SMTP verification
```bash
python3 scripts/find_emails.py example.com --no-verify
```

### Options
- `--name "First Last"` — Enable pattern guessing for a specific person
- `--no-verify` — Skip SMTP verification step
- `--timeout SECONDS` — Connection timeout (default: 10)

## Output

JSON to stdout:

```json
{
  "domain": "example.com",
  "provider": "Google Workspace",
  "mx": ["aspmx.l.google.com"],
  "spf": "v=spf1 include:_spf.google.com ~all",
  "dmarc": "v=DMARC1; p=reject; rua=mailto:dmarc@example.com",
  "emails_found": 2,
  "emails": [
    {
      "email": "info@example.com",
      "source": "scraped",
      "deliverable": "yes",
      "smtp_detail": "2.1.5 OK"
    },
    {
      "email": "john.smith@example.com",
      "source": "guessed",
      "deliverable": "catch-all",
      "smtp_detail": "2.1.5 OK"
    }
  ]
}
```
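
A caller would typically parse this report and keep only the addresses it trusts. The following sketch assumes the field names shown above; `deliverable_emails` is a hypothetical helper, not part of the script.

```python
import json


def deliverable_emails(report: dict, include_catch_all: bool = False) -> list[str]:
    """Return addresses whose `deliverable` value is acceptable to the caller."""
    wanted = {"yes", "catch-all"} if include_catch_all else {"yes"}
    return [entry["email"] for entry in report["emails"] if entry["deliverable"] in wanted]


# Abbreviated copy of the sample report above.
sample = json.loads("""
{
  "domain": "example.com",
  "emails_found": 2,
  "emails": [
    {"email": "info@example.com", "source": "scraped", "deliverable": "yes"},
    {"email": "john.smith@example.com", "source": "guessed", "deliverable": "catch-all"}
  ]
}
""")

print(deliverable_emails(sample))
# → ['info@example.com']
```

In practice the report would come from the script's stdout, for instance by running it with `subprocess.run(..., capture_output=True, text=True)` and passing `stdout` to `json.loads`.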

### Source values

| Value | Meaning |
|-------|---------|
| `scraped` | Found on the domain's website |
| `searched` | Found via search/directory lookup |
| `guessed` | Generated from name patterns |
| `dns` | Found in DNS records (DMARC reports, etc.) |

### Deliverable values

| Value | Meaning |
|-------|---------|
| `yes` | Server accepted the recipient |
| `no` | Server rejected the recipient (invalid) |
| `catch-all` | Server accepts all addresses |
| `unknown` | Could not determine |
| `not_checked` | Verification was skipped |
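
This document does not say how the script tells `yes` from `catch-all`. One common approach, sketched here as an assumption, is to send a second RCPT TO for a random address that almost certainly does not exist: if the server accepts that too, acceptance of the real address proves nothing. `classify` and `rcpt_codes` are hypothetical names.

```python
import smtplib
import uuid
from typing import Optional


def classify(target_code: Optional[int], probe_code: Optional[int]) -> str:
    """Map the RCPT TO reply codes for the target and a random probe to a deliverable value."""
    if target_code is None:
        return "unknown"  # no SMTP conversation took place
    if 200 <= target_code < 300:
        if probe_code is not None and 200 <= probe_code < 300:
            return "catch-all"  # a made-up address was accepted as well
        return "yes"
    if 500 <= target_code < 600:
        return "no"  # permanent rejection
    return "unknown"  # e.g. 4xx temporary failures such as greylisting


def rcpt_codes(mx_host: str, email: str, timeout: float = 10.0) -> tuple[Optional[int], Optional[int]]:
    """Ask `mx_host` about `email` and a random probe address; no message is sent."""
    domain = email.rsplit("@", 1)[1]
    probe = f"{uuid.uuid4().hex}@{domain}"
    try:
        with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
            smtp.helo()
            smtp.mail("")  # null sender, i.e. MAIL FROM:<>
            target_code, _ = smtp.rcpt(email)
            probe_code, _ = smtp.rcpt(probe)
            return target_code, probe_code
    except (OSError, smtplib.SMTPException):
        return None, None


print(classify(250, 550), classify(250, 250), classify(550, 550))
# → yes catch-all no
```

A live check would combine the two, for example `classify(*rcpt_codes(mx_host, email))`. Some servers return 5xx for policy reasons rather than for an invalid mailbox, so a `no` from this sketch is a hint, not proof.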

## Rate Limiting

The script includes built-in rate limiting at every stage to protect your IP:

```bash
# Defaults: 0.5s between page fetches, 2s between SMTP checks, max 15 SMTP checks
python3 scripts/find_emails.py example.com --name "John Smith"

# Conservative settings for sensitive environments
python3 scripts/find_emails.py example.com --scrape-delay 1.0 --smtp-delay 4 --max-smtp-checks 8

# Scrape only, no SMTP traffic (no risk of mail-server blacklisting)
python3 scripts/find_emails.py example.com --no-verify
```

### Options
- `--scrape-delay SECONDS` — Pause between website page fetches (default: 0.5)
- `--smtp-delay SECONDS` — Pause between SMTP verification checks (default: 2.0)
- `--max-smtp-checks N` — Max SMTP verifications per run (default: 15). Remaining emails get `not_checked` status.
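
The interaction between `--smtp-delay` and `--max-smtp-checks` can be pictured as a capped, paced loop. The sketch below is one plausible shape for it, not the script's actual code; `verify_with_limits` and its injectable `check` and `sleep` parameters are assumptions made so the logic can be exercised without a mail server.

```python
import time
from typing import Callable


def verify_with_limits(
    emails: list[str],
    check: Callable[[str], str],
    smtp_delay: float = 2.0,
    max_checks: int = 15,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Check at most `max_checks` addresses, pausing `smtp_delay` seconds between checks."""
    results: dict[str, str] = {}
    for index, email in enumerate(emails):
        if index >= max_checks:
            results[email] = "not_checked"  # over the cap: skipped, as described above
            continue
        if index > 0:
            sleep(smtp_delay)  # pause before every check except the first
        results[email] = check(email)
    return results


demo = verify_with_limits(
    ["a@example.com", "b@example.com", "c@example.com"],
    check=lambda email: "yes",  # stand-in for a real SMTP check
    max_checks=2,
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(demo)
```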

### Why rate limiting matters

This tool hits both web servers and mail servers. Without rate limiting:
- **Web scraping** — Aggressive crawling gets your IP blocked by WAFs (Cloudflare, etc.) and makes you look like a bot. Respectful delays avoid this.
- **SMTP verification** — Mail servers flag IPs making rapid RCPT TO requests. Your IP can get blacklisted, affecting your ability to send real email.
- **Residential IPs are fragile** — Unlike datacenter IPs, your home/office IP is shared across all your internet activity. Getting it blacklisted affects everything.

### Guidelines for agents

| Scenario | Recommended approach |
|----------|---------------------|
| Single domain lookup | Defaults are fine |
| Domain + name pattern guessing | Defaults are fine (15 SMTP checks cover all patterns) |
| Multiple domains in sequence | Add a 5-10 s pause between domains. Don't run more than 20 domains/day |
| Just need the email provider | Use `--no-verify` and read the `provider` field; no SMTP connections are made |
| Bulk prospecting (50+ domains) | Use a paid service (Hunter.io, Apollo) or spread across multiple days |

**Key principle:** The script is designed for targeted lookups, not mass scraping. If you need to process hundreds of domains, use a dedicated service with proper IP reputation management.
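
For an agent working through a modest list of domains, the guidelines above could be applied roughly as follows. The `plan_days` and `pause_seconds` helpers are hypothetical, and picking a random pause inside the 5-10 s range is an assumption; a fixed pause in that range would satisfy the guideline equally well.

```python
import random


def plan_days(domains: list[str], per_day: int = 20) -> list[list[str]]:
    """Split `domains` into daily batches of at most `per_day` lookups."""
    return [domains[start:start + per_day] for start in range(0, len(domains), per_day)]


def pause_seconds() -> float:
    """Pick a pause between consecutive domains within the recommended 5-10 s range."""
    return random.uniform(5.0, 10.0)


batches = plan_days([f"company{n}.example" for n in range(45)])
print([len(batch) for batch in batches])
# → [20, 20, 5]
```

A runner would then process one batch per day, calling `time.sleep(pause_seconds())` between invocations of `scripts/find_emails.py`.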

## Limitations

- Website scraping depends on emails being visible in page source (won't find obfuscated/JS-rendered emails)
- Search engines may block automated queries
- SMTP verification requires outbound port 25 access
- Catch-all domains accept all addresses — can't confirm real inboxes
- Be respectful: the script adds delays between requests, but don't run it in tight loops
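
Since SMTP verification depends on outbound port 25, a quick reachability check before a run can save time on networks that block it. This is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper and is not part of the script.

```python
import socket


def can_connect(host: str, port: int = 25, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to `host:port` can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

If `can_connect(mx_host)` returns `False` for the MX host reported in the output, running with `--no-verify` avoids waiting on SMTP timeouts.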