semfind

TotalClaw author: totalclaw

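Semantic search over local text files using embeddings. Use it when grep/ripgrep cannot find relevant results because the exact wording is unknown, or when you want to search by meaning rather than by pattern: for example, searching logs for "deployment issue" when the actual text says "container build failed". Install with `pip install semfind`. Well suited to searching memory files, project docs, logs, and notes by meaning.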

Install / download

TotalClaw CLI (recommended):
totalclaw install totalclaw:totalclaw~paperboardofficial-semfind

cURL (direct download, no login required):
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~paperboardofficial-semfind/file -o paperboardofficial-semfind.md

## Original text

# semfind

Semantic grep for the terminal. Searches files by meaning using local embeddings (BAAI/bge-small-en-v1.5 + FAISS). No API keys needed.

## When to reach for semfind

1. `grep` or `ripgrep` returned no results or irrelevant results
2. You don't know the exact wording of what you're looking for
3. You want to search by concept/meaning rather than exact text

Do NOT use semfind when grep works — grep is instant and has zero overhead.

## Install

```bash
pip install semfind
```

First run downloads a ~65MB model (~10-30s). Subsequent runs use the cached model.

## Usage

```bash
# Basic search
semfind "deployment issue" logs.md

# Search multiple files, top 3 results
semfind "permission error" memory/*.md -k 3

# With context lines
semfind "database migration" notes.md -n 2

# Force re-index after file changes
semfind "query" file.md --reindex

# Minimum similarity threshold
semfind "auth bug" *.md -m 0.5
```

## Options

| Flag | Description | Default |
|------|-------------|---------|
| `-k, --top-k` | Number of results | 5 |
| `-n, --context` | Context lines before/after | 0 |
| `-m, --max-distance` | Minimum similarity score | none |
| `--reindex` | Force re-embed | false |
| `--no-cache` | Skip embedding cache | false |

## Output format

Grep-like with similarity scores:

```
file.md:9: [2026-01-15] Fixed docker build with missing env vars  (0.796)
file.md:3: [2026-01-17] Agent couldn't write to /var/log          (0.689)
```

Higher scores (closer to 1.0) mean stronger semantic match.
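
Because each result line ends with its score in parentheses, the output can be post-filtered with standard tools. The following is a minimal sketch, assuming the line format shown above stays stable; `filter_by_score` is a hypothetical helper name and not part of semfind.

```bash
# Hypothetical helper (not part of semfind): keep only result lines
# whose trailing "(score)" is greater than or equal to a threshold.
# Assumes every result line ends with the score in parentheses.
filter_by_score() {
  local threshold="$1"
  awk -v t="$threshold" '
    {
      # Find the parenthesised number at the end of the line, e.g. "(0.796)".
      if (match($0, /\([0-9.]+\)[ \t]*$/)) {
        score = substr($0, RSTART + 1, RLENGTH)  # drop the opening "("
        sub(/\).*$/, "", score)                  # drop ")" and anything after it
        if (score + 0 >= t + 0) print
      }
    }
  '
}
```

A possible use is to request more results than needed and keep only the strong ones, for example `semfind "deployment issue" logs.md -k 20 | filter_by_score 0.7`.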

## Resource usage

- ~250MB RAM while running, freed immediately on exit
- ~65MB model cached in `/tmp/fastembed_cache/`
- ~2s first query (model load), ~14ms cached queries
- Embedding cache in `~/.cache/semfind/`, auto-invalidates on file changes
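
To see how much disk space the two cache locations listed above actually occupy, something like the following could be used. This is only a sketch: `cache_sizes` is a hypothetical helper, and the paths passed to it are the ones named in this section, which may differ on other systems.

```bash
# Hypothetical helper: print the on-disk size of each directory that exists,
# and note the ones that do not (for example, before the first run).
cache_sizes() {
  local dir
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      du -sh "$dir"
    else
      echo "absent: $dir"
    fi
  done
}
```

For example: `cache_sizes /tmp/fastembed_cache "$HOME/.cache/semfind"`.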

## Workflow pattern

```bash
# Step 1: Try grep first
grep "deployment" memory/*.md

# Step 2: If grep fails, use semfind
semfind "something went wrong with the deployment" memory/*.md -k 5
```
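
The two steps above can be folded into a single shell function so that the model is only loaded when an exact match fails. This is a sketch under stated assumptions: `find_notes` is a hypothetical name, it uses only the `semfind` flags documented in this README, and it treats any non-zero exit from `grep` as "no match".

```bash
# Hypothetical wrapper: try an exact match first, fall back to semfind
# only when grep finds nothing.
# Usage: find_notes "exact pattern" "natural-language query" FILE...
find_notes() {
  local pattern="$1" query="$2"
  shift 2
  # -n prints line numbers, -H prints the file name, -F matches a fixed string.
  if grep -nHF -- "$pattern" "$@"; then
    return 0
  fi
  # grep printed nothing; only now pay the cost of loading the model.
  if command -v semfind >/dev/null 2>&1; then
    semfind "$query" "$@" -k 5
  else
    echo "semfind is not installed; run: pip install semfind" >&2
    return 1
  fi
}
```

For example: `find_notes "deployment" "something went wrong with the deployment" memory/*.md`.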