Research Library

ClawSkills 作者 Sage (for Jon Buckles) v0.1.0

Local-first multimedia research library for hardware projects. Capture code, CAD, PDFs, images. Search with material-type weighting. Project isolation with cross-references. Async extraction. Backup + restore.

源码 ↗

安装 / 下载方式

TotalClaw CLI推荐
totalclaw install clawskills:jonbuckles~jonbuckles-research-library
cURL直接下载,无需登录
curl -fsSL https://skills.taituai.com/api/skills/clawskills%3Ajonbuckles~jonbuckles-research-library/file -o jonbuckles-research-library.md
Git 仓库获取源码
git clone https://github.com/[user]/research-library
# Research Library Skill

A local-first multimedia research library for capturing, organizing, and searching hardware project knowledge.

## What It Does

- **Store documents** — Code, PDFs, CAD files, images, schematics
- **Extract automatically** — Text from PDFs, EXIF from images, functions from code
- **Search intelligently** — Full-text with material-type weighting (your work ranks higher than external research)
- **Project isolation** — Arduino separate from CNC; no contamination
- **Cross-reference** — Link knowledge: "this servo tuning applies to that project"
- **Async extraction** — Searches never block while OCR runs
- **Backup daily** — 30-day rolling snapshots

## Installation

```bash
clawhub install research-library
# OR
pip install /path/to/research-library
```

## Quick Start

```bash
# Initialize database
reslib status

# Add a project
reslib add ~/projects/arduino/servo.py --project arduino --material-type reference

# Search
reslib search "servo tuning"

# Link knowledge
reslib link 5 12 --type applies_to
```

## Features

### CLI Commands
- `reslib add` — Import documents (auto-detect + extract)
- `reslib search` — Full-text search with filters
- `reslib get` — View document details
- `reslib archive` / `reslib unarchive` — Manage documents
- `reslib export` — Export as JSON/Markdown
- `reslib link` — Create document relationships
- `reslib projects` — Manage projects
- `reslib tags` — Manage tags
- `reslib status` — System overview
- `reslib backup` / `reslib restore` — Snapshots
- `reslib smoke_test.sh` — Quick validation

### Technical
- **Storage:** SQLite 3.45+ with FTS5 virtual table
- **Extraction:** PDF (pdfplumber + OCR), images (EXIF + OCR), code (AST + regex)
- **Confidence Scoring:** 0.0-1.0 based on quality + source
- **Material Weighting:** Reference (1.0) vs Research (0.5)
- **Project Isolation:** Scoped searches, no contamination
- **Async Workers:** 2-4 configurable extraction workers
- **Catalog Separation:** real_world vs openclaw projects
- **Backup:** Daily snapshots, 30-day retention

## Configuration

Copy `reslib/config.json` and customize:

```json
{
  "db_path": "~/.openclaw/research/library.db",
  "num_workers": 2,
  "worker_timeout_sec": 300,
  "max_retries": 3,
  "backup_retention_days": 30,
  "backup_dir": "~/.openclaw/research/backups",
  "file_size_limit_mb": 200,
  "project_size_limit_gb": 2
}
```

## Integration with War Room

Use RL1 protocol in war room DNA:

```python
from reslib import ResearchDatabase, ResearchSearch

db = ResearchDatabase()
search = ResearchSearch(db)

# Before researching, check existing knowledge
prior = search.search("servo tuning", project="rc-quadcopter")
if prior:
    print(f"Found {len(prior)} prior items")
else:
    # New research needed...
    db.add_research(title="...", content="...", ...)
```

## Performance

All targets exceeded:

| Operation | Target | Actual |
|-----------|--------|--------|
| PDF extraction | <100ms | 20.6ms |
| Search (50 docs) | <100ms | 0.33ms |
| Worker throughput | >6/sec | 414.69/sec |

## Testing

```bash
# Run all tests
pytest tests/

# Quick smoke test
bash reslib/smoke_test.sh

# Performance tests
pytest tests/test_integration.py -v -k stress
```

## Known Limitations (Phase 2)

- OCR quality varies on hand-drawn sketches
- FTS5 designed for <10K documents (PostgreSQL path for scale)
- No automatic web research gathering (manual only)
- Vector embeddings ready but inactive
- CAD file parsing is metadata-only

## Documentation

See `/docs/`:
- `CLI-REFERENCE.md` — All commands + examples
- `EXTRACTION-GUIDE.md` — How extraction works
- `SEARCH-GUIDE.md` — Ranking + weighting
- `WORKER-GUIDE.md` — Async queue details
- `INTEGRATION.md` — War room RL1 protocol

## Phase 2 Roadmap

- Real-world PDF calibration
- FTS5 scaling tests (10K docs)
- Auto-detection (reference vs research)
- Web research enrichment
- Vector embeddings (semantic search)
- PostgreSQL upgrade path

## Building From Source

```bash
cd research-library
pip install -e .
pytest tests/
python -m reslib status
```

## Support

Issues? See TECHNICAL-NOTES.md for troubleshooting.

---

*Production-ready MVP. 214 tests passing. 15K lines. Ready to use.*