phenosnap-phenotype-extractor

ClawSkills 作者 clawskills

Extract clinical phenotypes and medication entities from user-provided text using PhenoSnap, producing a timestamped JSON output.

安装 / 下载方式

TotalClaw CLI推荐
totalclaw install clawskills:clawskills~kaichop-phenoskill
cURL直接下载,无需登录
curl -fsSL https://skills.taituai.com/api/skills/clawskills%3Aclawskills~kaichop-phenoskill/file -o kaichop-phenoskill.md
## When to use
Use this skill when the user provides **their own**:
- clinical phenotypes / symptoms / diagnoses (free text, bullet lists, clinical note-like text), and/or
- drugs/medications (names, dosages, frequencies).

Examples that should trigger:
- “Symptoms: ataxia, seizures, developmental delay. Meds: levetiracetam 500 mg BID.”
- “I’m taking metformin 500mg daily and have fatigue, polyuria, blurry vision.”

## When NOT to use
Do **not** use this skill when:
- The user asks general questions (e.g., “What is HPO?”, “What is a phenotype?”, “What is GLP-1?”).
- The user provides text that is not personal clinical information (news articles, academic paragraphs, code, etc.).
- The user asks you to interpret *someone else’s* private clinical record (PHI) without clear permission.

## Safety & privacy
- Treat user input as potentially sensitive clinical information.
- **Do not upload** user text or extracted results anywhere (this skill is local-only).
- Before writing any input to disk, **redact obvious identifiers**:
  - emails, phone numbers, street addresses
  - MRN-like long numeric identifiers (e.g., 8+ digits)
  - names when clearly presented as “Name: …”
- If the message appears to include **highly identifying PHI** (e.g., name + DOB + address, or name + MRN), **pause and ask for confirmation** to proceed, recommending the user remove identifiers first.

## Requirements / setup
- Python: `python3` available on PATH.
- Network access (only for initial PhenoSnap download if missing).
- HPO OBO file:
  - Default expected path: `{baseDir}/resources/hp.obo`
  - Override path via environment variable: `HPO_OBO_PATH`
  - This skill **does not auto-download** `hp.obo`. You must supply it.

Recommended (best practice):
- Use a virtual environment (venv/conda) before running this skill, because it may install Python packages via `pip`.

## Inputs & outputs
- Input text file (redacted):
  - `{baseDir}/artifacts/phenosnap_inputs/input_<YYYYMMDD_HHMMSS>.txt`
- Output JSON file (timestamped):
  - `{baseDir}/artifacts/phenosnap_outputs/phenotypes_<YYYYMMDD_HHMMSS>.json`
- Third-party download cache:
  - `{baseDir}/third_party/phenosnap_main.zip`
  - `{baseDir}/third_party/get-pip.py`

## Detection heuristic (activation check)
Trigger if the user message contains **any** of:
- phenotype cues: `symptom(s)`, `phenotype(s)`, `Dx`, `diagnosis`, `PMH`, `Hx`, `history of`, or a symptom-like list
- medication cues: `meds`, `medications`, `taking`, `prescribed`, plus patterns like:
  - dosages: `\b\d+(\.\d+)?\s?(mg|mcg|g|ml|units)\b`
  - frequencies: `qd`, `q.d.`, `bid`, `b.i.d.`, `tid`, `t.i.d.`, `qhs`, `qAM`, `qPM`, `daily`, `weekly`

Do not trigger for purely informational questions without user-provided phenotype/medication content.

---

# Procedure

## 0) Create required directories
Create if missing:
- `{baseDir}/PhenoSnap/`
- `{baseDir}/artifacts/phenosnap_inputs/`
- `{baseDir}/artifacts/phenosnap_outputs/`
- `{baseDir}/resources/`
- `{baseDir}/third_party/`

## 1) Confirm / redact sensitive identifiers
1) Scan the user message for identifiers (email/phone/address/MRN/name fields).
2) If strongly identifying PHI is present (name + DOB/address/MRN):
   - Ask the user to confirm proceeding and recommend removing identifiers.
3) Produce a redacted version of the user text:
   - Replace emails with `[REDACTED_EMAIL]`
   - Replace phone numbers with `[REDACTED_PHONE]`
   - Replace long numeric IDs with `[REDACTED_ID]`
   - Replace address-like patterns with `[REDACTED_ADDRESS]`
   - Replace explicit “Name: …” fields with `Name: [REDACTED_NAME]`

## 2) Ensure PhenoSnap exists locally
Target location: `{baseDir}/PhenoSnap/`

### 2A) If `{baseDir}/PhenoSnap/` exists and contains `extract_phenotypes.py`
Proceed to dependency self-test.

### 2B) If `{baseDir}/PhenoSnap/` does NOT exist (or missing `extract_phenotypes.py`)
Prefer git; fallback to zip.

#### If `git` is available
Run:
- `git clone https://github.com/WGLab/PhenoSnap.git "{baseDir}/PhenoSnap"`

#### If `git` is NOT available
Download zip:
- URL: `https://github.com/WGLab/PhenoSnap/archive/refs/heads/main.zip`
- Destination: `{baseDir}/third_party/phenosnap_main.zip`

Download method (pick first available):
- If `curl` exists:
  - `curl -L "https://github.com/WGLab/PhenoSnap/archive/refs/heads/main.zip" -o "{baseDir}/third_party/phenosnap_main.zip"`
- Else on Windows PowerShell:
  - `Invoke-WebRequest -Uri "https://github.com/WGLab/PhenoSnap/archive/refs/heads/main.zip" -OutFile "{baseDir}/third_party/phenosnap_main.zip"`

Unzip (choose by OS/tools):
- Windows PowerShell:
  - `Expand-Archive -Path "{baseDir}/third_party/phenosnap_main.zip" -DestinationPath "{baseDir}/third_party/phenosnap_unzip" -Force`
- macOS/Linux with unzip:
  - `unzip -o "{baseDir}/third_party/phenosnap_main.zip" -d "{baseDir}/third_party/phenosnap_unzip"`

If neither unzip nor Expand-Archive is available, use Python:
- `python3 -c "import zipfile; z=zipfile.ZipFile(r'{baseDir}/third_party/phenosnap_main.zip'); z.extractall(r'{baseDir}/third_party/phenosnap_unzip')"`

Then rename/move the extracted folder to `{baseDir}/PhenoSnap/`:
- The extracted folder is typically `{baseDir}/third_party/phenosnap_unzip/PhenoSnap-main`
- Move/rename to `{baseDir}/PhenoSnap/`

Final check:
- Verify `{baseDir}/PhenoSnap/extract_phenotypes.py` exists.
- If not found, stop and report the directory listing of `{baseDir}/PhenoSnap/` and `{baseDir}/third_party/phenosnap_unzip/`.

## 3) Dependency self-test and auto-install
Run from inside `{baseDir}/PhenoSnap/`.

### 3A) Smoke test importability
Run:
- `python3 -c "import importlib.util; spec=importlib.util.spec_from_file_location('extract_phenotypes','extract_phenotypes.py'); m=importlib.util.module_from_spec(spec); spec.loader.exec_module(m); print('ok')"`

If it prints `ok`, proceed.

### 3B) If smoke test fails with missing module (ModuleNotFoundError / ImportError)
#### Step 1: Ensure pip exists for python3
Check:
- `python3 -m pip --version`

If that fails, try:
- `python3 -m ensurepip --upgrade`

Check again:
- `python3 -m pip --version`

If still failing, bootstrap pip via get-pip.py:
- Download `https://bootstrap.pypa.io/get-pip.py` to `{baseDir}/third_party/get-pip.py`
  - with curl:
    - `curl -L "https://bootstrap.pypa.io/get-pip.py" -o "{baseDir}/third_party/get-pip.py"`
  - or PowerShell:
    - `Invoke-WebRequest -Uri "https://bootstrap.pypa.io/get-pip.py" -OutFile "{baseDir}/third_party/get-pip.py"`
- Install:
  - `python3 "{baseDir}/third_party/get-pip.py"`
- Verify:
  - `python3 -m pip --version`

If pip still cannot be used, stop and report the error output.

#### Step 2: Install PhenoSnap dependencies
From `{baseDir}/PhenoSnap/`, run:
- `python3 -m pip install -r requirements.txt`

#### Step 3: Re-run smoke test once
Re-run:
- `python3 -c "import importlib.util; spec=importlib.util.spec_from_file_location('extract_phenotypes','extract_phenotypes.py'); m=importlib.util.module_from_spec(spec); spec.loader.exec_module(m); print('ok')"`

If still failing, stop and return:
- the missing module name (from error)
- the command output
- recommended fix (use a venv/conda env; verify python3/pip; rerun pip install)

## 4) Prepare HPO OBO path
Resolve HPO OBO path in this order:
1) If env var `HPO_OBO_PATH` is set and file exists, use that.
2) Else use `{baseDir}/resources/hp.obo` if it exists.

If the resolved file does not exist:
- Stop and tell the user to place `hp.obo` at `{baseDir}/resources/hp.obo` or set `HPO_OBO_PATH` to its full path.

## 5) Write input file (redacted)
- Timestamp format: `YYYYMMDD_HHMMSS` (local time).
- Write redacted user text to:
  - `{baseDir}/artifacts/phenosnap_inputs/input_<TS>.txt`

## 6) Run extraction
From `{baseDir}/PhenoSnap/`, run:
- `python3 extract_phenotypes.py --input-file "{baseDir}/artifacts/phenosnap_inputs/input_<TS>.txt" --hpo-obo "<HPO_OBO_PATH>" --output "{baseDir}/artifacts/phenosnap_outputs/phenotypes_<TS>.json" --format json`

Validate:
-