Local GLM OCR with llama.cpp on AIPC (no API key)

ClawSkills · Author: violet17 · v1.0.0

Image OCR, text recognition, extract text from image, scan document, read image text, invoice OCR, receipt OCR, contract recognition, table extraction, business card OCR, ID recognition, screenshot text extraction, document digitization. Runs locally on Windows using the GLM-OCR model, supports mixed Chinese/English text, prioritizes Intel iGPU inference, no cloud API calls.


Installation / Download

TotalClaw CLI (recommended)

totalclaw install clawskills:violet17~local-image-ocr-aipc

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/clawskills%3Aviolet17~local-image-ocr-aipc/file -o local-image-ocr-aipc.md

Git (get the source from the repository)

git clone https://github.com/openclaw/skills.git

git -C skills checkout f67b479ca6a5a58c30070213c78ab3babb7307dd

# Image OCR — Local AI PC (Windows · GLM-OCR · llama.cpp Vulkan)

**Model**: `ggml-org/GLM-OCR-GGUF` (Q8_0, HuggingFace / hf-mirror)  
**Inference**: `llama-cli` (llama.cpp Vulkan prebuilt)  
**SKILL_VERSION**: `1.0.0`

## Directory Structure (auto-created or user-specified)

```
<OCR_DIR>\                        ← auto-selected drive or user-specified (e.g. C:\image-ocr or D:\image-ocr)
├── llama.cpp\                    ← llama-cli.exe and related binaries
└── models\
    └── GLM-OCR-GGUF\
        ├── GLM-OCR-Q8_0.gguf        ← main model (~950 MB)
        └── mmproj-GLM-OCR-Q8_0.gguf ← vision projection layer (~484 MB, required)
```

> ## ⚠️ Before You Install — Security & Compliance Disclosure
>
> **This is an instruction-only skill** (no install spec). The agent will execute PowerShell steps
> described in this file to set up a local OCR environment. Review before granting autonomous execution.
>
> **What this skill does to your system:**
>
> | Action | Source | Risk |
> |--------|--------|------|
> | Download and extract `llama-cli.exe` and related binaries | `github.com/ggml-org/llama.cpp` releases | Medium — runs a downloaded executable |
> | Download model files (~1.5 GB total) | `huggingface.co` or `modelscope.cn` | Low — large file transfer |
> | Auto-install **Miniforge** if Python not found | `github.com/conda-forge/miniforge` | Medium — modifies user Python environment |
> | Create `<OCR_DIR>` and write files to disk | Local filesystem only | Low |
>
> **Recommendations before proceeding:**
>
> 1. **Do not run as administrator.** All steps are designed for standard user permissions.
>    Install to a dedicated directory (e.g. `C:\image-ocr`) and inspect files before executing.
> 2. **Verify checksums before executing.** Step 1 automatically fetches and validates the SHA256
>    hash of the llama.cpp ZIP before extraction. Step 2 computes and displays SHA256 hashes for
>    each model file so you can cross-check them against the HuggingFace model page. If any hash
>    does not match, stop and do not proceed.
> 3. **Prefer manual execution.** Run the PowerShell steps in this file yourself rather than
>    granting the agent full autonomy. Each step is self-contained and can be run independently.
> 4. **HUGGINGFACE_TOKEN is optional and sensitive.** The GLM-OCR model (`ggml-org/GLM-OCR-GGUF`)
>    is publicly available — no token is needed. If you use a gated model, set
>    `$env:HUGGINGFACE_TOKEN` only when necessary and treat it as a secret credential.
> 5. **Miniforge auto-install modifies your environment.** If you are uncomfortable with automatic
>    Python installation, decline that step and provide a Python path manually via `$customPythonExe`.
>
> **Trusted sources used by this skill:**
> - `https://github.com/ggml-org/llama.cpp/releases`
> - `https://huggingface.co/ggml-org/GLM-OCR-GGUF`
> - `https://github.com/conda-forge/miniforge` *(only if Miniforge auto-install is triggered)*
>
> **Dependencies**: Model files are downloaded via Python's `huggingface_hub` (`hf download`)
> or `modelscope`. If Python is not installed, Step 2 will automatically install Miniforge
> to `%USERPROFILE%\miniforge3` (no admin rights required).
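
The hash cross-check in recommendation 2 can also be done by hand. The sketch below shows one way, assuming you have copied the expected SHA256 value from the release or model page yourself. `Test-FileSha256` is a hypothetical helper name used only for illustration; it is not part of this skill, llama.cpp, or `huggingface_hub`.

```powershell
# Hypothetical helper (not part of the skill): compare a file's SHA256 with an expected value.
function Test-FileSha256 {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$ExpectedHash
    )
    if (-not (Test-Path -LiteralPath $Path)) {
        throw "File not found: $Path"
    }
    $actual = (Get-FileHash -LiteralPath $Path -Algorithm SHA256).Hash
    # String -eq is case-insensitive in PowerShell, so upper- and lower-case hex both match
    return ($actual -eq $ExpectedHash.Trim())
}

# Example call (the expected value is a placeholder you must fill in yourself):
# $expected = "<SHA256 copied from the model page>"
# if (-not (Test-FileSha256 -Path "<OCR_DIR>\models\GLM-OCR-GGUF\GLM-OCR-Q8_0.gguf" -ExpectedHash $expected)) {
#     Write-Host "ERROR: hash mismatch, do not proceed"
# }
```

Returning a boolean rather than throwing on mismatch leaves the caller free to decide whether to stop or to delete the file and retry the download.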

---

## ⚠️ AI Assistant Instructions

1. Execute one command at a time; wait for output before proceeding.
2. Stop immediately on error; refer to the Troubleshooting table at the end.
3. Wrap all paths in double quotes.
4. `<OCR_DIR>` is the absolute working directory path, determined after Pre-flight.
5. **Single goal**: Recognize image content and return text results.

**Execution flow (do not skip steps)**:
```
Pre-flight: Check working dir + llama.cpp + models                → STATUS values
Step 1:     Install / update llama.cpp (only if MISSING/OUTDATED) → LLAMA_OK
Step 2:     Download models (only if MISSING)                     → MODEL_OK
Step 3:     Process recognition result + output                   → Return result
```

**Progress reporting**: Announce each step before starting, e.g.: `🔍 Pre-flight: Checking environment…`

---

## Pre-flight: Check Environment

> 🔍 Pre-flight: Checking working directory, llama.cpp, and model files…

### Locate Working Directory

```powershell
# ── Fix encoding for non-ASCII paths (required at the start of every PowerShell script) ──
chcp 65001 | Out-Null
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$OutputEncoding = [System.Text.Encoding]::UTF8

# ── Optional: if you already have a path, fill it in; leave blank to auto-select drive ──
$customOcrDir = ""   # e.g. "C:\image-ocr" or "D:\image-ocr"
# ──────────────────────────────────────────────────────────────────────────────────────────

if ($customOcrDir -and (Test-Path (Split-Path $customOcrDir))) {
    $OCR_DIR = $customOcrDir
    New-Item -ItemType Directory -Force -Path $OCR_DIR | Out-Null
    Write-Host "OCR_DIR=$OCR_DIR (user-specified)"
} else {
    $best = Get-PSDrive -PSProvider FileSystem |
        Where-Object { $_.Free -gt 0 } |
        Sort-Object Free -Descending |
        Select-Object -First 1
    $OCR_DIR = Join-Path "$($best.Root)" "image-ocr"
    New-Item -ItemType Directory -Force -Path $OCR_DIR | Out-Null
    Write-Host "OCR_DIR=$OCR_DIR (auto-selected drive: $($best.Name))"
}
$env:OCR_DIR = $OCR_DIR
```

**Success criteria**: Output contains a line with `OCR_DIR=`. Record the path and substitute `<OCR_DIR>` in subsequent steps.
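
As an alternative to substituting `<OCR_DIR>` by hand in every later snippet, the paths could be derived once from the recorded root. This is a minimal sketch under that assumption; `Get-OcrPaths` is a hypothetical helper, and the file and folder names are taken from the directory structure shown earlier in this document.

```powershell
# Hypothetical helper (not part of the skill): build all paths used by later steps from one root.
function Get-OcrPaths {
    param([Parameter(Mandatory)][string]$OcrDir)

    # [System.IO.Path]::Combine only joins strings; it does not require the paths to exist
    $llamaDir = [System.IO.Path]::Combine($OcrDir, "llama.cpp")
    $modelDir = [System.IO.Path]::Combine($OcrDir, "models", "GLM-OCR-GGUF")

    [pscustomobject]@{
        LlamaDir   = $llamaDir
        CliExe     = [System.IO.Path]::Combine($llamaDir, "llama-cli.exe")
        ModelDir   = $modelDir
        ModelFile  = [System.IO.Path]::Combine($modelDir, "GLM-OCR-Q8_0.gguf")
        MmprojFile = [System.IO.Path]::Combine($modelDir, "mmproj-GLM-OCR-Q8_0.gguf")
    }
}

# Example call, assuming $env:OCR_DIR was set by the snippet above:
# $paths = Get-OcrPaths -OcrDir $env:OCR_DIR
# Test-Path $paths.CliExe
```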

---

### Check llama.cpp

```powershell
$llamaDir = "<OCR_DIR>\llama.cpp"
$cliExe   = "$llamaDir\llama-cli.exe"

if (Test-Path $cliExe) {
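    # Join all output lines into one string: -match against an array does not populate $Matches
    $ver = (& $cliExe --version 2>&1 | Out-String)
    if ($ver -match "version:\s*(\d+)") {
        $build = [int]$Matches[1]
        if ($build -ge 8400) {
            Write-Host "OK: llama.cpp build $build >= b8400, skip Step 1"
            Write-Host "LLAMA_STATUS=READY"
        } else {
            Write-Host "WARN: llama.cpp build $build < b8400, upgrade required"
            Write-Host "LLAMA_STATUS=OUTDATED"
        }
    } else {
        # Always emit a status so the decision table can be applied
        Write-Host "WARN: could not parse llama.cpp version, treating as outdated"
        Write-Host "LLAMA_STATUS=OUTDATED"
    }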
} else {
    Write-Host "ERROR: llama-cli.exe not found"
    Write-Host "LLAMA_STATUS=MISSING"
    Write-Host "   Checked path: $llamaDir"
}
```

---

### Check Model Files

```powershell
$modelDir   = "<OCR_DIR>\models\GLM-OCR-GGUF"
$modelFile  = "$modelDir\GLM-OCR-Q8_0.gguf"
$mmprojFile = "$modelDir\mmproj-GLM-OCR-Q8_0.gguf"

$modelOk  = Test-Path $modelFile
$mmprojOk = Test-Path $mmprojFile

if ($modelOk -and $mmprojOk) {
    Write-Host "OK: GLM-OCR model files ready, skip Step 2"
    Write-Host "MODEL_STATUS=READY"
} else {
    if (-not $modelOk)  { Write-Host "ERROR: Missing GLM-OCR-Q8_0.gguf" }
    if (-not $mmprojOk) { Write-Host "ERROR: Missing mmproj-GLM-OCR-Q8_0.gguf" }
    Write-Host "MODEL_STATUS=MISSING"
    Write-Host "   Checked path: $modelDir"
}
```

| Output | Action |
|--------|--------|
| Both `READY` | ✅ Skip to Step 3 |
| `LLAMA_STATUS=MISSING/OUTDATED` | ⬇️ Execute Step 1 |
| `MODEL_STATUS=MISSING` | ⬇️ Execute Step 2 |

Announce: `✅ Environment check complete. Execute steps as needed.`
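
The decision table above can be expressed as a small function. This is only a sketch: `Get-PendingSteps` is a hypothetical name, and it assumes that any status other than `READY` means the corresponding step must run.

```powershell
# Hypothetical helper (not part of the skill): map the Pre-flight status values to the steps to run.
function Get-PendingSteps {
    param(
        [Parameter(Mandatory)][string]$LlamaStatus,   # READY, OUTDATED or MISSING
        [Parameter(Mandatory)][string]$ModelStatus    # READY or MISSING
    )
    $steps = @()
    if ($LlamaStatus -ne "READY") { $steps += "Step 1" }   # install / update llama.cpp
    if ($ModelStatus -ne "READY") { $steps += "Step 2" }   # download models
    $steps += "Step 3"                                     # recognition always runs last
    # The leading comma keeps the result an array even when it has a single element
    return ,$steps
}

# Example call:
# Get-PendingSteps -LlamaStatus "OUTDATED" -ModelStatus "READY"
```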

---


## Step 1: Install / Update llama.cpp Vulkan

> ⬇️ Step 1: Downloading and installing llama.cpp Vulkan… (only when `LLAMA_STATUS=MISSING/OUTDATED`)

> **Consent required**: Before proceeding, inform the user:
> - A ZIP (~50–100 MB) will be downloaded from `github.com/ggml-org/llama.cpp/releases`
> - It will be extracted to `<OCR_DIR>\llama.cpp\` and the original ZIP will be deleted
> - `llama-cli.exe` will be placed on disk and called directly by this skill
>
> Ask the user to confirm before running the download command.

```powershell
$tag      = "b8400"   # Replace with the latest tag from https://github.com/ggml-org/llama.cpp/releases/latest
$llamaDir = "<OCR_DIR>\llama.cpp"
$zip      = "$env:TEMP\llama-vulkan.zip"
$url      = "https://github.com/ggml-org/llama.cpp/releases/download/$tag/llama-$tag-bin-win-vulkan-x64.zip"

Write-Host "Downloading llama.cpp $tag ..."
Invoke-WebRequest -Uri $url -OutFile $zip

# ── Checksum verification ──────────────────────────────────────────────────────
# Fetch the SHA256 checksum file publi