tesseract-ocr

TotalClaw · author: totalclaw

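Extract text from images with the Tesseract OCR engine, directly from the command line. Supports Chinese, English, and many other languages. Use this skill when the user needs to extract or recognize text in an image, or to run OCR without depending on Python.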

Installation / Download

TotalClaw CLI (recommended):
totalclaw install totalclaw:totalclaw~whalefell-tesseract-ocr
cURL (direct download, no login required):
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~whalefell-tesseract-ocr/file -o whalefell-tesseract-ocr.md

## Original Text

# Tesseract OCR Skill

Extract text content from images using the Tesseract engine directly via command line.

## Features

- Extract text from image files using native tesseract CLI
- Support multi-language recognition (Chinese, English, etc.)
- No Python dependencies required
- Simple: one shell command per image, no wrapper code needed

## Dependencies

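Install the Tesseract OCR system package plus the language packs you need. English data ships with the base package; other languages are separate packs: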

```bash
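# Ubuntu/Debian (packs are named tesseract-ocr-<code> with "_" written as "-",
# e.g. tesseract-ocr-chi-tra, tesseract-ocr-jpn):
sudo apt-get update
sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim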

# macOS (Homebrew; tesseract-lang adds the non-English language packs):
brew install tesseract tesseract-lang
```
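A quick way to confirm the install is `tesseract --list-langs`, which lists the language data Tesseract can find. The sketch below wraps that check in a hypothetical helper, `missing_langs` (an illustrative name, not part of Tesseract), which reads the listing on stdin and prints each requested code that is absent. It assumes the listing has one code per line after a header line, as current Tesseract versions print it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Tesseract): print each requested language
# code that is absent from a `tesseract --list-langs` listing read on stdin.
missing_langs() {
  local installed code
  installed=$(cat)                       # whole listing, header line included
  for code in "$@"; do
    # -x: whole-line match, -F: literal string, so "chi_sim" will not
    # match "chi_sim_vert" and the header line never matches a code
    if ! grep -qxF -- "$code" <<<"$installed"; then
      echo "$code"
    fi
  done
}
```

For example, `tesseract --list-langs 2>&1 | missing_langs chi_sim eng` prints nothing when both packs are installed; the `2>&1` is a precaution in case an older Tesseract version prints the list on stderr.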

## Usage

### Basic Usage

```bash
# Use default language (English)
tesseract /path/to/image.png stdout

# Specify language (Chinese + English)
tesseract /path/to/image.png stdout -l chi_sim+eng

# Save to file: the second argument is an output base name, so this writes output.txt
tesseract /path/to/image.png output -l chi_sim+eng

# Multiple languages
tesseract /path/to/image.png stdout -l chi_sim+eng+jpn
```
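In scripts, the `stdout` form pairs naturally with command substitution: `$(...)` captures only stdout, so Tesseract's log messages, which it writes to stderr, stay out of the captured text. A minimal sketch of a wrapper; the name `ocr` and its argument order are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: OCR one image and print the recognized text on stdout.
# Usage: ocr IMAGE [LANGS]    (LANGS defaults to "eng", e.g. "chi_sim+eng")
ocr() {
  local img=$1 langs=${2:-eng}
  if [[ ! -f "$img" ]]; then
    echo "ocr: no such file: $img" >&2
    return 1
  fi
  # "stdout" as the output base makes tesseract print the text instead of
  # writing a file; its status messages still go to stderr.
  tesseract "$img" stdout -l "$langs"
}
```

`text=$(ocr scan.png chi_sim+eng) || exit 1` stores the recognized text, or stops the script if the file is missing or Tesseract fails.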

### Common Language Codes

| Language | Code |
|----------|------|
| Simplified Chinese | chi_sim |
| Traditional Chinese | chi_tra |
| English | eng |
| Japanese | jpn |
| Korean | kor |
| Chinese + English | chi_sim+eng |

### Quick Examples

```bash
# OCR with Chinese support
tesseract image.jpg stdout -l chi_sim

# OCR with mixed Chinese and English
tesseract image.png stdout -l chi_sim+eng

# Save to file instead of stdout
tesseract document.png result -l chi_sim+eng
# Creates result.txt
```
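Because the second argument is an output base name and Tesseract appends `.txt` itself, a batch job should strip the image extension rather than add `.txt`. A minimal sketch, assuming a flat directory of `.png`/`.jpg`/`.jpeg` files; `ocr_dir` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical batch helper: OCR every .png/.jpg/.jpeg in DIR (any case) and
# write <image name without extension>.txt next to each image.
# Usage: ocr_dir DIR [LANGS]
# The body runs in a subshell "( ... )" so the shopt changes stay local.
ocr_dir() (
  dir=$1
  langs=${2:-eng}
  shopt -s nullglob nocaseglob   # skip patterns with no match; match .PNG too
  for img in "$dir"/*.png "$dir"/*.jpg "$dir"/*.jpeg; do
    # "${img%.*}" drops the extension; tesseract adds ".txt" to the base
    tesseract "$img" "${img%.*}" -l "$langs" || echo "failed: $img" >&2
  done
)
```

`ocr_dir ./scans chi_sim+eng` turns `./scans/page1.png` into `./scans/page1.txt`.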

## Notes

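1. OCR accuracy depends on image quality; sharp, well-lit scans work best (Tesseract's own guidance suggests at least 300 DPI)
2. Complex layouts (tables, multi-column pages) may need a different page segmentation mode (`--psm`) and post-processing of the output
3. Simplified Chinese recognition requires the `chi_sim` language data (Debian/Ubuntu package `tesseract-ocr-chi-sim`; included in `tesseract-lang` on Homebrew)
4. Language packs other than English must be installed separately; run `tesseract --list-langs` to see which ones are available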
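For the layout and clean-up points in these notes, two habits often help: choosing a page segmentation mode with `--psm` (for example `--psm 6`, which treats the image as a single uniform block of text) and tidying the raw output. The sketch below covers only the clean-up step. `tidy_ocr` is a hypothetical name, and its rules (strip trailing whitespace, including the form feed Tesseract typically writes after each page, then squeeze runs of blank lines) are an assumption about what post-processing a given job needs.

```bash
#!/usr/bin/env bash
# Hypothetical post-processing filter for raw tesseract text on stdin:
#  - sed removes trailing whitespace, including the \f page separator
#  - cat -s collapses consecutive blank lines into a single blank line
tidy_ocr() {
  sed 's/[[:space:]]*$//' | cat -s
}
```

`tesseract table.png stdout -l chi_sim+eng --psm 6 | tidy_ocr > table.txt` combines both steps; whether `--psm 6` suits a particular layout has to be checked per document.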