greek-document-ocr

TotalClaw · author: openclaw-greek-accounting · v1.0.0

Greek-language OCR using Tesseract. Processes scanned invoices, receipts, and government documents. Runs locally, with no cloud APIs.

Installation / Download

TotalClaw CLI (recommended)
totalclaw install totalclaw:totalclaw~satoshistackalotto-greek-document-ocr
cURL (direct download, no login required)
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~satoshistackalotto-greek-document-ocr/file -o satoshistackalotto-greek-document-ocr.md

## Original Text

# Greek Document OCR

This skill provides advanced Greek language optical character recognition and document processing capabilities, specifically designed for Greek business documents, invoices, receipts, and handwritten materials commonly found in Greek accounting workflows.


## Setup

```bash
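export OPENCLAW_DATA_DIR="/data"

# Install Tesseract OCR with Greek language data (ell = Modern Greek)
sudo apt install -y tesseract-ocr tesseract-ocr-ell
command -v jq >/dev/null || sudo apt install -y jq

# Create the full directory layout used by this skill
mkdir -p "$OPENCLAW_DATA_DIR"/ocr/incoming/{scanned,photos,handwritten,government} \
         "$OPENCLAW_DATA_DIR"/ocr/preprocessing/enhanced \
         "$OPENCLAW_DATA_DIR"/ocr/processing/{greek-ocr,validation,classification} \
         "$OPENCLAW_DATA_DIR"/ocr/output/{text-extracted,structured-data,searchable-pdf,accounting-ready}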
```

All OCR processing runs locally with Tesseract; no cloud OCR APIs are called and no credentials are required.
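Before running any batch job it is worth confirming that Tesseract can actually see the Greek model. A minimal sketch, assuming a Tesseract 4 or 5 install; `has_tesseract_lang` and `ocr_greek_image` are hypothetical helper names, and pairing `ell+eng` is an assumption for documents that mix Greek with Latin-script text such as IBANs and product codes.

```bash
# Hypothetical helpers (not part of the openclaw CLI).

# Read `tesseract --list-langs` output on stdin; succeed if language $1 is listed.
# Older Tesseract versions print this list on stderr, so callers merge 2>&1.
has_tesseract_lang() {
  grep -qx -- "$1"
}

# OCR one image (PNG/JPEG/TIFF) with the Greek and English models.
# Tesseract writes the plain text to <outbase>.txt.
ocr_greek_image() {
  local image="$1" outbase="$2"
  tesseract "$image" "$outbase" -l ell+eng
}

# Usage:
#   tesseract --list-langs 2>&1 | has_tesseract_lang ell \
#     || echo "Greek data missing: install tesseract-ocr-ell" >&2
#   ocr_greek_image invoice.png "$OPENCLAW_DATA_DIR/ocr/output/text-extracted/invoice"
```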


## Core Philosophy

- **Greek Language Excellence**: Superior recognition of Greek characters, accents, and business terminology
- **OpenClaw Integration**: Built to extend the existing `deepread` skill with Greek-specific capabilities
- **Business Document Focus**: Optimized for invoices, receipts, contracts, and government forms
- **Production Accuracy**: High-precision text extraction suitable for automated accounting workflows
- **Handwritten Support**: Advanced recognition of handwritten Greek text and signatures

## OpenClaw Commands

### Core OCR Operations
```bash
# Primary Greek document processing
openclaw ocr process-greek --input-dir /data/ocr/incoming/scanned/ --enhance-greek-chars
openclaw ocr batch-process --greek-language --confidence-threshold 0.95 --auto-classify
openclaw ocr extract-invoices --greek-format --vat-detection --amount-parsing
openclaw ocr process-receipts --greek-business --expense-categorization

# Integration with existing deepread skill
openclaw ocr enhance-deepread --greek-language-pack --improve-accuracy
openclaw ocr greek-preprocessing --image-enhancement --character-optimization
openclaw ocr validate-extraction --greek-language --business-rules

# Specialized Greek document types
openclaw ocr process-handwritten --greek-cursive --signature-detection
openclaw ocr government-forms --aade-forms --efka-documents --municipal-papers
openclaw ocr process-contracts --greek-legal --clause-extraction --signature-verification
```
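The `validate-extraction --business-rules` command is only described here at the CLI level. One rule that can be checked locally is the ΑΦΜ (Greek tax ID) check digit: the first eight digits are weighted by 256, 128, ..., 2, and the weighted sum modulo 11, then modulo 10, must equal the ninth digit. The bash sketch below assumes that rule; `valid_afm` is a hypothetical name, and a passing checksum only shows that the number is well-formed, not that it is registered.

```bash
# Hypothetical helper (not part of the openclaw CLI): check a Greek ΑΦΜ
# (9-digit tax ID) against its check digit. Returns 0 if valid.
valid_afm() {
  local afm="$1" sum=0 i
  # Exactly nine ASCII digits are required.
  [[ "$afm" =~ ^[0-9]{9}$ ]] || return 1
  # Design choice of this sketch: reject the all-zero placeholder explicitly.
  [[ "$afm" != "000000000" ]] || return 1
  for (( i = 0; i < 8; i++ )); do
    # Digit i carries weight 2^(8-i): 256, 128, ..., 2.
    (( sum += ${afm:i:1} * (1 << (8 - i)) ))
  done
  # Check digit = (weighted sum mod 11) mod 10.
  (( (sum % 11) % 10 == ${afm:8:1} ))
}

# Usage: valid_afm "$afm" && echo "ΑΦΜ OK" || echo "ΑΦΜ failed checksum"
```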

### Advanced Greek Text Processing
```bash
# Greek language enhancement and correction
openclaw ocr correct-greek --spell-check --accent-correction --business-terminology
openclaw ocr standardize-text --greek-formatting --currency-amounts --date-formats
openclaw ocr extract-entities --greek-names --addresses --vat-numbers --amounts

# Document intelligence and categorization
openclaw ocr classify-documents --greek-business-types --confidence-scoring
openclaw ocr extract-structured-data --invoices --receipts --contracts --forms
openclaw ocr generate-searchable-pdf --greek-text-layer --preserve-formatting
```
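Greek documents write amounts as `1.234,56 €` (dot for thousands, comma for decimals) and dates day-first. A minimal sketch of what `standardize-text --currency-amounts --date-formats` might do; the helper names are hypothetical, and the amount helper assumes every input is in Greek format, so a dot-decimal value such as `12.50` would be misread.

```bash
# Hypothetical helpers (not part of the openclaw CLI).

# Convert a Greek-formatted amount ("1.234,56", optionally with € or EUR)
# to a plain decimal with a dot separator ("1234.56").
# Assumption: "." is always a thousands separator and "," always the decimal mark.
greek_amount_to_decimal() {
  printf '%s\n' "$1" | sed -E 's/(€|EUR)//g; s/[[:space:]]//g; s/\.//g; s/,/./'
}

# Convert a day-first date (DD/MM/YYYY, DD-MM-YYYY or DD.MM.YYYY) to ISO 8601.
# Single-digit days and months are accepted; no calendar validation is done.
greek_date_to_iso() {
  local re='^([0-9]{1,2})[./-]([0-9]{1,2})[./-]([0-9]{4})$'
  [[ "$1" =~ $re ]] || return 1
  # 10# forces base 10 so "08" and "09" are not parsed as octal.
  printf '%04d-%02d-%02d\n' \
    "$((10#${BASH_REMATCH[3]}))" "$((10#${BASH_REMATCH[2]}))" "$((10#${BASH_REMATCH[1]}))"
}
```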

### Quality Control & Validation
```bash
# Accuracy monitoring and improvement
openclaw ocr accuracy-test --greek-documents --known-text-samples
openclaw ocr confidence-analysis --character-level --word-level --document-level
openclaw ocr manual-review --low-confidence --flagged-documents --greek-verification

# Integration and export
openclaw ocr export-accounting --format csv --greek-standards
openclaw ocr export-accounting --target quickbooks --xero --greek-formats  # Optional: accounting software formats
openclaw ocr integrate-banking --match-bank-transactions --reference-extraction
openclaw ocr coordinate-compliance --vat-analysis --tax-document-processing
```
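Tesseract reports confidence on a 0-100 scale, so the `--confidence-threshold 0.95` used earlier would correspond to 95 if the CLI wraps Tesseract directly (an assumption). A sketch of word-level confidence analysis on Tesseract's TSV output (`tesseract page.png page -l ell+eng tsv`), where column 11 is `conf` (`-1` on non-word rows) and column 12 is the recognized text; the helper names are hypothetical.

```bash
# Hypothetical helpers (not part of the openclaw CLI).

# Print the mean word-level confidence (0-100) of a Tesseract TSV read on stdin.
# Only level-5 (word) rows with a non-negative conf and non-empty text count.
mean_word_conf() {
  awk -F'\t' 'NR > 1 && $1 == 5 && $11 >= 0 && $12 != "" { sum += $11; n++ }
              END { if (n) printf "%.1f\n", sum / n; else print 0 }'
}

# Succeed (exit 0) if the mean confidence is below threshold $1,
# i.e. the page should be routed to manual review.
needs_manual_review() {
  local threshold="$1" conf
  conf=$(mean_word_conf)
  awk -v c="$conf" -v t="$threshold" 'BEGIN { exit !(c < t) }'
}

# Usage: needs_manual_review 95 < page.tsv && echo "flag page for review"
```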

## Greek Language Processing Architecture

### Greek Character Recognition Enhancement
```yaml
Greek_Character_Optimization:
  alphabet_coverage:
    uppercase: "ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ"
    lowercase: "αβγδεζηθικλμνξοπρστυφχψω"
    accented_characters: "άέήίόύώΐΰ"
    special_characters: "ς" # Final sigma
    punctuation: "·" # Greek middle dot
    
  character_enhancement:
    similar_character_disambiguation:
      - "Α vs A (Latin A)"
      - "Β vs B (Latin B)" 
      - "Ε vs E (Latin E)"
      - "Η vs H (Latin H)"
      - "Ι vs I (Latin I)"
      - "Κ vs K (Latin K)"
      - "Μ vs M (Latin M)"
      - "Ν vs N (Latin N)"
      - "Ο vs O (Latin O)"
      - "Π vs P (transliteration: Π is romanized as P)"
      - "Ρ vs P (look-alike: Greek Rho resembles Latin P)"
      - "Τ vs T (Latin T)"
      - "Υ vs Y (Latin Y)"
      - "Χ vs X (Latin X)"
      
  accent_recognition:
    acute_accents: "ά έ ή ί ό ύ ώ"
    diaeresis: "ϊ ϋ"
    combined_accents: "ΐ ΰ"
    accent_correction: "Auto-correct missing or incorrect accents"
```

### Greek Business Document Intelligence
```yaml
Greek_Business_Document_Types:
  invoices:
    greek_keywords: ["ΤΙΜΟΛΟΓΙΟ", "ΑΠΟΔΕΙΞΗ", "ΠΑΡΑΣΤΑΤΙΚΟ"]
    required_elements: ["ΑΦΜ", "ΦΠΑ", "ΗΜΕΡΟΜΗΝΙΑ", "ΠΟΣΟ"]
    # Greek format: "." separates thousands, "," marks decimals (e.g. 1.234,56 €)
    amount_patterns: ["€\\s*\\d+(?:\\.\\d{3})*,\\d{2}", "\\d+(?:\\.\\d{3})*,\\d{2}\\s*(?:€|EUR)"]
    vat_patterns: ["ΦΠΑ\\s*\\d+%", "VAT\\s*\\d+%", "24%", "13%", "6%"]
    
  receipts:
    types: ["ΑΠΟΔΕΙΞΗ ΛΙΑΝΙΚΗΣ", "ΑΠΟΔΕΙΞΗ ΠΑΡΟΧΗΣ ΥΠΗΡΕΣΙΩΝ"]
    essential_info: ["ΗΜΕΡΟΜΗΝΙΑ", "ΩΡΑ", "ΠΟΣΟ", "ΑΦΜ ΚΑΤΑΣΤΗΜΑΤΟΣ"]
    pos_indicators: ["POS", "ΚΑΡΤΑ", "ΜΕΤΡΗΤΑ"]
    
  government_forms:
    aade_forms: ["Ε1", "Ε3", "ΦΠΑ", "ΕΝΦΙΑ"]
    efka_forms: ["Α.Π.Δ.", "ΑΠΑ", "ΕΦΚ", "ΕΡΓΟΔΟΤΙΚΕΣ ΕΙΣΦΟΡΕΣ"]
    municipal_forms: ["ΔΗΜΟΤΙΚΟΣ ΦΟΡΟΣ", "ΤΕΛΟΣ ΚΑΘΑΡΙΟΤΗΤΑΣ"]
    
  contracts:
    contract_types: ["ΣΥΜΒΑΣΗ", "ΣΥΜΦΩΝΙΑ", "ΠΑΡΑΧΩΡΗΣΗ"]
    key_clauses: ["ΑΝΤΙΚΕΙΜΕΝΟ", "ΤΙΜΗ", "ΔΙΑΡΚΕΙΑ", "ΥΠΟΧΡΕΩΣΕΙΣ"]
    signature_areas: ["ΥΠΟΓΡΑΦΗ", "ΣΦΡΑΓΙΔΑ", "ΗΜΕΡΟΜΗΝΙΑ ΥΠΟΓΡΑΦΗΣ"]
```

### OpenClaw File Processing Integration
```yaml
Greek_OCR_File_Structure:
  input_processing:
    - /data/ocr/incoming/scanned/         # Scanned documents (PDF, JPG, PNG, TIFF)
    - /data/ocr/incoming/photos/          # Mobile phone document photos
    - /data/ocr/incoming/handwritten/     # Handwritten Greek documents
    - /data/ocr/incoming/government/      # Government forms and official documents
    
  processing_workspace:
    - /data/ocr/preprocessing/enhanced/   # Image enhancement and optimization
    - /data/ocr/processing/greek-ocr/     # Greek language OCR processing
    - /data/ocr/processing/validation/    # Text validation and correction
    - /data/ocr/processing/classification/ # Document type classification
    
  output_delivery:
    - /data/ocr/output/text-extracted/      # Clean extracted text files
    - /data/ocr/output/structured-data/     # Structured business data (JSON)
    - /data/ocr/output/searchable-pdf/      # PDFs with Greek text layer
    - /data/ocr/output/accounting-ready/    # Data ready for accounting integration
```
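Tesseract reads images (PNG, JPEG, TIFF) but not PDF, so PDFs dropped into `incoming/scanned/` need rasterizing first. A sketch of a batch driver for this layout, assuming `pdftoppm` from poppler-utils is installed, that both a text file and a searchable PDF are wanted for every input, and that the output directories already exist; `OCR_ROOT` and the function names are hypothetical.

```bash
# Hypothetical batch driver (not part of the openclaw CLI).
OCR_ROOT="${OPENCLAW_DATA_DIR:-/data}/ocr"

# Output base name for an input file: its basename without the last extension.
ocr_output_base() {
  local name
  name=$(basename -- "$1")
  printf '%s\n' "${name%.*}"
}

# OCR one file into text-extracted/<base>.txt and searchable-pdf/<base>.pdf.
# The body runs in a subshell so the EXIT trap always removes the scratch dir.
ocr_one_file() (
  input="$1"
  base=$(ocr_output_base "$input")
  work=$(mktemp -d) || exit 1
  trap 'rm -rf "$work"' EXIT
  mkdir "$work/out" || exit 1
  src="$input"
  if [[ "${input,,}" == *.pdf ]]; then
    # Rasterize each PDF page at 300 DPI (page-1.png, page-2.png, ...).
    pdftoppm -r 300 -png "$input" "$work/page" || exit 1
    # Tesseract accepts a text file listing one image per line as input;
    # pdftoppm zero-pads page numbers, so the glob expands in page order.
    printf '%s\n' "$work"/page-*.png > "$work/pages.txt"
    src="$work/pages.txt"
  fi
  # The txt and pdf configs make one run write <base>.txt and <base>.pdf.
  tesseract "$src" "$work/out/$base" -l ell+eng txt pdf || exit 1
  mv "$work/out/$base.txt" "$OCR_ROOT/output/text-extracted/" &&
    mv "$work/out/$base.pdf" "$OCR_ROOT/output/searchable-pdf/"
)

# Process every regular file currently in the incoming folders.
process_incoming() {
  local f
  for f in "$OCR_ROOT"/incoming/{scanned,photos,handwritten,government}/*; do
    [[ -f "$f" ]] || continue
    ocr_one_file "$f" || echo "OCR failed: $f" >&2
  done
}
```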

## Enhanced Greek OCR Processing Pipeline

### Pre-Processing Optimization for Greek Documents
```yaml
Image_Enhancement_Pipeline:
  step_1_assessment:
    command: "openclaw ocr assess-quality --greek-text --character-density"
    functions: ["Image quality analysis", "Greek text detection", "Optimal processing path selection"]
    
  step_2_enhancement:
    command: "openclaw ocr enhance-image --greek-characters --contrast-optimization"
    functions: ["Noise reduction", "Contrast enhancement", "Greek character sharpening"]
    
  step_3_preprocessing:
    command: "openclaw ocr preprocess --deskew --border-removal --greek-layout"
    functions: ["Document alignment", "Border detection", "Greek text layout analysis"]
    
Greek_Specific_Enhancements:
  character_enhancement:
    accent_sharpening: "Enhance accent mark visibility"
    character_separation: "Improve separation of connected Greek letters"
    font_optimization: "Optimize for common Greek fonts (Times New Roman Greek, Arial Greek)"
    
  layout_analysis:
    greek_reading_order: "Left-to-right reading order (Greek, like Latin script, is written left to right), including mixed Greek/Latin lines"
    column_detection: "Handle Greek newspaper and document column layouts"
    table_recognition: "Greek table headers and structure recognition"
```
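These enhancement and layout steps can be approximated with standard tools. A sketch assuming ImageMagick 6 (`convert`; ImageMagick 7 spells it `magick`) and Tesseract's `--psm` page segmentation modes; the mapping from document type to mode is an assumption, not something this skill specifies, and the helper names are hypothetical.

```bash
# Hypothetical helpers (not part of the openclaw CLI).

# Pick a Tesseract --psm value per document type (assumed mapping):
#   4 = single column of text of variable sizes (typical till receipts)
#   7 = single text line (e.g. a cropped ΑΦΜ or amount field)
#   3 = fully automatic page segmentation (Tesseract's default; columns, tables)
psm_for_doc_type() {
  case "$1" in
    receipt) echo 4 ;;
    field)   echo 7 ;;
    *)       echo 3 ;;
  esac
}

# Grayscale, deskew, and stretch contrast before OCR.
preprocess_image() {
  convert "$1" -colorspace Gray -deskew 40% -normalize "$2"
}

# OCR a preprocessed image with the segmentation mode for its document type.
ocr_with_layout() {
  local image="$1" outbase="$2" doc_type="$3"
  tesseract "$image" "$outbase" -l ell+eng --psm "$(psm_for_doc_type "$doc_type")"
}

# Usage:
#   preprocess_image receipt.jpg receipt-clean.png
#   ocr_with_layout receipt-clean.png receipt receipt
```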

### Advanced Greek Text Extraction
```yaml
Greek_OCR_Engine_Configuration:
  primary_ocr_engine:
    base: "OpenClaw deepread skill enhancement"