PDF Converter

ClawSkills · Author: youna12345 · v1.0.2

PDF conversion toolkit featuring AI layout analysis and OCR. Converts PDFs to Word, Markdown, JSON, PPT, CSV, HTML, and XML for seamless LLM data processing.


Installation / Download

TotalClaw CLI (recommended)

totalclaw install clawskills:youna12345~pdf-convert-compdf

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/clawskills%3Ayouna12345~pdf-convert-compdf/file -o pdf-convert-compdf.md

Git (get the source from the repository)

git clone https://github.com/openclaw/skills.git
git -C skills checkout 026369ca561dea13653939ef7441a71db6326fc8

# PDF Converter

## Purpose
- Wraps the `ComPDFKitConversion` Python SDK into a reusable local conversion workflow, supporting PDF / image to Word, PPT, Excel, HTML, RTF, Image, TXT, JSON, Markdown, and CSV (10 output formats in total).

## Agent Skills Standard Compatibility
- This Skill uses an Anthropic Agent Skills-compatible directory structure: `pdf-convert-compdf/`.
- The entry point is `SKILL.md`; helper scripts are placed in `scripts/`.
- The document uses `$ARGUMENTS` and `${CLAUDE_SKILL_DIR}` conventions for distribution and execution in Claude Code / Agent Skills-compatible environments.

## Input / Output
- Input: The target format (`word`/`excel`/`ppt`/`html`/`rtf`/`image`/`txt`/`json`/`markdown`/`csv`), the PDF or image path, and the output path are passed via Skill arguments or the command line. An optional PDF password and conversion parameters may also be provided.
- Supported input file types:
  - PDF files (`.pdf`)
  - Image files (`.jpg`/`.jpeg`/`.png`/`.bmp`/`.tif`/`.tiff`/`.webp`/`.jp2`/`.gif`/`.tga`)
- Output: A file in the corresponding format (`.docx`, `.pptx`, `.xlsx`, `.html`, `.rtf`, image, `.txt`, `.json`, `.md`, `.csv`), or a clear error message.

## Prerequisites
- Supports Windows and macOS.
- The conversion SDK must be installed first:
  ```bash
  pip install ComPDFKitConversion
  ```
- On first run, the script automatically downloads `license.xml` from the ComPDF server and caches it in the `scripts/` directory:
  ```text
  https://download.compdf.com/skills/license/license.xml
  ```
- The script reads the `<key>...</key>` field from `license.xml` and passes that key to `LibraryManager.license_verify(...)` for authentication; it does not pass the XML file path directly to the SDK.
- To use a custom license, place your own `license.xml` in the `scripts/` directory; the script will use it directly without downloading.
- During SDK initialization, the `resource` directory is always set to the directory containing `pdf-convert-compdf.py`, i.e., the `scripts/` directory itself.
- When OCR (`--enable-ocr`) or AI layout analysis (`--enable-ai-layout`, enabled by default) is active, the Skill also requires `scripts/documentai.model`. If the file does not exist, the script automatically downloads it from:
  ```text
  https://download.compdf.com/skills/model/documentai.model
  ```
- To reuse an existing model file, you can override the default model path via an environment variable:
  ```bash
  export COMPDF_DOCUMENT_AI_MODEL="/path/to/documentai.model"
  ```
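
The key extraction described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `read_license_key` is hypothetical, and the layout of `license.xml` beyond the presence of a `<key>` element is an assumption, so the sketch searches for `<key>` at any depth.

```python
import xml.etree.ElementTree as ET


def read_license_key(xml_text: str) -> str:
    """Return the text of the first non-empty <key> element in the XML."""
    root = ET.fromstring(xml_text)
    # iter() visits the root itself and every descendant, so the sketch
    # does not depend on where <key> sits in the (assumed) document layout.
    for element in root.iter("key"):
        if element.text and element.text.strip():
            return element.text.strip()
    raise ValueError("license.xml does not contain a non-empty <key> element")


# Hypothetical file content, for illustration only.
sample = "<license><key>EXAMPLE-KEY</key></license>"
print(read_license_key(sample))
```

The returned string is what would then be handed to `LibraryManager.license_verify(...)`; the remaining arguments of that call are not described in this document and are left out here.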

## Workflow
1. Confirm the Python package is installed:
   ```bash
   python -m pip show ComPDFKitConversion
   ```
2. The script automatically downloads `license.xml` on first run; the `scripts/` directory is used directly as the SDK `resource` path.
3. In Agent Skills / Claude Code environments, prefer using the Skill's built-in script path variable:
   ```bash
   python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" word input.pdf output.docx
   python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" ppt input.pdf output.pptx
   python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" excel input.pdf output.xlsx
   ```
4. For more control, append common parameters:
   ```bash
   python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" excel input.pdf output.xlsx --page-ranges "1-3,5" --excel-all-content --excel-worksheet-option for-page
   python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" word input.pdf output.docx --enable-ocr --page-layout-mode flow
   ```
5. On startup, the script ensures `scripts/license.xml` exists (downloading it automatically from the ComPDF server if missing), reads the `<key>` field for SDK authentication, and uses the `scripts/` directory as the `resource` path.
6. If OCR (`--enable-ocr`) or AI layout analysis (`--enable-ai-layout`, enabled by default) is active, the script checks whether `scripts/documentai.model` exists; if not, it downloads the file automatically before initializing the Document AI model.
7. Check the return code; if it is not `SUCCESS`, handle license, password, resource, model, or input file issues according to the error name.
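
When the workflow is driven from Python rather than a shell, steps 3, 4, and 7 might look like the sketch below. The helper names `build_command` and `run_conversion` are hypothetical. How the script surfaces a non-`SUCCESS` result (exit status, message text) is not specified in this document, so the sketch only captures the exit status and both output streams for the caller to inspect.

```python
import subprocess
import sys
from pathlib import Path


def build_command(skill_dir, fmt, input_path, output_path, extra_args=()):
    """Assemble the argv for scripts/pdf-convert-compdf.py."""
    script = Path(skill_dir) / "scripts" / "pdf-convert-compdf.py"
    return [sys.executable, str(script), fmt,
            str(input_path), str(output_path), *extra_args]


def run_conversion(command):
    """Run the script and return (exit_status, stdout, stderr)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


command = build_command(
    "/path/to/pdf-convert-compdf",  # stands in for ${CLAUDE_SKILL_DIR}
    "word", "input.pdf", "output.docx",
    ["--enable-ocr", "--page-layout-mode", "flow"],
)
print(command[2:])
```

Passing the arguments as a list avoids shell quoting issues with values such as `--page-ranges "1-3,5"`.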

## documentai.model Download Optimization
- The script preferentially uses the model file pointed to by `COMPDF_DOCUMENT_AI_MODEL`.
- The default model path is `scripts/documentai.model`.
- During automatic download, the file is first written to `documentai.model.part` and then atomically renamed to the final file upon success, preventing partial file corruption.
- On download failure, the script retries automatically with back-off intervals of `2s / 5s / 10s`.
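
The behaviour listed above (environment override, `.part` file, atomic rename, retries at 2s / 5s / 10s) can be sketched like this. It is an illustration under assumptions, not the script's code: the function names are hypothetical, the sketch makes one initial attempt followed by three retries, and it reads the whole download into memory where a real implementation would more likely stream it.

```python
import os
import time
import urllib.request
from pathlib import Path

MODEL_URL = "https://download.compdf.com/skills/model/documentai.model"
RETRY_DELAYS = (2, 5, 10)  # seconds between attempts, as listed above


def resolve_model_path(scripts_dir, environ=os.environ):
    """Prefer COMPDF_DOCUMENT_AI_MODEL, else scripts/documentai.model."""
    override = environ.get("COMPDF_DOCUMENT_AI_MODEL")
    return Path(override) if override else Path(scripts_dir) / "documentai.model"


def fetch_url(url):
    """Download the URL and return its body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_atomically(url, target, fetch=fetch_url, sleep=time.sleep):
    """Write to <target>.part, then rename onto <target> once complete."""
    target = Path(target)
    part = target.with_name(target.name + ".part")
    last_error = None
    for delay in (0, *RETRY_DELAYS):
        if delay:
            sleep(delay)  # back off before each retry
        try:
            part.write_bytes(fetch(url))
            os.replace(part, target)  # atomic when both paths share a filesystem
            return target
        except OSError as error:  # urllib's URLError is a subclass of OSError
            last_error = error
            part.unlink(missing_ok=True)  # never leave a partial file behind
    raise RuntimeError(f"download failed after retries: {last_error}")
```

A caller would combine the two helpers, for example `path = resolve_model_path(scripts_dir)` followed by `download_atomically(MODEL_URL, path)` only when `path` does not exist. The `fetch` and `sleep` parameters exist so the retry logic can be exercised without network access or real delays.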

## Invoking Directly as a Skill
- In environments that support Agent Skills, the Skill can be called directly:
  ```text
  /pdf-convert-compdf word input.pdf output.docx
  /pdf-convert-compdf excel input.pdf output.xlsx --excel-worksheet-option for-page
  ```
- When the Skill receives arguments, it passes them through to the script as-is:
  ```bash
  python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" $ARGUMENTS
  ```
- If the environment does not support direct Skill invocation, fall back to a regular command-line call.

## Supported Output Formats
- `word` → calls `CPDFConversion.start_pdf_to_word`
- `excel` → calls `CPDFConversion.start_pdf_to_excel`
- `ppt` → calls `CPDFConversion.start_pdf_to_ppt`
- `html` → calls `CPDFConversion.start_pdf_to_html`
- `rtf` → calls `CPDFConversion.start_pdf_to_rtf`
- `image` → calls `CPDFConversion.start_pdf_to_image`
- `txt` → calls `CPDFConversion.start_pdf_to_txt`
- `json` → calls `CPDFConversion.start_pdf_to_json`
- `markdown` → calls `CPDFConversion.start_pdf_to_markdown`
- `csv` → reuses `CPDFConversion.start_pdf_to_excel` with table/Excel parameters to produce CSV-friendly output
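
One way to express this mapping is a lookup table keyed by the `format` argument. The sketch below stores only the method names given in the list above and does not import the SDK; the names `CONVERSION_METHODS` and `conversion_method_name` are hypothetical, and how the actual script dispatches is not shown in this document.

```python
# Format keyword -> name of the CPDFConversion method listed above.
CONVERSION_METHODS = {
    "word": "start_pdf_to_word",
    "excel": "start_pdf_to_excel",
    "ppt": "start_pdf_to_ppt",
    "html": "start_pdf_to_html",
    "rtf": "start_pdf_to_rtf",
    "image": "start_pdf_to_image",
    "txt": "start_pdf_to_txt",
    "json": "start_pdf_to_json",
    "markdown": "start_pdf_to_markdown",
    "csv": "start_pdf_to_excel",  # CSV reuses the Excel conversion
}


def conversion_method_name(fmt):
    """Return the SDK method name for a format, or raise ValueError."""
    try:
        return CONVERSION_METHODS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None


print(conversion_method_name("csv"))
```

With the SDK installed, the method could presumably be resolved with `getattr(CPDFConversion, conversion_method_name(fmt))`; the arguments each method expects are not covered here.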

## Input Source Types
- The script supports **PDF and image** as input sources. The SDK's `start_pdf_to_*` interfaces natively accept image files with no pre-processing required.
- By default, the script auto-detects the input type from the file extension:
  - `.pdf` → `pdf`
  - `.png/.jpg/.jpeg/.bmp/.tif/.tiff/.gif/.webp/.tga` → `image`
- You can also specify the source type explicitly:
  ```bash
  python "${CLAUDE_SKILL_DIR}/scripts/pdf-convert-compdf.py" word input.png output.docx --source-type image
  ```
- `image -> *` and `pdf -> *` share the same set of `CPDFConversion.start_pdf_to_*` interfaces; only the input file type differs.
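
Extension-based detection along these lines can be sketched as follows. The function name `detect_source_type` is hypothetical, and two details are assumptions rather than documented behaviour: extensions are matched case-insensitively, and an unrecognized extension raises an error.

```python
from pathlib import Path

# Image extensions from the auto-detection list above.
IMAGE_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".gif", ".webp", ".tga",
}


def detect_source_type(path, source_type="auto"):
    """Return 'pdf' or 'image'; an explicit source_type wins over detection."""
    if source_type != "auto":
        return source_type
    suffix = Path(path).suffix.lower()  # assumption: case-insensitive match
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    # Assumption: the real script's handling of unknown extensions may differ.
    raise ValueError(f"cannot detect source type from extension {suffix!r}")


print(detect_source_type("scan.PNG"))
print(detect_source_type("report.pdf"))
```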

## Smart Defaults
The script automatically adjusts certain parameters based on the input source and output format to reduce manual configuration:

| Trigger | Automatic Behavior | User-Overridable | Description |
|----------|----------|-------------|------|
| Input source is an **image** (auto-detected or explicit `--source-type image`) | Automatically enables `--enable-ocr` | No (`--enable-ocr` uses `store_true`; there is no `--no-enable-ocr`) | Text in images must be extracted via OCR; without OCR, output will contain only images and no text |
| Output format is **HTML** (`format = html`) | Automatically sets `--page-layout-mode` to `box` (box layout) | Yes: passing `--page-layout-mode flow` explicitly overrides this | Box layout better preserves the original formatting in HTML; specify `flow` explicitly if flow layout is needed |

When triggered, the script prints a notice to `stderr`, for example:
```text
Auto-enabled OCR for image input.
Auto-set page layout mode to BOX for HTML output.
```
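
The two rules in the table can be sketched as a small function. The name `apply_smart_defaults` is hypothetical, and the sketch assumes that "not set by the user" is represented as `None` for `--page-layout-mode`; the script's internal representation is not documented here.

```python
import sys


def apply_smart_defaults(source_type, fmt, enable_ocr=False, page_layout_mode=None):
    """Return (enable_ocr, page_layout_mode) after applying the smart defaults."""
    if source_type == "image" and not enable_ocr:
        enable_ocr = True  # image text is only recoverable through OCR
        print("Auto-enabled OCR for image input.", file=sys.stderr)
    if fmt == "html" and page_layout_mode is None:
        page_layout_mode = "box"  # an explicit value from the user is kept
        print("Auto-set page layout mode to BOX for HTML output.", file=sys.stderr)
    return enable_ocr, page_layout_mode


print(apply_smart_defaults("image", "html"))
print(apply_smart_defaults("pdf", "html", page_layout_mode="flow"))
```

Because OCR is a `store_true` flag, the first rule can only ever switch it on, which matches the "No" in the User-Overridable column.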

## All Parameters

### Positional Parameters
| Parameter | Description |
|------|------|
| `format` | Target format: `word`/`excel`/`ppt`/`html`/`rtf`/`image`/`txt`/`json`/`markdown`/`csv` |
| `input_pdf` | Input file path (PDF or image) |
| `output_path` | Output file path |
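
For orientation, the three positional parameters could be declared with `argparse` roughly as below. This is a sketch that assumes the script uses `argparse` (the `store_true` wording earlier suggests it does); `build_parser` is a hypothetical name and the help strings are paraphrased from the table.

```python
import argparse

FORMATS = ["word", "excel", "ppt", "html", "rtf",
           "image", "txt", "json", "markdown", "csv"]


def build_parser():
    """Build a parser for the three positional parameters."""
    parser = argparse.ArgumentParser(prog="pdf-convert-compdf.py")
    parser.add_argument("format", choices=FORMATS, help="Target format")
    parser.add_argument("input_pdf", help="Input file path (PDF or image)")
    parser.add_argument("output_path", help="Output file path")
    return parser


args = build_parser().parse_args(["word", "input.pdf", "output.docx"])
print(args.format, args.input_pdf, args.output_path)
```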

### General Parameters
| Parameter | Type | Default | Description |
|------|------|--------|------|
| `--source-type` | Option | `auto` | Input source type: `auto`/`pdf`/`image` |
| `--password` | String | `""` | PDF open password |
| `--page-ranges` | String | None | Page range, e.g. `1-3,5` |
| `--font-name` | String | `""` | Output font n