pdf to word
A PDF conversion toolkit with AI layout analysis and OCR. It converts PDF to Word/Docx, Markdown, JSON, PPT, CSV, HTML, and XML for seamless LLM data processing.
Installation / Download
- TotalClaw CLI (recommended): `totalclaw install totalclaw:youna12345~pdf-to-word-docx`
- cURL (direct download, no login required): `curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Ayouna12345~pdf-to-word-docx/file -o pdf-to-word-docx.md`
- Git repository (source code): `git clone https://github.com/openclaw/skills/commit/88ada638b3fffd510922f8295b7715e44f01d4e7`
## Original Text
# pdf to word
## Purpose
- Wraps the `ComPDFKitConversion` Python SDK into a reusable local conversion workflow, supporting PDF / image to Word, PPT, Excel, HTML, RTF, Image, TXT, JSON, Markdown, and CSV (10 output formats in total).
## Agent Skills Standard Compatibility
- This Skill uses an Anthropic Agent Skills-compatible directory structure: `pdf-to-word-docx/`.
- The entry point is `SKILL.md`; helper scripts are placed in `scripts/`.
- The document uses `$ARGUMENTS` and `${CLAUDE_SKILL_DIR}` conventions for distribution and execution in Claude Code / Agent Skills-compatible environments.
## Input / Output
- Input: The target format (`word`/`excel`/`ppt`/`html`/`rtf`/`image`/`txt`/`json`/`markdown`/`csv`), the PDF or image path, and the output path are passed via Skill arguments or the command line. An optional PDF password and conversion parameters may also be provided.
- Supported input file types:
  - PDF files (`.pdf`)
  - Image files (`.jpg`/`.jpeg`/`.png`/`.bmp`/`.tif`/`.tiff`/`.webp`/`.jp2`/`.gif`/`.tga`)
- Output: A file in the corresponding format (`.docx`, `.pptx`, `.xlsx`, `.html`, `.rtf`, image, `.txt`, `.json`, `.md`, `.csv`), or a clear error message.
## Prerequisites
- Supports Windows and macOS.
- The conversion SDK must be installed first:
```bash
pip install ComPDFKitConversion
```
- On first run, the script automatically downloads `license.xml` from the ComPDF server and caches it in the `scripts/` directory:
```text
https://download.compdf.com/skills/license/license.xml
```
- The script reads the `<key>...</key>` field from `license.xml` and uses that key for `LibraryManager.license_verify(...)` authentication — it does not pass the XML file path directly to the SDK.
- To use a custom license, place your own `license.xml` in the `scripts/` directory; the script will use it directly without downloading.
- During SDK initialization, the `resource` directory is always set to the directory containing `pdf-to-word-docx.py`, i.e., the `scripts/` directory itself.
- When `--enable-ocr` or `--enable-ai-layout` (enabled by default) is used, the Skill also requires `scripts/documentai.model`. If the file does not exist, the script will automatically download it from:
```text
https://download.compdf.com/skills/model/documentai.model
```
- To reuse an existing model file, you can override the default model path via an environment variable:
```bash
export COMPDF_DOCUMENT_AI_MODEL="/path/to/documentai.model"
```
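The `<key>` extraction described in the license bullets above can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper name `read_license_key` is hypothetical, and it assumes the `<key>` element carries no XML namespace and may sit at any depth in the file.

```python
import xml.etree.ElementTree as ET


def read_license_key(xml_text: str) -> str:
    """Return the text of the first <key> element in a license.xml document."""
    root = ET.fromstring(xml_text)
    # The root itself may be <key>, or <key> may be nested at any depth.
    node = root if root.tag == "key" else root.find(".//key")
    if node is None or not (node.text or "").strip():
        raise ValueError("license.xml does not contain a non-empty <key> element")
    return node.text.strip()


# Hypothetical file content; a real license.xml carries more fields.
sample = "<license><key>EXAMPLE-KEY</key></license>"
print(read_license_key(sample))  # this string, not the file path, goes to license_verify
```

Raising on a missing or empty key keeps a malformed license file from surfacing later as an opaque authentication failure.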
## Workflow
1. Confirm the Python package is installed:
```bash
python -m pip show ComPDFKitConversion
```
2. The script automatically downloads `license.xml` on first run; the `scripts/` directory is used directly as the SDK `resource` path.
3. In Agent Skills / Claude Code environments, prefer using the Skill's built-in script path variable:
```bash
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" word input.pdf output.docx
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" ppt input.pdf output.pptx
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" excel input.pdf output.xlsx
```
4. For more control, append common parameters:
```bash
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" excel input.pdf output.xlsx --page-ranges "1-3,5" --excel-all-content --excel-worksheet-option for-page
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" word input.pdf output.docx --enable-ocr --page-layout-mode flow
```
5. On startup, the script ensures `scripts/license.xml` exists (downloading it automatically from the ComPDF server if missing), reads the `<key>` field for SDK authentication, and uses the `scripts/` directory as the `resource` path.
6. If `--enable-ocr` or `--enable-ai-layout` (enabled by default) is active, the script checks whether `scripts/documentai.model` exists; if not, it downloads the file automatically before initializing the Document AI model.
7. Check the return code; if it is not `SUCCESS`, handle license, password, resource, model, or input file issues according to the error name.
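When the workflow above is driven from Python rather than a shell, the command line can be assembled and checked as in the sketch below. The helper names `build_command` and `run_conversion` are hypothetical, and the sketch assumes the script exits with a non-zero status and prints the error name when the SDK result is not `SUCCESS`; the source does not state how the result is reported to the caller.

```python
import subprocess
import sys
from pathlib import Path


def build_command(skill_dir, fmt, input_path, output_path, *extra):
    """Assemble the argv list for scripts/pdf-to-word-docx.py."""
    script = Path(skill_dir) / "scripts" / "pdf-to-word-docx.py"
    return [sys.executable, str(script), fmt, str(input_path), str(output_path), *extra]


def run_conversion(cmd):
    """Run the command and return (ok, diagnostics).

    Assumption: a non-zero exit status means the conversion did not succeed,
    and the error name is written to stderr or stdout.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, (proc.stderr or proc.stdout).strip()


# Build (but do not run) the Excel example from step 4.
cmd = build_command(
    "/path/to/pdf-to-word-docx", "excel", "input.pdf", "output.xlsx",
    "--page-ranges", "1-3,5", "--excel-worksheet-option", "for-page",
)
print(cmd[2:])
```

Passing the arguments as a list avoids shell quoting problems with values such as `1-3,5` or paths that contain spaces.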
## documentai.model Download Optimization
- The script preferentially uses the model file pointed to by `COMPDF_DOCUMENT_AI_MODEL`.
- The default model path is `scripts/documentai.model`.
- During automatic download, the file is first written to `documentai.model.part` and then atomically renamed to the final file upon success, preventing partial file corruption.
- On download failure, the script retries automatically with back-off intervals of `2s / 5s / 10s`.
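The path resolution and download behavior listed above can be sketched as follows. This is an illustration under stated assumptions, not the script's code: the function names are hypothetical, and it assumes one initial attempt followed by one retry after each listed delay (four attempts in total).

```python
import os
import time
import urllib.request
from pathlib import Path

MODEL_URL = "https://download.compdf.com/skills/model/documentai.model"
RETRY_DELAYS = (2, 5, 10)  # seconds, the back-off schedule listed above


def resolve_model_path(scripts_dir, env=os.environ):
    """Prefer COMPDF_DOCUMENT_AI_MODEL, else scripts/documentai.model."""
    override = env.get("COMPDF_DOCUMENT_AI_MODEL")
    return Path(override) if override else Path(scripts_dir) / "documentai.model"


def fetch(url, dest):
    """Default downloader: stream the URL to dest in 1 MiB chunks."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        while chunk := resp.read(1 << 20):
            out.write(chunk)


def ensure_model(target, download=fetch, sleep=time.sleep):
    """Download to <target>.part, then rename it into place on success."""
    target = Path(target)
    if target.exists():
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    part = target.with_name(target.name + ".part")
    # Assumption: one initial attempt, then one retry after each delay.
    for delay in (*RETRY_DELAYS, None):
        try:
            download(MODEL_URL, part)
            os.replace(part, target)  # atomic when both paths share a filesystem
            return target
        except OSError:
            part.unlink(missing_ok=True)  # never leave a partial file behind
            if delay is None:
                raise
            sleep(delay)
```

A caller would run `ensure_model(resolve_model_path(scripts_dir))` before initializing the Document AI model. The `download` and `sleep` parameters exist only so the retry logic can be exercised without network access.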
## Invoking Directly as a Skill
- In environments that support Agent Skills, the Skill can be called directly:
```text
/pdf-to-word-docx word input.pdf output.docx
/pdf-to-word-docx excel input.pdf output.xlsx --excel-worksheet-option for-page
```
- When the Skill receives arguments, it passes them through to the script as-is:
```bash
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" $ARGUMENTS
```
- If the environment does not support direct Skill invocation, fall back to a regular command-line call.
## Supported Output Formats
- `word` → calls `CPDFConversion.start_pdf_to_word`
- `excel` → calls `CPDFConversion.start_pdf_to_excel`
- `ppt` → calls `CPDFConversion.start_pdf_to_ppt`
- `html` → calls `CPDFConversion.start_pdf_to_html`
- `rtf` → calls `CPDFConversion.start_pdf_to_rtf`
- `image` → calls `CPDFConversion.start_pdf_to_image`
- `txt` → calls `CPDFConversion.start_pdf_to_txt`
- `json` → calls `CPDFConversion.start_pdf_to_json`
- `markdown` → calls `CPDFConversion.start_pdf_to_markdown`
- `csv` → reuses `CPDFConversion.start_pdf_to_excel` with table/Excel parameters to produce CSV-friendly output
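One way to express the mapping above is a lookup table from format name to method name. The sketch below is hypothetical: it resolves names with `getattr` and uses a stand-in class so that it runs without the SDK, and it makes no claim about the real method signatures.

```python
# Format name -> CPDFConversion method name, as listed above.
CONVERTERS = {
    "word": "start_pdf_to_word",
    "excel": "start_pdf_to_excel",
    "ppt": "start_pdf_to_ppt",
    "html": "start_pdf_to_html",
    "rtf": "start_pdf_to_rtf",
    "image": "start_pdf_to_image",
    "txt": "start_pdf_to_txt",
    "json": "start_pdf_to_json",
    "markdown": "start_pdf_to_markdown",
    "csv": "start_pdf_to_excel",  # CSV reuses the Excel entry point
}


def resolve_converter(conversion_cls, fmt):
    """Look up the conversion method for a format name."""
    try:
        method_name = CONVERTERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return getattr(conversion_cls, method_name)


class FakeConversion:
    """Stand-in for CPDFConversion so the sketch runs without the SDK."""

    @staticmethod
    def start_pdf_to_excel(*args, **kwargs):
        return "excel called"


print(resolve_converter(FakeConversion, "csv")())
```

Keeping the table as data makes the CSV special case explicit: it differs from `excel` only in the parameters passed, not in the entry point.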
## Input Source Types
- The script supports **PDF and image** as input sources. The SDK's `start_pdf_to_*` interfaces natively accept image files with no pre-processing required.
- By default, the script auto-detects the input type from the file extension:
  - `.pdf` → `pdf`
  - `.png/.jpg/.jpeg/.bmp/.tif/.tiff/.gif/.webp/.tga` → `image`
- You can also specify the source type explicitly:
```bash
python "${CLAUDE_SKILL_DIR}/scripts/pdf-to-word-docx.py" word input.png output.docx --source-type image
```
- `image -> *` and `pdf -> *` share the same set of `CPDFConversion.start_pdf_to_*` interfaces; only the input file type differs.
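The extension-based detection in this section can be sketched as below. The function name `detect_source_type` is hypothetical, and two behaviors are assumptions rather than documented facts: matching is case-insensitive, and an unrecognized extension raises an error instead of falling back to `pdf`.

```python
from pathlib import Path

# Extensions taken from the auto-detection list in this section.
IMAGE_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".gif", ".webp", ".tga",
}


def detect_source_type(path, source_type="auto"):
    """Return 'pdf' or 'image'; an explicit --source-type value wins."""
    if source_type != "auto":
        return source_type
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    # Assumption: unknown extensions are rejected rather than guessed.
    raise ValueError(f"cannot infer source type from extension {suffix!r}")


print(detect_source_type("scan.PNG"))
print(detect_source_type("report.pdf"))
```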
## Smart Defaults
The script automatically adjusts certain parameters based on the input source and output format to reduce manual configuration:
| Trigger | Automatic Behavior | User-Overridable | Description |
|----------|----------|-------------|------|
| Input source is an **image** (auto-detected or explicit `--source-type image`) | Automatically enables `--enable-ocr` | No (`--enable-ocr` uses `store_true`; there is no `--no-enable-ocr`) | Text in images must be extracted via OCR; without OCR, output will contain only images and no text |
| Output format is **HTML** (`format = html`) | Automatically sets `--page-layout-mode` to `box` (box layout) | Yes — passing `--page-layout-mode flow` explicitly overrides this | Box layout better preserves the original formatting in HTML; specify `flow` explicitly if flow layout is needed |
When triggered, the script prints a notice to `stderr`, for example:
```text
Auto-enabled OCR for image input.
Auto-set page layout mode to BOX for HTML output.
```
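The two rules in the table can be sketched as a small function. This is a hypothetical illustration: it assumes the script detects an explicit `--page-layout-mode` by giving the option a default of `None`, and it assumes the OCR notice is printed only when OCR was not already requested.

```python
import sys


def apply_smart_defaults(fmt, source_type, enable_ocr=False, page_layout_mode=None):
    """Return (enable_ocr, page_layout_mode) after applying both rules.

    page_layout_mode=None means the user did not pass --page-layout-mode.
    """
    # Rule 1: image input always needs OCR, and it cannot be switched off.
    if source_type == "image" and not enable_ocr:
        enable_ocr = True
        print("Auto-enabled OCR for image input.", file=sys.stderr)
    # Rule 2: HTML output defaults to box layout unless the user chose a mode.
    if fmt == "html" and page_layout_mode is None:
        page_layout_mode = "box"
        print("Auto-set page layout mode to BOX for HTML output.", file=sys.stderr)
    return enable_ocr, page_layout_mode


print(apply_smart_defaults("html", "image"))
print(apply_smart_defaults("html", "pdf", page_layout_mode="flow"))
```

The `None` sentinel is what makes the HTML rule overridable: an explicit `flow` is distinguishable from "not specified", whereas a `store_true` flag such as `--enable-ocr` has no way to express "explicitly off".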
## All Parameters
### Positional Parameters
| Parameter | Description |
|------|------|
| `format` | Target format: `word`/`excel`/`ppt`/`html`/`rtf`/`image`/`txt`/`json`/`markdown`/`csv` |
| `input_pdf` | Input file path (PDF or image) |
| `output_path` | Output file path |
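The three positional parameters could be declared with `argparse` roughly as follows. This is a sketch, assuming the script uses `argparse` (its `store_true` flags suggest so) and validates `format` through `choices`; the real parser may differ.

```python
import argparse

FORMATS = ["word", "excel", "ppt", "html", "rtf", "image", "txt", "json", "markdown", "csv"]


def build_parser():
    """Parser covering only the positional parameters in the table above."""
    parser = argparse.ArgumentParser(prog="pdf-to-word-docx.py")
    parser.add_argument("format", choices=FORMATS, help="target format")
    parser.add_argument("input_pdf", help="input file path (PDF or image)")
    parser.add_argument("output_path", help="output file path")
    return parser


args = build_parser().parse_args(["word", "input.pdf", "output.docx"])
print(args.format, args.input_pdf, args.output_path)
```

With `choices` in place, an unsupported format is rejected with a usage message before any SDK initialization takes place.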
### General Parameters
| Parameter | Type | Default | Description |
|------|------|--------|------|
| `--source-type` | Option | `auto` | Input source type: `auto`/`pdf`/`image` |
| `--password` | String | `""` | PDF open password |
| `--page-ranges`