markdown-converter

TotalClaw, author: totalclaw

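Convert documents and files to Markdown with markitdown. Use it when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, YouTube URLs, or EPub to Markdown for LLM processing or text analysis.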

Installation / Download

TotalClaw CLI (recommended)

totalclaw install totalclaw:totalclaw~steipete-markdown-converter

cURL (direct download, no login required)

curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~steipete-markdown-converter/file -o steipete-markdown-converter.md

## Original text

# Markdown Converter

Convert files to Markdown using `uvx markitdown`. `uvx` fetches and runs markitdown on demand, so with uv available no separate markitdown installation is required.

## Basic Usage

```bash
# Convert to stdout
uvx markitdown input.pdf

# Save to file
uvx markitdown input.pdf -o output.md
uvx markitdown input.docx > output.md

# From stdin
cat input.pdf | uvx markitdown
```

## Supported Formats

- **Documents**: PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls)
- **Web/Data**: HTML, CSV, JSON, XML
- **Media**: Images (EXIF + OCR), Audio (EXIF + transcription)
- **Other**: ZIP (iterates contents), YouTube URLs, EPub

## Options

```bash
-o OUTPUT      # Output file
-x EXTENSION   # Hint file extension (for stdin)
-m MIME_TYPE   # Hint MIME type
-c CHARSET     # Hint charset (e.g., UTF-8)
-d             # Use Azure Document Intelligence
-e ENDPOINT    # Document Intelligence endpoint
--use-plugins  # Enable 3rd-party plugins
--list-plugins # Show installed plugins
```
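
The hint options matter most when input arrives on stdin, where there is no filename to infer the type from. The following is a minimal sketch, assuming the page is UTF-8 HTML; `url_to_markdown` is a hypothetical helper name, and the URL in the usage line is a placeholder.

```bash
# Hypothetical helper: fetch a web page and convert it from stdin.
# Assumption: the page is UTF-8 encoded HTML, so the hints below are valid.
url_to_markdown() {
  local url="$1" output="$2"
  (
    # Run in a subshell so pipefail does not leak into the caller's shell.
    # With pipefail, a failed download makes the whole pipeline fail.
    set -o pipefail
    curl -fsSL "$url" \
      | uvx markitdown -x .html -m text/html -c UTF-8 > "$output"
  )
}
```

Usage might look like `url_to_markdown "https://example.com/" page.md`. Passing both `-x` and `-m` is redundant for plain HTML; either hint alone is likely enough, and both are shown only to illustrate the flags.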

## Examples

```bash
# Convert Word document
uvx markitdown report.docx -o report.md

# Convert Excel spreadsheet
uvx markitdown data.xlsx > data.md

# Convert PowerPoint presentation
uvx markitdown slides.pptx -o slides.md

# Convert with file type hint (for stdin)
cat document | uvx markitdown -x .pdf > output.md

# Use Azure Document Intelligence for better PDF extraction
uvx markitdown scan.pdf -d -e "https://your-resource.cognitiveservices.azure.com/"
```
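
The single-file examples above extend naturally to a whole directory. The following is a sketch under stated assumptions: `convert_dir` is a hypothetical helper, the extension list is copied from the Supported Formats section, and matching is case-sensitive (a file named `REPORT.PDF` would be skipped).

```bash
# Hypothetical helper: convert every supported file in SRC_DIR to Markdown.
# Output files keep the full original name plus ".md" (report.docx.md),
# so report.docx and report.pdf do not overwrite each other.
convert_dir() {
  local src_dir="$1" out_dir="$2"
  local f
  mkdir -p "$out_dir"
  for f in "$src_dir"/*; do
    [ -f "$f" ] || continue          # skip directories and unmatched globs
    case "${f##*.}" in
      pdf|docx|pptx|xlsx|xls|html|csv|json|xml|epub)
        # -o writes the Markdown to a file instead of stdout
        if ! uvx markitdown "$f" -o "$out_dir/$(basename "$f").md"; then
          echo "failed: $f" >&2      # report and continue with the next file
        fi
        ;;
    esac
  done
}
```

Usage might look like `convert_dir ./docs ./docs-md`. Each file starts a separate `uvx` process, which is simple but not fast for large directories.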

## Notes

- Output preserves document structure: headings, tables, lists, and links
- The first run downloads and caches dependencies, so subsequent runs start faster
- For complex PDFs that extract poorly, use `-d` with Azure Document Intelligence
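
One way to act on the last note is to try the plain conversion first and retry with `-d` only when the result looks empty. This is a rough sketch, not a markitdown feature: `convert_pdf_with_fallback`, `MIN_CHARS`, and `DOCINTEL_ENDPOINT` are hypothetical names, the 200-character threshold is an arbitrary assumption, and authentication for the Azure resource is out of scope here.

```bash
# Hypothetical helper: convert a PDF, retrying with Document Intelligence
# when the plain conversion yields almost no text.
# MIN_CHARS and DOCINTEL_ENDPOINT are assumed names, not markitdown settings.
convert_pdf_with_fallback() {
  local input="$1" output="$2"
  local min_chars="${MIN_CHARS:-200}"
  local chars

  uvx markitdown "$input" -o "$output" || return 1

  # Count non-whitespace characters as a crude proxy for extraction quality.
  chars="$(tr -d '[:space:]' < "$output" | wc -c | tr -d ' ')"
  if [ "$chars" -lt "$min_chars" ]; then
    if [ -z "${DOCINTEL_ENDPOINT:-}" ]; then
      echo "sparse output and DOCINTEL_ENDPOINT is unset: $input" >&2
      return 2
    fi
    # Retry the same file through Azure Document Intelligence.
    uvx markitdown "$input" -o "$output" -d -e "$DOCINTEL_ENDPOINT"
  fi
}
```

A character count catches scanned PDFs that produce no text at all, but it will not detect garbled tables or misordered columns, so those cases still need a manual check.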