markdown-converter
使用 markitdown 将文档和文件转换为 Markdown。将 PDF、Word (.docx)、PowerPoint (.pptx)、Excel(.xlsx、.xls)、HTML、CSV、JSON、XML、图像(带 EXIF/OCR)、音频(带转录)、ZIP 存档、YouTube URL 或 EPub 转换为 Markdown 格式以进行 LLM 处理或文本分析时使用。
安装 / 下载方式
TotalClaw CLI推荐
totalclaw install totalclaw:totalclaw~steipete-markdown-convertercURL直接下载,无需登录
curl -fsSL https://skills.taituai.com/api/skills/totalclaw%3Atotalclaw~steipete-markdown-converter/file -o steipete-markdown-converter.md## 概述(中文) 使用 markitdown 将文档和文件转换为 Markdown。将 PDF、Word (.docx)、PowerPoint (.pptx)、Excel(.xlsx、.xls)、HTML、CSV、JSON、XML、图像(带 EXIF/OCR)、音频(带转录)、ZIP 存档、YouTube URL 或 EPub 转换为 Markdown 格式以进行 LLM 处理或文本分析时使用。 ## 原文 # Markdown Converter Convert files to Markdown using `uvx markitdown` — no installation required. ## Basic Usage ```bash # Convert to stdout uvx markitdown input.pdf # Save to file uvx markitdown input.pdf -o output.md uvx markitdown input.docx > output.md # From stdin cat input.pdf | uvx markitdown ``` ## Supported Formats - **Documents**: PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls) - **Web/Data**: HTML, CSV, JSON, XML - **Media**: Images (EXIF + OCR), Audio (EXIF + transcription) - **Other**: ZIP (iterates contents), YouTube URLs, EPub ## Options ```bash -o OUTPUT # Output file -x EXTENSION # Hint file extension (for stdin) -m MIME_TYPE # Hint MIME type -c CHARSET # Hint charset (e.g., UTF-8) -d # Use Azure Document Intelligence -e ENDPOINT # Document Intelligence endpoint --use-plugins # Enable 3rd-party plugins --list-plugins # Show installed plugins ``` ## Examples ```bash # Convert Word document uvx markitdown report.docx -o report.md # Convert Excel spreadsheet uvx markitdown data.xlsx > data.md # Convert PowerPoint presentation uvx markitdown slides.pptx -o slides.md # Convert with file type hint (for stdin) cat document | uvx markitdown -x .pdf > output.md # Use Azure Document Intelligence for better PDF extraction uvx markitdown scan.pdf -d -e "https://your-resource.cognitiveservices.azure.com/" ``` ## Notes - Output preserves document structure: headings, tables, lists, links - First run caches dependencies; subsequent runs are faster - For complex PDFs with poor extraction, use `-d` with Azure Document Intelligence