Convert complex PDFs into LLM-friendly Markdown/JSON with MinerU — extracting text, tables, formulas, and images, with pipeline/VLM/hybrid backends for RAG-ready document data.
---
name: Geek-skills-mineru-pdf-parser
description: Convert complex PDFs to LLM-friendly Markdown/JSON with MinerU — extract text, tables, formulas, and images. Use to parse academic papers, technical docs, or reports, or to prep data for RAG.
---
Turns complex PDFs into machine-readable Markdown/JSON using the MinerU tool.
## How to use
1. Install with uv: `uv pip install -U "mineru[all]"`, then run `mineru-models-download` on first use.
2. Parse from the CLI: `mineru -p input.pdf -o output_dir`. To parse in batch, pass a folder instead of a single file. Pick a backend with `--backend`: `pipeline` (fast), `vlm` (high accuracy for formulas and complex layouts), or `hybrid` (balanced).
3. Or use the Python API: `MinerU().parse("document.pdf")` then `.to_markdown()` / `.to_json()`.
4. Outputs include `{filename}.md` and a `{filename}_content_list.json`; see `references/api_reference.md` for the full API.
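
The CLI steps above can be scripted. The following is a minimal sketch, not the skill's own code: it wraps the `mineru -p ... -o ... --backend ...` call from step 2 with `subprocess` and then looks up the two output files named in step 4. The helper names (`build_mineru_command`, `run_mineru`, `find_outputs`, `count_block_types`) are hypothetical. It also assumes that the content list is a JSON array of objects that may carry a `"type"` key, and it searches the output folder recursively because the exact nesting of result folders is not stated here.

```python
import json
import subprocess
from collections import Counter
from pathlib import Path


def build_mineru_command(pdf_path, output_dir, backend=None):
    """Build the CLI call from step 2; the backend flag is added only if given."""
    cmd = ["mineru", "-p", str(pdf_path), "-o", str(output_dir)]
    if backend is not None:
        cmd += ["--backend", backend]
    return cmd


def run_mineru(pdf_path, output_dir, backend=None):
    """Run the CLI; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_mineru_command(pdf_path, output_dir, backend), check=True)


def find_outputs(output_dir, stem):
    """Locate {stem}.md and {stem}_content_list.json anywhere under output_dir.

    The search is recursive because the folder layout is an unknown here.
    Returns (markdown_path, content_list_path); either may be None.
    """
    root = Path(output_dir)
    markdown = sorted(root.rglob(f"{stem}.md"))
    content_list = sorted(root.rglob(f"{stem}_content_list.json"))
    return (markdown[0] if markdown else None,
            content_list[0] if content_list else None)


def count_block_types(content_list_path):
    """Count entries per "type" value (assumed schema: a list of JSON objects)."""
    blocks = json.loads(Path(content_list_path).read_text(encoding="utf-8"))
    return Counter(b.get("type", "unknown") for b in blocks if isinstance(b, dict))
```

A typical call sequence would be `run_mineru("input.pdf", "output_dir", backend="pipeline")`, followed by `find_outputs("output_dir", "input")` and `count_block_types(...)` on the returned JSON path. Keeping the command builder separate from the runner makes the argument list easy to inspect or log before anything is executed.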
Full skill & source: https://github.com/staruhub/ClaudeSkills/tree/9ed9d5c2d1ded8d2b401bf3eac09168d62f44bbd/skills/Geek-skills-mineru-pdf-parser