Productivity · Verified

MinerU PDF Parser

by staruhub

Convert complex PDFs into LLM-friendly Markdown/JSON with MinerU — extracting text, tables, formulas, and images, with pipeline/VLM/hybrid backends for RAG-ready document data.

mineru, parser, pdf, pdf-and-document-processing

SKILL.md preview

---
name: Geek-skills-mineru-pdf-parser
description: Convert complex PDFs to LLM-friendly Markdown/JSON with MinerU — extract text, tables, formulas, and images. Use to parse academic papers, technical docs, or reports, or to prep data for RAG.
---

Turns complex PDFs into machine-readable Markdown/JSON using the MinerU tool.

## How to use
1. Install with uv: `uv pip install -U "mineru[all]"`, then run `mineru-models-download` on first use.
2. Parse from the CLI: `mineru -p input.pdf -o output_dir` (pass a folder instead of a file to process a batch). Choose a backend with `--backend`: `pipeline` (fast), `vlm` (high accuracy for formulas and complex layouts), or `hybrid` (balanced).
3. Or use the Python API: `MinerU().parse("document.pdf")` then `.to_markdown()` / `.to_json()`.
4. Outputs include `{filename}.md` and `{filename}_content_list.json`; see `references/api_reference.md` for the full API.
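
The CLI steps above can be scripted. The following is a minimal sketch, not part of the skill itself: it builds the `mineru` command from step 2, then looks for the two output files named in step 4. The helper names (`build_mineru_command`, `parse_pdf`, `find_outputs`, `count_block_types`) are hypothetical. It also assumes that the content list is a JSON array of objects with a `type` field, and it searches the output folder recursively because the exact subfolder layout is not stated here.

```python
import json
import subprocess
from collections import Counter
from pathlib import Path

# Backend names as listed in step 2; the installed MinerU version may accept others.
BACKENDS = ("pipeline", "vlm", "hybrid")


def build_mineru_command(input_path, output_dir, backend="pipeline"):
    """Return the argument list for the CLI call described in step 2."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    return ["mineru", "-p", str(input_path), "-o", str(output_dir),
            "--backend", backend]


def parse_pdf(input_path, output_dir, backend="pipeline"):
    """Run the MinerU CLI; requires MinerU to be installed and on PATH."""
    subprocess.run(build_mineru_command(input_path, output_dir, backend),
                   check=True)


def find_outputs(output_dir, stem):
    """Locate `{stem}.md` and `{stem}_content_list.json` under output_dir.

    Searches recursively rather than assuming a fixed subfolder layout.
    Returns a (markdown_path, content_list_path) pair; either may be None.
    """
    root = Path(output_dir)
    md_files = sorted(root.rglob(f"{stem}.md"))
    json_files = sorted(root.rglob(f"{stem}_content_list.json"))
    return (md_files[0] if md_files else None,
            json_files[0] if json_files else None)


def count_block_types(content_list_path):
    """Tally entries by their "type" field (assumed schema: a list of dicts)."""
    items = json.loads(Path(content_list_path).read_text(encoding="utf-8"))
    return Counter(item.get("type", "unknown")
                   for item in items if isinstance(item, dict))
```

A typical call sequence would be `parse_pdf("input.pdf", "output_dir")`, then `find_outputs("output_dir", "input")`, then `count_block_types(...)` on the returned JSON path as a quick check of how many blocks of each type were extracted before feeding the Markdown into a RAG pipeline.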

Full skill & source: https://github.com/staruhub/ClaudeSkills/tree/9ed9d5c2d1ded8d2b401bf3eac09168d62f44bbd/skills/Geek-skills-mineru-pdf-parser
