kreuzberg

Author: Goldziher

Document intelligence framework for Python. Extracts text, metadata, and structured data from PDFs, images, Office documents, and more. Built on Pandoc, PDFium, and Tesseract.

GitHub Stars

2,340

Views

357

Technical Information

Programming Languages

Python (primary language)

System Requirements

No specific requirements are documented

Maintenance Status

Active

GitHub Topics

async document-intelligence mcp metadata-extraction ocr pandoc pdf-extraction pdfium python rag table-extraction tesseract text-extraction

Author Information

Goldziher

Tags

async document-intelligence mcp metadata-extraction ocr pandoc pdf-extraction pdfium python rag table-extraction tesseract text-extraction

Related MCPs

bagel

269

ChatGPT for Physical AI. Troubleshoot your robots and drones with natural language. No fuss.

Python

holoviz-mcp

HoloViz MCP is a comprehensive Model Context Protocol server that provides intelligent access to the HoloViz ecosystem. It enables AI assistants to help you build interactive dashboards and data visualizations using libraries like Panel, hvPlot, and datashader, enhancing the efficiency of data analysis.

Python

ostruct

ostruct is a tool designed to simplify the maintenance of data extraction pipelines. It provides a way to convert messy data into structured JSON without relying on complex regex, allowing for flexibility in handling format changes. This enhances code readability for developers and enables quicker data processing.

Python