kreuzberg
Document intelligence framework for Python: extract text, metadata, and structured data from PDFs, images, Office documents, and more. Built on Pandoc, PDFium, and Tesseract.
GitHub Stars: 2,340
Forks: 95
Issues: 5
Installation
Prerequisites
Required software and versions:
Installation Steps
1. Clone Repository
bash
git clone https://github.com/Goldziher/kreuzberg.git
cd kreuzberg
2. Install Dependencies
bash
pip install -r requirements.txt
3. Verify Environment
Ensure that the required dependencies are installed correctly.
Troubleshooting
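Before working through the common issues below, it can help to script the "Verify Environment" step and collect the Python and pip versions. The following is a minimal sketch that assumes a bash shell with `python3` on PATH. The functions `check_tool`, `report_versions`, and `verify_environment` are hypothetical helpers and are not part of kreuzberg. Treating Pandoc and Tesseract as required system binaries is an assumption drawn from the project description. PDFium is assumed to be installed through pip and is not checked.

```bash
#!/usr/bin/env bash
# Hypothetical environment check for a kreuzberg checkout (a sketch, not a project tool).
# Assumption: pandoc and tesseract should be system binaries, per the project description.

# check_tool NAME: report whether NAME is on PATH; return 0 if found, 1 otherwise.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
    return 0
  fi
  echo "missing: $1"
  return 1
}

# report_versions: print the Python and pip versions to check when installs fail.
report_versions() {
  python3 --version 2>&1
  python3 -m pip --version 2>&1 || echo "pip: not available for python3"
}

# verify_environment: run every check, print a summary line, return the failure count.
verify_environment() {
  local failures=0
  for tool in python3 pandoc tesseract; do
    check_tool "$tool" || failures=$((failures + 1))
  done
  # Importing the package is the most direct sign that installation worked.
  if python3 -c "import kreuzberg" >/dev/null 2>&1; then
    echo "importable: kreuzberg"
  else
    echo "not importable: kreuzberg"
    failures=$((failures + 1))
  fi
  echo "failures: $failures"
  return "$failures"
}
```

To use it, source the script (for example `source check_env.sh`, a hypothetical file name) and run `verify_environment`. The last line of output and the return value both give the number of failed checks. Run `report_versions` to print the version details worth including when an install fails.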
Common Issues
Issue: Dependencies fail to install.
Solution: Check that your Python and pip versions meet the prerequisites above, then try reinstalling.
Additional Resources
HoloViz MCP is a comprehensive Model Context Protocol server that provides intelligent access to the HoloViz ecosystem. It enables AI assistants to help you build interactive dashboards and data visualizations using libraries like Panel, hvPlot, and datashader, enhancing the efficiency of data analysis.
ostruct is a tool designed to simplify the maintenance of data extraction pipelines. It provides a way to convert messy data into structured JSON without relying on complex regex, allowing for flexibility in handling format changes. This enhances code readability for developers and enables quicker data processing.