ai-knowledge
ai-knowledge is a library that provides advanced AI capabilities using Java. It simplifies the implementation of machine learning algorithms and data analysis, enabling developers to quickly prototype their ideas. With extensive documentation and sample code available, it reduces the learning curve for users.
AI Knowledge Base Retrieval System
An enhanced RAG (Retrieval-Augmented Generation) knowledge base system built on the Spring AI framework, with Ollama and OpenAI integration.
Project Overview
This project is an intelligent knowledge base system built around Retrieval-Augmented Generation (RAG) and aimed at enterprise use. It combines multiple large language models to cover the full pipeline, from document parsing to question answering.
Core Features
RAG (Retrieval-Augmented Generation)
Key Capabilities:
- Multi-format Document Processing: Support for PDF, Word, Markdown, and other document formats via Apache Tika
- Git Repository Integration: Automatic repository cloning and code analysis using JGit
- Dual Embedding Models:
  - Local `nomic-embed-text` model via Ollama for privacy and cost control
  - OpenAI `text-embedding-ada-002` for high-quality embeddings
- Vector Storage: PostgreSQL with pgvector extension for persistent vector storage
- Flexible Model Switching: Configuration-based switching between local and cloud models
Technical Benefits:
- Enhanced search accuracy through semantic understanding
- Cost-effective hybrid model approach
- Scalable vector storage solution
- Privacy-preserving local processing option
AI-Powered Q&A System
Core Workflow:
- Document Ingestion: Parse and chunk documents using Spring AI Tika integration
- Vector Embedding: Convert text to vectors using selected embedding model
- Semantic Search: Retrieve relevant documents from vector database
- Answer Generation: Generate contextual responses using OpenAI GPT models
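The four workflow steps above can be sketched in plain Java. This is a minimal, dependency-free illustration and not the project's Spring AI code: the class and method names are hypothetical, the word-hashing `embed` function is a toy stand-in for a real embedding model such as `nomic-embed-text`, an in-memory list stands in for pgvector, and the last step only assembles a prompt instead of calling a GPT model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Toy walk-through of the workflow steps. All names here are hypothetical. */
final class RagWorkflowSketch {

    // Arbitrary vector size for the toy embedding (assumption, not a project setting).
    static final int DIMENSIONS = 64;

    /** Step 1: split text into chunks of at most maxWords words. */
    static List<String> chunk(String text, int maxWords) {
        List<String> words = Arrays.asList(text.trim().split("\\s+"));
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < words.size(); start += maxWords) {
            int end = Math.min(start + maxWords, words.size());
            chunks.add(String.join(" ", words.subList(start, end)));
        }
        return chunks;
    }

    /** Step 2: stand-in embedding that hashes each word into a fixed-size count vector. */
    static double[] embed(String text) {
        double[] vector = new double[DIMENSIONS];
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                vector[Math.floorMod(word.hashCode(), DIMENSIONS)] += 1.0;
            }
        }
        return vector;
    }

    /** Cosine similarity; returns 0 when either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Step 3: return the topK chunks most similar to the question. */
    static List<String> retrieve(String question, List<String> chunks, int topK) {
        double[] q = embed(question);
        return chunks.stream()
                .sorted(Comparator.comparingDouble((String c) -> cosine(q, embed(c))).reversed())
                .limit(topK)
                .collect(Collectors.toList());
    }

    /** Step 4: assemble the prompt that would be sent to the chat model. */
    static String buildPrompt(String question, List<String> context) {
        return "Answer using only this context:\n" + String.join("\n", context)
                + "\nQuestion: " + question;
    }

    public static void main(String[] args) {
        String doc = "pgvector stores embeddings in PostgreSQL. Redis caches frequent results. "
                + "Apache Tika parses PDF and Word files.";
        String question = "Which component parses PDF files?";
        List<String> chunks = chunk(doc, 6);
        System.out.println(buildPrompt(question, retrieve(question, chunks, 1)));
    }
}
```

A real deployment would re-embed nothing at query time except the question, since chunk vectors are computed once at ingestion and stored; the sketch recomputes them only to stay short.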
Application Scenarios:
- Enterprise knowledge management
- Technical documentation Q&A
- Code repository analysis and search
- Intelligent customer support
Technical Architecture
Supported AI Models
- Ollama Models: Local deployment with `nomic-embed-text` for embedding
- OpenAI GPT Series: Cloud-based models for text generation and embedding
- Extensible Framework: Easy integration of additional model providers
Core Technology Stack
- Backend Framework: Spring Boot 3.2.3 with Spring AI
- Vector Database: PostgreSQL with pgvector extension
- Caching: Redis for performance optimization
- Document Processing: Apache Tika for multi-format support
- API Documentation: Swagger UI with Knife4j enhancements
- Containerization: Docker support for easy deployment
Key Dependencies
- Spring AI BOM for AI model integration
- Redisson for Redis operations
- JGit for Git repository handling
- FastJSON for JSON processing
- HikariCP for database connection pooling
Quick Start
Prerequisites
- Java 17+
- PostgreSQL with pgvector extension
- Redis server
- Ollama (for local models)
- OpenAI API key (for cloud models)
Configuration
- Database Setup: Configure the PostgreSQL connection in `application-dev.yml`
- AI Models: Set up Ollama locally or configure OpenAI API credentials
- Vector Storage: Choose between SimpleVectorStore (memory) or PgVectorStore (persistent)
- Embedding Model: Configure `spring.ai.rag.embed` to select the embedding model
Running the Application
# Clone the repository
git clone <repository-url>
# Navigate to the project directory
cd ai-knowledge
# Run with Maven
mvn spring-boot:run -pl dev-tech-app
The application starts on port 8090, with Swagger UI available at `/swagger-ui.html`.
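To confirm the application came up, a small smoke check can request the Swagger UI page. The sketch below uses only the JDK's `java.net.http` client; the class name, the `localhost` host, and the five-second timeout are assumptions for illustration, while the port and path come from the text above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical smoke check for a locally running instance. */
final class SwaggerSmokeCheck {

    /** Builds a GET request for the Swagger UI page on the given host and port. */
    static HttpRequest swaggerRequest(String host, int port) {
        return HttpRequest.newBuilder(URI.create("http://" + host + ":" + port + "/swagger-ui.html"))
                .timeout(Duration.ofSeconds(5)) // assumed timeout
                .GET()
                .build();
    }

    /** Returns true when the page answers with a non-error status, following redirects. */
    static boolean isReachable(String host, int port) {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(swaggerRequest(host, port), HttpResponse.BodyHandlers.discarding());
            return response.statusCode() < 400;
        } catch (IOException e) {
            return false; // connection refused, timeout, and similar failures
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumes the application runs on this machine.
        System.out.println(isReachable("localhost", 8090)
                ? "Swagger UI is reachable"
                : "Swagger UI is not reachable");
    }
}
```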
System Architecture
RAG Workflow
Configuration Options
Embedding Model Selection
- Local Model: Set `spring.ai.rag.embed=nomic-embed-text` for privacy and cost savings
- Cloud Model: Set `spring.ai.rag.embed=text-embedding-ada-002` for higher quality
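One way to picture configuration-based switching is a small resolver that maps the `spring.ai.rag.embed` value to a provider and fails fast on anything else. This is a sketch under assumptions, not the project's wiring: the `Provider` enum, the class name, and the fallback to the local model are hypothetical, and the demo reads a JVM system property rather than Spring's configuration.

```java
/** Hypothetical resolver for the spring.ai.rag.embed setting. */
final class EmbeddingSelectionSketch {

    /** Illustrative provider choices; these names are not taken from the project. */
    enum Provider { OLLAMA, OPENAI }

    /** Maps the configured model name to a provider, rejecting unknown values. */
    static Provider resolve(String embedProperty) {
        if (embedProperty == null) {
            throw new IllegalArgumentException("spring.ai.rag.embed is not set");
        }
        switch (embedProperty.trim()) {
            case "nomic-embed-text":
                return Provider.OLLAMA; // local model served by Ollama
            case "text-embedding-ada-002":
                return Provider.OPENAI; // cloud model served by OpenAI
            default:
                throw new IllegalArgumentException(
                        "Unsupported spring.ai.rag.embed value: " + embedProperty);
        }
    }

    public static void main(String[] args) {
        // Demo only: falls back to the local model when the property is absent (assumption).
        String value = System.getProperty("spring.ai.rag.embed", "nomic-embed-text");
        System.out.println(value + " -> " + resolve(value));
    }
}
```

Rejecting unknown values at startup surfaces a typo in the setting immediately, instead of at the first embedding call.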
Vector Storage Options
- Memory Storage: `SimpleVectorStore` for development and testing
- Persistent Storage: `PgVectorStore` for production environments
Contributing
We welcome contributions! Please feel free to submit issues and pull requests.
License
This project is licensed under the MIT License - see the LICENSE file for details.