ai-knowledge

ai-knowledge is a library that provides advanced AI capabilities using Java. It simplifies the implementation of machine learning algorithms and data analysis, enabling developers to quickly prototype their ideas. With extensive documentation and sample code available, it reduces the learning curve for users.

🚀 AI Knowledge Base Retrieval System

中文版 | English

An enhanced RAG (Retrieval-Augmented Generation) knowledge base system built on the Spring AI framework, with Ollama and OpenAI integration.

📖 Project Overview

This project is an intelligent knowledge base system built around Retrieval-Augmented Generation (RAG), designed to provide AI-assisted solutions for enterprises. By combining multiple large language models, it covers the full pipeline from document parsing to question answering.

✨ Core Features
🔍 RAG (Retrieval-Augmented Generation)

Key Capabilities:

  • 📄 Multi-format Document Processing: Support for PDF, Word, Markdown, and other document formats via Apache Tika
  • 🔗 Git Repository Integration: Automatic repository cloning and code analysis using JGit
  • 🧠 Dual Embedding Models:
    • Local nomic-embed-text model via Ollama for privacy and cost control
    • OpenAI text-embedding-ada-002 for high-quality embeddings
  • 🗄️ Vector Storage: PostgreSQL with pgvector extension for persistent vector storage
  • 🔄 Flexible Model Switching: Configuration-based switching between local and cloud models

Technical Benefits:

  • Enhanced search accuracy through semantic understanding
  • Cost-effective hybrid model approach
  • Scalable vector storage solution
  • Privacy-preserving local processing option
🤖 AI-Powered Q&A System

Core Workflow:

  1. Document Ingestion: Parse and chunk documents using Spring AI Tika integration
  2. Vector Embedding: Convert text to vectors using selected embedding model
  3. Semantic Search: Retrieve relevant documents from vector database
  4. Answer Generation: Generate contextual responses using OpenAI GPT models
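
The four steps above can be illustrated with a toy, dependency-free sketch. It is not the project's code and does not use Spring AI: the class `RagWorkflowSketch` and all of its methods are hypothetical, the "embedding" is a plain term-frequency map standing in for a real embedding model, and the final step only assembles the prompt that would be sent to a chat model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy RAG pipeline: chunk -> embed -> retrieve -> build prompt.
// All names are hypothetical; none of this is the project's or Spring AI's API.
class RagWorkflowSketch {

    // Step 1: split text into word-based chunks that overlap by `overlap` words.
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("need size > overlap >= 0");
        }
        List<String> chunks = new ArrayList<>();
        if (text == null || text.trim().isEmpty()) {
            return chunks;
        }
        String[] words = text.trim().split("\\s+");
        for (int start = 0; start < words.length; start += size - overlap) {
            int end = Math.min(start + size, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break;
            }
        }
        return chunks;
    }

    // Step 2: stand-in "embedding" (term frequencies of lowercase words).
    static Map<String, Integer> embed(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) {
                tf.merge(w, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity between two sparse term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            nb += v * v;
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    // Step 3: rank chunks by similarity to the question and keep the top K.
    static List<String> retrieve(List<String> chunks, String question, int topK) {
        Map<String, Integer> q = embed(question);
        List<String> ranked = new ArrayList<>(chunks);
        ranked.sort(Comparator.comparingDouble((String c) -> cosine(embed(c), q)).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(topK, ranked.size())));
    }

    // Step 4: assemble the prompt a chat model would receive.
    static String buildPrompt(List<String> context, String question) {
        StringBuilder sb = new StringBuilder("Answer using only the context below.\n\nContext:\n");
        for (String c : context) {
            sb.append("- ").append(c).append('\n');
        }
        return sb.append("\nQuestion: ").append(question).toString();
    }

    public static void main(String[] args) {
        String doc = "pgvector stores embeddings in PostgreSQL. "
                + "Redis caches frequent queries. "
                + "Ollama serves local embedding models.";
        String question = "Where are embeddings stored?";
        List<String> chunks = chunk(doc, 6, 2);
        System.out.println(buildPrompt(retrieve(chunks, question, 1), question));
    }
}
```

In a real deployment the term-frequency map would be replaced by vectors from the configured embedding model, and the ranking would be delegated to the vector database rather than computed in memory.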

Application Scenarios:

  • Enterprise knowledge management
  • Technical documentation Q&A
  • Code repository analysis and search
  • Intelligent customer support
πŸ—οΈ Technical Architecture
Supported AI Models
  • Ollama Models: Local deployment with nomic-embed-text for embedding
  • OpenAI GPT Series: Cloud-based models for text generation and embedding
  • Extensible Framework: Easy integration of additional model providers
Core Technology Stack
  • Backend Framework: Spring Boot 3.2.3 with Spring AI
  • Vector Database: PostgreSQL with pgvector extension
  • Caching: Redis for performance optimization
  • Document Processing: Apache Tika for multi-format support
  • API Documentation: Swagger UI with Knife4j enhancements
  • Containerization: Docker support for easy deployment
Key Dependencies
  • Spring AI BOM for AI model integration
  • Redisson for Redis operations
  • JGit for Git repository handling
  • FastJSON for JSON processing
  • HikariCP for database connection pooling
🚀 Quick Start
Prerequisites
  • Java 17+
  • PostgreSQL with pgvector extension
  • Redis server
  • Ollama (for local models)
  • OpenAI API key (for cloud models)
Configuration
  1. Database Setup: Configure PostgreSQL connection in application-dev.yml
  2. AI Models: Set up Ollama locally or configure OpenAI API credentials
  3. Vector Storage: Choose between SimpleVectorStore (memory) or PgVectorStore (persistent)
  4. Embedding Model: Configure spring.ai.rag.embed to select embedding model
Running the Application
# Clone the repository
git clone <repository-url>

# Navigate to the project directory
cd ai-knowledge

# Run with Maven
mvn spring-boot:run -pl dev-tech-app

The application will start on port 8090 with Swagger UI available at /swagger-ui.html.

📊 System Architecture

[Figure: System Architecture diagram]

📊 RAG Workflow

[Figure: RAG Workflow diagram]

🔧 Configuration Options
Embedding Model Selection
  • Local Model: Set spring.ai.rag.embed=nomic-embed-text for privacy and cost savings
  • Cloud Model: Set spring.ai.rag.embed=text-embedding-ada-002 for higher quality
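
One way to picture configuration-based switching is a small factory keyed on the `spring.ai.rag.embed` property. The sketch below is only an illustration in plain Java: `EmbeddingSelector`, `EmbeddingBackend`, `LocalOllamaBackend`, and `OpenAiBackend` are hypothetical stand-ins, not Spring AI types, and the project itself may wire its beans differently.

```java
import java.util.Properties;

// Hypothetical sketch: picks an embedding backend from the spring.ai.rag.embed
// property. These types are illustrative stand-ins, not Spring AI classes.
class EmbeddingSelector {
    static final String KEY = "spring.ai.rag.embed";

    static EmbeddingBackend select(Properties props) {
        String model = props.getProperty(KEY);
        if (model == null) {
            throw new IllegalStateException(KEY + " is not set");
        }
        switch (model.trim()) {
            case "nomic-embed-text":
                return new LocalOllamaBackend();
            case "text-embedding-ada-002":
                return new OpenAiBackend();
            default:
                // Fail fast on typos instead of silently falling back to a default.
                throw new IllegalArgumentException("Unsupported embedding model: " + model);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(KEY, "nomic-embed-text");
        System.out.println(select(props).describe());
    }
}

interface EmbeddingBackend {
    String describe();
}

class LocalOllamaBackend implements EmbeddingBackend {
    public String describe() {
        return "local Ollama: nomic-embed-text";
    }
}

class OpenAiBackend implements EmbeddingBackend {
    public String describe() {
        return "cloud OpenAI: text-embedding-ada-002";
    }
}
```

Rejecting unknown values matters here because vectors produced by different embedding models are not comparable, so a silent fallback could mix incompatible embeddings in one store.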
Vector Storage Options
  • Memory Storage: SimpleVectorStore for development and testing
  • Persistent Storage: PgVectorStore for production environments
🀝 Contributing

We welcome contributions! Please feel free to submit issues and pull requests.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.