Langchain-Chatchat (github.com/chatchat-space/Langchain-Chatchat) is an open-source application designed to provide a comprehensive solution for question answering over local knowledge bases, powered by large language models (LLMs) and the LangChain framework. Originally initiated as Langchain-ChatGLM, it has evolved to support a broader range of models and functionalities. The project emphasizes ease of use and offline deployment, and it is particularly well suited to Chinese-language scenarios while also supporting English. It offers a complete RAG (Retrieval Augmented Generation) pipeline, from document loading and vectorization to LLM interaction and response generation, accessible via a web UI and an API.
Langchain-Chatchat aims to empower users to build their own private, secure, and intelligent knowledge assistants using their local documents and preferred LLMs.
Langchain-Chatchat offers a rich set of features for building and interacting with local knowledge base Q&A systems:
- LangChain-Powered: Built on top of the versatile LangChain framework, leveraging its modules for model interaction, prompt management, document processing, and chaining.
- Local Knowledge Base Q&A: Core functionality allows users to upload various document types, create a vectorized knowledge base, and ask questions that are answered based on the content of these documents (a minimal pipeline sketch appears after this feature list).
- Multi-LLM Support:
- Supports a variety of open-source and proprietary LLMs.
- Natively supports models like ChatGLM, Qwen, and Llama.
- Integrates with model inference frameworks like Xinference, Ollama, LocalAI, and FastChat, enabling access to a wider array of models including GLM-4, Qwen2, Llama3, Mixtral, and many others.
- Multiple Embedding Model Support: Allows users to choose from various text embedding models for vectorizing their documents.
- Multiple Vector Store Support: Integrates with mainstream open-source vector databases for storing and retrieving document embeddings, such as FAISS, Milvus, ChromaDB, Elasticsearch, and PostgreSQL (with pgvector extension).
- Diverse Document Format Support: Capable of processing a wide range of file formats for knowledge base creation, including:
- Text files: `.txt`, `.md`, `.json`, `.jsonl`, `.csv`, `.tsv`, `.rtf`, `.rst`, `.xml`, `.yaml`, `.yml`, `.log`
- Office documents: `.docx`, `.pptx`, `.eml`
- Portable Document Format: `.pdf` (with OCR capabilities for scanned PDFs)
- Web pages: `.html`, `.htm`
- Images: Can perform OCR on images to extract text for the knowledge base.
- Web User Interface (WebUI): Provides an interactive web interface, built with Streamlit, for:
- Managing knowledge bases (creating, deleting, selecting).
- Uploading and processing documents.
- Engaging in chat dialogues with the LLM, with or without knowledge base augmentation.
- Adjusting various settings.
- API Access: Offers a FastAPI-based API for programmatic interaction, allowing integration with other applications and automated workflows.
- Multiple Chat Modes:
- LLM Dialogue: Direct chat with the selected LLM without using a knowledge base.
- Knowledge Base Dialogue: Chat where the LLM's responses are augmented with relevant information retrieved from the selected local knowledge base (RAG).
- Search Engine Dialogue: (If configured) Allows the LLM to use a search engine to answer questions.
- File Dialogue: Chat focused on a specific uploaded file.
- Customizable Configuration: Utilizes YAML files (e.g., `model_settings.yaml`, `basic_settings.yaml`) for configuring LLMs, embedding models, vector stores, knowledge base paths, and other operational parameters.
- Cross-Platform Compatibility: Being Python-based, it can be run on Windows, Linux, and macOS.
- Offline Deployment: Designed with offline deployment in mind, especially when using locally hosted LLMs and embedding models.
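To make the knowledge-base Q&A pipeline concrete, here is a minimal, generic LangChain sketch of the same load → split → embed → store → retrieve flow. This is not Langchain-Chatchat's internal code: the file path, chunking parameters, and the `BAAI/bge-small-zh-v1.5` embedding model are illustrative assumptions.

```python
# Minimal RAG pipeline sketch (illustrative; not Langchain-Chatchat internals).
# Assumes: pip install langchain-community langchain-text-splitters faiss-cpu sentence-transformers
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# 1. Load a local document (path is a placeholder) and split it into chunks.
docs = TextLoader("my_notes.txt", encoding="utf-8").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# 2. Embed the chunks and index them in a local FAISS vector store.
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-zh-v1.5")  # example BGE model
vector_store = FAISS.from_documents(chunks, embeddings)

# 3. Retrieve the chunks most relevant to a question; a RAG app then stuffs
#    them into the LLM prompt so the answer is grounded in the documents.
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
for doc in retriever.invoke("What does the document say about deployment?"):
    print(doc.page_content[:80])
```

Langchain-Chatchat wires the equivalent steps to its WebUI and API, injecting the retrieved chunks into the LLM prompt before generation.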
Langchain-Chatchat integrates with a variety of technologies:
- LLM Framework: LangChain
- LLM Serving/Inference Frameworks:
- Xinference
- Ollama
- LocalAI
- FastChat
- Direct support for specific models.
- Supported LLMs and Embedding Models (Examples):
- ChatGLM series (e.g., ChatGLM3)
- Qwen series (e.g., Qwen1.5, Qwen2)
- Llama series (e.g., Llama 2, Llama 3)
- BERT-style models for embeddings
- BGE embeddings
- Many others accessible via Xinference, Ollama, etc.
- Vector Stores (Examples):
- FAISS
- Milvus
- ChromaDB
- Elasticsearch
- PostgreSQL (with pgvector)
- Frontend: Streamlit
- Backend API: FastAPI
Langchain-Chatchat is suitable for a variety of applications:
- Personal Knowledge Assistant: Create a local AI to query and interact with your personal documents, notes, and ebooks.
- Enterprise Knowledge Management: Build an internal Q&A system for employees based on company policies, technical documentation, project reports, and other internal data.
- Customer Service Bots: Develop chatbots that can answer customer queries based on product manuals, FAQs, and support documentation.
- Research & Study Tool: Upload research papers, textbooks, or articles to create a searchable knowledge base for study and analysis.
- Educational Tool: Assist students by allowing them to query educational materials and get context-aware answers.
- Offline AI Chat: Provide an AI chat solution that can run entirely locally without internet access (when using local LLMs and embedding models).
- Developer Tool: A ready-to-use RAG application for developers to experiment with, customize, or integrate into other projects.
Setting up Langchain-Chatchat typically involves the following steps:
- Prerequisites:
- Python (version 3.8 - 3.11 recommended).
- `pip` for package installation.
- A C++ compiler (required by some dependencies).
- (Optional but Recommended) An LLM inference framework like Xinference or Ollama installed and running with your desired LLM and embedding models.
- Install Langchain-Chatchat:
- Install the package from PyPI, typically `pip install langchain-chatchat -U` (an extra such as `langchain-chatchat[xinference]` bundles the matching framework client; check the project README for the exact command for your setup).
- Set Root Directory (Optional but Recommended):
- Define the `CHATCHAT_ROOT` environment variable to specify a directory for storing configurations, knowledge bases, and logs. This keeps your data separate from the installation.
- Linux/macOS: `export CHATCHAT_ROOT=/path/to/your/chatchat_data`
- Windows: `set CHATCHAT_ROOT=D:\path\to\your\chatchat_data`
- Initialize Project Configuration:
- Run the initialization command. This will create the necessary subdirectories within `CHATCHAT_ROOT` (or the default location) and copy default configuration files (e.g., `model_settings.yaml`, `basic_settings.yaml`):
chatchat init
- Configure Models (`model_settings.yaml`):
- Edit the `model_settings.yaml` file located in your configuration directory (e.g., `CHATCHAT_ROOT/config/`).
- Specify the default LLM model name (`DEFAULT_LLM_MODEL`) and embedding model name (`DEFAULT_EMBEDDING_MODEL`).
- Configure the `MODEL_PLATFORMS` section to point to your running LLM inference service (e.g., Xinference API endpoint, Ollama endpoint).
- Define the LLM models and embedding models available through these platforms; the sketch below shows one way to sanity-check the result.
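Because the exact configuration schema can shift between releases, it can help to verify the edited file programmatically. A minimal sketch, assuming `pyyaml` is installed and `CHATCHAT_ROOT` is set; the top-level keys are those named above, while the per-platform fields mentioned in the comment are typical of 0.3.x and should be checked against the file `chatchat init` actually generated.

```python
# Sanity-check model_settings.yaml after editing (sketch; verify the path and
# keys against the files that `chatchat init` generated for your version).
import os
import yaml  # pip install pyyaml

root = os.environ.get("CHATCHAT_ROOT", ".")
path = os.path.join(root, "model_settings.yaml")  # may sit in a config/ subdirectory

with open(path, encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

print("Default LLM:      ", cfg.get("DEFAULT_LLM_MODEL"))
print("Default embedding:", cfg.get("DEFAULT_EMBEDDING_MODEL"))
# Each platform entry typically carries fields like platform_name, platform_type,
# api_base_url, and the llm/embedding model lists it serves (verify per version).
for platform in cfg.get("MODEL_PLATFORMS", []):
    print(platform)
```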
- Configure Basic Settings (`basic_settings.yaml`, optional):
- Adjust settings like default knowledge base paths if needed.
- Initialize/Rebuild Knowledge Base:
- Ensure your chosen LLM inference framework and embedding model are running and accessible.
- To create an empty knowledge base or rebuild all existing ones:
chatchat kb -r
- To create a new knowledge base and add documents, use the Web UI once the services are running.
- Start Langchain-Chatchat Services:
- This command starts the FastAPI backend server and the Streamlit WebUI.
chatchat start -a
- Access the Web UI (usually `http://localhost:8501`) and the API docs (usually `http://localhost:7861/docs`).
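After `chatchat start -a` reports success, a quick scripted liveness check against the default ports can save debugging time. A minimal sketch using `requests`; the ports are the defaults mentioned above and may differ in your configuration.

```python
# Liveness check for the default WebUI and API ports (sketch).
import requests  # pip install requests

for name, url in [("WebUI", "http://localhost:8501"),
                  ("API docs", "http://localhost:7861/docs")]:
    try:
        status = requests.get(url, timeout=5).status_code
        print(f"{name}: HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc})")
```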
- Web UI:
- Knowledge Base Management: Create new knowledge bases, select existing ones, upload documents (various formats supported), and let the system process and vectorize them.
- Chat Interface: Select a chat mode (e.g., Knowledge Base Q&A, LLM Chat). If using Knowledge Base Q&A, select the desired knowledge base. Type your questions and interact with the LLM.
- API:
- The FastAPI backend provides various endpoints for programmatic interaction, such as chatting, managing knowledge bases, and uploading files. Explore the API documentation (usually at `/docs` relative to the API server URL) for details.
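As a starting point for programmatic access, the sketch below posts a chat request with `requests`. The endpoint path and payload fields are hypothetical placeholders, since they vary between Langchain-Chatchat versions; copy the real path and schema from the interactive documentation at `/docs` before relying on it.

```python
# Hypothetical knowledge-base chat call (sketch): the endpoint path and payload
# fields below are placeholders; confirm the real schema on the /docs page.
import requests

API_BASE = "http://localhost:7861"

payload = {
    "query": "Summarize the onboarding policy.",  # user question
    "knowledge_base_name": "my_kb",               # a KB created via the WebUI
    "top_k": 3,                                   # number of retrieved chunks
}
resp = requests.post(f"{API_BASE}/chat/kb_chat", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```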
Running Langchain-Chatchat, especially with local LLMs, can be resource-intensive:
- CPU: A modern multi-core CPU is beneficial.
- RAM:
- Minimum: 16GB RAM might be sufficient for very small LLMs and embedding models.
- Recommended: 32GB RAM or more is highly recommended, especially when dealing with larger models or multiple concurrent processes.
- GPU (Highly Recommended for Local LLMs):
- VRAM: This is often the most critical factor.
- Small models (e.g., 3B-7B parameter LLMs in quantized form) might run on GPUs with 6GB-12GB VRAM.
- Larger models (13B+ parameters) will require significantly more VRAM (16GB, 24GB, or even more).
- The specific LLM, its quantization level (e.g., FP16, INT8, INT4), and the inference framework used will heavily influence VRAM requirements.
- NVIDIA GPUs are generally well-supported by most deep learning frameworks. Support for AMD and Intel GPUs is improving via frameworks like Ollama and IPEX-LLM.
- Storage: Sufficient disk space for the Langchain-Chatchat installation, Python environment, downloaded LLM and embedding models (which can be several GBs each), and your vectorized knowledge bases. SSDs are highly recommended for faster loading and processing.
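A rough rule of thumb behind the VRAM figures above: the weights alone occupy roughly parameter count × bytes per parameter, and activations plus the KV cache add further overhead on top. A back-of-the-envelope sketch:

```python
# Back-of-the-envelope estimate of VRAM for model weights alone (sketch).
# Real usage is higher: activations, KV cache, and framework overhead add to it.
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def weight_gb(params_billion: float, quant: str) -> float:
    """GB occupied by the weights alone (1e9 params * bytes/param / 1e9)."""
    return params_billion * BYTES_PER_PARAM[quant]

for size_b in (7, 13):
    line = ", ".join(f"{q}: ~{weight_gb(size_b, q):.1f} GB" for q in BYTES_PER_PARAM)
    print(f"{size_b}B model -> {line}")
# A 7B model needs ~14 GB at FP16 but only ~3.5 GB at INT4, which is why small
# quantized models fit on 6GB-12GB cards once overhead is included.
```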
Refer to the documentation of the specific LLMs and inference frameworks (Ollama, Xinference, etc.) you plan to use for their detailed hardware recommendations. The Intel IPEX-LLM project provides guidance for running Langchain-Chatchat on Intel CPUs and GPUs.
Langchain-Chatchat is released under the Apache-2.0 License. This is a permissive open-source license that allows for commercial use, modification, and distribution, subject to the terms of the license.
Q1: What is Langchain-Chatchat?
A1: Langchain-Chatchat is an open-source application built using the LangChain framework. It provides a complete solution for creating and interacting with local knowledge bases using large language models (LLMs) for question-answering and chat, with a focus on offline deployment and support for various models.
Q2: Can I use Langchain-Chatchat with my own documents?
A2: Yes, its core functionality is to allow you to upload your own documents (PDFs, TXT, DOCX, etc.), which are then processed into a knowledge base that the LLM can use to answer your questions.
Q3: Which LLMs are supported?
A3: It supports a wide range of LLMs, including popular open-source models like Llama, Qwen, ChatGLM, and others available through model inference frameworks like Ollama, Xinference, and FastChat. You can also configure it to use API-based models if needed, though the emphasis is often on local models.
Q4: Is it difficult to set up?
A4: While it involves several components (Python environment, LLM setup, knowledge base initialization), the `chatchat init` and `chatchat start` commands simplify the process. Configuration is mainly done via YAML files. Familiarity with Python and command-line interfaces is beneficial. The main complexity often lies in setting up the chosen local LLM inference framework correctly.
Q5: Can I run Langchain-Chatchat completely offline?
A5: Yes, if you use locally hosted LLMs, embedding models, and vector stores, Langchain-Chatchat can run entirely offline without needing internet access after the initial setup and model downloads.
Q6: Is Langchain-Chatchat free?
A6: Yes, the Langchain-Chatchat software itself is free and open-source under the Apache-2.0 license. You will only incur costs if you use proprietary LLM APIs or for the hardware to run it.
Q7: Does it support English or is it only for Chinese?
A7: While it has strong support and optimization for Chinese language scenarios, it is also fully functional for English and other languages supported by the chosen LLMs and embedding models.
While many resources focus on the broader LangChain framework, here are some that are specifically relevant or highly applicable to setting up and using Langchain-Chatchat or similar local RAG systems:
- Official GitHub Repository (Primary Source): The README (`README.md` and `README_en.md`) is the most crucial starting point.
- Intel IPEX-LLM Documentation - Run Local RAG using Langchain-Chatchat: An excellent English guide on setting up and running Langchain-Chatchat, optimized for Intel CPUs and GPUs.
- CSDN Blog - LangChain-chatchat 0.3.x 入门级教程 (LangChain-chatchat 0.3.x Beginner Tutorial): A detailed tutorial (in Chinese) covering installation and setup, particularly with Xinference. Useful for understanding the steps even if read via auto-translation.
- Tutorials for Local LLM Frameworks (Ollama, Xinference): Guides for setting up these frameworks will be essential before configuring them in Langchain-Chatchat.
- Ollama: Check the official Ollama website and GitHub for setup guides.
- Xinference: The Xinference GitHub repository and documentation provide installation and usage instructions. (https://github.com/xorbitsai/inference)
As the project and its ecosystem evolve, new tutorials and guides may emerge. Searching for recent blog posts or videos on "Langchain-Chatchat setup" or "local RAG with LangChain" can yield more resources.
- GitHub Discussions: The primary channel for community support, asking questions, sharing ideas, and finding announcements for Langchain-Chatchat.
- GitHub Issues: For reporting bugs and tracking specific technical problems.
- Data Privacy: A key advantage of Langchain-Chatchat with local models is enhanced data privacy, as your documents and queries can remain within your own infrastructure.
- Accuracy & Hallucinations: The accuracy of the answers depends heavily on the quality of your knowledge base documents and the capabilities of the chosen LLM. LLMs can still "hallucinate" or provide plausible but incorrect information.
- Bias in LLMs: Underlying LLMs may carry biases from their training data, which can be reflected in the application's responses.
- Resource Intensive: Running large LLMs locally requires significant computational resources (CPU, RAM, and especially GPU VRAM).
- Complexity: Setting up all components (LLM server, vector database, Langchain-Chatchat itself) can be complex for users new to the LLM ecosystem.
- Maintenance: Requires ongoing maintenance, including updating models, dependencies, and the Langchain-Chatchat application itself.