Building a Real-World RAG Project: Customer Support Knowledge Bot

In this tutorial, we’ll build a Retrieval-Augmented Generation (RAG) chatbot for a customer support knowledge base. This bot will be able to answer queries using company manuals, FAQs, and guides. We will go from document ingestion → splitting → embeddings → vectorstore → retrieval → generation, step by step.

This project is fully hands-on: by the end, you can query your own documents.

Prerequisites

Before we start, ensure you have:

Python 3.10+
langchain, langchain-openai, langchain-community, langchain-text-splitters, chromadb, docx2txt, and python-dotenv installed
An OpenAI API key, set in a .env file:

OPENAI_API_KEY=your_openai_api_key_here

Step 1: Project Setup and Imports

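Create a new project folder, set up a virtual environment (for example, python -m venv .venv), and add the .env file shown above. Then install the required packages:

pip install langchain langchain-openai langchain-community langchain-text-splitters chromadb docx2txt python-dotenv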

Next, create rag_customer_support.py and import the essentials:

from dotenv import load_dotenv

# Load OPENAI_API_KEY from .env into the environment
load_dotenv()

# LangChain imports
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import Docx2txtLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import Chroma

Step 2: Load Customer Support Documents

For this project, we’ll use company manuals, FAQs, and onboarding guides in .docx format. Place them in a folder named docs/ next to the script.

import glob

# Load every DOCX file in the docs/ folder
doc_files = sorted(glob.glob("docs/*.docx"))
all_docs = []

for file in doc_files:
    loader = Docx2txtLoader(file)
    docs = loader.load()  # Docx2txtLoader returns one Document per file
    # Collapse runs of whitespace (newlines, tabs, repeated spaces) into single spaces
    for doc in docs:
        doc.page_content = " ".join(doc.page_content.split())
    all_docs.extend(docs)

print(f"Loaded {len(all_docs)} documents from {len(doc_files)} files")

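Explanation: Docx2txtLoader extracts the full text of each DOCX file into a single Document whose metadata records the source path. Collapsing whitespace strips layout noise such as blank lines and tabs, so no chunk space is wasted on formatting and the splitter in Step 3 sees uniform text.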
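The whitespace cleanup relies on a standard Python idiom: str.split() with no arguments splits on any run of whitespace and drops leading and trailing whitespace. A standalone sketch, using a hypothetical helper name normalize_whitespace and made-up sample text:

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space.

    str.split() with no arguments splits on any whitespace run and ignores
    leading/trailing whitespace, so joining the pieces with " " yields clean text.
    """
    return " ".join(text.split())


# Made-up text resembling what a DOCX extractor might return
raw = "Password Reset\n\n\tStep 1:  Open   Settings.\nStep 2: Click Reset.  "
print(repr(normalize_whitespace(raw)))
```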

Step 3: Split Documents into Chunks

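Retrieval is more precise when each stored chunk covers a single topic, so we split the documents into small chunks with CharacterTextSplitter. Because Step 2 collapsed all newlines, we split on "." (sentence ends) instead of the splitter's default paragraph separator ("\n\n").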

# Initialize splitter: 500 characters per chunk with 50 overlap
splitter = CharacterTextSplitter(separator=".", chunk_size=500, chunk_overlap=50)

docs_chunks = splitter.split_documents(all_docs)
print(f"Total chunks after splitting: {len(docs_chunks)}")

Explanation:

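chunk_size=500 caps each chunk at about 500 characters (measured with len()), keeping chunks focused and leaving room for several of them in the prompt; a single sentence longer than that cannot be split on "." and is kept as one oversized chunk, with a logged warning
chunk_overlap=50 carries up to 50 characters of trailing text from one chunk into the start of the next, so a short sentence at a chunk boundary appears in both neighbouring chunks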
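To make the chunk_size and chunk_overlap arithmetic concrete, here is a plain-Python sliding-window chunker (the function sliding_window_chunks is hypothetical). It is a simplification: CharacterTextSplitter first splits on the separator and then merges whole pieces up to chunk_size, so its boundaries land on periods, whereas this sketch cuts at fixed character offsets.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the previous
    one, so consecutive chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"  # 26 characters
for chunk in sliding_window_chunks(text, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

With chunk_size=10 and chunk_overlap=3, each window advances 7 characters, so neighbouring chunks share 3 characters ("hij", "opq", "vwx").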

Step 4: Create Embeddings

We convert text chunks into vector embeddings using OpenAI’s embedding model.

embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")

# Test embedding for the first chunk
sample_embedding = embedding_model.embed_query(docs_chunks[0].page_content)
print(f"Sample embedding length: {len(sample_embedding)}")

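Explanation: text-embedding-ada-002 maps each chunk to a 1,536-dimensional vector, so the print above should report 1536. Texts with similar meaning get vectors that point in similar directions, which is what makes semantic similarity search possible.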
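Semantic similarity between embeddings is commonly measured with cosine similarity. The sketch below computes it in plain Python on made-up 3-dimensional vectors (real ada-002 embeddings have 1,536 dimensions). Chroma's default metric is L2 distance, but OpenAI embeddings are normalized to length 1, and for unit-length vectors L2 distance and cosine similarity rank results identically.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings", for illustration only
query = [0.9, 0.1, 0.0]           # "How do I reset my password?"
password_chunk = [0.8, 0.2, 0.1]  # a chunk about password resets
billing_chunk = [0.1, 0.2, 0.9]   # a chunk about invoices

print(round(cosine_similarity(query, password_chunk), 3))  # 0.984
print(round(cosine_similarity(query, billing_chunk), 3))   # 0.131
```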

Step 5: Build the Vector Store

We store embeddings in Chroma, a lightweight vector database.

persist_dir = "./customer_support_vectorstore"

vectorstore = Chroma.from_documents(
    documents=docs_chunks,
    embedding=embedding_model,
    persist_directory=persist_dir
)
print("Vectorstore created and persisted at:", persist_dir)

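Explanation: Chroma provides fast similarity search and persists the collection to disk for reuse. Running from_documents again with the same persist_directory adds every chunk a second time, so on later runs load the existing store with Chroma(persist_directory=persist_dir, embedding_function=embedding_model) instead of rebuilding it.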
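Conceptually, a vector store keeps each chunk's embedding next to its text and answers a query by returning the texts with the most similar embeddings. Chroma does this with an approximate nearest-neighbour (HNSW) index plus on-disk persistence; the hypothetical ToyVectorStore below does an exact brute-force scan in memory, using made-up 3-dimensional vectors and sample texts, just to show the idea.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToyVectorStore:
    """Hypothetical in-memory store: exact brute-force search, no persistence."""

    def __init__(self) -> None:
        self.entries: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        # Keep the chunk text next to its embedding
        self.entries.append((vector, text))

    def search(self, query: list[float], k: int = 3) -> list[str]:
        # Score every stored vector against the query and return the k best texts
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = ToyVectorStore()
store.add([0.9, 0.1, 0.0], "To reset your password, open Settings > Security.")
store.add([0.1, 0.9, 0.1], "Invoices are emailed on the first of each month.")
store.add([0.7, 0.3, 0.1], "Password reset links are sent to your registered email.")

print(store.search([1.0, 0.0, 0.0], k=2))
```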

Step 6: Retrieval – Finding Relevant Chunks

We can retrieve relevant chunks using similarity search or MMR (Maximal Marginal Relevance).

# Example query
query = "How do I reset my password?"

# Similarity search
similar_docs = vectorstore.similarity_search(query, k=3)
print("Similarity search results:")
for i, doc in enumerate(similar_docs):
    print(f"[{i}] {doc.page_content[:150]}...")

# MMR search for diverse results
mmr_docs = vectorstore.max_marginal_relevance_search(query, k=3, lambda_mult=0.5)
print("\nMMR search results:")
for i, doc in enumerate(mmr_docs):
    print(f"[{i}] {doc.page_content[:150]}...")

Explanation:

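Similarity search: returns the k chunks whose embeddings are closest to the query embedding
MMR: first fetches a larger candidate set (fetch_k, 20 by default), then picks k chunks that balance relevance to the query against redundancy with chunks already picked; lambda_mult=1 means pure relevance and 0 means maximum diversity, which keeps near-duplicate chunks out of the results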
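MMR can be written as a greedy loop: at each step, pick the candidate that maximizes lambda_mult * (similarity to the query) - (1 - lambda_mult) * (highest similarity to any chunk already picked). The plain-Python sketch below (function names hypothetical, vectors made up) follows that rule; LangChain applies the same idea to the fetch_k candidates it retrieves first.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query: list[float], candidates: list[list[float]], k: int, lambda_mult: float = 0.5) -> list[int]:
    """Greedy Maximal Marginal Relevance: indices of k chosen candidates, in pick order."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_index, best_score = remaining[0], float("-inf")
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Redundancy = similarity to the closest already-selected candidate (0 if none yet)
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        remaining.remove(best_index)
    return selected


query = [1.0, 0.2]
candidates = [
    [1.0, 0.0],   # 0: relevant, nearly identical to candidate 1
    [1.0, 0.05],  # 1: most relevant
    [0.6, 0.8],   # 2: less relevant but covers different ground
]
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # [1, 0]
print(mmr(query, candidates, k=2, lambda_mult=0.5))  # [1, 2]
```

With lambda_mult=1.0 the near-duplicate candidate comes second; at 0.5 its redundancy penalty lets the more distinct candidate take its place.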

Step 7: Define the Chat Model and RAG Prompt

Next, we set up the chat model and a prompt that combines the retrieved context with the user's question.

# Initialize Chat Model
chat = ChatOpenAI(
    model_name="gpt-4",
    temperature=0,
    max_tokens=250
)

# Define RAG Prompt
rag_prompt = PromptTemplate.from_template("""
Answer the following question using ONLY the provided context:

Question: {question}

Context:
{context}

Provide a concise and informative answer.
""")

Step 8: Build RAG Chain

We create a Runnable chain where the retriever provides context to the prompt and the LLM generates the answer.

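from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    # Join the retrieved Documents into one plain-text block for the {context} slot
    return "\n\n".join(doc.page_content for doc in docs)

retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "lambda_mult": 0.7},
)

rag_chain = (
    {"context": retriever | format_docs,  # question -> MMR retrieval -> plain text
     "question": RunnablePassthrough()}   # question passes through unchanged
    | rag_prompt
    | chat
    | StrOutputParser()
)

Explanation:

The retriever fetches 3 relevant chunks with MMR (lambda_mult=0.7 leans toward relevance over diversity)
format_docs joins them into plain text; without it, the prompt would receive the Python repr of a list of Document objects
RunnablePassthrough() forwards the user question unchanged
The prompt inserts the context and question into the template, and the chat model generates the answer
StrOutputParser() extracts the answer text from the model's message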
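The chain's data flow can be mimicked with plain functions: the leading dict sends the same question to every entry, and | feeds each step's output into the next. Every name below (fake_retriever, run_mapping, compose, and so on) is a hypothetical stand-in with no API calls, and the sketch stops at the formatted prompt instead of calling a model. LangChain turns the dict into a RunnableParallel, which may run its entries concurrently; this sketch runs them one after another.

```python
from typing import Callable


def fake_retriever(question: str) -> list[str]:
    # Hypothetical stand-in for the MMR retriever: returns canned chunk texts
    return ["Open Settings > Security.", "Click 'Reset password' and follow the email link."]


def format_docs(chunks: list[str]) -> str:
    # Join chunk texts into one block for the {context} slot
    return "\n\n".join(chunks)


def passthrough(value: str) -> str:
    # Plays the role of RunnablePassthrough(): returns its input unchanged
    return value


def run_mapping(mapping: dict[str, Callable[[str], str]], value: str) -> dict[str, str]:
    # Plays the role of the dict at the head of rag_chain: every entry gets the SAME input
    return {key: step(value) for key, step in mapping.items()}


def compose(*steps: Callable) -> Callable:
    # Plays the role of |: each step's output becomes the next step's input
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


TEMPLATE = "Question: {question}\n\nContext:\n{context}"

pipeline = compose(
    lambda q: run_mapping(
        {"context": compose(fake_retriever, format_docs), "question": passthrough}, q
    ),
    lambda inputs: TEMPLATE.format(**inputs),
)

print(pipeline("How do I reset my password?"))
```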

Step 9: Query the Bot

user_query = "What steps should I follow to reset my account password?"
response = rag_chain.invoke(user_query)
print("=== RAG Bot Response ===")
print(response)

Outcome: The answer is grounded in the chunks retrieved from the manuals and FAQs in your docs/ folder.

Step 10: Extending RAG for Memory

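To handle follow-up questions, we can add ConversationSummaryMemory, which uses the LLM to keep a running summary of the conversation. The rag_prompt from Step 7 has no slot for that summary, so this version defines its own prompt:

from operator import itemgetter
from langchain.memory import ConversationSummaryMemory
from langchain_core.runnables import RunnablePassthrough

chat_memory = ConversationSummaryMemory(llm=chat, memory_key="message_log")

# Like rag_prompt, plus a {message_log} slot for the conversation summary
memory_prompt = PromptTemplate.from_template("""
Answer the following question using ONLY the provided context.
Use the conversation summary only to understand follow-up questions.

Conversation summary:
{message_log}

Question: {question}

Context:
{context}

Provide a concise and informative answer.
""")

# Input: {"question": ...}; assign() adds the summary text and the retrieved context
rag_chain_with_memory = (
    RunnablePassthrough.assign(
        message_log=lambda x: chat_memory.load_memory_variables({})["message_log"],
        context=itemgetter("question")
        | vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 3, "lambda_mult": 0.7})
        | (lambda docs: "\n\n".join(doc.page_content for doc in docs)),
    )
    | memory_prompt
    | chat
    | StrOutputParser()
)

# Example conversation
query1 = "How do I reset my password?"
resp1 = rag_chain_with_memory.invoke({"question": query1})
chat_memory.save_context({"input": query1}, {"output": resp1})

query2 = "And what if I forget my security questions?"
resp2 = rag_chain_with_memory.invoke({"question": query2})
print(resp2)

Explanation: save_context asks the LLM to fold each exchange into the running summary, and the chain injects that summary as {message_log}, so the model can read a follow-up such as "And what if I forget my security questions?" in light of the earlier password question. Retrieval still uses the raw follow-up text, so very short or vague follow-ups may pull in less relevant chunks.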

✅ Summary

By following these steps, you’ve built a real-world RAG-based Customer Support Bot:

Loaded DOCX documents (manuals, FAQs, guides)
Split text into meaningful chunks
Generated embeddings for semantic understanding
Stored embeddings in Chroma for fast retrieval
Retrieved relevant chunks using similarity and MMR
Used an LLM with the retrieved context to generate answers
Added memory to maintain conversational context

You now have a fully functional, real-world RAG project that can be adapted for any company knowledge base, customer support documentation, or FAQ system.