Building a Real-World RAG Project: Customer Support Knowledge Bot
In this tutorial, we’ll build a Retrieval-Augmented Generation (RAG) chatbot for a customer support knowledge base. This bot will be able to answer queries using company manuals, FAQs, and guides. We will go from document ingestion → splitting → embeddings → vectorstore → retrieval → generation, step by step.
This project is fully hands-on: by the end, you can query your own documents.
Next, create rag_customer_support.py and import the essentials:
import os
import copy
from dotenv import load_dotenv
# Load API keys from .env
load_dotenv()
# LangChain imports
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import Docx2txtLoader
from langchain_text_splitters.character import CharacterTextSplitter
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
Step 2: Load Customer Support Documents
For this project, we’ll use company manuals, FAQs, and onboarding guides in .docx format. Place them in a folder named docs
# Load all DOCX files in the docs/ folder
doc_files = ["docs/FAQ.docx", "docs/User_Manual.docx", "docs/Onboarding_Guide.docx"]
all_docs = []
for file in doc_files:
loader = Docx2txtLoader(file)
pages = loader.load()
# Clean white spaces
for page in pages:
page.page_content = " ".join(page.page_content.split())
all_docs.extend(pages)
print(f"Loaded {len(all_docs)} documents from {len(doc_files)} files")
Explanation:
We use Docx2txtLoader to load DOCX files. Cleaning spaces ensures embeddings are consistent.
Step 3: Split Documents into Chunks
RAG works best with smaller chunks so the retrieval is accurate. We’ll split the documents using CharacterTextSplitter.
# Initialize splitter: 500 characters per chunk with 50 overlap
splitter = CharacterTextSplitter(separator=".", chunk_size=500, chunk_overlap=50)
docs_chunks = splitter.split_documents(all_docs)
print(f"Total chunks after splitting: {len(docs_chunks)}")
Explanation:
chunk_size=500 ensures LLMs can handle the context efficiently
chunk_overlap=50 maintains context continuity across chunks
Step 4: Create Embeddings
We convert text chunks into vector embeddings using OpenAI’s embedding model.
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")
# Test embedding for the first chunk
sample_embedding = embedding_model.embed_query(docs_chunks[0].page_content)
print(f"Sample embedding length: {len(sample_embedding)}")
Explanation:
Each chunk is converted to a high-dimensional vector. These vectors allow semantic similarity searches.
Step 5: Build the Vector Store
We store embeddings in Chroma, a lightweight vector database.
persist_dir = "./customer_support_vectorstore"
vectorstore = Chroma.from_documents(
documents=docs_chunks,
embedding=embedding_model,
persist_directory=persist_dir
)
print("Vectorstore created and persisted at:", persist_dir)
Explanation:
Chroma allows fast similarity search and persists data to disk for reuse.
Step 6: Retrieval – Finding Relevant Chunks
We can retrieve relevant chunks using similarity search or MMR (Maximal Marginal Relevance).
# Example query
query = "How do I reset my password?"
# Similarity search
similar_docs = vectorstore.similarity_search(query, k=3)
print("Similarity search results:")
for i, doc in enumerate(similar_docs):
print(f"[{i}] {doc.page_content[:150]}...")
# MMR search for diverse results
mmr_docs = vectorstore.max_marginal_relevance_search(query, k=3, lambda_mult=0.5)
print("\nMMR search results:")
for i, doc in enumerate(mmr_docs):
print(f"[{i}] {doc.page_content[:150]}...")
We now combine retrieved context + question to generate answers with LLM.
# Initialize Chat Model
chat = ChatOpenAI(
model_name="gpt-4",
temperature=0,
max_tokens=250
)
# Define RAG Prompt
rag_prompt = PromptTemplate.from_template("""
Answer the following question using ONLY the provided context:
Question: {question}
Context:
{context}
Provide a concise and informative answer.
""")
Step 8: Build RAG Chain
We create a Runnable chain where the retriever provides context to the prompt and the LLM generates the answer.
RunnablePassthrough() ensures the user question flows untouched
The retriever fetches relevant docs
The prompt formats the context for the model
StrOutputParser() converts output to clean text
Step 9: Query the Bot
user_query = "What steps should I follow to reset my account password?"
response = rag_chain.invoke(user_query)
print("=== RAG Bot Response ===")
print(response)
Outcome:
The bot will answer using real context from your uploaded manuals and FAQs.
Step 10: Extending RAG for Memory
We can integrate ConversationSummaryMemory to remember past queries:
from langchain.memory import ConversationSummaryMemory
chat_memory = ConversationSummaryMemory(llm=chat, memory_key="message_log")
# Extend RAG chain to include memory
rag_chain_with_memory = (
RunnablePassthrough.assign(
message_log=lambda x: chat_memory.load_memory_variables({})
)
| rag_prompt
| chat
| StrOutputParser()
)
# Example conversation
query1 = "How do I reset my password?"
resp1 = rag_chain_with_memory.invoke({'question': query1})
chat_memory.save_context({'input': query1}, {'output': resp1})
query2 = "And what if I forget my security questions?"
resp2 = rag_chain_with_memory.invoke({'question': query2})
print(resp2)
Explanation:
Memory ensures the bot remembers past interactions, creating a more natural, context-aware conversation.
✅ Summary
By following these steps, you’ve built a real-world RAG-based Customer Support Bot:
Loaded DOCX documents (manuals, FAQs, guides)
Split text into meaningful chunks
Generated embeddings for semantic understanding
Stored embeddings in Chroma for fast retrieval
Retrieved relevant chunks using similarity and MMR
Used LLM with retrieved context to generate answers
Added memory to maintain conversational context
You now have a fully functional, real-world RAG project that can be adapted for any company knowledge base, customer support documentation, or FAQ system.
In the previous two parts, we built a strong foundation of LangGraph fundamentals—nodes, edges, message states, conditional routing, reducers, summarization loops, and graph orchestration.
In Part-1 of this LangGraph Blog Series, we understood the foundation of LangGraph — Graph structure, Nodes, Edges, Conditional Routing, State system, and Graph Execution.
Now in Part-2, we upgrade our knowledge and turn LangGraph into a real conversation system.
Modern AI workflows need more than just a prompt and a model call. Real applications require memory, state transitions, branching logic, routing decisions, and orchestration of multiple AI models. This is where LangGraph enters the scene.