Street Learner

Author
12 min read

Last Updated: a year ago

LangGraph Part 3: Checkpointing, Memory Persistence & State Snapshots in Production AI Systems

In the previous two parts, we built a strong foundation of LangGraph fundamentals—nodes, edges, message states, conditional routing, reducers, summarization loops, and graph orchestration.

Now we move to a more advanced phase: adding checkpointing and persistence so that your LangGraph pipeline becomes:

  • resumable
  • stateful
  • multi-session (one isolated thread per conversation)
  • fault tolerant
  • database-backed

This shift moves LangGraph from a learning tool to a production-ready workflow engine.

We will cover five major concepts:

1️⃣ Understanding Checkpointing in LangGraph

A checkpoint is a saved execution state.

When an LLM graph runs, each step produces:

  • updated state values
  • updated messages
  • summary text
  • metadata
  • execution progress

Without checkpointing, this information disappears after execution.

Checkpointing allows:

  • pause & resume
  • step back in time
  • multiple conversation threads (separate sessions, not OS threads)
  • fault recovery
  • parallel user sessions

In real applications—customer chatbots, research agents, retrieval pipelines—you must preserve state between runs.
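To make pause and resume concrete, here is a toy model of the idea that uses only the standard library. It is not LangGraph's implementation: the step functions, `run_with_checkpoints`, and the dict-based `store` are hypothetical names invented for this sketch. It only shows the principle that saving a copy of the state after every step lets a crashed run continue from the last completed step.

```python
import copy

# Hypothetical three-step pipeline; each step receives the state and returns an update.
def ask(state):
    return {"messages": state["messages"] + ["user: hello"]}

def answer(state):
    return {"messages": state["messages"] + ["ai: hi there"]}

def summarize(state):
    return {"summary": f"{len(state['messages'])} messages so far"}

STEPS = [ask, answer, summarize]

def run_with_checkpoints(store, thread_id, fail_at=None):
    """Run STEPS for one thread, saving a copy of the state after every step.

    If checkpoints already exist for thread_id, execution resumes after the
    last completed step instead of starting over.
    """
    history = store.setdefault(thread_id, [])
    if history:
        step_index, saved = history[-1]
        state = copy.deepcopy(saved)
    else:
        step_index, state = 0, {"messages": [], "summary": ""}

    for i in range(step_index, len(STEPS)):
        if fail_at == i:
            raise RuntimeError(f"simulated crash before step {i}")
        state.update(STEPS[i](state))
        history.append((i + 1, copy.deepcopy(state)))  # the checkpoint
    return state

store = {}
try:
    run_with_checkpoints(store, "thread-1", fail_at=2)  # crashes before summarize
except RuntimeError:
    pass

# Resuming picks up at step 2; ask and answer are not executed again.
final_state = run_with_checkpoints(store, "thread-1")
print(final_state["summary"])  # 2 messages so far
```

A real checkpointer does the same bookkeeping for you, keyed by thread, and can write the saved copies to a database instead of a dict.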

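LangGraph supports multiple checkpointing backends, including:

  • InMemorySaver (keeps checkpoints in process memory)
  • SqliteSaver (stores checkpoints in a SQLite database; shipped in the separate langgraph-checkpoint-sqlite package)

We will build both today.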

2️⃣ Threads in LangGraph

Threads allow multiple isolated executions to run through the same graph logic.

Why is this important?

Imagine an AI SaaS product where:

  • user A chats → summary saved
  • user B chats → separate context
  • user C resumes days later

Threads allow separation using:

config = {"configurable": {"thread_id": "unique_id"}}

For each thread, LangGraph:

  • maintains an independent message history
  • stores its checkpoints under that thread_id
  • prevents state from leaking between threads

This is production architecture for:

  • multi-user chatbot platforms
  • per-customer knowledge models
  • agent swarms
  • cloud hosted AI SaaS
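In a multi-user application, the thread_id usually has to be derived from something you already know about the caller. The helper below is a minimal sketch of one way to do that. `thread_config`, the `user_id:session_id` format, and the sample IDs are assumptions made for illustration; the only part taken from LangGraph is the shape of the config dict shown earlier.

```python
import uuid

def thread_config(user_id, session_id=None):
    """Build a config dict whose thread_id is unique to one user session."""
    if session_id is None:
        session_id = uuid.uuid4().hex  # start a brand-new conversation
    return {"configurable": {"thread_id": f"{user_id}:{session_id}"}}

config_a = thread_config("user_a", "support-chat")
config_b = thread_config("user_b", "support-chat")
config_a_again = thread_config("user_a", "support-chat")  # same thread as config_a

print(config_a["configurable"]["thread_id"])  # user_a:support-chat
```

Passing `config_a_again` to the graph would continue user A's conversation, while `config_b` keeps user B's state separate even though both use the same session name.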

3️⃣ Short-Term Memory using InMemorySaver

When you do not need database persistence and want fast runtime memory, use:

from langgraph.checkpoint.memory import InMemorySaver

When is it used?

  • testing
  • prototyping
  • short workflows
  • ephemeral execution

InMemorySaver keeps every checkpoint of every thread in process memory, including:

  • state values
  • messages
  • summary
  • metadata

But:

  • the memory is lost when the script restarts
  • a conversation cannot be resumed from a new process

That is acceptable for early builds.

4️⃣ The StateSnapshot Class

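Once a graph runs with checkpointing enabled, LangGraph saves a checkpoint after each step, and each one can be read back as a StateSnapshot.

A snapshot exposes:

  • values: the full state at that point, including the stored messages and summary
  • next: the node(s) scheduled to run next
  • config: the thread_id and checkpoint ID that identify the snapshot
  • metadata: bookkeeping such as the step count and the source of the update
  • created_at: when the checkpoint was written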

Why is this valuable?

  • debugging
  • visualization
  • inspecting conversations
  • retrieving intermediate values
  • analytics
  • an audit trail of LLM decisions

This becomes essential when auditing real-world applications.

5️⃣ Long-Term Memory with SqliteSaver

SQLite persistence is one of LangGraph’s most powerful features.

It allows your graph to store state across:

  • app restarts
  • server crashes
  • user sessions
  • long-term deployment

SqliteSaver creates its tables in the database file on first use and stores:

  • serialized messages
  • serialized summaries
  • checkpoint records
  • execution metadata
  • a thread_id on every row, which keeps threads separate

It transforms LangGraph workflows from demo code into real AI applications.

FULL TECHNICAL IMPLEMENTATION

Below is the complete implementation, shown section by section and then explained in depth.

SECTION 0 – Setup & Graph Definitions

What happens here?

We:

  1. Import libraries
  2. Load API keys
  3. Initialize model
  4. Create shared state
  5. Define processing nodes
  6. Build a graph structure

import os
import sqlite3
from dotenv import load_dotenv

print(f"{'='*30}\nSECTION 0: Setup & Graph Definitions\n{'='*30}")

load_dotenv()

from langgraph.graph import StateGraph, START, END, MessagesState
from langchain_openai.chat_models import ChatOpenAI
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage, RemoveMessage
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.checkpoint.sqlite import SqliteSaver

We now load the model:

chat = ChatOpenAI(
    model="gpt-4o", 
    seed=365, 
    temperature=0, 
    max_completion_tokens=100
)

Now define a shared application State:

class State(MessagesState):
    summary: str

Meaning:

  • messages (inherited from MessagesState) stores the chat messages
  • summary stores the running conversation summary

We now define the nodes.

Node 1 — ask_question()

This node feeds a new human question into the conversation:

def ask_question(state: State) -> State:
    print(f"\n-------> ENTERING ask_question:")
    question = "What is your question?"
    print(question)
    
    if not state.get("summary"):
        user_input = "Tell me about the history of the internet."
    else:
        user_input = "That's cool. Who invented the web?"
    
    print(f"(Simulated Input): {user_input}")
    
    return {"messages": [HumanMessage(user_input)]}

Node 2 — chatbot()

This node generates the response:

def chatbot(state: State) -> State:
    print(f"\n-------> ENTERING chatbot:")
    
    summary = state.get("summary", "")
    system_message = f'''
    Here's a quick summary of what's been discussed so far:
    {summary}
    
    Keep this in mind as you answer the next question.
    '''
    
    messages = [SystemMessage(system_message)] + state["messages"]
    response = chat.invoke(messages)
    response.pretty_print()
    
    return {"messages": [response]}

Node 3 — summarize_messages()

This node compresses the conversation into a running summary and then clears the message list:

def summarize_messages(state: State) -> State:
    print(f"\n-------> ENTERING summarize_messages:")
    
    new_conversation = ""
    for i in state["messages"]:
        new_conversation += f"{i.type}: {i.content}\n\n"
    
    summary_instructions = f'''
    Update the ongoing summary by incorporating the new lines of conversation below. 
    Build upon the previous summary rather than repeating it, 
    so that the result reflects the most recent context and developments.
    Respond only with the summary.

    Previous Summary:
    {state.get("summary", "")}

    New Conversation:
    {new_conversation}
    '''
    
    summary = chat.invoke([HumanMessage(summary_instructions)])
    print(f"--- Updated Summary: {summary.content[:50]}... ---")
    
    remove_messages = [RemoveMessage(id=i.id) for i in state["messages"]]
    
    return {"messages": remove_messages, "summary": summary.content}

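This ensures:

  • the message list is cleared after every turn, so it does not grow without bound
  • the summary carries the context forward, and the 100-token completion cap set on the model keeps it short
  • short-term history is removed once it has been summarized

Summarize-and-trim is a common way to keep long-running chatbot conversations within the context window.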
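The summarize_messages node removes every message after summarizing. A common variation is to keep the last few messages verbatim so the model still sees the most recent turns word for word. The sketch below shows only the selection logic; `ids_to_remove` and the `keep_last` parameter are hypothetical names, and the stand-in messages are plain objects with an `id` attribute rather than real LangChain messages.

```python
from types import SimpleNamespace

def ids_to_remove(messages, keep_last=2):
    """Return the ids of all messages except the most recent keep_last."""
    if keep_last <= 0:
        return [m.id for m in messages]  # keep nothing, remove everything
    return [m.id for m in messages[:-keep_last]]

# Stand-ins for chat messages; only the id attribute matters here.
demo_messages = [SimpleNamespace(id=f"m{i}") for i in range(5)]

print(ids_to_remove(demo_messages, keep_last=2))  # ['m0', 'm1', 'm2']
```

Inside the node, the returned ids could be wrapped as `[RemoveMessage(id=i) for i in ids_to_remove(state["messages"])]` in place of the list comprehension that removes everything.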

Build Graph Function

The same graph is reused across all persistence backends:

def build_graph():
    graph = StateGraph(State)
    graph.add_node("ask_question", ask_question)
    graph.add_node("chatbot", chatbot)
    graph.add_node("summarize_messages", summarize_messages)

    graph.add_edge(START, "ask_question")
    graph.add_edge("ask_question", "chatbot")
    graph.add_edge("chatbot", "summarize_messages")
    graph.add_edge("summarize_messages", END)
    return graph

SECTION 1 – Short-Term Memory using InMemorySaver

print(f"\n{'='*30}\nSECTION 1: InMemorySaver (Short-Term)\n{'='*30}")

memory_checkpointer = InMemorySaver()

graph_memory = build_graph().compile(checkpointer=memory_checkpointer)

config1 = {"configurable": {"thread_id": "1"}}
config2 = {"configurable": {"thread_id": "2"}}

print("--- Thread 1 Execution ---")
graph_memory.invoke(State(messages=[], summary=""), config1)

print("\n--- Thread 2 Execution (Independent) ---")
graph_memory.invoke(State(messages=[], summary=""), config2)

Results:

  • Thread 1 & 2 run independently
  • memory saved inside RAM
  • no database created

SECTION 2 – Inspecting History with StateSnapshot

print(f"\n{'='*30}\nSECTION 2: State Snapshots (Inspecting History)\n{'='*30}")

graph_states = list(graph_memory.get_state_history(config1))  # newest snapshot first

print(f"Number of snapshots found: {len(graph_states)}")

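Each snapshot exposes:

  • values: the state at that point, including the messages and the summary
  • next: the node(s) scheduled to run next
  • metadata: bookkeeping such as the step number

The history is returned newest first, so reversing it replays the run in execution order. This is debugging gold.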
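Printing only the number of snapshots hides most of what they hold. The helper below is a small sketch for turning each snapshot into one readable line. `describe_snapshot` is a hypothetical name, and the code assumes the snapshot attributes `values`, `next`, and `metadata` (with a `step` key) as I understand them; check them against your installed LangGraph version. The demo uses stand-in objects so it runs without a graph or an API key.

```python
from types import SimpleNamespace

def describe_snapshot(snapshot):
    """Return a one-line description of a StateSnapshot-like object."""
    values = snapshot.values or {}
    metadata = snapshot.metadata or {}
    step = metadata.get("step")
    n_messages = len(values.get("messages", []))
    summary_flag = "yes" if values.get("summary") else "no"
    next_nodes = ", ".join(snapshot.next) or "(finished)"
    return f"step={step} messages={n_messages} summary={summary_flag} next={next_nodes}"

# Stand-ins that mimic two snapshots, newest first.
fake_history = [
    SimpleNamespace(values={"messages": [], "summary": "Internet history so far"},
                    next=(), metadata={"step": 3}),
    SimpleNamespace(values={"messages": ["question", "answer"], "summary": ""},
                    next=("summarize_messages",), metadata={"step": 2}),
]

lines = [describe_snapshot(s) for s in fake_history]
for line in lines:
    print(line)
```

With a real graph, the same helper could be applied to each item of `graph_memory.get_state_history(config1)`.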

SECTION 3 – Long-Term Memory with SqliteSaver

print(f"\n{'='*30}\nSECTION 3: Long-Term Persistence (SQLite)\n{'='*30}")

db_path = "langgraph_memory.db"
con = sqlite3.connect(database=db_path, check_same_thread=False)

sqlite_checkpointer = SqliteSaver(con)

graph_sqlite = build_graph().compile(checkpointer=sqlite_checkpointer)

config_sqlite = {"configurable": {"thread_id": "persistence_demo_1"}}

graph_sqlite.invoke(State(messages=[], summary=""), config_sqlite)

This finally creates:

  • persistent agent memory across reboots
  • stored summaries
  • stored messages history
  • resumable agent state

Your .db file now contains:

  • state value history
  • message IDs
  • summaries
  • metadata
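You can confirm what was written by opening the file with the standard sqlite3 module. The sketch below lists the tables and counts checkpoint rows per thread. The table name `checkpoints` and the column `thread_id` are assumptions about the schema, so list the tables first and adjust if your version differs. `list_tables` and `checkpoints_per_thread` are hypothetical helper names, and the demo runs against an in-memory stand-in table rather than a real checkpoint database.

```python
import sqlite3

def list_tables(con):
    """Return the names of all tables in the database, sorted."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]

def checkpoints_per_thread(con, table="checkpoints"):
    """Count rows per thread_id; returns {} if the assumed table is missing."""
    if table not in list_tables(con):
        return {}
    # Safe to interpolate: table was just matched against an existing table name.
    rows = con.execute(
        f'SELECT thread_id, COUNT(*) FROM "{table}" GROUP BY thread_id ORDER BY thread_id'
    ).fetchall()
    return dict(rows)

# Stand-in database with the assumed layout, for demonstration only.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE checkpoints (thread_id TEXT, checkpoint_id TEXT)")
demo.executemany(
    "INSERT INTO checkpoints VALUES (?, ?)",
    [("A123", "c1"), ("A123", "c2"), ("B456", "c1")],
)

print(list_tables(demo))             # ['checkpoints']
print(checkpoints_per_thread(demo))  # {'A123': 2, 'B456': 1}
```

To inspect the real file, replace the stand-in connection with `sqlite3.connect("langgraph_memory.db")`.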

⭐ FINAL FULL REAL-WORLD EXAMPLE ⭐

This script:

  • initializes the model
  • builds the graph
  • uses SQLite for checkpoints
  • runs one session per execution under a fixed thread_id
  • resumes that thread's context on every re-run
  • prints the result

from langgraph.graph import START, END, StateGraph, MessagesState
from langgraph.checkpoint.sqlite import SqliteSaver
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
import sqlite3

# PATH
con = sqlite3.connect("memory_persist_demo.db", check_same_thread=False)

# LLM
llm = ChatOpenAI(model="gpt-4o")

class State(MessagesState):
    summary: str

def ask(state: State):
    return {"messages": [HumanMessage("Explain AI agents")]}

def answer(state: State):
    messages = state["messages"]
    res = llm.invoke(messages)
    return {"messages": [res]}

def summarize(state: State):
    summary = "\n".join([m.content for m in state["messages"]])
    return {"summary": summary}

graph = StateGraph(State)
graph.add_node("ask", ask)
graph.add_node("answer", answer)
graph.add_node("summarize", summarize)
graph.add_edge(START, "ask")
graph.add_edge("ask", "answer")
graph.add_edge("answer", "summarize")
graph.add_edge("summarize", END)

compiled = graph.compile(checkpointer=SqliteSaver(con))

config = {"configurable": {"thread_id": "A123"}}

out = compiled.invoke(State(messages=[], summary=""), config)
print("SUMMARY SAVED:", out["summary"])

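Run this file several times.

You will notice:

  • the summary grows, because the thread's earlier messages are loaded from the database and each run appends to them
  • responses connect, because the model sees the full stored message history
  • memory loads from the database rather than from the running process

This is real AI persistence.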
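One subtlety in the invoke call: the input is merged into the thread's saved state, not swapped in for it. As I understand LangGraph's behavior, `messages` uses an appending reducer, so an empty list adds nothing, while a plain key such as `summary` is overwritten by whatever the input supplies. The toy function below models just that rule; `apply_input` is a hypothetical name, and the real message reducer also matches messages by ID and handles removals, which this sketch ignores.

```python
def apply_input(saved_state, new_input):
    """Toy model of merging a new input into a thread's saved state.

    messages: appended to the saved list (mimics an appending reducer)
    other keys: overwritten by the value in the input
    """
    merged = dict(saved_state)
    for key, value in new_input.items():
        if key == "messages":
            merged["messages"] = saved_state.get("messages", []) + list(value)
        else:
            merged[key] = value
    return merged

saved = {
    "messages": ["human: Explain AI agents", "ai: (earlier answer)"],
    "summary": "earlier summary",
}

# Passing summary="" wipes the stored summary; the messages survive.
reset = apply_input(saved, {"messages": [], "summary": ""})

# Omitting summary from the input leaves the stored value untouched.
kept = apply_input(saved, {"messages": []})

print(repr(reset["summary"]), len(reset["messages"]))  # '' 2
print(repr(kept["summary"]), len(kept["messages"]))    # 'earlier summary' 2
```

If a node needs the previous summary when a thread resumes, invoking with only the keys you want to change is one way to avoid resetting it.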

Conclusion

With this Part 3 chapter, you now know how to build enterprise-friendly LangGraph pipelines:

  • multi-thread execution
  • checkpoint state saving
  • short-term RAM memory
  • permanent SQLite database
  • snapshot inspection debugging

This is production engineering for:

  • conversational AI
  • customer support bots
  • autonomous agents
  • knowledge workers
  • data analytics
