Agent Memory
Give your AI agents the ability to remember context across conversations
Imagine if every time you talked to your best friend, they forgot everything from your last conversation! That would be frustrating, right? Agent memory is what helps AI remember what you talked about before, just like your friends remember your conversations.
With memory, an AI agent can remember that you prefer TypeScript over JavaScript, that you're working on a mobile app, or what you discussed last week. This makes conversations feel more natural and helpful!
Example: First chat: "I like dark mode" → Next week: "Update the settings" → AI remembers: "I'll update the settings with dark mode as the default since that's your preference!"
Agent memory is what transforms a stateless AI model into a conversational partner that remembers you. Without memory, every interaction starts from scratch. With memory, your agent can reference past conversations, learn preferences, and provide personalized experiences.
What is Agent Memory?
Memory in AI agents is the system that stores and retrieves information from previous interactions. Just like you remember past conversations with friends, agents with memory can recall:
- What you discussed last week
- Your preferences and settings
- Ongoing projects and their status
- Context from earlier in the conversation
There are different types of memory, each serving a different purpose:
- Short-term memory: The current conversation context (stored in the prompt)
- Long-term memory: Historical conversations and learned facts (stored in a database)
- Working memory: Temporary information needed for a specific task
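To make the distinction concrete, here is a minimal TypeScript sketch of the three stores; the type names are illustrative, not a standard API:

// Illustrative shapes only -- these type names are hypothetical, not a standard API.
interface ShortTermMemory {
  // Lives inside the prompt, resent on every request
  messages: { role: "system" | "user" | "assistant"; content: string }[]
}

interface LongTermMemory {
  // Persisted in a database across sessions
  facts: { text: string; learnedAt: Date }[]
}

interface WorkingMemory {
  // Scratch space for the current task, discarded when the task ends
  scratchpad: Record<string, unknown>
}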
Why Memory Matters
Personalization: Users don't want to repeat themselves. Memory lets agents provide contextually relevant responses based on history.
Task continuity: Multi-step tasks require remembering previous steps. "Continue working on the report we started yesterday" only works with memory.
Relationship building: Products like Replika and Pi use memory to create the feeling of an ongoing relationship, not one-off transactions.
Reduced friction: No more "Can you remind me where we left off?" Memory maintains context automatically.
Types of Agent Memory
1. Conversational Memory (Short-term)
This is the chat history included in each API call. Most LLM APIs accept a list of messages:
const messages = [
{ role: "system", content: "You are a helpful assistant" },
{ role: "user", content: "What's the weather in SF?" },
{ role: "assistant", content: "It's 65°F and sunny" },
{ role: "user", content: "What about tomorrow?" }
]
The model sees all previous messages, so it knows "tomorrow" refers to SF weather.
Limitation: Context windows are finite (commonly 8k to 200k tokens, depending on the model). Long conversations eventually overflow.
2. Semantic Memory (Long-term Facts)
Stored facts about the user or domain: "User prefers TypeScript over JavaScript" or "Company policy: PTO requests need 2 weeks notice."
Typically stored in a vector database and retrieved via semantic search (RAG pattern):
- User asks a question
- Search memory for relevant facts
- Include those facts in the prompt
- Generate a response
# Simplified example -- `memory_db` stands in for any vector store client
user_message = "What language should I use?"

# Retrieve the memories most relevant to the question
memories = memory_db.search(user_message, limit=3)
# Returns: ["User prefers TypeScript", "User is building a web app", ...]

# Include the retrieved facts in the prompt
memory_context = "\n".join(memories)
prompt = f"""
Relevant context about this user:
{memory_context}

User question: {user_message}
"""
3. Episodic Memory (Experiences)
Memories of specific events: "Last week we debugged a React rendering issue" or "Three days ago user completed the onboarding tutorial."
Often stored with timestamps and can be queried temporally:
- "What did we work on last week?"
- "Show me my activity from yesterday"
- "Find that conversation about API design"
4. Procedural Memory (How-to)
Knowledge about processes and workflows: "When user says 'deploy,' run these steps" or "For bug reports, always ask for reproduction steps first."
Usually implemented as:
- System prompts with instructions
- Tool/function definitions
- Workflow templates
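As a sketch, procedural memory might look like a system prompt plus a tool definition. The tool schema below follows the JSON Schema convention most LLM APIs use; the exact field names vary by provider:

const systemPrompt = `
When the user says "deploy", follow these steps in order:
1. Run the test suite.
2. Build the production bundle.
3. Push to staging and wait for approval.
For bug reports, always ask for reproduction steps first.
`

// Tool definition in the Anthropic-style shape; adjust for your provider.
const deployTool = {
  name: "deploy",
  description: "Deploy the current branch to the staging environment",
  input_schema: {
    type: "object",
    properties: {
      branch: { type: "string", description: "Branch to deploy" }
    },
    required: ["branch"]
  }
}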
Implementing Agent Memory
Simple: Append-Only Conversation History
The easiest approach: just send the full chat history with every request.
import Anthropic from "@anthropic-ai/sdk"

const anthropic = new Anthropic() // reads ANTHROPIC_API_KEY from the environment

let conversationHistory = []

async function chat(userMessage) {
  // Record the user's message in the running history
  conversationHistory.push({
    role: "user",
    content: userMessage
  })

  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 1024, // required by the Messages API
    messages: conversationHistory
  })

  // Record the assistant's reply so the next turn has full context
  conversationHistory.push({
    role: "assistant",
    content: response.content[0].text
  })

  return response
}
Pros: Simple, no database needed
Cons: Grows unbounded, eventually hits context limits
Intermediate: Sliding Window + Summarization
Keep recent messages and summarize older ones:
// `summarize` is a helper (not shown) that asks the model for a short
// summary of the given messages.
async function manageMemory(messages) {
  if (messages.length > 20) {
    // Summarize the oldest 10 messages
    const oldMessages = messages.slice(0, 10)
    const summary = await summarize(oldMessages)

    // Keep the summary plus the recent messages
    return [
      { role: "system", content: `Previous conversation: ${summary}` },
      ...messages.slice(10)
    ]
  }
  return messages
}
Pros: Stays within context limits, preserves key information
Cons: May lose important details in summarization
Advanced: RAG-Based Memory System
Store all conversations in a vector database and retrieve relevant memories:
from llama_index.core import VectorStoreIndex, Document

# Start with an empty index; in production you'd back this with a
# persistent vector store rather than the default in-memory one.
memory_index = VectorStoreIndex.from_documents([])

async def chat(user_message):
    # 1. Retrieve the most relevant past conversations
    retriever = memory_index.as_retriever(similarity_top_k=5)
    relevant_memories = retriever.retrieve(user_message)

    # 2. Build a prompt that includes the retrieved memories
    context = "\n".join(node.get_content() for node in relevant_memories)
    prompt = f"""
Relevant past conversations:
{context}

Current message: {user_message}
"""

    # 3. Generate a response (`llm` stands in for your LLM client)
    response = await llm.generate(prompt)

    # 4. Store this exchange so future turns can retrieve it
    memory_index.insert(
        Document(text=f"User: {user_message}\nAssistant: {response}")
    )
    return response
Pros: Scales to unlimited history, retrieves only relevant context
Cons: More complex, requires vector database
Memory Architecture Patterns
Pattern 1: Hierarchical Memory
Organize memory by importance:
- Level 1: Core facts (user name, preferences) - always included
- Level 2: Recent context (last 10 messages) - always included
- Level 3: Relevant history (retrieved as needed) - conditionally included
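A sketch of how the three levels might be assembled into a single prompt; searchHistory is a hypothetical retrieval function you supply:

// Levels 1 and 2 are always included; level 3 is retrieved on demand.
async function buildContext(
  coreFacts: string[],                              // Level 1: always included
  recentMessages: string[],                         // Level 2: always included
  userMessage: string,
  searchHistory: (q: string) => Promise<string[]>   // Level 3: retrieved as needed
): Promise<string> {
  const relevantHistory = await searchHistory(userMessage)
  return [
    `Core facts:\n${coreFacts.join("\n")}`,
    `Recent conversation:\n${recentMessages.join("\n")}`,
    relevantHistory.length ? `Relevant history:\n${relevantHistory.join("\n")}` : "",
    `User: ${userMessage}`
  ].filter(Boolean).join("\n\n")
}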
Pattern 2: Memory Consolidation
Periodically consolidate memories:
- After each conversation: Extract key facts
- Daily: Summarize the day's conversations
- Weekly: Identify patterns and preferences
This mirrors how human memory works—consolidating short-term experiences into long-term knowledge.
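A hedged sketch of the first consolidation step, extracting facts after a conversation; callLLM and factStore are hypothetical stand-ins for your LLM client and database:

async function consolidateConversation(
  transcript: string,
  callLLM: (prompt: string) => Promise<string>,          // hypothetical LLM client
  factStore: { save: (fact: string) => Promise<void> }   // hypothetical database
): Promise<void> {
  // Ask the model for durable facts, one per line
  const extracted = await callLLM(
    `Extract durable facts about the user from this conversation, one per line:\n\n${transcript}`
  )
  for (const fact of extracted.split("\n").map((f) => f.trim()).filter(Boolean)) {
    await factStore.save(fact)
  }
}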
Pattern 3: Multi-Store Memory
Different stores for different types of memory:
- User Profile DB → core facts (preferences, settings)
- Vector DB → semantic search over conversations
- SQL DB → temporal queries (conversations by date)
- Redis → session state (current conversation)
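Expressed as a single TypeScript facade (every interface here is illustrative; swap in your actual clients):

interface MemoryStores {
  profile: { getFacts(userId: string): Promise<string[]> }               // User Profile DB
  vector: { search(userId: string, query: string): Promise<string[]> }   // Vector DB
  temporal: { byDateRange(userId: string, from: Date, to: Date): Promise<string[]> } // SQL DB
  session: { getMessages(sessionId: string): Promise<string[]> }         // Redis
}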
Privacy and Ethical Considerations
Memory raises important questions:
User control: Can users view, edit, and delete their memory? They should be able to.
Transparency: Does the agent tell users when it's remembering something? Consider showing "I remember you prefer X" in the UI.
Retention: How long should memories persist? Consider implementing auto-deletion after N days.
Consent: Are users aware their conversations are being stored? Clear privacy policies are essential.
Common Challenges
Memory Drift
Over time, stored memories might become outdated or contradictory.
Solution: Add timestamps to memories and prioritize recent information. Implement a "forgetting" mechanism for old memories.
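One way to implement recency weighting, as a sketch; the half-life is an assumed tuning knob, not a standard value:

// Decay a memory's similarity score by its age so newer facts outrank stale ones.
function decayedScore(similarity: number, ageDays: number, halfLifeDays = 30): number {
  return similarity * Math.pow(0.5, ageDays / halfLifeDays)
}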
Context Overload
Including too much memory context confuses the model.
Solution: Implement relevance filtering. Only include memories above a similarity threshold.
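A sketch of threshold-based filtering, assuming your vector store returns cosine similarity scores in [0, 1]:

interface ScoredMemory { text: string; score: number }

// Keep only memories above the threshold, best first, capped in count.
function filterRelevant(results: ScoredMemory[], threshold = 0.75, maxCount = 5): string[] {
  return results
    .filter((m) => m.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxCount)
    .map((m) => m.text)
}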
Privacy Leakage
Memories from one user can accidentally appear in another user's conversation.
Solution: Strict user ID filtering on all memory queries. Test extensively.
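A defensive sketch: the user ID is required and applied as a hard filter on every query (vectorStore.search is a hypothetical client method, not a specific library's API):

async function searchUserMemories(
  vectorStore: { search(query: string, filter: { userId: string }): Promise<string[]> },
  userId: string,
  query: string
): Promise<string[]> {
  if (!userId) throw new Error("userId is required for memory queries")
  // The filter is enforced on every call, never inferred from content
  return vectorStore.search(query, { userId })
}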
Cost Management
Storing and retrieving embeddings adds cost and latency.
Solution: Cache frequently accessed memories. Use cheaper embedding models for memory retrieval.
Best Practices
- Start simple: Begin with conversation history, add complexity as needed
- User control: Let users view and manage their memory
- Test edge cases: What happens with 100+ conversation turns? Contradictory information?
- Monitor quality: Track when memory helps vs hurts the response quality
- Implement forgetting: Not all information needs to be remembered forever
When to Use Agent Memory
Memory is essential for:
- Personal AI assistants that learn user preferences
- Customer support bots handling ongoing issues
- Collaborative work tools (pair programming, writing assistants)
- Educational tutors adapting to student progress
- Any multi-session conversational experience
Memory is less critical for:
- One-off queries (search, translation)
- Stateless APIs without user context
- Applications where privacy is paramount
- Simple task automation without personalization
Measuring Memory Effectiveness
Track these metrics:
- Recall accuracy: Does the agent remember relevant facts?
- Precision: Does it avoid retrieving irrelevant memories?
- User satisfaction: Do users feel "understood"?
- Task completion: Does memory help users accomplish goals faster?
Memory is what makes AI agents feel intelligent and personal. It's the difference between a tool and a partner—and for many product experiences, that difference is everything.