# Implementing Episodic Memory in AI Agents

A production-ready guide, with code examples, for building AI agents that remember specific experiences and events. You will learn how to:
- Store episodic memories with timestamps and metadata
- Implement semantic + temporal retrieval
- Handle memory consolidation and pruning
- Build production-ready memory systems
Before implementing episodic memory, you should understand:
- 📚 What episodic memory is and why it matters
- 🗄️ Vector database basics
- 🤖 How to make LLM API calls (OpenAI, Anthropic, etc.)
Episodic memory systems have three core components:

1. **Storage**: a vector database holding embeddings plus metadata (timestamp, user_id, context)
2. **Retrieval**: hybrid search combining semantic similarity, recency, and relevance
3. **Management**: consolidation, pruning, and privacy controls
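As a mental model, the three responsibilities can be sketched as a single interface. This is illustrative only; the method names are assumptions, not a specific library's API, and `EpisodicMemory` is the record shape defined later in this guide.

```typescript
// Illustrative sketch only; names are assumptions, not a library's API
interface MemoryStore {
  // Storage: persist an embedded memory with its metadata
  store(memory: EpisodicMemory): Promise<void>
  // Retrieval: hybrid search over similarity, recency, and relevance
  retrieve(userId: string, query: string, limit?: number): Promise<EpisodicMemory[]>
  // Management: consolidation, pruning, and privacy controls
  consolidate(userId: string): Promise<void>
  prune(userId: string, olderThanDays?: number): Promise<void>
  deleteAllForUser(userId: string): Promise<void> // right-to-be-forgotten requests
}
```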
Pick a vector database and embedding model:

```text
# Recommended Stacks for Episodic Memory
## Quick Prototyping
- Vector DB: ChromaDB (embedded, zero setup)
- Embeddings: OpenAI text-embedding-3-small
- Storage: Local SQLite
## Production (Managed)
- Vector DB: Pinecone (serverless, auto-scaling)
- Embeddings: OpenAI text-embedding-3-large
- Storage: PostgreSQL or similar
## Production (Self-Hosted)
- Vector DB: Qdrant or Weaviate
- Embeddings: OpenAI or Cohere
- Storage: PostgreSQL + pgvector
## Framework-Based
- LlamaIndex (handles storage + retrieval)
- Mem0 (specialized for agent memory)
```

Each memory should include:

```typescript
interface EpisodicMemory {
  id: string
  userId: string              // Who this memory belongs to
  timestamp: number           // When it happened
  content: string             // The actual memory text
  embedding: number[]         // Vector for semantic search
  metadata: {
    conversationId?: string   // Link related memories
    importance?: number       // 1-10 for prioritization
    emotionalValence?: number // Positive/negative/neutral
    tags?: string[]           // Categorization
    source?: string           // "user_message" | "agent_observation"
  }
}
```

Example using Pinecone:

```typescript
import { Pinecone } from '@pinecone-database/pinecone'
import OpenAI from 'openai'
import { randomUUID } from 'node:crypto'

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! })
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
const index = pinecone.index('episodic-memory')

async function storeMemory(userId: string, content: string, metadata = {}) {
  // 1. Generate embedding
  const embeddingResponse = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: content
  })
  const embedding = embeddingResponse.data[0].embedding

  // 2. Store in vector database
  await index.upsert([{
    // Date.now() alone can collide on rapid writes and silently overwrite
    id: `${userId}-${randomUUID()}`,
    values: embedding,
    metadata: {
      userId,
      content,
      timestamp: Date.now(),
      ...metadata
    }
  }])

  console.log('Memory stored:', content.substring(0, 50))
}

// Usage
await storeMemory(
  'user-123',
  'User prefers TypeScript and uses Next.js for projects',
  { importance: 8, tags: ['preferences', 'technical'] }
)
```

Combine semantic similarity with recency weighting:

```typescript
async function retrieveMemories(
  userId: string,
  query: string,
  { limit = 5, recencyBias = 0.3 } = {} // defaults apply even with partial options
) {
  // 1. Generate query embedding
  const queryEmbedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query
  })

  // 2. Semantic search with metadata filter
  const results = await index.query({
    vector: queryEmbedding.data[0].embedding,
    filter: { userId: { $eq: userId } },
    topK: limit * 2, // Fetch extra candidates for re-ranking
    includeMetadata: true
  })

  // 3. Re-rank with recency bias
  const now = Date.now()
  const rankedResults = results.matches.map(match => {
    const ageInDays = (now - (match.metadata!.timestamp as number)) / (1000 * 60 * 60 * 24)
    // Exponential decay: weight falls to ~0.37 after 30 days, ~0.14 after 60
    const recencyScore = Math.exp(-ageInDays / 30)

    // Blend similarity (0-1) with recency
    const combinedScore =
      (1 - recencyBias) * (match.score ?? 0) +
      recencyBias * recencyScore

    return {
      content: match.metadata!.content,
      score: combinedScore,
      timestamp: match.metadata!.timestamp,
      metadata: match.metadata
    }
  })

  // 4. Return top results
  return rankedResults
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
}

// Usage
const relevantMemories = await retrieveMemories(
  'user-123',
  'What frameworks does the user like?',
  { limit: 3, recencyBias: 0.2 }
)
console.log(relevantMemories.map(m => m.content))
```

Include retrieved memories in your prompt:

```typescript
async function chatWithMemory(userId: string, userMessage: string) {
  // 1. Retrieve relevant memories
  const memories = await retrieveMemories(userId, userMessage)

  // 2. Format memories for context
  const memoryContext = memories
    .map((m, i) => `[${i + 1}] ${m.content}`)
    .join('\n')

  // 3. Build prompt with memory context
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: `You are a helpful AI assistant with memory of past interactions.

Relevant memories about this user:
${memoryContext || 'No relevant memories found.'}

Use these memories to provide personalized, context-aware responses.`
      },
      {
        role: 'user',
        content: userMessage
      }
    ]
  })
  const reply = response.choices[0].message.content ?? ''

  // 4. Store this interaction as a new memory
  await storeMemory(
    userId,
    `User asked: "${userMessage}". Agent responded: "${reply}"`,
    {
      conversationId: Date.now().toString(),
      importance: 5
    }
  )

  return reply
}

// Usage
const answer = await chatWithMemory(
  'user-123',
  'Can you recommend a good database for my project?'
)
// The agent remembers the user prefers TypeScript/Next.js and suggests accordingly
```

Prevent memory overflow with consolidation and pruning:

```typescript
// Consolidate similar memories
async function consolidateMemories(userId: string) {
  // Fetch this user's memories. Querying with a near-zero dummy vector is a
  // workaround (Pinecone rejects all-zero vectors on cosine indexes); where
  // available, listing record IDs via listPaginated() is a cleaner fit.
  const allMemories = await index.query({
    vector: Array(1536).fill(1e-6), // Dummy vector; we only care about the filter
    filter: { userId: { $eq: userId } },
    topK: 1000,
    includeMetadata: true
  })

  // Group by conversation
  const conversations: Record<string, typeof allMemories.matches> = {}
  for (const match of allMemories.matches) {
    const convId = match.metadata?.conversationId as string | undefined
    if (!convId) continue
    ;(conversations[convId] ??= []).push(match)
  }

  // For each long conversation, create a summary
  for (const memories of Object.values(conversations)) {
    if (memories.length > 10) {
      const summary = await summarizeConversation(memories)

      // Store consolidated memory
      await storeMemory(userId, summary, {
        importance: 7,
        tags: ['consolidated'],
        originalCount: memories.length
      })

      // Delete the individual memories the summary replaces
      await index.deleteMany(memories.map(m => m.id))
    }
  }
}
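
// summarizeConversation is left to you; below is a minimal sketch (the model
// and prompt wording here are assumptions, not part of the original guide)
async function summarizeConversation(memories: any[]): Promise<string> {
  const transcript = memories.map(m => m.metadata?.content).join('\n')
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: 'Condense these interaction notes into a few sentences, keeping stable facts and user preferences.'
      },
      { role: 'user', content: transcript }
    ]
  })
  return response.choices[0].message.content ?? transcript
}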
// Prune old, low-importance memories
async function pruneOldMemories(userId: string, daysToKeep = 90) {
  const cutoffTime = Date.now() - daysToKeep * 24 * 60 * 60 * 1000

  // deleteMany takes the metadata filter directly (multiple fields are ANDed).
  // Note: filter-based deletes require a pod-based Pinecone index; serverless
  // indexes need per-user namespaces or an explicit list of record IDs.
  await index.deleteMany({
    userId: { $eq: userId },
    timestamp: { $lt: cutoffTime },
    importance: { $lt: 5 } // Only prune low-importance memories
  })
}

// Run maintenance periodically
setInterval(() => {
  consolidateMemories('user-123').catch(console.error)
  pruneOldMemories('user-123').catch(console.error)
}, 24 * 60 * 60 * 1000) // Daily
```

**Storage**
- Always include userId for filtering
- Add timestamps for temporal queries (see the sketch after this list)
- Tag memories for easy categorization
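Numeric timestamps make range filters possible at query time. A sketch using Pinecone's metadata filter operators and the `index` client from earlier; `queryVector` stands in for an embedding computed as shown above:

```typescript
// Only consider memories from the last 7 days
const weekAgo = Date.now() - 7 * 24 * 60 * 60 * 1000
const recent = await index.query({
  vector: queryVector, // an embedding from openai.embeddings.create(...)
  filter: {
    userId: { $eq: 'user-123' },
    timestamp: { $gte: weekAgo } // numeric range filter on stored timestamps
  },
  topK: 5,
  includeMetadata: true
})
```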
**Retrieval**
- Balance recency vs. relevance
- Limit results to avoid context overflow
- Cache frequent queries (see the sketch after this list)
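For the caching point, a minimal in-process TTL cache in front of `retrieveMemories` might look like this (a sketch; the key format and TTL are assumptions):

```typescript
// Simple in-process TTL cache for retrieval results (sketch)
const retrievalCache = new Map<
  string,
  { expires: number; value: Awaited<ReturnType<typeof retrieveMemories>> }
>()

async function cachedRetrieve(userId: string, query: string, ttlMs = 60_000) {
  const key = `${userId}:${query}`
  const hit = retrievalCache.get(key)
  if (hit && hit.expires > Date.now()) return hit.value // fresh cache hit

  const value = await retrieveMemories(userId, query)
  retrievalCache.set(key, { expires: Date.now() + ttlMs, value })
  return value
}
```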
**Privacy**
- Implement user data deletion (see the sketch after this list)
- Encrypt sensitive memories
- Define clear retention policies
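A sketch of user data deletion with Pinecone; note the caveat about index types:

```typescript
// Honor a user's deletion request by removing every memory they own.
// Caveat: filter-based deletes work on pod-based Pinecone indexes; on
// serverless indexes, write each user's memories to their own namespace
// and call deleteAll() on that namespace instead.
async function deleteUserMemories(userId: string) {
  await index.deleteMany({ userId: { $eq: userId } }) // pod-based index
  // Serverless alternative, if memories live in a per-user namespace:
  // await pinecone.index('episodic-memory').namespace(userId).deleteAll()
}
```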
**Performance**
- Batch-embed multiple memories (see the sketch after this list)
- Use async operations
- Monitor costs and latency
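Batching is straightforward because OpenAI's embeddings endpoint accepts an array of inputs, so several memories can be embedded in one API call instead of one call each:

```typescript
// Embed several memories in a single API call
async function embedBatch(contents: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: contents // the endpoint accepts string | string[]
  })
  // Results come back in the same order as the inputs
  return response.data.map(d => d.embedding)
}
```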
Watch out for these common pitfalls:

**Forgetting to filter by userId**
Always filter memories per user to avoid privacy leaks.
**Storing too much in each memory**
Keep memories atomic: one fact or event per entry retrieves far better than a blob of mixed facts.
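One way to keep entries atomic is to have the model split a turn into standalone facts before storing. A sketch; the prompt wording and the JSON-array output contract are assumptions:

```typescript
// Ask the model to split text into standalone facts (sketch)
async function extractAtomicFacts(text: string): Promise<string[]> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: 'Extract standalone facts from the user text. Respond only with a JSON array of short strings.'
      },
      { role: 'user', content: text }
    ]
  })
  try {
    return JSON.parse(response.choices[0].message.content ?? '[]')
  } catch {
    return [text] // fall back to storing the original text as one memory
  }
}

// Each fact becomes its own memory
for (const fact of await extractAtomicFacts('I moved to Berlin and I mostly write Go now')) {
  await storeMemory('user-123', fact, { tags: ['facts'] })
}
```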
**Not handling contradictions**
User preferences change; implement logic that prioritizes recent memories over stale ones.
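One possible approach is to keep only the newest memory per topic when assembling prompt context. A sketch, assuming the first tag on a memory identifies its topic:

```typescript
// Keep only the newest memory per tag so fresher preferences win (sketch)
type RetrievedMemory = { content: string; timestamp: number; metadata: { tags?: string[] } }

function preferRecent(memories: RetrievedMemory[]): RetrievedMemory[] {
  const newestByTag = new Map<string, RetrievedMemory>()
  const untagged: RetrievedMemory[] = []
  for (const m of memories) {
    const tag = m.metadata.tags?.[0]
    if (!tag) {
      untagged.push(m)
      continue
    }
    const existing = newestByTag.get(tag)
    if (!existing || m.timestamp > existing.timestamp) newestByTag.set(tag, m)
  }
  return [...untagged, ...newestByTag.values()]
}
```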
**Ignoring cost optimization**
Embedding API calls add up; batch operations and cache results when possible.