
Implementing Episodic Memory in AI Agents

Production-ready guide with code examples for building AI agents that remember specific experiences and events

15 min read
Intermediate
What You'll Learn
  • Store episodic memories with timestamps and metadata
  • Implement semantic + temporal retrieval
  • Handle memory consolidation and pruning
  • Build production-ready memory systems
Prerequisites

Before implementing episodic memory, you should understand:

  • Vector embeddings and semantic similarity search
  • Calling an LLM API (the examples use OpenAI's chat and embeddings endpoints)
  • TypeScript/Node.js and async/await

Architecture Overview

Episodic memory systems have three core components:

1. Storage

Vector database to store embeddings + metadata (timestamp, user_id, context)

2. Retrieval

Hybrid search combining semantic similarity with recency weighting and metadata filtering

3. Management

Consolidation, pruning, and privacy controls
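
As a rough sketch, the three components can be expressed as one interface (the names here are illustrative, not from any particular library):

// Illustrative surface for an episodic memory system (hypothetical names)
interface EpisodicMemoryStore {
  // Storage: persist a memory with its embedding and metadata
  store(userId: string, content: string, metadata?: Record<string, unknown>): Promise<void>
  // Retrieval: hybrid search over semantic similarity and recency
  retrieve(userId: string, query: string, limit?: number): Promise<string[]>
  // Management: consolidation, pruning, and privacy controls
  consolidate(userId: string): Promise<void>
  prune(userId: string, olderThanDays: number): Promise<void>
  deleteAllForUser(userId: string): Promise<void>
}

The rest of this guide implements each piece as standalone functions rather than a class, but the responsibilities are the same.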

Step 1: Choose Your Stack

Pick a vector database and embedding model:

# Recommended Stacks for Episodic Memory

## Quick Prototyping
- Vector DB: ChromaDB (embedded, zero setup)
- Embeddings: OpenAI text-embedding-3-small
- Storage: Local SQLite

## Production (Managed)
- Vector DB: Pinecone (serverless, auto-scaling)
- Embeddings: OpenAI text-embedding-3-large
- Storage: PostgreSQL or similar

## Production (Self-Hosted)
- Vector DB: Qdrant or Weaviate
- Embeddings: OpenAI or Cohere
- Storage: PostgreSQL + pgvector

## Framework-Based
- LlamaIndex (handles storage + retrieval)
- Mem0 (specialized for agent memory)
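
Whichever stack you pick, the index dimension must match your embedding model. For the Pinecone options, a one-time setup might look like this (the index name matches the code below; cloud and region are placeholders):

import { Pinecone } from '@pinecone-database/pinecone'

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! })

// 1536 dimensions matches text-embedding-3-small
await pinecone.createIndex({
  name: 'episodic-memory',
  dimension: 1536,
  metric: 'cosine',
  spec: { serverless: { cloud: 'aws', region: 'us-east-1' } }
})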

Step 2: Design Your Memory Schema

Each memory should include:

interface EpisodicMemory {
  id: string
  userId: string               // Who this memory belongs to
  timestamp: number            // When it happened
  content: string              // The actual memory text
  embedding: number[]          // Vector for semantic search
  metadata: {
    conversationId?: string    // Link related memories
    importance?: number        // 1-10 for prioritization
    emotionalValence?: number  // e.g., -1 (negative) to +1 (positive)
    tags?: string[]            // Categorization
    source?: string            // "user_message" | "agent_observation"
  }
}
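
A small factory keeps records consistent at write time. This is a hypothetical convenience helper, not part of any library; the embedding is filled in when the memory is stored:

// Build a well-formed memory record (embedding added at store time)
function createMemory(
  userId: string,
  content: string,
  metadata: EpisodicMemory['metadata'] = {}
): Omit<EpisodicMemory, 'embedding'> {
  return {
    id: `${userId}-${Date.now()}`,  // same id scheme as storeMemory below
    userId,
    timestamp: Date.now(),
    content,
    metadata: { importance: 5, ...metadata }  // default to mid importance
  }
}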

Step 3: Implement Memory Storage

Example using Pinecone:

import { Pinecone } from '@pinecone-database/pinecone'
import OpenAI from 'openai'

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! })
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

const index = pinecone.Index('episodic-memory')

async function storeMemory(userId: string, content: string, metadata = {}) {
  // 1. Generate embedding
  const embeddingResponse = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: content
  })
  const embedding = embeddingResponse.data[0].embedding

  // 2. Store in vector database
  await index.upsert([{
    id: `${userId}-${Date.now()}`,
    values: embedding,
    metadata: {
      userId,
      content,
      timestamp: Date.now(),
      ...metadata
    }
  }])

  console.log('Memory stored:', content.substring(0, 50))
}

// Usage
await storeMemory(
  'user-123',
  'User prefers TypeScript and uses Next.js for projects',
  { importance: 8, tags: ['preferences', 'technical'] }
)
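
A note on cost (see the pitfalls below): the embeddings endpoint accepts an array of inputs, and Pinecone upserts accept multiple records, so bulk writes can be batched into single API calls. A sketch:

// Batch variant: one embeddings call and one upsert for many memories
async function storeMemories(userId: string, contents: string[], metadata = {}) {
  // Embedding results come back in the same order as the inputs
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: contents
  })

  await index.upsert(
    data.map((d, i) => ({
      id: `${userId}-${Date.now()}-${i}`,  // index suffix avoids collisions within a batch
      values: d.embedding,
      metadata: { userId, content: contents[i], timestamp: Date.now(), ...metadata }
    }))
  )
}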

Step 4: Implement Smart Retrieval

Combine semantic similarity with recency weighting:

async function retrieveMemories(
  userId: string,
  query: string,
  options = { limit: 5, recencyBias: 0.3 }
) {
  // 1. Generate query embedding
  const queryEmbedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query
  })

  // 2. Semantic search with metadata filter
  const results = await index.query({
    vector: queryEmbedding.data[0].embedding,
    filter: { userId: { $eq: userId } },
    topK: options.limit * 2,  // Get more for re-ranking
    includeMetadata: true
  })

  // 3. Re-rank with recency bias
  const now = Date.now()
  const rankedResults = results.matches.map(match => {
    const ageInDays = (now - match.metadata.timestamp) / (1000 * 60 * 60 * 24)
    const recencyScore = Math.exp(-ageInDays / 30)  // Exponential decay: ~0.37 after 30 days, ~0.14 after 60

    // Combine similarity (0-1) with recency
    const combinedScore =
      (1 - options.recencyBias) * match.score +
      options.recencyBias * recencyScore

    return {
      content: match.metadata.content,
      score: combinedScore,
      timestamp: match.metadata.timestamp,
      metadata: match.metadata
    }
  })

  // 4. Return top results
  return rankedResults
    .sort((a, b) => b.score - a.score)
    .slice(0, options.limit)
}

// Usage
const relevantMemories = await retrieveMemories(
  'user-123',
  'What frameworks does the user like?',
  { limit: 3, recencyBias: 0.2 }
)

console.log(relevantMemories.map(m => m.content))
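
Because every retrieval pays for an embeddings call plus a vector query, frequently repeated queries are worth caching (see Best Practices below). A minimal per-process sketch with a short TTL, assuming queries repeat verbatim:

// Simple in-memory query cache; swap in Redis or similar for multi-instance setups
const queryCache = new Map<string, { result: unknown; expires: number }>()

async function retrieveMemoriesCached(userId: string, query: string, ttlMs = 60_000) {
  const key = `${userId}:${query}`
  const hit = queryCache.get(key)
  if (hit && hit.expires > Date.now()) {
    return hit.result as Awaited<ReturnType<typeof retrieveMemories>>
  }

  const result = await retrieveMemories(userId, query)
  queryCache.set(key, { result, expires: Date.now() + ttlMs })
  return result
}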

Step 5: Integrate with Your LLM

Include retrieved memories in your prompt:

async function chatWithMemory(userId: string, userMessage: string) {
  // 1. Retrieve relevant memories
  const memories = await retrieveMemories(userId, userMessage)

  // 2. Format memories for context
  const memoryContext = memories
    .map((m, i) => `[${i + 1}] ${m.content}`)
    .join('\n')

  // 3. Build prompt with memory context
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: `You are a helpful AI assistant with memory of past interactions.

Relevant memories about this user:
${memoryContext || 'No relevant memories found.'}

Use these memories to provide personalized, context-aware responses.`
      },
      {
        role: 'user',
        content: userMessage
      }
    ]
  })

  // 4. Store this interaction as new memory
  await storeMemory(
    userId,
    `User asked: "${userMessage}". Agent responded: "${response.choices[0].message.content}"`,
    {
      conversationId: Date.now().toString(),
      importance: 5
    }
  )

  return response.choices[0].message.content
}

// Usage
const answer = await chatWithMemory(
  'user-123',
  'Can you recommend a good database for my project?'
)
// Agent will remember user prefers TypeScript/Next.js and suggest accordingly
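
Retrieved memories also count against your context window (see Best Practices below). A rough guard, using a character budget as a cheap proxy for tokens; a real tokenizer is more precise:

// Cap the memory context by size (~4 characters per token as a rule of thumb)
function formatMemoryContext(memories: { content: string }[], maxChars = 2000) {
  const lines: string[] = []
  let used = 0
  for (const [i, m] of memories.entries()) {
    const line = `[${i + 1}] ${m.content}`
    if (used + line.length > maxChars) break  // stop before exceeding the budget
    lines.push(line)
    used += line.length
  }
  return lines.join('\n')
}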

Step 6: Implement Memory Management

Prevent memory overflow with consolidation and pruning:

// Consolidate similar memories.
// Note: summarizeConversation is a helper you supply (e.g., an LLM call that
// turns a list of memories into a short summary).
async function consolidateMemories(userId: string) {
  // List this user's memories. Pinecone has no "fetch all" query, so a zero
  // vector plus a metadata filter is a common workaround.
  const allMemories = await index.query({
    vector: Array(1536).fill(0),  // dummy vector; dimension must match the index
    filter: { userId: { $eq: userId } },
    topK: 1000,
    includeMetadata: true
  })

  // Group matches by conversation
  const conversations: Record<string, typeof allMemories.matches> = {}
  for (const match of allMemories.matches) {
    const convId = String(match.metadata?.conversationId ?? 'none')
    if (!conversations[convId]) conversations[convId] = []
    conversations[convId].push(match)
  }

  // For each long conversation, replace the individual memories with a summary
  for (const [convId, memories] of Object.entries(conversations)) {
    if (convId !== 'none' && memories.length > 10) {
      const summary = await summarizeConversation(memories)

      // Store consolidated memory
      await storeMemory(userId, summary, {
        importance: 7,
        tags: ['consolidated'],
        originalCount: memories.length
      })

      // Delete the individual memories that were summarized
      await index.deleteMany(memories.map(m => m.id))
    }
  }
}

// Prune old, low-importance memories
async function pruneOldMemories(userId: string, daysToKeep = 90) {
  const cutoffTime = Date.now() - daysToKeep * 24 * 60 * 60 * 1000

  // deleteMany takes the metadata filter directly (pod-based indexes;
  // serverless indexes require listing ids and deleting by id instead)
  await index.deleteMany({
    userId: { $eq: userId },
    timestamp: { $lt: cutoffTime },
    importance: { $lt: 5 }  // only prune low-importance memories
  })
}

// Run maintenance periodically (in production, prefer a scheduled job or cron
// over setInterval, and handle errors explicitly)
setInterval(() => {
  consolidateMemories('user-123').catch(console.error)
  pruneOldMemories('user-123').catch(console.error)
}, 24 * 60 * 60 * 1000)  // daily

Best Practices

Storage

  • Always include userId for filtering
  • Add timestamps for temporal queries
  • Tag memories for easy categorization

Retrieval

  • Balance recency vs relevance
  • Limit results to avoid context overflow
  • Cache frequent queries

Privacy

  • Implement user data deletion (see the sketch after this list)
  • Encrypt sensitive memories
  • Clear retention policies
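
For the deletion requirement, a per-user wipe maps directly onto a metadata filter (pod-based indexes) or onto per-user namespaces. A minimal sketch, with a hypothetical helper name:

// Delete every memory belonging to one user (e.g., a GDPR erasure request).
// Serverless indexes don't support metadata-filter deletes; storing each user
// in their own namespace lets you call deleteAll() instead.
async function deleteUserMemories(userId: string) {
  await index.deleteMany({ userId: { $eq: userId } })
  // Namespace alternative: await index.namespace(userId).deleteAll()
}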

Performance

  • Batch embed multiple memories
  • Use async operations
  • Monitor costs and latency
Common Pitfalls to Avoid

Forgetting to filter by userId

Always filter memories per user to avoid privacy leaks

Storing too much in each memory

Keep memories atomic: one fact or event per entry retrieves more precisely than long multi-topic blobs

Not handling contradictions

User preferences change over time; implement logic that prioritizes recent memories when they conflict with older ones (a sketch follows below)
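
A naive version of that logic, sketched against the schema above (tag overlap is a crude stand-in for real contradiction detection):

// Naive conflict resolution: among memories sharing a tag, keep only the newest
function resolveContradictions(
  memories: { content: string; timestamp: number; metadata: { tags?: string[] } }[]
) {
  const newestByTag = new Map<string, number>()
  for (const m of memories) {
    for (const tag of m.metadata.tags ?? []) {
      newestByTag.set(tag, Math.max(newestByTag.get(tag) ?? 0, m.timestamp))
    }
  }
  // Keep a memory if it's untagged, or if it's the newest for at least one tag
  return memories.filter(m => {
    const tags = m.metadata.tags ?? []
    return tags.length === 0 || tags.some(t => newestByTag.get(t) === m.timestamp)
  })
}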

Ignoring cost optimization

Embeddings API calls add up - batch operations and cache when possible

Production Readiness Checklist