Memory Fundamentals
Understanding the core concepts of how AI agents store, retrieve, and use information over time.
Think of AI agent memory like a human brain during a conversation. You remember what was said earlier (short-term memory), you can recall facts you learned years ago (long-term memory), and you can look things up in books when needed (external memory). AI agents work similarly: they keep track of recent interactions, store important information for later use, and can search databases or documents when they need specific facts.
When to Add Memory
- ✓ Multi-turn conversations: Users need context from previous messages (chatbots, assistants)
- ✓ Personalization: Adapting behavior based on user preferences or history
- ✓ Large knowledge bases: Need to reference docs/data beyond context window limits
- ✓ Session continuity: Users expect the agent to "remember" across sessions
When Memory May Not Be Needed
- ✗ Stateless operations: Single-turn queries with no context needed (e.g., "translate this")
- ✗ Privacy-sensitive: When storing user data creates compliance risks
- ✗ Short context: All needed info fits in the context window comfortably
Start Here
Begin with contextual memory (conversation history in the prompt). Only add external memory (vector DBs, databases) when you hit context limits or need persistence across sessions. Over-engineering memory too early adds complexity without clear benefits.
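To make this concrete, here is a minimal sketch of contextual memory: the agent simply appends each turn to a message list and resends the whole list on every request. The `call_llm` function and the role/content message format are placeholders standing in for whatever chat-completion client you actually use.

```python
# Minimal contextual memory: keep the running conversation in the prompt.
# call_llm is a placeholder for whatever chat-completion client you use; the
# role/content message format mirrors the common chat API convention.

def call_llm(messages: list[dict]) -> str:
    """Placeholder: send the message list to your model and return its reply."""
    raise NotImplementedError

class ConversationMemory:
    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, user_input: str) -> str:
        # Append the new user turn, send the full history, store the reply.
        self.messages.append({"role": "user", "content": user_input})
        reply = call_llm(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply

# chat = ConversationMemory("You are a helpful assistant.")
# chat.ask("What's the capital of France?")
# chat.ask("And its population?")  # "its" resolves only because history is resent
```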
Short-Term Memory
- Context window (tokens in current conversation)
- Recent user messages and AI responses
- Current task state and progress
- Temporary variables and calculations
Example: ChatGPT remembering your previous questions in the same conversation (see the sketch below)
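A sketch of how this short-term state might be held in code, assuming a rough 4-characters-per-token estimate in place of a real tokenizer; the field names are illustrative.

```python
# Short-term (session) memory sketch: recent turns, task state, and scratch
# values, trimmed to a rough token budget.
from dataclasses import dataclass, field

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude ~4-characters-per-token heuristic

@dataclass
class SessionMemory:
    max_tokens: int = 4000
    turns: list[str] = field(default_factory=list)  # recent user/assistant messages
    task_state: dict = field(default_factory=dict)  # current task progress
    scratch: dict = field(default_factory=dict)     # temporary values and calculations

    def add_turn(self, text: str) -> None:
        self.turns.append(text)
        # Drop the oldest turns once the history exceeds the budget.
        while sum(estimate_tokens(t) for t in self.turns) > self.max_tokens:
            self.turns.pop(0)
```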
Long-Term Memory
- User preferences and patterns
- Historical conversation summaries
- Learned facts and relationships
- Domain-specific knowledge
Example: GitHub Copilot learning your coding style over time (see the sketch below)
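One simple way to persist this kind of long-term memory across sessions is to write it to disk. The file path and record fields below are illustrative, not a standard schema.

```python
# Long-term memory persisted across sessions as a JSON file (illustrative schema).
import json
from pathlib import Path

MEMORY_FILE = Path("agent_long_term_memory.json")  # hypothetical location

def load_memory() -> dict:
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return {"preferences": {}, "summaries": [], "facts": []}

def save_memory(memory: dict) -> None:
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

memory = load_memory()
memory["preferences"]["code_style"] = "type-hinted, with docstrings"
memory["summaries"].append("Session summary: user asked how to pick a vector database")
save_memory(memory)  # available to the agent the next time it starts
```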
Types of Memory
Parametric Memory
Knowledge encoded in the model's weights during training. This is like your brain's built-in knowledge.
Contextual Memory
Information held in the current context window. Limited by token limits but immediately accessible.
External Memory
Information stored outside the model that can be retrieved when needed. Like having access to a library.
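A minimal sketch of external memory lookup: store each note alongside an embedding, then return the most similar entries at query time. The `embed` function here is a toy hashed bag-of-words stand-in for a real embedding model (e.g., a sentence-transformer or an embeddings API).

```python
# External memory lookup: store notes with embeddings, retrieve by similarity.
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: hashed bag-of-words vector.
    vec = [0.0] * 64
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ExternalMemory:
    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

# library = ExternalMemory()
# library.add("Refund policy: refunds are processed within 5 business days.")
# library.search("how long do refunds take?")
```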
Episodic Memory
Memories of specific events and experiences, often with temporal and contextual information.
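Episodic memories are usually kept as structured records so they can later be filtered by time or context. The field names in this sketch are illustrative.

```python
# Episodic memory: structured records of specific events with time and context.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Episode:
    timestamp: datetime
    summary: str   # what happened
    context: dict  # where/why it happened (task, user, channel, ...)

episodes: list[Episode] = []
episodes.append(Episode(
    timestamp=datetime.now(timezone.utc),
    summary="Drafted a refund email for the user; draft was approved.",
    context={"task": "customer_support", "user_id": "u-123"},
))

# Later, recall experiences by filtering on time or context.
support_history = [e for e in episodes if e.context.get("task") == "customer_support"]
```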
Further Reading
Attention Is All You Need - Explained
Deep dive into the transformer architecture that powers modern AI memory
RAG vs Long Context: When to Use What
Practical comparison of retrieval vs context window approaches
Building Memory-Enabled AI Agents
Hands-on tutorial for implementing agent memory systems
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
The foundational RAG paper that started the retrieval revolution
MemGPT: Towards LLMs as Operating Systems
Novel approach to managing memory hierarchies in LLMs
Lost in the Middle: How Language Models Use Long Contexts
Critical analysis of how models actually use long context windows
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
Important findings about bidirectional memory in language models
Key Challenges
Context Window Limits
Even large models have finite context windows. GPT-4 Turbo offers 128k tokens, but that is still limiting for long conversations or large document collections.
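One common mitigation, sketched below, is to fold the oldest turns into a running summary once the history exceeds a token budget. The `summarize` function is a placeholder for an LLM summarization call, and the 4-characters-per-token estimate is a rough heuristic.

```python
# Coping with a finite context window: fold the oldest turns into a running
# summary once the history exceeds a token budget.

def summarize(texts: list[str]) -> str:
    """Placeholder: ask a model to compress `texts` into a short summary."""
    raise NotImplementedError

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude ~4-characters-per-token heuristic

def compact_history(summary: str, turns: list[str], budget: int = 8000) -> tuple[str, list[str]]:
    while turns and estimate_tokens(summary) + sum(map(estimate_tokens, turns)) > budget:
        # Compress the two oldest turns into the summary and drop them.
        summary = summarize([summary] + turns[:2])
        turns = turns[2:]
    return summary, turns
```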
Memory Consistency
Ensuring that stored memories remain accurate and don't contradict each other over time.
Retrieval Accuracy
Finding the right information at the right time. Vector similarity doesn't always match semantic relevance.
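One mitigation is hybrid scoring: blend the vector score with a simple keyword signal before picking the top results. The 0.7/0.3 weights below are arbitrary, and `vector_score` stands in for whatever similarity your vector store returns.

```python
# Hybrid re-ranking: blend vector similarity with keyword overlap.

def keyword_overlap(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_score(query: str, doc: str, vector_score: float) -> float:
    # Weights are arbitrary; tune them against your own retrieval evaluations.
    return 0.7 * vector_score + 0.3 * keyword_overlap(query, doc)

def rerank(query: str, candidates: list[tuple[str, float]], k: int = 3) -> list[str]:
    # candidates: (document_text, vector_score) pairs from the vector store
    ranked = sorted(candidates, key=lambda c: hybrid_score(query, c[0], c[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]
```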
Privacy & Security
Protecting sensitive information while maintaining useful memory capabilities across sessions.